Securing Spark JDBC + thrift connection (SSL) @ AWS EMR (demystified)

To secure the Thrift connection, enable SSL encryption and then restart the hive-server2 and Thrift services on the EMR master instance.

The steps are as follows:
1. Create the self-signed certificate and add it to a keystore file using:
$ keytool -genkey -alias public-dnshostname -keyalg RSA -keystore keystore.jks -keysize 2048

Make sure the name used in the self-signed certificate matches the hostname where the Thrift server will run. Use the public DNS name, since you are connecting from outside the VPC. keytool asks for this name at the "What is your first and last name?" prompt.

2. List the keystore entries to verify that the certificate was added. Note that a keystore can contain multiple such certificates:

$ keytool -list -keystore keystore.jks

3. Export this certificate from keystore.jks to a certificate file:
$ keytool -export -alias public-dnshostname -file example.com.crt -keystore keystore.jks

4. Add this certificate to the client's truststore to establish trust on the machine you connect from. Since you are connecting from a local machine, copy the certificate example.com.crt from the EMR master node to that machine and then import it:

$ keytool -import -trustcacerts -alias public-dnshostname -file example.com.crt -keystore truststore.jks

5. Verify that the certificate exists in truststore.jks:
$ keytool -list -keystore truststore.jks
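The keytool steps above prompt for passwords and certificate details. A minimal non-interactive sketch is shown here, assuming the same file names as above; the function names, the 365-day validity, and the subject alternative name extension are illustrative choices, not part of the original procedure.

```shell
# Sketch only: host name and passwords are placeholders supplied by the caller.

# Steps 1-3, run on the EMR master: create the key pair, list it, export the cert.
# $1 = public DNS name of the master, $2 = keystore password (6+ characters)
make_server_keystore() {
  keytool -genkeypair -alias "$1" -keyalg RSA -keysize 2048 \
    -dname "CN=$1" -ext "SAN=dns:$1" -validity 365 \
    -keystore keystore.jks -storepass "$2" -keypass "$2" &&
  keytool -list -keystore keystore.jks -storepass "$2" &&
  keytool -exportcert -alias "$1" -file example.com.crt \
    -keystore keystore.jks -storepass "$2"
}

# Steps 4-5, run on the client after copying example.com.crt over.
# $1 = alias to store the certificate under, $2 = truststore password
import_server_cert() {
  keytool -importcert -noprompt -trustcacerts -alias "$1" \
    -file example.com.crt -keystore truststore.jks -storepass "$2" &&
  keytool -list -keystore truststore.jks -storepass "$2"
}
```

For example, `make_server_keystore public-dnshostname keystorepassword` on the master, then `import_server_cert public-dnshostname password` on the client. Passing passwords on the command line exposes them in the shell history, so this is best kept to test setups.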

Once the certificate is imported, make the following changes in /etc/hive/conf/hive-site.xml:
+++
hive.server2.transport.mode : http
hive.server2.use.SSL : true
hive.server2.keystore.path : path/to/your/keystore/jks
hive.server2.keystore.password : keystorepassword
+++
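hive-site.xml is an XML file, so each setting above becomes a `<property>` element inside the `<configuration>` root. The sketch below prints those elements for pasting into the file; `hive_property` is a hypothetical helper, and the keystore path and password are placeholders.

```shell
# Print one hive-site.xml <property> element.
# $1 = property name, $2 = value (not XML-escaped, so avoid &, < and > here)
hive_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# The four settings listed above; paste the output inside <configuration>.
hive_property hive.server2.transport.mode http
hive_property hive.server2.use.SSL true
hive_property hive.server2.keystore.path /path/to/keystore.jks
hive_property hive.server2.keystore.password keystorepassword
```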

Restart hive-server2 and the Thrift server:
$ sudo stop hive-server2 && sudo start hive-server2
$ sudo -u spark /usr/lib/spark/sbin/stop-thriftserver.sh && sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh

Check that the services started successfully and verify that the master instance is listening on port 10001:
+++
$ sudo netstat -tulpan |grep 10001
tcp        0      0 :::10001                    :::*                        LISTEN      12494/java
+++
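A listening port does not by itself show that SSL is active. One way to check, assuming the openssl command-line tool is installed, is to open a TLS connection and print the subject and expiry date of the certificate the server presents; `show_server_cert` is a hypothetical helper name.

```shell
# Print the subject and expiry date of the certificate served on host:port.
# $1 = host name, $2 = port (defaults to 10001)
show_server_cert() {
  openssl s_client -connect "$1:${2:-10001}" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -enddate
}

# Example (run from the client machine):
#   show_server_cert emr-dnsname 10001
```

The subject should carry the same public DNS name that was entered when the certificate was generated.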

Once the services are running, you can connect with the JDBC driver using a URL such as:

jdbc:hive2://emr-dnsname:10001/default;hive.server2.transport.mode=http;ssl=true;sslTrustStore=/pathto/truststore.jks;trustStorePassword=password
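To try the connection from a shell, the URL can be assembled and passed to beeline. This sketch assumes the Apache Hive JDBC driver, whose documentation selects HTTP mode with `transportMode=http` and `httpPath` (here the default endpoint `cliservice`) rather than the `hive.server2.transport.mode` form above; other drivers may use different parameter names. `jdbc_url` is a hypothetical helper.

```shell
# Build a HiveServer2 JDBC URL for HTTP transport over SSL.
# $1 = host, $2 = port, $3 = truststore path, $4 = truststore password
jdbc_url() {
  printf 'jdbc:hive2://%s:%s/default;ssl=true;sslTrustStore=%s;trustStorePassword=%s;transportMode=http;httpPath=cliservice\n' \
    "$1" "$2" "$3" "$4"
}

# Example (the quotes matter, since the URL contains semicolons):
#   beeline -u "$(jdbc_url emr-dnsname 10001 /pathto/truststore.jks password)"
```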
