EMR and watchdog: service-nanny?

Want a watchdog that restarts a service if it crashes for any reason?

There are several ways to solve this; here are two of them. Just for the record, the reason I needed this is that I have to keep the Spark Thrift server running for JDBC, and it crashes every time it runs out of memory.

Solution 1: Linux CRON

Use a good old cron entry that runs a small script every 5 minutes (the interval is easy to customize). The script checks whether the process is running and, if not, starts it. If the Thrift server should only be started on the master node, you can use an EMR step to do that. You can use script runner to run a custom script stored in S3 [1].
The advantage of this approach is that it is simple to code and maintain. It also does not depend on a particular EMR version or service.
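A minimal sketch of such a watchdog script is shown here. The process pattern (the HiveThriftServer2 class name) and the start command path are assumptions about a typical EMR Spark install, and the function names are made up for illustration, so check them against your own cluster before relying on this.

```shell
#!/usr/bin/env bash
# Watchdog sketch: start a service when no matching process is found.
# PROCESS_PATTERN and START_CMD are assumptions; adjust them for your cluster.
PROCESS_PATTERN="${PROCESS_PATTERN:-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2}"
START_CMD="${START_CMD:-sudo /usr/lib/spark/sbin/start-thriftserver.sh}"

is_running() {
    # pgrep -f matches the pattern against each process's full command line.
    pgrep -f "$1" > /dev/null 2>&1
}

ensure_running() {
    local pattern="$1" start_cmd="$2"
    if is_running "$pattern"; then
        echo "running"
    else
        echo "starting"
        # Intentionally unquoted so the command string splits into words.
        $start_cmd
    fi
}

# Only act when called with the "run" argument, so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    ensure_running "$PROCESS_PATTERN" "$START_CMD"
fi
```

Assuming the script is saved as /home/hadoop/watchdog.sh (a hypothetical path) and made executable, a crontab entry along these lines would run it every 5 minutes:

*/5 * * * * /home/hadoop/watchdog.sh run >> /tmp/watchdog.log 2>&1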

Example of creating an EMR cluster with a script runner step (replace "region" in the JAR path with your cluster's region):

aws emr create-cluster --name "Test cluster" --release-label emr-5.16.0 \
  --applications Name=Hive Name=Pig --use-default-roles \
  --ec2-attributes KeyName=myKey --instance-type m4.large --instance-count 3 \
  --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://mybucket/script-path/my_script.sh"]

Solution 2: service-nanny (Note: this is untested)

This solution uses service-nanny, a service watchdog that runs on every EMR cluster.
Create a service-nanny configuration file (/etc/service-nanny/yourservice.conf) that holds some basic information about the process. Put the file in S3 and download it onto the cluster via a step (useful if you only want it on the master node). Once the file is in place, restart service-nanny. You can stop and start service-nanny with the commands below:

sudo /etc/init.d/service-nanny stop
sudo /etc/init.d/service-nanny start

You can find sample configurations under /usr/lib/service-nanny/example. A possible disadvantage of this approach: if EMR removes service-nanny in a future release, you may need to fall back to Solution 1.
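The steps above could be scripted roughly as follows and run via a script runner step. This is an untested sketch: the S3 location is hypothetical, and the master-node check assumes the instance metadata file at /mnt/var/lib/info/instance.json contains an "isMaster" field. The contents of the conf file itself are not shown; base them on the samples shipped with service-nanny.

```shell
#!/usr/bin/env bash
# Sketch: install a service-nanny config on the master node only (untested).
# INSTANCE_INFO is assumed to be the EMR instance metadata file.
INSTANCE_INFO="${INSTANCE_INFO:-/mnt/var/lib/info/instance.json}"
# Hypothetical S3 location of the conf file; replace with your own.
CONF_S3_URI="${CONF_S3_URI:-s3://mybucket/conf/yourservice.conf}"
CONF_DEST="/etc/service-nanny/yourservice.conf"

is_master_node() {
    # Succeeds only if the given file contains "isMaster": true.
    grep -Eq '"isMaster"[[:space:]]*:[[:space:]]*true' "$1" 2>/dev/null
}

install_conf() {
    # Fetch the conf file, put it in place, then restart service-nanny.
    aws s3 cp "$CONF_S3_URI" /tmp/yourservice.conf
    sudo cp /tmp/yourservice.conf "$CONF_DEST"
    sudo /etc/init.d/service-nanny stop
    sudo /etc/init.d/service-nanny start
}

# Only act when called with the "run" argument, so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    if is_master_node "$INSTANCE_INFO"; then
        install_conf
    else
        echo "not the master node; nothing to do"
    fi
fi
```

When used with script runner, the "run" argument would need to be passed along in the step's Args list after the script path.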

Note: Solution 2 is untested, so please test it thoroughly before using it in production.

Resources:
[1] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html

 
