Where Dynamicity Meets the Cloud

How to do some service discovery on Amazon EC2

Most applications, frameworks, and application servers rely on multicast to discover other service instances. Unfortunately, multicast does not work on Amazon EC2.

For Elastic Grid, we decided to face this issue from the beginning because we believe service discovery is key to making Amazon EC2 easier to use. In Elastic Grid, an Application Monitor is in charge of provisioning the applications/services. The monitor provisions services on Application Agents, which are the receptacles for those services.

The agents need to connect to the monitors so that each monitor knows which agents are running and what their capabilities are. We saw some suggestions on the EC2 forums, which we review briefly below before explaining what we ended up with for Elastic Grid.

The first solution is to use Amazon SDB: each agent inserts the IP address it is running from into an SDB Domain. The monitors then poll the SDB Domain regularly to find out whether new agents have appeared or existing ones are gone. The first problem with this solution is that you can’t react faster than the polling interval. The second problem is that SDB becomes another requirement for running Elastic Grid. Finally, and worst of all, what happens if an agent dies? Its “record” in SDB would still be there, so the monitors would need to purge “dead records” from the SDB Domain.
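To make the polling drawback concrete, here is a minimal Python sketch of what a monitor would do on each poll: compare the agent addresses it saw last time with the ones it sees now. The function name and the sample addresses are made up for illustration, and the actual read from the SDB Domain is left out.

```python
def diff_agents(previous, current):
    """Return (joined, left) between two polls of the agent registry."""
    previous, current = set(previous), set(current)
    joined = current - previous   # addresses seen for the first time
    left = previous - current     # addresses whose record disappeared
    return joined, left


# Two consecutive polls (sample private IPs, purely illustrative).
first_poll = ["10.251.199.99", "10.251.199.131"]
second_poll = ["10.251.199.131", "10.251.199.200"]

joined, left = diff_agents(first_poll, second_poll)
print("joined:", sorted(joined))
print("left:", sorted(left))
```

An agent that crashes never deletes its own record, so it would never show up in `left`. That is exactly the “dead records” problem described above.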

The second solution is to use EC2 launch meta-data: when you ask to start an EC2 instance, you can pass some launch meta-data that the instance can retrieve at boot time (the retrieval process is explained in the Developer Guide). The idea is to start the monitor first, then retrieve the IP address of that monitor instance. Next, you start some agents with launch meta-data whose value is the IP address (usually the private IP) of the previously started monitor. This is the strategy most of the solutions available on EC2 use. The problem is that if the monitor dies (and you restart it somewhere else), how do you make sure the currently running agents are updated?
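As a rough illustration of the launch meta-data approach, the Python sketch below reads the launch data from the EC2 instance metadata endpoint and extracts the monitor address. The `key=value` layout and the `monitor_ip` key are assumptions made for this example, not the Elastic Grid format, and the fetch assumes plain, unauthenticated metadata requests are allowed.

```python
import urllib.request

# Well-known EC2 instance metadata endpoint for launch user data.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL, timeout=2):
    """Read the raw launch data; only works from inside an EC2 instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_user_data(text):
    """Parse 'key=value' lines (an assumed format) into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


# Example launch data an agent might receive (hypothetical key name).
sample = "# written when the agent was launched\nmonitor_ip=10.251.199.99\n"
params = parse_user_data(sample)
print(params["monitor_ip"])
```

The value is fixed at launch time, which is why a monitor restarted at a new address leaves the running agents pointing at the old one.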

Now, back to how Elastic Grid tackles this problem. What we need is a way for an EC2 instance to find out, at boot time, where the other EC2 instances are and, more importantly, what kind of “profile” they have (monitor or agent). First of all, have a look at the output of a DescribeInstances query made with the EC2 command-line tools:

RESERVATION	r-05e5286c	154066937112	default
INSTANCE	i-2271a74b	ami-c140a5a8	ec2-75-101-202-32.compute-1.amazonaws.com
    ip-10-251-199-99.ec2.internal	running	eg-gsg-keypair	0		m1.small
    2008-06-30T09:41:43+0000	us-east-1c	aki-a71cf9ce	ari-a51cf9cc

The default at the end of the RESERVATION line above is the name of the security group we used when we launched that instance. This means that at any time you can get the list of running instances and know which security groups they belong to.
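Reading the security groups out of that output could be sketched in Python as follows. This assumes each RESERVATION and INSTANCE record arrives as a single tab-separated line (the wrapping shown above is only for display); the function name is made up for this example.

```python
def groups_by_instance(output):
    """Map instance id -> list of security groups from DescribeInstances text."""
    result = {}
    current_groups = []
    for line in output.splitlines():
        fields = line.split("\t")
        if fields[0] == "RESERVATION" and len(fields) >= 4:
            # Fourth column holds the comma-separated security groups.
            current_groups = [g for g in fields[3].split(",") if g]
        elif fields[0] == "INSTANCE" and len(fields) >= 2:
            # Instances inherit the groups of the enclosing reservation.
            result[fields[1]] = list(current_groups)
    return result


sample = (
    "RESERVATION\tr-05e5286c\t154066937112\tdefault\n"
    "INSTANCE\ti-2271a74b\tami-c140a5a8\t"
    "ec2-75-101-202-32.compute-1.amazonaws.com\t"
    "ip-10-251-199-99.ec2.internal\trunning\teg-gsg-keypair\t0\t\tm1.small\n"
)
print(groups_by_instance(sample))
```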

The solution we came up with is to create some empty security groups (meaning security groups with no rules) used as tags. For Elastic Grid, we use two groups: one called eg-monitor and another one called eg-cybernode (the technical name for an agent).

When we start a monitor, we simply make sure it is started in the eg-monitor security group. For example, here is the shell command for starting an Elastic Grid monitor:

ec2run ami-c140a5a8 -g eg-monitor -g elastic-grid -g default -k eg-gsg-keypair -f ec2params.config

Starting an agent is pretty much the same, except we use the eg-cybernode group instead:

ec2run ami-c140a5a8 -g eg-cybernode -g elastic-grid -g default -k eg-gsg-keypair -f ec2params.config

When an Elastic Grid instance starts, it runs a DescribeInstances command and scans each instance for its security groups. If the running instance is a monitor, it only registers with the other monitors (they peer with each other for failover). If the instance is an agent, it registers with all the monitors it finds.
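The registration rule above can be sketched in a few lines of Python. This is an illustration only, not the Elastic Grid implementation: the function name is hypothetical, and the input is assumed to be a mapping from instance id to security group names built from the DescribeInstances output.

```python
MONITOR_GROUP = "eg-monitor"
AGENT_GROUP = "eg-cybernode"


def registration_targets(self_id, instances):
    """Return the ids of the monitors this instance should register with.

    `instances` maps instance id -> collection of security group names.
    """
    my_groups = instances[self_id]
    if MONITOR_GROUP not in my_groups and AGENT_GROUP not in my_groups:
        return []  # not an Elastic Grid node, nothing to register with
    # Monitors peer with the other monitors; agents register with every monitor.
    return sorted(
        other_id
        for other_id, groups in instances.items()
        if other_id != self_id and MONITOR_GROUP in groups
    )


instances = {
    "i-2271a74b": ["default", "eg-monitor", "elastic-grid"],
    "i-df71a7b6": ["default", "elastic-grid", "eg-cybernode"],
}
print(registration_targets("i-df71a7b6", instances))  # the agent
print(registration_targets("i-2271a74b", instances))  # the monitor
```

With one monitor and one agent, the agent gets the monitor as its only target, while the monitor gets an empty list because there is no other monitor to peer with.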

Here is the output from DescribeInstances when one monitor and one agent are running:

RESERVATION	r-05e5286c	154066937112	default,eg-monitor,elastic-grid
INSTANCE	i-2271a74b	ami-c140a5a8	ec2-75-101-202-32.compute-1.amazonaws.com
    ip-10-251-199-99.ec2.internal	running	eg-gsg-keypair	0		m1.small
    2008-06-30T09:41:43+0000	us-east-1c	aki-a71cf9ce	ari-a51cf9cc
RESERVATION	r-bae528d3	154066937112	default,elastic-grid,eg-cybernode
INSTANCE	i-df71a7b6	ami-c140a5a8	ec2-75-101-238-91.compute-1.amazonaws.com
    ip-10-251-199-131.ec2.internal	running	eg-gsg-keypair	0		m1.small
    2008-06-30T10:00:41+0000	us-east-1c	aki-a71cf9ce	ari-a51cf9cc

In fact, this works so well that we also use it to create logical clusters. For each cluster, we create a specific security group and only allow traffic between EC2 instances within the same cluster group.

I hope this small how-to helps you. Feel free to post comments about alternatives you have identified or shortcomings we may have missed in our solution.


2 Responses to “How to do some service discovery on Amazon EC2”

  1. Alfredo Ramos Says:

    LOL

    That is really funny, using the security groups as metadata, what a hack!

    The things that EC2 forces developers to do by its lack of features.

    You could probably have used SDB (Simple DB) and made each instance register an entry with a time stamp. The time stamp should be renewed by your instance, say… every 5 minutes. Any other instance interested in finding others could simply query the DB for entries whose time stamp is at most 5 minutes old.

    It would be better if AWS were more responsive to their client requests.

    Alfredo

  2. jeje Says:

    Probably a hack, but probably one of the most used solutions I would say.
    Actually our implementation evolved quite a bit since that post (gosh about a year ago ;-)) and the cluster topology can easily change over time without any problem now.

    Your solution is another “hack” ;-) that we thought of too, but we preferred to avoid extra charges for SDB and, more importantly, to NOT require our customers to sign up for SDB in order to use Elastic Grid.
    Our approach is actually based on leases (we use Jini & Rio underneath): if an agent/cybernode can’t renew its leases, then the monitor considers it dead. The same thing happens the other way around. So it’s quite like the idea of updating node statuses in SDB, except we don’t need to persist that stuff much.

    But if the AWS team were actually able to allow multicast on their network, most of those problems would go away ;-)

