Processing Pipeline - SQS vs RabbitMQ

Apr 2017 - code

part 1

Our data pipeline needs a message broker for a Celery job queue, and RabbitMQ and Amazon SQS are two good choices. Both are designed to scale, both guarantee message delivery, and both appear to be well supported.

RabbitMQ has a solid reputation as a broker, with diverse and robust features, and it works very well with Celery. Its only real disadvantage is that we would have to configure and deploy it ourselves. SQS is a message queue as a service: self-scaling and configurable from a web UI. It also plays well with other AWS offerings. However, it can be pricier and slower at scale, and it is less flexible to use.

Long term, RabbitMQ seems like a clear winner, but there are more factors to consider:

Maintenance, reliability, security, and scaling would all be taken care of on SQS - making it a very appealing option, provided that 3 assumptions were reasonable:

To check this, I ran some tests on AWS, swapping out the broker. I had a Flask gateway and a Celery worker on separate EC2 instances, and used Celery to manage jobs. I ran 5 trials for each broker, at 2 different times of day. Each trial consisted of 300 requests for 1-second jobs.
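The broker swap in a setup like this can be reduced to a single configuration switch. The following is a minimal sketch, not the actual test harness: the `BROKER` environment variable, the `broker_url` helper, and the host and credentials are all hypothetical. The URL schemes (`amqp://` for RabbitMQ, `sqs://` for SQS) are the ones Celery accepts.

```python
import os

# Hypothetical defaults; the real hosts and credentials are not given in this post.
RABBITMQ_URL = "amqp://guest:guest@localhost:5672//"
SQS_URL = "sqs://"  # with a bare URL, AWS credentials are read from the environment


def broker_url(kind: str) -> str:
    """Return the Celery broker URL for the chosen broker ('rabbitmq' or 'sqs')."""
    urls = {"rabbitmq": RABBITMQ_URL, "sqs": SQS_URL}
    try:
        return urls[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown broker: {kind!r}") from None


# BROKER is an assumed environment variable, defaulting to SQS.
BROKER_URL = broker_url(os.environ.get("BROKER", "sqs"))

# With Celery installed, the gateway and the worker would both build the app as:
#   from celery import Celery
#   app = Celery("pipeline", broker=BROKER_URL)
```

Because the gateway and the worker would read the same setting, switching brokers between trials only means changing one variable and restarting both processes.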

One metric was total request time, measured with the requests package's built-in profiling method. The other was total processing time, taken from the worker logs. In line with the general consensus, RabbitMQ was distinctly faster at both reading and writing.

Writing/requesting time (s):

Reading/processing time (s):

However, in both cases the uncertainty ranges overlap, and other components of the pipeline could easily become more significant bottlenecks than the brokers themselves.
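One way to check whether two sets of trial times overlap within their uncertainty is sketched below. It assumes the uncertainty range is the mean plus or minus one sample standard deviation, which this post does not specify. The function names are hypothetical, and the numbers are placeholders rather than the measured results.

```python
from statistics import mean, stdev


def summarize(times):
    """Return (mean, sample standard deviation) for a list of trial times in seconds."""
    return mean(times), stdev(times)


def ranges_overlap(a, b):
    """True if the mean +/- one standard deviation intervals of a and b overlap."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return (mean_a - sd_a) <= (mean_b + sd_b) and (mean_b - sd_b) <= (mean_a + sd_a)


# Placeholder numbers for illustration only; these are NOT the measured results.
rabbitmq_times = [10.0, 11.0, 12.0, 11.0, 10.0]
sqs_times = [12.0, 14.0, 11.0, 13.0, 12.0]

print(ranges_overlap(rabbitmq_times, sqs_times))
```

With only 5 trials per broker, overlapping ranges mean the speed difference should be read as suggestive rather than conclusive.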

As a cash-strapped startup, we also had to weigh cost, which I estimated from the test results.

Scaling:

Costs:

We start to see a slight cost advantage for RabbitMQ at around 20 million requests. We assume that at smaller scales SQS will be cheaper (it has no overhead costs), and that at larger scales RabbitMQ will become cheaper (it can use more cost-effective, large instances). Since the scaling estimates indicate that this switch will occur about 11 months from now at the earliest, SQS appears to be a cost-effective choice for a reasonable amount of time.
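The break-even reasoning can be sketched with a deliberately simple model: SQS cost grows linearly with request volume, a self-hosted RabbitMQ instance has a fixed monthly cost, and volume grows by a constant factor each month. Every number and name below is a hypothetical placeholder, not the pricing or the growth figures behind the estimates above. The model also ignores free tiers, multiple API calls per message, and engineering time.

```python
def break_even_requests(fixed_monthly_cost, price_per_million):
    """Monthly request volume at which pay-per-request cost equals the fixed cost."""
    return fixed_monthly_cost / price_per_million * 1_000_000


def months_until(current_volume, target_volume, monthly_growth):
    """Whole months until volume reaches the target, growing by monthly_growth per month."""
    if current_volume >= target_volume:
        return 0
    if monthly_growth <= 0:
        raise ValueError("volume never reaches the target without positive growth")
    months = 0
    volume = current_volume
    while volume < target_volume:
        volume *= 1 + monthly_growth
        months += 1
    return months


# Hypothetical inputs: 30.0 per month for an instance, 0.5 per million requests.
threshold = break_even_requests(fixed_monthly_cost=30.0, price_per_million=0.5)

# Hypothetical growth: 1 million requests per month today, doubling every month.
print(threshold, months_until(1_000_000, threshold, monthly_growth=1.0))
```

Below the threshold the pay-per-request option is cheaper under this model; above it the fixed-cost instance wins, which is the shape of the argument made here.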

Since RabbitMQ's 2 main advantages, better performance and lower costs, will likely not be important factors for at least another year, we have decided to use AWS SQS for now. Conveniently, one of the main advantages of Celery is that changing the broker later should not be too hard.

Rollout results are discussed in part 3.
