We at Levvel increasingly find ourselves working with RabbitMQ. We see both large, established companies and nimble, younger technology companies embracing this platform. And for good reason. It’s mature, highly performant, well supported (Java, .NET, Node.js, Ruby, and Erlang clients all exist), and much cheaper than many alternative messaging platforms. This post gives a quick introduction to the technology, and follow-on posts will dig deeper into examples and more specific challenges we have faced.
RabbitMQ is a message broker written in Erlang that implements AMQP. This post will not dig deeply into AMQP, but it is a lightweight and flexible protocol for messaging. Messaging is a powerful programming paradigm that decouples clients from the underlying services. It’s similar to the way Starbucks works: you give your order to the cashier, not knowing who will make your coffee or when it will be done. You pay and then you wait. The next customer in line can then place her order without having to wait for your coffee to be made. Contrast this with a more traditional RPC-based approach, which is like the local diner where a short-order cook works on your food until it is complete and only then begins on the next customer's.
For folks familiar with JMS, there are some similarities but also some differences. The key concepts in Rabbit are exchanges and queues. Clients fall into two groups: producers and consumers. Producers send messages to exchanges, and consumers retrieve messages from queues. An example of a producer would be a mobile app that allows a customer to send a customer service request. The mobile app would bundle the request into a message, connect to RabbitMQ, open a channel, and send the message. A consumer would be a trouble ticketing system that listens to a queue on the Rabbit server. Within Rabbit, an exchange receives the message and inspects its routing information (the routing key or headers, depending on the exchange type) to determine that it needs to go to a queue called newRequests. The exchange is similar in role to a postal employee who sorts through the mail and determines where a given letter should be sent. The trouble ticketing system will have subscribed to the newRequests queue, and when consumer acknowledgements are used, Rabbit will not remove the message from the queue until the trouble ticketing system confirms it has received it. The following diagram from the official RabbitMQ website illustrates this concept.
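To make the producer, exchange, queue, and consumer roles concrete, here is a minimal sketch in plain Python. It is a toy in-memory model and not the RabbitMQ client API: the ToyBroker class, its method names, and the routing key "request.new" are all made up for illustration. Only the queue name newRequests comes from the example above.

```python
from collections import deque


class ToyBroker:
    """Toy in-memory stand-in for a broker with one direct-style exchange.

    Hypothetical illustration only; this is not how RabbitMQ is implemented.
    """

    def __init__(self):
        self.queues = {}    # queue name -> deque of message bodies
        self.bindings = {}  # routing key -> list of bound queue names
        self.unacked = {}   # delivery tag -> (queue name, body)
        self._next_tag = 0

    def declare_queue(self, name):
        self.queues.setdefault(name, deque())

    def bind(self, routing_key, queue_name):
        self.bindings.setdefault(routing_key, []).append(queue_name)

    def publish(self, routing_key, body):
        # The exchange routes on the key; the producer never names a consumer.
        for queue_name in self.bindings.get(routing_key, []):
            self.queues[queue_name].append(body)

    def get(self, queue_name):
        """Hand the next message to a consumer, but remember it until acked."""
        queue = self.queues[queue_name]
        if not queue:
            return None
        self._next_tag += 1
        body = queue.popleft()
        self.unacked[self._next_tag] = (queue_name, body)
        return self._next_tag, body

    def ack(self, tag):
        # Only now does the broker forget the message for good.
        self.unacked.pop(tag)

    def requeue_unacked(self):
        # Models a consumer that went away without acknowledging its messages.
        for tag in sorted(self.unacked, reverse=True):
            queue_name, body = self.unacked.pop(tag)
            self.queues[queue_name].appendleft(body)


# Producer side: the mobile app sends a customer service request.
broker = ToyBroker()
broker.declare_queue("newRequests")
broker.bind("request.new", "newRequests")
broker.publish("request.new", "My order never arrived")

# Consumer side: the trouble ticketing system reads and acknowledges it.
tag, body = broker.get("newRequests")
broker.ack(tag)
```

If the ticketing system had crashed before calling ack, requeue_unacked would put the request back at the head of the queue for the next consumer, which is the behavior the paragraph above relies on.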
Great, so we have the core concept down. I will spend the remainder of this post talking about more advanced considerations. Again, there will not be a ton of technical detail, but there will be follow-on posts that dig deeper into many of these concepts.
One of the primary concerns that drives adoption of a messaging platform is the requirement for guaranteed message delivery. If I send an order into a system, I expect it to be delivered at some point. Typically a system will use a messaging platform because its designers recognize that, for short periods of time, messages will be produced much faster than they can be consumed. This means that queues will grow temporarily during spikes (such as Black Friday, Tax Day, election day, end of month, etc.), but they will shrink again as consumers work through the backlog. The expectation is that these messages will remain in the queue until they are processed properly.
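A little arithmetic shows how a queue absorbs a spike. The sketch below uses made-up numbers (consumers that handle 100 messages per tick, a quiet load of 50 per tick, and a three-tick spike of 400 per tick); the function name queue_depths is hypothetical as well.

```python
def queue_depths(arrivals, consume_rate):
    """Return the queue depth after each tick.

    arrivals:     messages produced in each tick
    consume_rate: messages the consumers can process per tick
    """
    depth = 0
    depths = []
    for arrived in arrivals:
        # The queue grows by what arrives and shrinks by what is consumed,
        # but it can never hold fewer than zero messages.
        depth = max(0, depth + arrived - consume_rate)
        depths.append(depth)
    return depths


# Assumed load: 3 quiet ticks, 3 spike ticks, then 18 quiet ticks.
arrivals = [50] * 3 + [400] * 3 + [50] * 18
depths = queue_depths(arrivals, consume_rate=100)
```

With these assumed numbers the backlog climbs by 300 messages per tick during the spike, peaks at 900, and then drains by 50 per tick, so it takes 18 quiet ticks to get back to an empty queue. Nothing is dropped along the way, as long as the queue itself survives.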
Picture a queue with 50,000 messages in it. Now picture the underlying server that is running the software housing that queue crashing. With RabbitMQ, queue definitions are stored in an Erlang database called Mnesia, and messages published as persistent to a durable queue are written to disk. So if the server crashes, those messages will not have been lost when it comes back up, and the consumers will begin processing them once again. However, if that server crashed and it took 20 minutes for it to be properly restarted, the system may find itself with some upset end users. Because of this, RabbitMQ supports clustering and replication. A Rabbit cluster allows multiple servers to be associated with one another. This post is not long enough to dig into the details, but a Rabbit cluster should not span a WAN boundary. It can, however, live on separate virtual or physical servers. Queues are then replicated across nodes in a cluster. One copy of the queue is designated the master and all others within that replication scheme are designated slaves. Exchanges deliver messages first to the master, which then replicates the messages to each slave. Once replication is complete, consumers read messages from the master. When a message has been consumed, the master removes the copy of that message from each slave.
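The replication flow just described can be sketched as a toy model in Python. This is an illustration of the idea only, not RabbitMQ's actual implementation or API; the MirroredQueue class and the node names are invented for the example.

```python
from collections import deque


class MirroredQueue:
    """Toy model of one queue replicated across the nodes of a cluster."""

    def __init__(self, master, slaves):
        self.master = master
        # Every node in the replication scheme holds its own copy.
        self.copies = {node: deque() for node in [master, *slaves]}

    def _slave_copies(self):
        return [c for node, c in self.copies.items() if node != self.master]

    def publish(self, message):
        # The message lands on the master first...
        self.copies[self.master].append(message)
        # ...and the master then replicates it to each slave.
        for copy in self._slave_copies():
            copy.append(message)

    def consume(self):
        """Consumers always read from the master copy."""
        master_copy = self.copies[self.master]
        if not master_copy:
            return None
        message = master_copy.popleft()
        # Once consumed, the master removes the message from each slave.
        for copy in self._slave_copies():
            copy.remove(message)
        return message


queue = MirroredQueue("node1", ["node2", "node3"])
queue.publish("order-1")
queue.publish("order-2")
first = queue.consume()
```

After the consume call, every node holds only "order-2", so any slave could take over without replaying a message that was already handled.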
If at any point a master crashes, the eldest slave is promoted to master, and consumers will continue reading messages from the new master. So in the last example, when the master node goes down with 50,000 messages in it, there is a slave node that takes over and continues serving those 50,000 messages to consumers without any interruption in service. The failed server will eventually restart. On restart, Rabbit will remove every message from that server's copy of the queue, and the server will typically rejoin the cluster as a slave. Rabbit provides many options for synchronizing the new slave node with the other nodes in the cluster, and a later post will dig into the details of this since it is one of the key drivers of performance in RabbitMQ.
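The promotion and rejoin rules can be sketched the same way. Again, this is a toy model under simplifying assumptions (every copy is fully in sync at the moment of the crash, and synchronization of the rejoining node is left out); the MirrorSet class and its method names are hypothetical.

```python
class MirrorSet:
    """Toy model of master promotion for one replicated queue."""

    def __init__(self, nodes):
        # Nodes are kept oldest first; the first entry acts as the master.
        self.members = list(nodes)
        self.messages = {node: [] for node in nodes}

    @property
    def master(self):
        return self.members[0]

    def publish(self, message):
        # Assumption: replication to every current member always succeeds.
        for node in self.members:
            self.messages[node].append(message)

    def fail(self, node):
        # A crashed node drops out. If it was the master, the eldest
        # remaining slave is now first in line and becomes the master.
        self.members.remove(node)
        del self.messages[node]

    def rejoin(self, node):
        # A restarted node comes back with an empty copy of the queue
        # and joins as the newest slave.
        self.members.append(node)
        self.messages[node] = []


mirrors = MirrorSet(["node1", "node2", "node3"])
for i in range(3):
    mirrors.publish(f"message-{i}")

mirrors.fail("node1")    # the master crashes
new_master = mirrors.master
mirrors.rejoin("node1")  # the old master restarts
```

In this run node2 takes over with all three messages intact, and node1 comes back empty at the end of the line. How and when the rejoined node catches up is exactly the synchronization question left for a later post.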
As mentioned earlier, Rabbit clusters should not span a WAN boundary. There are many use cases in which messages need to be replicated across a WAN or other network boundary. To support these, Rabbit offers its shovel and federation plugins. Again, a future post will cover these topics in more detail, but they allow for creating more advanced failover or hub-and-spoke topologies within the messaging platform.