Testing RabbitMQ Clustering using Docker – Part 1

This is the first in a series of posts about testing RabbitMQ clustering and high availability. The target audience is technical staff familiar with Linux, AMQP, RabbitMQ, general clustering, high availability, Docker, message queues, and DevOps. If you’re not familiar with these you can still learn from this post, but I highly recommend checking out John’s earlier posts: RabbitMQ, an Introduction and Extending a RabbitMQ Cluster across a WAN.

The goal of this post is to get a functional RabbitMQ cluster running across a set of Docker Containers that will support running messaging and high availability simulations. In this first post, I will cover setting up the cluster from scratch. Let’s begin by refreshing ourselves with a 3-node reference architecture for RabbitMQ clustering without a load balancer. This cluster runs in a data center, and our client applications, websites, services, monitoring tools and other clients connect over the default RabbitMQ port 5672 to one of the brokers in the cluster. Each RabbitMQ broker runs on a Host node that could be a bare-metal machine or a VM with network connectivity to the other Host nodes in the cluster. How RabbitMQ and the Erlang Port Mapper Daemon (epmd) communicate across Host nodes to establish quorum, syncing, persistence, and durability is out of scope for these posts. We are only interested in testing and hardening our RabbitMQ cluster as producers and consumers for use cases like Federation (RabbitMQ Cluster across a WAN).

RabbitMQ Clustering in a Data Center

Figure 1 – Static RabbitMQ Cluster Reference Architecture

SO WHAT CHANGES WHEN WE MOVE TO CO-LOCATED DOCKER CONTAINERS HOSTING OUR CLUSTER?

RabbitMQ Clustering on Docker

Figure 2 – RabbitMQ Cluster running in Docker Containers

We are going to run the RabbitMQ brokers clustered across Docker Containers on a single host. One of the bigger differences is that Docker handles network routing at the Host level, taking traffic from a set of TCP ports (5672, 5673, and 5674) and mapping each one to an internal Container port where a RabbitMQ broker listens on TCP port 5672. This is one of the many great features that Docker supports out of the box, and you can read how Docker exposes specific external Host-level ports and connects them to internal Container-level ports in the Docker docs. Applications can communicate with any of the external ports 5672, 5673, or 5674 to use the cluster, but for now let’s keep it simple and say we will only use 5672 when interfacing with the cluster as producers and consumers. This configuration is similar to how RabbitMQ simulates clustering using detached servers locally on a single machine (https://www.rabbitmq.com/clustering.html), but with Docker the advantage is that we can…
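Because all three Host-level ports front the same cluster, a client can fall back from one mapped port to the next if a broker goes away. Here is a minimal stdlib sketch (no pika required) that probes the mapped ports in order; it assumes the cluster from Figure 2 is running on localhost, and the helper name is my own:

```python
import socket

# Host-level ports that Docker maps to each broker's internal 5672.
CLUSTER_PORTS = [5672, 5673, 5674]


def first_reachable_port(host, ports, timeout=1.0):
    """Return the first port accepting TCP connections, or None if all fail."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None


if __name__ == "__main__":
    port = first_reachable_port("localhost", CLUSTER_PORTS)
    if port is None:
        print("no broker reachable")
    else:
        print("connect your AMQP client to port", port)
```

A real client would hand the winning port to its AMQP connection factory; this sketch only shows the failover ordering across the Docker port mappings.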

STOP WORRYING ABOUT THE HOST SYSTEM

By allowing us to ignore the networking pieces, we can focus on creating a single Container image to host a RabbitMQ broker, and build it so that each broker attempts to join the preconfigured cluster on startup. The advantage of using a Docker Container to host a RabbitMQ broker is that we can pull, push, maintain, version and deploy it out of Docker Hub (or our own registry) to entirely different environments, hosts, and even cloud providers. Containers become versioned infrastructure units that are decoupled from a Host system. After running production website deployments using PaaS offerings like OpenShift and OpsWorks that are agnostic to the underlying hosting systems, I find that Docker’s ability to host clusterable resources and services like message queues, Redis, and Memcached makes it a great choice for removing the static hosting overhead commonly associated with clustering technologies. If the system can run Docker, then it can host whatever Container you want to deploy (provided the host has resources). So let’s build our RabbitMQ cluster. Quick note: you can find all of the code references and samples in the GitHub repository: https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker

To start with let’s install Docker, Docker Compose, RabbitMQ, and the other dependencies:

$ sudo yum install -y http://www.rabbitmq.com/releases/rabbitmq-server/v3.5.6/rabbitmq-server-3.5.6-1.noarch.rpm
$ /usr/sbin/rabbitmq-plugins enable rabbitmq_mqtt rabbitmq_stomp rabbitmq_management rabbitmq_management_agent rabbitmq_management_visualiser rabbitmq_federation rabbitmq_federation_management sockjs
$ sudo yum install python-setuptools git-core
$ sudo pip install --upgrade pip
$ sudo pip install pika==0.10.0
$ sudo yum install docker-engine

START DOCKER

$ sudo service docker start

Optional – Make Docker start on a reboot

$ sudo chkconfig docker on

INSTALL DOCKER COMPOSE

$ sudo pip install -U docker-compose

CONFIRM DOCKER IS WORKING

$ docker images -a
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
$
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$

INSTALL RABBITMQ ADMIN

wget https://raw.githubusercontent.com/rabbitmq/rabbitmq-management/rabbitmq_v3_5_6/bin/rabbitmqadmin -O /tmp/rabbitmqadmin
sudo chmod 777 /tmp/rabbitmqadmin
sudo mv /tmp/rabbitmqadmin /usr/bin/rabbitmqadmin

Confirm the RabbitMQ Admin tool is ready

$ which rabbitmqadmin
/usr/bin/rabbitmqadmin
$

Now that we have the system ready, we are going to build a Base Container Image and then extend it into a RabbitMQ Node Server Image that will handle the RabbitMQ broker, clustering start script, admin tools, and debugging helpers. Building a Docker Container Image requires just one file named “Dockerfile” that outlines the set of steps Docker runs to build the Container Image from scratch (or extend an existing one).

Here’s the Base Container Image Dockerfile in the repository https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/baseimage/Dockerfile and its contents:

FROM centos

# Install base deps
RUN yum install -y net-tools pwgen wget curl tar unzip mlocate logrotate

# Install the EPEL repo
RUN rpm -Uvh http://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm

# Install RabbitMQ deps
RUN rpm --import https://www.rabbitmq.com/rabbitmq-signing-key-public.asc
RUN yum install -y erlang
RUN yum install -y http://www.rabbitmq.com/releases/rabbitmq-server/v3.5.6/rabbitmq-server-3.5.6-1.noarch.rpm

# Allow triggerable events on the first time running
RUN touch /tmp/firsttimerunning

To build the Base Image you can run the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/1_build_cluster_base_image.sh script or run this command from the same directory as the Base Image Dockerfile (You can change levvel and rabbitclusterbase to names you want. Docker refers to these parameters as username/imagename respectively):

docker build --rm -t levvel/rabbitclusterbase .

Now let’s confirm the Base Image named “rabbitclusterbase” is available with the command:

$ docker images 
REPOSITORY                 TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
levvel/rabbitclusterbase   latest              0a7f23864156        21 seconds ago      516.5 MB
docker.io/centos           latest              ce20c473cd8a        2 weeks ago         172.3 MB
$

Now we can extend this Base Image into hosting the RabbitMQ broker configured to join the cluster on startup. Here is the RabbitMQ Node Server Dockerfile in the repository https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/server/Dockerfile and its contents:

FROM levvel/rabbitclusterbase
MAINTAINER Your-Name Your-Email

# Create directories
RUN mkdir /opt/rabbit
RUN mkdir /opt/simulator
RUN mkdir /opt/simulator/tools

# Add the files from the local repository into the container
ADD rabbitmq.config     /etc/rabbitmq/
ADD rabbitmq-env.conf   /etc/rabbitmq/
ADD erlang.cookie       /var/lib/rabbitmq/.erlang.cookie
ADD startclusternode.sh /opt/rabbit/
ADD debugnodes.sh       /opt/rabbit/
ADD tl                  /bin/tl
ADD rl                  /bin/rl
ADD rst                 /bin/rst

# Add the simulator tooling
ADD simulator_tools/start_node.sh   /opt/simulator/tools/
ADD simulator_tools/stop_node.sh    /opt/simulator/tools/
ADD simulator_tools/join_cluster.sh   /opt/simulator/tools/
ADD simulator_tools/leave_cluster.sh  /opt/simulator/tools/
ADD simulator_tools/reset_first_time_running.sh /opt/simulator/tools/

# Set the file permissions in the container
RUN chmod 644 /etc/rabbitmq/rabbitmq.config
RUN chmod 644 /etc/rabbitmq/rabbitmq-env.conf
RUN chmod 400 /var/lib/rabbitmq/.erlang.cookie
RUN chmod 777 /opt/rabbit/startclusternode.sh
RUN chmod 777 /opt/rabbit/debugnodes.sh
RUN chmod 777 /bin/tl
RUN chmod 777 /bin/rl
RUN chmod 777 /bin/rst
RUN chmod 777 /opt/simulator
RUN chmod 777 /opt/simulator/tools
RUN chmod 777 /opt/simulator/tools/start_node.sh
RUN chmod 777 /opt/simulator/tools/stop_node.sh 
RUN chmod 777 /opt/simulator/tools/join_cluster.sh
RUN chmod 777 /opt/simulator/tools/leave_cluster.sh
RUN chmod 777 /opt/simulator/tools/reset_first_time_running.sh 

# Set ownership permissions on files in the container
RUN chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie

# Expose ports inside the container to the host
EXPOSE 5672
EXPOSE 15672
EXPOSE 25672
EXPOSE 4369
EXPOSE 9100
EXPOSE 9101
EXPOSE 9102
EXPOSE 9103
EXPOSE 9104
EXPOSE 9105

# Run this to autostart the cluster nodes
CMD /opt/rabbit/startclusternode.sh

Before we can build the RabbitMQ Node Server image, you will need to manually download the files from the repository https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/tree/master/server or just run the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/2_build_cluster_node_image.sh script that handles it for you.

Under the hood, the 2_build_cluster_node_image.sh script runs this command, using the files from the server directory to copy, add, and configure the Server’s Container Image:

docker build --rm -t levvel/rabbitclusternode .

Now let’s confirm the RabbitMQ Server Container Image named “rabbitclusternode” is available:

docker images

If things are working, you should see these Docker Container Images:

levvel/rabbitclusternode
levvel/rabbitclusterbase
docker.io/centos

Now we can start the cluster using Docker Compose. To do this we need to build a Docker Compose file outlining the three RabbitMQ Container Server Nodes, the IP address mappings, the clustering links, the hostnames of the nodes, the image to use, and other environment specifics like RAM versus DISC mode for each broker. Here is the sample from the repository https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/cluster/docker-compose.yml and its contents:

rabbit1:
  image: levvel/rabbitclusternode
  hostname: rabbit1
  ports:
    - "5672:5672"
    - "15672:15672"

rabbit2:
  image: levvel/rabbitclusternode
  hostname: rabbit2
  links:
    - rabbit1
  environment: 
   - CLUSTERED=true
   - CLUSTER_WITH=rabbit1
   - RAM_NODE=true
  ports:
      - "5673:5672"
      - "15673:15672"

rabbit3:
  image: levvel/rabbitclusternode
  hostname: rabbit3
  links:
    - rabbit1
    - rabbit2
  environment: 
   - CLUSTERED=true
   - CLUSTER_WITH=rabbit1   
  ports:
      - "5674:5672"
      - "15674:15672"

You can either run the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/3_start.sh script to start the cluster or manually create this docker-compose.yml file and run this command from the same directory:

docker-compose up -d

The Docker Compose output should state that it is creating the cluster:

Creating cluster_rabbit1_1…
Creating cluster_rabbit2_1…
Creating cluster_rabbit3_1…

Once it finishes you can see that the RabbitMQ Containers are running with the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/list_running_containers.sh script or with:

docker ps -a

The output should show something similar to:

$ docker ps -a
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS              PORTS                                                                                  NAMES
bab6c45afa9e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   4 seconds ago       Up 2 seconds        4369/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:5674->5672/tcp, 0.0.0.0:15674->15672/tcp   cluster_rabbit3_1
45aa08a76a4e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   4 seconds ago       Up 3 seconds        4369/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:5673->5672/tcp, 0.0.0.0:15673->15672/tcp   cluster_rabbit2_1
f0403eaba029        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   5 seconds ago       Up 4 seconds        4369/tcp, 0.0.0.0:5672->5672/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp   cluster_rabbit1_1
$

Now we can examine the Cluster’s status with the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/rst file or using the RabbitMQ Admin Tool with the command:

rabbitmqadmin list nodes name type running

The output of both should be something similar to:

+----------------+------+---------+
|      name      | type | running |
+----------------+------+---------+
| rabbit@rabbit1 | disc | True    |
| rabbit@rabbit2 | ram  | True    |
| rabbit@rabbit3 | disc | True    |
+----------------+------+---------+
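The rabbitmqadmin tool is itself a thin client over the management plugin’s HTTP API, so the same node status can be pulled programmatically. Here is a stdlib-only sketch; it assumes the management UI is reachable on the Host-level port 15672 (the rabbit1 mapping) with RabbitMQ’s default guest/guest credentials, and the function names are my own:

```python
import base64
import json
import urllib.request


def parse_nodes(nodes_json):
    """Reduce /api/nodes output to (name, type, running) rows,
    mirroring the columns rabbitmqadmin prints."""
    return [(n["name"], n["type"], n["running"]) for n in nodes_json]


def fetch_nodes(host="localhost", port=15672, user="guest", password="guest"):
    """Query the management plugin's /api/nodes endpoint with basic auth."""
    request = urllib.request.Request("http://%s:%d/api/nodes" % (host, port))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # With a running cluster, uncomment the live call:
    # for name, node_type, running in parse_nodes(fetch_nodes()):
    #     print(name, node_type, running)
    sample = [{"name": "rabbit@rabbit1", "type": "disc", "running": True}]
    print(parse_nodes(sample))
```

This is handy once you start scripting outage simulations, since a monitoring loop can watch the `running` flag flip as nodes are stopped and started.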

At this point we have set up a RabbitMQ cluster that is ready for testing.

LET’S TEST HOW RESILIENT RABBITMQ CLUSTERING IS

We can stop a running RabbitMQ Container using the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/end_node_2.sh script or by running this command from the same directory as the docker-compose.yml file:

docker-compose stop rabbit2

Now if we run the Docker Container level check we should see the Container hosting the RabbitMQ Node 2 has stopped running with the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/list_running_containers.sh or the command:

docker ps -a

The output should show something similar, stating that the Node 2 instance exited:

CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                        PORTS                                                                                  NAMES
bab6c45afa9e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   2 minutes ago       Up 2 minutes                 4369/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:5674->5672/tcp, 0.0.0.0:15674->15672/tcp   cluster_rabbit3_1
45aa08a76a4e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   2 minutes ago       Exited (137) 2 seconds ago                                                                                          cluster_rabbit2_1
f0403eaba029        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   2 minutes ago       Up 2 minutes                 4369/tcp, 0.0.0.0:5672->5672/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp   cluster_rabbit1_1

We can confirm the Cluster no longer has Node 2 as a running member with the https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/rst script or by the command:

rabbitmqadmin list nodes name type running

The output of both should be something similar to:

+----------------+------+---------+
|      name      | type | running |
+----------------+------+---------+
| rabbit@rabbit1 | disc | True    |
| rabbit@rabbit2 | ram  | False   |
| rabbit@rabbit3 | disc | True    |
+----------------+------+---------+

We have now simulated a single-broker outage in the RabbitMQ Cluster (like a production crash event). At this point, the cluster should still work for messaging, with almost no impact to the existing exchanges, queues and functionality. We will begin to aggressively test this in the next post. For now let’s make sure the rudimentary exchange and queue checks work with the scripts https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/exchanges.sh and https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/queues.sh or with the commands:

rabbitmqadmin list exchanges name type durable auto_delete internal policy vhost arguments
rabbitmqadmin list queues name node durable auto_delete messages consumers memory state exclusive_consumer_tag policy arguments

Since we have not created any broker entities, the exchange output should be similar to:

+--------------------+---------+---------+-------------+----------+--------+-------+-----------+
|        name        |  type   | durable | auto_delete | internal | policy | vhost | arguments |
+--------------------+---------+---------+-------------+----------+--------+-------+-----------+
|                    | direct  | True    | False       | False    |        | /     |           |
| amq.direct         | direct  | True    | False       | False    |        | /     |           |
| amq.fanout         | fanout  | True    | False       | False    |        | /     |           |
| amq.headers        | headers | True    | False       | False    |        | /     |           |
| amq.match          | headers | True    | False       | False    |        | /     |           |
| amq.rabbitmq.log   | topic   | True    | False       | True     |        | /     |           |
| amq.rabbitmq.trace | topic   | True    | False       | True     |        | /     |           |
| amq.topic          | topic   | True    | False       | False    |        | /     |           |
+--------------------+---------+---------+-------------+----------+--------+-------+-----------+

And the queue output should be similar to:

No items
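Beyond listing entities, we can also verify that the surviving brokers still pass messages end-to-end while Node 2 is down: the management plugin exposes an aliveness test that declares a queue, publishes a message, and consumes it back. Here is a stdlib sketch; it assumes the default guest/guest credentials on the Host-level port 15672, and the helper names are my own:

```python
import base64
import json
import urllib.parse
import urllib.request


def is_alive(aliveness_json):
    """A healthy broker answers the aliveness test with {"status": "ok"}."""
    return aliveness_json.get("status") == "ok"


def check_vhost(vhost="/", host="localhost", port=15672,
                user="guest", password="guest"):
    """Hit /api/aliveness-test/<vhost> on the management plugin."""
    encoded_vhost = urllib.parse.quote(vhost, safe="")
    url = "http://%s:%d/api/aliveness-test/%s" % (host, port, encoded_vhost)
    request = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # With a running cluster, uncomment the live round-trip check:
    # print("cluster alive:", is_alive(check_vhost()))
    print(is_alive({"status": "ok"}))
```

Running the live check against port 15672 during the Node 2 outage should still report the cluster as alive, since the round-trip is served by a surviving broker.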

Let’s restore service on RabbitMQ Node 2 and confirm nothing broke with the script https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/start_node_2.sh or by running this command from the same directory as the docker-compose.yml file:

docker-compose start rabbit2

We should see Node 2 is running again at the Container level with the script https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/list_running_containers.sh or the command:

$ docker ps -a
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS              PORTS                                                                                  NAMES
bab6c45afa9e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   2 minutes ago       Up 2 minutes        4369/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:5674->5672/tcp, 0.0.0.0:15674->15672/tcp   cluster_rabbit3_1
45aa08a76a4e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   2 minutes ago       Up 5 seconds        4369/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:5673->5672/tcp, 0.0.0.0:15673->15672/tcp   cluster_rabbit2_1
f0403eaba029        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   2 minutes ago       Up 2 minutes        4369/tcp, 0.0.0.0:5672->5672/tcp, 9100-9105/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp   cluster_rabbit1_1
$

We should also see Node 2 running at the Cluster level with the script https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/rst or with the command:

$ rabbitmqadmin list nodes name type running
+----------------+------+---------+
|      name      | type | running |
+----------------+------+---------+
| rabbit@rabbit1 | disc | True    |
| rabbit@rabbit2 | ram  | True    |
| rabbit@rabbit3 | disc | True    |
+----------------+------+---------+
$

Now we can rerun exchanges.sh or queues.sh and confirm the Brokers and their entities are still in sync.

Again, since we did not change any broker entities, the exchange output should be similar to:

$ ./exchanges.sh

Displaying Cluster Exchanges

+--------------------+---------+---------+-------------+----------+--------+-------+-----------+
|        name        |  type   | durable | auto_delete | internal | policy | vhost | arguments |
+--------------------+---------+---------+-------------+----------+--------+-------+-----------+
|                    | direct  | True    | False       | False    |        | /     |           |
| amq.direct         | direct  | True    | False       | False    |        | /     |           |
| amq.fanout         | fanout  | True    | False       | False    |        | /     |           |
| amq.headers        | headers | True    | False       | False    |        | /     |           |
| amq.match          | headers | True    | False       | False    |        | /     |           |
| amq.rabbitmq.log   | topic   | True    | False       | True     |        | /     |           |
| amq.rabbitmq.trace | topic   | True    | False       | True     |        | /     |           |
| amq.topic          | topic   | True    | False       | False    |        | /     |           |
+--------------------+---------+---------+-------------+----------+--------+-------+-----------+

$

And the queue output should be similar to:

No items

At this point you can stop the cluster and return its resources to the host with the script https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker/blob/master/4_stop.sh or by running this command from the docker-compose.yml directory:

docker-compose stop

You can confirm the cluster is no longer running with the command:

$ docker ps -a
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS                            PORTS               NAMES
bab6c45afa9e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   3 minutes ago       Exited (137) 3 seconds ago                       cluster_rabbit3_1
45aa08a76a4e        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   3 minutes ago       Exited (137) 3 seconds ago                       cluster_rabbit2_1
f0403eaba029        levvel/rabbitclusternode   "/bin/sh -c /opt/rabb"   3 minutes ago       Exited (137) 3 seconds ago                       cluster_rabbit1_1

WELL THAT’S IT FOR THE FIRST POST! LET’S WRAP UP WHAT WE DID:

  1. We created two Docker Container Images from scratch (Base, RabbitMQ Server)
  2. We started our own RabbitMQ cluster using Docker and Docker Compose
  3. We simulated a critical failure in our RabbitMQ cluster
  4. We fixed our outage and restored our RabbitMQ cluster back to normal operation

Thanks for reading, and I hope you found it valuable. Please check back for our next post in this series, which will focus on message simulation and testing strategies for hardening your RabbitMQ cluster. We will be extending the repository (https://github.com/GetLevvel/testing-rabbitmq-clustering-with-docker) to include cluster testing strategies, so let us know if you have specific test simulations you would like to see or questions on getting started. If your organization would like assistance determining your RabbitMQ clustering strategy, please reach out to us at Levvel and we can get you started.

- Jay

Jay Johnson

Principal Consultant

IT Professional with 10+ years of experience in architecture, design and implementation of large distributed, real-time systems across a variety of environments. Focused on executing aggressive timelines by leveraging my expertise in technology, process, and best practices.

GitHub Portfolio: https://github.com/jay-johnson
