How to scale a web application with one docker container

July 27, 2014

Reading time ~7 minutes

The problem

Anyone who has worked in or for a startup will know that you aren't just (your job role); your job is (your job role) + everything else tech related. Back end programmers are writing JavaScript, front end developers are configuring nginx, you get the gist.

I am part of a new startup, GetWork2Day. We currently have 10 people working for the startup and I am the only one doing anything technical, which means I am responsible for the back end, front end, database, server architecture and everything else. Our current server configuration is a basic Ubuntu server with nginx and everything else needed; it sits behind a load balancer to handle scaling and it does the job. However, I don't want to manage a server full time, and when we get more staff I don't want people with no server experience poking around on my server; I can already foresee the problems I would come back to after a few days off. I need a setup where I can just push code to GitHub and it deploys for me.

Wait, what about….

Elastic Beanstalk, Heroku, Digital Ocean.

Well, all of these are great, they really are, but running things outside of the normal remit causes problems. GetWork2Day relies on some fairly unusual and obscure libraries for parts of the stack, so I wanted more control over the deployment, and we also can't afford to run on any of the three major players.

So how can I homebrew this?

Well, I first looked at Puppet and Bamboo. Both of these are great bits of software, but they are overpriced (for us), and (from what I was reading) they also connect to your application via a fixed IP address. If your application is behind a load balancer, your IP is effectively random, so I needed something where my application reached out and connected to my git server, downloaded the source code (if there were new commits), rebuilt it and then relaunched the application. This led me to Jenkins.

Why Jenkins?

Jenkins is free and open source, and with its great array of plugins I can make it do whatever I want (within reason). What I did was create a basic Jenkins setup with a git project, set it to check for new git commits every hour, and if there was new code to download, Jenkins would pull the changes, copy the code to the running location with rsync and then restart the local application server. Now all I have to do is push code to the master branch (or however you configure Jenkins) and Jenkins will automatically sort everything out for the re-deployment. Even if my application had scaled across 20 servers, each server would stay up to date with the latest code, without any further intervention from me (or anyone). Jenkins can also run a custom command or script after a build completes successfully; mine is:

#!/bin/bash
# After successful build

# rsync the build dir to the app dir
rsync -av --delete --delete-excluded --exclude "*.pyc" /var/lib/jenkins/jobs/g1/workspace/ /home/app/

# restart supervisor
supervisorctl restart tangable:
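
For completeness, the Jenkins job itself only needs a handful of settings. The values below are illustrative rather than copied from my real job, but the shape is the same: a git repository, an hourly SCM poll, and a build step that runs the script above.

# Jenkins job configuration (illustrative values)
#   Source Code Management -> Git
#       Repository URL: git@bitbucket.org:Username/application.git
#   Build Triggers -> Poll SCM
#       Schedule: H * * * *      (check for new commits roughly once an hour)
#   Build -> Execute shell
#       /home/docker-conf/jenkins/after-build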

What does this have to do with Docker?

Well, with the setup above I still have a server to maintain, and I didn't want that. I wanted this entire server bundled into one build script, and now I am going to tell you how I built this entire setup with a script of under 40 lines, using the brilliant Dockerfile.

The basics

OK, so let's start:

# Set base image
FROM ubuntu
MAINTAINER Luke Crooks "luke@pumalo.org"

Now that we have the base image set, we can install some packages:

### APT SECTION ###
# Update the apt package index
RUN apt-get update
# Install software 
RUN apt-get install -y git nginx supervisor wget python-virtualenv curl 
RUN apt-get install -y python-dev postgresql-server-dev-9.3 rsync
# Install libraries
RUN apt-get install -y libjpeg62 libjpeg62-dev zlib1g-dev libpng12-0
RUN apt-get install -y libtiff5 libgif4 libgeos-dev

Now we have the basics we need, we can go ahead and start configuring. In the same folder where you are building your Dockerfile, you will also need a copy of your SSH private key (the one used at GitHub or Bitbucket to verify your credentials).
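
If you don't want to copy your personal key around, you can generate a dedicated, passphrase-less deploy key next to the Dockerfile and register its public half with Bitbucket instead (a sketch; the comment string is arbitrary):

# Generate a passphrase-less deploy key in the build directory
ssh-keygen -t rsa -b 4096 -N "" -C "docker-deploy" -f ./id_rsa
# Then add the contents of ./id_rsa.pub as a read-only deploy key on the Bitbucket repository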

### SSH SECTION ###
# Make ssh dir
RUN mkdir /root/.ssh/
# Copy over private key, and set permissions
ADD id_rsa /root/.ssh/id_rsa
RUN chmod 0600 /root/.ssh/id_rsa
# Add bitbuckets key to known_hosts
RUN ssh-keyscan bitbucket.org >> /root/.ssh/known_hosts

Now we can clone our private repositories: one for our Docker configuration, the other for our application:

# Clone the conf files into the docker container
RUN git clone git@bitbucket.org:Username/docker-conf.git /home/docker-conf
# Clone the repo locally
RUN git clone git@bitbucket.org:Username/application.git /home/app

Awesome, now we have all the configuration files we need, so we can remove the default files and copy our custom configuration files to where they need to be:

# remove default nginx configs
RUN rm /etc/nginx/sites-available/default && rm /etc/nginx/sites-enabled/default
# Copy new settings
RUN cp /home/docker-conf/configs/nginx/nginx.conf /etc/nginx/nginx.conf
RUN cp /home/docker-conf/configs/nginx/sites-available/site.conf /etc/nginx/sites-available/
# Enable the new site with a symbolic link
RUN ln -s /etc/nginx/sites-available/site.conf /etc/nginx/sites-enabled/site.conf
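
The contents of site.conf aren't shown in this post, but a minimal version just proxies requests through to the application server on a local port. Here is a rough sketch, assuming the app listens on 127.0.0.1:8000 (the domain and port are placeholders, not my real config):

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}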

Next up, we are going to install Jenkins and copy over our local config files:

# Install Jenkins
RUN wget -q -O - http://pkg.jenkins-ci.org/debian/jenkins-ci.org.key | apt-key add -
RUN sh -c 'echo deb http://pkg.jenkins-ci.org/debian binary/ > /etc/apt/sources.list.d/jenkins.list'
RUN apt-get update
RUN apt-get install -y jenkins

# Copy over configs
RUN rm /etc/default/jenkins
RUN cp /home/docker-conf/jenkins/default /etc/default/jenkins 
RUN chmod +x /home/docker-conf/jenkins/after-build
RUN rm -rf /var/lib/jenkins/*
RUN cp -a /home/docker-conf/jenkins/root/* /var/lib/jenkins/
# Reset file permissions after moving files
RUN chown jenkins:jenkins -R /var/lib/jenkins/
RUN chmod 775 -R /var/lib/jenkins/

# Give jenkins permissions to manage the supervisor service
RUN echo "jenkins        ALL = NOPASSWD: ALL" >> /etc/sudoers

This part is fairly Python specific, but the same principles apply to other languages: we need to create an environment to install the application dependencies into, and then install those dependencies from the requirements file:

# Install the python requirements from requirements.txt
RUN virtualenv --no-site-packages "/home/env"
RUN /home/env/bin/pip install -r /home/app/requirements.txt
# Lastly install the app into the virtualenv
RUN /home/env/bin/python2.7 /home/app/setup.py install

Not normal docker behaviour

Usually a Docker container is meant to run one process/command, e.g. just an application, or a web server, or whatever you want. But if you use a program such as Supervisor, you can configure it to run multiple programs; in old money, you could run an entire LAMP stack inside a single Docker container. Explaining how to configure Supervisor is outside the scope of this post, but check out their website for some guides.

### SUPERVISOR ###
# Now copy the supervisor configs over
RUN cp /home/docker-conf/configs/supervisor/supervisord.conf /etc/supervisor/supervisord.conf
RUN supervisord -c /etc/supervisor/supervisord.conf

# Web app configs
RUN cp /home/docker-conf/configs/supervisor/site.conf /etc/supervisor/conf.d/app.conf
# Nginx configs
RUN cp /home/docker-conf/configs/supervisor/nginx.conf /etc/supervisor/conf.d/nginx.conf
# Jenkins configs
RUN cp /home/docker-conf/configs/supervisor/jenkins.conf /etc/supervisor/conf.d/jenkins.conf
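
Those conf.d files aren't reproduced here either; each one is just a [program:...] section telling Supervisor what to run. A rough sketch of two of them, assuming a gunicorn-style app server (the program names, command and port are illustrative, not my actual config):

[program:app]
command=/home/env/bin/gunicorn myproject.wsgi:application --bind 127.0.0.1:8000
directory=/home/app
autostart=true
autorestart=true

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autostart=true
autorestart=true

The jenkins.conf entry follows the same pattern, pointing at whatever command starts Jenkins in the foreground.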

Now Supervisor will run our application, nginx and Jenkins; all we need to do is expose port 80 and tell Docker to run the supervisor command on start:

# Expose the python app to the world; note Jenkins will not be accessible as we
# are only exposing port 80, not 8080, which is where Jenkins runs.
EXPOSE 80
CMD ["supervisord", "-n"]

In Summary

So in summary, we have created one script that builds an Ubuntu image, installs all the dependencies, configures the build process, manages the application and web server, and takes care of everything else. It is also capable of running behind a load balancer, so if your Docker container was scaled out to 20 different instances, they would all stay up to date without you having to configure a management application to deploy new code to different IP addresses.

Note

This is not the norm for Docker deployment, but it works and it works well. Use at your own peril! No database configuration is shown here as we are using a remote Amazon RDS database. You should never host a database on the instances behind your own load balancer, as they are destroyed and created frequently.
