Deploying code with packages

2012-05-15

Startups iterate quickly. As a devops professional (apparently that's what they call me now), I've found that the ability to turn over code deploys, new services, and configuration changes very fast is critical to keeping your customers (the engineering team) happy. This process generally gets lumped in with Continuous Integration (CI), although continuous delivery is the more precise name for the deployment part. A lot of new tools have popped up over the last few years to get code from VCS to prod servers faster, along the lines of Capistrano, Fabric, and... probably some other stuff. These tools are great for creating quick-and-dirty, reproducible-ish deployments without a ton of thought. However, this approach is completely wrong.

There is quite a bit of prior art in the area of code deployment in the form of operating system packages. Everything on a fresh install of a server probably came from a system package. Kernels, compilers, editors, and network utilities were all packaged by the OS maintainers and installed by an automated system. Your code is not special, so why treat its deployment specially?

The problem with OS packaging is that it's a bit of an arcane art. Systems like RPM, Apt, and Ebuild are fraught with outdated documentation, links to dead blogs, buried mailing list posts, and "just ask the dude on IRC, he's pretty cool". But you can't ignore that all of these systems have essentially solved every type of code deployment problem you're likely to face, and as lazy devops/sysadmin people, we hate reinventing the wheel, so here we go.

For the sake of brevity, sanity, and inanity, I'm going to assume that you have a Python web app in a Git repo and that you're deploying it to a recent Debian or Ubuntu system. Our goal is to be able to install a package using Puppet, Chef, CFengine, or something else that puts all of your code in a standard location every time, on every server.

Ok, the basics. Debian packages are .deb files. A .deb file is basically an ar archive wrapping a couple of compressed tarballs: one holding the built files of your app, and one holding extra metadata about versioning and dependencies. The dpkg-deb utility allows us to examine and unpack .deb files. Take a look at an existing package's metadata with dpkg-deb -I /var/cache/apt/archives/something_1.2.3-1.deb. Anytime you apt-get install a package, it caches the downloaded archive in /var/cache/apt/archives. If you've installed a lot of things, you'll have a lot of .deb files here. Feel free to poke around, and take a look at man dpkg-deb for more options and tools for playing with .deb archives.
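If you're curious what's actually inside one of those files, the outer layer of a .deb is a plain ar archive, normally holding debian-binary, a control tarball, and a data tarball. Here's a rough Python sketch that lists the members of an ar archive. The function names are mine, and the fake archive at the bottom is made up purely so the example runs without a real package on hand. It only understands short member names, which is all a .deb uses.

```python
import io

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # every member is preceded by a fixed 60-byte header


def list_ar_members(stream):
    """Return [(name, size), ...] for each member of an ar archive."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            break  # clean end of archive
        if len(header) != HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Name is the first 16 bytes, space padded (some tools append "/").
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        # Size is a decimal number in bytes 48..57, space padded.
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        # Member data is padded to an even number of bytes.
        stream.seek(size + size % 2, io.SEEK_CUR)
    return members


def fake_member(name, data):
    """Build one ar member (header + data + padding) for the demo below."""
    header = "%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (name, 0, 0, 0, "100644", len(data))
    return header.encode("ascii") + data + (b"\n" if len(data) % 2 else b"")


def fake_deb():
    """A made-up archive shaped like a .deb; the payloads are placeholders."""
    return AR_MAGIC + b"".join([
        fake_member("debian-binary", b"2.0\n"),
        fake_member("control.tar.gz", b"ctl"),
        fake_member("data.tar.gz", b"data"),
    ])


print(list_ar_members(io.BytesIO(fake_deb())))
```

To point it at a real package, open the file in binary mode and pass the file object: list_ar_members(open(path, "rb")). For anything serious, dpkg-deb is still the right tool.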

So, why can't we just pack up our source directory into a .deb file and call it a day? That'll get our files onto the server, and it'll work the same every time, right? Sure, but then you'd be skipping a lot of the really useful and important parts of the Debian packaging system. As a rule, nothing makes it into upstream Debian unless it can be built from a special type of package called a source package. Source packages provide all of the instructions, dependencies, and versioning information required for the build system to generate a .deb for (theoretically) any architecture or distribution using Debian-style packaging from nothing but source code.

Great, so how do you create a source package? First, you'll need some tools.

sudo apt-get install devscripts pbuilder git-buildpackage dh-make apt-utils

Ok, so what are these things? pbuilder will create a chroot jail (google it, I'll wait) around a completely clean and sterile, minimal Debian install, and build your package inside the jail using a bunch of tools, rather than directly on your system with god-knows-what installed on it. This ensures that your final built package is reproducible, that some dude in Siberia that wants to hack on your code or port it to a different system gets the same resulting binaries that you did. git-buildpackage wraps around pbuilder to provide intelligent management of package sources within a git repository, and dh-make generates boilerplate for new source packages.

Let's start packaging. This is a Python app, so we'll package it the standard Python way, using distutils or setuptools. A simplistic setup.py is enough to get started: http://docs.python.org/distutils/introduction.html#a-simple-example

The Debian maintainers have spent a LOT of time figuring out the best way to get Python code onto a system in sys.path without resorting to patching every upstream package drastically. All you need to do to get a source package is to add a debian/ directory at the root of your repository. Of course, there are already several tools that will set up the boilerplate debian/ for you, the most common of which is called dh_make. There are other similar tools specific to different languages; feel free to Google them.

~/myfirstwebapp-0.1 $ export DEBEMAIL="jeremy@example.com"
~/myfirstwebapp-0.1 $ export DEBFULLNAME="Jeremy Grosser"
~/myfirstwebapp-0.1 $ dh_make --createorig
~/myfirstwebapp-0.1 $ rm debian/*.{ex,EX}

First, we set a couple of environment variables to tell the scripts what to fill in for the maintainer and author fields. You probably want to just put these in your .bashrc. dh_make is kinda old and janky, so your source directory has to be named <package>-<version> (here, myfirstwebapp-0.1) or it'll whine about it. You also have to pass --createorig to generate an .orig.tar.gz (more on this later), which is pretty irrelevant when using git-buildpackage, but again we do it to keep dh_make from complaining. When dh_make asks you what kind of package to create, just type "s" and press enter for now. The other types are designed for different packaging scenarios, like kernel modules. You may want to try CDBS if you're packaging an Ant library or some other standardish build scripts. We also deleted a bunch of the example files that dh_make created, because for simple packages they're completely unnecessary. Feel free to browse around the examples if you've got more complicated projects in mind; they are well commented.

At this point, you have something vaguely resembling this:

~/myfirstwebapp-0.1 $ ls -1 debian/
changelog
compat
control
copyright
docs
README.Debian
README.source
rules
source

Of these, changelog, compat, control, and rules are absolutely required. The rest are technically optional, but you probably don't want to get rid of them.

debian/changelog is exactly what it sounds like: a changelog of what you've done with this source package. Each section of this file represents a package version. Note that all of the spacing and formatting in this file is very specific, and your build will break if you change spaces to tabs, for example.

Let's take a closer look at the version number: (0.1-1). Version numbers in packages hold a lot of meaning to the system. The first part, 0.1, is the upstream package version. If you had downloaded this package from some website, this is the upstream author's version. The second part, -1, is the Debian revision. Occasionally it's useful to be able to revise the packaging metadata or build scripts for a package before the upstream maintainer has tagged a new release. In those cases, this number simply gets incremented. There are a lot of different ways to represent a version here, all enumerated in the completely difficult to comprehend Debian Policy Manual, section 5.6.12 (http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version). All you really need to know is that a version with a "-" in it came from an upstream source (i.e. you didn't write it), and anything that's just a version number like "2.5.1", with no "-" part, is a native package. In most cases, you'll wanna stick with a non-native package and keep the Debian version separate from the upstream version.
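To make the version rules a bit more concrete, here's a minimal Python sketch that splits a version string the way the policy manual describes: an optional epoch before the first ":", and the Debian revision after the last "-". The function names are my own, and the epoch is a detail from the policy manual that this post otherwise ignores. It only splits the string; it doesn't attempt the (much hairier) version comparison rules.

```python
def split_debian_version(version):
    """Split a Debian version into (epoch, upstream_version, debian_revision).

    debian_revision is None when there is no "-", i.e. a native package.
    """
    epoch = 0
    if ":" in version:
        # The epoch is everything before the first colon.
        epoch_text, version = version.split(":", 1)
        epoch = int(epoch_text)
    if "-" in version:
        # The revision is everything after the LAST hyphen, so the
        # upstream part is allowed to contain hyphens of its own.
        upstream, revision = version.rsplit("-", 1)
    else:
        upstream, revision = version, None
    return epoch, upstream, revision


def is_native(version):
    """True when the version has no Debian revision part."""
    return split_debian_version(version)[2] is None


for example in ("0.1-1", "2.5.1", "1:2.0-rc1-3"):
    print(example, split_debian_version(example), is_native(example))
```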

debian/compat basically just tells debhelper which compatibility level to use, which in practice means you're using a not-ancient spec for package layout. You will probably never need to change this.

debian/control provides a lot of the metadata that's used for managing how your package appears to the user and the rest of the system. Let's dive a bit deeper...

Source: myfirstwebapp
Section: unknown
Priority: extra
Maintainer: Jeremy Grosser <jeremy@example.com>
Build-Depends: debhelper (>= 8.0.0)
Standards-Version: 3.9.2
Homepage: <insert the upstream URL, if relevant>
#Vcs-Git: git://git.debian.org/collab-maint/myfirstwebapp.git
#Vcs-Browser: http://git.debian.org/?p=collab-maint/myfirstwebapp.git;a=summary

Package: myfirstwebapp
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Description: <insert up to 60 chars description>
 <insert long description, indented with spaces>

As you can see, there are two sections here. The first one describes the source package and the dependencies needed to build it, while the second one describes the resulting binary package and the dependencies required to install it. Let me repeat that: there are two dependency lists, one for building and one for installing. The rest of the fields here are fairly self-explanatory and some are even optional (but highly recommended). There are a lot of different things that you can specify in the control file, and there's a whole chapter dedicated to it in the Debian Policy Manual: http://www.debian.org/doc/debian-policy/ch-controlfields.html
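If the two-lists thing still feels abstract, here's a small Python sketch that reads a control file into its paragraphs and pulls out both dependency lists. It's a simplified reader that I'm assuming is good enough for a file like ours (blank-line separated paragraphs, "#" comments, indented continuation lines), not a full implementation of the control file syntax. The function names and the sample text are for illustration only.

```python
CONTROL = """\
Source: myfirstwebapp
Section: misc
Priority: extra
Maintainer: Jeremy Grosser <jeremy@example.com>
Build-Depends: debhelper (>= 8.0.0), python
Standards-Version: 3.9.2
#Vcs-Git: git://git.debian.org/collab-maint/myfirstwebapp.git

Package: myfirstwebapp
Architecture: all
Depends: ${shlibs:Depends}, ${misc:Depends}, python-bottle, python-jinja2
Description: My first web app!
 There are many like it, but this one is mine.
"""


def parse_control(text):
    """Return a list of dicts, one per blank-line separated paragraph."""
    paragraphs, current, last_field = [], {}, None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
            continue
        if line[0] in " \t":
            # Indented lines continue the previous field's value.
            if last_field is None:
                raise ValueError("continuation line without a field")
            current[last_field] += "\n" + line.strip()
            continue
        field, _, value = line.partition(":")
        last_field = field.strip()
        current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def split_dependencies(value):
    """Split a comma separated dependency field into individual entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


source, binary = parse_control(CONTROL)
print("to build:  ", split_dependencies(source["Build-Depends"]))
print("to install:", split_dependencies(binary["Depends"]))
```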

So what belongs in your Build-Depends list? debhelper (>= 8.0.0) is listed by default, because you'll basically always want it. Debhelper gives you all kinds of useful shortcuts and anything prefixed with dh_ probably comes from here. Yeah, you want that, it makes your life easier. We're building a python package, so let's add... python! Amazing how that works. If your package had C extensions, you'd also want to include python-all-dev in this list. If you're using setuptools, you'll need that too, it's called python-setuptools. Ideally you'd also specify a minimum version number for each of these things (except python, but that's a whole can of worms, just leave it without a version). I tend to leave explicit versions off of the Build-Depends while getting a new package working, then go back and add them later, once I know exactly what I need and what's available.

In the binary package section, we don't need to change a lot. Unless you're building C extensions or have some other dependency on a specific architecture like x86, change the Architecture field from "any" to "all", which causes the build system to create one binary package that can be installed on any system, like a home router with an ARM chip, for example. If you left this as "any", you'd have to rebuild the binary for each target architecture.

It's worth noting that your binary package name does not have to be the same as your source package name. It's actually fairly common for a source package to be something like "sqlalchemy" that creates a binary named "python-sqlalchemy". In the case of Python packages, the convention is that libraries and modules are prefixed with "python-", while services and tools that just happen to be written in Python aren't prefixed.

A single source package can define several binary packages. For example, the "sqlalchemy" source package creates a "python-sqlalchemy-doc" binary, so that you can install the docs separately. Multiple binary packages are outside the scope of this document.

At this point, you should have something like this:

Source: myfirstwebapp
Section: misc
Priority: extra
Maintainer: Jeremy Grosser <jeremy@example.com>
Build-Depends: debhelper (>= 8.0.0), python
Standards-Version: 3.9.2
Homepage: http://example.com/dist/myfirstwebapp-latest.tar.gz
Vcs-Git: https://github.com/synack/myfirstwebapp.git
Vcs-Browser: https://github.com/synack/myfirstwebapp

Package: myfirstwebapp
Architecture: all
Depends: ${shlibs:Depends}, ${misc:Depends}, python-bottle, python-jinja2
Description: My first web app!
 There are many like it, but this one is mine. Accessories sold separately, batteries not included.

debian/copyright lists out who maintains this code, who wrote it, who owns the rights to it, how they've licensed it, etc, etc. Technically you don't need it, but somebody somewhere on the internet cares about this sort of thing. If you ignore it, prepare for an onslaught of trolls.

debian/docs is simply a list of documentation files that come with your package. They'll get installed somewhere under /usr/share/doc and nobody will ever read them before emailing you. How sad. In most cases, you'll just put something like "README.txt" in here and call it a day.

debian/README.Debian and debian/README.source are where you'd put any notes specific to the packaging of this application. If you don't have anything interesting to say, just delete them.

debian/rules is a Makefile that needs to support a bunch of different targets like "build" and "install". This is where the real work of the package comes in. The build target needs to do any sort of compiling, linking, and whatnot to turn your code into something useful, then the install target comes along and lays it all out under debian/tmp/ as close to the FHS standard (http://www.pathname.com/fhs/) as possible. The good news is that you don't have to write any of that! Debhelper is here to... uh... help! The example rules file generated by dh_make is basically all you need:

#!/usr/bin/make -f
# -*- makefile -*-
# Sample debian/rules that uses debhelper.
# This file was originally written by Joey Hess and Craig Small.
# As a special exception, when this file is copied by dh-make into a
# dh-make output file, you may use that output file without restriction.
# This special exception was added by Craig Small in version 0.37 of dh-make.

# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1

%:
	dh $@

With one slight addition, this will work perfectly for us. Change that last line to dh --with=python2 $@ and you're done. This tells debhelper that there's a setup.py in the root of your source package and it behaves as expected, just like that.

debian/source/format hints to the build system whether this is a native or non-native package.

Native: 3.0 (native)

Non-native: 3.0 (quilt)

quilt is a patching system that allows your debian packaging to apply patches to the upstream tarball before building it. We're not using it here, but you have to say that anyway. If you need to edit something from upstream and you can't get a patch submitted there, you can create a debian/patches directory with a bunch of patch files in it, and the build scripts will take care of getting that all squared away during the build. If you need this, go read the quilt manpages.

That's all. You now have a basic Debian source package. As we're managing this repo with git-buildpackage, there are some conventions for where to commit it. The ideal case is that you have an upstream branch and a master branch, where upstream is simply a copy of the latest upstream release, and master has the debian/ directory added to it. When a new upstream release comes out, you merge it into the upstream branch, then merge that to master, update debian/changelog (using git-dch or debchange), make sure the dependencies in debian/control are still accurate, and commit to master. This whole process becomes quite a bit easier if the upstream package maintainer also uses git, or if you use a tool like git-import-orig to pull in new release tarballs. It's good practice to also tag each upstream release, as well as your debian package releases. git-dch and git-import-orig will do this for you, given the right arguments.

We're going to be setting up a brand new pbuilder environment, which means building a new tarball that represents a clean debian install for our architecture and distribution. Every time a build runs, it'll untar this into a chroot jail and run the build there.

sudo pbuilder create

This command will take a while and download all of the packages necessary to build a minimal debian system using a tool called debootstrap. If you wanted to build for a different architecture or distribution than the one you're currently running, you'd edit /etc/pbuilderrc or ~/.pbuilderrc to point it at the right mirror, dist, tarball location, build result location, etc, etc. The pbuilderrc(5) manpage lists all of the options you can use there. For now, let's just stick with the defaults.

sudo git-buildpackage --git-builder='pdebuild --debsign-k jeremy@example.com --auto-debsign'

This kicks off the building of our package, using the sources in the current working directory. It needs root permissions in order to create the chroot jail, because chroot is a privileged syscall. The build goes through a lot of different stages, roughly:

  1. Runs debian/rules clean to make sure our source directory isn't dirty
  2. Creates a source package consisting of .dsc, .debian.tar.gz, .orig.tar.gz, and .changes files from our upstream and master branches
  3. Sets up a new chroot jail and untars the base tarball into it
  4. Creates and installs a virtual package that pulls all of our Build-Depends into the jail
  5. Copies our source package into the jail
  6. Runs debuild inside the jail, which calls dpkg-buildpackage. The manpages for these scripts tell you more about what they do. All you need to know is that they parse debian/control and debian/changelog, call your rules targets and spit out a .deb
  7. Signs the resulting package with your GPG key, identified by the --debsign-k argument to pdebuild. You may be prompted to enter your key's passphrase at this point.

If you're lucky, you waited a while, saw a lot of text scroll by, and the build finished successfully. Your built package is now in /var/cache/pbuilder/result.

$ ls -1 /var/cache/pbuilder/result/myfirstwebapp_0.1*
myfirstwebapp_0.1-1_all.deb
myfirstwebapp_0.1-1_amd64.changes
myfirstwebapp_0.1-1.debian.tar.gz
myfirstwebapp_0.1-1.dsc
myfirstwebapp_0.1.orig.tar.gz
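
One detail in that listing that's easy to miss: the .orig.tar.gz name carries only the upstream version (0.1), while everything else carries the full 0.1-1. Here's a rough Python sketch of the naming pattern. The function name and the default architectures are my own assumptions for illustration, and it assumes a non-native version with no epoch.

```python
def expected_artifacts(package, version, binary_arch="all", build_arch="amd64"):
    """Guess the file names a build should leave in the result directory."""
    if "-" not in version:
        raise ValueError("expected a non-native version like 0.1-1")
    # Drop the Debian revision (after the last hyphen) for the orig tarball.
    upstream = version.rsplit("-", 1)[0]
    return [
        "%s_%s_%s.deb" % (package, version, binary_arch),
        "%s_%s_%s.changes" % (package, version, build_arch),
        "%s_%s.debian.tar.gz" % (package, version),
        "%s_%s.dsc" % (package, version),
        "%s_%s.orig.tar.gz" % (package, upstream),
    ]


for name in expected_artifacts("myfirstwebapp", "0.1-1"):
    print(name)
```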

Pretty awesome, right? Now you just need a Debian repository to upload it to.

A Debian package repository is nothing more than a bunch of files arranged in a specific way, with a few files like Packages and Release that tell apt what versions of what packages are available and where to find the .deb files to download. You can browse around most official Debian distributions to get a feel for how a large repo is laid out: http://ftp.us.debian.org/debian/. For our purposes, let's keep it a bit simpler. We're just going to generate a Packages and Release file in the same directory as our binaries.

mkdir -p /var/www/mybuntu
cd /var/www/mybuntu
cp /var/cache/pbuilder/result/myfirstwebapp_0.1* .
apt-ftparchive packages . >Packages
apt-ftparchive release . >Release
gpg --yes -abs -u jeremy@example.com -o Release.gpg Release

You'll need to run this (everything except the mkdir -p part) every time you add a new package to the repository. It's probably a good idea to write a script so you don't forget all that stuff. apt-ftparchive supports much more complicated layouts and configuration files. I encourage you to read the manpages for that so you can do clever things like manage multiple distributions (separating production and development packages is really handy).
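
Since I just told you to write a script, here's one possible shape for it in Python. Treat it as a sketch under a few assumptions: the function names are mine, the repository directory is passed to apt-ftparchive explicitly as ".", and the gpg flags are spelled -abs (armor, detached signature, sign). It captures the output of each apt-ftparchive run before writing the index file, and it stops at the first command that fails.

```python
import os
import subprocess


def repo_commands(signing_key):
    """Return (argv, output_file) pairs; output_file is None when the
    command writes its own output."""
    return [
        (["apt-ftparchive", "packages", "."], "Packages"),
        (["apt-ftparchive", "release", "."], "Release"),
        (["gpg", "--yes", "-abs", "-u", signing_key,
          "-o", "Release.gpg", "Release"], None),
    ]


def update_repo(repo_dir, signing_key):
    """Regenerate Packages, Release, and Release.gpg inside repo_dir."""
    for argv, output_name in repo_commands(signing_key):
        if output_name is None:
            subprocess.check_call(argv, cwd=repo_dir)
            continue
        # Capture stdout first, then write the index file in one go.
        data = subprocess.check_output(argv, cwd=repo_dir)
        with open(os.path.join(repo_dir, output_name), "wb") as out:
            out.write(data)


for argv, output_name in repo_commands("jeremy@example.com"):
    print(" ".join(argv), "->", output_name or "(writes its own file)")
```

After copying new .deb files into place, you'd call update_repo("/var/www/mybuntu", "jeremy@example.com") on the repository machine.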

Now you just need to add this apt repository to your clients. Edit /etc/apt/sources.list.d/mybuntu.list on some other server or even your repository machine, if so inclined.

$ cat /etc/apt/sources.list.d/mybuntu.list
deb http://packages.example.com/mybuntu/ ./
$ sudo apt-get update
$ sudo apt-get install myfirstwebapp

Ta da! Your webapp is now installed in the python path! Any scripts you defined in your setup.py are in /usr/bin, and everything is happy and reproducible. You can build new releases of your package, put them in the apt repository and now they're available to all of your servers. Combine this with a config management tool like Puppet where you can do things like ensure => latest and your code gets updated on every puppet run. Need to roll back code because you broke everything? apt-get install myfirstwebapp=0.1-1 or just remove it completely: apt-get remove myfirstwebapp

These tools fit quite nicely with existing CI tools like Jenkins, allowing you to do automated testing, building, and deployment of a package every time somebody commits to master.
