Why? Mostly as a learning experience. For as long as I can remember, my blog has been a place to experiment with new ways of doing things and understand the benefits and limitations of different architectures. In the past, I've used PHP/Apache/MySQL, Python/MySQL, Python/SQLite, Bash generating static files, Python generating static files, Perl CGIs, and half a dozen other mechanisms for turning words into web pages. In recent years, I've gravitated toward lower maintenance systems that require less attention to stay running. Most recently, it's been a Python app using the Clay framework running on App Engine with Google Cloud Datastore as the database. This configuration has been extremely stable with virtually no interaction on my part. No servers to manage, no scaling to worry about, and extremely low cost as I rarely get enough traffic to run over App Engine's free tier, especially with Cloudflare's caching in front of it.
The code is fairly simple and a little ugly. Each post consists of a title, slug, created_date, modified_date, and content blob, all stored under a single key in datastore. Posts are queried from datastore by slug or by modified date, depending on which URL scheme the request used. The resulting list of posts is passed to a Jinja2 template (with a pile of custom filters defined) that spits out HTML to be sent as the response. Every response is stored in memcache before being sent to the client, and responses are served from memcache whenever possible to minimize datastore queries and CPU time spent rendering templates. As the blog content rarely changes, this results in extremely high cache hit rates.
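That caching flow can be sketched like this. A plain dict stands in for memcache, and render_post is a stand-in for the real Jinja2 templates; both are illustrative, not the actual app code:

```python
# Sketch of the render-and-cache flow described above. A plain dict stands
# in for memcache, and render_post is a placeholder for the app's real
# Jinja2 templates and custom filters.
cache = {}

def render_post(post):
    # Placeholder for Jinja2 template rendering.
    return "<h1>%s</h1>\n%s" % (post["title"], post["content"])

def get_response(slug, fetch_post):
    # Serve from cache whenever possible; only query the datastore
    # and re-render the template on a cache miss.
    key = "response:%s" % slug
    if key in cache:
        return cache[key]
    post = fetch_post(slug)        # datastore query by slug
    response = render_post(post)   # template rendering (the CPU-heavy part)
    cache[key] = response          # store before sending to the client
    return response
```

The second request for a given slug never touches the datastore or the template engine, which is where the high cache hit rate comes from.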
The only App Engine-specific pieces are a few dependency libraries installed using app.yaml and the google.appengine.ext.db module, which provides a simple ORM for interacting with datastore. The db module was deprecated quite a while ago and replaced with ndb, which offers more functionality and a cleaner API. I had expected to port everything over to ndb, but was surprised to find that ndb cannot be used outside of App Engine, even though there's a public datastore API. The public API uses the gcloud.datastore library, which shares no code with google.appengine.ext.ndb and is a far cry from an ORM; gcloud.datastore is little more than a wrapper around datastore's protobuf interface. There is an ongoing effort to bring ndb functionality to gcloud.datastore, but it appears to be waiting on a few new methods to be added to the public datastore API.
The blog is a fairly simplistic use case for datastore, so I decided to move forward with porting the handful of queries over from the ORM to the GQL interface. As I began testing these changes, I ran into authentication errors when trying to query datastore. There are three authentication methods supported by the gcloud library: explicit credentials, credentials from a file, or a service account. Service accounts are a really nice feature of Google Cloud, allowing you to automatically provide access to other Google services from Compute Engine instances running in your project. I found that my development instance didn't have the proper scopes enabled for datastore access when it was created, so I had to delete the instance (keeping the disks) and re-create it with the service auth checkbox checked. After spending a few hours debugging an unrelated issue where I had broken the boot scripts on my dev instance, I tried to access datastore with the service account again and still got an unauthorized error. I did a fair bit more debugging, watching HTTP requests fly back and forth, ensuring that the gcloud library was getting an OAuth token using the service account and passing that in a header to the datastore service, but never managed to get this to work. I suspect there's something broken in either the gcloud library or the public datastore service preventing service accounts from authenticating properly. I generated a new service account in the Cloud Console, exported it as a JSON file, and passed that to the gcloud.datastore.Client.from_service_account_json classmethod to authenticate my app instead. This worked on the first try. Unfortunately, now I have a file full of secrets to worry about...
The next step was to get my app running in a Docker container. This was fairly straightforward as my app exposes itself as a WSGI callable, so I could run it directly with uwsgi. A fair bit of trial and error was required to get all of the dependencies to build and install properly, but I eventually ended up with this:
FROM debian:jessie
COPY config/sources.list /etc/apt/sources.list
RUN apt-get update && apt-get install -yf \
    dnsutils python-dev python-pip python-yaml python-memcache \
    python-openssl python-crypto python-cryptography \
    python-jinja2 && rm -rf /var/cache/apt/archives/*.deb
COPY . /app
RUN pip install -r /app/requirements.txt
ENV CLAY_CONFIG=/app/config/production.json
CMD /usr/local/bin/uwsgi \
    --http-socket :8080 \
    --wsgi-file application.py \
    --callable application \
    --pythonpath lib \
    --chdir /app \
    --static-map /static=static \
    --static-map /robots.txt=static/robots.txt \
    --enable-threads
EXPOSE 8080
I installed several of the dependencies as debian packages to avoid compiling those libraries from source and having to worry about development headers. Deleting the apt archives makes the layers a little bit smaller. The static-map arguments replicate the functionality of a few static routes I had in my App Engine app.yaml.
Once I had a working Docker container, I tagged and pushed it to Google Container Registry, which is a handy private Docker registry that is available to any instance in your project using its service account.
After a few more iterations of debugging and testing, I launched a Kubernetes cluster using Container Engine. This is incredibly easy:
gcloud container clusters create testing
While I was waiting for the cluster to launch (it takes a few minutes) I skimmed the Kubernetes docs and wrote service and pod definition files. A Kubernetes pod is essentially a list of containers to be run at the same time on a single host. In the simplest case, a pod can run just a single container; however, I wanted to run a memcached container alongside every uwsgi container, just to keep things balanced out. This way, whenever I scale up more uwsgi instances, I get more memcache instances too. While cache performance probably isn't a huge issue now, it might be later, and it doesn't hurt to plan ahead a bit. If things become memory constrained, I can split memcached into a separate pod and scale it independently. Replication controllers ensure that a specified number of replicas of a given pod are running at any time, so we wrap our pod definition in a replication controller. It is possible to have pods without RCs, but then the pods won't be restarted automatically if they are terminated unexpectedly.
apiVersion: v1
kind: ReplicationController
metadata:
  name: steel-v7
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: steel
        version: v7
    spec:
      containers:
        - name: memcache128
          image: "memcached:latest"
          command: ["/usr/local/bin/memcached", "-m", "128", "-v"]
          ports:
            - containerPort: 11211
        - name: uwsgi
          image: "gcr.io/projectname/steel:v7"
          ports:
            - containerPort: 8080
In order for the uwsgi containers to be able to connect to the memcache containers, I need to define memcache as a service. Services map ports to pods and populate environment variables in every container that tell your app what IP/port to connect to for each service. Additionally, one of the Kubernetes add-ons that Container Engine launches automatically, SkyDNS, creates DNS records that make discovery even easier. In my application, I can simply call memcache.Client(['memcache:11211']) and the client will be connected to a semi-random memcache instance. While this isn't as efficient as passing the list of all memcache shards, it works well enough for this use case.
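The environment-variable side of that discovery can be sketched like this. The IP below is a made-up value, set by hand to simulate what the kubelet would inject for a service named memcache:

```python
import os

# Kubernetes injects <SERVICE>_SERVICE_HOST/_PORT variables into every
# container. Here they're set by hand to simulate what the kubelet
# provides; 10.0.0.42 is a made-up cluster IP.
os.environ.setdefault("MEMCACHE_SERVICE_HOST", "10.0.0.42")
os.environ.setdefault("MEMCACHE_SERVICE_PORT", "11211")

def memcache_address():
    # Read the address from the service environment variables. With
    # SkyDNS running, the bare name "memcache" works as a fallback.
    host = os.environ.get("MEMCACHE_SERVICE_HOST", "memcache")
    port = os.environ.get("MEMCACHE_SERVICE_PORT", "11211")
    return "%s:%s" % (host, port)

print(memcache_address())
```

Passing the result to memcache.Client([...]) connects exactly as described above, without relying on DNS at all.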
apiVersion: v1
kind: Service
metadata:
  name: memcache
  labels:
    app: steel
spec:
  type: NodePort
  selector:
    app: steel
  ports:
    - port: 11211
      protocol: TCP
      name: memcache
---
apiVersion: v1
kind: Service
metadata:
  name: http
  labels:
    app: steel
spec:
  type: LoadBalancer
  selector:
    app: steel
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
The memcache service is defined using the NodePort type, which makes the port available to every container inside the cluster without attaching an external load balancer (the default ClusterIP type would also work here, since nothing outside the cluster needs to reach memcache). The http service uses the LoadBalancer type, which in a vanilla Kubernetes cluster would pick an available public interface and load balance connections for you. When running on Container Engine, Kubernetes creates a Google load balancer instance and configures it to balance across the pods for you! These are simple layer 4 load balancers, so don't expect anything too fancy, especially if clients are pipelining connections.
Newer releases of Kubernetes have Ingress resources, which sit in front of Service resources and can provide some layer 7 load balancing functionality. I haven't experimented much with Ingress resources yet, as my app doesn't really need anything like that, but it'll become attractive to me once Ingress supports SSL termination and IPv6.
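For reference, a minimal Ingress sketch pointing at the http service defined above. I haven't actually deployed this, and blog.example.com is a placeholder hostname:

```yaml
# Hypothetical Ingress for this app; names come from the service
# definitions above, the hostname is a placeholder.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: steel
spec:
  rules:
    - host: blog.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: http
              servicePort: 80
```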
Launching the app is now as simple as passing those two files to the kubectl create command. After a minute or two, all of the container layers will be downloaded and started on the cluster nodes, and a load balancer will be created with an external IP address. You can follow along with kubectl get events -w and use kubectl describe service http to find the external IP once the load balancer is ready.
A note about selectors: selectors are how services know which pods to send traffic to. In my case, every pod with the app=steel label will run both memcache and uwsgi, so that's the only selector I need. If I were to split memcache into a separate pod, I'd probably change its label to app=memcache and update the service selector accordingly.
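A sketch of what that hypothetical split might look like; only the selector-relevant fields are shown, and the names are placeholders:

```yaml
# Hypothetical standalone memcache pod and matching service selector.
apiVersion: v1
kind: ReplicationController
metadata:
  name: memcache-v1
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: memcache
    spec:
      containers:
        - name: memcache128
          image: "memcached:latest"
          ports:
            - containerPort: 11211
---
apiVersion: v1
kind: Service
metadata:
  name: memcache
spec:
  selector:
    app: memcache   # now routes only to the dedicated memcache pods
  ports:
    - port: 11211
      protocol: TCP
```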
At this point, I spent a fair amount of time just playing with kubectl. There are a lot of interesting things you can do, like rolling updates to replace containers with a new version, autoscaling to monitor pod CPU usage and launch new replicas as needed, and adding and removing cluster nodes to see how the cluster responds to the underlying instances disappearing. It's a lot of fun to watch!
So, now that my app is running on Container Engine, I have effectively the same functionality as I did on App Engine. The performance difference is negligible (about 2ms faster than App Engine), and now instead of falling under the App Engine free tier, I'm paying about $15 a month to run a single g1-small Kubernetes node. This isn't really better in any way except that I'm no longer restricted to the App Engine APIs and I can scale up very quickly. As I said at the beginning, this was mostly a learning experience, but overall it feels like a more flexible way of running an app. I could easily see porting several of my other projects to Kubernetes and running them all on the same cluster, or bringing in some other services that have been developed on Kubernetes, like Vitess, to provide additional functionality. At some point I plan to try deploying the same app on Kubernetes on AWS and see how the experience differs. I filed a feedback report on the issues I encountered with service auth, and hopefully I'll be able to get that working soon. If not, I need to move my service account credentials into a Secret resource, which is a nice abstraction Kubernetes provides for distributing secrets to a ramdisk that is mounted within each container's namespace.
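A sketch of what that Secret might look like, along with the volume mount that exposes it to the uwsgi container. The names are placeholders, and the data value must be the base64-encoded contents of the JSON key file:

```yaml
# Hypothetical Secret holding the datastore service account key.
apiVersion: v1
kind: Secret
metadata:
  name: datastore-credentials
type: Opaque
data:
  service-account.json: <base64-encoded contents of the JSON key file>
---
# Fragment of the pod spec mounting the secret into the container,
# where the app could read /etc/credentials/service-account.json.
spec:
  containers:
    - name: uwsgi
      image: "gcr.io/projectname/steel:v7"
      volumeMounts:
        - name: credentials
          mountPath: /etc/credentials
          readOnly: true
  volumes:
    - name: credentials
      secret:
        secretName: datastore-credentials
```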