Hosting Django Apps in Kubernetes [WIP]

Situation

At ungleich we host quite a lot of Django applications. As of 2022-06-03, most of them are still deployed on a traditional VM-based setup.

We are using this blog entry to document a possible blueprint and the progress of the migration at ungleich.

General design

Our Kubernetes clusters usually use ArgoCD for deployments, so Django applications should probably be defined the same way.

Most of our Kubernetes applications are defined in Helm charts, so the "general Django application" should probably also be defined in a Helm chart.

Freedom of choice

As a hoster, it might be tempting to prescribe a specific image that Django applications have to use (such as python), but we want to give ourselves and our customers the freedom to choose the image. It might even come from a private registry.

Interface definition

All of our Django applications use PostgreSQL for storing data. PostgreSQL is also used by quite a few other applications that we have deployed in k8s, so this is a no-brainer: Django hosting at ungleich, even in k8s, will be based on PostgreSQL.

Static data of Django applications can easily be stored on a PVC. The drawback is that filesystem PVCs based on Ceph block devices are usually RWO (ReadWriteOnce), so a replacement pod generally has to wait until the old one has released the volume, and a restart causes a short downtime.

Generally speaking, this is probably acceptable, since a deploy on a VM would have caused a short downtime as well.

The alternative would be a shared filesystem (such as NFS or CephFS), but these are usually slower than dedicated block devices, so availability during restarts can be traded against speed. Maybe we offer both options, or add an NFS server as an option to our Django hosting.

Django startup / processes

On startup, Django needs to ensure the database schema has been upgraded to the latest version, so something like python manage.py migrate should probably be called in an init container for most apps. We could specify that the customer-provided container supports multiple commands:

/init - anything that needs to be done once on startup
/run - something that runs the actual site

Some Django apps, however, use Celery or Django Q for async tasks. We don't know in advance which system is used, but it would be easy to add a flag to the hosting that controls whether a third container is started. Thus we could define:

If an async container is defined, enable it and run /async (or similar)
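
The command interface above could be sketched roughly as follows. This is only an illustration under assumptions: the project name mysite, the use of gunicorn for /run and Celery for /async, and the helper containers_for are placeholders that do not come from an existing chart. In practice each customer image would ship its own /init, /run and /async.

```python
# Hypothetical mapping from the container commands named above to the
# processes they might start. "mysite" is a placeholder project name, and
# gunicorn/Celery are just one possible choice per command.
COMMANDS = {
    "init": ["python", "manage.py", "migrate", "--noinput"],
    "run": ["gunicorn", "mysite.wsgi:application", "--bind", "0.0.0.0:8000"],
    "async": ["celery", "-A", "mysite", "worker"],
}


def containers_for(async_enabled: bool) -> dict:
    """Return the command line of every container the hosting would start."""
    names = ["init", "run"]
    if async_enabled:
        # The third container only exists when the async flag is set.
        names.append("async")
    return {name: COMMANDS[name] for name in names}
```

With the flag off, containers_for(False) contains only the init and run entries; with the flag on, the async worker is added as a third container.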

Secrets

Django applications usually have some kind of secrets, and most Django applications have DIFFERENT types of secrets. Thus prescribing a specific set of environment variables does not seem to be a smart idea.

Instead, we should probably offer to store secrets in something like SealedSecrets.
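
One way an application could consume arbitrary secrets without a fixed list of names is to read whatever files are mounted into the container. The sketch below assumes the decrypted secrets end up as a regular Kubernetes Secret mounted as a directory; the helper load_secrets and the mount path in the usage note are hypothetical.

```python
from pathlib import Path


def load_secrets(directory: str) -> dict:
    """Read every regular file in `directory` as one secret (name -> content)."""
    secrets = {}
    for path in sorted(Path(directory).iterdir()):
        # Mounted Secret volumes contain hidden bookkeeping entries such as
        # "..data"; skip those and anything that is not a file.
        if path.name.startswith(".") or not path.is_file():
            continue
        secrets[path.name] = path.read_text().strip()
    return secrets
```

In settings.py this could be used as secrets = load_secrets("/run/secrets/app") followed by SECRET_KEY = secrets["SECRET_KEY"], where both the path and the key name are up to the application.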

Database connection information is provided by default and is cluster/app specific.
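
How the connection information reaches the application is not decided yet. As a sketch, assuming it is injected as environment variables, a settings helper could look like this. The variable names (POSTGRES_HOST and so on) and the function database_settings are made up for illustration; only the engine path django.db.backends.postgresql is Django's own.

```python
import os


def database_settings(env=os.environ) -> dict:
    """Build Django's DATABASES setting from environment variables.

    The variable names are assumptions for this sketch; the hosting
    platform would define the real ones.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["POSTGRES_DB"],
            "USER": env["POSTGRES_USER"],
            "PASSWORD": env["POSTGRES_PASSWORD"],
            "HOST": env["POSTGRES_HOST"],
            # Fall back to the default PostgreSQL port if none is provided.
            "PORT": env.get("POSTGRES_PORT", "5432"),
        }
    }
```

In settings.py this would be used as DATABASES = database_settings(), so the application code itself stays free of cluster-specific values.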

Deployment

We are likely going with a GitOps-style deployment in which everything for the Django app is defined in a git repository. This repository is read-write for each client/customer.

This probably includes:

(Possibly encrypted) Secrets
Image definitions
Maybe even (part of) a pod definition?

Some parameters are likely to be stored in a different repository that only ungleich can write to, such as:

Size of RAM
Number of CPUs
Storage size (PostgreSQL, static files, ...)
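
The split between the two repositories could be enforced when the values are combined. The following is a minimal sketch, assuming both repositories end up as plain key/value dictionaries; the key names and the function merge_values are hypothetical.

```python
# Keys that only the ungleich-owned repository may set (hypothetical names).
OPERATOR_ONLY = {"memory", "cpus", "storage"}


def merge_values(customer: dict, operator: dict) -> dict:
    """Combine both repositories; customer attempts to set sizing are dropped."""
    merged = {k: v for k, v in customer.items() if k not in OPERATOR_ONLY}
    # Operator values are applied last, so they always win.
    merged.update(operator)
    return merged
```

A customer repository that sets memory would thus be ignored for that key, while its image definition is passed through unchanged.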

Status

This document is still WIP and will be used as a basis for deploying our own Django apps first.