How to build an OpenStack alternative: Step 4, adding a database

Posted on 2020-01-14 by ungleich virtualisation team

This time we describe how to store information in a database and why we selected etcd as the primary database.

Last time we described how to generate MAC addresses, a key element of uncloud.

More Data

We now have a couple of running VMs. We want to remember which VMs are running and also add more information: Who owns a VM? And later also, where is the VM running?

Database

We decided to use etcd as our primary database. The main reason is that we don't want to add a single point of failure to uncloud, and we don't need the guarantees provided by a standard SQL database.

An alternative we are still considering is PostgreSQL. While it is not inherently distributed (at all), it supports storing JSON and has quite a sophisticated messaging system.

Refactoring: phasing in a database

So far we have used a couple of Python and shell scripts to create the base of uncloud. Now that things are becoming a bit more serious, we needed to refactor our code. The shell and Python scripts have been cleaned up and turned into a proper Python module, which we lovingly call uncloud.hack.

Python, ETCD and JSON

We decided to use python-etcd3 to access etcd from the Python world, as it supports version 3 of the etcd API.

For the data format we decided to use JSON, as it is easy to read.

Each VM is identified by a random UUID, so we don't need to store a counter for VMs.
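
To make this more concrete, here is a minimal sketch of how a VM could be registered as a JSON document under a random UUID. It is not the actual uncloud.hack code: the key prefix /uncloud/vm/, the field names (uuid, owner, status) and the helper names are assumptions chosen for illustration. DictClient is a small in-memory stand-in that offers only the put/get subset of the python-etcd3 client, so the sketch runs without an etcd server.

```python
import json
import uuid


class DictClient:
    """In-memory stand-in for the put/get subset of a python-etcd3 client."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        # etcd stores bytes, so encode strings the way a real client would
        self.data[key] = value.encode() if isinstance(value, str) else value

    def get(self, key):
        # python-etcd3 returns a (value, metadata) tuple; value is None if missing
        return self.data.get(key), None


def register_vm(client, owner, prefix="/uncloud/vm/"):
    """Store a new VM record as JSON and return its UUID (hypothetical layout)."""
    vm_id = str(uuid.uuid4())  # random UUID, so no counter is needed
    record = {"uuid": vm_id, "owner": owner, "status": "running"}
    client.put(prefix + vm_id, json.dumps(record))
    return vm_id


def load_vm(client, vm_id, prefix="/uncloud/vm/"):
    """Read a VM record back, or return None if the key does not exist."""
    value, _metadata = client.get(prefix + vm_id)
    return None if value is None else json.loads(value)


client = DictClient()
vm_id = register_vm(client, owner="alice")
print(vm_id, load_vm(client, vm_id))
```

Against a real cluster, the stand-in would be replaced by something like etcd3.client(host="localhost", port=2379); the two helper functions stay the same because they only rely on put() and get().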

Status

At this point uncloud can create VMs, and the VMs are registered in etcd, our database. So while we don't have logic yet for (automatic) VM migration, the information about VMs is already stored in a distributed database.

So if one of our hosts vanishes, we can in theory already redeploy the existing VMs.
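
As a rough sketch of what "redeploy in theory" could look like: once each VM record also stores where the VM is running, finding the VMs of a vanished host is a prefix scan plus a filter. The host field, the /uncloud/vm/ prefix and the function name are assumptions, not existing uncloud code. DictClient again imitates only the get_prefix part of python-etcd3, which yields (value, metadata) tuples.

```python
import json


class DictClient:
    """In-memory stand-in for the get_prefix subset of a python-etcd3 client."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def get_prefix(self, prefix):
        # python-etcd3 yields (value, metadata) tuples for every matching key
        for key, value in self.data.items():
            if key.startswith(prefix):
                yield value, None


def vms_on_host(client, host, prefix="/uncloud/vm/"):
    """Return all VM records whose (hypothetical) host field matches."""
    vms = []
    for value, _metadata in client.get_prefix(prefix):
        record = json.loads(value)
        if record.get("host") == host:
            vms.append(record)
    return vms


client = DictClient({
    "/uncloud/vm/a": json.dumps({"uuid": "a", "owner": "alice", "host": "server1"}).encode(),
    "/uncloud/vm/b": json.dumps({"uuid": "b", "owner": "bob", "host": "server2"}).encode(),
})

# VMs that would need to be started elsewhere if server1 vanished
for vm in vms_on_host(client, "server1"):
    print("needs redeploy:", vm["uuid"], "owned by", vm["owner"])
```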