Migrating a Single Instance Application to Kubernetes

In this blog post we will dive into the migration of a single-instance application to Kubernetes. As I wrote in the previous post, we had already split most of our monolithic backend into microservices. We had also adopted the mono repository pattern (which becomes important later when setting up CI/CD) and started using RabbitMQ as an event bus. But all of this was still running on a single machine, with only one instance. It was time to scale.

Kubernetes is a great tool for this. It is complicated, and it was surely a bit of overkill for ImperialPlugins, but I wanted to explore new technologies, and in the end it worked out great. It also made deployments much easier and faster.

In this part I will talk about how we migrated our whole infrastructure from a single-instance monolith to a Kubernetes-based microservice architecture.

Let’s dive in!

Setting up the cluster

Setting up a Kubernetes cluster with cloud providers is really expensive. The monthly price on AWS and Azure was roughly 10-15 times higher than bare-metal servers with the same specs, so we decided to build a bare-metal cluster on our own VPS servers instead. Given ImperialPlugins’ size, it would not make much sense to spend that much money on cloud infrastructure.

It turned out that Kubernetes is strongly geared towards deployment on cloud providers. However, there are also great open source implementations available for the various cloud-oriented functionalities. We will talk about how to create the cluster and how to set up these open source replacements.

The first thing you want to do is get the cluster up. We already had one VPS server, so we bought another one. Each of the two servers has 8 cores and 32 GB of RAM, and together they cost about $30 per month. They do not have dedicated CPU cores or dedicated RAM, but that does not matter for our use case; besides, cloud machines do not give you that either, unless you are ready to pay a lot of money.

RKE

First I created a new kubeuser account in the sudo group, created a private key for login, and disabled password logins on all servers. This user will be used to create and manage the cluster with RKE.

RKE stands for Rancher Kubernetes Engine; it is a tool from Rancher to easily set up and manage Kubernetes clusters. You can download the latest release from here. You have to install it on your local machine, not on the server.

Once you have downloaded RKE, you need to create a cluster.yml. The cluster.yml describes the nodes, the services and some other settings. Here is an example:

cluster_name: rancher

nodes:
  - address: XXXXXX
    ssh_key_path: ./keys/kubeuser_id_rsa
    user: kubeuser
    role: [controlplane,worker,etcd]
  - address: XXXXXX
    ssh_key_path: ./keys/kubeuser_id_rsa
    user: kubeuser
    role: [worker]
  - address: XXXXXX
    ssh_key_path: ./keys/kubeuser_id_rsa
    user: kubeuser
    role: [worker]
services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
network:
  plugin: weave
ingress:
  provider: nginx

If you want to use Istio with ports 80 and 443 (which is likely the case if you want to deploy HTTP(S) applications), you should change the default nginx ingress ports, otherwise the two will conflict with each other.

ingress:
  provider: nginx
  extra_args:
    http-port: 8080
    https-port: 8443

After setting up the cluster.yml, all you have to do is run rke.exe up. This also generates a kube_config_cluster.yml for kubectl. You must run this command every time you change your cluster.yml. Do not forget to back up the generated cluster.rkestate file, which contains information and credentials for accessing your cluster. Without this file, RKE cannot update your cluster.
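Once the command finishes, you can verify the cluster with the generated config, for example:

$ kubectl --kubeconfig kube_config_cluster.yml get nodes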

MetalLB

As mentioned earlier, Kubernetes is really cloud-focused. The first place you would notice this is with services of the LoadBalancer type: on bare-metal clusters, their external IPs will remain <pending> forever. This happens because Kubernetes has no built-in load balancer; instead, it relies on the cloud provider to provide one for you. Luckily, some folks made MetalLB, which allows you to run your own load balancer.

To install MetalLB, all you have to do is create a metallb-config.yml and run the following commands:

$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
$ kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
$ kubectl apply -f metallb-config.yml
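For reference, a minimal metallb-config.yml for Layer 2 mode could look like this (the address range below is just a placeholder; use IPs that are actually routed to your nodes):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 203.0.113.10-203.0.113.20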

Helm

Helm is a package manager for Kubernetes. We will use Helm v3 to install the other services mentioned in this post.

There are many ways to install Helm. The easiest way is by running the installer script:

$ curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash

Storage

Another issue on bare-metal clusters is persistence and storage. Out of the box, Kubernetes only provides hostPath-style storage, which supports neither replication nor dynamic provisioning. Without these, it is useless for most use cases.

Luckily, a lot of bare-metal solutions exist:
– Longhorn (also has native Rancher integration)
– Portworx (commercial solution)
– OpenEBS
– Rook
– and more

I decided to use OpenEBS, which exposes its volumes over iSCSI. Unlike Rook, for example, it also does not require a separate partition. The installation is very easy.

First you need to install the iSCSI initiator on all nodes. For Ubuntu, you would do it like this:

$ sudo modprobe iscsi_tcp
$ sudo apt-get update
$ sudo apt-get install open-iscsi
$ sudo systemctl enable iscsid && sudo systemctl start iscsid

After that, you need to add the following to your RKE cluster.yml (again, for Ubuntu):

services:
  kubelet:
    extra_binds:
      - /var/openebs/local:/var/openebs/local

Then run rke.exe up again to update your cluster.

Finally, install OpenEBS itself and set it as the default storage class:

$ helm install -n openebs openebs stable/openebs --version 1.10.0
$ kubectl patch storageclass openebs-jiva-default -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
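To quickly check that dynamic provisioning works, you can create a small throwaway PVC against the new default storage class (names and size are arbitrary):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openebs-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi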

cert-manager

cert-manager is the de facto standard TLS certificate management solution for Kubernetes. You will need it if you want to use HTTPS for your services. To install it, run these commands:

$ kubectl create namespace cert-manager
$ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15-alpha.3/cert-manager.crds.yaml
$ helm repo add jetstack https://charts.jetstack.io
$ helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15-alpha.3

We are using Cloudflare to manage DNS, which cert-manager supports for DNS challenges. This way we can use Let’s Encrypt with the ACME DNS-01 challenge, which also allows us to use wildcard certificates. Since all of our services are under the imperialplugins.com domain, we only need to manage one certificate.
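A rough sketch of the resources involved, using the API version matching the cert-manager release installed above (names, email and the example.com domain are placeholders; the Cloudflare API token secret has to be created separately):

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-account-key
    solvers:
      - dns01:
          cloudflare:
            email: admin@example.com
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: wildcard-example-com
  namespace: istio-system
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
  dnsNames:
    - "example.com"
    - "*.example.com"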

Rancher

Rancher provides a nice dashboard to manage your Kubernetes cluster.

Installing it is really easy, thanks to helm:

$ kubectl create namespace cattle-system
$ helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
$ helm install rancher rancher-latest/rancher --namespace cattle-system --set hostname=rancher.example.com --set ingress.tls.source=letsEncrypt --set letsEncrypt.email=me@example.com

That’s it! Wait a few minutes for Rancher to start, then just visit https://rancher.example.com (or https://rancher.example.com:8443 if you changed the default nginx ingress port). For more information, read the Rancher docs.

Istio

Istio is used to create service meshes. A service mesh allows you to control the flow of traffic and to secure and monitor your services. Many solutions exist for this, like Linkerd and Consul. We decided to go with Istio since it also supports complex deployment strategies like A/B testing and blue/green deployments.

Thanks to Rancher, installing Istio is really easy. Open the dashboard, go to your cluster, click on “Tools” in the nav bar at the top right, and select Istio. Configure it to your liking and click Enable at the bottom. That’s it!

I recommend setting “gateways.istio-ingressgateway.sds.enabled” to “true” in the Custom Answers section, so that cert-manager certificates can be mounted in secure gateways. You should also enable the ingress gateway and set its service type to LoadBalancer.
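With SDS enabled, a secure gateway can reference a cert-manager-managed TLS secret directly. A sketch, with placeholder names matching the certificate example above:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: wildcard-example-com-tls # secret created by cert-manager
      hosts:
        - "*.example.com"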

Service Topology

One problem we can face with nodes spread across different regions is high latency. For example, this can happen if a service on node A connects to the database instance on node B instead of the one on the same node.

Kubernetes 1.17 added support for service topologyKeys. You can read more about it in roc’s blog post (he co-authored the feature) and in the Kubernetes docs. This recently added feature made it very easy to solve the problem.

All of our services except for the frontend and api gateways use the following topology keys:

topologyKeys:
  - "kubernetes.io/hostname"
  - "*"

This way, service A will first try to connect to an instance of service B on the same node, and only if it does not find one will it fall back to any other instance. You can also configure more graded scenarios, like prioritizing the same node, then the same datacenter, then the same region, and only then any other instance, as shown below.
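Such a graded preference could look like this, using the standard zone and region labels as stand-ins for datacenter and region:

topologyKeys:
  - "kubernetes.io/hostname"
  - "topology.kubernetes.io/zone"
  - "topology.kubernetes.io/region"
  - "*"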

If we combine this with pod anti-affinity, which preferably schedules service instances onto different nodes, we get a cluster architecture like this:

ImperialPlugins Cluster Architecture
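For reference, the preferred pod anti-affinity mentioned above looks roughly like this in a deployment’s pod spec (the app label is a placeholder):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: service1
          topologyKey: kubernetes.io/hostname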

Database

Before getting into CI/CD and deploying our application, the last thing left to add was the database. We are using PostgreSQL, so I had to deploy a PostgreSQL cluster, and I decided to do it on Kubernetes.

I am no database expert, nor do we have one in our small three-developer team. Deploying a PostgreSQL cluster on Kubernetes is also maybe not the best idea. But there is one place where Kubernetes really shines in such scenarios: operators.

Kubernetes operators allow you to manage and orchestrate service deployments automatically, replacing the humans who would otherwise do this. This also applies to databases.

Initially I tried to create a Postgres cluster with the Zalando postgres-operator. Sadly, after days of trying, I just could not get it up. The reason was that, at the time I was setting it up, the operator had a fatal bug which prevented its deployment.

Deeply frustrated by this bug, I searched for an alternative and came across the CrunchyData postgres-operator, an equally good or even better operator. CrunchyData is an enterprise PostgreSQL consultancy, so they surely know what they are doing.

Here are some features their operator can handle for you:

  • High availability
  • Disaster recovery
  • Failover
  • Monitoring
  • Incremental and full backups
  • Easy scaling
  • Node affinity and pod anti-affinity

Installing the CrunchyData Postgres Operator

$ kubectl create namespace pgo
$ curl https://raw.githubusercontent.com/CrunchyData/postgres-operator/v4.3.0/installers/kubectl/postgres-operator.yml > postgres-operator.yml

Database Storage

Fast databases need fast storage. OpenEBS is not fast enough for this, and we also do not need replication for the database volumes. What could be faster than using the host storage directly? So let’s use that instead.

As mentioned earlier, the built-in hostPath storage does not support dynamic provisioning, which is a problem. So instead of using it, I found a better alternative, once again made by Rancher.

The local-path provisioner solves the dynamic provisioning problem. It does not support volume capacity limits, but that is not a requirement in our case. I decided to use it as the storage solution for the Postgres pods.

So let’s install the local-path provisioner first:

$ kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

After that, we have to edit the postgres-operator.yml accordingly. Replace the STORAGE1 entries with these:

- name: STORAGE1_NAME
  value: "hostpathstorage"
- name: STORAGE1_ACCESS_MODE
  value: "ReadWriteOnce"
- name: STORAGE1_SIZE
  value: "10G" # change to your needs
- name: STORAGE1_TYPE
  value: "dynamic"
- name: STORAGE1_CLASS
  value: "local-path"

Also, do not forget to change the default admin password. After that, deploy the operator:

$ kubectl apply -f postgres-operator.yml

Installing pgo

pgo is the client used to interact with the postgres-operator. You can install it on your local machine or on a server. I installed it on the master server, since I use it for kubectl too.

$ curl https://raw.githubusercontent.com/CrunchyData/postgres-operator/v4.3.0/installers/kubectl/client-setup.sh > client-setup.sh
$ chmod +x client-setup.sh
$ ./client-setup.sh

After that you will have to port forward the operator:

$ kubectl -n pgo port-forward svc/postgres-operator 8449:8443 &

Creating the Postgres cluster

When setting up the Postgres cluster, I enabled the pgbouncer feature and set the replication mode to synchronous. We had already faced connection exhaustion issues in the past, so the easy pgbouncer installation for connection pooling came in really handy. I decided to use synchronous replication because we are running an eCommerce marketplace, where any transaction loss must be avoided: we cannot risk losing orders, products, wallet transactions or other important information. I also used the default pod anti-affinity (preferred), which tries to avoid having multiple Postgres instances on the same node.

$ pgo create cluster -n yournamespace your-cluster --service-type=LoadBalancer --sync-replication --pgbouncer --replica-count 2

You can get the postgres user password like this:

$ echo "$(kubectl get secret -n yournamespace your-cluster-postgres-secret -o jsonpath="{.data.password}" | base64 --decode)"

Continuous Integration (CI) and Continuous Delivery (CD)

While searching for a cloud-native CI/CD solution, I tested various products. Here I will share my experience with them. One of our hard requirements was that a free edition must be available for on-premise installations. Hence, I did not look into solutions like GitHub Actions, TravisCI, CircleCI, etc. I also did not look into GitLab CI/CD since we want to keep our code on GitHub.

Registry

Before getting started, we must deploy a private Docker registry. Setting up a CI/CD system does not make much sense if you cannot store your artifacts (in our case, the Docker images) somewhere.

Create your docker-registry-values.yml and then deploy your own registry with Helm:

$ kubectl create namespace docker-registry
$ helm install docker-registry stable/docker-registry -n docker-registry -f docker-registry-values.yml
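A minimal docker-registry-values.yml could look something like this (a sketch; the htpasswd entry is a placeholder, and the storage class assumes the OpenEBS default set up earlier):

persistence:
  enabled: true
  size: 20Gi
  storageClass: openebs-jiva-default
service:
  type: ClusterIP
secrets:
  htpasswd: "builder:$2y$05$..." # generate with: htpasswd -nbB builder <password>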

Mono repository

Since we are applying the mono repository pattern (see Part I for more), one fundamental requirement is that the CI/CD system must also support it. It must allow different projects to share the same repository, and it must not trigger a project’s build if a commit is unrelated to it.

Our project structure looks like this:

/
|--- .git/
|--- jenkins
     |--- (some shared jenkins groovy files)
|--- service1/
     |--- src/
     |--- k8s/
          |--- service1-service.yml
          |--- service1-deployment.yml
     |--- Dockerfile
     |--- Jenkinsfile
     |--- ...
|--- service2/ 
     |--- src/
     |--- k8s/
          |--- service2-service.yml
          |--- service2-deployment.yml
          |--- service2-somethingelse.yml
     |--- Dockerfile
     |--- Jenkinsfile       
     |--- ... 
|--- ...  

TeamCity

Before migrating to Kubernetes we were using TeamCity. This was more CI than CD: TeamCity automatically built the Docker images but did not deploy them. Since we were not using a container orchestrator, only docker-compose, we had to deploy the updated images manually.

I am sure TeamCity somehow supports deploying to Kubernetes, and it supports the mono repository pattern out of the box, but setting up many projects with different branch configurations became complicated. TeamCity also limits the agent count to 3 in the free version. It was time to find an alternative, ideally an open source, cloud-native one with no restrictions.

Jenkins X

I really wanted to try out Jenkins X, but I failed to set it up with GitHub after it asked me to authorize my account. It is simply not possible to get past that step on a headless server. I would have had to create a desktop Linux virtual machine, install Jenkins X there and copy the config over, which I really did not want to do. It also seemed very complicated, but maybe that is just because I could not look deeper into it. It is also very opinionated towards GitHub repositories.

Drone CI

Unlike Jenkins X, installing Drone CI was really straightforward, and I even managed to set up a project with it. Sadly, that is all. Drone CI has no support for mono repositories, which is crucial for us. It is fundamentally built on the idea that one repository corresponds to exactly one project. Like Jenkins X, it is also very opinionated towards GitHub repositories.

We needed support for nested build configs and a way to prevent building unrelated projects on a commit. There was no built-in way of doing this, and the third-party extension for it did not work out well. There was also no way to have separate build numbers for different projects within the same repository.

GoCD

GoCD was really promising. I really wanted it to work, but it sadly did not even start after deployment, due to a fatal bug that is still not fixed at the time of writing. I don’t know whether GoCD would have worked with the mono repository pattern.

Jenkins

Jenkins was the last CI/CD solution I tried. I really wanted to avoid it, because it is not cloud-native and because I remembered it as a clunky old application (as you can guess, the last time I used Jenkins was about 5-6 years ago). Much to my surprise, Jenkins has changed a lot. With the Blue Ocean extension, Jenkins finally has a modern React-based UI. It also has Groovy-based declarative and scripted pipelines now. These were just perfect for our complex CI/CD requirements!

If Jenkins is known for anything, it is the sheer amount of its plugins. It has an integration for almost everything; some of the plugins we are using come up in the sections below.

Docker-in-Docker Agents

The proposed Jenkins build pipeline was simple: build the Docker images, push them to the registry, deploy to Kubernetes. Sounds easy, right? Well…

One problem is that Jenkins and its agents themselves run in Docker containers. This is problematic, because running Docker inside Docker containers requires a special setup.

The first requirement is that the agent must have the Docker client preinstalled. To my disappointment, there was no up-to-date agent image that does this, so I had to create my own, based on the official inbound agent.

The other requirements are that the container must run in privileged mode and must mount /var/run/docker.sock from the host. To configure both, go to Manage Jenkins -> Manage Nodes and Clouds -> Configure Clouds -> Kubernetes -> Pod Templates -> default -> Pod Template Details. Keep in mind that this means anyone who can execute code in the build pipeline can also take control of the host machine. Since we only build our own trusted code, this is not a problem for us.
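Expressed as a pod spec, the relevant parts of that agent pod template roughly look like this (the agent image name is just illustrative):

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: jnlp
      image: example/jenkins-inbound-agent-docker:latest # custom agent image with the Docker client
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock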

Jenkinsfile

At first I tried to set up the build pipelines using declarative pipelines. To my disappointment, I had to create the same Jenkinsfile over and over again. Being a big fan of the “Don’t Repeat Yourself” principle, I wanted a shared pipeline instead. It turned out that declarative pipelines cannot share pipelines or stages, so I tried my luck with scripted pipelines, which worked out great.

I made a shared pipeline called “dockerBuild” which is responsible for building and deploying the Dockerfiles of updated services. This tremendously reduced the effort required to set up new projects in Jenkins.

All it takes to set up a new project for Jenkins is a Jenkinsfile like this:
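The exact file is specific to our shared library, but conceptually it is just a call into the shared pipeline. A rough sketch (the library and parameter names below are made up for illustration; depending on the setup, the shared code could also be loaded from the repository’s jenkins/ folder):

@Library('imperial-pipelines') _ // hypothetical shared library containing the dockerBuild pipeline

dockerBuild(
    service: 'service1',                  // folder of the service inside the mono repository
    deployBranches: ['master', 'staging'] // branches that get pushed to the registry and deployed
)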

There are more configuration options for the dockerBuild pipeline, like branch-specific settings. By default, all branches get built, but only the staging and master branches get pushed to the Docker registry and deployed to Kubernetes. This customizable workflow allows us to add new deployment environments with just 2-3 lines of code, simply by specifying the branch configuration in the Jenkinsfile.

GitOps

For deployments, we are using the Jenkins Kubernetes Continuous Delivery plugin made by Microsoft. Using this plugin we apply the GitOps pattern: the dockerBuild pipeline applies every yaml inside each updated service’s k8s folder (see the folder structure above) and deploys the updated service this way.

Sadly, the plugin does not support Custom Resource Definitions yet, so I was not able to use it for deploying Istio virtual services. This was not a big problem, since virtual services rarely get updated. However, it also does not support newer Kubernetes features such as the service topologyKeys mentioned earlier, so I had to patch those manually.

Versioning

Since we’re applying the GitOps pattern, we can’t really talk about traditional versions. However, we still need to tag our images in a distinguishable way for deployment and rollback reasons, so the dockerBuild pipeline simply tags them with their build number. For example, an image could be tagged and deployed as imperialplugins/frontend:v_323. Maybe I will integrate this with git tags in the future.
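Inside the scripted pipeline, such a tag can be derived from the standard Jenkins build number, roughly like this:

def imageTag = "imperialplugins/frontend:v_${env.BUILD_NUMBER}" // e.g. imperialplugins/frontend:v_323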

Pipeline

The full build and deployment pipeline looks like this:

Jenkins Blue Ocean overview of the frontend pipeline
  • Stage 1: Build Docker image from Dockerfile in the service’s root directory
    • Send a notification to Discord that a new build has started, including the changelog
    • Build environment, build commands, tests etc. are handled by the Dockerfile, which is agnostic of the CI/CD system used
  • Stage 2: Push Docker image to registry
    • Check if branch should push to registry
    • Push to our own registry using credentials from Jenkins
  • Stage 3: Deploy to Kubernetes
    • Check if branch should deploy to Kubernetes
    • Using configuration substitution, replace the container image in the deployment yamls with the freshly pushed image
    • Apply the yaml configurations
    • Send a notification to Discord that the build has been successfully deployed

As you can see, we have build status notifications on Discord. We get notified about started builds, deployed builds and failed builds. This is possible thanks to the Discord Notifier plugin, which provides the related pipeline functions.

This is how our build feed looks on Discord:

One thing that was really interesting to see was when I had to trigger a full rebuild of all projects. With TeamCity, due to the limit of 3 build agents, this would have taken somewhere between 45 and 60 minutes. With Jenkins and the Kubernetes plugin, which instantly spun up as many agents as there were scheduled builds, everything was built and deployed in just 12 minutes!

Conclusion

Kubernetes is complicated and has a steep learning curve. It took me two months to get to this point, where our staging environment has been migrated to Kubernetes. While migrating, I came across two fatal bugs (in the Zalando postgres-operator and in GoCD) and some other minor ones (such as the missing service topologyKeys support in the Jenkins Kubernetes Continuous Delivery plugin). I have noticed that a lot of software is still not Kubernetes-ready or does not support bare-metal clusters.

Regardless of all the complications, I do not regret any of the time I have invested in this. It was fun, and I learnt and explored many new technologies and concepts. Migrating to Kubernetes also made deploying new services much easier. As you can see, deploying something like Jenkins now only takes one line, thanks to helm. Deploying a multi-node Postgres cluster: also only one line. The same ease applies to building and deploying our own services: all we do is commit code, and Jenkins automatically builds it and deploys it to the correct environment.

Next up

In this article I only focused on the infrastructure. Code-wise, not much has changed. If you remember from the previous blog post, some things are still left:

  • Removing shared access to tables between services (a private HTTP API with an API gateway should be used instead)
  • Database migrations for microservices
  • ELK / seq integration for log management
  • Monitoring with Prometheus
  • Unit Tests (yes, ImperialPlugins still does not have them)

 
