From a basic to an advanced perspective
The leading platform for container orchestration
This article assumes that you have some experience with bare-metal servers, virtualization terminology and microservices.
We will start from zero knowledge of the container world, describing the beginning of the era to understand what drove the need for containers and the first solutions to manage container-centric infrastructure, before finally talking about Kubernetes and advanced usage like Self-Healing, Autoscaling and Limits.
A new tech era
First, let’s talk about terminology. One of the first quirks you will face in containers and container orchestration is the new vocabulary. It can be tedious to learn even before really beginning to dig into the tech part.
To understand the whole spectrum, we will begin with a quick history recap.
Historically, large architectures with High Availability were dispatched across bare-metal machines, running directly “on the metal”, or virtually through Virtual Machines running on hypervisors, themselves hosted on on-premise infrastructure or directly in the public Cloud. But in an era where microservices and Continuous Deployment are the way to go, container technology appeared.
At first, containers were used mainly by developers directly on their computers, to test their code in a full environment without the inconvenience of running multiple local Virtual Machines. Then developers started to push their containers to the servers used in production, as a way to have exactly the same thing running on their computer and on the targeted production environment.
Containers are a restricted area where developers put all the third-party technologies required by their app. Thereby, they remove the need for developers to ask for SRE assistance on any possible gap between development and production environments. A container will run the same locally as it does on a remote server. To build and run containers, one of the leading technologies is Docker.
To do so, the developer establishes the list of the third-party components they need, puts them in a Dockerfile using a particular syntax, and tells Docker to build it as an image, which will then be launched by Docker to start… a container.
The developer can then publish this image on a private or public registry (think of a registry as an AppStore for Docker-built images), and just tell their favorite SRE that they need this specific image launched. That’s it. No more need for the developer to specify what version of PHP, Python, Ruby, Apache, Nginx or whatever third-party technology the application needs to run. Therefore, no more need for the SRE to prepare and configure them.
It does sound great at the beginning, as it reduces the time spent in back and forth between the developers and the SRE team for debugging and expressing needs, but it comes with limitations:
- Deploying small architectures by pushing a few containers is indeed fast and convenient, but when it comes to bigger platforms, the SRE skillset is still greatly needed to ensure proper design and High Availability;
- Since all the dependencies required to run the application are now self-contained in the container, the developer becomes responsible for the security of the platform, and thus needs to upgrade all these pieces together regularly to maintain the security of the app, as they cannot benefit from traditional OS upgrades and OS software management;
- Finally, as new containers continue to be pushed, the platform can become highly complex and sometimes end up a real mess, even more so if the SRE was completely out of the loop.
Some technologies were born to help this kind of architecture fit the Docker approach, like Docker-Compose (which lets you describe a full Docker architecture in YAML), Docker Swarm by the company behind Docker itself, or… Google’s internal, closed-source attempt, called Borg. Borg was the internal project that led Google to Kubernetes, which they open-sourced, releasing the first 1.0 version in 2015.
To better understand the need for orchestration, let’s consider that you recently started to use Docker.
You’re pushing new containers regularly, and as a result your architecture is becoming more and more complex. You now have more than 20 containers deployed through Docker-Compose, and things were fine until now. But your product needs more and more maintenance, better scaling (both vertical and horizontal), high(er) availability and better distribution.
At this point, you need a new solution to help you with those tasks: an orchestration tool. Kubernetes is the most well-known and widely used solution on the market, and even though the learning curve to use it to its full capacity is quite steep, you don’t need to know the whole Kubernetes documentation to begin with!
One of the hardest parts is actually to properly build and manage the Kubernetes cluster itself, but luckily, you can rely on Public Cloud providers or even MSP like Iguana Solutions to take care of that for you.
It means that today you can focus on learning how to use Kubernetes to deploy your application on it and improve your workflow and services.
Kubernetes actually allows you to modernize your containerized services with:
- Infrastructure-as-code: contrary to Docker-Compose, you are not simply describing the deployment you want, but also real parts of the infrastructure like networking, load-balancing, exported storage, high availability, auto-healing, upgrade strategies, scaling, etc.;
- High-availability: you can now deploy a dedicated LoadBalancer for your app without you (or an SRE) having to configure a static HAProxy or something similar. Kubernetes also allows you to choose on which node(s) you are going to deploy some or all of your containers, and even auto-migrates them if something bad happens to a part of the underlying infrastructure;
- Scaling: K8S (short for Kubernetes) allows you to scale both vertically (multiple instances of the same microservice running in parallel on the same Kubernetes worker) and horizontally (multiple instances of the same microservice running in parallel across multiple Kubernetes workers), automatically, following your specifications, your application architecture, or your underlying infrastructure;
- Self-healing: you can specify what to do in case your app encounters a specific problem, reducing the need for “incident procedures” to be handled by a human, and instead having a platform that can literally heal itself in some cases;
- Automated-rollout: you can progressively deploy your new code without catastrophic updates. Kubernetes can control, following your testing rules, the speed at which you deploy your app across the whole architecture, and even automatically roll back if something weird happens, as Kubernetes can manage multiple versions of a running app;
- Abstracted-storage: for your stateful apps, you can let Kubernetes manage NFS, EBS, Ceph, Flocker, iSCSI, and simply have a persistent volume;
- and more…
If you already work with containers on your own machine during the development stage, or if you already have a production platform running Docker, maybe you already have the first thing that we need to run a container: a Docker image, built from a Dockerfile.
In case you don’t, a Dockerfile is just a plain-text file containing Docker and shell commands. For example, here is the Dockerfile of a container image dedicated to running an Apache HTTP service:
FROM debian:latest

# A basic apache server. To use either add or bind mount content under /var/www
MAINTAINER Jonathan Marsaud 1.0

RUN apt update && apt install -y apache2 && apt clean

ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2

CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]

EXPOSE 443
You will maybe also want some Apache modules, like a PHP interpreter, but you can easily guess how you could add them: the RUN directive in a Dockerfile lets you run shell commands, tied to the OS from which you start. Here it’s the latest Debian Stable, Debian 10 (Buster); but it can also be another GNU/Linux distribution, *BSD, or even Windows (with partial support for these last two).
The FROM directive lets you derive your image from another, minimal one; here we’re using the official image from the Debian project.
ENV allows you to define some environment variables to easily pass some parameters to your process without having to touch the configuration file (like apache2.conf for Apache).
CMD is just the path to the binary, followed by the argument list, specifying how to launch it.
EXPOSE is normally used to make your container available from outside on a specific port. Here, it’s an HTTP daemon, so the HTTPd usually listens on TCP-80 (HTTP) and TCP-443 (HTTPS). But running a process inside a container, even one listening on a port, does not by default make it reachable from outside the container: you also need to expose its port. Imagine that Apache listens by default on 80 and 443, but you only expose 443 as in this example: your port 80 will be unreachable from outside the container, even if Apache is listening for traffic on it. One reason this could make sense, bad/default configuration aside, is if you need to reach your Apache on 80 (clear traffic) from inside, but only want to expose the HTTPS listener for your API.
You could think of the act of exposing a port as a sort of firewall, and you would not be far from the truth. (Docker basically manipulates iptables on Linux to create NAT/PAT rules that let your traffic reach the container properly.)
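To make this concrete, here is roughly how you would build and run this image with Docker (the image tag is an illustrative choice):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t my-apache .

# Run it in the background, publishing the exposed HTTPS port to the host
docker run -d -p 443:443 my-apache
```

The -p flag publishes the port on the host; port 80 stays reachable only from inside the container, as discussed above.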
In the Kubernetes “world”, the atomic unit inside a cluster is not a container by itself, but a Pod. A Pod is a logical group of containers acting together, you can for example imagine that our previous Apache container is tied to another one like PHP-FPM. In a Pod, the equivalent of the Dockerfile is a YAML manifest (in fact, all Kubernetes resources, that we will see in the next part, are described in YAML syntax).
In a Kubernetes cluster infrastructure, you have two components: the control-plane, sometimes referred to as “controllers”, which hosts the core processes of Kubernetes, and the workers, sometimes referred to simply as “nodes”, which run your Pods.
For example, here is a Pod manifest grouping our Apache container with MySQL and PHP-FPM containers around a shared volume (the image names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: apache-container
      image: httpd:2.4   # illustrative image
      volumeMounts:
        - name: shared-data
          mountPath: /data
    - name: mysql-container
      image: mysql:8     # illustrative image
      volumeMounts:
        - name: shared-data
          mountPath: /data
    - name: phpfpm-container
      image: php:fpm     # illustrative image
      volumeMounts:
        - name: shared-data
          mountPath: /data
In its basic form, a Kubernetes manifest is readable without any prior experience. It’s standard YAML syntax (indentation-based) with some keywords. In this example, the containers have something in common: the storage. We’ll see in detail in the next part how storage is managed through Kubernetes.
We have not yet seen the real benefits of Kubernetes. So far we just have a group of containers, a Pod, but nothing “intelligent” to do with it.
But now, we are going to control them.
In fact, the previously seen manifest is just an example of what a Pod is, but is rarely used directly in this way.
Ideally, you would use a Controller like Deployment/ReplicaSet resources to deploy and manage them.
Here is a partial example for our previous Pod (note that we removed the volumes for now; we will see why later):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: apache
          image: httpd:2.4   # illustrative image
          ports:
            - containerPort: 80
        - name: homemade-phpfpm
          image: php:fpm     # illustrative image
          ports:
            - containerPort: 9000
        - name: mysql
          image: mysql:8     # illustrative image
          ports:
            - containerPort: 3306
And here is a quick explanation of the different fields:
- .kind: indicates the type of Kubernetes resource; here we are using a Deployment, the base Controller for your Pods.
- .metadata.name: the name of the Deployment, here web-deployment.
- .spec.replicas: the number of Pod replicas that will be deployed and handled by this Deployment.
- .spec.selector: defines how the Deployment finds which Pods to manage (deployed by itself, or Pods you start managing with this Deployment afterward).
- .spec.template.metadata.labels: the app label associated with the created Pods.
- .spec.template.spec.containers: as in our previous Pod manifest, here lie our 3 containers.
By default, all your Pods are ephemeral. If you need persistent storage across Pod respawns, you should consider using PersistentVolume and PersistentVolumeClaim, two other Kubernetes resources.
- A PersistentVolume (PV) is a storage resource provisioned by your Kubernetes cluster administrator, ready to be used, or dynamically created using StorageClasses (drivers that can talk to a storage API to automatically create new ones).
- A PersistentVolumeClaim (PVC) is a user’s request to use such a PV in your Deployment/Pods configuration. When a user claims a PV, if no specific PV is indicated, Kubernetes will automatically choose a PV to satisfy the request. If dynamic provisioning is enabled, Kubernetes will try to create it, according to the access it has been given to your cloud provider.
In this case, you will have an additional YAML manifest of kind “PersistentVolumeClaim”, describing the specific size of the volume you want and the type of access you want (read, write, for one node only, or maybe multiple ones).
With this information, the Kubernetes scheduler will choose the matching PV accordingly.
Here is an example of a PV using NFS remote storage. NFS is one of the rare storage types that supports the ReadWriteMany option; you can find the list of access modes supported by each storage type in the Kubernetes documentation.
This is for example purposes only, as usually your Kubernetes provider or administrator will have one pre-provisioned for you.
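Such an NFS-backed PV manifest might look like this (the server address, export path and names are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany        # supported by NFS
  storageClassName: nfs
  nfs:
    server: 10.0.0.10      # illustrative NFS server address
    path: /exports/data    # illustrative exported path
```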
The PersistentVolumeClaim manifest declares that you claim a 10GiB volume. The storageClassName field can be omitted: it specifies what kind of PV we want, corresponding to the storageClassName declared in the PV manifest, but you can also let Kubernetes choose.
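A matching claim might look like this (the names are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs    # optional: omit to let Kubernetes choose
```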
Now that we have claimed our storage, we can use it as a Volume in our Deployment (remember that we temporarily removed it in the previous Deployment example).
You can add volumes and volumeMounts like this to each of your containers (the claim name is illustrative):

containers:
  - name: apache
    volumeMounts:
      - mountPath: "/var/www/html"
        name: data
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc   # the name of your PersistentVolumeClaim
We now have all our container images and YAML manifests ready for a simple app in Kubernetes. But nothing is accessible from outside the Kubernetes cluster yet. We will need to expose this app through a Service.
We will see in the next part that each of the Pods deployed through our Deployment controller has a PodIP. This IP is virtual, only routed locally, and thus only accessible in two ways:
- The main usage, to have traffic inside the whole cluster, between your Pods;
- Through a special user command, for internal-use only, that we will see in the Run part of this article (kubectl proxy).
To expose a service to the outside, there are several ways. The main one is through a Service, which offers several options.
One of the most basic is through a NodePort. A NodePort allows you to access your Service through any of the Kubernetes workers of your cluster on the given port. So, for example, in a Kubernetes cluster with 3 workers, if you have a Pod “my-app” running on workers #1 and #2, and you exposed it through the NodePort 31888 (NodePorts must be in the 30000-32767 range by default), it will be available on <worker-1-ip>:31888 and <worker-2-ip>:31888, but also on <worker-3-ip>:31888, even if worker #3 does not run your Pod!
But it’s not real High Availability: you have “x” endpoints if you have “x” workers, and if you only use one of the three endpoints in your code and the node handling it goes down, you will not be able to contact your Pod, even if it is still running on another worker.
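As a sketch, a NodePort Service for such an app could be declared like so (the names are illustrative; the nodePort must be in the cluster’s NodePort range, 30000-32767 by default):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-nodeport
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80          # the Service port inside the cluster
      targetPort: 80    # the Pods' port
      nodePort: 31888   # exposed on every worker
```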
So for this, you can deploy a LoadBalancer type Service.
This kind of Service automatically asks your Cloud Provider to create a dynamic LoadBalancer (like Elastic Load-Balancer on AWS) according to your app information in the Service resource.
Be aware that this type of LoadBalancer Service declaration is tied to the ability of your Kubernetes cluster to talk to your Cloud Provider, through a valid IAM Policy or a compatible integration.
Here is an example YAML manifest for our Service tied to our Deployment/Pods:
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80
    - port: 443
      targetPort: 443
- .spec.selector: will match your app-name described in Deployment/Pods.
- .spec.ports: port is the port exposed by the LoadBalancer, targetPort the internal port of your Pods.
By default, your Kubernetes controllers will try to auto-restart all containers in your Pods when they exit unexpectedly. After five tries, it will start displaying a CrashLoopBackOff error, which means that the container cannot be restarted properly and instantly crashes again.
Self-healing in Kubernetes can be configured to match your app’s behaviour, and thus be more clever than a simple “Pod restart”. Self-healing has three parts:
- Startup probes: control what Kubernetes must wait for before marking your containers as normally started;
- Liveness probes: control when Kubernetes should restart one of your containers (for example, an application marked as Running properly by the previous startup probe, but then unable to make any progress, like a deadlock);
- Readiness probes: control when Kubernetes adds or removes your entire Pod from a Service (if one of the containers of your Pod is marked invalid by the readiness probe, the whole Pod is removed from the Service).
You can for example add to our previous Deployment the following, in the .spec.containers hierarchy:
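For instance, a sketch of such probes for the Apache container (the health-check paths are illustrative assumptions about the app):

```yaml
startupProbe:
  httpGet:
    path: /healthz       # illustrative health endpoint
    port: 80
  failureThreshold: 30   # 30 attempts...
  periodSeconds: 10      # ...every 10s = up to 300s to start
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  periodSeconds: 10      # one test every 10 seconds
  failureThreshold: 1    # liveness invalid at the first failure
readinessProbe:
  httpGet:
    path: /ready         # illustrative readiness endpoint
    port: 80
  periodSeconds: 10
```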
With this startupProbe, the application has a maximum of 5 minutes (30 * 10 = 300s) to finish its startup. Once the startupProbe succeeds, the livenessProbe takes over, with different parameters (faster checking, with one test every 10 seconds, and liveness marked invalid at the first failure).
For the readinessProbe, sometimes you may want your app to stop handling new requests but finish those already being processed. In that case the Pod will not be killed, but if the readiness check fails, the Pod is taken out of the Service, which then routes traffic to the other Pod replicas.
With Kubernetes, you can adjust the number of Pods running in your cluster according to CPU usage, or some other metrics (beta features for the latter).
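A minimal HorizontalPodAutoscaler for our Deployment might look like this (the thresholds and replica counts are illustrative):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80   # scale up above 80% average CPU usage
```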
Here, we select our web app via .spec.scaleTargetRef.kind and .spec.scaleTargetRef.name. Then, we give the CPU usage threshold that triggers auto-scaling up or down.
This kind of resource is called a HorizontalPodAutoscaler (horizontal means that it can scale across all your nodes, and not only vertically, i.e. parallel instances on the same node). It drives the replicas parameter of your previously created Deployment, which you could otherwise modify manually.
- minReplicas: the number of Pods that must be “always running” (even if the threshold is not reached).
- maxReplicas: the maximum number of Pods, at which autoscaling stops.
Other metrics can be used to autoscale, but some are Beta features, and need the kube-state-metrics app to be deployed to your Kubernetes cluster.
You can limit the amount of resources that your containers tied to your Pods can use, for example limit CPU and RAM usage.
You can also configure the behaviour of the Kubernetes controller when one of your containers/Pods encounters this Limit:
- An enforcement action, like denying any more resources once it reaches the limit;
- A reactive action, like an OOM-killer: the process will be terminated if it exceeds its limits.
One more concept, generally used in conjunction with Limits, is Requests. Requests let the Kubernetes controller know on which nodes it can place your Pods. A container with just a Request and no Limit will be allowed to use more if the node has the resource available. You can think of Requests as the “minimum resources needed to run properly”.
You can add this part to your Deployment/Pods to set Requests and Limits on each entry of your .spec.containers list:
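A sketch of such a resources section for one container (the values are illustrative):

```yaml
resources:
  requests:
    cpu: "250m"       # minimum: a quarter of a vCPU/hyperthread
    memory: "64Mi"
  limits:
    cpu: "500m"       # maximum: half a vCPU/hyperthread
    memory: "128Mi"   # the container gets OOM-killed above this
```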
- CPU resources are expressed in vCPUs or hyperthreads. 0.25 means one quarter of a vCPU/hyperthread, and 0.5 a half. These units can also be expressed in milliCPU, like 250m: 250m is equivalent to 0.25.
- RAM resources are expressed in standard units (power-of-ten, like K/M/G/T, or power-of-two, Ki/Mi/Gi/Ti).
We now have all our YAML manifests but in fact… We didn’t apply any of these.
All operations, like getting the status of resources, pushing manifests, or getting Pod logs and information automatically filled in by Kubernetes, are done through a command-line interface called kubectl.
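For example, a typical session could look like this (the file and resource names are illustrative, and assume a cluster already configured in your kubeconfig):

```shell
# Push our manifests to the cluster
kubectl apply -f deployment.yaml -f pvc.yaml -f service.yaml

# Get the status of our resources
kubectl get deployments
kubectl get pods -o wide

# Read the logs of one container of a Pod (Pod name is illustrative)
kubectl logs web-deployment-xxxx -c apache
```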
The official Kubernetes website has a great interactive tutorial that comes with a sandbox cluster and lets you interact with a cluster for a couple of hours, now that you have the basic infrastructure concepts.
We hope that this article drew your attention to Kubernetes, and gave you a more concrete view of what it is and what it does.
If you’re looking for the right partner to start your Kubernetes journey, contact us.
Written by: Jonathan Marsaud, System Engineer Expert at Iguana Solutions.