Kubernetes is currently one of the most exciting technologies in the DevOps world. Much of the recent hype around it comes down to one simple reason: the mighty containers.


Infrastructure

Once upon a time, and it wasn't so long ago, we ran applications directly on server hardware. We had a website running on a physical server and a database storing the website's state. Users visited the site, everything was cool, everything worked.

The website became successful and our company grew. The website then became more and more complex and evolved into a platform with a bunch of services.

Then along came virtual machines (VMs), and we could run multiple operating systems and applications on a single machine. This empowered a company to run ten or more server instances on a single machine. But what if we could run even more server programs on a single machine by leaving behind the bulk of the VM's operating system? That would give us even bigger cost savings and flexibility. And so containers came along.

Docker

How Docker was born

Docker containers have been the de facto development standard for quite some time now, but interestingly enough, Docker is not a pioneer in the container world.

History

Docker is based on namespaces and cgroups technologies (the first one provides isolation, the second one provides process grouping and resource limitation), so in terms of virtualization, it's not much different from LXC/OpenVZ which historically came first. Same native speed, same isolation methods based on Linux kernel machinery. However, at a higher level, it's a very different story. The highlight of Docker is that it allows you to deploy a fully virtual environment, run an application on it, and easily manage it.

Like a virtual machine, Docker runs its processes in its own pre-configured operating system. However, all the Docker processes run on the physical host server, sharing the processors and available memory with all the other processes running in the host system. Docker's approach sits between running everything directly on the physical server and the full virtualization offered by virtual machines. This approach is called containerization.

Docker internals

The excitement started with Linux containers, moved on to Docker containers, and along the way increased the need for container orchestration.

Back to our story. At some point, the company realized that the current server was not sufficient, no matter how big it was (vertical scaling). The story moved to the next phase: the company bought more servers and divided the load between them (horizontal scaling).

The IT team's tasks get harder and harder. They have to wrestle with problems like updates, patches, monitoring, security, backups, durability, reliability, resiliency... which means PagerDuty at 6 a.m. on a Sunday...

Cloud!

Cool, now all we have to worry about is application logic. But the complexity of a highly available distributed system didn't go anywhere — it has just moved into the complexity of orchestrating and connecting the infrastructure components. Deployments, service discovery, gateway configurations, monitoring, replication, consistency...

That's the short story of how we got to the present day. Now we have a distributed system with a bunch of services inside a bunch of containers in the cloud.

Kubernetes

Don't flatter yourself: even though I've been working with Kubernetes, I only know the tip of the iceberg. So this will not be a deep dive, but rather a smooth flight over the existing subsystems and concepts.

Kubernetes is a sophisticated mechanism designed to make systems scalable, resilient, and easy to deploy. It allows us to automate container orchestration — starting, scaling, managing containerized applications including cluster management, scheduling, service discovery, monitoring, secrets management, and more.

Kubernetes allows us to automate deployments and their rollbacks. It manages resources and can scale applications up or down depending on how much they actually need, avoiding waste. So Kubernetes is essentially process automation: applications are rolled out and tested without administrators. The developer writes a spec, and then the cloud magic happens. In the ideal Kubernetes world, all operational support of the software lies on the shoulders of programmers, while administrators make sure that the cloud infrastructure layer — that is, Kubernetes itself — works steadily. That's why companies move to the cloud: to remove the routine of administration entirely and focus on development.

In my opinion, one of the coolest features of Kubernetes is that it standardizes work with cloud service providers (CSPs). No matter which CSP we are talking about, working with Kubernetes always looks the same. The developer declaratively tells Kubernetes what they need, and Kubernetes works with the system resources, helping the developer abstract away from the platform's implementation details.

Originally, Kubernetes was a Google project that took into account the shortcomings of Borg, the ancestor of Kubernetes. Google uses Kubernetes to manage its gigantic infrastructure of millions of containers. At some point, Google donated Kubernetes to the world, namely to the Cloud Native Computing Foundation. Since then, Docker has added Kubernetes as one of its orchestrators — Kubernetes is now part of Docker Community Edition and Docker Enterprise Edition.

Kubernetes has many names, among them kube and k8s (I like cool acronyms with numbers and will use that one from here on).

Kubernetes itself is one huge pile of abstractions, which are mapped onto virtual or physical infrastructure. To understand how it works, we must first understand its basic components.

Pod

The pod is the smallest unit that can be launched on a cluster node. It is a group of containers that should work together for some reason. The containers in a pod share port space, Linux kernel namespaces, and network stack settings. So when you scale your application within k8s, you should increase the number of pods rather than the number of containers in one particular pod. By default, containers inside pods are restarted automatically, fixing intermittent problems for you. So even at this basic level k8s keeps your containers running, and with a little additional effort you can tune it to get even more reliability.
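
To make this concrete, here is a sketch of a minimal pod manifest; the name, label, and image are illustrative placeholders, not something from this article:

```yaml
apiVersion: v1
kind: Pod            # the smallest deployable unit in k8s
metadata:
  name: my-pod       # hypothetical name
  labels:
    app: my-app
spec:
  containers:
  - name: web        # a single container inside this pod
    image: nginx:1.14-alpine
    ports:
    - containerPort: 80
```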

Quite often (but not always) there is only one container inside a pod. But the pod abstraction provides additional flexibility: when, for example, two containers need to share the same data store, or there is a connection between them via interprocess communication, or they are closely coupled for some other reason, all of this can be implemented by running them in the same pod.
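
As a sketch of that idea, here is a hypothetical two-container pod sharing data through a common volume (all names and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
  - name: shared-logs        # both containers mount this volume
    emptyDir: {}
  containers:
  - name: web
    image: nginx:1.14-alpine
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  - name: log-shipper        # sidecar reading what the first container writes
    image: busybox:1.31
    command: ["sh", "-c", "tail -F /logs/access.log"]
    volumeMounts:
    - name: shared-logs
      mountPath: /logs
```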

Thus, you can easily draw parallels between existing code primitives and distributed primitives in k8s: a class is a container image, an object is a container instance, a pod is a deployment unit, composition can be achieved via the sidecar pattern, and so on.

Pods offer yet another kind of flexibility: they do not require the use of Docker containers. If necessary, you can use other application containerization technologies, such as rkt.

Desired state

The desired state is one of the basic concepts of k8s. You can specify the required state of running pods instead of writing how to achieve that state. For example, if a pod stops working for some reason, k8s will recreate the pod based on the specified desired state.

K8s always checks the state of the containers in the cluster, and this is done by the control loops on the so-called Kubernetes Master, which is part of the control plane. We'll talk about it a bit later.

Objects in Kubernetes

An object in k8s is a record of intent — the desired cluster state. After an object is created, k8s will constantly check on its state. The objects also serve as an additional abstraction layer above the container interface: you can interact with object entities instead of interacting with containers directly.

Almost every k8s object includes two nested object fields that govern the object's configuration: the object spec and the object status.
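
Roughly, that division looks like this in an object's YAML (a trimmed, illustrative fragment of what `kubectl get -o yaml` might show):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:                    # the desired state, written by you
  replicas: 3
status:                  # the observed state, written by k8s
  replicas: 3
  availableReplicas: 2   # k8s keeps working until this matches the spec
```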

Pod that we talked about earlier is just one of the k8s objects.

Pods are mortal — they are born and they die. To make it possible to communicate with pods, and for pods to communicate with each other, the Service abstraction was introduced. A Service acts as an access point to a set of pods providing the same functionality. There are different types of Services: ClusterIP, NodePort, LoadBalancer, ExternalName. By default k8s uses ClusterIP — it exposes the Service on a cluster-internal IP, so you can only access it from within the cluster or via the Kubernetes proxy.
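
For instance, a NodePort Service that exposes a hypothetical set of pods labeled `app: my-app` on a static port of every node might be sketched like this (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort       # also opens a static port on every node
  selector:
    app: my-app        # routes to pods carrying this label
  ports:
  - port: 8080         # cluster-internal port of the Service
    targetPort: 80     # port the container actually listens on
    nodePort: 30080    # externally reachable port (30000-32767 range)
```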

You can also use the Ingress object to expose services. Ingress is not a Service type; rather, it acts as an entry point to your cluster. Ingress describes how traffic should flow from outside the cluster to your services, acting as a "smart router" or ALB. It allows you to consolidate routing rules into a single resource, as it can aggregate multiple services under a single IP address.

For example, a web application can have a home page at https://example.com, a shopping cart at https://example.com/cart, and an API at https://example.com/api. We could implement all of them in one pod, but to let them scale independently, we could decouple them into different pods and connect them through an Ingress.
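
A sketch of such an Ingress, assuming three hypothetical services sit behind the three paths (the API version matches the one used later in this article):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /            # home page
        backend:
          serviceName: web-service
          servicePort: 8080
      - path: /cart        # shopping cart
        backend:
          serviceName: cart-service
          servicePort: 8080
      - path: /api         # API
        backend:
          serviceName: api-service
          servicePort: 8080
```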

K8s also has a huge number of controllers. ReplicaSet checks that a certain number of copies of a pod are running; StatefulSet is used for stateful applications and distributed systems; DaemonSet is used to run a copy of a pod on all nodes in the cluster, or only on specified nodes; and so on. They all implement a control loop — a non-terminating loop that monitors the state of its subsystem and makes or requests changes where necessary. Each controller tries to move the current state of the cluster closer to the desired state.
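
For instance, a DaemonSet that runs one log-collecting pod on every node might be sketched like this (the names and image are hypothetical):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: collector
        image: fluent/fluentd:v1.11   # one copy lands on every node
```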

Users expect applications to be available all the time, while developers deploy new versions of them several times a day. The Deployment object is an example of how k8s turns the tedious process of manually updating applications into a declarative activity that can be repeated and automated. Without Deployments, we would have to create, update, and delete a bunch of pods manually. A Deployment is a layer above replica sets: it manages the replica set and pod objects and automates the transition from one version of an application to another, without interrupting system operation. If an error occurs during that process, it can quickly roll back to the previous, working version of the application. Deployments also make it very easy to scale applications. We will try this a bit later.

Deployment

Off-the-shelf deployment strategies (rolling deployment and fixed deployment) control the replacement of old containers with new ones, while release strategies (blue-green and canary) control how the new version becomes available to serve customers. The latter two are based on a human decision about the migration and, as a result, are not fully automated and may require human interaction.
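
In a Deployment spec, the built-in strategies are selected like this (a fragment; RollingUpdate is the default, Recreate is the "fixed" alternative):

```yaml
spec:
  strategy:
    type: RollingUpdate    # replace pods gradually; the alternative is Recreate
    rollingUpdate:
      maxUnavailable: 1    # at most one pod below the desired count during the update
      maxSurge: 1          # at most one extra pod above the desired count
```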

Releases

Architecture

K8s Architecture

Like many HA systems, k8s is built using a master-worker (historically called master-slave) architecture.

Master Node

Kubernetes Control Plane is a group of processes controlling the state of the cluster. Typically, all these processes are run by a single node in the cluster and this node is also called a Master node. The Master node can also be replicated for redundancy and fault tolerance.

The kubectl command-line tool is an interface to communicate with the master in the cluster through the API.

The services running on the Master Node are called the Kubernetes Control Plane (except etcd), and the Master itself is used only for administrative tasks, while the real containers with your services run on the Worker Nodes.

On each Master Node (there can be more than one for fault tolerance) there are the following basic components that ensure the operation of all system components:

etcd

etcd is a strongly consistent, distributed key-value store used by k8s for configuration management and service discovery. It is similar to ZooKeeper and Consul. It stores both the current state of the system and the desired state. If k8s finds differences between the current and desired states in etcd, it performs the necessary adjustments.

kube-apiserver

kube-apiserver is the main control endpoint for the cluster. Any commands from kubectl are sent as API requests to it on the Master Node. The API server processes all REST requests, validates them, authenticates and authorizes clients, and updates information in etcd. The API server is the only component that works with etcd directly — all other cluster components make requests to the API server, and it updates the information in etcd.

kube-controller-manager

kube-controller-manager is a daemon that embeds the basic control loops shipped with k8s. It includes such things as Replication Controller, Endpoints Controller, and Namespace Controller which we mentioned earlier. Now Kubernetes doesn't have a single control loop, but many loops all running simultaneously and trying to shift the current system state to the desired state.

kube-scheduler

kube-scheduler schedules tasks on all available nodes in the cluster — determines on which Worker Node to create a new pod, depending on the required resources and node workload.

Worker Node

Worker Node is a virtual or physical machine that has Kubernetes components to launch pods. There are two components running on Worker Nodes:

kubelet

The main k8s component on each cluster node. It checks the kube-apiserver for descriptions of new pods to be deployed on the given node and drives Docker (or another containerization system) through its container management API. After making changes to the state of a pod on the node, it passes the status information back to the kube-apiserver (which in turn writes it to etcd) and keeps monitoring the containers' state.

kube-proxy

kube-proxy is the equivalent of a reverse proxy server, responsible for forwarding and proxying requests to the corresponding services or applications in the private network of the k8s cluster. By default, it uses iptables.


Of course, these are far from all the k8s entities, and by no means all the details. There is much, much more.

Kubernetes "Lite"

To try out k8s you need to provision a k8s cluster for yourself. There are different tools to achieve this. The most popular are Minikube, K3s, Kind (Kubernetes in Docker), Kubeadm, MicroK8s, Kops, and kubernetes-ansible. Each of these tools was built with different goals in mind and has its own set of trade-offs.

Minikube is the closest to an official mini distribution for local testing and development; it is run by the same foundation as k8s and can be managed by kubectl. It's fully cross-platform, but it typically relies on an intermediary VM (which is a significant overhead), although it can also run on the actual host (Linux only).

Kubernetes with AWS EC2

Now let's see how k8s works using Minikube.

For that, I launched a t2.medium AWS EC2 machine with Ubuntu Server 18.04 LTS, since Minikube requires a minimum of 2 vCPUs and 2 GB of free memory.

EC2 inbound

The first thing to do on the newly created compute instance is to download and install Docker Community Edition 17.12+ and then install minikube and kubectl.

When using Minikube, remember that a local virtual machine is created and a cluster with a single node is launched. Never use it for production deployment — Minikube is used exclusively for testing and development.

When you are ready, let's make sure it's all set:

$ minikube version
minikube version: v1.12.1
commit: 5664228288552de9f3a446ea4f51c6f29bbdd0e0-dirty

To start a single-node cluster, just run the minikube start command. By doing that you are launching the virtual machine, the cluster, and k8s itself at the same time.

Check the status of Minikube; it should say 'Running':

$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

If you see the status as 'Running', then you can now run kubectl commands.

A simple speed improvement can be made for local experiments and testing. In a normal workflow, you would have a separate Docker registry on your host machine (outside of Minikube) to push your images to. But with the following command we can reuse Minikube's built-in Docker daemon instead.


$ eval $(minikube docker-env)

Hello world on k8s

Let's create our first application on k8s. It's basically just a service that responds to requests with information about the client and the server.

$ kubectl create deployment hello-world --image=k8s.gcr.io/echoserver:1.4
deployment.apps/hello-world created

Let's have a look at our pods:

$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
hello-world-67999bd854-rgd26   1/1     Running   0          9m4s

We see that one pod with a weird name was created and is now running. Once it's created, we can request complete information about the pod's status using the kubectl describe command:

$ kubectl describe pod hello-world-67999bd854-rgd26
Name:         hello-world-67999bd854-rgd26
Namespace:    default
Priority:     0
Node:         minikube/172.17.0.3
Start Time:   Sun, 02 Aug 2020 01:12:50 +0000
Labels:       app=hello-world
              pod-template-hash=67999bd854
Annotations:  <none>
Status:       Running
IP:           172.18.0.2
IPs:
  IP:           172.18.0.2
Controlled By:  ReplicaSet/hello-world-67999bd854
Containers:
  echoserver:
    Container ID:   docker://7dea4d46d48045e0a190bcaa466de8e7eb55e0945d7eebe174907fd012933aa8
    Image:          k8s.gcr.io/echoserver:1.4
    Image ID:       docker-pullable://k8s.gcr.io/echoserver@sha256:5d99aa1120524c801bc8c1a7077e8f5ec122ba16b6dda1a5d3826057f67b9bcb
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sun, 02 Aug 2020 01:12:58 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-92zzx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-92zzx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-92zzx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  9m33s (x3 over 9m41s)  default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         9m30s                  default-scheduler  Successfully assigned default/hello-world-67999bd854-rgd26 to minikube
  Normal   Pulling           9m29s                  kubelet, minikube  Pulling image "k8s.gcr.io/echoserver:1.4"
  Normal   Pulled            9m22s                  kubelet, minikube  Successfully pulled image "k8s.gcr.io/echoserver:1.4"
  Normal   Created           9m22s                  kubelet, minikube  Created container echoserver
  Normal   Started           9m22s                  kubelet, minikube  Started container echoserver

As you can see, the pod hello-world-67999bd854-rgd26 is running on the node named minikube and has the internal IP address 172.18.0.2. Keep in mind that this IP address is not reachable by applications outside the Kubernetes cluster — access to the running echoserver is only possible from within the cluster. Besides being internal-only, this IP address is also not permanent: if the pod is recreated, it can get a different IP address.

To solve these problems, you can use an object called Service. As we mentioned, a Service allows you to assign a persistent IP address to a set of pods, grant them access from external networks, and balance requests between the pods.

Let's create a Kubernetes Service object that exposes an external IP address with a port, so that we can access the application:

$ kubectl expose deployment hello-world --type=NodePort --port=8080

[It may not work directly, because the NodePort is exposed inside the VM that Minikube runs k8s in, not on the host machine. You can use the minikube tunnel command, which should proxy the ports so you can interact with them via localhost.]

Let's check locally that everything's running.

$ curl $(minikube service hello-world --url)

We should get something like this in return:

CLIENT VALUES:
client_address=172.18.0.1
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://172.17.0.3:8080/

SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001

HEADERS RECEIVED:
accept=*/*
host=172.17.0.3:30504
user-agent=curl/7.58.0
BODY:
-no body in request-

You can use the port-forward command in kubectl to connect to the Service and test the connection.

$ kubectl port-forward --address 0.0.0.0 svc/hello-world 8080:8080

Let's clean up our resources so they don't get in the way.

$ kubectl delete services hello-world
$ kubectl delete deployment hello-world

Kubernetes Application

Let's do something more practical. Let's imagine that we have a backend service that needs to be orchestrated on a Kubernetes cluster. It can be as complicated as you like, but that part you can handle without me.

Let's start by saying that we need a Deployment specification to hand over low-level management to k8s. We need to make the service resilient and highly available (given a production-ready cluster).

To reach high availability we will run 5 replicas, so if a pod fails (e.g. its node fails or is under maintenance), the controller kicks in and starts a new pod elsewhere in the cluster. We should also define a livenessProbe and a readinessProbe for each container. Once per configured interval, the kubelet (the k8s agent on each node) performs the liveness/readiness checks of all the pods running on its node and sends the results to the apiserver (the interface to the k8s brain). With these probes, your containers are protected not only from unexpected crashes but also get proactive restarts based on the check results.

Set the correct initialDelaySeconds parameter so your application has some time to initialize (this especially affects JVM-based applications such as Spring Boot — they are very slow to bring up their HTTP endpoints).

In production systems, to reduce the time required for recovery, it's also good practice to reduce the container image size, so that on a cold start (i.e., the first time a node starts a pod from the application image) the time required to pull the image is minimized, because network bandwidth is a valuable resource in large systems.

Never launch your app directly from a bare pod — it won't survive a node crash. Even for single-pod applications, use ReplicaSet or Deployment objects, as they manage pods across the whole cluster and maintain a specified number of instances (even if it's only one). We will take that advice.

Let's also set up a single place to manage things like routing between multiple services/versions, authentication and authorization, encryption, and load balancing within the Kubernetes cluster. As a simple example, we expect to have a mobile version of our application, and when we do, we would like to keep the current structure of the application deployment.

Here we will use the ingress controller as an L7 application load balancer. It will help us determine which service to route to by which path. For example, all URLs starting with mobile.* would go to the backend API for mobile devices.

In production, another level of abstraction is typically used on top in the form of a service mesh layer — Envoy is a good example of that. In short, a service mesh is a separate infrastructure layer through which your services interact with each other.

In the end, we get a scheme like this:

K8s application

And the code:

apiVersion: apps/v1
kind: Deployment  # this is the deployment object
metadata:
  name: nginx-dep  # name of our deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: server
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: server  # it will identify all replicas of the app with `matchLabels`
    spec:
      containers:
      - name: nginx
        image: library/nginx:1.14-alpine
        ports:
        - containerPort: 80
        readinessProbe: # check that our containers are ready
          httpGet: {path: /, port: 80}
          initialDelaySeconds: 5
          periodSeconds: 1
        livenessProbe: # check that our containers are alive
          httpGet: {path: /, port: 80}
          initialDelaySeconds: 5
          periodSeconds: 1
---
apiVersion: v1
kind: Service  # this is the service object, easy, right?
metadata:
  name: nginx-service
spec:
  ports:
  - port: 8080
    targetPort: 80
  selector:
    app: server  # to find the right replicas for the app
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress  # yep, the ingress is in the house
metadata:
  name: nginx-ingress
spec:
  backend:
    serviceName: nginx-service
    servicePort: 8080

Yep, one more shitty DSL in your life. Yep, in shitty YAML format. I merged it all together for visibility and simplicity, but it's supposed to be several files.

A lot of yaml

To run this:

$ kubectl create -f nginx.yaml
deployment.apps/nginx-dep created
service/nginx-service created
ingress.networking.k8s.io/nginx-ingress created

To forward service to the host machine to make it visible outside EC2:

$ kubectl port-forward --address 0.0.0.0 svc/nginx-service 8080:8080

As a result we get:

Nginx screenshot

Let's try to update the version of the application online. The following command will allow you to see the process of updating individual pods of our service — you should see how they are terminated and initialized:

$ kubectl set image deployment/nginx-dep nginx=library/nginx:1.18-alpine \
	&& watch -n 1 kubectl get pods

There's supposed to be a gif here showing how it's done, but I'm too lazy to make one and you're too lazy to watch it, so trust me and imagine that old pods are magically terminated and new pods magically appear.

Scale it out

With our deployment in place, we can now scale it up. Let's say we need to scale the number of nginx pods from five to ten. There are two ways to do this: you could edit the YAML file and change the replicas line, or you could do it via the command line:

$ kubectl scale deployments/nginx-dep --replicas=10 \
	&& watch -n 1 kubectl get pods

You should see how the new containers are being created and initialized. Cool!

To clean everything up:

$ kubectl delete deployment --all
$ kubectl delete service --all
$ kubectl delete pods --all
$ kubectl delete ingress --all

Conclusion

  1. Of course, these examples are quite primitive; usually you also need to deal with configuration storage, application secrets, stateful volumes, etc. However, the concepts themselves do not change. If you are interested in setting up a production-ready Kubernetes cluster yourself, you can visit this great tutorial.
  2. Kubernetes is designed as a collection of more than half a dozen interoperable services that together provide full functionality. Kubernetes at a smaller scale solves most of the problems without a lot of fuss. At a larger scale, it requires a lot more thought, glue code, and putting wrappers/safeguards on pretty much everything to make it work safely and reliably. Generally, as mentioned above, folks tend to add a Service Mesh to enable more advanced features/requirements. K8s supports launch in a highly available configuration but is operationally complex to configure. In addition, securing Kubernetes is not a trivial, simple, or well-understood operation.
  3. K8s is a huge ecosystem that formed very quickly. In addition to k8s itself, there are many tools for working with it; besides those we have seen, there are Kubebox, Containerum, Kubetail, Twistlock, Sysdig Secure, Kubesec.io, Aquasec, Searchlight, Kail... And there are also many solutions built on top of it, like Kubeflow, KubeDB, KubeVault, Voyager. This article is already too long, although I wanted to include Helm as another component of the k8s world.
  4. I see a big transition from imperative systems to declarative ones, and k8s is one of the main trends in this movement. This is the future the business wants to see, and not only in the DevOps world: instead of data plumbing, we want to do data architecture. And what we have seen is that moving to these declarative models lets us write cleaner, simpler, more compact code and, as a result, build more elegant systems.

As we continue to move our applications from servers and virtual machines to containers, Kubernetes is inevitable.
