Sunday, April 23, 2017

Kubelet: A Bottom-Up Approach to Understanding Kubernetes

The master instance and the node instances (formerly known as minions) form the Kubernetes cluster. This post focuses on one of the most important services running on each node, the Kubelet. It covers the setup/installation of the kubelet and the deployment of Pods.


Before jumping into Kubelet, let's understand the Pod

A Pod is the lowest level of abstraction in the Kubernetes world. It is a collection of one or more containers that are treated as a single unit of deployment, and all of its containers share the same resources, i.e. network (IP address) and volumes.

A normal Docker container gets its own IP address; Kubernetes simplifies this further by assigning a shared IP address to the Pod. The containers in a Pod share that IP address and communicate with each other via localhost. A Pod is like a VM in that it basically emulates a logical host for the containers running in it.

What goes into a Pod is quite important, as Kubernetes is going to scale its containers together as a group. Even if your microservice has only one container, it has to be packaged as a Pod. Pods are defined by a JSON or YAML file called a Pod manifest (ref). They are deployed on the worker nodes of Kubernetes and are scheduled by the master.
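
For illustration, here is a minimal manifest with two containers in one Pod (the names, images, and the sidecar's command are made up for this example); both containers get the same Pod IP and can reach each other over localhost:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
  - name: pinger        # hypothetical sidecar container
    image: busybox
    # polls nginx over the Pod-shared localhost every 10 seconds
    command: ["sh", "-c", "while true; do wget -q -O- http://localhost:80 >/dev/null; sleep 10; done"]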


Back to the Kubelet

The Kubelet is a daemon service running on each node which manages the Pods on that host. Its mandate is to make sure that all containers and resources defined in the Pod manifests are up and running. To run a Pod, the kubelet needs to find the manifest from one of the approaches below (the corresponding kubelet flags are sketched after this list):
  1. From a local directory
  2. From a URL
  3. From the Kubernetes API server (i.e. the master node)
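
For reference, these three sources map to kubelet flags; a rough sketch for the v1.4-era kubelet (flag names and defaults may differ in other releases, and the URL and master address below are placeholders):

$ ./kubelet --pod-manifest-path=./manifest                # 1. watch a local directory
$ ./kubelet --manifest-url=http://example.com/pod.yaml    # 2. poll a URL for a manifest
$ ./kubelet --api-servers=http://<master-ip>:8080         # 3. get Pods from the API server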

Installing Kubelet

The different services of Kubernetes (API server, kubelet, controller manager, etcd, ...) are loosely coupled and can be installed independently. In this post, I will install the kubelet on my Linux VM and explore it:

Precondition for running the kubelet successfully: a container runtime, either Docker or rkt.

The release used here is v1.4.12 (see https://github.com/kubernetes/kubernetes/releases for all releases). Follow the steps below to install the kubelet on your Linux machine.

$ cd 
$ mkdir k8
$ cd k8
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.4.12/bin/linux/amd64/kubelet
$ chmod +x kubelet
$ mkdir manifest  #directory from where it will get Pod manifest file
$ sudo service docker start # docker should be up
$ ./kubelet --pod-manifest-path=./manifest # run the kubelet

And this completes the installation and startup of the kubelet. Note that, as of now, the manifest directory is empty, so the kubelet will not be able to launch any Pod. The kubelet will keep checking the directory for manifest files.

Running a Pod

The kubelet is up, but it is of no use yet as the manifest directory doesn't have any Pod definition.

Let's take one of the simplest Pod definition files from https://kubernetes.io/docs/user-guide/walkthrough/ and place it under the manifest directory.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80

Create a file, pod-nginx.yaml, with the above content and put it in the manifest directory.
That's it!

$ sudo docker ps
The kubelet picks up the YAML file automatically (it's a daemon) and starts the nginx container. Run the above command to confirm that the kubelet has indeed started a container. The kubelet keeps checking the directory and reconciles what it is running with what it finds, so if required it will kill a running Pod and start a new one (if you want to test this, just remove the YAML file).
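
The output should look roughly like the sketch below (container IDs, names, and image tags are illustrative); note the extra "pause" container, which is the infrastructure container holding the Pod's shared network namespace:

CONTAINER ID   IMAGE                                      COMMAND        NAMES
3f2a1b9c7d10   nginx:1.7.9                                "nginx -g …"   k8s_nginx.…_nginx-…
9e8d7c6b5a43   gcr.io/google_containers/pause-amd64:3.0   "/pause"       k8s_POD.…_nginx-…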

Now, let's find the IP address of the NGINX server by inspecting the IP addresses of the running containers.

$ docker inspect $(docker ps -q) | grep IPAddress
The command below prints the first few lines of the Nginx welcome page, confirming that Nginx is indeed up. Note that the port configured in the YAML file is 80.

$ curl http://172.17.0.2 | head -5

Kubelet Validations

The kubelet also runs an HTTP service on port 10255 that exposes various details, and it runs cAdvisor on port 4194.

http://localhost:10255/healthz
http://localhost:10255/spec
http://localhost:10255/pods
http://localhost:4194/containers   #cAdvisor
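
A quick sanity check with curl (/healthz should simply return ok, and /pods should list the nginx Pod started above):

$ curl http://localhost:10255/healthz
ok
$ curl -s http://localhost:10255/pods | head -c 200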

--happy learning !!!

Saturday, April 15, 2017

Understanding the Metadata Stored by Couchbase for Each Document

Along with each document, Couchbase also stores some important metadata. This post talks about those meta details and then shows how they can be read using a N1QL query.

CAS:
CAS is an acronym for check-and-set (also compare-and-swap), a technique for optimistic locking. It is a unique identifier associated with a document which can be used to prevent an update if the document's CAS value is no longer the same as the one originally read. The intent is that, instead of (pessimistically) locking the record, you read the CAS value along with the document and then perform the write if and only if the CAS value still matches. More details are on the Couchbase developer pages (recent page, and a bit older link).

CAS represents the current state of the document, and each time the document is modified the CAS value also gets updated. Couchbase Server maintains this value automatically for each document, and its type is a 64-bit integer.
More details on CAS here
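
One way to see this is through N1QL: read the CAS, mutate the document, and read the CAS again; the second read returns a different value (the bucket name, key, and touched_at field below are placeholders, reusing the sample key from the query later in this post):

cbq> SELECT meta(b).cas FROM `bucket_name` b USE KEYS "01674B3Z6C3H";
cbq> UPDATE `bucket_name` USE KEYS "01674B3Z6C3H" SET touched_at = NOW_STR();
cbq> SELECT meta(b).cas FROM `bucket_name` b USE KEYS "01674B3Z6C3H";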

FLAGS:
Identifies the type or formatting of the data stored; client SDKs typically set this value to record how the document value was serialized.

ID:
The unique document identifier within a bucket. Document IDs should conform to UTF-8 encoding. Couchbase stores all keys in RAM, so keep this in mind when deciding on key length. Also, Couchbase doesn't generate keys for a document by default; you must provide one explicitly.
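
For example, a N1QL INSERT must supply the key explicitly (the bucket name, key, and document body below are placeholders):

cbq> INSERT INTO `bucket_name` (KEY, VALUE) VALUES ("01674B3Z6C3H", {"type": "demo", "name": "sample doc"});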


Let's see these details through queries:

N1QL Query


cbq> SELECT meta(b) FROM `bucket_name` b WHERE meta(b).id = "01674B3Z6C3H";
{
    "requestID": "c42ddb7e-75ef-461a-a9df-1b458e4b03fa",
    "signature": {
        "$1": "object"
    },
    "results": [
        {
            "$1": {
                "cas": 1491370937556664300,
                "flags": 33554432,
                "id": "01674B3Z6C3H",
                "type": "json"
            }
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "100.949282ms",
        "executionTime": "100.869461ms",
        "resultCount": 1,
        "resultSize": 193
    }

}


--happy learning !




Sunday, April 9, 2017

What if One of the Couchbase Data Nodes Goes Down

This post talks about the impact on your system when one of the Couchbase data nodes is removed from the cluster for some reason (software, hardware, or network failure). In terms of the CAP theorem, this effectively means that a network partition has occurred.
Couchbase version: 4+ (supports Multi-Dimensional Scaling).


Let's make one thing very clear before digging deeper into this situation: overall, your system is still functional, as only one node is out of the cluster (hopefully you have more data nodes :D, and they are live and running!). Only the data stored on the down/unreachable node is impacted.

Couchbase shards data into vBuckets, which get distributed across all the data nodes. So each data node hosts a set of vBuckets, some holding primary (active) data and others replica data. The replica vBuckets on the down node are not a problem, as requests for that data are still serviced by the nodes holding the primary copies.

Now, the real question is how Couchbase is going to deal with requests (read/update/delete) for the primary data that lived on the down node.

To preserve strong consistency, Couchbase allows access only through the primary (active) node. Suppose this were not enforced: the primary node writes data and immediately goes down (before replication succeeds). If Couchbase then decided to service requests from a replica node, stale data would be returned or updated. That would make the system highly available, but inconsistent.

Couchbase's designers chose consistency over availability. So any request for that primary data will fail until the node is failed over, which activates the replica data on another node. To fail over, click Fail Over on the down node in the Couchbase UI (or use the command line, as sketched below). After this we can manually rebalance, in which case any data that had not yet been replicated is lost.
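
For reference, the same fail-over and rebalance can also be done with couchbase-cli; a rough sketch (host names and credentials are placeholders, and exact flags vary slightly across Couchbase versions):

$ couchbase-cli failover -c <cluster-host>:8091 -u Administrator -p password \
      --server-failover <down-node>:8091 --force
$ couchbase-cli rebalance -c <cluster-host>:8091 -u Administrator -p password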

Related Posts:
My MDS in Couchbase
How Couchbase Identifies Node For Data Access
Create Couchbase Cluster using Docker


References
https://forums.couchbase.com/t/what-happens-when-a-node-in-the-cluster-goes-down/36
https://blog.jtclark.ca/2014/07/simple-recovery-of-a-couchbase-cluster-when-one-or-all-nodes-fail/