Description
This article describes how to address security incidents in containerized applications and services. It is intended for incident responders and security staff.
Scope
Kubernetes/Docker
Solution
Docker is one of the most widely used technologies for running containerized applications and services. Kubernetes is the de facto standard platform for orchestrating and managing such containerized applications and services. From an incident response perspective, dealing with a security incident affecting one of these applications or services presents a series of challenges: evidence must be preserved while the rest of the environment is kept as secure as possible.
In this article, a typical scenario is discussed that an incident responder may face after discovering a malicious pod on a Kubernetes cluster, along with an approach to contain the damage, preserve evidence, and start analysis of the malicious pod.
Our Kubernetes test cluster consists of four Ubuntu machines: one master node and three worker nodes.
The IP addressing scheme of the Kubernetes nodes can be retrieved with the command:
kubectl get nodes -o wide
and is shown in the following picture:
A malicious pod called 'nginx-85b98978db-j5ltr' was found running on node 'k8s-worker2'. The IP address assigned to this pod by Kubernetes is 192.168.150.80.
A deployment object called 'nginx' exists within Kubernetes which instructs the system to run two instances of the application with the same name at any time. Pods that the object deploys will be marked with the label 'app=nginx'.
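For reference, a deployment object of this kind could have been created from a manifest similar to the sketch below. Only the name 'nginx', the replica count of two, and the 'app=nginx' label come from this scenario; the container image tag is an assumption for illustration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2          # keep two instances running at any time
  selector:
    matchLabels:
      app: nginx       # pods spawned by this deployment carry this label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest   # assumed image tag, for illustration only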
The picture below shows the output of the commands:
kubectl get pods -o wide
kubectl get pods --show-labels
kubectl get deployments -o wide
These commands show, respectively, the pods currently running within the Kubernetes cluster and the nodes they are running on, the labels applied to the pods, and the existing deployment objects:
The goal is to isolate the malicious pod as much as possible to prevent the security incident from spreading to other pods and nodes within the Kubernetes environment, while effectively preserving evidence for later analysis.
The following is the high-level process:
1. Remove the malicious pod from its deployment.
2. Apply a custom label to the malicious pod.
3. Apply a network policy that denies all traffic to and from the pod.
4. Label the node hosting the malicious pod.
5. Cordon that node.
6. Evict legitimate pods from the node.
7. Analyze the malicious pod.
All the commands involved in this process are run from the Kubernetes master node.
Removing the malicious pod from the deployment is performed by deleting the label 'app=nginx' from it. This label identifies pods spawned by this deployment object. Deleting it will effectively leave the pod running while preventing it from receiving new inbound traffic from outside the Kubernetes cluster. In our case, where the malicious pod is running an 'nginx' web server, Kubernetes will stop load-balancing HTTP requests to this pod. As a side effect, because of the existing deployment object, a new instance of the 'nginx' pod will be spawned immediately.
The command to delete a label from a pod is (note the trailing dash after the label key, which tells kubectl to remove it):
kubectl label pod <name of malicious pod> <name of label to remove>-
In this specific case, the command to run is:
kubectl label pod nginx-85b98978db-j5ltr app-
To verify that the label was deleted, the following command can be used:
kubectl get pods --show-labels
In the picture below, the malicious pod is removed from the deployment: the label is deleted, and a new pod is spawned because of the existing deployment object.
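As a convenience, the respawn of the replacement pod can also be followed in real time with the '--watch' flag:
kubectl get pods --watch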
The purpose of applying a custom label to the malicious pod is twofold: it clearly marks the pod as under investigation, and it provides a selector that the network policy created in the next step can use to target the malicious pod only.
The command to apply a label to a pod is:
kubectl label pod <name of malicious pod> <label to apply>
In this specific case, the command to run is:
kubectl label pod nginx-85b98978db-j5ltr status=suspectpod
The label can be chosen arbitrarily and just needs to be a key-value pair.
Network policies within Kubernetes allow control of the traffic between pods and other entities such as pods, namespaces (a way to virtually partition a Kubernetes cluster), and IP blocks. Network policies also make it possible to specify, in a granular way, what kind of traffic is allowed to or from one or more pods marked with a specific label, whether inbound (ingress) or outbound (egress). Because of this, network policies can be used to isolate a pod by creating a specific policy that allows no traffic at all, inbound or outbound.
In this specific case, a file named 'deny-all-suspectpod.yaml', which contains the following configuration, is created on the Kubernetes master node:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-suspectpod
spec:
  podSelector:
    matchLabels:
      status: suspectpod
  policyTypes:
  - Ingress
  - Egress
The policy is applied only to pods marked with the label 'status=suspectpod', which in our case corresponds to the malicious pod only. Because both 'Ingress' and 'Egress' are listed under 'policyTypes' while no allow rules are defined, all inbound and outbound traffic to and from the pod is denied.
To apply the policy after creation of the file, use the command:
kubectl apply -f deny-all-suspectpod.yaml
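To confirm that the policy is in place, it can be listed and inspected with:
kubectl get networkpolicy
kubectl describe networkpolicy deny-all-suspectpod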
The picture below shows a test confirming that pod communication, both inbound and outbound, is blocked by this network policy: ping commands are run from the master node to the malicious pod and vice versa. Before the policy is applied, the pings succeed; after it is applied, they fail.
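A sketch of such a test, assuming the container image ships with the 'ping' utility and using the pod IP from this scenario (the master node address below is a placeholder):
# From the master node to the malicious pod:
ping -c 3 192.168.150.80
# From inside the malicious pod back out:
kubectl exec nginx-85b98978db-j5ltr -- ping -c 3 <IP of master node>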
Labels can be applied to nodes as well. Although this is not strictly required, it is suggested to mark the worker node running the malicious pod to increase its visibility. This can be done with the command:
kubectl label nodes <name of node> <label to apply>
In this specific case:
kubectl label nodes k8s-worker2 status=suspectnode
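To verify, nodes can be filtered by the new label:
kubectl get nodes -l status=suspectnode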
Cordoning a worker node removes it from the pool of nodes on which the master node can schedule new pods. This operation does not affect existing pods running on the node, including our malicious one.
The command to cordon a worker node is:
kubectl cordon <name of node>
In this specific case, run the command:
kubectl cordon k8s-worker2
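After cordoning, the node is reported with status 'Ready,SchedulingDisabled', which can be verified with:
kubectl get nodes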
Now that the malicious pod is isolated and the worker node will not run any new pods, there still might be legitimate pods running on the same node alongside the malicious one. This is the case in this scenario, where a legitimate 'tomcat' pod is running alongside the malicious 'nginx' one on the same node. It is recommended to evict all legitimate pods from a node running a malicious pod to avoid possible interference with them.
Although pods cannot be moved manually from one node to another, it is possible to delete them. If there is an appropriate deployment object, the pods will be re-spawned on other nodes. This can be done with the command:
kubectl delete pod <name of pod>
In our specific case, run the following command to delete the legitimate 'tomcat' pod:
kubectl delete pod tomcat-655b94657b-tjlrd
When done, Kubernetes reschedules the 'tomcat' pod on another worker node:
At this point, the malicious pod is isolated successfully and is now the only one running on a Kubernetes node where scheduling of new pods was disabled.
To start analysis of the malicious pod, which in this case consists of a single container, the Docker engine running on the node can be leveraged. The Docker engine makes it possible to list all files changed within the malicious container and to export its filesystem into an archive, ready to be fed to a forensic tool of choice. It is also recommended to create an image out of the running container to preserve evidence.
To perform these tasks, first log in to the worker node that is hosting the malicious pod.
The Docker command below shows all files that were created, changed, or deleted within the container filesystem. Each file is prefixed with a letter identifying the operation performed on it: 'A' for added, 'C' for changed, and 'D' for deleted:
docker diff <container id>
The container ID needed to run this command can be retrieved with the command:
docker ps
In our specific scenario, changes to the filesystem of the 'nginx' container running on the node 'k8s-worker2' should be reviewed. The container ID of this container is 'bd04398ac3f0', as can be seen in the picture below:
The output of the diff command shows suspicious files being created within the container filesystem under the 'tmp' folder.
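For illustration only, 'docker diff' output takes the following form; the file names below are hypothetical and not the actual findings from this scenario:
C /tmp
A /tmp/suspicious-binary
A /tmp/.hidden-config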
To inspect the suspicious files that were identified and to get timestamps that tell us when files were manipulated, export the container’s filesystem. This can be done with the Docker command:
docker export <container id> > <name of output tar file>
In our specific scenario, the filesystem of the 'nginx' container with container ID 'bd04398ac3f0' is exported with:
docker export bd04398ac3f0 > evidence-compromised-nginx-bd04398ac3f0.tar
The resulting archive called 'evidence-compromised-nginx-bd04398ac3f0.tar' could be, for example, fed directly into 'log2timeline' to obtain a super-timeline of all events related to this container.
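A sketch of that step, assuming 'log2timeline.py' from the plaso suite is installed (exact options may vary between plaso versions):
# Extract the exported filesystem and build a plaso storage file from it:
mkdir evidence-nginx
tar -xf evidence-compromised-nginx-bd04398ac3f0.tar -C evidence-nginx
log2timeline.py --storage-file nginx-timeline.plaso evidence-nginx/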
To preserve evidence, create an image out of a running container with the following Docker command:
docker commit -m <optional message> <container id> <name of image>
The parameter '-m' is optional and allows specifying a custom message to better identify the image.
In the picture below, an image called 'compromised-nginx-bd04398ac3f0' is created out of our malicious 'nginx' container.
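To move this evidence image off the node for safekeeping, it can additionally be saved to a tar archive; this is a suggested extra step, not part of the procedure above:
docker save -o compromised-nginx-bd04398ac3f0-image.tar compromised-nginx-bd04398ac3f0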
In this article, a procedure to isolate a suspicious pod running within a Kubernetes cluster was presented. The procedure allows security professionals to preserve evidence while keeping the rest of the environment as secure as possible. How to triage the suspect pod to determine whether it was compromised was also covered. As the use of containerized applications and services grows, security scenarios like these are likely to be encountered more frequently.
The FortiGuard Incident Response team provides Incident Response services to enterprises during a variety of cyber incidents. Our mission is to help clients before, during, and after an incident.
The Fortinet Security Fabric brings together the concepts of convergence and consolidation to provide comprehensive cybersecurity protection for all users, devices, and applications and across all network edges.