Kubernetes best practices - Tue, May 19, 2020
Based on experience from a couple of years of customer implementations
Do you really need Kubernetes?
Does your product require K8S and all the overhead it adds?
Amazon’s Elastic Container Service (ECS) with the Fargate launch type is probably a better option much of the time. Or even break your product down into serverless functions and maintain nothing but the functions and the network plane.
Until you have maintained your own K8S cluster (including the hosted variants such as AKS, EKS and GKE) you might not realize how complex configuring K8S is.
Sure, K8S lets you specify everything, but this also means that you have more opportunities to make the wrong decisions. Managed services that handle simple containers or functions remove that complexity and will probably shorten your time to market.
But let’s imagine that you need long-lived functions or a lot of server power. Even so, most of the time you don’t even need a cluster of any kind. You can spin up a server with 2 TB of RAM and 64 cores, and it will cost you less than the maintenance and configuration of a K8S cluster.
If launching your MVP on Kubernetes takes 8 months, then it might not be the choice for you. Sure, the technology is cool, and a lot of developers want to work with it, but make sure that there are measurable benefits before you make the decision to go for K8S clusters.
For other reasons not to use Kubernetes, look no further than k8s.af
Infrastructure as code
Stating the obvious here, but even if you are utilizing Kubernetes (K8S) as a managed service, you should define the cluster and the rest of the infrastructure as code, either with platform-native tooling such as AWS CDK and CloudFormation, or third-party tooling such as Terraform.
The general idea is to be able to control changes the same way you control changes in your application code. You can then deploy the K8S infrastructure in a CI/CD pipeline, the same way your applications hopefully deploy.
Make sure to utilize pull requests, gates and automated testing in your pipelines to make the K8S infrastructure even more robust.
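As an illustration, a minimal (hypothetical) CloudFormation sketch of an EKS cluster defined as code could look like the following. The names are placeholders, and `ClusterRole`, `SubnetA`, and `SubnetB` are assumed to be defined elsewhere in the same template:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  # The EKS control plane, version-controlled like any other code.
  Cluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: example-cluster
      # IAM role and subnets are defined as sibling resources
      # in the same template (omitted here).
      RoleArn: !GetAtt ClusterRole.Arn
      ResourcesVpcConfig:
        SubnetIds:
          - !Ref SubnetA
          - !Ref SubnetB
```

Changes to this file then flow through the same pull-request review and pipeline gates as application code.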
Take extra care when writing your first specifications, because there is a lot of YAML to write and people will copy existing work. If your first K8S job mounts a directory just because you were testing something and forgot to delete the line, soon you will have fifty jobs all mounting that very directory.
Testing can be done with static code analyzers such as this one.
Define sane defaults in your templates and reuse them
Define a normalized set of CPU and RAM requests and limits to avoid throttling. If you have ever overcommitted a cluster, you know why this is important. Sensible defaults also make it possible to avoid running out of memory (OOM kills).
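A shared default in your pod templates could be sketched like this container-level resources block. The numbers are placeholders, not recommendations; derive them from your own baseline:

```yaml
# Hypothetical default for a typical small web service.
# Requests are what the scheduler reserves; limits are the
# hard ceiling that triggers throttling (CPU) or OOM kills (memory).
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
```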
Monitor your clusters
To be able to set sane defaults, you need a baseline. I don’t really care how you do it, but monitor your clusters and make sure that you adjust the defaults to fit your workloads. I generally recommend Datadog for centralized logging and monitoring, but feel free to use the platform-native tooling. Just make sure that you actively monitor this and proactively change your defaults to fit your workload profiles.
Do not allow anyone to log to files inside the pod. When the pod dies, the log is gone. You really need some kind of logging framework that ships the logs from your pod and stores them elsewhere. Datadog can be utilized for this, as can platform-native tools.
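As a sketch of the idea (the image tag and mount paths are assumptions, and a real setup needs a shipper configuration and an output destination as well), a DaemonSet that reads container logs off every node could look like:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-shipper
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-shipper
  template:
    metadata:
      labels:
        app: log-shipper
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.14
        # Read-only access to the node's log directory, where the
        # container runtime writes each pod's stdout/stderr.
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```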
No solution for multi-cluster monitoring
Something that still needs to be solved is multi-cluster monitoring in K8S. I want to be able to solve simple problems such as out-of-memory issues without prior knowledge of the clusters and the applications within them. It is very hard for cloud operations to handle errors without monitoring like this. I have not seen a good solution so far. Feel free to let me know if you have one.
Health probes
Define your health probes so that your applications can be restarted when needed. If the liveness probe fails, the pod is restarted. If the readiness probe fails, the pod is disconnected from network traffic. Think of it as a load balancer health check: if alive, keep the pod in the load balancer, but only send it traffic if it is ready.
You should of course monitor the rate of liveness flapping for your services so you notice if pods are suddenly dying for some reason.
Liveness probe example
pods/probe/tcp-liveness-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
Readiness probe example
readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
Control access with IAM
Don’t use a service account token for authentication. If it is compromised, lost, or stolen, an attacker can perform all the actions associated with that token until the service account is deleted. If you need to grant an exception for a CI/CD pipeline or similar, try to use the platform-specific role-based options, such as instance profiles.
Speaking of role-based access: make sure to employ least-privilege access to your resources and give your roles only the access they need. A role might only need access to one cluster, or maybe none at all if you use CI/CD pipelines and centralized monitoring. Avoid using ["*"] in your roles unless it’s absolutely necessary.
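The same least-privilege idea applies inside the cluster with RBAC. As a concrete sketch (names and namespace are illustrative), a Role that only allows reading pods in one namespace, instead of a wildcard:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
# Explicit resources and verbs instead of ["*"]:
# this role can read pods in "default" and nothing else.
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
```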
The AWS EKS security best practices guide has much more information.
Use private clusters
If you are using public subnets, make sure that your cluster endpoint is private. This is an option in GKE, AKS, and EKS, while still giving the nodes NAT outbound access to the internet. Instead of having the pods public, you can use application layer (7) load balancers, which in turn makes the network overview of your cluster structure much simpler.
Note that for DNS to work you will need to manage upstream DNS from the cloud providers. This usually just means adding a custom DNS server that resolves the cluster-specific DNS names for the control planes.
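On EKS, for example, the endpoint visibility can be expressed in an eksctl ClusterConfig (the cluster name and region here are made up):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster
  region: eu-west-1
vpc:
  clusterEndpoints:
    # No internet-facing API endpoint; the control plane is
    # reachable only from within the VPC.
    publicAccess: false
    privateAccess: true
```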
Reject privileged containers
Containers that run as privileged inherit all of the Linux capabilities assigned to root on the host. They should not really exist unless you are doing something very strange.
Create a pod security policy that automatically rejects these and restricts the types of volumes that can be mounted and the root supplemental groups that can be added:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
  - ALL
  # Allow core volume types.
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  # Assume that persistentVolumes set up by the cluster admin are safe to use.
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
Be Kubernetes native
If an application didn’t work as expected outside of a K8S cluster, it will not magically be fixed within the cluster. The key is to use native tooling and avoid dragging in existing external dependencies. Do not add more dependencies into your cluster if you can avoid it.
ZooKeeper, for example, should be avoided.
Manage network limits
GKE allocates 256 IPs per node by default, so even if you have a large /16 you will run out of IP addresses pretty fast: at 256 IPs per node, a /16 (65,536 addresses) is exhausted after only 256 nodes. Make sure that you monitor your network and set sane limits so that you do not expand your cluster into these limits.
A good way to handle this is to increase the size of your nodes, so fewer nodes are needed for the same workload.
Use Istio or don’t
Adding Istio is usually done to simplify the application network layer, gain insight into the application traffic, and increase security. The Istio project hosts multiple components, including Pilot, Mixer, and Auth; combined, these components provide a complete platform to connect, manage, and secure microservices. Managing Istio, even with the cloud platforms’ managed solutions, is a bit of a hassle. Upgrading the control plane is usually no problem, but upgrading the sidecars is much worse: you have to do a rolling restart of the deployment, and you need to reinject a new annotation into all pods at once through your deployment.
However, adopting Istio is not an all-or-nothing proposition; you can adopt only the parts you need. I would still recommend all or nothing though, since a hybrid approach will introduce complexity.
Make sure to look into the Istio best practices, especially Set default routes for services.
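A default route in Istio is expressed with a VirtualService. As a sketch (the `reviews` service name is illustrative, and the `v1` subset assumes a matching DestinationRule exists), all traffic is pinned to one version until you deliberately shift it:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  # Single default route: 100% of traffic goes to subset v1,
  # so rolling out v2 never receives traffic by accident.
  - route:
    - destination:
        host: reviews
        subset: v1
```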