Is there a way to prevent Kubernetes from killing and restarting a pod (from a d...

gouggoug · on Dec 5, 2019

Yes, depending on why k8s kills and restarts your pod.

If your pod's `cmd` is the cause of the crash, simply replace it with `sleep 50000`, which keeps the container alive so you can get a shell and debug the crashing cmd (provided your container image has a shell available).

If your pod is getting killed by its Deployment (because of a scale down for example), simply "edit" the pod (`kubectl edit pod [pod-name]`) and change the label that ties it to its Deployment (typically something like `app: [deployment-name]`) to a value that no Deployment selects, prior to exec-ing a shell. This basically orphans the pod from its deployment; you will need to manually delete it afterwards.
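
A minimal sketch of that orphaning step, assuming the Deployment's selector matches on a single label key (often `app`, but check `kubectl get deployment -o yaml` for the real selector). The function name `orphan_pod` and the value `debug-orphan` are made up for illustration. Once the label no longer matches, the ReplicaSet stops managing the pod and starts a replacement.

```sh
# Sketch: detach a pod from its ReplicaSet by overwriting a selector label.
# Assumption: the selector matches on the label key passed as $2.
orphan_pod() {
  pod="$1"   # name of the pod to keep around
  key="$2"   # label key used by the Deployment's selector (assumption)
  kubectl label pod "$pod" "$key=debug-orphan" --overwrite
}

# Usage (hypothetical names):
#   orphan_pod my-app-5d9c7b8f6-abcde app
#   kubectl exec -it my-app-5d9c7b8f6-abcde -- sh
#   kubectl delete pod my-app-5d9c7b8f6-abcde   # manual cleanup afterwards
```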

If your pod is getting killed by k8s because it's exceeding its resource `limits`, edit the pod manifest and remove the limits (again, using `kubectl edit pod`).

In the end, it mostly comes down to editing the pod's manifest and removing the things that could instruct k8s to restart your pod. Another thing would be to remove the `livenessProbe`.

markbnj · on Dec 5, 2019

Depends on why it is getting restarted. If it's exceeding memory limits and being OOM-killed, that's the kernel, not k8s. If PID 1 inside the namespace is terminating, then k8s will restart the pod. There's no way to prevent that as far as I am aware, but presumably you can't do much debugging once that happens anyway. If the process is failing liveness probes and getting terminated for being unhealthy, probably the simplest approach is just to patch away those probes until you have the workload stable.
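
One way to "patch away" a probe, sketched with a JSON patch against the Deployment. The function name `remove_liveness_probe` is hypothetical, and the container index defaults to 0 as an assumption. Patching the Deployment rolls out new pods, so this helps with the next reproduction rather than with a pod that is already failing.

```sh
# Sketch: remove the livenessProbe from one container of a Deployment.
# $1 = deployment name, $2 = container index in the pod template (default 0).
remove_liveness_probe() {
  deploy="$1"
  index="${2:-0}"
  kubectl patch deployment "$deploy" --type=json \
    -p "[{\"op\": \"remove\", \"path\": \"/spec/template/spec/containers/$index/livenessProbe\"}]"
}

# Usage (hypothetical name):
#   remove_liveness_probe my-app
```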

nodesocket · on Dec 5, 2019

Yes, failing liveness probe.

anirudhrx · on Dec 5, 2019

Liveness probes are used by the kubelet to restart the underlying container and are independent of the deployment object. This has come up before in https://github.com/kubernetes/kubernetes/issues/57187 but sadly it isn't possible yet. Your best bet is to create a new pod and hope for a repro, or one way might be to have a configmap that is mounted into the pod and contains a debug flag that your liveness probe also looks at, i.e. "debug == true || curl localhost:6789". Not a clean solution, but it may work for the interim.
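
A rough sketch of what such a probe script could look like when run as an `exec` liveness probe. The path `/etc/debug/enabled` and the function name `liveness_check` are assumptions for illustration; only the port 6789 comes from the comment above.

```sh
# Hypothetical liveness check: always pass while a debug flag is set.
# DEBUG_FLAG_FILE is an assumed path where a ConfigMap key is mounted.
DEBUG_FLAG_FILE="${DEBUG_FLAG_FILE:-/etc/debug/enabled}"

liveness_check() {
  # Debug mode: report healthy so the kubelet leaves the container alone.
  if [ -r "$DEBUG_FLAG_FILE" ] && [ "$(cat "$DEBUG_FLAG_FILE")" = "true" ]; then
    return 0
  fi
  # Normal mode: fall through to the real health check.
  curl --fail --silent --max-time 2 http://localhost:6789 >/dev/null
}

# In a probe script, the last line would simply be:
#   liveness_check
```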

plughs · on Dec 5, 2019

Not as far as I can tell, and as far as I'm concerned it is the biggest pain point when debugging k8s problems. If your pod is cycling through CrashLoopBackOff and 'describe' has no useful information and the 'logs' don't give you a clue, you are flat out of luck. That pod is gone before you can extract any useful information from it.

I'd like to see some means of keeping a pod running even if the application is crashing. Sometimes logs are being sent to a file, sometimes running the application manually gives a clue. Lacking that - well I've found various tricks such as changing the image to 'ubuntu' and replacing the command with 'tail -f /dev/null' so a crashing pod will stay alive and I can exec into it and run the real command by hand. But it's a hack and doesn't always get the info you need.
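
For reference, the command-replacement half of that trick can be sketched as a JSON patch on the Deployment. The function name `hold_deployment_open` is hypothetical and the container index defaults to 0 as an assumption. If the container also sets `args`, they would be appended to `tail`, and any liveness probe would still need to be removed separately.

```sh
# Sketch: replace a container's command with one that idles forever,
# so the pod stays up and the real command can be run by hand.
# $1 = deployment name, $2 = container index in the pod template (default 0).
hold_deployment_open() {
  deploy="$1"
  index="${2:-0}"
  kubectl patch deployment "$deploy" --type=json \
    -p "[{\"op\": \"add\", \"path\": \"/spec/template/spec/containers/$index/command\", \"value\": [\"tail\", \"-f\", \"/dev/null\"]}]"
}

# Usage (hypothetical names):
#   hold_deployment_open my-app
#   kubectl exec -it <new-pod-name> -- sh
```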

oso2k · on Dec 11, 2019

If pod logs (`kubectl logs pod-name-hash`) and `kubectl describe` are not giving you useful info, then try checking the Event Logs with `kubectl get events` in the pod's namespace [0].
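
A small sketch of narrowing those commands down to a single pod. The function names `pod_events` and `previous_logs` are made up; the `--previous` flag asks for the logs of the last terminated container instance, which is often the one that crashed.

```sh
# Sketch: events that reference one pod, oldest first.
pod_events() {
  pod="$1"
  kubectl get events \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp
}

# Sketch: logs from the previous (terminated) container instance of a pod.
previous_logs() {
  kubectl logs "$1" --previous
}

# Usage (hypothetical name):
#   pod_events my-app-5d9c7b8f6-abcde
#   previous_logs my-app-5d9c7b8f6-abcde
```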

In OpenShift (I'm a Red Hat Consulting Architect), we have the ability to debug Pods (Failed, Running, or otherwise), DeploymentConfigs (Deployments), and some other things, but not BuildConfigs. You just do something like `oc debug pod-name-hash` or `oc debug deployment/deployment-name` [1]. What this does is start up the Pod with all the configuration (ConfigMaps, Env Vars, Mounts, Secrets, etc.) but replace CMD/ENTRYPOINT with a shell using `/bin/sh`. This might be magic bits we've added to our `oc/kubectl` (they're one and the same in OpenShift), but I don't see references to similar functionality in Kubernetes. Useful for identifying where a configuration may be slightly off from what you or CMD/ENTRYPOINT are expecting.

[0] https://kubernetes.io/docs/reference/kubectl/cheatsheet/#vie...

[1] https://cookbook.openshift.org/logging-monitoring-and-debugg...

plughs · on Dec 11, 2019

`oc` sounds like a thing of great beauty but I'm sure I couldn't convince anyone to pay Red Hat licencing fees.

Our project is a bit unconventional and I'm usually dealing with applications that have been minimally and reluctantly containerized. Pods that crash with no logging and no tracing is probably something I have to deal with more than others.

oso2k · on Dec 13, 2019

If cost is initially a big issue, check out the upstream of OpenShift, OKD [0]. Mind you, a Kubernetes distribution for production-ready workloads will take some work. This is why OpenShift is one of the few Kubernetes distributions to support multiple underlying infrastructure layers (BareMetal, OpenStack, VMware, AWS, GCP, Azure). The Public Cloud providers offer hosted Kubernetes because it makes the customer’s life simpler and their own more profitable. And if OKD interests you, you can learn in a guided manner in our sandbox clusters at https://learn.openshift.com

[0] https://www.okd.io/

oso2k · on Dec 13, 2019

Oh, there’s also an OpenShift users’ mailing list where people get fairly direct access to knowledgeable Red Hatters and even the devs.

kovek · on Dec 5, 2019

What if you `docker exec` into the container? Kubernetes is running docker containers anyway, right? The environment would be very similar.

raincom · on Dec 5, 2019

restartPolicy is set to Always by default in the podspec. So, change it to Never, and see whether it helps.
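
A sketch of trying that out, with one caveat: a Deployment's pod template only accepts `restartPolicy: Always`, so `Never` applies to a bare pod (or a Job). The function name `run_once` is hypothetical. With `Never`, a crashed container stays terminated, so `kubectl logs` and `kubectl describe` keep working, but there is no running process left to exec into.

```sh
# Sketch: start a bare pod that is not restarted when its container exits.
# $1 = pod name, $2 = image to run.
run_once() {
  name="$1"
  image="$2"
  kubectl run "$name" --image="$image" --restart=Never
}

# Usage (hypothetical names):
#   run_once my-app-debug registry.example.com/my-app:latest
#   kubectl logs my-app-debug
#   kubectl describe pod my-app-debug
```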