Puzzle ITC - Partial outage of PROD Rancher cluster worker 08 – Incident details

Partial outage of PROD Rancher cluster worker 08

Resolved
Operational
Started almost 4 years ago
Lasted 29 minutes

Affected

Puzzle Services

Operational from 9:10 AM to 9:39 AM

Taiga

Operational from 9:10 AM to 9:39 AM

Quay (registry.puzzle.ch)

Operational from 9:10 AM to 9:39 AM

Updates
  • Resolved

    We just resolved the issue!

  • Identified

    We cordoned, drained, and rebooted the node. The "flapping" behavior appears to be caused by the "broken" runC version (1.0.0-rc93) that we are currently running on our Rancher nodes; see https://github.com/kubernetes/kubernetes/issues/45419#issuecomment-819574343

    [root@k8s-worker08 ~]# runc --version
    runc version 1.0.0-rc93
    commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
    spec: 1.0.2-dev
    go: go1.13.15
    libseccomp: 2.3.1
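
The steps in this update can be sketched as a small shell helper. This is only an illustration under stated assumptions, not the procedure we actually ran: the function names (`run`, `is_suspect_runc`, `take_node_out`, `put_node_back`) and the `DRY_RUN` variable are hypothetical, reaching the node via `ssh root@<node>` is assumed, and `--delete-emptydir-data` assumes kubectl 1.20 or newer (older clients use `--delete-local-data`). Only rc93 is treated as suspect because that is the only version named above. By default the helper prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Succeeds if the given `runc --version` output reports 1.0.0-rc93.
# Assumption: only the version named in this incident is flagged.
is_suspect_runc() {
    version=$(printf '%s\n' "$1" | awk '/^runc version /{print $3; exit}')
    [ "$version" = "1.0.0-rc93" ]
}

# Stop scheduling on the node, evict its pods, then reboot it.
take_node_out() {
    node="$1"
    run kubectl cordon "$node"
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
    # Assumption: the node name resolves and root SSH access is available.
    run ssh "root@$node" systemctl reboot
}

# Allow scheduling again; call this only once the node reports Ready.
put_node_back() {
    node="$1"
    run kubectl uncordon "$node"
}
```

For example, `is_suspect_runc "$(runc --version)" && take_node_out k8s-worker08` prints the three commands for review; setting `DRY_RUN=0` would execute them. Keeping `put_node_back` separate is deliberate, since the node should be uncordoned only after it has come back from the reboot.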