openSUSE:Post-mortem-20231122

Jump to: navigation, search
  • What/Problem: various openSUSE services were disfunctional (wiki, etherpad, management-VPN)
  • When: 2023-11-22 00:50 to 13:00 UTC
  • Why: We identified a number of contributing issues:
    • autoupdates run at night when nobody is around to fix breakages (Georg was around, but his home internet broke, so he went to bed)
    • https://bugzilla.opensuse.org/show_bug.cgi?id=1217403 virtlockd stopped VMs after auto-update
    • galera DB disintegrated when all its 3 VMs were restarted due to the virtlockd issue
      • galera had a missing sst_user, causing cluster recovery to fail
      • ulimit restrictions caused recovery debugging to fail
      • existing data directories on consumer nodes caused recovery to fail
    • freeipa had no network after reboot - fixed
      • freeipa had no slapd running after reboot - TBD
      • without freeipa, no openVPN-access was possible
    • The VM host that provides remote-access without heroes-openVPN did not have autostart enabled for important VMs - fixed
    • openVPN gateway had missing ip-forward config applied from salt - fixed
    • The network is tightly firewalled, so no ssh access from other places was possible - TBD fallback entrypoint still needs to be set up