openSUSE:Post-mortem-20231122
- What/Problem: various openSUSE services were disfunctional (wiki, etherpad, management-VPN)
- When: 2023-11-22 00:50 to 13:00 UTC
- Why: We identified a number of contributing issues:
- autoupdates run at night when nobody is around to fix breakages (Georg was around, but his home internet broke, so he went to bed)
- https://bugzilla.opensuse.org/show_bug.cgi?id=1217403 virtlockd stopped VMs after auto-update
- galera DB disintegrated when all its 3 VMs were restarted due to the virtlockd issue
- galera had a missing sst_user, causing cluster recovery to fail
- ulimit restrictions caused recovery debugging to fail
- existing data directories on consumer nodes caused recovery to fail
- freeipa had no network after reboot - fixed
- freeipa had no slapd running after reboot - TBD
- without freeipa, no openVPN-access was possible
- The VM host that provides remote-access without heroes-openVPN did not have autostart enabled for important VMs - fixed
- openVPN gateway had missing ip-forward config applied from salt - fixed
- The network is tightly firewalled, so no ssh access from other places was possible - TBD fallback entrypoint still needs to be set up