openSUSE:Heroes/Meetings/20161202 Summary

What

openSUSE Heroes offsite 2016 team meeting minutes

Where

At the SUSE headquarters

When

Friday, 2016-12-02 until Sunday, 2016-12-04

Who

All time
- Daniel Maslowski <info@xxxxxxxxxxxxx>
- Sarah Julia Kriesch <sarah-julia.kriesch@xxxxxx>
- Christian Boltz <opensuse@xxxxxxxxx>
- Theo Chatzimichos <tampakrap@xxxxxxxxxxxx>
- Gerhard Schlotter <gschlotter@xxxxxxx>
- Markus Rückert <mrueckert@xxxxxxx>
- Lars Vogdt <lrupp@xxxxxxx>
- Christian Müller <cmueller@xxxxxxx>
- Thorsten Bro <tbro@xxxxxxx>
Temporary
- Christoph Wickert <cwickert@xxxxxxx> (only Friday 11:30)
- Richard Brown <rbrown@xxxxxxx> (only Sunday)

Topics

SaltStack training => Friday
SUSE Cloud training => Saturday
Packaging workshop => Skipped
Ticket wrangling
Securing our infrastructure
Documenting our infrastructure
admin / infrastructure policy
contact persons and their responsibilities (service map)
list of machines
Mailing lists, their setup and policies
Kiwi images
build.opensuse.org => infrastructure project
Sponsoring update
Mirrors and the mirror infrastructure
external services => giving them to the right (external) people ?
gitlab => waits for freeIPA
tasks to take with you (homework :-) => please update your personal pages in the wiki and on connect.o.o
monitoring => done, see below
log analysis => discuss deeper during next meeting
mediawiki
we need to check with Klaas if he wants to maintain hermes AI: tbro
key management and access control => see FreeIPA
next meeting => regular meeting dates/time
key signing party
internal key signing party within all the heroes
DNSSEC and more use of IPv6 => next meeting (CloudFlare as starting point)
reinstall all openSUSE machines every 6 months from scratch?
connect.opensuse.org => we have been blocked by the board for more than 12 months now. What to do with that?

Agenda

Introduction round

Server room tour

SaltStack training

meanwhile lrupp updated the progress opensuse-admin wiki page a lot

Ticket wrangling

AI Theo: Migrate the documentation from the redmine wiki to the public one
AI Theo: The opensuse-admin wiki will be migrated to a private subproject (right now it is public and contains sensitive data)
AI Theo: The provo project will be moved to opensuse-admin to proper categories (after confirmation (or no objections) from the heroes ml)
See these to get more information:
- https://en.wikipedia.org/wiki/Comparison_of_issue-tracking_systems
- https://en.wikipedia.org/wiki/Comparison_of_help_desk_issue_tracking_software
Lars and Darix will provide testing instances of:
- Darix: Zammad is recommended => https://zammad.com/
- Lars: Request Tracker => https://bestpractical.com/request-tracker
- cmueller OTRS => https://www.otrs.com/
- tbro BRIMIR => https://getbrimir.com/

Documenting our infrastructure

admin / infrastructure policy: https://en.opensuse.org/openSUSE:Infrastructure_policy
We are going for a different root password per host
The default user authentication method will be the user's ssh key via a normal user account (via FreeIPA)
We are going to use password-store as the tool to store the passwords
We need to implement a wiki page with Machinelist and appropriate people per service (the page needs to be moved from the internal SUSE redmine wiki)
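
A minimal sketch of how one distinct root password per host could be kept in password-store. It only builds the `pass generate` command lines and does not run them. The entry layout `hosts/<name>/root`, the host names and the password length are assumptions for illustration, not an agreed convention.

```python
import shlex

# Hypothetical host names; the real machine list lives elsewhere.
HOSTS = ["host1.example.org", "host2.example.org"]


def pass_entry(host: str) -> str:
    """Return the (assumed) password-store entry name for a host's root password."""
    return f"hosts/{host}/root"


def generate_command(host: str, length: int = 32) -> list:
    """Build a `pass generate` call that creates a random password for one host."""
    return ["pass", "generate", pass_entry(host), str(length)]


if __name__ == "__main__":
    for host in HOSTS:
        # Print instead of executing, so the sketch has no side effects.
        print(shlex.join(generate_command(host)))
```

Because every host gets its own entry, a leaked password for one machine does not give access to any other.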

Contact persons and responsibilities

Instead of trying to provide a static list of services at https://en.opensuse.org/index.php?title=openSUSE:Services_help we will take an automatic approach: we list the services that are in the monitoring setup on http://status.opensuse.org/ and point people to it. This will answer the question about the list of machines for our customers.
Our team page was improved: https://en.opensuse.org/openSUSE:Heroes (openSUSE:infrastructure and openSUSE:Services_team redirect there). We removed deprecated info, updated accordingly, and put video presentations of past oSC.

Who is still using static.opensuse.org

Sponsoring update

There is an offer of a free CDN for opensuse.org by the UK based IT company CDN77 - we should try it out and send them feedback. Contact person is Oskar Gottlieb <oskar.gottlieb@xxxxxxxxx>. AI: darix, theo to do some testing
We are currently in contact with some external companies who want to sponsor hardware for the build service. Our problem here: we need uniform hardware to make administration easier. We can provide specs of our requirements and they want to buy the hardware for us. Our main problem might be that there is still no foundation, so companies/people cannot get a donation receipt from us: they really spend the money/hardware without getting anything back from us.
- The build service always needs build power aka machines that act as build workers
- we might also think about some test systems that can be used for testing service deployments (that can be deployed in the openSUSE Cloud later)
- openQA got some new hardware recently, but is also always searching for more "workers"
- new switches or other small infrastructure hardware might also be an option
- additional storage is also a good idea, but this starts around 7k EUR and has an open end. Problem here: we like to get compatible hardware that might be used as JBOD for example
- As future idea: ask our local vendors for a "donation pool", which includes some pre-configured machines that can be paid by sponsors
- AI: SUSE-IT team (especially cmueller and tbro) to provide a list of hardware that is planned for openSUSE as a wishlist
- AI: cmueller to join the donation@o.o list to answer quickly on questions around hardware sponsoring

Packages and Kiwi images

We have a specific project called openSUSE:Infrastructure on build.opensuse.org, which contains packages that are not included in the base system. The packages should be linkpac'd from Factory if applicable, otherwise from the devel repo, with the revision locked.
In a sub-project there we build special JeOS images that can be used in the openSUSE cluster in NUE at the moment. Theo is preparing new images for the openSUSE Cloud in Provo. In case of doubt: this will be 42.2 images. Notes:
- we need to use a newer kiwi than the one that is shipped with 42.2, that is why we added the Virtualization:Appliances:Builder project
- the boot code did not change from 42.1 to 42.2 - so we use oemboot/suse-leap42.1 as boot code
- the image is very minimal - it only contains a base system (so you can run zypper), network tools and a salt-minion. It is missing cloud specific packages/tools, which need to be added for the openSUSE Cloud
- Provisioning is not automated: this means that the Salt key is not automatically integrated/accepted. So this will stay manual work for now unless someone wants to help Theo with it. As Cloud has some automatism in this area ("cloud init"), this might be an easy step - but needs someone to work on it. cmueller will join with Theo to get this fixed.

Infrastructure repository

The openSUSE:Infrastructure repository is used as "production" repository. So there is no room for testing packages inside.
If you want to test packages or prepare an update, please do this in any other repository and submit (or update the linkref of) the package in the Infrastructure repository once you are done
For testing and production, we might use the Salt feature to lock a package on a specific version.
If you have more than 2 source packages for a machine/service that need to go in the Infrastructure repository, use a sub-project for this. The name of the subproject AND the description should note and describe the use case of the packages in this repository (include the DNS entry of the machine in case of doubt).
Please consider moving/integrating your packages into the main distribution to make our life easier and to keep the infrastructure repository small.
If there is a need to update packages that are in the official openSUSE repository, we will NOT put the updated version in the core/main Infrastructure project. Instead: these packages need to end up in a separate sub-project below the Infrastructure project - so only the machines that need those separate packages can add the repository.
The Heroes team will review the packages in the Infrastructure repository (including sub-projects) every 6 months and clean up (including a check for orphaned packages on the machines).
TODO: enhance the current policy with the things listed above
A lot of effort is spent these days by Theo to move existing VMs to Leap and to move packages to the openSUSE:infrastructure repository.
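
The Salt package lock mentioned in this section could look roughly like the following state, written with Salt's Python (`#!py`) renderer. `pkg.installed` with `version` and `hold` is one way to pin a package; the package name and version string are placeholders, and the minutes do not decide which mechanism will actually be used.

```python
#!py
# Sketch of a Salt state using the Python renderer.
# Package name and version below are placeholders, not real deployment data.

PINNED_PACKAGES = {
    "example-package": "1.2.3-1.1",
}


def run():
    """Return state data that installs each package at a fixed version and holds it."""
    states = {}
    for name, version in PINNED_PACKAGES.items():
        states[f"pin_{name}"] = {
            "pkg.installed": [
                {"name": name},
                {"version": version},
                {"hold": True},  # keep the package manager from upgrading it
            ]
        }
    return states
```

A test machine and a production machine could then carry different entries in the version map while still using the same repository.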

FreeIPA demonstration by darix

the plan is to implement freeIPA at least for the internal DNS and account management
administrators will get access to a machine based on LDAP groups (and kerberos tickets), which makes it easier to track who is who and who gets access to which machine
As the FreeIPA account will not be connected to the general authentication mechanism we use for openSUSE, only openSUSE Administrators will get a freeIPA account.
Administrators will be able to access the FreeIPA instances via OpenVPN (we need policy regarding access)
gitlab.opensuse.org (and probably also other services provided just for administrators) will also switch over from local authentication or from the openSUSE (bugzilla) account to freeIPA
The openSUSE (bugzilla) and the FreeIPA usernames will need to match - this should become a policy ;-)
there will be two freeIPA instances for high availability and access reasons:
- one instance will run in Nuremberg on the normal cluster
- one instance will run on an Admin node in Provo (outside the Cloud, to be able to reach the Cloud DMZ network)
- these instances will be connected to each other through an OpenVPN tunnel
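
The idea of granting machine access through groups can be illustrated with a plain Python model. This is not the FreeIPA API, only a sketch of the decision it makes; all user, group and host names are made up.

```python
# Plain-Python model of group-based access; this is not the FreeIPA API.
# User, group, and host names are made up for illustration.

USER_GROUPS = {
    "alice": {"wiki-admins"},
    "bob": {"mirror-admins", "wiki-admins"},
}

HOST_ALLOWED_GROUPS = {
    "wiki.example.org": {"wiki-admins"},
    "mirror.example.org": {"mirror-admins"},
}


def may_login(user: str, host: str) -> bool:
    """A user may log in if they share at least one group with the host's allow list."""
    user_groups = USER_GROUPS.get(user, set())
    allowed = HOST_ALLOWED_GROUPS.get(host, set())
    return bool(user_groups & allowed)


def who_has_access(host: str) -> list:
    """List all users that can reach a host, sorted for stable output."""
    return sorted(u for u in USER_GROUPS if may_login(u, host))
```

Answering "who gets access to which machine" then becomes a lookup over group membership instead of a search through per-host account lists.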

Cloud training by cmueller

Cloud tries to hide its functionality behind some fancy names so that normal admins do not understand what is going on. But we are so smooth that we understand both sides after this training. :-)
Hardware nodes have different names and functionality:
- Crowbar => the admin node, used only for managing bare metal stuff
- Controller => they run services that control the cloud service itself
- Network => special controller nodes, that have an additional network functionality to provide routing functionality for the virtual machines (the "instances" in cloud speech)
- Storage => providing only raw block devices that can be used by instances
- Compute => hosts that really run the virtual machines ("instances" - as said above)
- Instances => that's what you really want: the virtual machines that provide your services
The openSUSE Cloud in Provo consists of the following bare metal machines:
- 1x Admin node => will run the following VMs:
  - Crowbar (also running a Chef server to manage the other Cloud machines)
  - openVPN for Cloud Admins
  - FreeIPA, including an openVPN setup for Heroes (so they can access their VMs)
  - Logging server
  - Monitoring server
  - SMT server
- 2x Controller nodes
  - will become an HA cluster running all needed cloud services
- 3x Compute nodes
  - will also be network controllers
- 2x 10Gb switches
- 1x 1Gb switch
Place the Image with the overview here
Network setup:
- fixed/intern network : this network is statically linked to an instance; all instances inside this network can "talk" to each other
- float network : this is our "provider" network (the one with external IPs). No Cloud instance has direct access to an IP of this network - only via NAT on the network service side
- transport/SDN network : we use this network for inter-compute communication between our instances. If an instance on compute node 1 wants to "talk" to an instance on compute node 2, they will use this network. This network allows more fine tuning (therefore the name SDN) but for the moment, this is not used.
- DMZ : the DMZ network provides a web interface for "customers" who want to manage their machines (start/stop/reset, get console access, setup new machines). We will not expose this DMZ network to the outside, but instead provide a dedicated VPN server that allows the openSUSE Heroes to connect to it to reach the WebUI (which should not be needed very often).

Salt topology

We will set up separate Syndics in each location, and a Master of Masters in the near future in Nuremberg
The codebase will be the same for all locations (the same git repository will be used for both the states and pillars)
HA setup is currently secondary
The salt masters will use the same OpenVPN tunnel as FreeIPA to communicate with each other
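
A rough sketch of the master config fragments this topology implies. `order_masters` and `syndic_master` are Salt master options; the host name and the rendering helper are placeholders for illustration only.

```python
# Sketch: derive the master config fragments for the planned topology.
# The host name is a placeholder. `order_masters` and `syndic_master` are
# Salt master options; everything else here is illustrative.

MASTER_OF_MASTERS = "mom.example.org"  # planned for Nuremberg
SYNDIC_LOCATIONS = ["nuremberg", "provo"]


def master_of_masters_config() -> dict:
    """The top-level master must be told that syndics sit below it."""
    return {"order_masters": True}


def syndic_config(location: str) -> dict:
    """The master in each location runs a syndic that points at the master of masters."""
    if location not in SYNDIC_LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    return {"syndic_master": MASTER_OF_MASTERS}


def render(config: dict) -> str:
    """Render a flat config dict as simple `key: value` lines."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print(render(master_of_masters_config()))
    for location in SYNDIC_LOCATIONS:
        print(f"# {location}")
        print(render(syndic_config(location)))
```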

Monitoring

there will be a non-public monitoring installation in the future
we will forward community people to http://status.opensuse.org/ to get the "user overview" of our services
we need to find a solution for the instances in NUE and PRV: they are running in internal, not-routed, firewalled networks - but we want to provide one unique WebUI for our Admins to get "the big picture" at once. => to be checked/implemented by the monitoring admins

Mirrors and the mirror infrastructure

In general, we have the following machines behind "download.opensuse.org":
- mirrordb{3,4} => a PostgreSQL cluster containing the database (85GB size)
- pontifex3 => the VM behind download.opensuse.org, using mirrorbrain (and providing a lot of other vhosts, btw)
- scanner-opensuse => a VM that is constantly scanning the external mirror servers. Currently this VM is inside the SUSE network and should be moved to the external cluster. Problem might be, that the external cluster uses another IP address, so scans might fail on some mirrors who only allow the current IP address. But this is fixable.
https://github.com/openSUSE/mirrorpinky should become a WebUI for mirrorbrain - but the current status is "sleeping" (no time to work on it). If you find someone who wants to help out, ping us ;-)
AI Theo: We need to set up a virtual machine as the mirrorbrain management machine, so that community people can also do mirror administration
The same machine will serve as a secondary scanner machine (the main one being behind the SUSE network now). Mirror admins will be notified for the new IP, and after six months we will shut down the old scanner.
Mirror related tickets need some love
The mirror documentation might need updating as well; it would be nice if our mirror experts could go through it.
- https://en.opensuse.org/openSUSE:Mirror_infrastructure for mirror admins
- https://progress.opensuse.org/projects/opensuse-admin for mirrorbrain admins

Discussion with rbrown (openSUSE Board)

progress.opensuse.org => openSUSE Heroes will become Admins on the redmine instance there to be able to help with the general administration
Sponsoring => the board will redirect sponsoring offers around hardware or mirroring to donations@xxxxxxxxxxxx, where our admins can take over. Please note that we want to "loan" the hardware and not get it completely handed over to us.
connect.opensuse.org => cleanup of the openSUSE members should be done until the new board election early next year. So we can start with the Email migration to Heinlein once this is finished.
- What to do with the application itself? We currently have no maintainer for the application - but on the other hand the version in use is very old and needs to be updated. As this tool contains the user database (especially the membership information), it is a critical part of the infrastructure.
- As the maintainer is not acting on it any more, we need to find a solution. We need a meeting between the openSUSE board, the admin and some members of the Heroes.
freeIPA => JFYI: for the administration of the openSUSE infrastructure, the Heroes will introduce and use FreeIPA for managing a couple of things that are currently distributed in terms of management (User accounts and access restrictions, certificates and probably also DNS).
What about separating openSUSE user accounts from the current Microfocus / Novell / SUSE authentication mechanism? A benefit would be that we might see more users/contributors as it would become more clear to them who is getting the data (requested once you create an account) and who is using it. That is currently a major point why this question here comes up over and over again. We see problems with tools like bugzilla that should be able to use other authentication systems than the current ones in that case. There is a high risk of duplicated log in data, if other authentication systems come into play. We might hit other problems in the authentication area and it will take probably a long time to get this solved. But if we can not get any better solution, we might focus on this.
When do we get a foundation? A major benefit from our point was the sponsoring, that would be possible with a foundation. This is now partly solved (see above). The board itself was blocked during the year with all the technical changes (Tumbleweed, Leap, etc.) inside the distribution and does not see this question as critical (that's a main reason why this was not driven with high priority). Other arguments are legal and generic problems coming with such a foundation - which will introduce a lot of additional organization overhead. In the end: the main reason behind this question (easy sponsoring) is somehow solved meanwhile, so we hopefully only need to do a good marketing about "how to sponsor openSUSE" now and can finally get rid of the question....

opensuse.org services running on non-infra managed machines

planet and paste run on non-infra managed machines, which violates our policy. Furthermore, people ask us to fix tickets on services we can't reach.
- planet.o.o needs a meeting to discuss the current legal issues that might be there (or not anymore).
- Same for paste.o.o
we need a meeting with the board to talk about: paste.o.o, planet and connect AI: Theo to set this up

Team meetings

next in-person meeting will take place during the openSUSE Conference 2017 in May (in Nuremberg)
IRC meetings should take place once a month, on the first Sunday at 19:00 CET (18:00 UTC); Topics:
- ticket wrangling
- status round:
- special projects (like the wiki migration)
- changes since last meeting
- being available for questions from others
- We will take meeting minutes after every meeting, rotating the role of the meeting taker alphabetically based on username
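
The meeting dates and the rotation of the minute taker can be computed with a few lines of Python. This is a sketch: the username list is taken from names used in these minutes and is not a complete roster, and the times are fixed at 18:00 UTC.

```python
from datetime import date, datetime, time, timedelta, timezone
from itertools import cycle

# Usernames as they appear in these minutes; the list is illustrative,
# not a complete roster.
HEROES = ["tampakrap", "lrupp", "cboltz", "darix", "cmueller", "tbro"]


def first_sunday(year: int, month: int) -> date:
    """Return the first Sunday of the given month."""
    first = date(year, month, 1)
    # weekday(): Monday is 0, Sunday is 6.
    return first + timedelta(days=(6 - first.weekday()) % 7)


def meeting_start(year: int, month: int) -> datetime:
    """Meeting start as an aware datetime at 18:00 UTC (19:00 CET)."""
    return datetime.combine(first_sunday(year, month), time(18, 0), tzinfo=timezone.utc)


def schedule(year: int, months) -> list:
    """Pair each meeting with a minute taker, rotating alphabetically by username."""
    takers = cycle(sorted(HEROES))
    return [(meeting_start(year, m), next(takers)) for m in months]


if __name__ == "__main__":
    for start, taker in schedule(2017, range(1, 7)):
        print(start.isoformat(), taker)
```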

mediawiki

the *.opensuse.org wikis run an outdated version of MediaWiki
only the admins in Provo have direct access to the wiki servers, and don't provide the response time and quality we'd like to see ;-)
we'll move the wiki to the upcoming openSUSE Cloud so that we get direct access and can (and have to ;-) maintain it
cboltz is working on the wiki update (to 1.27.x LTS). Things that will change (mostly because of changes in upstream MediaWiki):
- the MediaWiki authentication framework changed completely, so we'll switch to openID (which is available as extension) instead of rewriting our AccessManager extension
- the openSUSE theme ("Bento") needs to be updated because of MediaWiki changes (status: works, but some code beautifying would be nice - help welcome)
- the fixed width theme is only used by a few users and will be dropped to ease maintenance
- the MWSearch extension is no longer maintained and will be replaced by https://www.mediawiki.org/wiki/Extension:CirrusSearch
- as a result, search backend will switch from Lucene to ElasticSearch
- the "this page has been accessed X times" counter is no longer part of MediaWiki core. It's available as an extension, which we'll install because the counter is cool ;-)
- CategoryWatch is reported to not work with the new MediaWiki version. The replacement is probably https://www.mediawiki.org/wiki/Manual:CategoryMembershipChanges (not tested yet)
cboltz started packaging MediaWiki and some extensions in home:cboltz:infra
AI cboltz to do a MediaWiki 1.27.x test setup on a VM on atreju (including salt'ing the base system - help on this is welcome ;-)
we have lots of wiki pages that are outdated and need an update
we'll create a "needs review" category and tag _all_ pages with this category (using ReplaceText). When reviewing a page:

a)

- update the page if needed
- if a page is only of historical interest, move it to the Archive: namespace
- delete completely superfluous pages

b)

- remove the "needs review" category
- for version-specific pages, add an "update for 42.3" category that serves as TODO list when 42.3 gets released (after updating those pages for 42.3, move them to an "update for 43.1" category)
- for pages that don't need regular changes, add a "stable content" category (exact category names not decided yet)
page templates should include the "needs review" template by default
these categories will be used as TODO lists - in simple cases by looking at the category page, in more complex cases (like searching for not tagged pages) by using DynamicPageList's "notcategory" filter
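
The review steps a) and b) can be summarized as a small decision function. This is only a sketch of the workflow, not wiki tooling: the category names are placeholders (the exact names are not decided yet), and treating every page that is not version-specific as "stable content" is a simplifying assumption.

```python
# Sketch of the review workflow as a pure function.
# Category names are placeholders; the exact names are not decided yet.

NEEDS_REVIEW = "needs review"
VERSION_TODO = "update for 42.3"
STABLE = "stable content"


def review(categories, *, historical=False, superfluous=False, version_specific=False):
    """Return (action, new_categories) for one reviewed page.

    action is "delete", "archive", or "keep".
    """
    if superfluous:
        return "delete", set()
    # Step b): a reviewed page always loses the "needs review" tag.
    remaining = set(categories) - {NEEDS_REVIEW}
    if historical:
        return "archive", remaining
    # Simplification: pages that are not version-specific count as stable.
    remaining.add(VERSION_TODO if version_specific else STABLE)
    return "keep", remaining


def todo_list(pages: dict) -> list:
    """Pages still tagged for review, i.e. what the category page would list."""
    return sorted(title for title, cats in pages.items() if NEEDS_REVIEW in cats)
```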
old releases and less important portals should no longer be listed in the "Popular Portals" box. This can be done a) by maintaining that box manually or b) by adding the "important" portals to a special category or c) by adding the to-be-hidden portals to a special category
Christoph has a time budget for updating the wiki (unless something more important blocks him), but this doesn't mean he can do it alone
all this can start now - no need to wait for the wiki update