openSUSE:X86-64-Architecture-Levels
Problem statement
CPUs gain new instructions over time. openSUSE Tumbleweed builds the entire distribution for what is known as x86-64 (a.k.a. x86-64-v1, x86_64, baseline, AMD64). This means the compiler has to ensure that binaries run on CPUs that don't support newer features, which can result in less than optimal performance because it avoids more advanced instructions (e.g. SSE 4.2, AVX2) during optimization. To make use of those, the source code has to take special care of detecting them at runtime.
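For illustration, this is roughly what such runtime dispatch looks like with GCC's `__builtin_cpu_supports`; the `transform_*` functions are hypothetical placeholders for an optimized and a fallback code path:

```c
/* Minimal sketch of runtime CPU feature dispatch, as source code has to
 * do it today on the x86-64 baseline (GCC/Clang on x86). */
#include <stdio.h>

static void transform_avx2(void)    { puts("using the AVX2 code path"); }
static void transform_generic(void) { puts("using the baseline x86-64 code path"); }

int main(void)
{
    __builtin_cpu_init();  /* initialize the compiler's CPU model detection */
    if (__builtin_cpu_supports("avx2"))
        transform_avx2();
    else
        transform_generic();
    return 0;
}
```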
There are several approaches: some are possible already now, and others we propose for implementation.
Available option: Build everything at x86-64-v2
Simply bump the minimal hardware requirements of Tumbleweed to require an x86-64-v2 capable CPU and build all packages for that new baseline, keeping x86_64 as the RPM architecture. This means that older hardware is no longer supported by those packages. To work around that, another full build of the distribution with separate repositories, etc. would be required.
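For illustration, compilers that understand the architecture levels (GCC >= 11 and Clang >= 12 accept `-march=x86-64-v2`) then treat the newer instructions as unconditionally available, so no runtime dispatch is needed; a minimal sketch:

```c
/* Sketch: built with -march=x86-64-v2, the compiler may emit SSE 4.2 or
 * POPCNT instructions anywhere without runtime checks; the resulting
 * binary simply crashes with SIGILL on pre-v2 CPUs. */
#include <stdio.h>

int main(void)
{
#ifdef __SSE4_2__
    puts("SSE 4.2 is part of the compile-time baseline");
#else
    puts("compiled for the x86-64(-v1) baseline");
#endif
    /* Under -march=x86-64-v2 this can compile to a single POPCNT
     * instruction; at the -v1 baseline a slower fallback is used. */
    printf("popcount(255) = %d\n", __builtin_popcount(255));
    return 0;
}
```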
Pros
- Simplest option to implement, mix and match with 3rd party x86_64.rpm packages possible
- Only one set of binaries and libraries to deliver
- Good compatibility with existing openQA and openSUSE Build Service build worker hardware
- Can optionally be combined with the other options, for example additionally providing a hwcaps -v4 overlay for specific libraries
Cons
- x86-64-v2 brings only marginal benefit, if any, across the distribution as a whole. While some specific applications benefit a lot, those usually already have dedicated handling today that is hot-patched depending on runtime CPU capabilities. In other areas no significant benefits are expected.
- Slight code size increase of the distribution
- Can break upstream assumptions and requires patching of various packages (several upstream packages that do their own CPU selection and build multiple variants do not handle the case where the "base" variant, which their code only selects as a legacy fallback, is no longer actually compatible with legacy CPUs, breaking test suites in some cases)
- Based on community feedback, cutting support for older x86-64 hardware is controversial, and the mitigation of providing a 'LegacyX86' distribution is resource-intensive (human and build power)
Available option: Utilize glibc hwcaps directories
Glibc supports hwcaps since version 2.33. This allows the base system to be built for x86-64 (= x86-64-v1) while libraries additionally built for higher levels are installed into specific directories like /usr/lib64/glibc-hwcaps/x86-64-v[2,3,4]. Glibc will then load the appropriate library, i.e. the one in the x86-64-vX subdirectory if applicable to the system, falling back through the levels down to /usr/lib64 for the baseline.
The default architecture level would not change. Either for a hand-selected set of libraries or for all libraries that are `baselibs.conf` enabled, we could build the x86-64-v3 variant in a separate repository. These packages would not carry the normal shared-library RPM dependency provides but only *supplement* the x86-64-v1 package. For normal dependency resolution the x86-64-v1 package would therefore get installed, providing maximum compatibility with 3rd party software.
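The mechanism is fully transparent to applications; a minimal sketch, assuming a hypothetical -v3 overlay package for libzstd is installed (`libzstd.so.1` is just an example soname):

```c
/* The program simply requests a library by soname; the glibc loader
 * transparently picks the most optimized variant the CPU supports,
 * e.g. /usr/lib64/glibc-hwcaps/x86-64-v3/libzstd.so.1 on a -v3 capable
 * machine, or the baseline /usr/lib64/libzstd.so.1 otherwise. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("libzstd.so.1", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    struct link_map *map = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0)
        printf("loaded: %s\n", map->l_name);  /* path actually mapped */
    dlclose(handle);
    return 0;
}
```

On glibc >= 2.33, running `/usr/lib64/ld-linux-x86-64.so.2 --help` also prints which glibc-hwcaps subdirectories are searched on the running system.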
Pros
- Can provide higher levels (-v3, -v4) for all shared libraries or a subset of them in a very flexible fashion; the build and testing resource impact can be chosen flexibly.
- Single installation media/path is possible as the main architecture requirements do not change. Speedups are automatically applied when the additional, co-installable packages get installed, without breaking compatibility. This is important, for example, in cloud-native containerized environments where a container can be migrated between hardware of different CPU generations.
- No code size or base architecture level bump
- Maximum compatibility with existing openQA and openSUSE build hardware, as well as the community's "good enough for Linux" hardware
- Does not require intensive source modifications of the packages; it is a rebuild with a different %_libdir setting and baselibs-style automatic repackaging as .x86_64.rpm with a new package name (example: `libzstd1-64bitv3.x86_64.rpm`).
- Co-installable shared libraries
- Largely compatible with existing spec files, as no new architecture detection needs to be implemented (the existing architecture and main architecture remain the same as before; only the libraries that are optimized, which can be selected individually after testing or patching of upstream, can be enabled or disabled at any point in time).
- No system reinstall needed to optimize up or down on microarchitecture level. Optimizing down happens automatically (fail-safe behavior); optimizing up requires installation of an extra pattern of packages.
Cons
- Only shared libraries are optimized for higher architecture levels. Statically linked binaries remain x86-64-v1 (which is also a feature, as it maximizes interoperability)
- Can not generate a -v3-only distribution, only -v1 plus one or more -vX levels higher than 1, which increases disk space requirements for newer hardware that wants to leverage the optimized builds (tradeoff: small installation size with more compatible but less optimized libraries, or larger installation size with more optimized libraries)
- Potentially requires source package patching when there are hardcoded assumptions about `%_libdir == %_prefix/%_lib` (which is not the case for the -v3 hwcaps build, where %_libdir is `%_prefix/%_lib/glibc-hwcaps/x86-64-v3`).
- Potentially causes version drift: when the "regular" shared-library package is in a newer or older version than the hwcaps overlay, this could cause interoperability issues. Maybe fixable with an extra conflict on `sharedlibpackage != %{version}`.
- Requires additional weakobsoletes handling for the hwcaps libraries to avoid leftover shared libraries
Possible option: Separate RPM architecture
Just like for 32bit ARM (armv5, v6, v7) or x86 (i586, i686), we propose to use sub-architectures for x86-64. This would then allow us to build a multi-arch repository (say x86-64(-v1) plus x86-64-v3) and publish this. During package installation, the package with the highest level supported by the system is picked.
Pros
- "Clean" solution, allows installation of -v3 or -v4 only systems without having to install additional binaries for other architecture levels
- Supports optimisation beyond shared libraries as well (static binaries, ...)
Cons
- Single installation media not possible
- Requires patching of thousands of software packages (example: `%ifarch x86_64` -> `%ifarch %{x86_64}`) plus patching every place in upstream code where it is assumed that `$(uname -m)` == rpmarch.
- In order to gain the benefits listed under Pros, full builds of Tumbleweed for each of the architectures are required, which means doubling or tripling the needed build resources (disk, compute, and network transfer)
- No mix-and-match possible because the packages are not co-installable
- Requires agreement with RPM upstream on the approach; a previous attempt (https://github.com/rpm-software-management/rpm/pull/1035) has been rejected.
- Requires implementation of arch fallback in various software management and monitoring software (3rd party software / ISV software)
- CPU features can change (for example when a disk is moved from one computer to another, or a container is restarted on a different Kubernetes node) without the package manager having any control or influence over this, breaking interoperability in those cases
- Heterogeneous CPU designs (for example efficiency and performance cores that have different microarchitecture levels) are not supported by this proposal.
Components that require adjustments for this
- RPM (add new subarch with compatibility marks)
- Zypper (libzypp/libsolv, to work with the above)
- OBS (to recognize the new architecture)
- Container engines (to publish -v1/-vX images next to each other)
Possible option: Making small steps
Compile code for a feature set that is a superset of x86-64-v1 and a subset of x86-64-v2. Additional levels:
- x86-64-v1a = x86-64-v1 + SSE3 (see Bug 1182220)
- x86-64-v1b = x86-64-v1 + SSE3 + CMPXCHG16B
- x86-64-v1c = x86-64-v1 + SSE3 + CMPXCHG16B + LAHF/SAHF
- x86-64-v1d = x86-64-v1 + SSE3 + CMPXCHG16B + LAHF/SAHF + POPCNT
It looks like all x86-64 CPUs with 2 or more cores support SSE3. That means we can gain some speedup without losing compatibility.
x86-64-v1d is supported by Intel Nehalem (2008) and newer, where POPCNT belongs to SSE 4.2. For AMD it is supported by the K10 architecture (2007; AMD Family 10h, Barcelona, Phenom and derivatives) and newer, where POPCNT belongs to ABM (Advanced Bit Manipulation), part of SSE4a, or to SSE 4.2 if SSE4a is not available.
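For illustration, the proposed feature set can be probed directly via CPUID; a minimal sketch (the x86-64-v1d level name itself is only proposed here, not an established target):

```c
/* Check whether the running CPU meets the proposed x86-64-v1d level:
 * x86-64-v1 + SSE3 + CMPXCHG16B + LAHF/SAHF + POPCNT. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1, ECX: bit 0 = SSE3, bit 13 = CMPXCHG16B, bit 23 = POPCNT */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;
    int sse3   = !!(ecx & (1u << 0));
    int cx16   = !!(ecx & (1u << 13));
    int popcnt = !!(ecx & (1u << 23));

    /* CPUID leaf 0x80000001, ECX bit 0 = LAHF/SAHF usable in 64-bit mode */
    int lahf = 0;
    if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        lahf = !!(ecx & (1u << 0));

    printf("x86-64-v1d capable: %s\n",
           (sse3 && cx16 && popcnt && lahf) ? "yes" : "no");
    return 0;
}
```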
Pros
- Compatibility with nearly all existing x86-64 hardware is retained
Cons
- Only small performance advantages