openSUSE:Ruby Gem Strategies
Strategies for packaging Ruby gems on openSUSE
There are at least four types of approach:
- Package each gem inside its own rpm
- Bundle all gems required for an application into a single rpm
- Extend zypper to support installing gems directly
- Don't do any packaging, just use bundler
Each one is considered in detail below.
Deciding which approach to take is a surprisingly difficult decision. This document does not (yet) represent a complete understanding of the topic since it was not possible to involve all experts in the area. Nor is it even at this point a set of guidelines or recommendations for which strategy to adopt. However, hopefully it should help Ruby application architects make an informed decision about which approach best suits their project.
If you see any gaps or inaccuracies, please edit the page! There are probably other approaches too; if you are aware of them, please add them to this page.
One gem per rpm
The implementation details for this approach are documented in openSUSE:Packaging Ruby.
Applications using this approach
- SUSE Cloud (currently)
- SLMS and WebYaST (SUSE employees can see an internal presentation on the use of gems in SLMS and WebYaST)
- Sascha's arguments in favour of one gem per rpm
- effort shared across community
- security update by one group benefits other groups
- sysadmins can update via zypper up
- .spec files capture non-gem dependencies
- smaller footprint
- easy to add things (patches/ init script)
- licenses used are explicitly listed in the rpm
- SUSE has lots of internal infrastructure and workflows which are based on rpms
- Even with gem2rpm, it's quite a lot of work (see openSUSE:Packaging Ruby) - but most of the hard work has already been done. There is still some overhead to updating gem versions, but it's not too bad, and could maybe be automated further if necessary.
- Long turn-around times in build service
- Unresolved issues with handling different ruby versions on the same system?
- A change needed for one project can break another
- gem versions not guaranteed consistent across all environments
- "-1_2" package name suffixing has issues
- zypper doesn't support multiple versions of the same gem concurrently installed by default. For example, if SLMS required Rails 2.3.9, and WebYaST on the same machine required Rails 2.3.11, it would not be possible to have both rubygem-rails-2_3-2.3.9 and rubygem-rails-2_3-2.3.11 installed at the same time.
- This could be solved by enabling multi-version in the zypp config, which has some advantages over Name: suffixing, and would allow the suffix to be dropped from Name:, but not from the package directory name in the BS project.
- However, with this approach it is still unresolved how to prevent old versions of gems from accumulating on a system.
- Another solution would be simply to make a new rubygem-rails-2_3_9 package. Since this is a rare corner case, this might be simpler.
- There is no reliable way to determine whether anything still depends on the old version (e.g. via a ~> 1.2.5 requirement). So it is impossible to know whether it is sufficient to simply upgrade rubygem-foo from 1.2.x to 1.3.x, or whether a new rubygem-foo-1_2 is also required. But that is fairly unlikely to impact anyone except for SUSE product developers, and in theory we could build a script which checks dependencies from SUSE Cloud / SLMS / WebYaST / OBS on devel:languages:ruby:extensions and reports any no longer used versions. This could even be run automatically by Jenkins.
- linkpac is a weak way of expressing these specific version-oriented dependencies between OBS projects. But that is fairly unlikely to impact anyone except for SUSE product developers.
- Getting multiple products to agree on a single version of a gem is almost impossible. Thus, unless you rely on suffixes ("-1_2"), or are less strict about the version you need (>= X.Y instead of = X.Y), you will run into a lot of conflicts.
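The dependency check suggested in the list above could start from something like the following Ruby sketch. It relies on RubyGems' own Gem::Version and Gem::Requirement classes and assumes the requirements of every consuming project can be collected in one place, which is the hard part in practice. The package list, the consumer names and the unused_versions helper are all invented for illustration.

```ruby
require "rubygems"

# Hypothetical data: gem versions currently packaged in the OBS project,
# and the requirements each consuming application declares.
packaged = { "foo" => %w[0.9.0 1.2.7 1.3.1] }
consumers = {
  "app-a" => { "foo" => "~> 1.2.5" },
  "app-b" => { "foo" => ">= 1.2" },
}

# Return the packaged versions that no consumer needs. A version counts
# as needed if it is the newest packaged version satisfying at least one
# consumer's requirement.
def unused_versions(packaged, consumers)
  packaged.each_with_object({}) do |(name, versions), report|
    available = versions.map { |v| Gem::Version.new(v) }.sort
    needed = consumers.values.map do |reqs|
      next unless reqs.key?(name)
      requirement = Gem::Requirement.new(reqs[name])
      available.reverse.find { |v| requirement.satisfied_by?(v) }
    end.compact
    unused = available - needed
    report[name] = unused.map(&:to_s) unless unused.empty?
  end
end

unused_versions(packaged, consumers).each do |name, versions|
  puts "#{name}: #{versions.join(', ')}"   # prints "foo: 0.9.0"
end
```

In this made-up data, app-a pins the 1.2 series and app-b takes the newest release, so both rubygem-foo-1_2 and rubygem-foo would still be required and only 0.9.0 is reported.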
Bundle all gems into one rpm
This approach typically involves checking Gemfile.lock into the source tree - see the section below for more information on handling Gemfile.lock.
Applications using this approach
- HAWK (bundle during build time, from RPMs. This works on SLES and openSUSE, but doesn't work for Fedora builds on OBS, where for some reason bundler doesn't see the gems already installed as RPMs)
- SUSE Studio Online and SUSE Studio Onsite
- Jordi's analysis on frequency / cost of gem updates (SUSE internal-only)
- https://github.com/SUSE/studio/wiki/Packaging (SUSE internal-only)
- Online installs gems system-wide using gem (no rpms)
- Onsite bundles into app (before submission time, using the native gems via bundle package)
- guaranteed reproducible and consistent results across environments
- independence from other teams / projects / processes / workflows
- bundler does all the hard work
- consistent with upstream ruby community, doesn't reinvent wheels
- does not fit with the standard security and maintenance workflow
- one big rpm - slow to rebuild and install
- security/bugfix/feature updates are a PITA
- Gemfile.lock has to be updated each time a gem is updated
- a separate update is required for each Rails app
- Dirk Müller says this makes problem isolation hard (SUSE internal-only)
- no way to figure out which rubygem version change caused breakage
- non-gem dependencies are all mixed up together
One gem per RPM plus bundle all gems into one RPM
This is a mix of the two previous ones, in order to take the advantages from each one. The key is to use one gem per RPM but use it during build time of the application RPM in order to install it in a vendor directory.
In short, this means adding "BuildRequires: GEM" to the spec file of the application RPM. Then, in the build section, bundler is used to install this gem into a vendor directory.
This gives us the advantages of using RPMs, in terms of maintainability, but at the same time prevents us from having to maintain rubygem RPMs for the general case, as they are only build time requirements and so they don't get installed in /usr/lib... in the system but in some hidden vendor directory.
As an example, see velum:
External analysis / comparisons of the two approaches
- key posts on SUSE's ruby-devel mailing list (SUSE internal-only, sorry)
Extend zypper to support installing gems directly
Status update from Duncan
The project is stuck because we need a full index of gems and dependencies. libsolv needs to load the "world" into memory, but the Rubygems API only allows querying gems one by one via HTTP. We used a gzipped serialized dump of the repository that is available at the root of rubygems and wrote a solv converter, but this index (as we learned later) is not being updated or maintained.
I talked to the rubygems.org guys and they did not care. We would need to build our own index in solv format. It is not that hard (rubygems has hooks).
The next showstopper is gems that require -devel packages, as they are built at install time.
Maybe coolo's approach of building all gems in the build service automatically is a better one. Our project is still worth exploring for other things: install tarballs in opt, Java, etc.
- No packaging is required
- Very simple to use
- Not yet ready for production usage.
- Relies on an external service, but things can disappear, or be silently changed outside our control
- Makes it hard to make changes or fix bugs
- Makes it hard to track licensing
Don't do any packaging, just use bundler
Applications using this approach
bundler is what the majority of the Ruby community uses these days.
- The well-trodden, well-tested path which most of the Ruby community uses.
- Simple to understand and use.
- Each application can be updated independently.
- Has no standard mechanism for packaging / deployment / updates.
- Heavier footprint when multiple applications run on a single machine.
- Each application has to be updated independently.
- Is dependent on rubygems.org which is a potential SPoF since it's not guaranteed to be 100% available or secure.
Bundler and Gemfile.lock
Gemfile.lock is generally required for Rails apps to run (at least, in Rails 3.x onwards).
It is possible to make a Rails app run without Bundler - and thus without Gemfile.lock - by tweaking config/boot.rb as described in this blog post, but this is not how things are usually done.
The Bundler guarantee
From the bundle-install(1) man page:
Bundler offers a rock-solid guarantee that the third-party code you are running in development and testing is also the third-party code you are running in production. You can choose to exclude some of that code in different environments, but you will never be caught flat-footed by different versions of third-party code being used in different environments.
This guarantee works by using a single Gemfile.lock across all environments. Therefore all gem groups (e.g. test) are always included in dependency calculations, even if you run bundle install with the --without=... option. For example, bundle install --without=test will skip installation of gems in the test group, but those gems can still have an impact on which other gems get installed.
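To see why an excluded group can still matter, here is a toy model in Ruby. It is not Bundler's resolver; it only picks the newest version of one shared dependency that satisfies every constraint in play. The versions and constraints are invented for the example, and newest_satisfying is a hypothetical helper.

```ruby
require "rubygems"

# Invented versions of a shared dependency available in the repository.
available = %w[1.0.0 1.5.0 2.0.0].map { |v| Gem::Version.new(v) }

# Invented constraints on that dependency, keyed by Gemfile group.
constraints = {
  default: [">= 1.0"], # imposed by a runtime gem
  test:    ["< 2.0"],  # imposed by a gem that is only in the test group
}

# Pick the newest available version that satisfies the constraints of
# the given groups (a stand-in for real dependency resolution).
def newest_satisfying(available, constraints, groups)
  requirement = Gem::Requirement.new(*groups.flat_map { |g| constraints.fetch(g) })
  available.select { |v| requirement.satisfied_by?(v) }.max
end

locked   = newest_satisfying(available, constraints, [:default, :test])
isolated = newest_satisfying(available, constraints, [:default])

puts "resolved with all groups:         #{locked}"   # 1.5.0
puts "resolved with default group only: #{isolated}" # 2.0.0
```

Because Gemfile.lock is computed with every group included, a production install in this model gets 1.5.0 even though the gem imposing the "< 2.0" constraint is never installed there.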
When / how should Gemfile.lock be generated?
This is a key question.
- Single authoritative copy checked into source repo and maintained during development
- Multiple copies (one for each use case / context) checked into source repo and maintained during development
- Automatically generated at application package build-time
- Automatically generated at application run-time
- Don't use Gemfile.lock or Gemfile at run-time at all
But let's take a step back ...
When do we need Gemfile.lock?
We do dev/test/QA in a number of ways:
- from packages (dependencies can be enforced here)
- from ISOs
- directly from source
Due to the last one, we need Gemfile.lock even before we build packages! So we have to decide whether we need this Gemfile.lock to be identical to the one used in production. If the project community is split across different organizations (e.g. Crowbar is developed by Dell, SUSE, and others), then a similar decision needs to be made across the whole project ...
Convergence (option 1) vs. divergence (options 2/3/4)
The project community needs to decide whether to converge on a single version of Gemfile.lock, or to diverge and test multiple versions. This was originally discussed in the context of the Crowbar project.
Advantages of converging on a single version (option 1)
- The whole community converges on one set of gem versions, eliminating time spent resolving issues caused by different versions
- Gemfile.lock doesn't have to be generated during build process
Advantages of diverging to multiple versions (options 2/3/4)
- Different community members / product releases have different needs; multiple Linux distributions etc.
- Doesn't require packaging a new gem rpm for every change to Gemfile.lock
- Lets each member of the Crowbar community decide how often they want to update gem versions
- Those who update more aggressively can alert others to potential issues
Option 3 (Gemfile.lock automatically generated at application package build-time)
This requires bundle install --local, since OBS worker VMs have no network access. All gems therefore need to be in the local cache via BuildRequires, so all gems become build-time dependencies.
Three possible approaches:
- Strip dev/test groups from Gemfile to avoid having to BuildRequire dev/test gems
- breaks guarantee from bundler
- WebYaST and SLMS both do this
- Josef wrote a patch for bundler which excluded groups from deps calculation
- upstream doesn't seem interested in adopting this, presumably because they don't want to break their rock solid guarantee (referenced above)
- Ralf implemented a similar approach which uses the native Gemfile parser rather than sed.
- BuildRequire all gems, including dev/test
- Creates a chicken and egg situation between
Gemfile.lock. This could potentially be avoided by making
- causes issues for ruby_18 / ruby_19 gem groups
simplecov, neither is available for all Ruby platforms
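As an illustration of filtering dev/test groups out with a Gemfile parser rather than sed, the sketch below evaluates a Gemfile with a tiny stand-in for Bundler's DSL and prints a BuildRequires line for every gem that belongs to at least one other group. It is not Ralf's or Josef's implementation: the GemfileScanner class, the runtime_build_requires helper, the set of excluded groups and the rubygem-NAME package naming are assumptions for illustration, and only simple source, gem and group declarations are handled.

```ruby
# Minimal stand-in for the Gemfile DSL: records each gem together with
# the groups it was declared in.
class GemfileScanner
  attr_reader :gems

  def initialize
    @gems = []
    @current_groups = [:default]
  end

  # Sources are irrelevant for this purpose, so they are ignored.
  def source(*); end

  def gem(name, *args)
    options = args.last.is_a?(Hash) ? args.last : {}
    groups = Array(options[:group] || options[:groups] || @current_groups)
    @gems << [name, groups.map(&:to_sym)]
  end

  def group(*names)
    previous = @current_groups
    @current_groups = names.map(&:to_sym)
    yield
  ensure
    @current_groups = previous
  end

  # Evaluates the Gemfile text as Ruby, so only use it on trusted input.
  def self.scan(gemfile_text)
    scanner = new
    scanner.instance_eval(gemfile_text)
    scanner.gems
  end
end

# Keep gems that are in at least one group outside the excluded set.
def runtime_build_requires(gemfile_text, excluded = [:development, :test])
  GemfileScanner.scan(gemfile_text)
                .reject { |_name, groups| (groups - excluded).empty? }
                .map { |name, _groups| "BuildRequires: rubygem-#{name}" }
end

sample = <<~GEMFILE
  source "https://rubygems.org"
  gem "rails", "~> 3.2"
  group :development, :test do
    gem "rspec"
  end
  gem "simplecov", :group => :test
GEMFILE

puts runtime_build_requires(sample) # prints "BuildRequires: rubygem-rails"
```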
Option 4 (Gemfile.lock automatically generated at application run-time)
Requires removal of dev/test groups from Gemfile
- (small) guarantee could be covered by rpm dependencies
- (small) loses guarantee that dev/test environment is same as production
- (big) but upgrading a gem rpm for security would require new Gemfile.lock
Option 5 (Don't use Gemfile.lock or Gemfile at run-time at all)
Requires tweaks to config/boot.rb and Gemfile as outlined here. This is the approach taken by Hawk as of mid- to late-2013.
- Runtime dependencies only via RPM, so security updates to dependencies work with no regeneration of Gemfile.lock (because it doesn't exist).
- Makes cross-distro builds (SLES + openSUSE + Fedora) much less irritating (subtly different versions of each dependency on each target distro).
- Bundler can still be used during development and testing (e.g. for travis-ci)
- Some content from Gemfile duplicated in config/boot.rb
- Probably only viable (or most viable) if you have a small dependency set, and are happy with the versions of dependencies shipped as RPMs for the target distro.
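A rough idea of what such a config/boot.rb could contain is sketched below. This is not Hawk's actual code: the gem list and the activate_runtime_gems helper are invented, and the only real API used is RubyGems' Kernel#gem, which activates an installed gem matching a version requirement and raises Gem::LoadError when none can be activated.

```ruby
require "rubygems"

# Hypothetical runtime dependencies, duplicated by hand from the Gemfile.
# With this approach it is the RPM Requires: that ensure they are installed.
runtime_gems = {
  "rails" => "~> 3.2",
  "rake"  => ">= 0",
}

# Activate each gem through RubyGems instead of Bundler and return the
# names of any that could not be activated.
def activate_runtime_gems(requirements)
  requirements.reject do |name, requirement|
    begin
      gem name, requirement
      true
    rescue Gem::LoadError
      false
    end
  end.keys
end

missing = activate_runtime_gems(runtime_gems)
warn "Missing gems (install the matching RPMs): #{missing.join(', ')}" unless missing.empty?
```

The hand-maintained hash is the duplication mentioned in the list above: any change to the runtime part of the Gemfile has to be mirrored here.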
Comparisons with approach for other languages
In concept this is a very similar problem to the challenge of packaging Python modules, Perl modules, or Haskell cabal packages. So why should this be any harder than, say, Python? Perhaps our understanding of the problem and potential solutions can be strengthened by examining the approaches taken by the packagers for these other languages.
Comparison with packaging Python modules
On average Python modules seem to have more respect for the Semantic Versioning approach than Ruby gems - the latter quite often break when upgrading from version a.b.c to a.b.d. This leads to the suffixing approach which causes complexity and problems described above.
FIXME: please add your thoughts here!