openSUSE:Ruby Gem Strategies

There are some fundamentally different approaches to the packaging and use of Ruby gems within Ruby applications (Rails and otherwise) on openSUSE. This document attempts to summarise a collective understanding about the pros and cons of the various strategies for packaging Ruby applications.

Strategies for packaging Ruby gems on openSUSE

This page currently links to several SUSE internal documents which are not generally accessible. It is intended that these links be replaced in due course with inline copies of the relevant information. Apologies for the inconvenience, and feel free to complain loudly if you want this to be fixed quickly.


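There are at least four types of approach, plus a hybrid of the first two:

  • one gem per rpm
  • bundle all gems into one rpm
  • one gem per rpm plus bundle all gems into one rpm (hybrid)
  • extend zypper to support installing gems directly
  • don't do any packaging, just use bundler

Each one is considered in detail below.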

Choosing which approach to take is surprisingly difficult. This document does not (yet) represent a complete understanding of the topic, since it was not possible to involve all experts in the area. Nor is it, at this point, a set of guidelines or recommendations for which strategy to adopt. However, it should hopefully help Ruby application architects make an informed decision about which approach best suits their project.

If you see any gaps or inaccuracies, please edit the page! There are probably other approaches too; if you are aware of them, please add them to this page.

One gem per rpm

The implementation details for this approach are documented in openSUSE:Packaging Ruby.

Applications using this approach

Advantages

  • Sascha's arguments in favour of one gem per rpm
    • effort shared across community
    • security update by one group benefits other groups
    • sysadmins can update via zypper up
  • .spec files capture non-gem dependencies
  • smaller footprint
  • easy to add things to the package (patches, init scripts)
  • licenses used are explicitly listed in the rpm
  • SUSE has lots of internal infrastructure and workflows which are based on rpms

Disadvantages

  • Even with gem2rpm, it's quite a lot of work (see openSUSE:Packaging Ruby), although most of the hard work has already been done. There is still some overhead to updating gem versions, but it's not too bad, and could perhaps be automated further if necessary.
  • Long turnaround times in the Build Service
  • Unresolved issues with handling different Ruby versions on the same system?
  • A change needed for one project can break another
  • gem versions are not guaranteed to be consistent across all environments
  • -1_2 package name suffixing has issues
    • zypper doesn't support multiple versions of same gem concurrently installed by default. For example, if SLMS required Rails 2.3.9, and WebYaST on the same machine required Rails 2.3.11, it would not be possible to have both rubygem-rails-2_3-2.3.9 and rubygem-rails-2_3-2.3.11 installed at the same time.
    • This could be solved by enabling multi-version in the zypp config, which has some advantages over -1_2 Name: suffixing, and would allow the suffix to be dropped from Name:, but not from the package directory name in the BS project.
      • However, with this approach it is still unresolved how to prevent old versions of gems from accumulating on a system.
    • Another solution would be simply to make a new rubygem-rails-2_3_9 package. Since this is a rare corner case, this might be simpler.
    • There is no reliable way to determine whether anything still depends on the old version (e.g. via a ~> 1.2.5 requirement). So it is impossible to know whether it is sufficient to simply upgrade rubygem-foo from 1.2.x to 1.3.x, or whether a new rubygem-foo-1_2 is also required. But that is fairly unlikely to impact anyone except for SUSE product developers, and in theory we could build a script which checks dependencies from SUSE Cloud / SLMS / WebYaST / OBS on devel:languages:ruby:extensions and reports any no longer used versions. This could even be run automatically by Jenkins.
    • linkpac is a weak way of expressing these specific version-oriented dependencies between OBS projects. But that is fairly unlikely to impact anyone except for SUSE product developers.
  • Having to agree on a single version of a gem across multiple products is almost impossible. Thus, unless you rely on using suffixes ("-1_2") or are less strict about the version you need (>= X.Y instead of = X.Y), you will run into a lot of conflicts over this.
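The dependency report suggested above could start from something like the following Ruby sketch. It only works once the requirements of every consumer (SUSE Cloud, SLMS, WebYaST, OBS) have been collected, which is exactly the part that is hard in practice; the function name, versions and requirements are all hypothetical. It uses RubyGems' own Gem::Version and Gem::Requirement classes and assumes that each requirement is served by the highest packaged version that satisfies it.

```ruby
require "rubygems"

# Hypothetical sketch: split the packaged versions of one gem into
# "still needed" and "no longer used", given the requirements that
# dependent products declare on it. Assumes each requirement is served by
# the highest packaged version satisfying it, as resolvers usually choose.
def classify_versions(packaged_versions, requirements)
  versions = packaged_versions.map { |v| Gem::Version.new(v) }.sort
  needed = requirements.map { |r| Gem::Requirement.new(*Array(r)) }
                       .map { |req| versions.select { |v| req.satisfied_by?(v) }.max }
                       .compact
                       .uniq
  unused = versions - needed
  [needed.sort.map(&:to_s), unused.map(&:to_s)]
end

# Hypothetical data: four packaged versions of rubygem-foo and the
# requirements collected from three dependent products.
needed, unused = classify_versions(%w[1.2.4 1.2.5 1.2.9 1.3.0],
                                   ["~> 1.2.5", ">= 1.3", "= 1.2.9"])
puts "still needed:   #{needed.join(', ')}"   # lists 1.2.9, 1.3.0
puts "no longer used: #{unused.join(', ')}"   # lists 1.2.4, 1.2.5
```

Run periodically (for example from Jenkins), a report like this would flag versions such as 1.2.4 as candidates for removal.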

Bundle all gems into one rpm

This approach typically involves checking Gemfile.lock into the source tree - see section below for more information on handling Gemfile.lock.

Applications using this approach

Advantages

  • guaranteed reproducible and consistent results across environments
  • independence from other teams / projects / processes / workflows
  • bundler does all the hard work
  • consistent with the upstream Ruby community; doesn't reinvent the wheel

Disadvantages

  • does not fit with the standard security and maintenance workflow
  • one big rpm - slow to rebuild and install
  • security/bugfix/feature updates are a PITA
    • Gemfile.lock has to be updated each time a gem is updated
    • a separate update is required for each Rails app
  • Dirk Müller says this makes problem isolation hard (SUSE internal-only)
    • no way to figure out which rubygem version change caused breakage
  • non-gem dependencies are all mixed up together
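A diff of the versions pinned in two Gemfile.lock files does not solve the isolation problem, but it at least lists the candidate changes between a working and a broken build. A minimal Ruby sketch, assuming each lock file has already been reduced to a { gem name => version } hash (Bundler's Bundler::LockfileParser class can read lock files); lock_diff and the sample versions are hypothetical.

```ruby
# Hypothetical sketch: list gems whose pinned version differs between an
# old and a new Gemfile.lock, including gems added or removed.
def lock_diff(old_specs, new_specs)
  changes = (old_specs.keys | new_specs.keys).sort.map do |name|
    before = old_specs[name]
    after = new_specs[name]
    next nil if before == after   # unchanged gems are not interesting
    "#{name}: #{before || '(absent)'} -> #{after || '(absent)'}"
  end
  changes.compact
end

old_lock = { "rack" => "2.0.1", "rails" => "5.0.0" }
new_lock = { "rack" => "2.0.3", "rails" => "5.0.0", "puma" => "3.6.0" }
puts lock_diff(old_lock, new_lock)
# prints:
# puma: (absent) -> 3.6.0
# rack: 2.0.1 -> 2.0.3
```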

One gem per RPM plus bundle all gems into one RPM

This is a mix of the two previous approaches, intended to combine the advantages of each. The key is to package one gem per RPM, but to use those RPMs only at build time of the application RPM, where they are installed into a vendor directory.

In short, this means adding a "BuildRequires:" line for each gem to the spec file of the application RPM. Then, in the build section, bundler is used to install these gems into a vendor directory.

This keeps the maintainability advantages of RPMs, but at the same time avoids having to maintain rubygem RPMs for the general case: they are only build-time requirements, so they are not installed system-wide in /usr/lib... but in a private vendor directory belonging to the application.
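The BuildRequires lines do not have to be maintained by hand; they could be derived from Gemfile.lock. The following Ruby sketch assumes that the rubygem packages provide rubygem(NAME) = VERSION and that Gemfile.lock uses its usual layout (gems listed four spaces deep under a "specs:" heading). lock_specs and build_requires are hypothetical helpers, the parser is deliberately simplified, and platform-specific entries are not handled.

```ruby
# Hypothetical sketch: read the top-level "    name (version)" entries under
# "  specs:" headings of a Gemfile.lock (dependency lines, which are six
# spaces deep, are skipped) and emit one BuildRequires line per gem.
def lock_specs(text)
  specs = {}
  in_specs = false
  text.each_line do |raw|
    line = raw.chomp
    if line == "  specs:"
      in_specs = true
    elsif line !~ /\A\s/
      # A blank line or a new top-level section (PLATFORMS, DEPENDENCIES, ...)
      in_specs = false
    elsif in_specs && (m = line.match(/\A {4}([^\s(]+) \(([^)]+)\)\z/))
      specs[m[1]] = m[2]
    end
  end
  specs
end

def build_requires(lock_text)
  lock_specs(lock_text).sort.map do |name, version|
    "BuildRequires:  rubygem(#{name}) = #{version}"
  end
end

lock = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rack (2.0.1)
      rails (5.0.0)
        rack (>= 2.0)

  PLATFORMS
    ruby
LOCK

puts build_requires(lock)
# prints:
# BuildRequires:  rubygem(rack) = 2.0.1
# BuildRequires:  rubygem(rails) = 5.0.0
```

In the build section, bundler would then presumably be run with --local, so that it only uses the gems installed from those RPMs, since OBS workers have no network access.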

As an example, see velum:

https://github.com/kubic-project/velum/tree/master/packaging/suse


External analysis / comparisons of the two approaches

Extend zypper to support installing gems directly

See Duncan Mac-Vicar Prett's report about a #hackweek8 project to install non-rpms (gems) via zypper.

Status update from Duncan

The project is stuck because we need a full index of gems and dependencies. libsolv needs to load the "world" into memory, but the RubyGems API only allows querying gems one by one via HTTP. We used a gzipped serialized dump of the repository that is available at the root of rubygems.org and wrote a solv converter, but this index (as we learned later) is not being updated or maintained.

I talked to the rubygems.org guys and they did not care. We would need to build our own index in solv format. It is not that hard (rubygems has hooks).

The next showstopper is gems that require -devel packages, as they are built at install time.

Maybe coolo's approach of building all gems in the build service automatically is a better one. Our project is still worth exploring for other things: installing tarballs in /opt, Java, etc.

Advantages

  • No packaging is required
  • Very simple to use

Disadvantages

  • Not yet ready for production usage.
  • Relies on an external service, where things can disappear or be silently changed outside our control
  • Makes it hard to make changes or fix bugs
  • Makes it hard to track licensing

Don't do any packaging, just use bundler

Applications using this approach

bundler is what the majority of the Ruby community uses these days.

Advantages

  • The well-trodden, well-tested path which most of the Ruby community uses.
  • Simple to understand and use.
  • Each application can be updated independently.

Disadvantages

  • Has no standard mechanism for packaging / deployment / updates.
  • Heavier footprint when multiple applications run on a single machine.
  • Each application has to be updated independently.
  • Is dependent on rubygems.org, which is a potential single point of failure (SPoF) since it's not guaranteed to be 100% available or secure.

Bundler and Gemfile.lock

Gemfile.lock is generally required for Rails apps to run (at least from Rails 3.x onwards).

It is possible to make a Rails app run without Bundler - and thus without Gemfile.lock - by tweaking config/boot.rb as described in this blog post, but this is not how things are usually done.

The Bundler guarantee

From the bundle-install(1) man page:

Bundler offers a rock-solid guarantee that the third-party code you are running in development and testing is also the third-party code you are running in production. You can choose to exclude some of that code in different environments, but you will never be caught flat-footed by different versions of third-party code being used in different environments.

(also mentioned on the Bundler website)

This guarantee works by using a single Gemfile.lock across all environments. Therefore all gem groups (e.g. development / test) are always included in dependency calculations, even if you run bundle install with the --without=... option. For example, bundle install --without=test will skip installation of gems in the test group, but those gems can still have an impact on which other gems get installed.
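The effect of the shared Gemfile.lock can be shown with a toy model in Ruby. This is not Bundler's resolver: pick_version is a hypothetical helper that picks the highest version of one shared dependency satisfying the requirements of every group, and the versions and constraints are made up.

```ruby
require "rubygems"

# Toy model, not Bundler's actual resolver: the highest available version
# of one shared dependency that satisfies the requirements of all groups.
def pick_version(available, requirements_by_group)
  reqs = requirements_by_group.values.flatten.map { |r| Gem::Requirement.new(r) }
  available.map { |v| Gem::Version.new(v) }
           .select { |v| reqs.all? { |req| req.satisfied_by?(v) } }
           .max
end

available = %w[1.8.6 2.0.2 2.1.0]   # hypothetical released versions
groups = {
  default: [">= 1.8"],              # needed by the application itself
  test:    ["< 2.1"],               # imposed by some test-only gem
}

# A single Gemfile.lock is computed with all groups included ...
puts pick_version(available, groups)                                       # prints 2.0.2
# ... so "bundle install --without=test" still installs 2.0.2, not the
# 2.1.0 that would be chosen if the test group were ignored entirely.
puts pick_version(available, groups.reject { |group, _| group == :test })  # prints 2.1.0
```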

When / how should Gemfile.lock be generated?

This is a key question.

Possible options

  1. Single authoritative copy checked into source repo and maintained during development
  2. Multiple copies (one for each use case / context) checked into source repo and maintained during development
  3. Automatically generated at application package build-time
  4. Automatically generated at application run-time
  5. Don't use Gemfile.lock or Gemfile at run-time at all

But let's take a step back ...

When do we need Gemfile.lock?

We do dev/test/QA in a number of ways:

  • from packages (dependencies can be enforced here)
  • from ISOs
  • directly from source

Due to the last one, we need Gemfile.lock even before we build packages! So we have to decide whether we need this Gemfile.lock to be identical to the one used in production. If the project community is split across different organizations (e.g. Crowbar is developed by Dell, SUSE, and others), then a similar decision needs to be made across the whole project ...

Convergence (option 1) vs. divergence (options 2/3/4)

The project community needs to decide whether to converge on a single version of Gemfile.lock, or to diverge and test multiple versions. This was originally discussed in the context of the Crowbar project.

Advantages of converging on a single version (option 1)

  • The whole community converges on one set of gem versions, eliminating time spent resolving issues caused by different versions
  • Gemfile.lock doesn't have to be generated during build process

Advantages of diverging to multiple versions (options 2/3/4)

  • Different community members / product releases have different needs (e.g. multiple Linux distributions)
  • Doesn't require packaging a new gem rpm for every change to Gemfile.lock
  • Lets each member of the Crowbar community decide how often they want to update gem versions
  • Those who update more aggressively can alert others to potential issues

Option 3 (Gemfile.lock automatically generated at application package build-time)

Requires bundle install --local, since OBS worker VMs have no network access. Therefore all gems need to be in the local cache via BuildRequires, so all gems become build-time dependencies.

Three possible approaches:

  1. trim Gemfile to avoid having to BuildRequire dev/test gems
    • breaks guarantee from bundler
    • WebYaST and SLMS both do this
  2. Josef wrote a patch for bundler which excludes groups from the dependency calculation
    • upstream doesn't seem interested in adopting this, presumably because they don't want to break their rock solid guarantee (referenced above)
    • Ralf implemented a similar approach which uses the native Gemfile parser rather than sed.
  3. BuildRequire all gems, including dev/test
    • Creates a chicken-and-egg situation between the BuildRequires list and Gemfile.lock. This could potentially be avoided by making rubygemsdeps.rb accept Gemfile and generate BuildRequires from it.
    • causes issues for ruby_18 / ruby_19 gem groups
      • e.g. rcov vs. simplecov, neither is available for all Ruby platforms
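Approach 1 could, for example, be implemented by evaluating the Gemfile as Ruby code rather than editing it with sed. The sketch below does not use Bundler's real parser: GemfileRecorder is a hypothetical, minimal stand-in for the Gemfile DSL that understands only source, gem and group, and trim_gemfile writes out a Gemfile without the excluded groups (which, as noted above, breaks the Bundler guarantee).

```ruby
# Hypothetical minimal stand-in for Bundler's Gemfile DSL: records each gem
# with its requirements and groups. Only source, gem and group are
# supported; any other Gemfile method raises NoMethodError.
class GemfileRecorder
  attr_reader :sources, :gems

  def initialize
    @sources = []
    @gems = []                    # entries: [name, requirements, groups]
    @current_groups = [:default]
  end

  def source(url)
    @sources << url
  end

  def gem(name, *requirements, **options)
    groups = Array(options[:group] || options[:groups] || @current_groups)
    @gems << [name, requirements, groups.map(&:to_sym)]
  end

  def group(*names)
    previous = @current_groups
    @current_groups = names.map(&:to_sym)
    yield
  ensure
    @current_groups = previous
  end
end

# Writes a trimmed Gemfile; kept gems are emitted without their groups.
def trim_gemfile(gemfile_text, excluded_groups)
  recorder = GemfileRecorder.new
  recorder.instance_eval(gemfile_text, "Gemfile")
  lines = recorder.sources.map { |url| "source #{url.inspect}" }
  recorder.gems.each do |name, requirements, groups|
    next if (groups - excluded_groups).empty?   # only in excluded groups
    lines << "gem #{[name, *requirements].map(&:inspect).join(', ')}"
  end
  lines.join("\n") + "\n"
end

gemfile = <<~GEMFILE
  source "https://rubygems.org"
  gem "rails", "~> 4.2"
  group :development, :test do
    gem "rspec-rails"
  end
  gem "simplecov", group: :test
GEMFILE

puts trim_gemfile(gemfile, [:development, :test])
# prints:
# source "https://rubygems.org"
# gem "rails", "~> 4.2"
```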

Option 4 (Gemfile.lock automatically generated at application run-time)

Requires removal of dev/test groups from Gemfile.

Advantages:

  • (small) guarantee could be covered by rpm dependencies

Disadvantages:

  • (small) loses the guarantee that the dev/test environment is the same as production
  • (big) upgrading a gem rpm for security would require a new Gemfile.lock anyway

Option 5 (Don't use Gemfile.lock or Gemfile at run-time at all)

Requires tweaks to config/boot.rb and Gemfile as outlined here. This is the approach taken by Hawk as of mid- to late-2013.
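Hawk's actual config/boot.rb is not reproduced here, but the general idea of option 5 can be sketched as follows: instead of calling Bundler, activate each runtime dependency through RubyGems' Kernel#gem, so that whatever version the RPM installed is picked up. RUNTIME_GEMS and activate_runtime_gems are hypothetical names, and the gem list is illustrative.

```ruby
require "rubygems"

# Hypothetical Bundler-free boot fragment. RUNTIME_GEMS is illustrative and
# would mirror the runtime part of the Gemfile.
RUNTIME_GEMS = {
  "rails" => ["~> 4.2"],
}.freeze

# Activates each gem via Kernel#gem and returns a description of every gem
# that could not be activated (missing or conflicting), instead of stopping
# at the first failure.
def activate_runtime_gems(gems)
  gems.each_with_object([]) do |(name, requirements), failures|
    begin
      gem(name, *requirements)
    rescue Gem::LoadError => e
      failures << "#{name} (#{requirements.join(', ')}): #{e.class}"
    end
  end
end

missing = activate_runtime_gems(RUNTIME_GEMS)
warn "Cannot activate: #{missing.join('; ')}" unless missing.empty?
```

A real boot.rb would probably abort rather than warn when activation fails. The RUNTIME_GEMS list is where the Gemfile content ends up duplicated.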

Advantages:

  • Runtime dependencies only via RPM, so security updates to dependencies work with no regeneration of Gemfile.lock (because it doesn't exist).
  • Makes cross-distro builds (SLES + openSUSE + Fedora) much less irritating (subtly different versions of each dependency on each target distro).
  • Bundler can still be used during development and testing (e.g. for travis-ci)

Disadvantages:

  • Some content from Gemfile duplicated in config/boot.rb
  • Probably only viable (or most viable) if you have a small dependency set, and are happy with the versions of dependencies shipped as RPMs for the target distro.

Comparisons with approach for other languages

In concept this is a very similar problem to the challenge of packaging Python modules, Perl modules or Haskell Cabal packages. So why should this be any harder than, say, Python? Perhaps our understanding of the problem and potential solutions can be strengthened by examining the approaches taken by the packagers for these other languages.

Comparison with packaging Python modules

On average, Python modules seem to respect Semantic Versioning more closely than Ruby gems; the latter quite often break when upgrading from version a.b.c to a.b.d. This leads to the suffixing approach, which causes the complexity and problems described above.
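The pessimistic version operator (~>) commonly used for Ruby dependencies relies on exactly this Semantic Versioning assumption. A small Ruby illustration using RubyGems' Gem::Requirement; allowed_versions is a hypothetical helper and the version numbers are made up. If a gem breaks between 1.2.5 and 1.2.9, dependents have to switch to an exact pin, which is what pushes packagers towards parallel, suffixed versions.

```ruby
require "rubygems"

# Returns the subset of +versions+ that +requirement+ accepts.
def allowed_versions(requirement, versions)
  req = Gem::Requirement.new(requirement)
  versions.select { |v| req.satisfied_by?(Gem::Version.new(v)) }
end

candidates = %w[1.2.4 1.2.5 1.2.9 1.3.0]
# "~> 1.2.5" means ">= 1.2.5, < 1.3": patch releases are trusted.
p allowed_versions("~> 1.2.5", candidates)  # prints ["1.2.5", "1.2.9"]
# A gem that breaks between a.b.c and a.b.d forces an exact pin instead.
p allowed_versions("= 1.2.5", candidates)   # prints ["1.2.5"]
```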

FIXME: please add your thoughts here!