openSUSE:Build Service Concept GitSupport Full
A new draft proposal for git backend support in OBS.
- 1 Introduction
- 2 The git OBS source storage
- 3 Usecases, functions and testing
- 3.1 Usecases
- 3.2 Functions
- 3.3 Testing
- 4 Architecture, planning and implementation
- 5 Using OBS git support
The proposed git backend enhances the currently very basic source control of the OBS with an up-to-date revision control system that works with inter-project, project, inter-package or package-wide scope. It supersedes the older, osc-client-only proposal for using git repositories with osc, which did not allow project-wide management of sources. Implementing the new approach requires a new system that glues building and revision control together at project and package level. An abstraction similar to the "build repositories" used for building, called "source repositories", will be introduced at project level for revision control.
Users familiar with git who want to understand what the design has in mind can go directly to the sections Usecases and Testing without having to study too many feature or implementation details. This project is driven by the use case and workflow requirements of existing developments.
In 2009, a Google Summer of Code project with the goal of creating a git backend for the build service was carried out. The project page contains some ideas, but the project as a whole did not reach its goals.
Another, more recent project produced a tool called bsgit. This tool can be used to import existing build service packages into git format. The resulting git repositories contain the full revision history and could be pushed directly into a future git backend. Most importantly, bsgit correctly converts source links (which may themselves be based on source links), and demonstrates how the source link functionality can be implemented on top of a real version control system. (This tool can also be used as a minimalistic build service client: commits made in the git repository can be "pushed back" to the build service. Of course, a client-side solution like bsgit cannot overcome fundamental restrictions of the existing backend such as its non-distributed architecture, and information that the current backend cannot store is lost when bsgit is used that way.)
Revision control (also known as version control, source control or (source) code management (SCM)) is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may be changing the same files. Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and, with some types of files, merged. Newer designs like git support distributed revision control, which removes the need for a central server to which all developers must have access.
Build systems generate a set of binaries from given sources or source trees. All build dependencies must be satisfied for the build to succeed. In contrast to SCMs, where the sources are a moving target, the build process requires a _snapshot_ of a source tree. Advanced build systems set up a separate build environment in a virtual machine, an emulator, or both.
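A git commit already is such a snapshot: git names every file and directory by a hash of its content, so one tree id always denotes exactly one set of files. The sketch below recomputes git's object ids for a flat directory of regular files (the layout the current BSDB storage uses), which a source server could use, for example, to check that a snapshot handed to a worker matches a given commit. It is an illustration under stated assumptions: subdirectories, executable bits and symlinks are not handled, and the function names are hypothetical.

```python
import hashlib
from typing import Dict


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a file with this content (as printed by `git hash-object`)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def git_flat_tree_id(files: Dict[str, bytes]) -> str:
    """Object id of a tree holding only regular (mode 100644) files, no subdirectories."""
    entries = b""
    for name in sorted(files):  # git orders tree entries by name
        blob = bytes.fromhex(git_blob_id(files[name]))  # 20 raw bytes, not hex
        entries += b"100644 " + name.encode() + b"\x00" + blob
    return hashlib.sha1(b"tree %d\x00" % len(entries) + entries).hexdigest()
```

Because the id depends only on names and contents, two servers that compute the same tree id are guaranteed (up to SHA-1 collisions) to be building from the same sources.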
The git OBS source storage
The new git OBS Backend enhances the current source server with git support; it does not replace the source server or its current version control system. The design is meant to work seamlessly with existing installations. Users can adopt the new functionality step by step, but they can also start with git directly.
A list of highlights:
- different versions of one package can be managed in one git repository (e.g. different branches, git tree sharing), and the backend will track master or a specified branch. A git push can trigger the build process, controlled via attributes
- programmable git tree mapping heuristics enforce project-wide rules (with project or package scope) already at source import
- cloned git trees containing multiple packages can be used to generate multipackage projects from such a git tree
- automated import of src.rpm (Debian source) collections can be used to build collections of git trees
- the maintenance concept based on attributes works together with the git release workflow
- automated creation of different normalized package formats (deb and rpm)
- merging, branching and cherry-picking on the local tree
- the BSDB backend does not need to be switched over, but existing packages can be converted to git-enabled packages
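Branch tracking and build triggering could be wired up through a server-side hook. Git passes a post-receive hook one line per updated ref on stdin, in the form `<old-value> <new-value> <ref-name>`. The minimal sketch below turns those lines into the list of packages whose tracked branch moved. The package-to-branch mapping (which the highlights above say is controlled via attributes) is a hypothetical dictionary here, and how the build is then triggered is left out.

```python
from typing import Dict, Iterable, List


def packages_to_rebuild(ref_updates: Iterable[str], tracked: Dict[str, str]) -> List[str]:
    """Return the packages whose tracked branch moved in this push.

    ref_updates: lines as git passes them to a post-receive hook on stdin,
                 "<old-value> <new-value> <ref-name>".
    tracked:     hypothetical package -> branch mapping (e.g. read from an attribute).
    """
    updated = set()
    for line in ref_updates:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore malformed input
        _old, new, ref = parts
        # An all-zero new value means the ref was deleted; tags are not built here.
        if set(new) == {"0"} or not ref.startswith("refs/heads/"):
            continue
        updated.add(ref[len("refs/heads/"):])
    return sorted(pkg for pkg, branch in tracked.items() if branch in updated)
```

In an actual hook script, the function would be called as `packages_to_rebuild(sys.stdin, mapping)`.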
Solution with BSDB storage
The current revision control system is linear and stores flat file directories without hierarchies. It is optimized for storing mostly binary files, which are used for debian or rpm packages to contain the source code and/or large collections of other data. File hierarchies or collections of many files are stored in archive files. Revision control currently works on package level, there is no project wide management of revisioned files. Files cannot be related on a revision control basis between packages.
Key features of the git OBS storage
The git OBS source server storage changes this situation. It introduces a real git backend into the OBS source server, which allows OBS projects to be managed under release management the same way as their underlying sources, e.g. with branches and tags. In addition to the usual git features, the git OBS Backend introduces the features described in the following sections.
The drawing below shows how git_backend fits in with the current components it communicates with. The src_server is the docking point for git_backend. Its purpose inside OBS is to deliver snapshots of file collections for building.
      [worker] [worker] [worker]
                   |
                  \|/  (pull on request)
         +--------------+                                    +---------------+
         [  src_server  ]  <- snapshots of tree for build <- [  git_backend  ]
         +--------------+  -> push to git if enabled ->      +---------------+
               /|\     \                                     /      /|\
                |       \_ auth through routes in api/frontend _/     |
                |          e.g. http://api.f.b/git/prj/pkg (+LDAP)    |
               osc                                                   git
- src_server: serves source snapshots to the build process/workers (acts as a cache of snapshots).
- git_backend: consists of a git server and functions that update the source snapshots needed for the build process.
- osc: will be able to check out the snapshots; the main challenge is interoperability between the git tree and check-ins made with osc.
- git: can clone, fetch and push as with any other remote git server.
Bidirectional archive to git mappings
The first new feature that overcomes the restrictions of the current source control is a means to map archive files onto git repositories. The current revision control handles source drops as opaque binary files, so changes inside an archive cannot be tracked, compared or merged file by file. The new git OBS Backend handles archive files as one or more git trees. How archive files are mapped onto git trees can be configured to whatever behavior is required. The mapping works in both directions: at import time, archives are unpacked into the git tree, and at build or release time, archive files are generated from git trees.
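A minimal sketch of the two mapping directions, assuming a plain tarball and an in-memory dictionary standing in for the git tree (both function names are hypothetical). The regenerated archive has normalized metadata (fixed mtime and mode, sorted members), so it reproduces the payload but is not byte-identical to the original upstream tarball; this is exactly the distinction raised in the comment below and in crosscheck question Q02.

```python
import io
import tarfile
from typing import Dict


def archive_to_tree(archive: bytes) -> Dict[str, bytes]:
    """Import direction: unpack a tarball into path -> content (regular files only)."""
    tree = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:  # any compression
        for member in tar.getmembers():
            if member.isfile():
                tree[member.name] = tar.extractfile(member).read()
    return tree


def tree_to_archive(tree: Dict[str, bytes], mtime: int = 0) -> bytes:
    """Export direction: regenerate a tarball from the tree with fixed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(tree):  # stable member order
            data = tree[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = mtime
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Normalizing the metadata makes the export deterministic: the same tree always yields the same bytes, which matters if generated archives are cached or checksummed.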
agruen: For us, it is a strict requirement that existing packages can be converted from or imported into the git backend while leaving all the files in each revision (tarballs, patches, changes and spec files) the same. Storing tarballs differently is *not* a problem we want addressed at this point.
Two level git storage
With the new git OBS Backend, two-level storage is introduced project-wide: files can be mapped at global project level and at package level. All or some of a project's files can be mapped onto a single git repository if required. Exceptions can be defined at package level to map one or more archive files to their own git repositories, either in addition to or instead of the project-level mapping. Projects can now be associated with each other by assigning some or all of their files to the same git repositories, which provides project-level and inter-project revision control for release branches etc.
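The lookup rule of the two-level storage can be summarized in a few lines: a package-level exception for a specific archive file wins, otherwise the project-level repository applies. The sketch assumes a hypothetical `GitMapping` structure and models only the "instead" case; an "in addition" mapping would return several repositories. The URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GitMapping:
    """Hypothetical two-level mapping: project default plus package exceptions."""

    project_repo: Optional[str] = None  # project-wide git repository, if any
    # package -> {archive file -> its own git repository}
    package_repos: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def repo_for(self, package: str, archive: str) -> Optional[str]:
        """Package-level exceptions win; otherwise fall back to the project level."""
        exceptions = self.package_repos.get(package, {})
        return exceptions.get(archive, self.project_repo)


mapping = GitMapping(
    project_repo="git://example.org/prj.git",
    package_repos={"kernel": {"linux-2.6.32.tar.bz2": "git://example.org/linux.git"}},
)
```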
Revision controlled meta data
In addition, parts of the meta data will be put under revision control. This is to complement the possibility to put complete project trees under a release management with potentially inter-project global git repositories (mapping one git tree with multiple branches and multiple packages to more than one project). Meta data in the context of Git Support means the newly added meta data to control the mapping of git repositories and build repositories to projects and packages.
Usecases, functions and testing
This chapter nails down the architecture / implementation by refining high level features with usecases, functionality and testcases. The listed usecases are directly supported by the git backend.
The use cases below are sub-cases extracted from workflows or from the use of existing git repositories for building packages.
agruen: Use cases I am missing: How do you intend to deal with source links in git? What happens when a branch is created (and how is it done)? Merging different revisions of different packages. How can a specific state be tagged (in a package and across packages)?
Baselining means that there is one baseline (potentially fixed or tagged, taken from a fixed archive file) holding most of the sources, and local changes tracked by git are used to automatically generate sets of patch files.
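Assuming the baseline files and the git-tracked files are both available as text, the patch generation step of Baselining could look like the sketch below, which uses Python's `difflib` to emit one unified diff per changed file. The patch naming scheme is hypothetical, and added or removed files are deliberately not handled.

```python
import difflib
from typing import Dict


def make_patch(name: str, baseline: str, modified: str) -> str:
    """Unified diff of one file: baseline version vs. git-tracked version."""
    diff = difflib.unified_diff(
        baseline.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)


def make_patch_set(baseline: Dict[str, str], modified: Dict[str, str]) -> Dict[str, str]:
    """One patch per file that exists in both trees and differs (added/removed files skipped)."""
    patches = {}
    for name in sorted(set(baseline) & set(modified)):
        if baseline[name] != modified[name]:
            patch_name = name.replace("/", "_") + ".patch"  # hypothetical naming scheme
            patches[patch_name] = make_patch(name, baseline[name], modified[name])
    return patches
```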
agruen: Please clarify: who does what here, and with which outcome?
Project global git repositories
One use case for project-wide git tree mapping is to keep all build scripts and/or descriptions in a separate project-wide git tree, outside of the (potentially externally cloned) project-wide source repository. An example would be a project whose sources come from a cloned external project-wide git repository, while all the spec files and patches needed to build it come from another project-wide patches repository. This is a project-wide variant of Baselining. Real examples can be found in the section Testing.
openSUSE git kernel repository
The openSUSE / SLE kernel git repository can be used directly as source for the kernel package build. Several branches are used for different distribution revisions. The feature to track branches of the same git repository for different packages will apply. This git repo is also an example usecase of Baselining. See further information on the content of this repository at http://en.opensuse.org/Kernel_Git.
agruen: This is *not* our goal at this point, and should not be a use case for this project. We are fine with sticking with the current mechanism of submitting snapshots (generated by running scripts/tar-up.sh in the kernel git repo) to the build service at this point.
Using git enables a new working mode that is not possible with the current revision control: working offline. To work offline, both parts of OBS (revision control and building) must get their information from local copies. The revision control information is provided by git itself (git is a distributed SCM, and git clone creates a local copy of a complete repository). Everything needed to build must also be stored locally: a preloaded binary package cache and copies of the build information per build repository and architecture for each local package.
This mode can only be used temporarily, because build dependencies change whenever other maintainers change the build dependencies of their packages. Compared to the current situation, git makes it much easier to push source code changes back to the server.
Importing non git repositories
Not all source code repositories use git. To bring multipackage repositories from CVS or Subversion into the git backend, use existing version control repository converters for CVS to git and Subversion to git.
agruen: In this project, we are interested in importing existing revisions of packages into the git backend. We are not interested in fundamentally changing how packages are managed though, we want to keep their current structure: tarballs + patches + spec files + changes files. As such, the problem of importing upstream repositories does not exist.
Fedora package repository
Generate all packages in a project from the Fedora package seed repository. The packages are currently managed in CVS and can easily be converted to git and imported. Further information on the content of this repository is available at http://fedoraproject.org/wiki/Using_Fedora_CVS. The different Fedora releases and the development tree are kept in different CVS branches, which can be expressed as intra-project mappings. A move to git is under discussion for the Fedora package base, with a 2010 timeline, and might be delayed until after the Fedora 13 release.
agruen: Again, not a goal of this project.
Coupling source repos of different OBS instances
Consider two OBS systems with different external visibility: one is publicly accessible via the internet, the other sits in a private network and is used for non-public development. The public server contains many open source packages; the non-public one contains only a limited number of projects for development that has not been released yet, and it also relies on some sources from the public server. There should be a way to keep the non-public projects consistent with the public server, and users should have the option to pull or push changes (mostly source code, but also some meta data) to or from the public server on a case-by-case basis. This is easiest if most of the data in a project (including most of the meta data) is under revision control, because then it can be pushed, pulled and merged.
Derive a distribution from another one
There is a reference Moblin distribution hosted at moblin.org, in one OBS system. A Linux distributor may want to create a Linux distribution from that code base, mostly using the parts that are distribution specific rather than Moblin specific, or may want to add a Moblin add-on to their own Linux distribution. OBS git support should make it possible to merge, cherry-pick and update such a package base (here, the public moblin.org reference) into the distributor's own project (here, its Moblin add-on). Some of the operations needed to manage this scenario should be semi-automatic (which ones is still to be specified).
Package generation from a git tree
Ability to use a git tree as source for the package's source code.
agruen: Please clarify.
Git tree generation from a package collection
Ability to import sets of packages into a git tree/git trees.
agruen: Please clarify.
Operations of the attribute system with the git backend
The new attribute system is used to implement a development and maintenance model for building. It provides project-wide scope. The git backend introduces a revision control system with project-wide scope as well, so it is a natural choice to connect both into one development and maintenance model, with project-wide scope, for building and source revisions. There is documentation on how the attribute system works.
The testing philosophy is to test cases for which there is an explicit use case or an explicitly mentioned feature highlight, plus tests for backward compatibility, corner cases and functional requirements.
The list of test cases from feature highlights / usecases is:
- import a linux distribution source tree (.deb as well as .rpm based) into git OBS (Moblin, Maemo, openSUSE, Fedora, Debian, Ubuntu)
- setup mapping to git branches and test effect of importing more than one release of an imported linux distribution into OBS
- create a complete project and branches from a cloned project wide multipackage git repo (KDE, Gnome, X.org)
- create a two-level mapping from an external first-level project-wide build control git tree and second-level package git trees (Fedora)
- example work-flow for kernel tree with the openSUSE / SLES public git repository
- test if baselining is working, check submit request, merge, update and pull, with readonly and writable two level git trees
- test if offline mode can be used as expected
The list of test cases for backwards compatibility is:
- test backwards compatibility when BSDB revision history present for package
- test maintenance and devel modes supported by projects with sufficient project attributes
- test the sharing of git trees
Architecture, planning and implementation
Planning and TODO list
There is a separate Planning and TODO list
- Project Scope Git repositories signalling
- Package Scope Git repositories signalling
- Project Scope Metadata Git repositories signalling
Details to be specified
List of details to be specified:
Q: Where to keep the config data: config file or attribute system
A: 2-level git requires attribute-system or "meta prj". packages can work with plain meta file.
Explanation: A project in the build service has no file storage associated as a package does for tarballs/spec. Thus the git-metadata for projects need to be stored either alongside the project's obs-metadata (osc meta prj) or in the new attribute system.
Q: What could project-wide git metadata look like?
A: git projectwide metadata
Q: What could package git metadata look like?
A: git package metadata
Q: What is meant by 2-level git?
A: "2-level git" describes a project/package setup where the project metadata (obs-meta and project-wide git-meta) is stored in a "build-config" git tree and the actual package sources in one or multiple "source" git trees. This is the OBS+git-backend's version of Fedora's build CVS tree. The git-backend server will have to fetch and deploy:
- prjconf (see osc meta prjconf)
- prj (see osc meta prj)
- git_prj_config or attributes (tbd.)
The git_prj_config then holds the information about the "build-config" tree from which the individual packages get their data, as well as the default setting for the "source" git tree unless it is overridden in the package. Example: the GNOME or KDE repositories could use one git tree for the "build-config" related files and the projects' own git trees for the sources, thus separating the distribution's build definitions and patches from the sources.
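Since the format of git_prj_config is still to be decided, the following sketch simply assumes an INI file: project defaults live in `[DEFAULT]`, and a section per package overrides the "source" git tree. It checks that a checked-out build-config tree (modeled as a dictionary of file name to content) provides the three files listed above and resolves the source tree for a package. All key names and URLs are illustrative.

```python
import configparser
from typing import Dict, Tuple

# File names from the list above; their location inside the tree is an assumption.
REQUIRED_FILES = ("prjconf", "prj", "git_prj_config")


def deploy_build_config(tree: Dict[str, str]) -> Tuple[Dict[str, str], configparser.ConfigParser]:
    """Pick the files the git-backend must deploy from a checked-out build-config tree."""
    missing = [name for name in REQUIRED_FILES if name not in tree]
    if missing:
        raise ValueError("build-config tree lacks: " + ", ".join(missing))
    cfg = configparser.ConfigParser()
    cfg.read_string(tree["git_prj_config"])  # hypothetical INI format
    return {name: tree[name] for name in REQUIRED_FILES}, cfg


def source_tree_for(cfg: configparser.ConfigParser, package: str) -> str:
    """A package section overrides the project default kept in [DEFAULT]."""
    if cfg.has_section(package):
        return cfg.get(package, "source")  # falls back to [DEFAULT] if the key is unset
    return cfg.defaults()["source"]
```

Using `[DEFAULT]` for the project-wide settings means package sections inherit every default automatically and only need to list what they override.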
Q: osc build for the case of a git-only package
A: create tarball for build with osc on-the-fly
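For committed content, git itself can produce the tarball, e.g. `git archive --format=tar --prefix=package/ HEAD | bzip2 > package.tar.bz2`. When osc builds from a working tree, the sketch below is one way to do it on the fly in Python: it packs the tree under a directory prefix and skips the `.git` metadata. It also includes uncommitted and untracked files, which may or may not be what osc should do; the function name is hypothetical.

```python
import io
import tarfile


def tarball_from_worktree(worktree: str, prefix: str) -> bytes:
    """Create package.tar.bz2 content from a git working tree, skipping .git metadata."""

    def skip_git(info: tarfile.TarInfo):
        # Returning None drops the member; for a directory, its contents are skipped too.
        return None if ".git" in info.name.split("/") else info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        tar.add(worktree, arcname=prefix, filter=skip_git)
    return buf.getvalue()
```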
Q: workdir setup for osc if git is used (expanded/cloned)
A: it could look like:
.osc/             <dir>
package/          <dir> = cloned git tree
package.tar.bz2
package.spec
package.patch
Q: commit from osc if git is used with in-tree specfile
A: split-up/merge specfile
Q: integration of the process with osc
A: osc git sub-commands
- osc git initrepo
- osc git setrepourl
- osc git
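The proposed sub-commands could be dispatched as sketched below with `argparse`. The positional arguments and the `--branch` option are assumptions made for illustration, not part of the proposal, and the bare `osc git` entry from the list above is not modeled. osc's real plugin mechanism is not used here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line surface for the proposed `osc git` sub-commands."""
    parser = argparse.ArgumentParser(prog="osc git", description="git integration for osc (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("initrepo", help="put an OBS package under git control")
    init.add_argument("project")
    init.add_argument("package")

    seturl = sub.add_parser("setrepourl", help="set the git repository a package tracks")
    seturl.add_argument("url")
    seturl.add_argument("--branch", default="master", help="branch the backend should track")
    return parser
```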
Q: submitrequest mapping osc -> git tree
A:
- push to git as branch
- hermes messages integration tbd
Q: Security considerations when working with external source repos?
A: All git operations are done in a separate chroot environment, similar to the workers or the src_service.
List of investigation points:
- git server and access through webdavs (access through api routes - e.g. http://api.linux.com/git/prj/pkg )
- hooks for git server
- hook sends event to git backend
- ldap integration possibilities
- osc log in case of preexisting BSDB
- import of git history to osc log
- payload only checksum in tar files
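A payload-only checksum for a tar file could hash just the names and contents of the regular files, sorted by name, and ignore metadata such as mtimes, owners and member order. Two archives with the same files then get the same checksum even if their bytes differ. This is a minimal sketch with a hypothetical function name; each name is NUL-terminated and each content is length-prefixed so that different file lists cannot produce the same hashed byte stream.

```python
import hashlib
import io
import tarfile


def payload_checksum(archive: bytes) -> str:
    """SHA-256 over names and contents of regular files only, ignoring tar metadata."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        members = sorted((m for m in tar.getmembers() if m.isfile()), key=lambda m: m.name)
        for m in members:
            data = tar.extractfile(m).read()
            digest.update(m.name.encode() + b"\x00")     # names cannot contain NUL
            digest.update(len(data).to_bytes(8, "big"))  # unambiguous content boundary
            digest.update(data)
    return digest.hexdigest()
```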
List of crosscheck Questions:
- Q01: How to signal git revision controlled binary archive files
- Q02: How to signal whether archive files must be reproduced payload-identical or binary-identical (the latter requires a binary copy); binary identity is required by some builds (e.g. Debian with checksums/signatures)
- Q03: How far should revision control of meta data be realised
- Q04: Does performance degrade significantly by mapping archives to git trees and do we thus need caching
- Q05: How can binary files be compacted best for multiple revisions
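Regarding Q04: because a commit id names immutable content, archives generated from git trees can be cached by (repository, commit) without any invalidation logic; only capacity limits matter. That fits the role of src_server as a cache of snapshots. The sketch below is a small LRU cache with a pluggable generator function; the class name and default capacity are hypothetical, and whether caching is needed at all remains the open question.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class SnapshotCache:
    """LRU cache of generated archives keyed by (repository, commit id).

    A commit id names immutable content, so entries never go stale;
    they are only evicted when the cache is full.
    """

    def __init__(self, generate: Callable[[str, str], bytes], capacity: int = 128):
        self._generate = generate
        self._capacity = capacity
        self._entries: "OrderedDict[Tuple[str, str], bytes]" = OrderedDict()
        self.misses = 0

    def get(self, repo: str, commit: str) -> bytes:
        key = (repo, commit)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self._generate(repo, commit)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data
```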