openSUSE:Build Service Concept GitSupport
A new draft proposal for git backend support in OBS.
Introduction
The proposed git backend enhances the currently very basic source control of the OBS with an up-to-date revision control system with inter-project, project, inter-package or package wide scope. The old osc client only proposal for using git repositories with osc is superseded by the new proposal. The old approach did not allow project wide management of sources. A new system to glue building and revision control together on project and package level is needed to implement this new approach. An abstraction similar to the "build repositories" used for building will be introduced on project level for revision control, called "source repositories".
Users familiar with git who want to understand what the design has in mind can go directly to the section Usecases and the section Tests without the need to study too many feature or implementation details. This project is driven by usecase and workflow requirements of existing developments.
Previous work
In 2009, a Google Summer of Code project with the goal of creating a git backend for the build service was carried out. The project page contains some ideas, but the overall project did not reach its goals.
Another recent project resulted in a tool called [bsgit]. This tool can be used to import existing build service packages into git format. The resulting git repositories contain the full revision history, and could directly be pushed into a future git backend. Most importantly, bsgit correctly converts source links (which may be based on source links themselves), and demonstrates how the source link functionality can be implemented on top of a real version control system. (Also, this tool can be used as a minimalistic build service client: commits made in the git repository can be "pushed back" to the build service. Of course a client side solution like bsgit cannot overcome fundamental restrictions of the existing backend like its non-distributed architecture, and information which the current backend cannot store is lost when using bsgit that way.)
The author gave a presentation on how git fits into the build service and the ideas behind bsgit at the openSUSE Conference 2009.
SCM
Revision control (also known as version control, source control or (source) code management (SCM)) is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may be changing the same files. Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged. Newer designs like git typically support distributed revision control by removing the constraint of a central server to which all developers must have access.
Build system
Build systems generate a set of binaries out of given sources/source trees. The dependency chain has to be fulfilled for the build to succeed. In contrast to SCMs, where the sources are a dynamic target, the build process requires a _snapshot_ of a source tree. Advanced build systems set up a separate environment for building in a virtual machine or an emulator or even both.
The git OBS source storage
The new git OBS Backend enhances the current source server with git support. It does not replace the source server or its current version control system. The design is intended to work seamlessly with existing installations. Users can approach the new functionality step by step, but they can also start directly with git.
A list of highlights:
- different versions of one package can be managed in one git repository (e.g. different branches, git tree sharing) and the backend will track master or a specified branch. git push can trigger the build process - controlled via attributes
- git tree mapping heuristics programmable to enforce project wide rules (with project or package scope) already at source import
- cloned git trees with multiple packages can be used to generate multipackage projects from such a git tree
- automated import of src.rpm (debian src) collections can be used to build git tree collections
- maintenance concept via attributes works together with the git release workflow
- allow the automated creation of different normalized package formats (deb and rpm)
- merging/branch/cherry-pick on local tree
- BSDB backend does not need to be switched over but existing packages can be converted to git-enabled packages
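The first highlight (the backend tracks master or a specified branch, and a push can trigger a build when attributes allow it) can be illustrated with a minimal sketch. The attribute names `tracked_branch` and `build_on_push` are hypothetical and only stand in for whatever attributes the final design defines.

```python
from dataclasses import dataclass


@dataclass
class PackageAttributes:
    """Hypothetical per-package attributes; the names are illustrative only."""
    tracked_branch: str = "master"   # branch the backend follows
    build_on_push: bool = True       # whether a push may trigger a build


def push_triggers_build(attrs: PackageAttributes, pushed_ref: str) -> bool:
    """Return True if a push to `pushed_ref` should schedule a build."""
    if not attrs.build_on_push:
        return False
    # Git reports branch updates with full ref names such as refs/heads/master.
    return pushed_ref == f"refs/heads/{attrs.tracked_branch}"


if __name__ == "__main__":
    stable = PackageAttributes(tracked_branch="release-1.0")
    print(push_triggers_build(stable, "refs/heads/release-1.0"))
    print(push_triggers_build(stable, "refs/heads/master"))
```

Pushes to untracked branches leave the build state untouched in this sketch, which matches the idea that not every git operation needs an action in OBS.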
Solution with BSDB storage
The current revision control system is linear and stores flat file directories without hierarchies. It is optimized for storing mostly binary files, which are used by debian or rpm packages to contain the source code and/or large collections of other data. File hierarchies or collections of many files are stored in archive files. Revision control currently works on package level; there is no project wide management of revisioned files. Files cannot be related on a revision control basis between packages.
mls: Actually a source link is exactly such a relation.
Martin: _link is considered not to be a revision control system feature but a mixture of OBS and revision control.
Key features of the git OBS storage
The git OBS source server storage changes the situation. It introduces a real git backend into the OBS source server. It allows OBS projects to be managed in the same way their underlying sources are managed under release management, e.g. with branches and tags. In addition to the usual git features, the git OBS Backend introduces the features described in the following chapters.
The drawing below shows how git_backend fits in with the components it communicates with. The src_server is the docking point for git_backend. Its purpose inside OBS is to deliver snapshots of file collections for building.
   [worker]  [worker]  [worker]
                |
               \|/ (pull on request)
        +-------------+ <- snapshots of tree for build <- +-------------+
        [ src_server ]  -> push to git if enabled      -> [ git_backend ]
        +-------------+                                   +-------------+
         /|\         \                                      /  /|\
          |           \_auth through routes in api/frontend_+--------+
          |             e.g. http://api.f.b/git/prj/pkg         | (+LDAP)
         osc                                                   git
- src_server: Serving source snapshots for build process/workers (cache of snapshots).
- git_backend: The git backend consists of a git server and functions to update the source snapshot which is needed for the build process.
- osc: Osc will be able to checkout the snapshots - main challenges are interoperability between the git tree and checkins made with osc.
- git: Git can use clone/fetch/push as with any other remote git server.
mls: Hmm, I'm not sure why you need a git_backend server at all. Wouldn't it be easier if the backend directly works with the git objects? The current srcmd5 would simply be a git commit id in that case (i.e. the id would be a sha1sum instead of an md5).
JanSimon: Not all operations in git require actions in obs. We want to decouple this here and cache the to-be-built source archives using the src_server.
Martin: One usecase is "Working offline", which requires osc to clone a git repository. Skipping the git server would result in the requirement to implement parts of the git functions in the OBS backend and in the osc client. Another usecase is "Coupling Source Repos of different OBS instances", which is usual with normal git instances when you work distributed.
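To make the identifiers in this discussion concrete, here is a small sketch of the two hashing styles. The git blob id is computed the way git does it (SHA-1 over a `blob <size>\0` header plus the content). The srcmd5 function is only an assumption about the current scheme (an MD5 over a sorted listing of per-file MD5 sums and names) and should be checked against the source server code.

```python
import hashlib


def srcmd5_style_id(files: dict[str, bytes]) -> str:
    """MD5 over a sorted '<md5>  <name>' listing (assumed srcmd5 scheme)."""
    listing = "".join(
        f"{hashlib.md5(data).hexdigest()}  {name}\n"
        for name, data in sorted(files.items())
    )
    return hashlib.md5(listing.encode()).hexdigest()


def git_blob_id(data: bytes) -> str:
    """SHA-1 object id that git assigns to file content stored as a blob."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    sources = {"hello.spec": b"Name: hello\n", "hello.tar.gz": b"\x1f\x8b"}
    print("srcmd5-style:", srcmd5_style_id(sources))
    print("git blob id :", git_blob_id(sources["hello.spec"]))
```

Note that a git commit id also covers parents, author and timestamps, so two commits with identical files generally have different ids. A git tree id depends on content only and is therefore the closer analogue of a content hash such as srcmd5.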
Two level git storage
With the new git OBS Backend, a two level storage is introduced project wide. This means that files can be mapped on global project level and on package level. All or some of the files of a project can be mapped onto a single git repository if required. Exceptions can be defined on package level to map one or more archive files to their own git repositories, either in addition or instead. Projects can now be associated with each other by assigning some or all files to the same git repositories, thus providing project level revision control and inter-project revision control for release branches etc.
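The two level lookup can be sketched as follows. The class `SourceMapping`, its fields and the repository names are hypothetical; the sketch only shows that package level rules take precedence over the project level repository, and that unmapped files stay in the existing storage.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Optional


@dataclass
class SourceMapping:
    """Hypothetical two level mapping of files to git repositories."""
    # Level 1: one repository for the whole project (None = not mapped).
    project_repo: Optional[str] = None
    # Level 2: package name -> list of (filename pattern, repository).
    package_rules: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def repo_for(self, package: str, filename: str) -> Optional[str]:
        """Return the repository holding `filename`, or None if unmapped."""
        for pattern, repo in self.package_rules.get(package, []):
            if fnmatch(filename, pattern):
                return repo          # package level exception wins
        return self.project_repo     # fall back to the project level


if __name__ == "__main__":
    mapping = SourceMapping(
        project_repo="project-build.git",
        package_rules={"kernel": [("*.tar.bz2", "kernel-source.git")]},
    )
    print(mapping.repo_for("kernel", "linux-2.6.tar.bz2"))
    print(mapping.repo_for("kernel", "kernel.spec"))
```

Two projects that name the same repository in `project_repo` would share revision control in this model, which corresponds to the inter-project case described above.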
Revision controlled meta data
In addition, parts of the meta data will be put under revision control. This is to complement the possibility to put complete project trees under a release management with potentially inter-project global git repositories (mapping one git tree with multiple branches and multiple packages to more than one project). Meta data in the context of Git Support means the newly added meta data to control the mapping of git repositories and build repositories to projects and packages.
mls: I think you need to explain a bit more here, I'm pretty confused. What exactly are you trying to accomplish? (Also keep in mind that changes of most parts of the meta data should not trigger a package rebuild.)
JanSimon: We want to be able to track the changes to prj/pkg metadata within git, so we can inspect history and go back/pick changes from other prj/pkgs and so on.
Usecases, functions, constraints and testing
This chapter nails down the architecture / implementation by refining high level features with usecases, functionality and testcases. The listed usecases are directly supported by the git backend.
Usecases
The usecases below are split into sub cases extracted from workflows or from the use of existing git repositories when building packages.
agruen: Use cases I am missing: How do you intend to deal with source links in git? What happens when a branch is created (and how is it done)? Merging different revisions of different packages. How can a specific state be tagged (in a package and across packages)?
Martin: Proposal - A _link should still reference those revisions that are also accessible under git (e.g. hash or trunk). Merging should be performed with the help of git.
agruen: Branching and merging are basic work flows which are needed all the time. They should really be explained in detail in use cases. This document is still missing such basics.
Right now, branching is implemented based on source links. Git does not have a similar concept. (Git doesn't have an "unexpanded" view of a package, for example.) So when you say that a link should reference those revisions, then how is that supposed to happen: as a parent in the revision graph, as some magic token in a _link file, ... ? If those dependencies are expressed as part of the revision dependency graph then git's merge mechanisms will have a chance to work. If not, then what is your plan for still making merging work? Either way, could you please explain what you have in mind?
Baselining
Baselining means that there is one baseline (potentially fixed or tagged, or taken from a fixed archive file) in which most of the sources are stored, and local changes (tracked by git) are used to automatically generate patch file sets. A baseline has a corresponding tag or branch point against which operations like diff, merge and rebase can be done. The baseline "mode" allows git to run in a differential mode (as a source link does currently) instead of simply storing the absolute snapshots given by an archived source tree. Merge information is then produced in a usable manner.
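A minimal sketch of the differential idea: given the file contents at the baseline and the current contents, produce one patch per changed file. It uses Python's difflib on in-memory dictionaries purely for illustration. In a real setup the patches would more likely come from git itself, for example with `git diff` or `git format-patch` against the baseline tag. The function name `patch_set` is hypothetical.

```python
import difflib


def patch_set(baseline: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Return one unified diff per file that differs from the baseline."""
    patches = {}
    for name in sorted(set(baseline) | set(current)):
        # A file missing on one side is treated as empty on that side.
        old = baseline.get(name, "").splitlines(keepends=True)
        new = current.get(name, "").splitlines(keepends=True)
        if old == new:
            continue  # unchanged files produce no patch
        diff = difflib.unified_diff(
            old, new, fromfile=f"a/{name}", tofile=f"b/{name}"
        )
        patches[f"{name}.patch"] = "".join(diff)
    return patches


if __name__ == "__main__":
    base = {"main.c": "int main() {\n  return 0;\n}\n", "README": "hello\n"}
    work = {"main.c": "int main() {\n  return 1;\n}\n", "README": "hello\n"}
    for filename, text in patch_set(base, work).items():
        print(filename)
        print(text)
```

Only the local changes are stored and shipped in this mode, so the baseline archive itself never has to be duplicated per revision.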
Project global git repositories
A usecase for the project global git tree mapping is to store all build scripts and/or descriptions outside of a (potentially externally cloned) project wide git repo, in another project wide git tree. An example would be a project where the source comes from the (cloned) project wide external git repo and all the build spec files and patches needed for it come from another project wide patches repository. This is a project wide variant of Baselining. Find real examples under the section Testing.
Working offline
Using git supports a new working mode not possible with the current revision control: working in offline mode. To work in offline mode, both parts of OBS must get their information from local copies. The revision control information is provided by git itself (git is a distributed SCM and you can create local copies of a complete repo with git clone). The build information is provided by storing all data needed to build in a local copy as well (preloaded binary package cache, copies of build information per build repository and arch for each local package).
This mode can only be used temporarily, because build dependencies will change when other maintainers change their packages' build dependencies. Compared to the current situation, the use of git makes pushing the source code changes back to the server much easier.
Coupling source repos of different OBS instances
Consider two OBS systems that have different external visibility: one is publicly accessible via the public internet, the other is in a private network and used for non public development. The public server contains many opensource packages; the non public one contains only a limited number of projects for development that is not yet released, and also relies on some sources in the public server. There should be a way to keep the non public projects consistent with the public server, and the users should have the option to pull or push changes (mostly source code here, but also some meta data) on a case by case basis to the public server. The easiest way would be if most data in a project were under revision control (including most of the meta data), because then they can be pushed, pulled and merged.
Derive a distribution from another one
There is a reference Moblin distribution hosted at moblin.org. It is hosted in one OBS system. Now a linux distributor wants to create a linux distribution from that code base, using mostly the distributor's own code base (that is not moblin specific but distribution specific), or wants to add a moblin addon to their linux distribution. With the OBS git support, it should be possible to merge, cherry pick and update such a package base (in this case the public moblin.org reference) into one's own project (in this case the distribution's own moblin addon). Some of the operations to manage this scenario should be semi automatic (which operations is still to be specified).
Functions
Package generation from a git tree
Generate one or more OBS packages from one or more packages in a git tree. With git, revision control and projects / packages are decoupled (and revision control becomes a first class object). So there is not necessarily a one to one correspondence, and more than one package can be in one git tree. There must be an interface to express this.
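One way such an interface could look is a table that names, for each OBS package, the directory of the git tree it is generated from. The sketch below is an assumption for illustration only; the function `split_tree` and the directory layout are hypothetical and not part of the proposal.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_tree(paths: list[str], package_dirs: dict[str, str]) -> dict[str, list[str]]:
    """Group the paths of one git tree by the package directory containing them."""
    packages = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        for package, directory in package_dirs.items():
            prefix = PurePosixPath(directory).parts
            # A file belongs to a package if it lies below the package directory.
            if len(parts) > len(prefix) and parts[: len(prefix)] == prefix:
                relative = PurePosixPath(*parts[len(prefix):])
                packages[package].append(str(relative))
                break
    return dict(packages)


if __name__ == "__main__":
    tree = [
        "kdelibs/kdelibs.spec",
        "kdelibs/src/a.cpp",
        "kdebase/kdebase.spec",
        "README",
    ]
    print(split_tree(tree, {"kdelibs": "kdelibs", "kdebase": "kdebase"}))
```

Files that fall under no package directory (such as the top level README here) are simply not assigned to any package in this sketch.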
Git tree generation from a package collection
Generate one or more git trees by importing a set of OBS packages with the existing OBS API. This is the same situation as with package / project creation from a git tree, with the roles reversed: since more than one package can now share one git repo, the import must decide which packages end up in which shared git tree. On the other hand, the existing API for creating a package (and thus, currently, a BSDB source repo) must be kept compatible.
Operations of the attribute system with the git backend
The new attribute system is used to implement a development and maintenance model for building. It provides project wide scope. With the git backend, a revision control system with project wide scope is introduced. It is a natural choice to connect both into a build and source revision development and maintenance model with project wide scope. There is documentation on how the attribute system works.
Constraints
This section contains undocumented, sparsely documented, or obvious, but existing behavior of the current OBS and underlying build process (debian and rpm) that cannot be changed because it is inherently used. One issue documented by constraints is "backwards compatibility".
Compatibility
One constraint for the git backend is that it has to keep backwards compatibility towards its interfaces for the current revision control implementation. These interfaces are the OBS API, the current osc command line client interface as well as the interfaces to the build process. Unusual examples of not so obvious implicitly defined interfaces are:
- debian packaging allows to check the checksum and signature of files part of a package
- rpm packages can also check signatures or checksums of its packages by shell commands
- a package with the same meta data and the same source / build description should produce the same build result, independent of the source control or even build system
- a revision control system must reproduce the original files for each revision
- the build process gets passed the package source for debian or rpm in a specific manner evaluated in the build definition and must be kept compatible
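The constraint that a revision control system must reproduce the original files for each revision can be checked mechanically by comparing checksums of what was imported with what is restored. The following is a minimal sketch under that assumption; the function names are hypothetical and SHA-256 is used only as an example digest.

```python
import hashlib


def checksums(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def mismatches(original: dict[str, bytes], restored: dict[str, bytes]) -> list[str]:
    """Return the names of files that are missing, extra, or differ byte for byte."""
    want, got = checksums(original), checksums(restored)
    return sorted(
        name for name in set(want) | set(got) if want.get(name) != got.get(name)
    )


if __name__ == "__main__":
    imported = {"pkg.spec": b"Name: pkg\n", "pkg.tar.gz": b"\x1f\x8b\x08"}
    # A backend that rewrote line endings would break signature checks.
    restored = {"pkg.spec": b"Name: pkg\r\n", "pkg.tar.gz": b"\x1f\x8b\x08"}
    print(mismatches(imported, restored))
```

An empty result means the revision round-trips exactly, which is what checksum and signature checks in debian and rpm packaging rely on.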
Testing
The testing philosophy is to test cases where we have an explicit usecase or an explicitly mentioned feature highlight, and to add tests for backward compatibility, corner cases and functional requirements.
Usecase tests
The list of test cases from feature highlights / usecases is:
- import a linux distribution source tree (.deb as well as .rpm based) into git OBS (Moblin, Maemo, openSUSE, Fedora, Debian, Ubuntu)
- setup mapping to git branches and test effect of importing more than one release of an imported linux distribution into OBS
- create a complete project and branches from a cloned project wide multipackage git repo (KDE, Gnome, X.org)
- create a two level mapping from an external first level project wide build control git tree and second level package git trees (Fedora)
- test if baselining is working, check submit request, merge, update and pull, with readonly and writable two level git trees
- test if offline mode can be used as expected
Compatibility testcases
The list of test cases for backwards compatibility is:
- test backwards compatibility when BSDB revision history present for package
Functional testcases
- test maintenance and devel modes supported by projects with sufficient project attributes
- test the sharing of git trees
Architecture, planning and implementation
Architecture
Implementation
Data keeping
Partial documentation of data keeping for the current OBS revision control and metadata
- project global metadata
- project global attributes
- revisioned package files
- package revision history log
- package meta data
- package level attributes
Extensions to data keeping for git support
- project global revisioned metadata
- project global (level 1) project wide Git revisioned files
- package level revisioned metadata
- package level (level 2) Git multi repo revisioned files
Internal revision control interface
Partial Documentation for the current OBS Interface
Relevant Internal API calls in the source server
- BSFileDB::fdb_getall(x,y)
- BSFileDB::fdb_add_i(x,y,z)
- BSFileDB::fdb_getlast(x,y)
- BSFileDB::fdb_getmatch(w,x,y,z)
- BSFileDB::fdb_add_i2(u,v,w,x,y,z)