openSUSE:Build Service Concept Trust


Status of this document

This document describes the preliminary concept for integrating some metrics into the openSUSE build service. It is based on Adrian's brain dump (in the old English wiki) and inspired by many good talks with co-workers at SUSE and colleagues within the open source community. Please note that this concept only proposes possible metrics that may be implemented; based on the feedback, a subset will be chosen in the coming weeks.

All ideas and proposals have a strong focus on the openSUSE Build Service, but they should be usable in other parts of the openSUSE infrastructure as well. This document does not describe the trust implementation.

Although Marko's thesis will focus on the proposed trust metrics, this paper also proposes further characteristics (quality & maintenance) which came up during the interviews Marko conducted at SUSE.

Feedback welcome!

Aim of the project

The openSUSE build service offers everyone the opportunity to build packages for many Linux distributions with relatively little effort. Hence the number of available versions and variants per package is comparatively high. Therefore we need a powerful but also simple instrument to evaluate these packages, which are immediately available at the openSUSE software portal.

This paper proposes some prospects for gaining many different values, which can be classified into three main categories:

  • trust
  • ratings & acceptance
  • quality & maintenance

For each category we can compute an accumulated value based on the sum of the weighted single values, which can be presented to the user. An even more generic final score may also be computed. As a possible extension, an authenticated user should be able to configure individual scores based on the existing metrics to fulfil his individual needs.
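As an illustration, here is a minimal sketch in Python of such a weighted scoring; all metric names, weights and value ranges are hypothetical and not part of the proposal.

  # Minimal sketch of the weighted per-category scores; all metric names,
  # weights and value ranges are invented for illustration.
  DEFAULT_WEIGHTS = {
      "trust":   {"proven_identity": 0.4, "action_ratings": 0.6},
      "ratings": {"user_rating": 0.7, "downloads": 0.3},
      "quality": {"rpmlint_score": 0.5, "update_frequency": 0.5},
  }

  def category_score(values, weights):
      """Weighted sum of the single values, normalised by the total weight."""
      return sum(values[n] * w for n, w in weights.items()) / sum(weights.values())

  def overall_score(values, weights=None):
      """Per-category scores plus a generic final score; an authenticated
      user may pass his own weights instead of the defaults."""
      weights = weights or DEFAULT_WEIGHTS
      per_category = {c: category_score(values[c], w) for c, w in weights.items()}
      return per_category, sum(per_category.values()) / len(per_category)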

The whole project is mainly based on information retrieval, data mining and statistics, but also includes different aspects from other research areas.

We intend to get as much software as possible evaluated by this system. Therefore, it is essential that participation in this rating system is as attractive as possible for the openSUSE build service community. Since a group as diverse as the open source community will most likely not agree on all implemented metrics, each user can disable the whole system and may explicitly prohibit the computation of each individual value.

Although all proposed values make an effort to evaluate a package and not its producers, this project does a lot of data mining. Therefore, it has to be evaluated whether there are any violations of privacy. In addition, all metrics which would enable performance measurement of Novell employees should be disabled by default as soon as the employee flag is set for an account.

Short summary of possible metrics

Trust

Trust is a relationship of reliance. A trusted party is presumed to seek to fulfil policies, ethical codes, law and their previous promises. [1]

Because trust is highly individual and extremely subjective, a widely accepted global trust level is hard to derive for the process we would like to measure. Therefore, we need a commonly accepted method for characterizing a person's actions within the openSUSE build service.

Please note that none of the following metrics can prevent someone with malicious intentions, who spares no effort, from gaining a high rating.

Criteria for trust rating

Everybody has his own criteria for defining what is trustworthy. In terms of software packaging, discussions with many colleagues at SUSE made clear that the three main factors are:

  • Carefulness
  • Reliability
  • Reactivity

Because the weighting between these and further factors strongly depends on personal preferences, the openSUSE build service should enable the user to configure the relevance of each available value to his personal needs.

Metrics

  • Code of conduct
Binary value indicating whether the user has agreed to follow some guiding principles.
  • Proven identity
    The user has identified himself to the openSUSE project.
  • Action based ratings
    Ratings for the user's requests (Imagine a feedback mechanism similar to a well known online auction and shopping site.).
  • Trust network
    Each user can assign trust levels to other users and projects.
  • Dependable relationships:
    • affiliation state
A person may have one or more affiliations to openSUSE, e.g. by being employed by or being a business partner of Novell. Another example is an openSUSE membership.
    • upstream affiliation
      A person may have a relation to upstream development, e.g. being developer, maintainer or packager.
  • Reviews
    Each package may be reviewed on several criteria.
    • Specfile reviewed by openSUSE maintainer
    • Code & patches reviewed by openSUSE maintainer
    • Code & patches reviewed by openSUSE security team
  • Official manufacturer tags
    Tag software as supplied by its manufacturer to enable users to use only official repositories.

Ratings & Acceptance

This is probably the most important category because it is the most interactive one and every user of the software portal may be able to contribute to it.

Metrics

  • Popularity/Acceptance
    • Amount of Package Downloads
    • Amount of Repo-Metadata Downloads
      (Both values should be available as total, per time period, per version or per release.)
  • User ratings

Quality & maintenance

Metrics

  • rpmlint & lintian/linda
    Derive some quality metrics from rpmlint or lintian/linda results
  • Package history / version numbers
    Check update frequency of a package against upstream updates.
  • Availability for several architectures and products
    Keep a history of all builds on all architectures and products.
  • Changelog
    Parse the changelog for cross references and extract some minimal information from the referenced bug trackers.
  • Activity/vitality
  • Dependencies and reverse build dependencies

Currently unavailable metrics

  • Bug statistics
    • # open bugs
    • # blocker bugs
    • # critical bugs
    • # major bugs


Detailed metric descriptions

Trust

Methods for measuring trust

Code of conduct

If a user signs the code of conduct (some guiding principles for openSUSE and the openSUSE build service), he states that he takes his work seriously and that his contributions primarily aim to benefit the whole community.

Although this is only a binary value, it shows whether or not someone agrees to respect the basic rules.

Proven identity

A user who identifies himself to a trusted entity in a trusted way, e.g. by sending his work/home address, contact details or a copy of his ID, takes on much more responsibility for his work. A SUSE/Novell employee has already done this identification by signing his work contract, but others shall be able to reach the same level by sending their ID to a review board or through an adequate standardized process.

Several options are possible. For example, we could adopt the assurance process of CAcert.org, where a new user has to be approved by several existing ones before he is able to create certificates. Another possibility is to request that a user submit his public GNU Privacy Guard key, which has to be signed by a given number of well known openSUSE build service users.
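A minimal sketch of the second option, counting signatures with GnuPG's machine-readable listing; the key IDs and the threshold are placeholders:

  # Sketch: count how many well-known build service users signed a
  # candidate's key; the key IDs and the threshold are placeholders.
  import subprocess

  WELL_KNOWN_KEYIDS = {"0123456789ABCDEF", "FEDCBA9876543210"}  # hypothetical
  REQUIRED_SIGNATURES = 3                                       # hypothetical

  def signature_count(keyid):
      """In GnuPG's machine-readable listing, 'sig' records carry the
      signer's key ID in the fifth colon-separated field."""
      out = subprocess.run(["gpg", "--with-colons", "--list-sigs", keyid],
                           capture_output=True, text=True, check=True).stdout
      signers = {line.split(":")[4] for line in out.splitlines()
                 if line.startswith("sig:")}
      return len(signers & WELL_KNOWN_KEYIDS)

  def identity_proven(keyid):
      return signature_count(keyid) >= REQUIRED_SIGNATURES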

Action based ratings

Motivation: a person who is well known for doing great work within one or more projects is more trustworthy than a newbie taking his first steps in packaging.

With its open design, the openSUSE build service encourages its users to collaborate and share their knowledge. The collaboration process is mapped onto so-called submit requests: a user requests the addition of changes from one package to another - typically in another project. The receiver of a request evaluates the submitted modifications and then chooses to accept or decline the request.

We would like to add a rating to the submit request (and prospective further request types), so that every user can rate all his previously handled requests on a simple scale (excellent - poor). These ratings are stored in a central database together with some additional information (e.g. sender, source project, source package, receiver, target project, target package), so that two scores, one per project and one overall, can be derived as soon as a sufficient sample quantity is available.

These ratings can for example be displayed by the openSUSE build service clients with each new request or we can compute a per person rating for each project.
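A small sketch of how such a rating store and the two derived scores could look; the schema, the 1..5 scale and the minimum sample quantity are illustrative assumptions:

  # Sketch of the central rating store; the schema, the 1..5 scale and
  # the minimum sample quantity are illustrative assumptions.
  from collections import namedtuple
  from statistics import mean

  Rating = namedtuple("Rating", "sender src_project src_package "
                                "receiver tgt_project tgt_package score")

  MIN_SAMPLES = 10  # hypothetical minimum sample quantity

  def user_scores(ratings, user):
      """Per-project scores and an overall score for a user's requests."""
      own = [r for r in ratings if r.sender == user]
      if len(own) < MIN_SAMPLES:
          return None  # not enough samples yet
      per_project = {}
      for r in own:
          per_project.setdefault(r.tgt_project, []).append(r.score)
      return ({p: mean(s) for p, s in per_project.items()},
              mean(r.score for r in own))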

Trust network

Depending on the chosen implementation for identifying users, this value may also show how well known a user is within the openSUSE build service community; a non-binary value would therefore be more meaningful. Optionally, we can split this into a boolean value (has the user proven his identity in a standardised process?) and a kind of social network where users may add rated links between each other. These ratings are highly subjective, but we should use a well known scheme, like the GNU Privacy Guard trust levels 'unknown', 'none', 'marginal' and 'full'.

As an alternative to the GNU Privacy Guard approach, the Advogato trust metric should be considered. It is based on a Capacity Constrained Flow Network [2] and should therefore be more resistant to a massive attack. It is not hard to understand and quite fast to compute.
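The following sketch only illustrates the idea behind the Advogato metric: capacities shrink with the distance from a trusted seed, so a group of attackers far from the seed can only pull in a bounded number of accounts. It uses a greedy approximation instead of the full network flow computation, and the certification graph and capacity schedule are made up:

  # Illustrative, greedy approximation of a capacity-constrained flow
  # trust metric in the spirit of Advogato [2].
  from collections import deque

  def bfs_levels(graph, seed):
      """Distance of every reachable user from the trusted seed."""
      level, queue = {seed: 0}, deque([seed])
      while queue:
          u = queue.popleft()
          for v in graph.get(u, ()):
              if v not in level:
                  level[v] = level[u] + 1
                  queue.append(v)
      return level

  def accepted_users(graph, seed, capacities=(800, 200, 50, 12, 4, 2, 1)):
      """Accept users while the capacity of their certifier lasts;
      capacities shrink with the distance from the seed."""
      level = bfs_levels(graph, seed)
      cap = {u: capacities[min(l, len(capacities) - 1)] for u, l in level.items()}
      accepted, frontier = {seed}, deque([seed])
      while frontier:
          u = frontier.popleft()
          for v in graph.get(u, ()):
              if v not in accepted and cap[u] > 0:
                  cap[u] -= 1  # each accepted certification consumes capacity
                  accepted.add(v)
                  frontier.append(v)
      return accepted

  # Example: the seed certifies a and b; a certifies c.
  print(accepted_users({"seed": ["a", "b"], "a": ["c"], "b": [], "c": []}, "seed"))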

Dependable relationships: affiliation state

If a group of people (this can be an open source project or a company) signs a contract to react to reports and to handle upcoming issues for a repository for a defined time frame, they prove their intent to maintain their software packages. Therefore, a given workforce has to be guaranteed during the term of the contract. If an individual leaves the group, the remaining group has to nominate a substitute or the contract has to be suspended until the team is complete again. There should be a few levels, mainly depending on the reaction time and team size. Employees of allied companies should fit into this relationship scheme.

Novell employees have an even more dependable relationship.

There should be about three to five levels for this relationship:

  • Novell employee
  • Novell contractor
  • openSUSE member
  • no relation

Dependable relationships: upstream affiliation

A person or a team with a strong connection to upstream may have a much higher interest in maintaining their packages than people without any relation to the project. The same may apply to the quality of the packages.

Since a user may be maintainer of one or more openSUSE build service projects, information should be available per project on whether or not he has a relation to the upstream project, e.g.

  • developer
  • maintainer
  • packager
  • no relation

Official manufacturer tags

Some users only want to install manufacturer-provided software. Therefore we should add manufacturer tags for each project, maintained by a team of bureaucrats. A user may then choose, from all available tags, the vendors from which he would like to install software on his systems.

An initial approach may be implemented by assigning a set of signing keys to manufacturers or something similar.
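For illustration only, such an assignment could start as a simple lookup table from signing key fingerprints to vendor tags (all fingerprints and tags below are made up):

  # Illustrative lookup from repository signing key fingerprints to
  # official manufacturer tags; all fingerprints and tags are made up.
  MANUFACTURER_KEYS = {
      "ABCD1234ABCD1234ABCD1234ABCD1234ABCD1234": "openSUSE",
      "0000111122223333444455556666777788889999": "Mozilla",
  }

  def manufacturer_tag(signing_key_fingerprint):
      """Return the official vendor tag for a repository, if any."""
      return MANUFACTURER_KEYS.get(signing_key_fingerprint)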

User ratings & acceptance

This is probably the most important category next to trust because it is the most interactive one and every user of the software portal may be able to contribute to it.

Methods for gaining user ratings and further available data

User ratings

Similar to Wikia Search, I would like to add some AJAX-based rating and comment functions to the software portal.

A user should be able to rate each presented package without any barriers like the need to log in. Unlike Wikia, where a rating expresses the relevance of a result for a search query, this rating should express the user's overall feedback on the package: the more stars, the better. In addition, a user may give one package the spotlight tag, meaning this package should get additional attention. Short single-line comments, which may be deleted afterwards, should be available to registered users; they may add some helpful hints or warnings for other users.

Some administrative buttons like 'report a bug' or 'alert security team' should complete the interactive search result.

Popularity/acceptance (download statistics)

"Survival of the fittest" also applies to packages and repositories meaning the users only selects those repositories which do not break their system and bring them great benefits.

The downloads.opensuse.org redirector mechanism is a valuable source for measuring a repository's or a package's popularity. We should compute time-dependent statistics on the number of package downloads (per package and repository) and metadata downloads (per repository).
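A sketch of such a statistic; the Apache-style log format and the repository path layout are assumptions, real redirector logs will differ:

  # Sketch of time-dependent download statistics; the Apache-style log
  # format and the repository path layout are assumptions.
  import re
  from collections import Counter

  LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4})[^\]]*\] "GET (\S+)')

  def download_counts(log_lines):
      """Count package downloads (*.rpm) and repository metadata
      downloads (repodata/*) per repository and day."""
      packages, metadata = Counter(), Counter()
      for line in log_lines:
          m = LINE.search(line)
          if not m:
              continue
          day, path = m.groups()
          parts = path.lstrip("/").split("/")
          repo = parts[1] if len(parts) > 2 else "unknown"
          if path.endswith(".rpm"):
              packages[(repo, day)] += 1
          elif "/repodata/" in path:
              metadata[(repo, day)] += 1
      return packages, metadata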

Quality & maintenance

Without a strong connection between packages and bugs, it is nearly impossible to get quality and security metrics for packages. Therefore, I strongly recommend integrating into Novell's Bugzilla a possibility to assign a bug to a package, like all other major distributions do [3] [4] [5]. On the other hand, the build service team should consider creating its own bug tracker, because it may not be desirable to have all bugs for all packages from the openSUSE build service within Novell's bug tracking system. The preferred way would probably be a separate bug tracking system for each build service project.

However, some metrics for package quality can be derived by automated tests and similar means.

Methods for measuring quality

rpmlint & lintian/linda

Automated tools exist to check a package for common errors and policy violations. Hence a package without any failures reported by these automated checks likely has a much higher quality than one with several complaints.

rpmlint is a tool for checking common errors in rpm packages. The following checks are included in the default rpmlint package:

  • Tag checks (TagsCheck): a check to see if some rpm tags are present.
  • Distribution specific checks (DistributionCheck): checks the distribution specificities in a binary rpm package.
  • Binary checks (BinaryCheck): checks binary files in a binary rpm package.
  • Configuration file checks (ConfigCheck): checks if all configuration files are in place.
  • Location, permission, group and owner checks (FileCheck): tests various aspects of files: locations, owners, groups, permissions, setuid, setgid, etc.
  • Signature checks (SignatureCheck): checks the presence of a PGP signature.
  • FHS checks (FHSCheck): checks FHS conformity.
  • Source specific checks (SourceCheck): verifies source package correctness.
  • i18n checks (I18NCheck): tests for i18n bugs.
  • Menu system checks (MenuCheck): tests against the menu system.
  • %post, %pre, %postun and %preun script checks (PostCheck): plenty of checks against the post/pre scripts.
  • /etc/rc.d/init.d checks (InitScriptCheck): checks init scripts.
  • Spec file checks (SpecCheck): various tests against the spec file of a source rpm.
  • Zip/Jar file checks (ZipCheck): verify Zip/Jar file correctness.
  • Pam configuration file checks (PamCheck): checks for correct PAM configuration.
  • Rpm file checks (RpmFileCheck): checks that the filename length does not exceed 64 characters.
  • Branding checks (BrandingPolicyCheck): verifies that branding-related packaging complies with the branding policy.
  • Desktop translation checks (DesktopTranslationCheck): searches untranslated desktop files.
  • Documentation files dependency check (DocFilesCheck): %doc files must not introduce new dependencies.
  • Duplicates check (DuplicatesCheck): checks for duplicate files packaged separately.
  • KMP policy check (KMPPolicyCheck): verifies that KMPs have proper dependencies.
  • Library policy check (LibraryPolicyCheck): verifies whether shared library packages comply with the corresponding packaging policy.
  • LSB checks (LSBCheck): checks for LSB compliance.
  • Naming policy checks (NamingPolicyCheck): verifies the correct naming for a few packages (e.g. python, perl, php, ruby, apache).

A first implementation should only check whether any test failed and possibly how many problems have been found. As a subsequent improvement, I would suggest adding a parser to identify the failed tests and further adding a severity measure for the identified problems.
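A sketch of the suggested parser; rpmlint marks its findings with E: (error) and W: (warning), while the severity weights below are invented:

  # Sketch of the suggested rpmlint output parser; the weights are invented.
  import re

  SEVERITY_WEIGHT = {"E": 5, "W": 1}  # hypothetical weighting
  RESULT = re.compile(r"^\S+:\s+([EW]):\s+(\S+)")

  def rpmlint_score(output):
      """Return (score, failed checks); lower scores are better."""
      score, failed = 0, set()
      for line in output.splitlines():
          m = RESULT.match(line)
          if m:
              severity, check = m.groups()
              score += SEVERITY_WEIGHT[severity]
              failed.add(check)
      return score, failed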

The corresponding Debian tools are lintian and linda. They perform various checks against the Debian Policy Manual and thus they should be integrated into the Debian packaging process of the build service. (See FATE request #304951.)

Possible further data sources

  • metadata
  • changelog
  • tests from submitpac

Methods for measuring maintenance

Package history / version numbers

Some users have a high demand for bleeding edge software. On the other hand, many users just want to get (security) bugs fixed without any further changes or feature additions.

To derive a metric for these demands and to estimate how actively a package is maintained, we should keep a history of all software versions including some crucial dates (first upload, first successful build, first build passing all automated tests). In combination with a link to the upstream repository for new releases (e.g. CVS, SVN, Git, http/ftp download URL), we could derive some interesting metrics:

  • Time t between two upstream updates
    Δ(t_{upstream, i}, t_{upstream, i-1})
  • Time t between two package uploads/check-ins
    Δ(t_{upload, i}, t_{upload, i-1})
  • Time t between two successful package builds (with check failures)
    Δ(t_{build, i}, t_{build, i-1})
  • Time t between two perfect package builds
    Δ(t_{perfect, i}, t_{perfect, i-1})

In combination with a meaningful weighting function we could derive long-term statistics, including variance and expectation value, for each of these time parameters.

In addition to this we can compare these values with each other to get even more significant values:

  • Time t between an upstream update and the corresponding package upload
    Δ(t_{upstream, i}, t_{upload, i})
  • Time t between an upstream update and the corresponding package build (with check failures)
    Δ(t_{upstream, i}, t_{build, i})
  • Time t between an upstream update and the corresponding perfect package build
    Δ(t_{upstream, i}, t_{perfect, i})
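A worked sketch of these deltas and the derived long-term statistics, with invented event dates:

  # Worked example of the proposed time deltas; all dates are invented.
  from datetime import date
  from statistics import mean, pvariance

  def deltas(timestamps):
      """Delta(t_i, t_{i-1}) in days for a chronologically sorted list."""
      return [(b - a).days for a, b in zip(timestamps, timestamps[1:])]

  upstream = [date(2008, 1, 1), date(2008, 3, 1), date(2008, 6, 1)]
  uploads  = [date(2008, 1, 10), date(2008, 3, 5), date(2008, 6, 20)]

  # Time between two upstream updates, with expectation value and variance:
  d = deltas(upstream)
  print(mean(d), pvariance(d))

  # Time between an upstream update and the corresponding package upload:
  print(mean((u - r).days for u, r in zip(uploads, upstream)))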

How long a package stays broken is probably also of interest to the user:

  • Time t between a package break and the next successful or perfect build
    min{Δ(t_{break, i}, t_{build, j}), Δ(t_{break, i}, t_{perfect, j})} with j ≥ i

Due to the nature of the distributed development process, a package can be broken for a considerable time as long as a prior working version is available. One has to differentiate between a build problem caused by a build dependency and a build problem caused by the package itself.

As a future improvement we could also take skipped package versions into account and add a weight considering the major/minor numbers of the upstream versions.

Availability for several architectures and products

A good packager takes care of his packages for all shipped architectures (like i586, amd64, s390, ppc) as well as for older (not discontinued) products (e.g. openSUSE 10.2, openSUSE 10.3). In addition to the history described above, we should enhance the version and build history to cover all architectures and products.

Derive metrics from a package's Changelog

Amongst other things, the SUSE Package Conventions define some basic rules on how to cite major bug trackers like Novell Bugzilla, GCC, GNOME or KDE in a package's changelog. The syntax mainly consists of a token for a bug tracker, followed by a hash sign and the bug number, enclosed in parentheses, e.g. (GCC#4711), (bgo#0815) or (KDE#2342).

In addition, there is also a defined format for citing CVE identifiers. CVE is a publicly available and free-to-use list or dictionary of standardized identifiers for common computer vulnerabilities and exposures. Changelog entries which contain CVE identifiers most likely describe a security fix or at least deal with the given threat.

Similar to the given metrics for the upstream and package version histories, we could parse the changelogs for cross references, extract some minimal information from the referenced bug trackers, and calculate some values:

  • amount of fixed/covered bugs per package version
  • long time average of fixed/covered bugs per version
  • average time between bug report and fix per bug
  • average time between bug report and fix per package version

To get significant values we should observe them in combination with the above mentioned version histories. We should also calculate variance and expectation values where feasible.
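A sketch of the cross-reference extraction step; the tracker token pattern follows the convention described above, and the CVE pattern matches the public identifier format:

  # Sketch of the changelog cross-reference parser.
  import re

  BUG_REF = re.compile(r"\((\w+)#(\d+)\)")   # e.g. (GCC#4711) or (bgo#0815)
  CVE_REF = re.compile(r"CVE-\d{4}-\d{4,}")

  def parse_changelog_entry(text):
      """Extract (tracker, bug number) pairs and CVE identifiers."""
      bugs = [(t.lower(), int(n)) for t, n in BUG_REF.findall(text)]
      return bugs, CVE_REF.findall(text)

  # Example: parse_changelog_entry("- fix overflow (bnc#123456), CVE-2008-1234")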

Activity index / vitality

TODO

Reverse build dependencies

A rule of thumb is that a package is much easier to maintain if it has few build dependencies. The more build dependencies a package has, the more easily it may break and the more maintenance effort may be necessary.

Even though this is a weak indicator of good maintenance, this value is very important for several users of the build service. For instance, release managers strongly depend on it when deciding whether to update a package late in the release process.

Possible further data sources

  • Introduction of a new base repository
    • How quickly do packagers add a new base repository?
    • How quickly do they get a high percentage of packages successfully built against it?
  • Evaluate tags for new package versions (e.g. optional, recommended, security)
  • Continuity (Measure how long someone is participating in projects. Extremely weak indicator?)

See also