openSUSE:Build Service redirector
openSUSE download redirector
To ease the download of openSUSE, the first version of the download redirector was developed in late 2005 and presented at FOSDEM 2006. This first proof of concept didn't include any of the features that are present in today's redirector, accessible at download.openSUSE.org.
The first two iterations of the redirector were written in PHP. They were superseded by the current implementation in C, as an Apache module. There was also a prototype implementation in Ruby, which can be integrated via FastCGI; however, the Apache module is what we use.
Later, the redirector became an independent project, which can now be found at http://mirrorbrain.org/.
The current implementation is a combined effort by several people. Notable contributors are Jürgen Weigert and Martin Polster (scanner, mirrorprobe), Peter Poeml (mod_mirrorbrain, mirrorprobe) and Marcus Rueckert (the Ruby prototype). And of course lots of other people have contributed their input, in particular Christoph Thiel, who implemented the first two iterations of the download redirector. (Those aren't used anymore, but the current implementation has its roots in their essential technical design.)
What does it do?
The goal of download.openSUSE.org is to provide automatic and transparent mirror selection that suits each user best, based on their location (GeoIP) and on mirror performance. To achieve this, there is an entire framework that forms a kind of "mirror brain". It keeps a mirror database as a "state cache" for every file on every mirror. This database is updated continuously by a mirror "scanner", which can crawl mirrors via FTP, HTTP and rsync. Another essential part of the framework is monitoring: a daemon periodically probes the mirrors with HTTP requests to check their online status, so that redirects work at all times. The core of the framework is the redirector itself, an Apache module called mod_mirrorbrain. It uses MaxMind's GeoIP, a free database that maps IP addresses to countries and regions, to determine the location of the requester; it then queries the mirror database for a list of potential mirrors and chooses the best one.
Most openSUSE downloads (download.opensuse.org, software.opensuse.org, ftp.opensuse.org) are handled this way. Some are not: for security reasons, certain files (like signatures) are delivered directly from download.opensuse.org. Tiny files are another exception: an HTTP redirect would be about the same size as the tiny file itself, so delivering the file directly saves the client a further round trip. In addition, files that are not yet present on any mirror, or are not intended to be mirrored at all, are sent out directly. All in all, these exceptions result in about 50% of requests being redirected to mirrors.
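The exceptions above amount to a per-request decision. The following C sketch shows one way to express it. The threshold `MIN_REDIRECT_SIZE`, the suffix list and `should_redirect()` are hypothetical and chosen for illustration only; the real module's checks and configuration may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold: files smaller than this are cheaper to send
 * directly than to answer with a redirect (value chosen for illustration). */
#define MIN_REDIRECT_SIZE 4096L

/* Hypothetical list of suffixes that are always served by
 * download.opensuse.org itself, e.g. signatures. */
static const char *const never_redirect_suffixes[] = { ".asc", ".sig", ".key" };

/* Returns true if `s` ends with `suffix`. */
static bool ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Decide whether a request for `path` (a file of `size` bytes that is
 * present on `mirror_count` mirrors) should be redirected or served locally. */
bool should_redirect(const char *path, long size, int mirror_count)
{
    assert(path != NULL);

    /* Security-sensitive files such as signatures: always serve directly. */
    size_t nsuffixes = sizeof never_redirect_suffixes / sizeof *never_redirect_suffixes;
    for (size_t i = 0; i < nsuffixes; i++)
        if (ends_with(path, never_redirect_suffixes[i]))
            return false;

    /* Tiny files: a redirect would cost about as much as the file itself. */
    if (size < MIN_REDIRECT_SIZE)
        return false;

    /* Not (yet) on any mirror, or not meant to be mirrored at all. */
    if (mirror_count == 0)
        return false;

    return true;
}
```

Ordering the checks from cheapest to most expensive (a string comparison before any database lookup) keeps the common "serve locally" cases fast.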
More information about the technical implementation, as well as background documentation, can be found on the MirrorBrain project page.
So how does it work?
The way mirrors are selected is probably easiest to follow by looking at some pseudocode. The algorithm goes like this:
    do not redirect in certain cases:
        is this a request for a directory index?
        does the file exist?
        is the file too small?
        is the file excluded from being redirected by user agent / client IP / MIME type / filemask?

    canonicalize filename, resolving symlinks in the path

    look up country and continent of client IP via GeoIP

    if client country needs to be treated as another country:
        /* New Zealand case -- pick a mirror from Australia */
        client country = other country

    mirrors = SELECT file_server.serverid, server.identifier, server.country,
                     server.region, server.score, server.baseurl
              FROM file
              LEFT JOIN file_server ON file.id = file_server.fileid
              LEFT JOIN server ON file_server.serverid = server.id
              WHERE file.path = canonicalized_filename
                AND server.enabled = 1
                AND server.status_baseurl = 1
                AND server.score > 0

    results example:
    +----------+----------------------------+---------+--------+-------+---------------------------------------------------+
    | serverid | identifier                 | country | region | score | baseurl                                           |
    +----------+----------------------------+---------+--------+-------+---------------------------------------------------+
    |       14 | ftp.ale.org                | us      | NA     |   100 | http://ftp.ale.org/pub/mirrors/opensuse/opensuse/ |
    |       18 | ftp.fi.muni.cz             | cz      | EU     |    10 | http://ftp.fi.muni.cz/pub/linux/opensuse/         |
    |       23 | ftp.iasi.roedu.net         | ro      | EU     |    10 | http://ftp.iasi.roedu.net/mirrors/opensuse.org/   |
    |       41 | ftp.uni-heidelberg.de      | de      | EU     |   100 | http://download.uni-hd.de/ftp/pub/linux/opensuse/ |
    |       44 | ftp5.gwdg.de               | de      | EU     |   200 | http://ftp5.gwdg.de/pub/opensuse/                 |
    |       44 | ftp5.gwdg.de               | de      | EU     |   200 | http://ftp5.gwdg.de/pub/opensuse/                 |
    |       70 | ftp.nux.ipb.pt             | pt      | EU     |    50 | http://ftp.nux.ipb.pt/pub/dists/opensuse/         |
    |       74 | mirrors.uol.com.br         | br      | SA     |    50 | http://ftp.opensuse.org/pub/opensuse/             |
    |       79 | ftp.halifax.rwth-aachen.de | de      | EU     |   100 | http://ftp.halifax.rwth-aachen.de/opensuse/       |
    +----------+----------------------------+---------+--------+-------+---------------------------------------------------+

    for mirror in mirrors:
        /* use the "score" to give each mirror a weighted randomized rank */
        mirror->rank = (rand()>>16) * ((RAND_MAX>>16) / mirror->score)

        if memcache daemon knows combination of this client ip and this mirror id:
            /* client got this mirror before */
            chosen = mirror

        if country of client is same as mirror:
            put mirror into country pool
        else if continent of client is same as mirror:
            put mirror into continent pool
        else:
            put mirror into world pool

    if chosen is already set:
        /* keep the mirror this client got before */
    else if country pool is not empty:
        chosen = find lowest ranked mirror(country pool)
    else if continent pool is not empty:
        chosen = find lowest ranked mirror(continent pool)
    else if world pool is not empty:
        chosen = find lowest ranked mirror(world pool)
    else:
        do not redirect, send the file ourselves

    store combination client ip <-> mirror id in memcache daemon

    if metalink_requested:
        send metalink
    else:
        do the redirect
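For readers who prefer C, here is a minimal sketch of the ranking and pool selection from the pseudocode above, with the memcache-based stickiness left out. The `struct mirror` layout and `choose_mirror()` are hypothetical; the rank formula is the one from the pseudocode and assumes a platform where `RAND_MAX` is 2^31-1, as with glibc. Where `RAND_MAX` is only 32767, `RAND_MAX >> 16` is 0 and every rank collapses to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one row of the mirror query. */
struct mirror {
    int id;
    const char *country;   /* e.g. "de" */
    const char *region;    /* e.g. "EU" */
    int score;             /* > 0, guaranteed by the SQL WHERE clause */
    long rank;             /* filled in by choose_mirror() */
};

enum pool { COUNTRY_POOL, REGION_POOL, WORLD_POOL };

/* Which pool a mirror falls into, as seen from the client's location. */
static enum pool pool_of(const struct mirror *m, const char *country, const char *region)
{
    if (strcmp(m->country, country) == 0)
        return COUNTRY_POOL;
    if (strcmp(m->region, region) == 0)
        return REGION_POOL;
    return WORLD_POOL;
}

/* Rank all mirrors, then return the lowest-ranked mirror from the closest
 * non-empty pool, or NULL if there is no mirror at all. */
struct mirror *choose_mirror(struct mirror *mirrors, size_t n,
                             const char *client_country, const char *client_region)
{
    /* Weighted randomized rank: a higher score gives a smaller multiplier,
     * so well-scored mirrors tend to draw a lower (better) rank. */
    for (size_t i = 0; i < n; i++) {
        assert(mirrors[i].score > 0);
        mirrors[i].rank = (long)(rand() >> 16) * ((RAND_MAX >> 16) / mirrors[i].score);
    }

    /* Try the country pool first, then the region pool, then the world. */
    for (int p = COUNTRY_POOL; p <= WORLD_POOL; p++) {
        struct mirror *best = NULL;
        for (size_t i = 0; i < n; i++) {
            if (pool_of(&mirrors[i], client_country, client_region) != (enum pool)p)
                continue;
            if (best == NULL || mirrors[i].rank < best->rank)
                best = &mirrors[i];
        }
        if (best != NULL)
            return best;
    }
    return NULL; /* no mirror: serve the file ourselves */
}
```

Because a higher score only shrinks the multiplier, every mirror in a pool keeps some chance of drawing the lowest rank, which spreads the load instead of always picking the single best-scored mirror.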
Once a mirror is selected, the redirector returns HTTP status code 302 (Found) with a Location header containing the redirection URL, which sends the client there. If no mirror is known for a given file, the server simply delivers the file itself.
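Inside Apache, the module does not write the response by hand; httpd builds it from the status code and headers the module sets. Still, a standalone sketch makes the shape of the reply concrete. `format_redirect()` is hypothetical; it joins a mirror's `baseurl` (as stored in the mirror table) with the requested path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a minimal "302 Found" response pointing the client at
 * `baseurl` + `path`. Returns the number of bytes written, or -1 if the
 * buffer is too small. `path` is relative to the mirror's base URL. */
int format_redirect(char *buf, size_t bufsize, const char *baseurl, const char *path)
{
    assert(buf != NULL && baseurl != NULL && path != NULL);

    /* Avoid a doubled or missing slash between base URL and path. */
    size_t blen = strlen(baseurl);
    const char *sep = (blen > 0 && baseurl[blen - 1] == '/') ? "" : "/";
    while (*path == '/')
        path++;

    int n = snprintf(buf, bufsize,
                     "HTTP/1.1 302 Found\r\n"
                     "Location: %s%s%s\r\n"
                     "\r\n",
                     baseurl, sep, path);
    return (n < 0 || (size_t)n >= bufsize) ? -1 : n;
}
```

For example, base URL `http://ftp5.gwdg.de/pub/opensuse/` and path `/distribution/11.1/iso/x.iso` yield `Location: http://ftp5.gwdg.de/pub/opensuse/distribution/11.1/iso/x.iso`.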
There are some important exceptions. For certain files it is hard to make sure that they are current on all mirrors, because they change frequently. Thus, the server doesn't redirect for such files.
In the past, we used "mirror stickiness":
Once a client had been redirected to a certain mirror, it was redirected to the same mirror again on the next request, and not to another randomly chosen one.
This configuration proved to have no benefit over just randomly assigning mirrors, so it is no longer active (since early 2008 I think).
The redirector generates metalink files (see http://metalinker.org). Metalink-enabled clients can automatically fail over in case of problems, or even download from several mirrors in parallel. A metalink is returned whenever ".metalink" is appended to the URL of a file to be downloaded.
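Recognizing such a request amounts to checking for the suffix and stripping it to get the name of the actual file. A small C sketch; `metalink_requested()` is a hypothetical helper, not part of mod_mirrorbrain's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define METALINK_SUFFIX ".metalink"

/* If `uri` ends in ".metalink", copy the URI of the actual file (without
 * the suffix) into `file` and return true. Returns false if the suffix is
 * absent or `file` is too small to hold the stripped URI. */
bool metalink_requested(const char *uri, char *file, size_t filesize)
{
    assert(uri != NULL && file != NULL && filesize > 0);
    size_t n = strlen(uri), m = strlen(METALINK_SUFFIX);

    if (n <= m || strcmp(uri + n - m, METALINK_SUFFIX) != 0)
        return false;
    if (n - m >= filesize)
        return false;

    memcpy(file, uri, n - m);
    file[n - m] = '\0';
    return true;
}
```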
The following blog post explains how to make best use of this: http://lizards.opensuse.org/2008/12/16/best-way-to-download-opensuse/
The redirector supports injecting verification hashes and PGP signatures into the metalinks, and includes them for most larger files, such as ISO images.
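A metalink is a small XML document that lists the mirror URLs for a file, optionally with hashes. The sketch below writes a minimal document in the Metalink 3.0 format described at metalinker.org. The `struct metalink_url` type, the `preference` values and the choice of SHA-1 are assumptions for illustration; the PGP signature element is left out for brevity, and no XML escaping is done.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one mirror URL for the file. */
struct metalink_url {
    const char *url;       /* full URL of the file on one mirror */
    const char *country;   /* mirror's country code, e.g. "de" */
    int preference;        /* higher = preferred (hypothetical mapping) */
};

/* Write a minimal Metalink 3.0 document for one file to `out`.
 * `sha1` may be NULL if no hash is known. Names and URLs are assumed to
 * contain no characters that need XML escaping. */
void write_metalink(FILE *out, const char *name, long size, const char *sha1,
                    const struct metalink_url *urls, size_t nurls)
{
    assert(out != NULL && name != NULL);
    fprintf(out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    fprintf(out, "<metalink version=\"3.0\" xmlns=\"http://www.metalinker.org/\">\n");
    fprintf(out, "  <files>\n");
    fprintf(out, "    <file name=\"%s\">\n", name);
    fprintf(out, "      <size>%ld</size>\n", size);
    if (sha1 != NULL) {
        /* Lets the client verify the download independently of the mirror. */
        fprintf(out, "      <verification>\n");
        fprintf(out, "        <hash type=\"sha1\">%s</hash>\n", sha1);
        fprintf(out, "      </verification>\n");
    }
    fprintf(out, "      <resources>\n");
    for (size_t i = 0; i < nurls; i++)
        fprintf(out, "        <url type=\"http\" location=\"%s\" preference=\"%d\">%s</url>\n",
                urls[i].country, urls[i].preference, urls[i].url);
    fprintf(out, "      </resources>\n");
    fprintf(out, "    </file>\n");
    fprintf(out, "  </files>\n");
    fprintf(out, "</metalink>\n");
}
```

Listing several URLs in `<resources>` is what lets a client fail over or download segments in parallel.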
More info about using metalinks to download openSUSE can be found here.
Distributing files centrally has an interesting implication: it allows us to gather data about which files are requested, which we otherwise couldn't do. Therefore we have an additional Apache module in place, which collects statistics about downloads of individual Build Service packages. In principle, this module splits the request into path and file name, logs the components and increments the corresponding counters in an SQL database. The sources can be found here: https://forgesvn1.novell.com/svn/opensuse/trunk/tools/download-stats/mod_stats/.
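The path splitting could look roughly like the sketch below, which parses an RPM file name of the form name-version-release.arch.rpm and counts downloads per package and architecture. It is not taken from mod_stats: the functions are hypothetical, and a fixed in-memory table stands in for the SQL database the real module uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Components of an RPM file name "name-version-release.arch.rpm". */
struct rpm_name {
    char name[128];
    char version[64];
    char release[64];
    char arch[32];
};

/* Copy `len` bytes of `src` into `dst` (capacity `cap`) and terminate it. */
static bool copy_part(char *dst, size_t cap, const char *src, size_t len)
{
    if (len == 0 || len >= cap)
        return false;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return true;
}

/* Split the last path component of `path` into RPM name components.
 * Returns false if it does not look like "name-version-release.arch.rpm". */
bool parse_rpm_path(const char *path, struct rpm_name *out)
{
    assert(path != NULL && out != NULL);
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;

    size_t len = strlen(base);
    if (len < 4 || strcmp(base + len - 4, ".rpm") != 0)
        return false;
    len -= 4; /* drop ".rpm" */

    /* Scan backwards: '.' before arch, '-' before release, '-' before version.
     * Scanning from the end keeps dashes inside the package name intact. */
    size_t i = len;
    while (i > 0 && base[i - 1] != '.') i--;
    if (i == 0) return false;
    size_t dot = i - 1;
    i = dot;
    while (i > 0 && base[i - 1] != '-') i--;
    if (i == 0) return false;
    size_t dash2 = i - 1;
    i = dash2;
    while (i > 0 && base[i - 1] != '-') i--;
    if (i == 0) return false;
    size_t dash1 = i - 1;

    return copy_part(out->name, sizeof out->name, base, dash1)
        && copy_part(out->version, sizeof out->version, base + dash1 + 1, dash2 - dash1 - 1)
        && copy_part(out->release, sizeof out->release, base + dash2 + 1, dot - dash2 - 1)
        && copy_part(out->arch, sizeof out->arch, base + dot + 1, len - dot - 1);
}

/* Hypothetical in-memory stand-in for the SQL counter table. */
struct counter { char key[256]; long hits; };
static struct counter counters[64];
static size_t ncounters;

/* Increment the download counter for "name.arch"; returns the new count,
 * or -1 if the table is full. */
long count_download(const struct rpm_name *r)
{
    char key[256];
    snprintf(key, sizeof key, "%s.%s", r->name, r->arch);
    for (size_t i = 0; i < ncounters; i++)
        if (strcmp(counters[i].key, key) == 0)
            return ++counters[i].hits;
    if (ncounters == sizeof counters / sizeof *counters)
        return -1;
    strcpy(counters[ncounters].key, key);
    counters[ncounters].hits = 1;
    return counters[ncounters++].hits;
}
```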
Developer information and contact
- High-level overview: http://mirrorbrain.org/presentations
- Sources: http://mirrorbrain.org/code
- For mailing lists, IRC etc see http://mirrorbrain.org/communication
This product includes GeoLite data created by MaxMind, available from http://maxmind.com/
- Transferred from Build Service/Redirector