openSUSE:Mirror howto

This document walks you through how to set up a mirror for openSUSE.

Walk-through

Below, I'll list steps to set up a mirror for openSUSE content. Feel free to improve this page, or simply mail feedback to ftpadmin at suse.de.

There is one big assumption made: The mirror is running openSUSE itself. This allows me to give specific directions.

If you run a different operating system, the details will differ, but hopefully this howto can serve as an example nevertheless!

  • First, make sure that you can afford the expected traffic and that your Internet Service Provider won't terminate your contract because of it!
  • packages to install:
    • rsync
    • chrony or ntp (or systemd-timesyncd)
    • apache2-prefork or apache2-worker or nginx (or any other webserver you want to use)
  • make provisions to regularly update the machine with security fixes
  • firewall:
    • if you use one, open port 80 (HTTP) and 873 (rsync).
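On a system that uses firewalld (an assumption; other firewall front ends need different commands), opening both ports might look like the sketch below. The `run` helper, the `open_mirror_ports` function and the `DRY_RUN` variable are illustrative names, not part of firewalld.

```shell
# Sketch: open HTTP (80/tcp) and rsync (873/tcp) with firewalld.
# Assumes firewalld is the active firewall; run as root.
# DRY_RUN=1 only prints the commands (illustrative helper variable).

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

open_mirror_ports() {
    run firewall-cmd --permanent --add-port=80/tcp &&
    run firewall-cmd --permanent --add-port=873/tcp &&
    run firewall-cmd --reload
}

# Preview first:  DRY_RUN=1 open_mirror_ports
# Then apply:     open_mirror_ports
```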
  • general things:
    • add the hostname or IP address of a time server to /etc/chrony.d/local.conf or /etc/ntp.conf, then start the service and enable it at boot (systemctl start chronyd|ntpd; systemctl enable chronyd|ntpd)
    • if unsure how to configure an NTP client, follow the instructions at https://doc.opensuse.org/documentation/leap/reference/single-html/book-reference/index.html#cha-ntp
    • make sure that hostname and DNS resolution make sense:
      • check /etc/hosts, /etc/HOSTNAME, /etc/resolv.conf
      • check that the commands 'hostname' and 'hostname -f' return something useful. A functioning hostname and name resolution are really helpful.
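The hostname checks above can be scripted roughly as follows. This is a minimal sketch: `is_fqdn` and `check_hostname` are illustrative names, and "useful" is interpreted here as "fully qualified and resolvable", which is an assumption.

```shell
# Sketch: sanity-check the host name setup.
# is_fqdn and check_hostname are illustrative helper names.

# True if the argument contains a dot and neither starts nor ends with one.
is_fqdn() {
    case "$1" in
        .*|*.) return 1 ;;
        *.*)   return 0 ;;
        *)     return 1 ;;
    esac
}

check_hostname() {
    fqdn=$(hostname -f) || return 1
    is_fqdn "$fqdn" || { echo "not fully qualified: $fqdn" >&2; return 1; }
    # getent consults /etc/hosts and DNS according to nsswitch.conf
    getent hosts "$fqdn" >/dev/null ||
        { echo "does not resolve: $fqdn" >&2; return 1; }
    echo "ok: $fqdn"
}

# Usage: check_hostname
```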
  • web server:
    • assuming your mirror hostname is: mirror.example.com
    • create /etc/apache2/vhosts.d/mirror.example.com.conf
<VirtualHost *:80>
   ServerAdmin admin@example.com
   ServerName mirror.example.com

   DocumentRoot "/srv/pub/opensuse"

   <Directory "/srv/pub/opensuse">
       Options FollowSymLinks Indexes
       IndexOptions FancyIndexing VersionSort NameWidth=* Charset=UTF-8 TrackModified FoldersFirst XHTML
       AllowOverride None
       Require all granted
       # Order allow,deny # prior Leap 15.3 
       # Allow from all # prior Leap 15.3
   </Directory>

   Alias /robots.txt /srv/www/mirror.example.com/robots.txt
   <Directory "/srv/www/mirror.example.com">
       Options None
       Require all granted
       # Order allow,deny # prior Leap 15.3 
       # Allow from all # prior Leap 15.3
   </Directory>

   Include /etc/apache2/conf.d/apachestats.conf

</VirtualHost>
    • for nginx, create /etc/nginx/nginx.conf
worker_processes  1;
events {
   worker_connections  1024;
   use epoll;
}
http {
   include       mime.types;
   default_type  application/octet-stream;
   gzip           on;
   sendfile       on;
   tcp_nopush     on;
   tcp_nodelay    on;
   keepalive_timeout  65;
   server {
       listen       80;
       server_name  mirror.example.com;
       access_log  /var/log/nginx/access.log;
       location / {
           root   /srv/pub/opensuse/;
           index  index.html index.htm;
           autoindex on;
       }
       error_page   500 502 503 504  /50x.html;
       location = /50x.html {
           root   /srv/www/htdocs/;
       }
       location /robots.txt { 
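           # "return" takes the body's Content-Type from default_type;
           # add_header would send a second Content-Type header instead
           default_type text/plain;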
           return 200 "User-agent: *\nDisallow: /\n"; 
       }
   }
}
    • create a robots.txt to keep web crawlers away:
      • mkdir /srv/www/mirror.example.com
      • put this into /srv/www/mirror.example.com/robots.txt:
User-agent: *
Disallow: /
    • tuning apache for high performance:
      • adjust the MPM settings in /etc/apache2/server-tuning.conf so that they fit the memory size of your machine. The worst thing that can happen is that the machine starts swapping, so Apache's maximum total size needs to fit in the memory you have. The worker MPM can make better use of the available memory, but the prefork MPM is easier to configure. Watch the RSS column in ps (you can subtract SHARED) and multiply it by the maximum number of processes.
      • set a low KeepAliveTimeout (decrease it to 3) in /etc/apache2/server-tuning.conf
    • rcapache2 restart; chkconfig -a apache2
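The memory estimate described in the tuning step can be worked through as below. All figures (8 GB of RAM, 1 GB kept free for the system, 32 MB per process) are illustrative assumptions, and the process name `httpd-prefork` is a guess that should be checked against your own ps output. The result is an upper bound for the process limit in server-tuning.conf (MaxRequestWorkers in Apache 2.4).

```shell
# Sketch: estimate how many Apache processes fit into RAM.
# All numbers are illustrative assumptions; measure your own with ps.

# max_procs TOTAL_MB RESERVED_MB PER_PROCESS_MB
max_procs() {
    echo $(( ($1 - $2) / $3 ))
}

# Average resident set size (RSS) of running Apache processes, in MB.
# Assumes the processes are named httpd-prefork; pass another name if not.
avg_rss_mb() {
    ps -o rss= -C "${1:-httpd-prefork}" |
        awk '{ sum += $1; n++ } END { if (n) print int(sum / n / 1024) }'
}

max_procs 8192 1024 32    # prints 224
```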
  • content:
    • create a special user, and a directory to mirror to:
      • groupadd mirror
      • useradd -m -g mirror -c "Mirror User" -s /bin/bash mirror
      • mkdir /srv/pub/opensuse
      • mkdir /srv/pub/opensuse/update
      • chown -R mirror:mirror /srv/pub/opensuse
    • pick an rsync module that you want to sync up from. They are described in rsync modules. This example will use the "opensuse-hotstuff-160gb" module below.
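    • add a cron job to sync content. Here's an example for the most requested files, which we'll pull frequently (every 6 hours, after a random offset of up to about 34 minutes). The line includes a user field, so it belongs in a system crontab such as a file under /etc/cron.d/: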
1 */6 * * *    mirror   sleep $(($RANDOM/16)); rsync -rlpt rsync.opensuse.org::opensuse-hotstuff-160gb /srv/pub/opensuse/ --delete-after --delete-excluded --max-delete=4000 --timeout=1800 -hi
    • you can try the command out, and pull the initial sync (and watch it), like this:
      • su - mirror
      • rsync -rlpt rsync.opensuse.org::opensuse-hotstuff-160gb /srv/pub/opensuse/ --delete-after --delete-excluded --max-delete=4000 --timeout=1800 -hi
    • beware that this `rsync` invocation does not update the repository tree atomically, so your consumers may hit transitional errors (FIXME: create an improvement with atomic-rsync)
    • use locking for the cron job, because it could potentially be long-running, and new jobs could eventually stack up. The easiest way to run the cron job under a lock is to use the withlock wrapper script, which is available as a package:
      • zypper in withlock
    • now, change the cron job to run rsync under the wrapper script that takes care of locking:
1 */6 * * *    mirror   sleep $(($RANDOM/16)); /usr/bin/withlock /home/mirror/LOCK-opensuse-hotstuff  rsync -rlpt rsync.opensuse.org::opensuse-hotstuff-160gb /srv/pub/opensuse/ --delete-after --delete-excluded --max-delete=4000 --timeout=1800 -hi
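If the withlock package is not available, flock(1) from util-linux offers comparable behaviour. The following is a sketch of that alternative, not a description of how withlock works internally; `run_locked` is an illustrative name, and the lock file path is the one from the cron line above.

```shell
# Sketch: run a command under an exclusive lock using flock(1) from util-linux.
# run_locked is an illustrative helper name.

# run_locked LOCKFILE COMMAND [ARGS...]
# Fails immediately (-n) if another process already holds the lock,
# so overlapping cron runs exit instead of stacking up.
run_locked() {
    lock=$1
    shift
    flock -n "$lock" "$@"
}

# Example (as the mirror user):
# run_locked /home/mirror/LOCK-opensuse-hotstuff \
#     rsync -rlpt rsync.opensuse.org::opensuse-hotstuff-160gb /srv/pub/opensuse/ \
#     --delete-after --delete-excluded --max-delete=4000 --timeout=1800 -hi
```

The exit status of the wrapped command is passed through, so cron mail and monitoring keep working as before.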


  • give the openSUSE scanner access, by setting up an rsync server:
    • (rcrsyncd start; chkconfig -a rsyncd)
    • add the following to /etc/rsyncd.conf:
 [opensuse]
         path = /srv/pub/opensuse
         comment = rsync access for openSUSE scanner
         uid = nobody
         # if you want to limit access to the openSUSE mirror scanner:
         #hosts allow = 195.135.220.0/22
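Once the daemon is running, you can check that the module is exported by asking the daemon for its module list (`rsync HOST::`). A small sketch, where `module_listed` is an illustrative helper name and the listing is assumed to start each line with the module name:

```shell
# Sketch: check that the rsync daemon advertises a given module.
# module_listed is an illustrative helper name.

# Reads a module listing (as printed by "rsync HOST::") on stdin and
# succeeds if MODULE appears as the first word of a line.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example:
# rsync localhost:: | module_listed opensuse && echo "module is exported"
```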


  • tell the redirector about it
    • write mail to admin at opensuse.org, providing your details, as explained here: register your mirror
    • take appropriate care that your webserver is up! The redirector will check it every few minutes... but until the next probe happens, it will continue to redirect clients to your host.


  • for extra points, you can considerably increase the service quality for users by configuring cache control headers for certain content. The idea is to mark the metadata files with cache control headers telling intermediary (proxy) caches not to serve them without first checking with the origin server that they are still fresh. This greatly reduces the risk that users see inconsistencies (one file served stale from a cache, another served fresh from the origin server). Add this to your Apache config (outside of a directory context):
   <LocationMatch "\.(xml|xml\.gz|xml\.asc)">
       Header set Cache-Control "must-revalidate"
       ExpiresActive On
       ExpiresDefault "now"
   </LocationMatch>
    • mod_headers and mod_expires are required for this configuration. Enable them with the following commands:
a2enmod headers
a2enmod expires
rcapache2 restart
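To confirm that the headers are actually sent, you can inspect a response for one of the metadata files. This is a sketch only: `has_must_revalidate` is an illustrative name and the URL path in the example is a placeholder for any .xml file on your mirror.

```shell
# Sketch: verify that metadata responses carry the revalidation header.
# has_must_revalidate is an illustrative helper name.

# Reads HTTP response headers on stdin; succeeds if a Cache-Control
# header containing "must-revalidate" is present (case-insensitive).
has_must_revalidate() {
    grep -i '^cache-control:' | grep -qi 'must-revalidate'
}

# Example (path is a placeholder):
# curl -sI http://mirror.example.com/path/to/repomd.xml | has_must_revalidate \
#     && echo "cache headers OK"
```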


  • monitoring and mail
    • there are many ways to configure and use a mail system. What I do is:
      • add myself to the root alias in /etc/aliases: "root: poeml@example.com"
      • make sure that sending out mail works (you might need to configure a relay). Make sure YOUR mirror does not accept mail from outside, which would turn it into a spam hub
      • make the sender more explicit: usermod -c "root at $(hostname)" root
      • a highly useful package is sysstat. After installation, start it (rcsysstat start; chkconfig -a sysstat). The command "sar -A | less" will show various performance data for analysis.

Things to watch out for

If the mirror syncs from our stage rsync server (stage.opensuse.org), a few points need to be observed:

  • use rsync --delay-updates --delete-delay to ensure consistent repositories
  • for large repos, --delay-updates is problematic, as it does not resume cleanly and our download redirector does not see the files until rsync is done. Beforehand, do one or more runs with --ignore-existing and without --delay-updates --delete-delay to fetch most new files while keeping the repository consistent.
  • rsync needs to be run in a way that directory permissions are respected, and reproduced on the target machine. The above example takes care of that. If the permissions are not correctly reproduced, it interferes with the bitflip release process.
  • always run your mirror scripts under a user id different from the one your web server runs as. An identical user id would make all files readable for the web server, which interferes with the bitflip release process.
  • the user id running the mirror scripts also needs to be different from the user id that runs an rsync daemon
  • never run your web server as root. It also interferes with the bitflip release process.
  • if you happen to also run a public rsync server, make sure that your rsync daemon runs under a different user id than the script which pulls content from openSUSE. Otherwise you might be publicly serving content which is still "staged", i.e. not meant to be public.
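The two-pass approach from the first two points might be scripted as follows. This is a sketch under assumptions: the module name in the example is a placeholder, `-rlpt` and `--timeout=1800` are borrowed from the cron example earlier on this page, and `run`, `sync_staged` and `DRY_RUN` are illustrative names.

```shell
# Sketch: two-pass sync from the stage server.
# Set DRY_RUN=1 to print the rsync commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# sync_staged SRC DEST
sync_staged() {
    # Pass 1: fetch files that are new and leave existing files untouched,
    # so the tree stays consistent and an interrupted run can simply be repeated.
    run rsync -rlpt --ignore-existing --timeout=1800 "$1" "$2" || return 1
    # Pass 2: put updated files in place and delete removed ones only
    # at the end of the transfer.
    run rsync -rlpt --delay-updates --delete-delay --timeout=1800 "$1" "$2"
}

# Example (MODULE is a placeholder for the module you sync from):
# sync_staged stage.opensuse.org::MODULE /srv/pub/opensuse/
```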

See also: Conditions to access stage.opensuse.org

Protection of resources

If your mirror is very popular, it may receive substantial traffic from download clients that open many simultaneous connections to grab more of your bandwidth. That's not necessarily wrong in itself, but if a client opens too many connections (20, or even more than 100), you will have to take countermeasures, both to protect your server and to keep the resources you provide accessible for other legitimate users.

You can see the number of simultaneous connections e.g. with this command:

rcapache2 full-server-status | grep ' W ' | sort -k 11

This command basically takes the output of the Apache server status and sorts it by IP address, making it easy to see how many connections originate from where.
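To get one line per client with a connection count instead of a sorted list, the output can be aggregated. A minimal sketch, assuming (as in the command above) that the client address is in field 11; `count_by_field` is an illustrative name.

```shell
# Sketch: count busy connections per client address.
# count_by_field is an illustrative helper name.

# count_by_field N: reads lines on stdin and prints "COUNT VALUE" for the
# N-th whitespace-separated field, most frequent first.
count_by_field() {
    awk -v f="$1" '{ n[$f]++ } END { for (k in n) print n[k], k }' | sort -rn
}

# Example:
# rcapache2 full-server-status | grep ' W ' | count_by_field 11 | head
```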

There are a number of Apache modules that can be used to limit such clients. Don't be confused: what you *don't* want in this scenario is connection throttling, because it would make the clients stay even longer and occupy server slots longer. There are two modules that I can recommend:

mod_limitipconn

from http://dominia.org/djao/limitipconn.html. Packages here: http://software.opensuse.org/search?q=apache2-mod_limitipconn

This module limits connections that are handled at the same time, per IP. Example configuration:

<IfModule mod_limitipconn.c>
    <Directory /srv/pub/opensuse>
        MaxConnPerIP 20
        # exempting images from the connection limit is often a good
        # idea if your web page has lots of inline images, since these
        # pages often generate a flurry of concurrent image requests
        NoIPLimit image/*
    </Directory>
</IfModule>

The limit should not be too small, because simultaneous connections can also mean that corporate users access your site via a common proxy.

mod_ip_count

Packages are here: http://software.opensuse.org/search?q=apache2-mod_ip_count_modmemcache. Needs mod_memcache from http://software.opensuse.org/search?q=apache2-mod_memcache and a memcache daemon (http://software.opensuse.org/search?q=memcached).

This module limits the rate at which new connections are accepted, per IP.

<IfModule mod_memcache.c>
    MemcacheServer 127.0.0.1:11211 min=0 smax=16 max=32 ttl=600
</IfModule>
<IfModule mod_ip_count.c>
    # Max number of requests before failing
    MemCacheMaxRequests 800
    # Time period in which the requests have to come (seconds)
    MemCacheMaxTime 120
</IfModule>

The window we look at must be large enough so we don't block clients that download a large directory, like the openSUSE install client which downloads packages to install from 11.0/repo/i586/...

The required memcache daemon is started with 'rcmemcached start' and configured to start permanently with 'chkconfig -a memcached'.

Registering your public mirror

If your mirror is publicly available, you can add it to our redirector database and get it listed on the mirrors page by following the instructions on this wiki page.
