Libzypp/Downloading Metadata

From openSUSE

Downloading Metadata

dmacvicar@suse.de

Metadata Size

 .------------------------------------------.
 |      metadata size in MB (Factory)       |
 |      compressed size in parentheses      |
 +------------------+----------+------------+
 |                  | YUM      | SUSETags   |
 +------------------+----------+------------+
 | Basic metadata   | 41 (6.9) | 19 (2.9)   |
 | Translation      |          | 3.5 (0.69) |
 | File lists       | 169 (45) |            |
 | Changelogs, misc | 193 (16) |            |
 | Size info        |          | 18 (2.7)   |
 | Patterns         |          | 1.2 (0.9)  |
 '------------------+----------+------------'

The whole SUSETags metadata is 59M, which compression reduces to 11M. Decompressing it takes 0.7 s on a Pentium IV machine.

The whole YUM metadata is 67M compressed, or 402M uncompressed. Of course, this includes data such as changelogs and file lists.

In a 1:1 comparison of only the basic metadata, SUSETags takes 19M (2.9M compressed), while YUM takes 41M (6.9M compressed).
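The compression ratios behind these figures can be checked with a few lines of Python. The sizes are copied from the table and text above; nothing else is assumed.

```python
# Sizes in MB from the table and text above: (uncompressed, compressed)
BASIC = {"YUM": (41, 6.9), "SUSETags": (19, 2.9)}
WHOLE = {"YUM": (402, 67), "SUSETags": (59, 11)}

def ratio(sizes):
    """Compression ratio: uncompressed size divided by compressed size."""
    uncompressed, compressed = sizes
    return uncompressed / compressed

for label, table in (("basic", BASIC), ("whole", WHOLE)):
    for fmt, sizes in table.items():
        print(f"{label:5} {fmt:8} {ratio(sizes):.1f}x")

# How much larger the basic YUM metadata stays after compression
print(f"{BASIC['YUM'][1] / BASIC['SUSETags'][1]:.1f}")  # → 2.4
```

Both formats compress by roughly 5x to 7x. The basic YUM metadata therefore remains about 2.4 times larger than the SUSETags equivalent even after compression.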

Downloading using diffs

This is a short example of how much downloading can be saved by using diffs. I am mixing YUM and SUSETags metadata here to show both cases.

On day 1, the sizes and checksums of the metadata files are:

22d78028896a6fbe5c07dcf4a5369b908bb93c29  16M 2007-04-11 12:03 packages
00b4c3d389c408ac4ebc51fb61e2fd2f8b9d414b 3.8M 2007-04-11 12:02 packages.en
ceb7bf50874a818292a598b9dd42a6a9ff1b694c  42M 2007-04-11 13:37 primary.xml

The table above shows how much compression would save on these files.

24 hours later, the server has updated metadata:

5f994031e1cc6b2df2326d648e677920cc7855d2  16M 2007-04-12 10:57 packages
0045e7b224ead8a86c072506434639afa4248607 3.8M 2007-04-12 10:57 packages.en
eab53e6f9e0dfa250f66615e469d3404d32c169f  42M 2007-04-12 12:06 primary.xml
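Deciding what to re-download comes down to comparing checksums. The sketch below assumes a cache that maps each file name to the SHA1 recorded at the last refresh. The function names are hypothetical and are not ZYpp's API. Run against the two listings above, it reports that all three files changed, so without diffs all three would be downloaded in full.

```python
import hashlib

# SHA1 checksums from the day 1 and day 2 listings above
DAY1 = {
    "packages": "22d78028896a6fbe5c07dcf4a5369b908bb93c29",
    "packages.en": "00b4c3d389c408ac4ebc51fb61e2fd2f8b9d414b",
    "primary.xml": "ceb7bf50874a818292a598b9dd42a6a9ff1b694c",
}
DAY2 = {
    "packages": "5f994031e1cc6b2df2326d648e677920cc7855d2",
    "packages.en": "0045e7b224ead8a86c072506434639afa4248607",
    "primary.xml": "eab53e6f9e0dfa250f66615e469d3404d32c169f",
}

def sha1_of_file(path, chunk_size=1 << 16):
    """SHA1 of a local file, read in chunks so big metadata files need little memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def files_to_refresh(cached, remote):
    """Names whose remote checksum differs from the cached one (or that are new)."""
    return sorted(name for name, sha1 in remote.items() if cached.get(name) != sha1)

print(files_to_refresh(DAY1, DAY2))
# → ['packages', 'packages.en', 'primary.xml']
```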

If the checksum of a file does not change, ZYpp will use the old copy when refreshing. But let's see how much downloading we could save if we used diffs:

 66K packages.diff
 19K packages.diff.gz
 13K packages.en.diff
2.5K packages.en.diff.gz
873K primary.xml.diff
111K primary.xml.diff.gz

The diffs are orders of magnitude smaller. Even uncompressed, they are roughly 50 to 300 times smaller than the files they update.
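The gain can be quantified from the listings above. This sketch only redoes the arithmetic on those figures. The sizes are the rounded ls -h values, so the ratios are approximate.

```python
# Sizes in KiB taken from the listings above (ls -h style: 1M = 1024K).
FULL = {"packages": 16 * 1024, "packages.en": 3.8 * 1024, "primary.xml": 42 * 1024}
DIFF = {"packages": 66, "packages.en": 13, "primary.xml": 873}
DIFF_GZ = {"packages": 19, "packages.en": 2.5, "primary.xml": 111}

def times_smaller(full, diff):
    """How many times smaller the diff is than the full file."""
    return full / diff

for name, full in FULL.items():
    print(f"{name:12} diff {times_smaller(full, DIFF[name]):5.0f}x smaller, "
          f"diff.gz {times_smaller(full, DIFF_GZ[name]):5.0f}x smaller")

total_full = sum(FULL.values())
total_diff_gz = sum(DIFF_GZ.values())
print(f"full files: {total_full / 1024:.1f}M, compressed diffs: {total_diff_gz:.1f}K")
```

The totals come to about 61.8M of full files against 132.5K of compressed diffs. The full files are counted uncompressed, so the ratios overstate the gain somewhat when the full files would be fetched compressed.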

Let's explore how other distributions handle this.

An overview to the Debian metadata

Debian separates metadata by architecture. This does not mean there is a separate repository per architecture. Instead, every repository is split into "components" (main, contrib and non-free in the example below), each with its own index per architecture.

For 2126 packages, the main Packages index file takes 2.3M, or 563K compressed.

Every repository has a Release file which lists the index files of every component and architecture, with their checksums and sizes in bytes.

 Origin: Debian
 Label: Debian
 Suite: experimental
 Codename: experimental
 Date: Wed, 14 Feb 2007 08:07:57 UTC
 NotAutomatic: yes
 Architectures: alpha amd64 arm hppa hurd-i386 i386 ia64 m68k mips mipsel powerpc s390 sparc
 Components: main contrib non-free
 Description: Experimental packages - not released; use at your own risk.
 MD5Sum:
 4f0696c3e8428d752b98a4d7fe0e08cd  1834780 main/binary-alpha/Packages
 935dcbba1898a1324922a7313b74a417   445825 main/binary-alpha/Packages.gz
 26e574117442c5401f4cc97fe2301950     2023 main/binary-alpha/Packages.diff/Index
 2cf9014ee1a87db4d7b9f2c8d8759946      105 main/binary-alpha/Release
 9230dfa95b115e65d97318e8125e0e67  2184043 main/binary-amd64/Packages
 a582915db631a51e2f867a582e2722da   519361 main/binary-amd64/Packages.gz
 a748ccc4c9ab8f05e6df2bbbf8b83b2b     2023 main/binary-amd64/Packages.diff/Index
 0ecdc48ea606f45ff45b99fa2f4f0e9b      105 main/binary-amd64/Release
 ....
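A client can use the MD5Sum list to check every index it downloads against the Release file. The sketch below is a minimal illustration, assuming the checksum lines are indented continuation lines as in the excerpt; the function names are hypothetical.

```python
import hashlib

def parse_release_md5(text):
    """Return {path: (md5_hex, size_in_bytes)} from the MD5Sum field of a Release file."""
    entries, in_md5 = {}, False
    for line in text.splitlines():
        if line[:1] in (" ", "\t"):
            # Indented line: continuation of the current field.
            if in_md5:
                digest, size, path = line.split()
                entries[path] = (digest, int(size))
        else:
            # Any new "Field:" line starts or ends the MD5Sum block.
            in_md5 = line.startswith("MD5Sum:")
    return entries

def verify_index(path, data, entries):
    """True if downloaded bytes match the size and MD5 listed in the Release file."""
    digest, size = entries[path]
    return len(data) == size and hashlib.md5(data).hexdigest() == digest

SAMPLE = """\
Origin: Debian
Components: main contrib non-free
MD5Sum:
 4f0696c3e8428d752b98a4d7fe0e08cd  1834780 main/binary-alpha/Packages
 935dcbba1898a1324922a7313b74a417   445825 main/binary-alpha/Packages.gz
"""
print(parse_release_md5(SAMPLE)["main/binary-alpha/Packages.gz"])
# → ('935dcbba1898a1324922a7313b74a417', 445825)
```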

Also, there is a tag index:

 a2ps-perl-ja Tag culture::japanese, implemented-in::perl, interface::commandline, role::program
 ...

There is also a Release file per component and architecture:

 main/binary-i386/Release:
 Archive: experimental
 Component: main
 Origin: Debian
 Label: Debian
 NotAutomatic: yes
 Architecture: i386

The Packages file is much like the SUSETags packages file:

 Package: abook
 Priority: optional
 Section: mail
 Installed-Size: 328
 Maintainer: Gerfried Fuchs <alfie@debian.org>
 Architecture: i386
 Version: 0.6.0~pre2-1
 Depends: libc6 (>= 2.3.6-6), libncursesw5 (>= 5.4-5), libreadline5 (>= 5.1), debconf (>= 0.5) | debconf-2.0
 Filename: pool/main/a/abook/abook_0.6.0~pre2-1_i386.deb
 Size: 79952
 MD5sum: d089ffe34367694faa534351b441970a
 SHA1: 7ee7385f2f9313401db5cd9e6c4a9e253e550f88
 SHA256: 59b2229197f91d3bdbb996dbebee8691278c1cfe0c585490b41d30d947fd9247
 Description: text-based ncurses address book application
 abook is a text-based ncurses address book application. It provides many
 different fields of user info. abook is designed for use with mutt, but
 can be used independently.
 Enhances: mutt
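The Packages index is a sequence of such stanzas separated by blank lines. It uses the same "Field: value" syntax as the Release file, and continuation lines (like the long Description) start with a space. Below is a minimal parser sketch that also splits Depends into groups of "|" alternatives. The function names are hypothetical.

```python
def parse_packages(text):
    """Split a Packages index into stanzas: one {field: value} dict per package."""
    stanzas, fields, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():               # a blank line ends a stanza
            if fields:
                stanzas.append(fields)
            fields, last = {}, None
        elif line[0] in " \t" and last:    # continuation of the previous field
            fields[last] += "\n" + line[1:]
        else:
            key, value = line.split(":", 1)
            fields[key] = value.strip()
            last = key
    if fields:
        stanzas.append(fields)
    return stanzas

def parse_depends(value):
    """Every comma-separated group is required; any '|' alternative satisfies its group."""
    return [[alt.strip() for alt in group.split("|")] for group in value.split(",")]

print(parse_depends("libc6 (>= 2.3.6-6), libncursesw5 (>= 5.4-5), "
                    "libreadline5 (>= 5.1), debconf (>= 0.5) | debconf-2.0"))
# → [['libc6 (>= 2.3.6-6)'], ['libncursesw5 (>= 5.4-5)'], ['libreadline5 (>= 5.1)'], ['debconf (>= 0.5)', 'debconf-2.0']]
```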

However, Debian also uses a diff system: a Packages.diff directory contains diffs that bring older versions of the Packages file up to the current one.

 main/binary-i386/Packages.diff
 2007-02-07-0808.30.gz   07-Feb-2007 02:09     2k  
 2007-02-07-2007.43.gz   07-Feb-2007 14:09     4k  
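Each file in Packages.diff is a gzip-compressed patch. As far as we know, Debian's patches are ed-style scripts of the kind produced by `diff --ed`. Those scripts list their commands from the end of the file backwards, so line numbers never need adjusting. Below is a minimal applier for the three commands such scripts use (a, c, d). It is a sketch, not APT's implementation: it does not handle diff's special encoding of text lines consisting of a single ".", and the function names are hypothetical.

```python
import gzip
import re

# "3a", "5,7c", "4d": a line number or range followed by the command letter.
ED_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

def read_patch(path):
    """Read one gzip-compressed patch from Packages.diff, e.g. '2007-02-07-0808.30.gz'."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()

def apply_ed_script(old_lines, script):
    """Apply an ed-style script (as emitted by `diff --ed`) to a list of lines.

    The commands run from the end of the file towards the start, so every line
    number refers to the original file and the commands can be applied in order."""
    lines = list(old_lines)
    commands = iter(script.splitlines())
    for command in commands:
        match = ED_COMMAND.match(command)
        if not match:
            raise ValueError(f"unsupported ed command: {command!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        text = []
        if op in "ac":
            # Collect the new text up to the terminating "." line.
            for line in commands:
                if line == ".":
                    break
                text.append(line)
        if op == "a":      # append after line `start` (0 = before the first line)
            lines[start:start] = text
        elif op == "c":    # replace lines start..end
            lines[start - 1:end] = text
        else:              # "d": delete lines start..end
            del lines[start - 1:end]
    return lines

print(apply_ed_script(["a", "b", "c", "d"], "4a\ne\n.\n2c\nB\n.\n"))
# → ['a', 'B', 'c', 'd', 'e']
```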

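The Index file in the Packages.diff directory records three things. It lists the SHA1 checksum and size of the current Packages file (SHA1-Current). It lists each older version together with the name of the diff that starts from it (SHA1-History). It also lists each diff itself (SHA1-Patches):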

 main/binary-i386/Packages.diff/Index
 
 SHA1-Current: 4513eca848a39e25934e2021b2d25dd9d59e4c49 2380440
 SHA1-History:
 060d4f82cf112a62efda326409b624446ff3be75 2253607 2007-02-07-0808.30
 a0f6f319c4df20593e78a7292bb61f65edf486ee 2257342 2007-02-07-2007.43
 ...
 SHA1-Patches:
 2df8a47ab1474e5f0c93c6c9fe3417b36d64e6f7    4568 2007-02-07-0808.30
 984b3944951cafb4ccc5ffd11d065ff5773e4fe6   11315 2007-02-07-2007.43
 ...
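A client can use this Index to decide which diffs to fetch. It hashes its local Packages file and compares the result against the Index:

  • If the hash matches SHA1-Current, the file is up to date.
  • If it matches an entry in SHA1-History, the client applies that entry's patch and every later one, in order.
  • Otherwise, it falls back to downloading the full file.

Below is a sketch of that selection logic, assuming the Index layout shown above with indented data lines. The sample keeps only the two entries shown (the rest are elided there too), and the function names are hypothetical.

```python
def parse_index(text):
    """Parse a Packages.diff/Index file into (current, history, patches)."""
    current, history, patches, section = None, [], {}, None
    for line in text.splitlines():
        if line[:1] in (" ", "\t"):
            # Indented data line belonging to the current section.
            if section and line.strip():
                sha1, size, name = line.split()
                if section == "history":
                    history.append((sha1, int(size), name))
                else:
                    patches[name] = (sha1, int(size))
            continue
        key, _, value = line.partition(":")
        section = None
        if key == "SHA1-Current":
            sha1, size = value.split()
            current = (sha1, int(size))
        elif key == "SHA1-History":
            section = "history"
        elif key == "SHA1-Patches":
            section = "patches"
    return current, history, patches

def patches_to_apply(local_sha1, current, history):
    """Patch names to apply in order; [] if already current,
    None if the local file is unknown (download the full file instead)."""
    if local_sha1 == current[0]:
        return []
    for i, (sha1, _size, _name) in enumerate(history):
        if sha1 == local_sha1:
            return [name for _sha1, _size, name in history[i:]]
    return None

INDEX = """\
SHA1-Current: 4513eca848a39e25934e2021b2d25dd9d59e4c49 2380440
SHA1-History:
 060d4f82cf112a62efda326409b624446ff3be75 2253607 2007-02-07-0808.30
 a0f6f319c4df20593e78a7292bb61f65edf486ee 2257342 2007-02-07-2007.43
SHA1-Patches:
 2df8a47ab1474e5f0c93c6c9fe3417b36d64e6f7    4568 2007-02-07-0808.30
 984b3944951cafb4ccc5ffd11d065ff5773e4fe6   11315 2007-02-07-2007.43
"""
current, history, patches = parse_index(INDEX)
print(patches_to_apply("060d4f82cf112a62efda326409b624446ff3be75", current, history))
# → ['2007-02-07-0808.30', '2007-02-07-2007.43']
```

The SHA1-Patches entries let the client verify each patch before using it. After patching, the result should hash to SHA1-Current; if it does not, downloading the full file is the safe fallback.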

Using zsync as the backend

zsync (http://zsync.moria.org.uk/) implements the rsync algorithm on the client side, over plain HTTP. The only server-side requirement is running a command (zsyncmake) to generate a .zsync file for each "big" file. A client can then rebuild the current version of a big file from its .zsync file and an old local copy, downloading only the parts that changed.

 16M packages
2.4M packages.en
7.2K packages.en.zsync
 55K packages.zsync
7.5M primary.xml.gz
225K primary.xml.zsync

As you can see, the .zsync files are reasonably small compared to the files they describe.
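The underlying technique is rsync's. Roughly speaking, the .zsync file carries a cheap rolling ("weak") checksum and a strong checksum for every block of the new file. The client slides a window over its old copy to find blocks it already has, and only fetches the rest. The sketch below is a simplified, in-memory illustration of that idea, not zsync's file format or API. The block size, the use of MD5 as the strong hash and all function names are assumptions. Real implementations also skip a whole block after a match, whereas this one keeps sliding byte by byte for simplicity.

```python
import hashlib

MOD = 1 << 16

def weak_checksum(block):
    """rsync-style weak checksum (a, b) of a block of bytes."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, blocksize):
    """Slide the window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - blocksize * out_byte + a) % MOD
    return a, b

def strong_checksum(block):
    # Stand-in strong hash for this sketch.
    return hashlib.md5(block).digest()

def make_block_table(new_data, blocksize):
    """Server side (the role of the .zsync file): checksums of each full block of
    the new file. A trailing partial block is simply always downloaded here."""
    table = {}
    for idx in range(len(new_data) // blocksize):
        block = new_data[idx * blocksize:(idx + 1) * blocksize]
        table.setdefault(weak_checksum(block), []).append((strong_checksum(block), idx))
    return table

def find_reusable_blocks(old_data, table, blocksize):
    """Client side: scan the old local file and return
    {block index in new file: offset in old file} for blocks already present."""
    found = {}
    if len(old_data) < blocksize:
        return found
    a, b = weak_checksum(old_data[:blocksize])
    pos = 0
    while True:
        # Cheap weak lookup first; confirm candidates with the strong checksum.
        for strong, idx in table.get((a, b), []):
            if idx not in found and strong_checksum(old_data[pos:pos + blocksize]) == strong:
                found[idx] = pos
        if pos + blocksize >= len(old_data):
            return found
        a, b = roll(a, b, old_data[pos], old_data[pos + blocksize], blocksize)
        pos += 1

def blocks_to_download(new_len, found, blocksize):
    """Indices of new-file blocks that must be fetched (e.g. with HTTP range requests)."""
    nblocks = -(-new_len // blocksize)  # ceiling division
    return [i for i in range(nblocks) if i not in found]

old = b"AAAABBBBCCCCDDDD"
new = b"AAAAXXXXBBBBCCCC"
found = find_reusable_blocks(old, make_block_table(new, 4), 4)
print(found, blocks_to_download(len(new), found, 4))
# → {0: 0, 2: 4, 3: 8} [1]
```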

Pros:

  • Simple for the server side
  • Does not require us to invent a patch/diff index
  • Works with all metadata formats
  • Could work transparently for RPMs too

Cons:

  • The API is a little bit complicated
  • The example client uses its own HTTP transport; we would need to figure out how to integrate it with curl
  • The tarball does not provide libzsync as a shared library (.so), only as a static library built into the client
  • The library bundles its own, outdated copy of zlib

--Duncan 15:45, 13 April 2007 (UTC) I have a patch to include zsync inside ZYpp though.