Libzypp/Downloading Metadata
Downloading Metadata
dmacvicar@suse.de
Metadata Size
.------------------------------------------.
|         metadata size (factory)          |
+------------------+----------+------------+
|                  | YUM      | SuSETags   |
+------------------+----------+------------+
| Basic metadata   | 41 (6.9) | 19 (2.9)   |
| Translation      |          | 3.5 (0.69) |
| File lists       | 169 (45) |            |
| Changelogs, misc | 193 (16) |            |
| Size info        |          | 18 (2.7)   |
| Patterns         |          | 1.2 (0.9)  |
'------------------+----------+------------'
The whole SUSETags metadata is 59M, but compressing it reduces this to 11M. Decompression takes 0.7s on a Pentium 4 machine.
The whole YUM metadata is 67M compressed, or 402M uncompressed. Of course, this includes data such as changelogs and file lists.
In a 1:1 comparison of basic metadata only, SUSETags takes 19M (2.9M compressed), while YUM takes 41M (6.9M compressed).
Downloading using diffs
This is a short example of how much download time can be saved by using diffs. I am mixing the YUM and SUSETags metadata here, just to show both cases.
On day 1, the sizes and checksums of the metadata files are:
22d78028896a6fbe5c07dcf4a5369b908bb93c29  16M 2007-04-11 12:03 packages
00b4c3d389c408ac4ebc51fb61e2fd2f8b9d414b 3.8M 2007-04-11 12:02 packages.en
ceb7bf50874a818292a598b9dd42a6a9ff1b694c  42M 2007-04-11 13:37 primary.xml
The table above shows how much compression alone would save.
24 hours later, the server has updated metadata:
5f994031e1cc6b2df2326d648e677920cc7855d2  16M 2007-04-12 10:57 packages
0045e7b224ead8a86c072506434639afa4248607 3.8M 2007-04-12 10:57 packages.en
eab53e6f9e0dfa250f66615e469d3404d32c169f  42M 2007-04-12 12:06 primary.xml
If the checksum of a file does not change, ZYpp will use the old copy when refreshing. Here all three checksums changed, so let's see how much download we could save if we could use diffs:
 66K packages.diff
 19K packages.diff.gz
 13K packages.en.diff
2.5K packages.en.diff.gz
873K primary.xml.diff
111K primary.xml.diff.gz
The compressed diffs are two to three orders of magnitude smaller than the full files.
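The comparison above can be reproduced on a small scale. The following is a minimal sketch, not ZYpp code: the function names and the generated sample metadata are made up for illustration. It checks the checksum first, and only if it changed compares the size of a full compressed download with the size of a compressed unified diff.

```python
import difflib
import gzip
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 checksum of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def needs_refresh(cached: bytes, advertised_sha1: str) -> bool:
    """True when the cached copy no longer matches the server's checksum."""
    return sha1_hex(cached) != advertised_sha1


def gzipped_diff_size(old: bytes, new: bytes) -> int:
    """Size in bytes of a gzip-compressed unified diff from old to new."""
    old_lines = old.decode("utf-8", "replace").splitlines(keepends=True)
    new_lines = new.decode("utf-8", "replace").splitlines(keepends=True)
    diff = "".join(difflib.unified_diff(old_lines, new_lines, "old", "new"))
    return len(gzip.compress(diff.encode("utf-8")))


# Hypothetical metadata: 2000 package entries, one of which changes overnight.
day1 = "".join(f"=Pkg: example{i} 1.0 1 i586\n" for i in range(2000)).encode()
day2 = day1.replace(b"example42 1.0", b"example42 1.1")

if needs_refresh(day1, sha1_hex(day2)):
    print("full download:", len(gzip.compress(day2)), "bytes (gzip)")
    print("diff download:", gzipped_diff_size(day1, day2), "bytes (gzip)")
```

The real saving depends on how many entries change between refreshes; the sample data here changes only one.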
Let's explore how other distributions do it.
An overview of the Debian metadata
Debian separates metadata by architecture. This does not mean there is a separate repository per architecture; rather, every repository is split into "components" and architectures.
For 2126 packages, the main Packages index file takes 2.3M, or 563k compressed.
Every repository has a Release file which lists the index files of each component along with their checksums.
Origin: Debian
Label: Debian
Suite: experimental
Codename: experimental
Date: Wed, 14 Feb 2007 08:07:57 UTC
NotAutomatic: yes
Architectures: alpha amd64 arm hppa hurd-i386 i386 ia64 m68k mips mipsel powerpc s390 sparc
Components: main contrib non-free
Description: Experimental packages - not released; use at your own risk.
MD5Sum:
 4f0696c3e8428d752b98a4d7fe0e08cd 1834780 main/binary-alpha/Packages
 935dcbba1898a1324922a7313b74a417 445825 main/binary-alpha/Packages.gz
 26e574117442c5401f4cc97fe2301950 2023 main/binary-alpha/Packages.diff/Index
 2cf9014ee1a87db4d7b9f2c8d8759946 105 main/binary-alpha/Release
 9230dfa95b115e65d97318e8125e0e67 2184043 main/binary-amd64/Packages
 a582915db631a51e2f867a582e2722da 519361 main/binary-amd64/Packages.gz
 a748ccc4c9ab8f05e6df2bbbf8b83b2b 2023 main/binary-amd64/Packages.diff/Index
 0ecdc48ea606f45ff45b99fa2f4f0e9b 105 main/binary-amd64/Release
 ....
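As a rough illustration of how a client could use such a Release file, the sketch below extracts the MD5Sum section and checks a downloaded file against it. The function names are hypothetical, and the parser assumes that checksum entries are indented by a space under the MD5Sum: field.

```python
import hashlib


def parse_release_md5(text: str) -> dict:
    """Map each path in the MD5Sum section to its (md5, size) pair."""
    entries = {}
    in_md5 = False
    for line in text.splitlines():
        if line.startswith("MD5Sum:"):
            in_md5 = True
        elif in_md5 and line.startswith(" "):
            fields = line.split()
            if len(fields) == 3:  # skip elided lines such as " ...."
                md5, size, path = fields
                entries[path] = (md5, int(size))
        else:
            in_md5 = False  # any other field ends the section
    return entries


def matches_release(data: bytes, entry: tuple) -> bool:
    """Check a downloaded file against its (md5, size) entry."""
    md5, size = entry
    return len(data) == size and hashlib.md5(data).hexdigest() == md5


# Two entries copied from the Release file shown above.
RELEASE_TEXT = """Origin: Debian
Suite: experimental
MD5Sum:
 4f0696c3e8428d752b98a4d7fe0e08cd 1834780 main/binary-alpha/Packages
 935dcbba1898a1324922a7313b74a417 445825 main/binary-alpha/Packages.gz
"""

checksums = parse_release_md5(RELEASE_TEXT)
print(checksums["main/binary-alpha/Packages.gz"])
```

A client would call matches_release() on each index file it downloads before trusting its contents.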
Also, there is a tag index:
a2ps-perl-ja Tag culture::japanese, implemented-in::perl, interface::commandline, role::program ...
There is a Release file per component:
main/binary-i386/Release:
Archive: experimental
Component: main
Origin: Debian
Label: Debian
NotAutomatic: yes
Architecture: i386
The Packages file is much like the SUSETags packages file:
Package: abook
Priority: optional
Section: mail
Installed-Size: 328
Maintainer: Gerfried Fuchs <alfie@debian.org>
Architecture: i386
Version: 0.6.0~pre2-1
Depends: libc6 (>= 2.3.6-6), libncursesw5 (>= 5.4-5), libreadline5 (>= 5.1), debconf (>= 0.5) | debconf-2.0
Filename: pool/main/a/abook/abook_0.6.0~pre2-1_i386.deb
Size: 79952
MD5sum: d089ffe34367694faa534351b441970a
SHA1: 7ee7385f2f9313401db5cd9e6c4a9e253e550f88
SHA256: 59b2229197f91d3bdbb996dbebee8691278c1cfe0c585490b41d30d947fd9247
Description: text-based ncurses address book application
 abook is a text-based ncurses address book application. It provides many
 different fields of user info. abook is designed for use with mutt, but can
 be used independently.
Enhances: mutt
However, Debian also uses a diff system. There is a Packages.diff directory that contains diffs from older versions to the current one.
main/binary-i386/Packages.diff
2007-02-07-0808.30.gz 07-Feb-2007 02:09 2k
2007-02-07-2007.43.gz 07-Feb-2007 14:09 4k
The Index file in that directory specifies which diffs bring an older version of the Packages file up to the current version.
main/binary-i386/Packages.diff/Index:
SHA1-Current: 4513eca848a39e25934e2021b2d25dd9d59e4c49 2380440
SHA1-History:
 060d4f82cf112a62efda326409b624446ff3be75 2253607 2007-02-07-0808.30
 a0f6f319c4df20593e78a7292bb61f65edf486ee 2257342 2007-02-07-2007.43
 ...
SHA1-Patches:
 2df8a47ab1474e5f0c93c6c9fe3417b36d64e6f7 4568 2007-02-07-0808.30
 984b3944951cafb4ccc5ffd11d065ff5773e4fe6 11315 2007-02-07-2007.43
 ...
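One plausible way for a client to use such an Index is sketched below. It rests on two assumptions that the listing itself does not confirm: that SHA1-History entries are ordered oldest first, and that each entry records the checksum of the Packages file to which the patch of the same name applies. The function names are hypothetical.

```python
def parse_diff_index(text: str) -> dict:
    """Parse a Packages.diff/Index file into its three sections."""
    index = {"current": None, "history": [], "patches": []}
    section = None
    for line in text.splitlines():
        if line.startswith("SHA1-Current:"):
            sha1, size = line.split()[1:3]
            index["current"] = (sha1, int(size))
            section = None
        elif line.startswith("SHA1-History:"):
            section = "history"
        elif line.startswith("SHA1-Patches:"):
            section = "patches"
        elif section and line.startswith(" "):
            fields = line.split()
            if len(fields) == 3:  # skip elided lines such as " ..."
                sha1, size, name = fields
                index[section].append((sha1, int(size), name))
    return index


def patches_to_fetch(index: dict, local_sha1: str):
    """Patch names to apply in order; [] if up to date; None if a full download is needed."""
    if index["current"] and local_sha1 == index["current"][0]:
        return []
    names = [name for _, _, name in index["history"]]
    for pos, (sha1, _, _) in enumerate(index["history"]):
        if sha1 == local_sha1:
            # Assumption: apply this patch and every later one, oldest first.
            return names[pos:]
    return None  # local copy is too old or unknown


# The Index shown above, without the elided entries.
INDEX_TEXT = """SHA1-Current: 4513eca848a39e25934e2021b2d25dd9d59e4c49 2380440
SHA1-History:
 060d4f82cf112a62efda326409b624446ff3be75 2253607 2007-02-07-0808.30
 a0f6f319c4df20593e78a7292bb61f65edf486ee 2257342 2007-02-07-2007.43
SHA1-Patches:
 2df8a47ab1474e5f0c93c6c9fe3417b36d64e6f7 4568 2007-02-07-0808.30
 984b3944951cafb4ccc5ffd11d065ff5773e4fe6 11315 2007-02-07-2007.43
"""

index = parse_diff_index(INDEX_TEXT)
print(patches_to_fetch(index, "a0f6f319c4df20593e78a7292bb61f65edf486ee"))
```

When no history entry matches the local checksum, the client has to fall back to downloading the full Packages file.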
Using zsync as the backend
zsync ( http://zsync.moria.org.uk/ ) is a client-side implementation of the rsync algorithm that works over plain HTTP. The server only needs to run a command to generate a .zsync file per "big" file; the client can then retrieve the big file using the .zsync file and an old version of the "big" file.
 16M packages
2.4M packages.en
7.2K packages.en.zsync
 55K packages.zsync
7.5M primary.xml.gz
225K primary.xml.zsync
As you can see, the zsync files are reasonably small.
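To give an idea of why an old copy plus a small checksum file is enough, here is a much simplified sketch of rsync-style block matching. It is not the zsync file format or API: the block size, the checksum layout, and the function names are chosen for illustration only, and a trailing partial block is simply treated as always needing download.

```python
import hashlib

BLOCK = 8  # tiny block size so the example is easy to follow


def weak_sum(block: bytes):
    """Two 16-bit sums that can be updated cheaply when the window slides."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) & 0xFFFF
    return a, b


def block_table(new: bytes) -> dict:
    """What the server would publish: checksums for each full block of the new file."""
    table = {}
    for start in range(0, len(new) - BLOCK + 1, BLOCK):
        block = new[start:start + BLOCK]
        table.setdefault(weak_sum(block), []).append(
            (hashlib.md5(block).digest(), start // BLOCK))
    return table


def blocks_found_locally(old: bytes, table: dict) -> set:
    """Slide a window over the old file; return indices of new-file blocks it already contains."""
    found = set()
    if len(old) < BLOCK:
        return found
    a, b = weak_sum(old[:BLOCK])
    pos = 0
    while True:
        # The weak sum is only a filter; the strong hash confirms the match.
        for strong, index in table.get((a, b), []):
            if hashlib.md5(old[pos:pos + BLOCK]).digest() == strong:
                found.add(index)
        if pos + BLOCK >= len(old):
            break
        out_byte, in_byte = old[pos], old[pos + BLOCK]
        a = (a - out_byte + in_byte) & 0xFFFF  # roll the window one byte forward
        b = (b - BLOCK * out_byte + a) & 0xFFFF
        pos += 1
    return found


old = b"aaaaaaaabbbbbbbbccccccccdddddddd"
new = b"aaaaaaaaXXbbbbbbbbccccccccdddddddd"  # two bytes inserted after the first block

have = blocks_found_locally(old, block_table(new))
full_blocks = len(new) // BLOCK
print("blocks to download:", sorted(set(range(full_blocks)) - have))
```

Because the window slides one byte at a time, blocks are still found in the old file after an insertion has shifted them, which a comparison at fixed offsets would miss.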
Pros:
- Simple for the server side
- Does not require us to invent a patch/diff index
- Works with all metadata formats
- Could work transparently for RPMs too!
Cons:
- The API is a little bit complicated
- The example client uses its own HTTP transport; we need to figure out how to integrate it with curl
- The tarball does not provide libzsync as a .so, but as a static lib that is included in the client
- The library includes its own old zlib.
--Duncan 15:45, 13 April 2007 (UTC) I have a patch to include zsync inside ZYpp though.

