Libzypp/Programmers Guide

From openSUSE

Contents

ZYpp Programmers guide

This is a recompilation of knowledge and common pitfalls which arose during ZYpp development.

Basic Links

Design decisions

A few notes on design decisions and consequences. Being in a hurry it might be tempting to violate them, or weaken them.

Don't do it!

What looks like a quick result, may easily end in a bunch of workarounds, strange side effects, and finally the wish to correct it and the hindsight, that it's almost impossible.

  • The worst thing you can do is changing the semantic of an Object, after it's used by the surrounding code. And after the library is released, you can't do it anyway.
  • And keep in mind, that an interface is more than just a couple data of functions. It's also a promise on behaviour.
  • Using code (and compiler) will not just rely on your functions doing what they promise to do. They will also rely on the promised behaviour of your objects. And this is extremely important if your Objects are used in container classes, algorithms and the interface of other Classes.

This is not C. You can't code functionality and just document the intended usage. You can and must define functionality and behaviour. At first this may sounds like it is more effort. But in contrary to your documentation, the compiler is able to understand your code.

If your coding of behaviour is correct, the compiler will prevent others from using your classes in a way it is not intended to be used. It might cost YOU some more time to enable the compiler to do this. But once you did it, the compiler is your and everybody elses coworker.

He will remind you, if you do things which can't be done. Long before your testcases fail and you wonder why.

1. Decision is to define the objects responsibility. 2. How should it be used and accessed?

    • Is it a data type (like Arch, Edition, string, int...)?

Then the object must perform COW (CopOnWrite). (or must be NonCopyable, which is quite strange for a data type, but might be an option until COW is implemented)


    • Will it appear as 'pointer to something' ( Resolvable (public visible)
+Resolvable::Ptr
+Resolvable::constPtr)?

In case 'something' uses a pimpl, you spend 2 reference counter per object. one for the interface class and one for the implementation.

Access to the Type will be via _Ptr and _constPtr.

As it appears like a pointer, the application will have to use it like a pointer, This incl. test for != NULL before accessing the object.

This way it's easy to code 'no Object': ist's a NULL Object::Ptr

    • Will it appear as 'reference to something' ( SourceImpl (hidden) +Source_Ref +Source_constRef )?

( We currently can't build _constRef but if we will make them available. )

Basically similar to _Ptr, but no reference count for the interface needed, just for the piml.

For a _Ref, pimpl has to model an 'ObjectImpl*' doing no COW. (shared_ptr<ObjectImpl>, or intrusive_ptr. RW pointers are NOT appropriate).

For a _constRef, pimpl has to model an 'const ObjectImpl*'.

AND you MUST spend some effort on coding 'no Object'. It appears as a 'reference to something' so a 'something' must always be available! It's not acceptable for a _Ref type, to have a NULL pimpl. A Reference promisses that there is an Object, and you can access this at any time.

    • Thus you must be able to provide some 'dummy' Impl object.
    • Most probably the default ctor of your Object will refer to this 'dummy' Impl. (be careful not to alter dummy data in the dummy object via non const interface methods).
    • You may want to define a static const Object_constRef (that's one reason why we need the _constRef soon. Until this use _ref and place a doxygen /** \todo need _consRef */ comment.)

3. Providing operators

People have written books about this. If your class uses pimpl (and it should if it's not a trivial struct) it's the best idea to define and < (if one wants to define them at all) in terms of the pimpl pointer. Same Object, if same implementation object (the same of courses for != <= >....)

These operators also affect how your object will behave and how they are treated when used in container classes. Changing the semantic of these operators later, is usually a pretty bad idea.

So if you find, that in some or most pieces of your code a different semantic of 'equal' (same name, or same url or same whatever) is needed, then DO NOT consider changing operator==, or implementing some operator== based on non trivial or obvious conditions). Spend a 'normal' method for this, even if typing '==' is shorter.

Don't define type conversion operators. (you may use them, if you know why you should not.)

NEVER define type conversion to some integral type (like bool).

( in Source_Ref:

 Conversion via some 'typedef void 
(Source_Ref::*unspecified_bool_type)();'
 
is the only safe way to implement a conversion to bool to allow
pointer style tests for NULL.

)

4. Worry about constness.

What's a const in a 'const Source'?

For a nontrivial class it's sometimes hard to tell what's const.

Source is a container of ResObjects. The containers content ist defined by the Source metadata.

const ResStore & Source::resolvables();

At the time this accessor was put into the interface it was 'const' (const ResStore & Source::resolvables() CONST;). Someone removed the const. I can imagine why, but it's wrong. You want and must be able to query for the ResObjects in a Source, even (inf fact especially) if the source is const.

You can't airy change the semantic of such an accessor, because it hinds you writing a snippet of code. It breaks (maybe uncommitted) code and hampers writing new correct code.

And the code written, demanding a non const accessor is wrong, and so has to be written again. And tested again. That's a fast success, worth nothing.

So think twice before changing the semantic of an Object. Maybe the Object is wrong, but it may also be you're doing it the wrong way.

Const applies to methods and variables, as well as to the semantic of an object. A const data member of an object can't be changed, but data members of a const object may change.

ReferenceCount is the best example. Even if the Object is const, the reference counter have to be adjusted.

Basically you have 2 exceptions:

  • Some data must and may be changed ANY TIME during the objects lifetime.
  • Then you declare the data member 'mutable'.
class ReferenceCounted
{
  ...
  mutable unsigned _counter;
};
  • Late initialisation of data members (which is what probably applies to Source).

If the first access to some data member is done via a const Object, the data member has to be changed(initialized) before returning it.

Tempting to use 'mutable', but that's not correct. Being mutable, the compiler will never complain about attempts to change it. But once initialized, the member is const.

If you use mutable, you are responsible for not changing the data. You have to write some comment for your co-workers, that it must not be changed after init, and your co-workers, must read and remember it, when trying to fix something 10 minutes before freeze.

That one of the few situations, where it is advisable to have a const method which explicitly casts away constness and calls a nonconst init method:

bool init();

bool lateInit() const
{ return const_cast<Type*>(this)->init(); }
const SomeThing & access() const
{
 if ( ! _initialized )
 lateInit();
 return _soemthing;
}

And don't be fooled by methods you may find somewhere doing something like:

const SomeThing & access() const
{
 static bool _initialized = lateInit();
 return _soemthing;
}

Looks like you can avoid the 'if ( ! _initialized )' which has to be performed on any call to access(). But this does NOT work if lateInit() has to initialize nonstatic data. lateInit() will not be called once per object, but once int the lifetime of the application.

And please avoid irrelevant consts, esp. in return values of functions and methods. It always looks as if you have forgotten someting.

  • If you pass or return by value, then write
Type          ( a copy )

or

const Type &  ( a reference but to an immutable object )
 
const Pathname provideFile() const;
^^^^^
that a NOOP, and everybody will wonder if you meant

'const Pathname &' ?
   

This const has no consequence:

Pathname p       = provideFile();
const Pathname p = provideFile();

both is legal.

It is also strange to declare a:

provideFile(const unsigned media_nr);

The const is not part of the function signature. The const in declaration does nothing. It only tells of your implementation something the caller does not need to know. provideFile(const unsigned media_nr) and provideFile(unsigned media_nr) are the same function. If you have both declarations in a class definition, compiler will complain you declare same function twice. You can define provideFile(const unsigned media_nr) to implement the function declared by provideFile(unsigned media_nr).

"use namespace foo;" in header

NEVER USE using IN A HEADER!

Unless it does what you actually want it to do :)

using namespace std;

in Changelog.h is for shure NOT what you want to do. And it may mess up more as you may assume. 'using' is mostly for implementation files (.cc); 'using namespace std' for sure!

Why? Because the using does not end at the end of the header file. It's applied to everything that includes your file. 'using namespace std' introduces new names (the ones from ::std) wherever it is included, and thus may create clashes and ambiguities.

Please take it serious. We get into deep trouble, if we mess up the namespaces.

Don't #include<iostream> in header file

Do your best to NOT include <iostream> (fstream, etc.) in your header file. It's a quite expensive include. Instead of it:

#include<iosfwd>

Do not inline 'std::ostream operator<<'

unless it's trivial. As you don't include <iostream>, you actually can't output any integral type nor use std::endl.

+  inline std::ostream & operator<<( std::ostream & out,
+                                    const ChangelogEntry & obj )
+  {
+    out << obj.date() << " " << obj.author() << endl << obj.text() << endl;
+    return out;
+  }

This is no longer possible, but that's IMO ok as you don't want to do any kind of formating in the header.

What you can do is calling operator<< of some other known class, if you're quite sure it does not make sense to change this in the future:

  class Path
  {
    friend std::ostream & operator<<( std::ostream & str, const Path & obj );
  public:
    const std::string asString() const;
  protected:
    virtual std::ostream & dumpOn( std::ostream & str ) const;
  private:
    std::string _path;
  };
  inline std::ostream & operator<<( std::ostream & str, const Path & obj )
  {
    return str << obj.asString();
    // if <string> is included; operator<< for std::string is declared, and
    // <iosfwd> is sufficient to write this.
  }

//or

  {
    return obj.dumpOn( str );
    // That's a sometimes handy way for hierarchies operator<< is needed for
    // the base class only. Derived classes simply overload dumpOn, if they
    // nedd something different, or append details to the base operator.
  }

RAII (Resource Acquisition Is Initialization)

In fact you should NEVER EVEN THINK OF NOT USING RAII to do things, which must be undone. And especially if exceptions are used. Obvious things which must be undone are:

  • malloc/new requires free/delete
  • open/fopen requires close/fclose
  • temporarily modify some state or object, where it's important to undo the changes. If you ever used the iostream formating methods (like setw()) inside an operator<<, you probably know what I mean. They change the state of the underlying stream, so they stay active after your operator<< is done and affect all subsequent operator<< calls.
  • lock requires unlock

...You'll find tons of examples.

Smart pointers are a nice solution for new/delete. And even simple solutions are better than none:

 struct Fd
 {
   Fd( const Pathname & file_r, int open_flags )
   : m_fd( -1 )
   {
     m_fd = open( file_r.asString().c_str(), open_flags );
     if ( m_fd == -1 )
       throw; // or whatever
   }

   ~Fd()
   {
     if ( m_fd != -1 )
       {
         close( m_fd );
       }
   }

   bool isOpen() const { return m_fd != -1; }

   int fd() const { return m_fd; }

 private:
   int m_fd; // private to prevent ourselfes from being lazy ;)
   Fd( const Fd & ); // no copy
   Fd & operator=( const Fd & ); // no assign
 };

 ...
 scoped_ptr<Fd> fd;
 try {
   fd.reset( new Fd( "/some/file" ) );
 } catch ( ... ) {
   // open failed.
 }

 read( fd->fd(), ... ),

A more sophisticated solution could be a Fd with a default ctor, and open/close methods. But this does the job as well.

No matter what happens in your function ( an urgent return/throw or an unattended exception in some function it calls.... ~Fd() will be called and will close the file.

Remember that each time a file is opened it has to be closed, but Fd() has to be written only once ;)

Another simple problem: C functions returning a pointer to allocated memory. Another simple solution:

 struct SafeBuf
 {
   char * _buf;
   SafeBuf() : _buf( 0 ) {}
   ~SafeBuf() { if ( _buf ) free( _buf ); }
   std::string asString() const
   { return _buf ? std::string(_buf) : std::string(); }
 };

 ...
 SafeBuf safe;
 vasprintf( &safe._buf, format, ap );
 return safe.asString();

Handy if a class ctor might throw..

is a default constructor. For class Fd it would look like this:

 Fd fd;
 try {
   fd.open( "/some/file" );
 } catch ( ... ) {
   // open failed.
 }

 read( fd.fd(), ... );

If there's no default ctor, consider using a scoped_ptr<> as workaround. Otherwise you can't use the class outside the try block iff you're interested in catching the ctor exception.

NEVER LET AN EXCEPTION ESCAPE FROM A DTOR

If your dtor performs actions which could throw, then catch them! Never allow an exception to escape from a dtor. Never!

BE CAREFULL AND THINK TWICE

An exception is not a quick and easy way out. Think twice before you throw.

It's extremely important that your class is in a well defined state after an exception. This is comparatively easy if you throw within your method. But remember that other functions or classes you use may throw was well.

Defining exceptions emitted by your class is an important part of the class design! Nothing you can take care about later. It's can be extremely painful if you have to turn a nonthrowing (public/protected) method into a throwing one. Not for you writing the class, but for every part of the code that uses the class.

Clearly define and document which methods of the class interface throw, on which conditions and what's the state after an exception.

Take it serious, as try blocks are not cheap and we don't want to end up in code, that has a try around each single line.

Pay attention to methods that alter the internal state of your class. Try to perform all preparations (using possibly throwing actions) first.

Don't change the internal state until you know it will succeed. Then change the state using nonthrowing code (or thoroughly catch).

Try, test and experiment with exceptions (even if we don't have the time for it ;). Don't assume it will somehow work.

Exceptions and templates are probably the most powerful C++ constructs. So be sure the power heads into the right direction.

Doxygen supports an \throw tag. Use it. Documentation of exceptions a method throws, is more important than writing what the method does.

Remember RAII.

Doxygen for related functions

Please tag operators or related functions declared outside a class:

/** \relates ChangelogEntry */
std::ostream & operator<<( std::ostream & out, const ChangelogEntry & obj );

You can of course add further comments if you like. But \relates <class> makes doxygen list these function in the class' documentation. So you immediately see what additional operators or related functions are available for this class.

Usually when passing strings, you use a reference to avoid copying

bool createSolverTestcase (std::string dumpPath )

Some strings implementations use implicit sharing so copy is fast, but not good to assume. So for any class you pass read only you pass a const reference

Right. std::string shares implementations. Anyway there is an 'overhead' when passing by value: Reference counters need to be adjusted.

The same is true for zypp Ptr types. Passing ( ResObject::Ptr ) will increment/decrement the ref count, passing ( const ResObject::Ptr & ) wont.

And don't get fooled by the 'const'. It applies to the pointer and not to the object pointed to. A 'const ResObject::Ptr' is a const pointer to a mutable ResObject. Only 'ResObject::constPtr' point to a 'const ResObject'.

RULE: Nonintegral types should be 'passed by value' using a 'const &'.

Leave it to the implementation to create a writable copy if it needs it.

Advanced Topics

BUT CAUTION: When storing the values inside a class. Real live code I found in the backend (not tightly related to the above, but what I actually wanted to post)


struct Functor
{
  Functor( const string & s )
  : _s( s )
  {}
  
  bool operator()( const Sometype & x )
  {
    // some opearation using 's'
  }
  
  const string & s;
};


int somefunction()
{
  Functor fnc( "test" );
  
  for_each( container.begin(), container.end(), fnc );
  
  return 0;
}

Apart from the fact, that a functor should store the 'string & s' as 'string *', it is true that a functor should be a lightweight object. Thus might prefer storing references instead of copies.

Using a functor you should be aware of this!

The functor above is used in 'for_each'.

Got it?

Functor::Functor( const string & s )

v.s.

Functor fnc( "test" );

"test" is implicitly converted into a temporary string object which is passed to Functor::Functor.

Functor stores a reference to a temporary string that goes out of scope when the ctor is done.

Thus Functor::operator() operates on invalid data (_s) when invoked within for_each.


{
  string fncarg( "test" );
  Functor fnc( fncarg );

  for_each( container.begin(), container.end(), fnc );
  return 0;
}


It might be tempting to write

for_each( container.begin(), container.end(), Functor( "test" ) );

assuming now "test" will be available throughout for_each. But AFAIk it wont. The temp. string constructed from "test" is void after Functor is constructed, thus before for_each is actually invoked.

Unfortunately these constructs frequently work until ...?

So be careful to store references in classes only if it pays, be sure it is well well well documented, and make sure the user of your class reads it.

Prefer storing copies and and concentrate on writing classes that are cheap to copy.

About objects and exceptions

Not ResObject, but any C++ Object (Class). An object is constructed, used and (hopefully) destructed when no longer needed. That's nothing new, but think about what actually happens:

 //---------------------------------
 struct Foo : public Base
 {
   Foo()
   : _data( 0 )
   {
     // ctor code <=== What happens here?
   }
   ~Foo()
   {
     // dtor code <=== What happens here?
   }
   Type _data;
 };
 Foo f;  <=== What happens here?
 //---------------------------------

Well, a Foo is created, and once it exists you can use it, until is is deleted.

But look at the ctor code: Does the Foo object already exist here?

The compiler already allocated the space for the Foo, Base classes are initialized, _data members created and initialzed.

Same in the dtor code: Does the Foo object still exist?

There are still data and base classes accessible.

Answer to both questions is:

  • NO IT DOES NOT!
  • NO IT DOES NOT!

And this is VERY IMPORTANT TO REMEMBER, if you are USING EXCEPTIONS.

If there are no exceptions, the ctor and dtor code will return, and then the object exists, or it is gone.

Roughly:

  • An object exist when the LAST CTOR RETURNED.
  • An is gone as soon as the FIRST DTOR is CALLED.
  • An uncaught exception is NO RETURN.
  • NO DTOR is called for OBJECTS which do NOT EXIST.
  • A 'try' remebers the STACK.
  • A 'catch' restores the STACK.

Why is this so important to know?

1. NEVER let an EXCEPTION ESCAPE a DTOR

For more reasons than I can explain, and you want to read. If you dtor executes code that might throw, enclose it in TRY, and CATCH it.


2. If an EXCEPTION ESCAPEs a CTOR, the OBJECT has NEVER EXISTED! NO DTOR is CALLED.

So one thing you CAN'T do, is making surrounding objects refer to the object being created, and then throw. The object you tried to create does not exist, never existed!

Making some other object being aware of your presence in the ctor, is the PROMISE that you will exist soon. If you break it by not chatching, or throwing,...

...No, you won't be punished. No one can punish something that never existed. But you will leave a very sad user in front of a very black screen.

SO DO NOT DO THIS!


NO DTOR is called for the object your about to create, this is the object in which ctor you are. For the Base classes and data members which already exist (inside the not yet existing object), their dtor is called!

This is why it so important, to care about things which must be undone (new/delete, open/close, lock/unlock,...). This is why RAII (Resource Acquisition Is Initialisation) is important and very helpfull.

(see. About exceptions..)

3. TRY and CATCH operate on the STACK

That's a not minor important thing to remember. And it does not apply to exceptions in ctors only.

Throwing an exception, the stack is unwound up to location of a previous try.

THIS IS NOT UNDO EVERYTHING I DID!

So if you do (or cause) something within an object, which is not correct in case your function throws, use e.g. RAII to undo it. (outside the ctor 'within an object' applies to 'this' as well)

So try to group the actions within a method or function, so that throwing stuff is done first, and once you're sure it will perform, start actually changing the state of objects.

For example, the lazy parsing of Source metadata into ResObjects: The Source is already created, metadata were downloaded, first access to the ResStore:

Parsing starts, and may throw a parse exception. As we parse lazy (not in the ctor, where the exception would lead to noSource), The Source still exists after someone caught the exception.

What about the ResStore? If the parsers feed directly into the public Store, the Source is unusable, as the Store offers partial content. This can't be. If parsing fails, the Store must stay empty.

Exception is not like a boomerang! It won't hit you, if you throw it right. But like a boomerang it's easy to throw it wrong.

But this ends with:

<5> [DEFINE_LOGGROUP] Exception.cc(log):83
    YUMSourceImpl.cc(YUMSourceImpl):94
    THROW:    YUMSourceImpl.cc(YUMSourceImpl):94:
    Cannot read repomd file, cannot initialize source
<5> [DEFINE_LOGGROUP] ReferenceCounted.cc(~ReferenceCounted):37
    ~ReferenceCounted: nonzero reference count
terminate called after throwing an instance of 'std::out_of_range'
 what():  ~ReferenceCounted: nonzero reference count

Fortunately Stano found this, so we have time ;) to review and think about the code we wrote. Esp. about exceptions used in Ctor, Dtor, and other methods.

The problem here is that the YUMSourceImpl ctor gives away at least one reference to itself, to some other object. But then decides not to exist by throwing an exception from the ctor.

YUMSourceImpl's base class ReferenceCounted, is already constructed. It exists inside YUMSourceImpl. Thus it's dtor is called. (~YUMSourceImpl is not called as it does not exist).

As some other object already holds a reference on YUMSourceImpl, the reference counter is not zero, as it is expected to be in the dtor of a ReferenceCounted object.

So the YUMSourceImpl ctor must not throw, after having passed a reference to 'this' to the outer world.

And after all this writing, you should be able to spot the 2nd and the 3rd fundamental bug revealed by these two loglines.

Fortunately, on may say.

Otherwise the calling code would have caught the YUMSourceImpl exception. And at sometime sowhere someone would have accessed the YUMSourceImpl, that never existed. Either the other object itself or the dtor of the reference given to the object (the _Ptr or _Ref).

And you all know how painfull it is, to hunt such a kind of bug.

2nd: ReferenceCounted dtor throws, and this leads to an exception within an exception. This is not catchable and terminates the application. One of the reasons, why one should not let exceptions escape a dtor.

3rd: Exceptions are catchable, thus the application is able to ignore the error, and may continue. ReferenceCounted has to abort(), because the outstanding refernces to the deleted object WILL access it from their dtor (to decrease the refcount) That's an internal error which must abort, so we are able to find it while testing. Too dangerous to stay undetected.

Iterating over items matching capabilities

In case you find it helpfull:

#include "zypp/CapMatchHelper.h"


Asume you've got

ResPool  _pool;
PoolItem _pi;

// Function or functor processing CapAndItem's
bool consume( const CapAndItem & cai_r );
// Function or functor processing PoolItem's
bool consumePi( const PoolItem & pi_r );

Assume you want to process all PoolItems obsoleted by _pi:

forEachPoolItemMatchedBy( _pool, _pi, Dep::OBSOLETES, consume );

"for each (Pool)Item matched by _pi's OBSOLETES"

Assume you want to process all PoolItems obsoleting _pi:

forEachPoolItemMatching( _pool, Dep::OBSOLETES, _pi, consume );

"for each (Pool)Item obsoleting _pi"

In both cases consume is invoked for each CapAndItem that matches. So if

a PoolItem has multiple dependencies that match _pi, consume is invoked for multiple CapAndItem's that refer to the same PoolIten. Once for each dependency that matches.

You don't like this? You want't the action being invoked once for each PoolItem, no matter how many dependencies fit?

Use OncePerPoolItem:

forEachPoolItemMatching( _pool, Dep::OBSOLETES, _pi, OncePerPoolItem( consumePi )  );
The action (consumePi function or functor) now takes a PoolItem as

argument. OncePerPoolItem takes care that the action is invoked only once per involved PoolItem.

BUT:

In case _pi itself provides a match (obsoletes itself)? Not nice, but might happen. Anyway, you want to make shure that _pi itself is not target of the action invoked?

forEachPoolItemMatching( _pool, Dep::OBSOLETES, _pi, OncePerPoolItem( consumePi, _pi )  );

Read it as: for each PoolItem obsoleting _pi, but not for _pi itself, invoke consumePi.

Or

forEachPoolItemMatchedBy( _pool, _pi, Dep::OBSOLETES, OncePerPoolItem( consumePi, _pi )  );

"for each PoolItem obsoleted by _pi, but not for _pi itself, invoke consumePi."

Virtual Methods and Interface Classes

One intent of virtual methods is to implement some behaviour that is suitable for derived classes, and at the same time allows derived classes to redefine it if necessary.

It is not the intent of virtual methods to serve as a master copy that's cut'n'pasted into the derived class.

By using virtual methods we actually like to avoid duplicate code, not ease creating it.

I'm not shure why the authors felt the need to copy the base classes code into the derived classes. Maybe because some of the virtual methods are (or have been) pure. So the derived class must implement them. But please not via cut'n'paste:

{
  class Base
  {
    virtual ~Base();
    virtual int min();
    virtual int max() = 0; // pure
  };
  
  ~Base::Base()
  {}
 
  int Base::min()
  { return 0; }
 
  int Base::max()
  { return 100; }
}
{  
  class Poor : public Base
  {
    virtual ~Poor();
    virtual int min();
    virtual int max();
  };
  
  ~Poor::Poor()
  {}
  
  int Poor::min()
  { return 0; }
 
  int Poor::max()
  { return 100; }
}

It makes little sense, if Poor reimplemens Base.

  • virtual int min();

If your class is fine with the Base implementation, do nothing. It will be used then.

  • virtual int max() = 0;

If your class is fine with the Base implementation, use it.

{  
  class Derived : public Base
  {
    virtual ~Derived();
    // Base::min() is fine.
    // Base::max() is fine too, but pure:
    virtual int max(); 
  };
 
  ~Derived::Derived()
  {}

  int Derived::max()
  { return Base::max(); }
}

Notes:

  • As virtual methods can never be inlined, they are happy being defined in the .cc file.
  • Unfortunately not enforced by the compiler, it's (not just IMO) considered good style to repeat the virtual keyword in front of the derived classes methods.
  • From the binary compatibility point of view, derived classes should define a dtor, even if it does nothing. In case you later experience you need one, it is already there and you don't have to add it (breaking binary compatibility). For the same reason you should overload each of the base class virtual methods. The base class may even enforce this by making all methods pure.

No matter why you overload, don't do it by copying the base method. Call it!

It's a common mistake, assuming a pure virtual can be declared but not defined or called.

In fact, if you can, you should provide an implementation. Even if it just throws an error message and aborts. This might be more helpfull than

  • pure virtual method called
  • terminate called without an active exception
  • Aborted

And a backtrace that often contains no hint.

(Ever got the above exception? If not, convince your local g++ to link a programme that calls a function that is not defined. It's not that hard.)

Besides the fact that the compiler enforces derived classes to overload it, there is nothing special about pure virtual methods.

Measuring time

#include "zypp/base/Measure.h"

Tool to measure elapsed real and process times. See autodocs for details.

{
  Pathname repodata( "/Local/FACTORY/repodata/primary.xml" );
  std::list<std::string> depstrings; 

  Measure m( "Parse" );

  xml::Reader reader( repodata );
  reader.foreachNodeOrAttribute( dumpDeps( std::back_inserter(depstrings) ) );
  m.elapsed();

  MIL << "raw depstrings " << depstrings.size() << endl;

  fillSet    ( depstrings );
  m.elapsed();
  fillHashSet( depstrings );
  m.elapsed();
  fillMap    ( depstrings );
  m.elapsed();
  fillHashMap( depstrings );
  m.stop();
}

Output:

START MEASURE(Parse)
ELAPSED(Parse) 3 (u 3.15 s 0.08 c 0.00)
raw depstrings 346508
unique depstrings 43249
ELAPSED(Parse) 3 (u 3.51 s 0.08 c 0.00) [0 (u 0.36 s 0.00 c 0.00)]
unique depstrings 43249
ELAPSED(Parse) 3 (u 3.63 s 0.10 c 0.00) [0 (u 0.12 s 0.02 c 0.00)]
unique depstrings 43249
ELAPSED(Parse) 4 (u 3.97 s 0.10 c 0.00) [1 (u 0.34 s 0.00 c 0.00)]
unique depstrings 43249
MEASURE(Parse) 4 (u 4.17 s 0.10 c 0.00) [0 (u 0.20 s 0.00 c 0.00)]

Memory Allocation and Smart Pointers I

Hunting bug 168690 Lukas provided some valgrind logs. As this is the 2nd location where trifle with pointer leads to a memory leak, I'd like to remind you to use SMART POINTERS.

The 1st bug was fixed by

- typedef zypp::storage::PersistentStorage::SourceData* SourceData_Ptr; 
+ typedef shared_ptr<storage::PersistentStorage::SourceData> SourceData_Ptr;


Here it is a bit more work to do:

 class PersistentStorage : public base::ReferenceCounted, private base::NonCopyable
 {
 public:  
   /** Default ctor */
   PersistentStorage();
   /** Dtor */
   ~PersistentStorage();
 private:
   class Private;
   Private *d;
 };
 
 class PersistentStorage::Private
 {
 public:
   Backend *backend;
 };
 PersistentStorage::PersistentStorage()
 {
   d = new Private;
   DBG << "  Creating XML Files backend" << endl;
   d->backend = 0L;
 }
 
 void PersistentStorage::init(const Pathname &root)
 {
   d->backend = new XMLFilesBackend(root);
 }
 PersistentStorage::~PersistentStorage()
 {}
  • Memory allocated with 'new' has to be deleted. A shared_ptr automatically deletes allocated memory if it is not used anymore.

PersistentStorage stiff-necked avoids calling delete. This would be ok if the 'Private *d' was a 'shared_ptr<Private> d'. But as it isn't, the code leaks.

  • Obviously the lines above apply to PersistentStorage::Private as well. Furthermore...

No matter that is just a private helper class of PersistentStorage. A class that owns a POINTER (or some other data type without a default ctor) must NOT leave it UNINITIALIZED.

PersistentStorage::Private owns backend, so it is it's job to initialize it in a ctor (and delete allocated memory in a dtor).

Using a 'shared_ptr<Backend> backend' you get all this for free (auto initialized to NULL, auto detetion of allocated data).

Not using a shared_ptr you MUST do it manually.

Exception Abuse

Index: zypp/SourceManager.h
===================================================================
--- zypp/SourceManager.h        (revision 3420)
+++ zypp/SourceManager.h        (working copy)
@@ -178,6 +178,16 @@
     Source_Ref findSource(const std::string & alias_r);
 
     /**
+     * Find a source with a specified URL.
+     * URLs are unique in zenworks but NOT in zypp.
+     * A bug in SL 10.1 causes alias mismatches so we have to resort to 
URLs.
+     * #177543
+     *
+     * \throws Exception
+     */
+    Source_Ref findSourceByUrl(const Url & url_r);
+
+    /**
      * Return the list of the currently enabled sources
      *
      */

  • Consider implementing these kinds of convenience queries as standalone functions. There's no need to bloat the SourceManager interface with this.

There's nothing Sourcemanager specific in it. It's a query on a Source container.

template <class _Iterator>
  Source_Ref findSourceByUrl( _Iterator begin_r, _Iterator end_r, Url );
Source_Ref findSourceByUrl( SourceManager, Url )
{
  return findSourceByUrl( SourceManager.Source_begin(), 
                          SourceManager.Source_end(),
                          Url ); }

  • STOP IMPLEMENTING methods that search or find data TROWING!

It's IMO completely nonsense to make a query like find throw an exception, if not found. It just causes a bunch of try/catch overhead for nothing. Return Source_Ref::noSource if not found.

STL Container copy penalty

Assume

   typedef long cont_type; // arbitrary
   vector<cont_type> v;
   set<cont_type>    s;

Now lookup elements in the container:

   for_each( v.begin(), v.end(), lookup(v) );
   for_each( s.begin(), s.end(), lookup(s) );
  

Further assune lookup results in a call to:

 template<class _Tp>
   inline bool lookup( std::set<_Tp> container, _Tp el )
   { return container.find( el ) != container.end(); }
 template<class _Tp>
   inline bool lookup( std::vector<_Tp> container, _Tp el )
   { return std::binary_search( container.begin(), container.end(), el ); }
 NOTE: container passed by value!

To summ it up

Each call to lookup an element creates a COPY of the container ( lookup( std::*<_Tp> container, ...) ).

On my machine (Intel(R) Pentium(R) 4 CPU 2.66GHz; 768MB RAM, 1GB swap) looking up 10.000 elements (creating 10.000 container copies) takes

 10.000 elements:
    0.120 sec. using vector
   25.190 sec. using set

Looking up 100.000 elements (100.000 container copies) takes

 100.000 elements:
   100.860 sec. using vector
   ?            using set (aborted after ~15 Min.)


Obviously vector performs better than set, but both perform bad on copy!

Changing lookup to take a 'const std::CONTAINER<_Tp> &'

 100.000 elements:
   0.030 sec. using vector
   0.030 sec. using set
 1.000.000 elements:
   0.310 sec. using vector
   0.360 sec. using set
 10.000.000 elements:
   3.090 sec. using vector
   6.840 sec. using set

Set performs worse than (a sorted) vector, but that's IMO still acceptable.

So don't pass containers per value without need!

Make yourself familiar with class Pathname

Concatenating 2 Pathnames (or a Pathname and a string or a string and a Pathname) puts a '/' between both and normalizes the Pathname.

 /** Concatenation of pathnames.
   * \code
   *   "foo"  / "baa"  ==> "foo/baa"
   *   "foo/" / "baa"  ==> "foo/baa"
   *   "foo"  / "/baa" ==> "foo/baa"
   *   "foo/" / "/baa" ==> "foo/baa"
   * \endcode

The reason to introduce Pathname, was to avoid code like it's currently in e.g. SourceManager::store. Where almost every Pathname is converted to string, and concatenations are done manually.

cache = ZYPP_METADATA_PREFIX + str::numstring(id)
// works based on the knowledge that ZYPP_METADATA_PREFIX
// currently is a string with a trailing /, but not even
// a comment telling that this is required.


filesystem::assert_dir ( root_r.asString() + descr.cache_dir );
// works based on the assumption that descr.cache_dir was an
// absolute Pathname (had a leading /). But thats assumed, not 
// asserted.

... That's exactly the kind of fragile code we want to avoid by using Pathname.

Example:

Pathname root   ( "/root" );	// '/root'
Pathname subabs ( "/sub" );	// '/sub'
Pathname subrel ( "sub" );	// './sub'
//Pathname now supports operator / for concat as well.
root / subabs	==> '/root/sub'
root / subrel	==> '/root/sub'


root.asString() + subabs.asString()   ==> '/root/sub'
root.asString() + subrel.asString()   ==> '/root./sub
                                          ^^^^^^^^^^^
                             this is NOT what you want

In the current code it's quite hard to figure out (guess) where paths are concatenated and where a path basename is extended

Pathname("foo/baa").extend( ".h" ) ==> "foo/baa.h"

Learn how to erase elements from containers

We have some buggy code, that tries to erase elements, matching a certain property from containers. Example:

 for (ResStore::iterator it = store.begin(); it != store.end(); ++it)
 {
   _pool.erase(*it);
 }

Problem: Removing an element from a container invalidates (at least) all iterators pointing to it. Thus after erasing *it, it is no longer valid. ++it has UNDEFINED BEHAVIOUR.

Loop based algorithms (differs depending on the kind of container)

Sequential container (vector string deque list): erase returns a valid iterator to the next element.

 SeqContainer c;
 for ( SeqContainer::iterator it = c.begin(); it != c.end(); /**/ )
   {
     if ( toBeRemoved( *it ) )
       {
         it = c.erase( it ); // valid next-iterator returned
       }
     else
       ++it;
   }


Associative container (maps sets): erase returns void, but we can use postfix increment, as ONLY iterators to the eased object get invalid:

 AssocContainer c;
 for ( AssocContainer::iterator it = c.begin(); it != c.end(); /**/ )
   {
     if ( toBeRemoved( *it ) )
       {
         c.erase( it++ ); // postfix! Incrementing before erase
       }
     else
       ++it;
   }


stl algorithms

In case toBeRemoved above is actually a function/functor.

Sequential container (vector string deque): stl::remove_if, does not erase elements, they are just moved to the containers end, and an iterator to the 1st item to be 'removed' is returned.

 SeqContainer c;
 c.erase( stl::remove_if( c.begin(), c.end(), toBeRemoved ),
          c.end() );
 
 

Sequential container (list): The above works too, but list has a builtin remove/remove_if which is more efficient.

 list c;
 c.remove_if( toBeRemoved );
 

Associative container (maps sets): Actually the loop above is the most efficient solution. There is an algorithm based solution, but it requires copying all elements not to be removed ;(

 AssocContainer c;
 
 AssocContainer keepItems;
 stl::remove_copy_if( c.begin(), c.end(),
                      stl::inserter( keepItems, keepItems.end() ),
                      toBeRemoved );
 c.swap( keepItems );

Iterate over all keys/values in a map

{
  #include "zypp/base/Iterator.h"
  
  typedef std::map<K,V> MapType;
  MapType mymap;
 
  // iterate over all keys in mymap
  std::for_each( make_map_key_begin( mymap ),   
                 make_map_key_end( mymap ),   
                 DoSomethingWithKeys() );
  
  // iterate over all values in mymap
  std::for_each( make_map_value_begin( mymap ), 
                 make_map_value_end( mymap ), 
                 DoSomethingWithValues() );
}

Or provide these iterators in a class:

{ 
  class SMgr
  {
  private:
    typedef std::map<Id,Src> MapType;
    
  public:
    typedef MapKVIteratorTraits<MapType>::Key_iterator   Id_iterator;
    typedef MapKVIteratorTraits<MapType>::Value_iterator Src_iterator;
    
    Id_iterator Id_begin() const 
    { return make_map_key_begin( _map ); }
    
    Id_iterator Id_end() const 
    { return make_map_key_end( _map ); }
    
    Src_iterator Src_begin() const
    { return make_map_value_begin( _map ); }
    
    Src_iterator Src_end() const
    { return make_map_value_end( _map ); }
    
  private:
    MapType _map;
  };
}

Minimal error handling and detection

//
void TagFileParser::parse( const Pathname & file_r)
     {
       std::ifstream file(file_r.asString().c_str());
       DBG << file_r << endl;
       std::string buffer;
       // read vendor
       beginParse();
       while(!file.eof())
       {

This loops silently (if XXX disable) endlessly, if the stream is not usable. This may be due to a wrong Path, an unreadable file, some ioerror while reading the file,...

Yes, I said the Source is responsible for passing the correct file. The parser must not do any prechecks, open the stream and parse and throw on errors.

But of course you must watch your stream. There's no guarantee you ever reach the EOF.

That's why we have e.g. iostr::forEachLine

   /** Simple lineparser: Call functor \a consume_r for each line.
    */
   template<class _Function>
     _Function & forEachLine( std::istream & str_r, 
                              _Function & consume_r )
     {
       while ( str_r )
         {
           std::string l = getline( str_r );
           if ( ! (str_r.fail() || str_r.bad()) )
             {
               // l contains valid data to be consumed.
               if ( ! consume_r( l ) )
                 break;
             }
         }
       return consume_r;
     }

And even if you dislike these predefined algorithms, which could enable you to concentrate on WHAT you do with the data (writing the function/functor you pass), and NOT on HOW TO READ the data.

That's a common task, thats why an algorithm for it is provided. It's the same peace of code whenever you read a file line by line.

I admit, the layout of your code is slightly different using this kind of approach. You have to learn this once. But for the rest of your life you won't have to write this loop reading lines from a file again, and you won't have any error in this loop.

And especially if you're in hurry, you can be sure that the loop is fine, an concentrate on the real action.

About protected data members

   protected:
     /** All resolvables provided by this source. */
     ResStore _store;
     /** Handle to media which contains this source */
     media::MediaId _media;

If you've got the time do it like this:

   protected:
       ResStore &       resStore()       { return _store; }
       const ResStore & resStore() const { return _store; }

       media::MediaId &       media()       { return _media; }
       const media::MediaId & media() const { return _media; }

   private:
       ResStore _store;
       media::MediaId _media;

There's no performance penalty, as the code is inlined. But you enable the base class to perform lazy initialization, and to install several kind of checks - invisible to the derived classes - in case this proves to be necessary.

Temporary Regexps

Creating a regex from string representation is expensive! So avoid using temporary regex in a loop:


In case you're using regex to parse lines in a file, make sure the regex is defined outside the loop reading the file, and not inside:

boost::regex rxComment("^space:*#(.*)$");
while(file && !file.eof())
{
  if(boost::regex_match(buffer, what, rxComment, boost::match_extra))
{

and not

while(file && !file.eof())
{
 if(boost::regex_match(buffer, what, boost::regex("^space:*#(.*)$"), boost::match_extra))
 {

Changing the TagParser that way, parses a packages file about 30% faster, on my machine (13 sec. instead of 20).

Avoid these typedefs

+------------------------------------------------------------------+
PoolItem.h
+------------------------------------------------------------------+

typedef std::list<PoolItem> PoolItemList;
typedef std::set<PoolItem> PoolItemSet;  
+------------------------------------------------------------------+
PackagesParser.h
+------------------------------------------------------------------+
typedef zypp::shared_ptr<zypp::source::susetags::SuseTagsPackageImpl> PkgImplPtr;
typedef std::map<zypp::NVRAD, PkgImplPtr> PkgContent;

namespace zypp
{ /////////////////////////////////////////////////////////////////

You should not typedef in the global namespace at all (e.g.PackagesParser)!

PoolItem.h is at least within the zypp namespace, but everybody using a PoolItem wants to include list, set, or whatever YOU need.

If it's just a convenience, because you don't want to write std::set<PoolItem> all the time, typedef it inside your class.

If it's meant to be a common type, like CapSet, put it into it's own header, where related stuff like operators can be defined too. And include this where needed.

Classes inside classes?


I just found this in zypp/detail/PackageImplIf.h

    ////////////////////////////////////////////
    //
   //        CLASS NAME : PackageImplIf
    //
    /** Abstact Package implementation interface.
    */
    class PackageImplIf : public ResObjectImplIf
    {
      class CheckSum
      {...};
      class BaseVersion
      {...};
      class PatchRpm
      {...};

Whatever CheckSum or BaseVersion or PatchRpm is, they are for sure misplaced inside the 'abstact Package implementation interface'!

If these types are not to be exposed in the user interface, but used in the implementation classes, please declare them in namespace zypp::detail (we should actually rename it to zypp::resolvable). Put them in some header file (or files) and include them where needed. BTW: You installed the Types in PackageImplIf, but not a single access method. Why do you want to restrict the ability retrieving a package via patch or delta rpm to YUMPackageImpl? Isn't it common enough to offer this to all PackageImpl and all kind of sources? Each source could immediately benefit from this, if it is able to provide appropriate data. I don't see why this should be a Yum specific feature?

And now again : class PackageImplIf::DiskUsage

PLEASE MOVE IT OUT!!!

Each and every class should serve a well defined purpose. An implementation interface simply defines the methods to be realized by a concrete implementation.

A Package maybe HAS and USES CheckSum, DiskUsage and other Types. But you (hopefully) would not consider defining a String class inside PackageImplIf just because a Packages has or uses a String?

We have namespaces, so there's not much need to nest classes. Put each class into a header file and include them where needed.

If your class is considered to be a Resolvables implementation detail put it into zypp::detail. If it might(!!!) be of common use, or is exposed in the public interface, define it in zypp::. (!!! the fact that only Packages use or provide DiskUsage does not make it an implementation detail or a Package specific class). It's common.

NEVER NEVER NEVER expose implementation details in the Interface!

 zypp/Package.h
   class Package :
     typedef detail::PackageImplIf::DiskUsage DiskUsage;

Absolutely nothing defined in zypp::detail may occur in ANY public interface. That's why it's in zypp::detail.

Think about it for 10 seconds, and you'll recognize that the typedef reveals that DiskUsage was defined at the wrong location.

The typedef does not fix this!

The only thing the typedef does, is making the whole concept of separating interface and implementation obsolete.

AGAIN: The ONLY detail exposed is

typedef detail::PackageImplIf    Impl;

because the Impl pointer is the interfaces connection to the (hidden) implementation.

And the sole reason for #include "zypp/detail/PackageImplIf.h" in "zypp/Package.h" is to let the compiler 'see' the Impl hierarchy.

The compiler must know:

class ResObjectImplIf;
class PackageImplIf : public ResObjectImplIf;

Unfortunately the compiler won't believe you, so you have to include the PackageImplIf definition.

Nothing else is public.

==NEVER pass non const references (&) to class internal data out of your class!==

You can't do this! You can't pass a 'Source &' to the outer world!

Via this anybody is allowed to to change your internal Source Object (and it must be somewhere otherwise you could not pass it back :) without your knowledge.

What you wrote is not what you want:

Package::Ptr p;

p->source() = 0;

This is what your interface offers!

Source is a Source_Ref. I introduced a typedef. So use Source_Ref in new code.


Either pass 'const Type &' or 'Type'

But no 'Type &' unless this is really what you want to offer.

On choosing Smart Pointers

Depends a bit on your class:

  • I use. do define these typtedefs if the class is designed to be accessed via smart pointers only. Like the Resolvable.

Not in structs like the parser helper classes, which can be used as plain variables as well.

The template depends on where the reference counter lives.

  • intrusive_ptr: Your class has a builtin reference counter.

It's derived from class ReferenceCounted, or has some other kind of counting mechanism. In the later case you have to provide 2 functions (in the same namespace as the class):

void intrusive_ptr_release( const NAME * );
void intrusive_ptr_add_ref( const NAME * );

That's what intrusive_ptr calls to manipulate the (external) reference counter.

  • shared_ptr: Your class has no builtin reference counter. On construction shared_ptr creates a counter and it's maintained together with the

pointer.

This is why these two types can't be mixed. The typedefs shall help the user of the class to pick the right type, without need to look into the code to see where the counter is.

And may be you change your mind and switch from shared to intrusive. Adapting the typedefs is most probably sufficient, as the syntax and semantics of both pointer types is (almost) the same. (you can't get the counter value from intrusive_ptr, you have to follow the pointer and ask the class. Thats the difference between both).

The two others RW_pointer, and RWCOW_pointer are no smart pointers themself, but wrapper around one of the above types. They are intended to be used as implementation pointer, as they provide a const correct access to the implementation class. RWCOW_pointer additionally does copy on write. So it depends in your implementation class whether you put an intrusive or shared pointer into WR_pointer. It works for both.

Remember 'const correct'?

shared_ptr<Foo> is a 		Foo *
shared_ptr<const Foo> is a 	const Foo *

thus a

const shared_ptr<Foo> is a 	Foo *const

The pointer is const, not the Foo. That's not handy, if it's an implementation pointer (forwarding access from the interface class to the implementation class).

If the interface class is const (you call a const method), the implementation class should appear to be const, not the pointer to it. That's what RW_pointer adjusts:

RW_pointer<Foo> is a		Foo *
const RW_pointer<Foo> is a	const Foo *

(and RW_pointer<const Foo> does not really make sense).