SDB:Sound concepts

General openSUSE sound concepts.

Linux sound systems

The sound drivers for openSUSE are typically provided by the ALSA packages. In most cases the sound configuration should be automatic upon booting.

OSS (Open Sound System)

OSS is a portable sound interface for Unix systems.

On Linux, the Open Sound System (OSS) was the only supported sound system up to the 2.4.x kernel series. It was created in 1992 by the Finnish developer Hannu Savolainen (later improvements to OSS were made proprietary). ALSA, the Advanced Linux Sound Architecture, was introduced starting with Linux kernel version 2.5, and the OSS interface was deprecated by the kernel's authors. The audio output library that supports OSS for openSUSE is "libao".

In July 2007, the sources for OSS were released under the CDDL for OpenSolaris and the GPL for Linux.

ALSA (Advanced Linux Sound Architecture)

ALSA is the recommended interface for software that is intended to work on Linux only. It supports many PCI and ISA "plug and play" sound cards, and provides both kernel hardware drivers for audio cards and a library that exposes the ALSA application programming interface (API) to software applications. For backward compatibility, ALSA also contains an optional OSS emulation mode that appears to programs as if it were OSS. ALSA sound drivers are not part of the SuSE kernel source code; instead they are compiled and linked to the kernel as module(s). One feature of ALSA that is not available in OSS is that it allows several applications to share a device.
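
To make the OSS emulation mode more concrete, the following is a minimal sketch of what a classic OSS-style program does: it writes raw PCM samples to a device file. The device path `/dev/dsp` and the 8 kHz, 8-bit unsigned, mono format are assumptions based on the traditional OSS defaults, and the function names `make_tone_u8` and `play_via_oss` are hypothetical. On a system without OSS or ALSA's OSS emulation the device will simply be absent.

```python
import math
import os

SAMPLE_RATE = 8000  # assumed traditional /dev/dsp default (8 kHz, 8-bit unsigned, mono)


def make_tone_u8(freq_hz, seconds, rate=SAMPLE_RATE):
    """Return unsigned 8-bit mono PCM bytes for a sine tone."""
    count = int(rate * seconds)
    # 128 is the midpoint (silence) for unsigned 8-bit audio; amplitude 100 stays in range.
    return bytes(
        128 + int(100 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(count)
    )


def play_via_oss(pcm, device="/dev/dsp"):
    """Write raw PCM to an OSS-style device file; return False if that is not possible."""
    if not os.path.exists(device):
        return False  # no OSS device or OSS emulation available
    try:
        with open(device, "wb") as dsp:
            dsp.write(pcm)
    except OSError:
        return False  # device busy or not permitted
    return True
```

Calling `play_via_oss(make_tone_u8(440, 0.5))` would attempt to play half a second of a 440 Hz tone. The program never needs to know whether a real OSS driver or ALSA's emulation sits behind the device file.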

When interfacing with most modern PC sound hardware, openSUSE will typically use a sound card driver provided by ALSA.

[Figure: ALSA basic structure]

Multimedia interface with sound

Typically (but not always), a sound player will use either xine or GStreamer as its sound engine. xine and GStreamer provide an interface that media player developers can build their applications on, without having to go a layer deeper into the system sound interface.


xine is a multimedia playback application. Many applications under KDE (many of which also run under Gnome) use the xine sound engine. xine is built around a shared library (xine-lib) that supports different front end player applications. While it uses the ALSA kernel hardware drivers, it has its own application programming interface (API). Another important feature of xine is the ability to manually correct the synchronization of audio and video streams. Without an application such as xine, writing software to play DVDs in Linux was described as "a torturous process", since one had to manually create audio and video named pipes and start their separate decoder processes (i.e. write one's own API to the ALSA or OSS drivers).


GStreamer is a multimedia framework written in the C programming language, with a type system based on GObject. The GNOME desktop environment is the primary Linux user of GStreamer technology. In the case of Gnome, ESD handles sound server duties and GStreamer handles the encoding and decoding (and both eventually pass everything down the pipeline to ALSA). GStreamer serves a host of multimedia applications, such as video editors, streaming media broadcasters, and media players. Designed to be cross-platform, it is known to work on Linux (x86, PowerPC and ARM), Solaris (x86 and SPARC), Mac OS X, Microsoft Windows and OS/400. GStreamer uses a plugin architecture, so most of its functionality is implemented as shared libraries. Plugin libraries are dynamically loaded to support a wide spectrum of codecs, container formats and input/output drivers.


For KDE 4, developers plan to use a new multimedia API, known as Phonon. Phonon will provide a common interface on top of other systems, such as GStreamer.

Phonon is a new KDE technology that offers a consistent API to use audio or video within multimedia applications. The API is designed to be Qt-like, and as such, it offers KDE developers a familiar style of functionality. Firstly, it is important to state what Phonon is not: it is not a new sound server, and will not compete with xine, GStreamer, ESD, aRts, etc. Rather, due to the ever-shifting nature of multimedia programming, it offers a consistent API that wraps around these other multimedia technologies. Then, for example, if GStreamer decided to alter its API, only Phonon needs to be adjusted, instead of each KDE application individually.

Phonon is powered by what the developers call "engines" and there is one engine for each supported backend.
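
The wrapping idea described above can be sketched as a small adapter pattern. This is an illustrative model only: Phonon itself is a C++/Qt API, and every name below (`Engine`, `FakeGStreamerEngine`, `FakeXineEngine`, `MediaPlayer`) is hypothetical and does not correspond to Phonon's real classes. The fake engines only record what they would do.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Hypothetical engine interface: one subclass per supported backend."""

    @abstractmethod
    def play(self, url):
        ...

    @abstractmethod
    def stop(self):
        ...


class FakeGStreamerEngine(Engine):
    """Stand-in for a GStreamer-based engine; only logs the calls it receives."""

    def __init__(self):
        self.log = []

    def play(self, url):
        self.log.append(f"gst: start pipeline for {url}")

    def stop(self):
        self.log.append("gst: stop pipeline")


class FakeXineEngine(Engine):
    """Stand-in for a xine-based engine; only logs the calls it receives."""

    def __init__(self):
        self.log = []

    def play(self, url):
        self.log.append(f"xine: open {url}")

    def stop(self):
        self.log.append("xine: stop")


class MediaPlayer:
    """Application-facing API; it never talks to a backend directly."""

    def __init__(self, engine):
        self._engine = engine

    def play(self, url):
        self._engine.play(url)

    def stop(self):
        self._engine.stop()
```

An application written against `MediaPlayer` stays unchanged whether it is given a `FakeGStreamerEngine` or a `FakeXineEngine`. If a backend changes its own API, only the matching engine class has to be adapted.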

Various backends/engines are currently in development for KDE 4.

The goal for KDE 4.0 is to have one 'certified to work' engine, and a few additional optional engines.

It is planned that a Phonon backend can make use of GStreamer (or NMM, xine, or whatever else might make sense, such as DirectX on Windows or QuickTime on Mac OS X). Phonon is only a comparatively simple multimedia API, while the backend is the adaptor between the Phonon API and a (full featured) media framework.

Other engines that have been suggested include DirectShow (for the Windows platform), and QuickTime (for the Mac OS X platform). Development on these additional engines has not yet started, as the Phonon core developers are more concerned with making sure that the API is feature-complete before worrying about additional engines. If the Phonon developers attempt to maintain too many engines at once while the API is still in flux, the situation could become quite messy (If you would like to contribute by writing an engine, jump into the #phonon channel at

Linux sound servers or daemons

Like many other programs in Linux, the sound system is built on a client-server model. The sound server is the part responsible for the actual communication with the hardware, the sound card. Different clients can connect to this sound server and request that sound data be sent to the sound card.

These sound clients can be ordinary programs such as an MP3 player or a video player, but in some cases a client may be running on another computer connected over the network.

   * aRts
   * dmix
   * esd
   * jack
   * MAS
   * NAS
   * PulseAudio

By default these sound servers/daemons do not need to be running to provide one's openSUSE audio. Instead they can be used optionally, dependent upon any special applications that one may wish to run.


aRts, which stands for analog Real time synthesizer, is an application that simulates an analog synthesizer under KDE, allowing modular multimedia applications. aRts relies on ALSA to provide the hardware kernel drivers. One key component of aRts is the soundserver which mixes several soundstreams in realtime. aRts is designed to provide its filter and synthesis capabilities to other applications using the multimedia communication protocol (MCOP). For applications that require OSS, aRts has an OSS emulator. Historically aRts was the default KDE analog Real time synthesizer, although it has been deprecated by the KDE team for KDE4 (being replaced by Phonon).



dmix is part of ALSA and not a separate system. It acts as an indirection layer, so it is included here. ALSA plugins extend the functionality of PCM devices, allowing low-level sample conversions and copying between channels, files and soundcard devices. The dmix plugin provides direct mixing of multiple streams.
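
As a rough illustration of what mixing multiple streams means, the sketch below sums signed 16-bit samples from several streams and clamps the result to the valid range. It is a simplified model of the idea, not dmix's actual implementation, and the function name `mix_streams` is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_streams(streams):
    """Mix several signed 16-bit PCM streams into one by summing their samples."""
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        # Streams that have already ended contribute silence (nothing) at this position.
        total = sum(s[i] for s in streams if i < len(s))
        # Clamp so that loud passages saturate instead of wrapping around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, mixing `[1000, 2000]` with `[500, -500, 100]` gives `[1500, 1500, 100]`, and two samples of 30000 saturate at 32767.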


esd - EsounD (the Enlightened Sound Daemon) consists of a set of tools that allow several applications to use an audio device simultaneously. Without it, a program currently using the sound device must finish before another application can output to it. EsounD also allows audio to be played on audio devices across a network. The audio output library that supports ESD for openSUSE is "libao". For applications that require OSS, ESD has an OSS emulator. Historically ESD has been hard coded into Gnome, although with the advent of PulseAudio it is being pulled out of Gnome (to make Gnome/Linux sound more modular). In the case of Gnome, ESD handles sound server duties and GStreamer handles the encoding and decoding (and both eventually pass everything down the pipeline to ALSA).


JACK is a low-latency audio server, written for POSIX conformant operating systems such as GNU/Linux and Apple's OS X. It can connect a number of different applications to an audio device, as well as allowing them to share audio between themselves. Its clients can run in their own processes (i.e. as normal applications), or they can run within the JACK server (i.e. as a "plugin"). JACK relies on ALSA to provide the kernel hardware drivers. JACK was designed from the ground up for professional audio work, and its design focuses on two key areas: synchronous execution of all clients, and low latency operation.


MAS (Media Application Software) is a network and platform independent media application server from SAI (Shiman Associates Inc) for Xorg.


Network Audio System, or NAS for short, is a network transparent, client/server audio transport system for playing audio on a computer. It supports servers for a variety of operating systems, including Linux, and can be described as the audio equivalent of an X server.


PulseAudio is a networked sound server, similar in theory to the Enlightened Sound Daemon (EsounD). It provides:

  • Software mixing of multiple audio streams, bypassing any restrictions the hardware has.
  • Network transparency, allowing an application to play back or record audio on a different machine than the one it is running on.
  • Sound API abstraction, alleviating the need for multiple backends in applications to handle the wide diversity of sound systems out there.
  • Generic hardware abstraction, giving the possibility of doing things like individual volumes per application.
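
The combination of software mixing and individual per-application volumes can be pictured with a toy model. The class below is a hypothetical in-process sketch, not PulseAudio's API: each client submits signed 16-bit samples, and the server scales every stream by that client's own volume before mixing.

```python
class ToySoundServer:
    """Hypothetical model of a sound server with per-client volume; not PulseAudio's API."""

    def __init__(self):
        self._streams = {}  # client name -> list of signed 16-bit samples
        self._volumes = {}  # client name -> gain between 0.0 and 1.0

    def submit(self, client, samples, volume=1.0):
        """Register a client's audio and its initial volume."""
        self._streams[client] = list(samples)
        self.set_volume(client, volume)

    def set_volume(self, client, volume):
        """Change one client's volume without touching the others."""
        self._volumes[client] = max(0.0, min(1.0, volume))

    def render(self):
        """Scale each stream by its client's volume, sum them, and clamp to 16 bits."""
        length = max((len(s) for s in self._streams.values()), default=0)
        out = []
        for i in range(length):
            total = 0.0
            for client, samples in self._streams.items():
                if i < len(samples):
                    total += samples[i] * self._volumes[client]
            out.append(max(-32768, min(32767, int(round(total)))))
        return out
```

With a "player" client submitting `[1000, 1000]` at full volume and a "chat" client submitting `[2000]` at volume 0.5, `render()` yields `[2000, 1000]`. Lowering one client's volume leaves the other untouched, which hardware with a single master volume cannot offer on its own.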

PulseAudio comes with many plugin modules. All audio to and from clients and audio interfaces goes through modules. PulseAudio clients can send audio to "sinks" and receive audio from "sources". A client can be GStreamer, xine-lib, MPlayer or any other audio application. Only the device drivers/audio interfaces can be sources or sinks (they are often hardware inputs and outputs).

PulseAudio is included with openSUSE 11.0, 11.1 and 11.2.

PulseAudio tips & tricks

Given the wide range of available sound systems, some tweaks might be needed, depending on the applications being used.
