The Practice of Parallel Programming

Preface to the Online Edition

This book provides an advanced guide to the issues of the parallel and multithreaded programming. It goes beyond the high-level design of the applications, into the details that are often overlooked but vital to make the programs work. The content is oriented towards the programming of the operating systems, servers and business applications.

The parallel programming has three aspects to it: the theory of parallelism, a specific API you plan to use, and the details of how to make it all work together. Many books cover the first two aspects but at the moment this is the only book about the third one. This craft of parallel programming is not widely known, and because of this the parallel programming has gained the reputation of complexity. Surprisingly few programs and libraries do the multithreading quite right. This book leads the readers into the understanding of the craft, using many examples based on POSIX and Microsoft Windows APIs, with occasional Java or C#. Most of the algorithms discussed are general and may be implemented in any language.

How to stop a multithreaded program correctly? Or how to stop only a single connection in a multi-user server? What if the program receives a signal? How are the scalable data structures built? What synchronization primitives are more appropriate in which usage patterns, and how are they related to each other? A whole lot of questions about the queues and topologies built with them. When are the fashionable paradigms of lock-free synchronization, transactional memory and actors hot and when not? How to multiplex with threads and without them? This book answers all these questions and more.

Unfortunately, right now the publishers have very little interest in the books on the parallel programming. Looks like lots of them have been published recently, the market became fragmented, and none of them is selling any good. Well, I think the second reason is that a lot of these books concentrate on the algorithms of the scientific calculations kind. These algorithms are of little use outside the supercomputer community, and that community is not very big either. The end result is that I could not convince any of the publishers that my book is different enough to be worth a try. Since the work has been already done, I've decided to publish the book in the Internet for free and then go with the self-publishing of the printed version.

Even though the book can be read online for free, it is still a fully copyrighted material and may not be copied nor re-published in any form without the explicit written consent of the author. Examples though are a different matter. They're covered by an Open Source license similar to a 2-clause BSD license and may be freely reused for any purpose. Note though that the examples are not production code. Even though effort has been taken to test them out, they may still contain bugs. The level of the testing varies. Some examples are a rewrite of the production code, some just have been tested to compile, and some are in between. Of course, some of them demonstrate the bad programming practices and aren't supposed to work by design. In general, the bigger examples have been tested better.

The examples are important for this book. It's all built around the examples of the bad and good practices with the explanations of why things have been done one or the other way, and what design decisions have been made. All the examples in the book are marked with the labels in the format [[exNNaa]]. NN is the chapter number and aa is a two-letter unique code of the example in the chapter. The code in the examples package is also marked with the same labels, so you can find the matches easily. In case if the example in the book contains only the parts modified since the previous example, you can refer to the packaged examples to find the full version. The examples have been built and tested to work on Linux Fedora 11. Your experience with other OSes may vary. Even though the examples won't build as is on Windows (at least, outside the POSIX simulation environment), the package has the files formatted with CR-LF to make them readable for the Windows people. The Unix people are able to handle that extra CR and get rid of it.

The online version of the book is formatted in plain text. This is the easiest format for me to work with, and the format I used to write the book. It was the least labor-intensive way to publish it.

Since then I did the formatting and created the printed edition of the book.

CreateSpace book store
The printed edition is available now directly from the CreateSpace book store and at The other retailers will be coming as well. It takes a little time to propagate through the channel and I'm still learning the channel management part. As a special limited-time Internet promotion, use the discount code RYM7VM5Q to get $14 off the list price at the CreateSpace store.

The printed edition is different from the online one. It's better. Not only it is formatted nicely, but also a large number of small corrections have been incorporated into it. The explanations have been clarified, and many new detailed execution sequences have been written out. And of course the drawings are not ASCII-art any more.

A PDF with a somewhat formatted version, done by Sayed Fulkhani, has been available for download for a while. Since the fully formatted printed version (which has nothing in common with that PDF) is available now, I've removed Sayed's PDF from the SourceForge web site. If you want the formatting (and many improvements described above), go buy a printed copy.

The examples package and the zipped archive of the plain text files are available for download from the file storage at the SourceForge.

Since this book was written, I've also come up with a multithreading infrastructure with message passing for my other project, Triceps. It allows to build the applications from threads connected by the message queues, while taking care to prevent some of the pitfalls described in the chapter 6, and some of the pitfalls you can find for example in the Go language. It also manages the thread lifecycle, constructing and destroying threads in an ordered fashion, and allows to add and remove the threads from the application dynamically. Its description can be found in the Triceps manual. If you have found a bug or want to send me any other comment, you can contact me at or These accounts have different spam filters, and you may have better luck with one or the other. I also do the consulting and contract development, and am always interested in the new projects.

Enjoy the reading!



0. Introduction

1. Libraries refresher
  A crash course of STL
  POSIX threads
  MS Windows
  Pthreads wrapper

2. Locks
  Implementation: spinlocks and sleeplocks
  Locks and signals
  Locks and CPU caches
  Mutexes and scopes
  Making a sleeplock
  Lock granularity
  Lock chaining
  Lock recursion and avoiding it
  Mutex debugging
  In-function static objects in C++

3. Derived synchronization primitives
  Split lock
  Splitlocks for a variable set of threads

4. Queues
  Flow control
  Feedback loops
  Message batching
  Out-of-band and control messages
  Prioritized messages
  Simple and complicated errors

5. Multithreaded program exit
  Emergency exit
  Stopping the threads
  Interrupting a long wait
  Detecting the dropped connections
  Synchronization in object construction and destruction
  Weak references as object accessors

6. Data structures used with no explicit synchronization
  Thread-local variables
  Atomic operations and lock-free data structures
  Queues as the sole synchronization mechanism

7. Multiplexing
  I/O multiplexing in the single-threaded libraries
  Server socket handling in the multithreaded programs
  Multiplexing on the thread synchronization primitives

8. Conclusion

The examples package can be downloaded from the file storage at the SourceForge.

Get The Practice of Parallel Programming at Fast, secure and Free Open Source software downloads


Describes great many interesting things you can do with the C++ templates (but should not do in the production code).
Introduces the basic parallel computational algorithms.
The definitive guide to the POSIX threads.
A classic book on the parallel programming. A bit dated by now.
Patterns for the decomposition of algorithms for parallelism.
Describes the Solaris kernel internals, including the implementation of the locks.
Description of the BSD Unix internals, with many examples of the resource management.
The newer version of the BSD book, describing FreeBSD.
A detailed description of the low-level synchronization for various hardware architectures. Paper on the transactional memory implementation. Documentation to the Ptypes library. STREAMS Programming Guide. An article describing a working lock-free queue.