The Art of Capacity Planning

The next entry in my series of Systems Engineering book reviews is The Art of Capacity Planning by John Allspaw, published by O’Reilly. This book is a worthy addition to every systems engineer’s bookshelf: capacity planning is a valuable skill that should be applied constantly to infrastructure of any size. As a systems engineer you will be doing it frequently (and if not, be prepared to pay some dire consequences). For the rest of the review I will assume that you know what capacity planning is and why you need it.

Below is the table of contents:

  • Goals, Issues and Processes in Capacity Planning
  • Setting Goals For Capacity
  • Measurement: Units of Capacity
  • Predicting Trends
  • Deployment
  • Virtualization and Cloud Computing
  • Dealing with Instantaneous Growth
  • Capacity Tools

The first chapter of the book serves as a quick introduction. The author is quick to state that this is not a book about complex simulations and that the maths used are mostly back-of-the-envelope calculations rather than formal models. He then draws the distinction between performance tuning and capacity planning, two closely related activities whose names cannot be used interchangeably. The need for extensive measurement is stressed over and over again, and the need for a system to tell stories is stated clearly and explicitly.

Goal determination is the name of the game for chapter two. Concepts such as Service Level Agreements (both formal and informal), user expectations and business capacity requirements are introduced, and then the chapter moves on to architectural design goals, with a focus on providing accurate measurement points. Once measurement points for each role are established, possible scaling points per role are introduced, as well as the different kinds of scaling. The author does not take sides: the merits of both horizontal and vertical scaling are discussed, and the term diagonal scaling is introduced. Finally, the chapter briefly touches on Disaster Recovery and Business Continuity Planning.

Chapter three is mostly about metrics and finding your limits. It is common sense that in order to measure something, you should first define sensible units of measurement. The chapter notes that introducing measurement into a system slightly affects its performance, and it gives a baseline of what to expect from measurement tools. The author, like me, seems to be a big fan of measurement, and with good reason. System, network and application metrics can be used to identify problems proactively, and the more metrics you gather, the more informed the decisions you can make about your overall system’s health, performance and capacity. Different contexts are introduced through real-world case studies, and the effects of technologies such as caching are discussed. The chapter closes by stressing again the importance of extensive measurement.

All the metrics in the world are useless if you do not know how to classify and use them in your decision-making process. The focus of chapter four is plotting trends and making forecasts. As stated elsewhere in the book, capacity planning goes hand in hand with resource procurement (be it physical or cloud), and resources carry a price tag. It is a hands-on chapter that uses a spreadsheet to demonstrate concepts such as curve fitting (personally I would have preferred some R code samples) and the effect that capacity planning has on procurement, delving into topics such as procurement lead time as well. This is a nice chapter to read if you interact with CFOs frequently.
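To give a flavour of the kind of curve fitting chapter four is about, here is a minimal sketch in Python. It is not taken from the book, which works in a spreadsheet: the function names, the weekly disk-usage numbers, the 400 GB ceiling and the six-week procurement lead time are all made up for illustration, and a straight line is only the simplest possible trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def weeks_until_ceiling(slope, intercept, ceiling, now):
    """Weeks from `now` until the fitted line reaches `ceiling`.

    Returns None when usage is flat or shrinking, since the line
    never reaches the ceiling in that case.
    """
    if slope <= 0:
        return None
    return (ceiling - intercept) / slope - now


# Hypothetical measurements: disk usage in GB, sampled once a week.
weeks = [0, 1, 2, 3, 4, 5]
disk_gb = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(weeks, disk_gb)
remaining = weeks_until_ceiling(slope, intercept, ceiling=400, now=weeks[-1])

# Hypothetical procurement lead time: order once the forecast runway
# is no longer than the time it takes to get new hardware delivered.
lead_time_weeks = 6
order_now = remaining is not None and remaining <= lead_time_weeks

print(f"growth: {slope:.1f} GB/week, runway: {remaining:.1f} weeks, order now: {order_now}")
```

The useful number for a conversation with a CFO is the runway compared against the procurement lead time, not the slope on its own.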

Chapter five is mostly about deploying and managing capacity. It kicks off with a set of goals, such as centralized log management, and hints at configuration management (it describes the need for it without pointing to specific tools such as Puppet or Chef). Server consistency and automatic start-up are also highlighted. I personally am a big fan of automation and I am glad to see that I am on the same page as the author.

The book closes with three appendices. The first deals with virtualization and cloud computing in general, combined with a number of case studies. The next one gives some advice on what to do when you are hit by sudden traffic spikes (what we used to call the Slashdot effect back in the day), and finally the author points to some tools of the trade, mostly Free and Open Source Software.

The TL;DR version of this review is the following: this book is a quality volume that belongs on your bookshelf or e-book reader. But let’s elaborate.

John Allspaw is a seasoned professional and it shows: the book is packed with case studies from Allspaw’s employers, which makes the knowledge easy to transfer. Compared to other capacity planning volumes, such as the excellent “Guerrilla Capacity Planning”, this book is lighter on the math side, using plenty of graphs and schematics to convey information. It is also a short book, clocking in at just under 150 pages, which makes it a quick read that you will come back to. The author makes a strong effort to keep things platform agnostic. Having said that, while this book is not a tutorial, any decent Unix or GNU/Linux engineer will be able to apply the knowledge contained therein immediately. A point that I cannot stress enough is that the printing quality of the book is excellent, something that can perhaps be expected from O’Reilly. I would really welcome a second, expanded edition, but even as-is this is an excellent book.