Welcome back for another book review. This time, I am going to review a book that I bought when it came out, in late 2013. I have always wanted to review this one, but I seemed to have two options:

  1. Write a short review that probably does not do the book justice.
  2. Postpone the review for a more suitable time, when $IRL and $DAYJOB allow …

I opted for the second option, as I consider this book to be indispensable (yes, this is going to be a positive review). So, here is the table of contents:

  1. Introduction
  2. Methodology
  3. Operating Systems
  4. Observability Tools
  5. Applications
  6. CPUs
  7. Memory
  8. File Systems
  9. Disks
  10. Network
  11. Cloud Computing
  12. Benchmarking
  13. Case Study
  14. Appendices (which you SHOULD read)

Wow, a lot of content, huh? (Something to be expected, given that the book runs to more than 700 pages.) Do not let the size daunt you, however. The chapters are self-contained, since the author understands that the book might be read under pressure, and each one ends with useful exercises.

What really makes this book stand out is not the top-notch technical writing or the abundance of useful one-liners; it is the fact that the author goes a step further and proposes a methodology for troubleshooting and performance analysis, as opposed to the ad-hoc methods of the past (or, in the best case, a checklist, and $DEITY forbid the "blame someone else" methodology). In particular, the author proposes the USE method, USE standing for Utilization, Saturation, Errors: for every resource in the system, check these three metrics to methodically and accurately analyze and diagnose problems. This methodology (which can be adapted or expanded at will; last time I checked, the book was not written in stone) is alone worth the price of the book.
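To make the USE method a little more concrete, here is a minimal Python sketch of a USE checklist. It is my own illustration, not code from the book: each resource gets one utilization, one saturation and one error reading, and anything that looks suspicious is reported. The `ResourceReading` type, the `use_check` function, the 70% utilization threshold and the sample numbers are all hypothetical; on a real system the readings would come from tools such as `mpstat`, `vmstat` or `iostat`.

```python
from dataclasses import dataclass


@dataclass
class ResourceReading:
    """One row of a (hypothetical) USE checklist for a single resource."""
    name: str
    utilization: float  # fraction of time the resource was busy, 0.0 to 1.0
    saturation: float   # work queued that the resource cannot service yet (e.g. run-queue length)
    errors: int         # number of error events observed


def use_check(readings, util_threshold=0.7):
    """Return (resource, finding) pairs for every suspicious metric.

    util_threshold is an illustrative assumption, not a value from the book;
    a sensible threshold depends on the resource being examined.
    """
    findings = []
    for r in readings:
        # Visit every resource and every metric, rather than stopping at
        # the first problem found, so nothing is silently skipped.
        if r.utilization >= util_threshold:
            findings.append((r.name, f"utilization {r.utilization:.0%}"))
        if r.saturation > 0:
            findings.append((r.name, f"saturated ({r.saturation})"))
        if r.errors > 0:
            findings.append((r.name, f"{r.errors} errors"))
    return findings


if __name__ == "__main__":
    # Made-up sample readings, for illustration only.
    sample = [
        ResourceReading("cpu", 0.95, 3.0, 0),
        ResourceReading("memory", 0.40, 0.0, 0),
        ResourceReading("disk sda", 0.20, 0.0, 2),
    ]
    for name, finding in use_check(sample):
        print(name, finding)
```

The function only flags resources for further investigation; it walks the whole list on purpose, mirroring the "for every resource" nature of the method, and leaves the actual diagnosis to the human.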

The author correctly maintains that you must have an X-ray (so to speak) of the system at all times. Tools such as DTrace (available for Solaris and BSD) or its Linux equivalent, SystemTap, provide a great deal of insight into the internals of a system.

Chapters 5-10 are self-explanatory: the author presents what the chapter is about, common errors, and common one-liners used to diagnose possible problems. As said before, the chapters aim to be self-contained and can be read while actually troubleshooting a live system, so there are no lengthy explanations. At the end of each chapter, the bibliography section provides useful pointers to resources for further study, something that is greatly appreciated. Finally, the exercises can easily be turned into interview questions, which is another bonus.

Cloud computing, and the special considerations it presents, gets its own chapter, and the author tries to keep it platform-agnostic (even though he works for a "Cloud Computing" company), which is a nice touch. This is followed by a chapter of useful advice on how to actually benchmark systems, and the book ends with a (sadly too short) case study.

The appendices that follow should be read, as they contain a lot of useful one-liners (as if the ones in the book were not enough), concrete examples of the USE method, a guide to porting DTrace to SystemTap, and a who's who of the systems performance world.

So how to sum up the book? "Incredible value" is one thought that comes to mind; "timeless classic" is another. If you are a systems {operator|engineer|administrator|architect}, this book is a must-have and should be kept within reach at all times. Even if your $DAYJOB does not have "systems" in the title, the book is going to be useful if you interact with Unix-like systems frequently.

PS. Some reviews of this book complain about its binding. In the three physical copies I have seen with my own eyes, the binding was of the highest quality, so I do not know if this complaint is still valid.