Book Review: PostgreSQL Replication

So, for my series of systems engineering books, I will proceed with a short review of PostgreSQL Replication by Packt. The reason this book became part of my collection is that, while there is plenty of information on PostgreSQL replication out there, much of it is out of date given the overhaul of the replication system in PostgreSQL 9.x. Without further ado, here is the book's table of contents.

  • Understanding Replication Concepts
  • Understanding the PostgreSQL Transaction Log
  • Understanding Point-In-Time Recovery
  • Setting up asynchronous replication
  • Setting up synchronous replication
  • Monitoring your setup
  • Understanding Linux High-Availability
  • Working with pgbouncer
  • Working with PgPool
  • Configuring Slony
  • Using Skytools
  • Working with Postgres-XC
  • Scaling with PL/Proxy

    The book gets straight down to business with an introduction to replication concepts and why this is a hard problem with no one-size-fits-all solution. Topics such as master-master replication and sharding are addressed as well. After this short introduction, the specifics of PostgreSQL are examined, with a heavy focus on the XLOG and related internals. The book strikes a nice balance in the level of detail: deep enough to go beyond the trivial but not overwhelming (and thank $DEITY, we are spared source code excerpts, although a few references would be nice for those willing to dig further into the implementation details), while providing a healthy amount of background information. With that out of the way, a whole chapter is devoted to Point-In-Time Recovery (PITR from now on). PITR is an invaluable weapon in the arsenal of any DBA and gets a fair and actionable treatment, actionable meaning that you will walk away from this chapter with techniques you can start implementing right away.

    With the theory and basic vocabulary defined, the book then dives into replication proper. The concepts and the drawbacks of each technique are explained, along with specific technical instructions on how to get there, including a Q&A on common issues that you may encounter in the field.
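    To give a flavor of what the asynchronous replication and PITR chapters build towards, here is a minimal sketch of a PostgreSQL 9.x streaming replication setup with WAL archiving; the hostnames, paths and replication user below are my own placeholders, not examples taken from the book.

    # master: postgresql.conf
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 64
    archive_mode = on
    archive_command = 'cp %p /var/lib/postgresql/archive/%f'

    # master: pg_hba.conf -- allow the standby to connect for replication
    host replication repuser 192.168.1.10/32 md5

    # standby: postgresql.conf
    hot_standby = on

    # standby: recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com user=repuser password=secret'
    restore_command = 'cp /var/lib/postgresql/archive/%f %p'
    # for PITR you would set a recovery target here, e.g. recovery_target_time

    Once the standby is up, the pg_stat_replication view on the master (available from 9.1 onwards) is the natural first stop for checking replication lag.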

    PostgreSQL has a complex ecosystem, and once the actual built-in replication mechanisms are explained, the common tools are presented (with, unfortunately, the glaring omission of Bucardo). This is where the book falters a bit, given the excellent quality of the replication-related chapters. The presentation of the tools is neither even nor equally deep in all cases – my gripe is that the Linux-HA chapter stops just when it starts to get interesting. Having pointed this out, these chapters are still better written and more concise than the information scattered around the web. I paid particular attention to the PgPool chapter, which does not cover PgPool-HA (hint: there is more than one way to do it). These chapters assume no previous exposure to the ecosystem, so they serve as a gentle (and, again, actionable) introduction to the specific tools, but I would have preferred each of them to be 10-15 pages longer, providing some additional information, especially on the topic of high availability. Even as-is, these chapters will save you a lot of time searching for and compiling information, filling in a few blanks along the way, so, make no mistake, they are still useful. Bonus points for covering Postgres-XC, which is somewhat of an underdog.
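    To give an idea of the level these chapters operate at, the pgbouncer chapter gets you to something like the minimal pgbouncer.ini below; the database name, addresses and paths are illustrative placeholders of mine, not the book's examples.

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    max_client_conn = 200
    default_pool_size = 20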

    A small detail is that the examples in the book tend to focus on Debian-based systems, so if you are administering a Red Hat derivative you will have to adapt them slightly, taking into account the differences in how PostgreSQL is packaged. Overall, the book goes for breadth rather than depth and can serve as a more than solid introductory volume. Inevitably, there is some overlap with the official PostgreSQL manuals, which is to be expected given how good they are. The quality of the book is on par with other Packt Publishing titles, making this an easy-to-read book that will save you a lot of time for certain use cases.
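    As a rough guide to what "adapting slightly" means in practice (the version number, paths and service names below reflect the 9.x packages of the era and are my own summary, not examples from the book):

    # Debian/Ubuntu: cluster-aware wrappers, configuration under /etc
    /etc/postgresql/9.2/main/postgresql.conf
    pg_lsclusters                      # list the clusters on the machine
    pg_ctlcluster 9.2 main restart     # restart a specific cluster

    # RHEL/CentOS (PGDG packages): configuration lives inside the data directory
    /var/lib/pgsql/9.2/data/postgresql.conf
    service postgresql-9.2 restart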

    Book Review: Web Operations: Keeping the Data on Time


    For my kickoff of systems engineering book reviews I have chosen this book. While not technical in the strict sense of the term (if you are looking for code snippets or ready-to-use architecture ideas, look elsewhere), this collection of 17 essays provides a bird's-eye view of the relatively new discipline of Web Operations. As you will see from the short TOC below, no stone is left unturned, with broad coverage of subjects ranging from NoSQL databases to community management (and all the points in between). This is what you will be getting:

    1. Web Operations: The career
    2. How Picnik Uses Cloud Computing: Lessons Learned
    3. Infrastructure and Application Metrics
    4. Continuous Deployment
    5. Infrastructure As Code
    6. Monitoring
    7. How Complex Systems Fail
    8. Community Management and Web Operations
    9. Dealing with Unexpected Traffic Spikes
    10. Dev and Ops Collaboration and Cooperation
    11. How Your Visitors Feel: User-Facing Metrics
    12. Relational Database Strategy and Tactics for the Web
    13. How to Make Failure Beautiful: The Art and Science of Postmortems
    14. Storage
    15. Nonrelational Databases
    16. Agile Infrastructure
    17. Things That Go Bump in the Night (and How to Sleep Through Them)

    Where can someone start? A chapter-by-chapter play-by-play is not the preferred way here – the chapters are short and to the point and use a variety of formats (one of them, for example, is a long interview) – so I am going to talk about the overall feel of the book.

    The roll-call of the book is impressive. I am sure that if you have worked in the field for a little while, names like Theo Schlossnagle, Baron Schwartz, Adam Jacob, Paul Hammond et al. speak for themselves. Every chapter serves as a gentle introduction to the relevant subject matter – this is to be expected, as the topics are quite deep and each one carries a huge assorted bibliography. What I particularly like about this book is not only the gentle introduction; it is also written in a way that makes it approachable to technical managers, team leaders and CTOs – chapters such as the one on postmortems and the ones on metrics are prime examples of this. What is awesome is that the book helps you identify problem areas in your current business (for example, the lack of configuration management such as Puppet or Chef) and provides you with actionable information. Extra points for openly acknowledging failure: there are more than two chapters related to it (as the saying goes, if you operate at internet scale, something is always on fire, someplace), including a chapter on how to conduct efficient postmortems. Even non-technical areas such as community management are covered, illustrating that not everything about running an internet business today is purely technology-oriented.

    Your experience with this book will vary greatly. If you are new to the topics at hand, you might benefit from reading each and every chapter and then revisiting the book from time to time – as your experience grows, so will the number of useful ideas you can get out of it. If you are an experienced professional, while this book might not be an epiphany, there is still useful content to apply and perhaps a few additional viewpoints will present themselves.

    Overall? An excellent book, with a lot of value and a long shelf life, for everyone involved in running an internet business.

    A final nice touch is that proceeds from this book go to charity.

    Coming up on Commodity

    For the past few months I have been silent, with the last entry being a re-blog from xorl's (defunct?) blog. That is quite a long time for a writer's block, eh? Well, here is some insight: professionally, I have somewhat moved away from security towards a systems engineering paradigm. While security still plays an important part both professionally and in my personal time, it is not the dominant focus. Building systems engineering skills is hard work, especially if you focus on the engineering part as opposed to the systems part (e.g. systems administrator and systems engineer should not be interchangeable terms). My plan is to publish reviews of books and other resources that I found helpful during my journey, as well as some original hacks that I have made. I have a strict policy of not posting material related to $DAYJOB, but I am more than willing to share some nuggets of experience. So stay tuned and say hi to the revitalized Commodity blog!

    Introduction to Sensu


    Slide deck for an internal presentation I gave on Sensu a few months ago.

    I recently finished this book and, after a really long time, I wanted to write a review. xorl beat me to it 😦

    xorl %eax, %eax

    Everybody in the “security world” knows Michal Zalewski and his work, especially in the field of web security and exploitation. So, with no further introduction, here is my review of his new book, “The Tangled Web”.



    Title: The Tangled Web: A Guide to Securing Modern Web Applications
    Author: Michal Zalewski

    Chapter 1: Security in the World of Web Applications
    Here we have a nice introduction to web application security, going through all the required theoretical information as well as useful historical references.

    Part I: Anatomy of the Web
    Chapter 2: It Starts with a URL
    Although a chapter dedicated to URLs might initially seem like overkill, M. Zalewski proves the opposite. In this chapter we can see that there are so many details involved in parsing URLs correctly that it is extremely difficult for an application to handle all of them properly.

    Chapter 3: Hypertext Transfer Protocol


    Rediscovery and Security News

    First things first: Happy 2012 everyone.

    So, this blog has been silent for a little while now. More astute readers might argue along the lines of "hey man! This is supposed to be a technical blog – where are all them technical articles? Have you run out of material?".

    Take a deep breath, the dreaded, almost compulsory metablogging block after a long pause is coming …

    The answer is a big NO! There is an abundance of material that I am proud of BUT a lot of this research has been done while solving problems for paying clients. The problem can be refined as “how do you tip-tap-toe around NDAs and do you choose to do so?”. Smart money says not to do it, so I am not. Keep this point in mind for the latter part of this post.

    One of the design decisions for this rebooted blog was that it should convey an air of positivity, at least by security and research standards, which is not the happiest of domains. So, for better or worse, I decided to bottle the acid for some time, even if that meant leaving gems such as the following (courtesy of a well-known mailing list) untouched:

    I have problems with those that create malware – under the guise of
    “security research” – which then gets used by the bad guys.

    I’m not saying that one can never stop breaking into things. I just
    don’t like the glorification of creating malware by the so-called
    “good guys”. If all of that energy instead was placed into prevention,
    then we would be better off.

    [SNIP]
    P.S. One might argue that a whitehat or security researcher can’t
    change sides and go into prevention, or in other words, be a Builder
    instead of a Breaker. They can’t because they don’t have the skills to
    do it.

    Finished picking your jaw off the floor? Good! While Cpt. Obvious is on its way with the usual “vuln != exploit != malware” reply, let’s get things moving with a pet peeve of mine that I have not seen addressed.

    Almost every time a new security trend comes out, there is nary a hint that it might have been discovered someplace else or sometime before. Given that security overlaps a lot with cryptography, I just cannot get my head around the fact that, while rediscovery is a well-accepted notion within the cryptography field (and this has been proved time and time again), there is little acknowledgement that whatever you are "discovering" might have been discovered (and countered!) before.

    Enter infosec, an ecosystem where NDAs are ten-a-penny, the underground is more tight-lipped than ever, the general consensus is that confidentiality is a necessity, and a lot of "discoveries" are handled either via the black market (and the lack of morals implied therein) or via security brokers. That was all fine and dandy, but add fame-seeking researchers and "researchers", plus the fact that infosec makes for entertaining, sensationalist headlines that actually "sell seats in the audience", and every day we are bombarded with "news" and "research" (use of quotes intentional, if you haven't guessed already) that falls into one of the following categories:

  • News from the obvious department. This one is getting more and more annoying lately but it is much too obvious a target
  • Less obvious stuff that falls below the radar of cargo-cult security but still way more likely to have been encountered in the field by serious practitioners who fall into one of the non-disclosure categories listed above
  • Actual new and/or insightful findings, which tend to be lost within the sea of useless information, the stuff that REALLY makes your day
    Since there is a very fine line between the second and third categories (again, the first is way too easy a target to make fun of) and one can never be sure in such a rapidly moving and secretive landscape, for the love of $DEITY: the next time you see something related to infosec findings, keep in the back of your head that it might be a rediscovery. And dear reporters, PLEASE DROP THE SENSATIONAL HEADLINES.

    I am not holding my breath that this will ever happen but one can only hope …

    PS:
    Finally, an image courtesy of blackhats.com infosuck webcomic. Not exactly the point that I am trying to convey but the message is quite similar and in any case it is much too funny to be left outside the party.

    P For Paranoia OR a quick way of overwriting a partition with random-like data

    (Surgeon General's warning: The following post contains doses of paranoia which might exceed your recommended daily dosage. Fnord!).

    A lot of the data sanitisation literature out there advises overwriting partitions with random data (by the way, SANS Institute research claims that even a single pass with /dev/zero is enough to stop MFM recovery, but YPMV). So, leaving Gutmann-like techniques aside, in practice generating random data takes a long time on your average system, which does not contain a cryptographic accelerator. To speed things up, /dev/urandom can be used in lieu of /dev/random, noting that, when read, the non-blocking /dev/urandom device will return as many bytes as are requested even if the entropy pool is depleted. As a result, the resulting stream is not as cryptographically sound as /dev/random, but it is faster.
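    For reference, the baseline approach is a plain dd pass; /dev/sdXN below is a placeholder for the target partition, so triple-check it before pressing Enter:

    # single pass of zeros (per the SANS claim above, often good enough)
    dd if=/dev/zero of=/dev/sdXN bs=1M

    # pseudo-random pass; noticeably slower, since reading /dev/urandom is CPU-bound
    dd if=/dev/urandom of=/dev/sdXN bs=1M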

    Assuming that time is of the essence and your paranoia level is low, there is an alternative you can use, one that both provides random-like data (which means you do not have to fall back to /dev/zero and keep your fingers crossed) and is significantly faster. Enter Truecrypt. Truecrypt allows for encrypted partitions using a variety of algorithms that have been submitted to peer review and are deemed secure for general usage. I can hear Johnny Sceptical shouting "Hey, wait a minute now, this is NOT random data, what the heck are you talking about?". First of all, Truecrypt headers aside, let's see what ent reports. For those of you not familiar with ent, it is a tool that performs a statistical analysis of a given file (or bitstream, if you tell it so), giving you an idea about entropy and other very useful statistics. For more information, see man 1 ent.

    For the purposes of this demonstration, I have created the following files:

  • an AES encrypted container
  • an equivalently sized file filled with data from /dev/urandom (I know, but I was in a hurry)
  • a well defined binary object in the form of a shared library
  • a system configuration file
  • a seed file which contains a mixture of English, Chinese literature, some C code, strings(1) output from the non-encrypted swap (wink-wink, nudge-nudge)
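    (For what it is worth, the 16 MiB /dev/urandom comparison file can be reproduced with something along the lines of the command below; the exact invocation is mine, reconstructed for illustration.)

    # 16 x 1 MiB = 16777216 bytes, matching the container size used here
    dd if=/dev/urandom of=P_for_Paranoia.ur bs=1M count=16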
    Let's do some ent analysis and see what results we get (for the hastily written, non-strict-compliant Perl code, look at the end of the article):

    ################################################################################
    processing file: P_for_Paranoia.tc 16777216 bytes
    Entropy = 7.999988 bits per byte.

    Optimum compression would reduce the size
    of this 16777216 byte file by 0 percent.

    Chi square distribution for 16777216 samples is 288.04, and randomly
    would exceed this value 10.00 percent of the times.

    Arithmetic mean value of data bytes is 127.4834 (127.5 = random).
    Monte Carlo value for Pi is 3.141790185 (error 0.01 percent).
    Serial correlation coefficient is 0.000414 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: P_for_Paranoia.ur 16777216 bytes
    Entropy = 7.999989 bits per byte.

    Optimum compression would reduce the size
    of this 16777216 byte file by 0 percent.

    Chi square distribution for 16777216 samples is 244.56, and randomly
    would exceed this value 50.00 percent of the times.

    Arithmetic mean value of data bytes is 127.4896 (127.5 = random).
    Monte Carlo value for Pi is 3.143757139 (error 0.07 percent).
    Serial correlation coefficient is -0.000063 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: seed 16671329 bytes
    Entropy = 5.751438 bits per byte.

    Optimum compression would reduce the size
    of this 16671329 byte file by 28 percent.

    Chi square distribution for 16671329 samples is 101326138.53, and randomly
    would exceed this value 0.01 percent of the times.

    Arithmetic mean value of data bytes is 82.9071 (127.5 = random).
    Monte Carlo value for Pi is 3.969926804 (error 26.37 percent).
    Serial correlation coefficient is 0.349229 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: /etc/passwd 1854 bytes
    Entropy = 4.898835 bits per byte.

    Optimum compression would reduce the size
    of this 1854 byte file by 38 percent.

    Chi square distribution for 1854 samples is 20243.47, and randomly
    would exceed this value 0.01 percent of the times.

    Arithmetic mean value of data bytes is 86.1019 (127.5 = random).
    Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
    Serial correlation coefficient is 0.181177 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: /usr/lib/firefox-4.0.1/libxul.so 31852744 bytes
    Entropy = 5.666035 bits per byte

    Optimum compression would reduce the size
    of this 31852744 byte file by 29 percent.

    Chi square distribution for 31852744 samples is 899704400.21, and randomly
    would exceed this value 0.01 percent of the times.

    Arithmetic mean value of data bytes is 74.9209 (127.5 = random).
    Monte Carlo value for Pi is 3.563090648 (error 13.42 percent).
    Serial correlation coefficient is 0.391466 (totally uncorrelated = 0.0).

    Focusing on entropy, we see:
    Truecrypt: Entropy = 7.999988 bits per byte.
    /dev/urandom: Entropy = 7.999989 bits per byte.

    These are directly comparable (if you trust ent, that is), much better than a well-structured binary file (5.666035 bits per byte), and head and shoulders above our seed file results (which is a conglomerate unlikely to be encountered in practice). The chi-square "would exceed this value" percentages differ by a factor of five in our example, in favor of the /dev/urandom data, but both are still far closer to random than any of the other test cases.

    From the above, there is a strong indication that when you need random-like data and /dev/urandom is too slow, for example when you want to "randomize" your swap area (something I will elaborate on in an upcoming post), a Truecrypt volume will do in a pinch.

    #!/usr/bin/env perl
    use warnings;
    use File::stat;
    # a 5 min script (AKA no strict compliance) to supplement results for a blog article
    # why perl? Nostalgia :-)

    @subjects = qw(P_for_Paranoia.tc P_for_Paranoia.ur seed /etc/passwd /usr/lib/firefox-4.0.1/libxul.so);

    # run ent(1) on a file and print its size along with the raw ent output
    sub analyzeEnt {
        my ($file) = @_;
        my $sz  = stat($file)->size;
        my $ent = `ent $file` . "\n";
        print "#" x 80 . "\nprocessing file: $file " . $sz . " bytes\n" . $ent;
    }

    foreach my $subject (@subjects) {
        analyzeEnt($subject);
    }