NFSiostat_plotter V4

I updated nfsiostat_plotter_v3 so that the iostat information used for gathering CPU usage is now optional. This means you don’t have to run iostat while you run nfsiostat, but it also means you won’t get the CPU usage plots.

Please go to this page for more details


IOSTAT Plotter and NFSiostat plotter updates (V3!)

I’ve been updating both iostat_plotter and nfsiostat_plotter. Both are now at “v3”. There are a few new features in each:


  • Normally nfsiostat doesn’t capture the CPU utilization, but this can be an important part of analyzing NFS client performance. With this version, you run iostat while you run nfsiostat so that the CPU utilization is captured, then run iostat_plotter_v3.py to produce the iostat output, which nfsiostat_plotter reads.
  • For both iostat_plotter and nfsiostat_plotter, the legend has been moved outside the plot to the top right.
  • For both iostat_plotter and nfsiostat_plotter, the size of the legend has been shrunk so you can plot more NFS file systems (up to about four before the legend labels start to overlap). iostat_plotter can handle about 8 devices.
  • For nfsiostat_plotter, the command option “-c” was added so you could combine the NFS file systems onto a single plot (total of 4 plots)
  • For iostat_plotter, the command option “-c” was added so you could combine the devices onto a single plot (total of 4 plots)
  • For iostat_plotter, the code can read either sysstat v9.x or sysstat v10.x format for iostat. This is very important since CentOS and RHEL ship with a very old sysstat v9.x package.
  • For both iostat_plotter and nfsiostat_plotter, the legend labels are auto-sized based on the string length (it’s a heuristic algorithm).
  • For both iostat_plotter and nfsiostat_plotter, a simple “-h” option was added that produces a small help output.

Go to this page for iostat_plotter_v3 and this page for nfsiostat_plotter_v3; you will see links to the new code there.

IOSTAT Plotter V2

I have had several requests for changes/modifications/updates to iostat_plotter, so I recently found a little time to update the code. Rather than just replace it, I created iostat_plotter_v2.

The two biggest changes in the code are: (1) moving the legends outside the plots to the top right-hand corner, and (2) allowing all of the devices to be included on the same set of plots. The first change makes the plots look nicer (I think). I tried to maximize the width of the plot without getting too crazy. I also shrunk the text size of the legend so you can get more devices into the plots. I think you can get about 12 devices without the legend bleeding over into the plot below it.

The second change allows you to put all of the devices on the same plot if you use the “-c” option with the script. In the previous version you got a set of 11 plots for each device which allows you to clearly examine each device. If you had five devices you got a total of 55 plots (5 devices * 11 plots per device). Using the “-c” option you will now get 11 plots with all of the devices on each of the appropriate plots.

I hope these couple of features are useful. Please let me know if they are, or if you want other changes. Thanks!

Checksumming Files to Find Bit-Rot

Do You Suffer From Bit-Rot?

Storage admins live in fear of corrupted data. This is why we make backups, copies (replicas), and use other methods to make sure we have copies of the data in case the original is corrupted. One of the most feared sources of data corruption is the proverbial bit-rot.

Bit-rot can be caused by a number of sources, but the result is always the same: one or more bits in the file have changed, causing silent data corruption. The “silent” part means that you don’t know it happened – all you know is that the data has changed (in essence, it is now corrupt).

One source of data corruption is characterized by the URE (Unrecoverable Read Error) or UBER (Unrecoverable Bit Error Rate). These measures are a function of the storage media design and tell us the probability of encountering a bit on a drive that cannot be read. Sometimes specific bits on drives just cannot be read due to various factors. Usually the drive reports this error and it is recorded in the system logs, and many times the OS will also give an error because it cannot read a specific portion of data. In other cases, the drive will read the bit even though it contains bad data (perhaps the bit flipped due to cosmic radiation, which does happen, particularly at higher altitudes), which means the bit can still be read but is now incorrect.

The actual URE or UBER for a particular storage device is usually published by the manufacturer, although many times it can be hard to find. Typical values for hard drives are around 1 in 10^14 bits, which means that 1 out of every 10^14 bits cannot be read. Some SSD manufacturers will list their UBER as 1 in 10^15, and some hard drive manufacturers use this same number for enterprise-class drives. Let’s convert that number to something a little easier to understand: how many terabytes (TB) must be read before encountering a URE.

A 1 TB drive has about 2 billion (2 x 10^9) sectors, and let’s assume a URE rate of 1 per 10^14 bits. With 512-byte sectors (512 * 8 = 4,096 bits per sector), the URE rate converts to about 24 x 10^9 sectors. If you then divide that by the number of sectors per disk, you get the following:

24 x 10^9 / 2 x 10^9 = 12


This means that after reading 12 TB – 12 x 1TB drives, or 6 x 2TB drives – you should statistically expect to encounter a URE.
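The arithmetic above can be checked with a few lines of Python (the numbers – 512-byte sectors, roughly 2 billion sectors per terabyte, and a 1-in-10^14-bit URE rate – are the assumptions used in this article):

```python
# Expected amount of data read before hitting one URE,
# using the assumptions from the text above.
URE_BITS = 1e14            # 1 unreadable bit per 10^14 bits read
BITS_PER_SECTOR = 512 * 8  # 512-byte sectors = 4,096 bits each
SECTORS_PER_TB = 2e9       # ~2 billion sectors in a 1 TB drive

sectors_before_ure = URE_BITS / BITS_PER_SECTOR   # ~24 x 10^9 sectors
tb_before_ure = sectors_before_ure / SECTORS_PER_TB

print("Sectors read before a URE: %.1e" % sectors_before_ure)
print("TB read before a URE: ~%.0f" % tb_before_ure)
```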

If you have a RAID-5 group that has seven 2TB drives and one drive fails, the RAID rebuild has to read all of the remaining disks (all six of them). At that point you are almost guaranteed that during the RAID-5 rebuild, you will hit a URE and the RAID rebuild will fail. This means you have lost all of your data.

This is just an example of a form of bit-rot – the dreaded URE. It can and does happen and people either don’t see it or go screaming into the night that they lost all of their carefully saved KC and the Sunshine Band flash videos and mp3’s.

So what can be done about bit-rot? There are really two parts to that question: detecting corrupted files, and correcting corrupted files. In this article, I will talk about some simple techniques using extended file attributes that can help you detect corrupted data (recovery is the subject of a much longer article).

Checksums to the Rescue!

One way to check for corrupted data is through the use of a checksum. A checksum is a simple representation or finger print of a block of data (in our case, a file). There are a whole bunch of checksums including md5, sha-1, sha-2 (including 256, 384, and 512 bit checksums), and sha-3. These algorithms can be used to compute the checksum of a chunk of data, such as a file, with longer or more involved checksums typically requiring more computational work. Note that checksums are also used in cryptography, but we are using them as a way to finger print a file.

So for a given file we could compute a checksum, or even compute several checksums using different algorithms. Then, before a file is read, its checksum can be recomputed and compared against a stored checksum for that same file. If they do not match, you know the file has changed. If the time stamps on the file haven’t changed since the checksums were computed, then you know the file is corrupt (since no one changed the file, the data evidently fell victim to bit-rot or some other form of data corruption).
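As a minimal sketch of that compare step (using Python’s hashlib here, rather than the shell checksum tools used later in this article):

```python
import hashlib

original = b"some file contents"
# Checksum recorded while the data was known to be good.
stored_checksum = hashlib.md5(original).hexdigest()

# Later, re-read the data and compare against the stored checksum.
corrupted = b"some file c0ntents"   # one flipped character simulates bit-rot
assert hashlib.md5(original).hexdigest() == stored_checksum   # unchanged data passes
assert hashlib.md5(corrupted).hexdigest() != stored_checksum  # corruption is detected
```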

If you have read everything carefully to this point you can spot at least one flaw in the logic. The flaw I’m thinking of is that this process assumes that the checksum itself has not fallen victim to data corruption. So we have to find some way of ensuring the checksum itself does not fall victim to bit-rot or that we have a copy of the checksums stored somewhere that we assume does not fall victim to data corruption.

However, you can spot other flaws in the scheme (nothing’s perfect). One noticeable flaw is that until the checksum of the file is created and stored, the file can fall victim to bit-rot. We could go through some gyrations so that the checksum is computed and stored in real time as the file is being created. However, that would dramatically increase the computational requirements and also slow down the I/O.

But for now, let’s assume that we are interested in ensuring data integrity for files that have been around a while and maybe haven’t been used for some period of time. The reason this is interesting is that since no one is using the file it is difficult to tell if the data is corrupt. Using checksums can allow us to detect if the file has been corrupted even if no one is actively using the file.

Checksums and Extended File Attributes

The whole point of this discussion is to help protect against bit-rot of files by using checksums. Since the focus is on files it makes sense to store the checksums with the file itself. This is easily accomplished using our knowledge of extended file attributes that we learned in a previous article.

The basic concept is to compute the checksums of the file and store the results in extended attributes associated with the file. That way the checksums are stored with the file itself, which is what we’re really after. To help improve things even more, let’s compute several checksums of the file, since this gives us several ways to detect file corruption. All of the checksums will be stored in extended attributes as well as in a file or database. However, as mentioned before, there is the possibility that the checksums in the extended attributes might themselves be corrupted – so what do we do?

A very simple solution is to store the checksums in a file or simple database and make sure that several copies of that file or database are kept. Then before you check the checksum of a file, you first look up the checksum in the file or database and compare it to the checksums in the extended attributes. If they are identical, then you can trust the stored checksum and compare it against a freshly computed checksum of the file.

There are lots of refinements to this process that you can develop to improve the probability of the checksums being valid. For example, you could make three copies of the checksum data and compute the checksum of each of these files. Then you compare the checksums of the three files before you read any data. If two of the three values are the same, you can assume that those two files are correct and that the third is incorrect, and replace it from one of the other copies. But now we are getting into implementation details, which is not the focus of this article.
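The two-out-of-three vote described above might be sketched like this (majority_value is a hypothetical helper; `copies` stands in for the same checksum read from the three copies of the checksum file):

```python
def majority_value(copies):
    """Return the value at least two of the three copies agree on, or None."""
    a, b, c = copies
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None   # all three differ - no safe copy to trust

# One corrupted copy is outvoted by the other two.
print(majority_value(["abc123", "abc123", "XXXXXX"]))   # -> abc123
```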

Let’s take a quick look at some simple Python code to illustrate how we might compute the checksums for a file and store them in extended file attributes.

Sample Python Code

I’m not an expert coder and I don’t play one on television. Also, I’m not a “Pythonic” Python coder, so I’m sure there could be lots of debate about the code. However, the point of this sample code is to illustrate what is possible and how to go about implementing it.

For computing the checksums of a file, I will be using commands that typically come with most Linux distributions. In particular, I will be using md5sum, sha1sum, sha256sum, sha384sum, and sha512sum. To run these commands and grab the output to standard out (stdout), I will use a Python module called commands (note: I’m not using Python 3.x but Python 2.5.2, though I’ve also tested this code against Python 2.7.1; the commands module was removed in Python 3 in favor of subprocess). This module has Python functions that allow us to run “shell commands” and capture the output in a tuple (a data type in Python).

However, the output from a shell command can have several parts, so we may need to break the string into tokens to find what we want. A simple way to do that is with the functions in the shlex module (Simple Lexical Analysis), which tokenize a string based on spaces.
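For example, md5sum prints the checksum, some whitespace, and then the file name; shlex.split breaks that line into tokens so the checksum can be pulled out as the first element:

```python
import shlex

# Typical md5sum output: the checksum, whitespace, then the file name.
line = "4052e5dd3d79de6b0a03d5dbc8821c60  ./slides_fenics05.pdf"
tokens = shlex.split(line)
print(tokens[0])   # the checksum
print(tokens[1])   # the file name
```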

So let’s get coding! Here is the first part of my Python code to illustrate where I’m headed and how I import modules.

#!/usr/bin/python

#
# Test script for setting checksums on file
#

import sys

try:
   import commands                 # Needed for getstatusoutput
except ImportError:
   print "Cannot import commands module - this is needed for this application.";
   print "Exiting..."
   sys.exit();

try:
   import shlex              # Needed for splitting input lines
except ImportError:
   print "Cannot import shlex module - this is needed for this application.";
   print "Exiting..."
   sys.exit();



if __name__ == '__main__':
    
    # List of checksum functions:
    checksum_function_list = ["md5sum", "sha1sum", "sha256sum", "sha384sum", "sha512sum"];
    file_name = "./slides_fenics05.pdf";
    
# end if


At the top of the code, I import the modules, catching the exception and exiting if a module can’t be found since both are a key part of the code. Then, in the main part of the code, I define the list of checksum functions I will be using. These are the exact command names used to compute the checksums. Note that I have chosen to compute the checksum of the file using five different algorithms. Having multiple checksums for each file improves the odds of finding data corruption, because I can check all five checksums. It can also help find a corrupted checksum: if one of the checksums in the extended file attributes is wrong but the other four are correct, then we have found a corrupted extended attribute.

For the purposes of this article I’m just going to examine one file, slides_fenics05.pdf (a file I happen to have on my laptop).

The next step in the code is to add the code that loops over all five checksum functions.

#!/usr/bin/python

#
# Test script for setting checksums on file
#

import sys

try:
   import commands                 # Needed for getstatusoutput
except ImportError:
   print "Cannot import commands module - this is needed for this application.";
   print "Exiting..."
   sys.exit();

try:
   import shlex              # Needed for splitting input lines
except ImportError:
   print "Cannot import shlex module - this is needed for this application.";
   print "Exiting..."
   sys.exit();



if __name__ == '__main__':
    
    # List of checksum functions:
    checksum_function_list = ["md5sum", "sha1sum", "sha256sum", "sha384sum", "sha512sum"];
    file_name = "./slides_fenics05.pdf";
    
    for func in checksum_function_list:
        # Create command string to set extended attribute
        command_str = func + " " + file_name;
        checksum_output = commands.getstatusoutput(command_str);
        print "checksum_output: ",checksum_output
    # end for
 
# end if


Notice that I create the exact command line I want to run as a string called “command_str”. This is the command executed by the function “commands.getstatusoutput”. Notice that this function returns a 2-tuple (status, output). You can see this in the output from the sample code below.

laytonjb@laytonjb-laptop:~/$ ./test1.py
checksum_output:  (0, '4052e5dd3d79de6b0a03d5dbc8821c60  ./slides_fenics05.pdf')
checksum_output:  (0, 'cdfcadf4752429f01c8105ff15c3e24fa9041b46  ./slides_fenics05.pdf')
checksum_output:  (0, '3c2ad544ba4245dc9e300afe79b81a3a25b2ff6e71e127724acd51124c47a381  ./slides_fenics05.pdf')
checksum_output:  (0, '0761eac4323d35a62c52f3c49dd2098e8b633724ed8dec2ee2de2ddda0874874a916b99287703a9eb1886af62d4ac0b3  ./slides_fenics05.pdf')
checksum_output:  (0, '42674cebe76d0c0567cf1bed21008b005912f0df76990456b669ef3d3942e607d69079e879ceecbb198e846a042f49ee28c145f9b1dc0b4bb4c9ddadd25777c5  ./slides_fenics05.pdf')


You can see that each time the commands.getstatusoutput function is called there are two parts in the output tuple – (1) the status of the command (was it successful?) and (2) the result of the command (the actual output). Ideally we should check the status of the command to determine if it was successful but I will leave that as an exercise for the reader 🙂
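For anyone who wants to do that exercise, the status check might look like the following sketch (using subprocess.getstatusoutput, which is the Python 3 replacement for commands.getstatusoutput):

```python
import subprocess

# A successful command returns status 0 along with its output.
status, output = subprocess.getstatusoutput("echo hello")
if status != 0:
    raise RuntimeError("command failed: %s" % output)

# A failing command returns a non-zero status, which the script should
# catch instead of blindly parsing the output.
bad_status, _ = subprocess.getstatusoutput("false")
print(status, bad_status)
```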

At this point we want to grab the output from the command (the second item in the 2-tuple) and extract the first part of the string which is the checksum. To do this we will use the shlex.split function in the shlex module. The code at this points looks like the following:

#!/usr/bin/python

#
# Test script for setting checksums on file
#

import sys

try:
   import commands                 # Needed for getstatusoutput
except ImportError:
   print "Cannot import commands module - this is needed for this application.";
   print "Exiting..."
   sys.exit();

try:
   import shlex              # Needed for splitting input lines
except ImportError:
   print "Cannot import shlex module - this is needed for this application.";
   print "Exiting..."
   sys.exit();



if __name__ == '__main__':
    
    # List of checksum functions:
    checksum_function_list = ["md5sum", "sha1sum", "sha256sum", "sha384sum", "sha512sum"];
    file_name = "./slides_fenics05.pdf";
    
    for func in checksum_function_list:
        # Create command string to set extended attribute
        command_str = func + " " + file_name;
        checksum_output = commands.getstatusoutput(command_str);
        print "checksum_output: ",checksum_output
        tokens = shlex.split(checksum_output[1]);
        checksum = tokens[0];
        print "   checksum = ",checksum," \n";
    # end for
 
# end if


In the code, the output from the checksum command is split (tokenized) on spaces. The first token is the checksum – the part we’re interested in capturing and storing in the extended file attribute – so we take the first token in the list and store it to a variable.

The output from the code at this stage is shown below:

laytonjb@laytonjb-laptop:~$ ./test1.py
checksum_output:  (0, '4052e5dd3d79de6b0a03d5dbc8821c60  ./slides_fenics05.pdf')
   checksum =  4052e5dd3d79de6b0a03d5dbc8821c60  

checksum_output:  (0, 'cdfcadf4752429f01c8105ff15c3e24fa9041b46  ./slides_fenics05.pdf')
   checksum =  cdfcadf4752429f01c8105ff15c3e24fa9041b46  

checksum_output:  (0, '3c2ad544ba4245dc9e300afe79b81a3a25b2ff6e71e127724acd51124c47a381  ./slides_fenics05.pdf')
   checksum =  3c2ad544ba4245dc9e300afe79b81a3a25b2ff6e71e127724acd51124c47a381  

checksum_output:  (0, '0761eac4323d35a62c52f3c49dd2098e8b633724ed8dec2ee2de2ddda0874874a916b99287703a9eb1886af62d4ac0b3  ./slides_fenics05.pdf')
   checksum =  0761eac4323d35a62c52f3c49dd2098e8b633724ed8dec2ee2de2ddda0874874a916b99287703a9eb1886af62d4ac0b3  

checksum_output:  (0, '42674cebe76d0c0567cf1bed21008b005912f0df76990456b669ef3d3942e607d69079e879ceecbb198e846a042f49ee28c145f9b1dc0b4bb4c9ddadd25777c5  ./slides_fenics05.pdf')
   checksum =  42674cebe76d0c0567cf1bed21008b005912f0df76990456b669ef3d3942e607d69079e879ceecbb198e846a042f49ee28c145f9b1dc0b4bb4c9ddadd25777c5  

The final step in the code is to create the command to set the extended attribute for the file. I will create “user” attributes that look like “user.checksum.[function]” where [function] is the name of the checksum command. To do this we need to run a command that looks like the following:

setfattr -n user.checksum.md5sum -v [checksum] [file]


where [checksum] is the checksum that we stored and [file] is the name of the file. I’m using the “user” class of extended file attributes for illustration only. If I were doing this in production, I would run the script as root and store the checksums using the “system” class of extended file attributes since a normal user would not be able to change the result.

At this point, the code looks like the following with all of the “print” functions removed.

#!/usr/bin/python

#
# Test script for setting checksums on file
#

import sys

try:
   import commands                 # Needed for getstatusoutput
except ImportError:
   print "Cannot import commands module - this is needed for this application.";
   print "Exiting..."
   sys.exit();

try:
   import shlex              # Needed for splitting input lines
except ImportError:
   print "Cannot import shlex module - this is needed for this application.";
   print "Exiting..."
   sys.exit();



if __name__ == '__main__':
    
    # List of checksum functions:
    checksum_function_list = ["md5sum", "sha1sum", "sha256sum", "sha384sum", "sha512sum"];
    file_name = "./slides_fenics05.pdf";
    
    for func in checksum_function_list:
        # Create command string to set extended attribute
        command_str = func + " " + file_name;
        checksum_output = commands.getstatusoutput(command_str);
        tokens = shlex.split(checksum_output[1]);
        checksum = tokens[0];
        
        xattr = "user.checksum." + func;
        command_str = "setfattr -n " + xattr + " -v " + str(checksum) + " " + file_name;
        xattr_output = commands.getstatusoutput(command_str);
    # end for
 
# end if


The way we check if the code is working is to look at the extended attributes of the file (recall this article on the details of the command).

laytonjb@laytonjb-laptop:~$ getfattr slides_fenics05.pdf 
# file: slides_fenics05.pdf
user.checksum.md5sum
user.checksum.sha1sum
user.checksum.sha256sum
user.checksum.sha384sum
user.checksum.sha512sum


This lists the extended attributes for the file. We can look at each attribute individually. For example, here is the md5sum attribute.

laytonjb@laytonjb-laptop:~$ getfattr -n user.checksum.md5sum slides_fenics05.pdf 
# file: slides_fenics05.pdf
user.checksum.md5sum="4052e5dd3d79de6b0a03d5dbc8821c60"


If you look at the md5sum in the earlier output listings, you can see that it matches the md5 checksum in the extended file attribute associated with the file, indicating that the file hasn’t been corrupted.

Ideally we should be checking the status of each command to make sure that it returned successfully, but as I mentioned earlier, that exercise is left up to the reader.

One other aspect we need to consider is that users may have legitimately changed the data. We should record the date and time when the checksums were computed and store that value in the extended file attributes as well. Then, before computing the checksum of a file to see whether it is corrupted, we check whether the time stamps on the file are more recent than the date and time when the checksums were originally computed.
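A sketch of that check, assuming the checksum and the time it was computed are stored side by side (here in a plain dict rather than extended attributes, and using hashlib rather than shelling out to md5sum):

```python
import hashlib
import os
import tempfile
import time

def file_md5(path):
    # Compute the md5 checksum of a file's contents.
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Create a sample file and record its checksum plus when it was computed.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"important data")
stored = {"md5": file_md5(path), "checked_at": time.time()}

# A checksum mismatch only signals corruption if the file was NOT
# legitimately modified after the checksum was taken.
modified_since = os.path.getmtime(path) > stored["checked_at"]
corrupt = (not modified_since) and (file_md5(path) != stored["md5"])
print("corrupt:", corrupt)
os.remove(path)
```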

Summary

Data corruption is one of the most feared aspects of a storage admin’s life. This is why we do backups, replication, etc. – to recover data if the original gets corrupted. One source of corrupted data is what is called bit-rot. Basically, this is when a bit on the storage device goes bad and the data using that bit either cannot be read or returns an incorrect value, meaning the file is now corrupt. But as we accumulate more and more data, and that data gets colder (i.e. it hasn’t been used in a while), performing backups may not be easy (or even practical), so how do we determine whether our data is corrupt?

The technique discussed in this article is to compute the checksum of the file and store it in an extended file attribute. In particular, I compute five different checksums to give us even more data for determining whether the file has been corrupted. By storing all of the checksums in an additional location and ensuring that the stored values aren’t corrupt, we can compare the “correct” checksum to the checksum of the file. If they are the same, then the file is not corrupt. But if they differ and the file has not been changed by the user, then the file is likely corrupt.

To help illustrate these ideas, I wrote some simple Python code to show you how it might be done. Hopefully this simple code will inspire you to think about how you might implement something a bit more robust around checksums of files and checking for data corruption.

Switching to Scientific Linux 6.1

Introduction

I started using Linux seriously in about 1993, when I converted my home system to Linux using Yggdrasil and a bunch of floppies. (Actually, I remember Linus’ posting to comp.os.minix because I was looking for a *nix I could run on my own system, having used Unix so much in graduate school, and I read that mailing list in hopes of figuring out how to install minix on my home system.) So I started using Yggdrasil and really liked it.

However, Yggdrasil’s run pretty much ended in 1995, though I used it for a while longer because it was fun (and easy). Around 1996 or 1997 I switched over to Red Hat Linux and found that I really liked it. Plus, the world was settling on rpm as the application distribution format, so I headed down the Red Hat path.

I used Red Hat for quite a while on all kinds of systems: my personal desktop at home, desktops at work (that was an interesting adventure because of SCO and the whole lawsuit threat), and HPC systems. I was very happy with it, and I tried to purchase support when I could for production systems. Then Red Hat announced their change to Red Hat Enterprise Linux (RHEL), so I decided to make a switch.

At that time I switched over to CentOS for various reasons. For production systems I switched to RHEL, but for my home systems I used CentOS. I liked CentOS very much – it was just like the RHEL I used at work but without the support costs, which I couldn’t afford.

CentOS

I used CentOS all over my home whenever I needed Linux. At one point I had 13 servers and desktops all running CentOS and I was happy as a clam. The security updates came out fairly quickly, the community was fairly good, and they even tolerated non-CentOS questions such as general admin questions. I used CentOS in many HPC systems and wrote lots of articles using CentOS as the OS. Sorry, Red Hat – I just couldn’t afford the support costs at the time, and CentOS gave me everything I needed.

During this time I also tried SuSE, because Linux Networx used it since it was cheaper for HPC than Red Hat. I even had SuSE on my laptop for a few years but didn’t use it too much.

I also tried CAOS Linux because my friend Greg Kurtzer, who developed Warewulf and Perceus, was developing it. I used it on a few small clusters and wrote a few articles about it. It was close enough to Red Hat that I was comfortable with it, but I still used CentOS on my desktop (old habits die hard).

I also tried Ubuntu during this time; it was nice and easy to use and worked well on laptops, so I switched my laptops over to it. However, it did have a slight learning curve, so I didn’t switch over any HPC systems or production systems to it.

I still used CentOS until the great “whine” debacle of 2010-2011.

CentOS Community disintegrates

I don’t know the exact date, but around 2010 or 2011 the whole CentOS project started to unravel. It didn’t track RHEL updates very quickly, particularly for RHEL 6.x and RHEL 5.6 (CentOS 5.5 was slow enough).

The mailing lists soon filled with users asking about the newer versions. Then the volume on the mailing lists went up, and the “developers” became belligerent, secretive, and amazingly rude. I know they were doing CentOS work on the side, but the rapid disintegration of CentOS soon became apparent.

So I was stuck. I couldn’t afford to buy RHEL from Red Hat (maybe the Workstation version, but I was looking for the server version), and CentOS was rapidly becoming a steaming pile. I wasn’t ready to jump with both feet into Ubuntu or SuSE (sorry guys) because I knew RHEL well enough that I could focus on what I wanted to do with it, rather than on how to install and admin it.

Scientific Linux

I had known about Scientific Linux for a while, since I work in that field. I had heard good things about it, and I was impressed by the speed with which they put out distributions and security updates. So I thought I would give it a try.

I grabbed the SL6.1 DVD install iso and put it on my main test system (I use it for all of my storage testing). The installation went very smoothly – exactly what I’m used to. I have a tendency to go a little heavy on the initial package selection, so I restrained myself with this installation. However, I did choose the alternative yum repos so I could get some extra stuff. For example, I installed my all-time favorites: gkrellm, nedit, and vlc. Easy as pie. But one of the cool things that comes with SL is all of the extra XFS goodies (got to love xfs!).

Summary

I suppose CentOS 6.1 would have been just as easy to install, but SL6.1 was just as easy, gave me a few more goodies than CentOS, and spares me the drama surrounding CentOS. So I get the exact same behavior that I want (RHEL or CentOS), the same easy installation (RHEL or CentOS), none of the CentOS drama, and a price I can afford. Seems like a good deal, and I will definitely be switching all my production boxes at home over to SL, though I’ll still use RHEL on production systems outside home. (BTW – Red Hat is doing some great things around the HPC community, file systems, and storage, so they deserve our support in my opinion.)