NOAA

Geophysical Fluid Dynamics Laboratory

The data values in a CM2.x file I have downloaded do not seem to be correct. How can I determine if the file was corrupted during the download process?

Users with access to a machine running Linux or a similar operating system can perform a simple check to determine whether a GFDL CM2.x file was corrupted during the download process. Because there have been isolated reports of such corruption occurring during transfers, we provide, for comparison, a checksum value for each CM2.x file available for download on the GFDL Data Portal.

Each CM2.x directory on the GFDL Data Portal contains a plain text file named checksum_report, which holds checksum information we generated with the md5sum program. The md5sum utility is part of the standard Linux distribution; the commands man md5sum and info md5sum describe it.

Note: To generate a checksum value consistent with those we provide, you should use the MD5 algorithm (Message-Digest algorithm 5; for more information, see http://en.wikipedia.org/wiki/MD5 and http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html). Although this FAQ focuses on the Linux version of md5sum, which is readily available to many users, the MD5 algorithm has been widely ported, so the same checksum can be obtained regardless of operating system and platform.
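As an illustration of that portability, the following Python sketch computes the same 32-character hexadecimal MD5 digest that md5sum prints, using only the standard-library hashlib module. The function name md5_of_file and the 1 MiB chunk size are our own illustrative choices, not part of any GFDL tool; reading in chunks simply avoids loading a large netCDF file into memory all at once.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 digest of a file as 32 lowercase hex characters,
    the same form md5sum prints in its first output column."""
    digest = hashlib.md5()
    with open(path, "rb") as f:  # binary mode: hash the exact bytes on disk
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        # Mimic md5sum's "<checksum>  <filename>" output layout.
        print(f"{md5_of_file(name)}  {name}")
```

Saved under a hypothetical name such as md5_check.py and run as python md5_check.py clivi_A1.186101-200012.nc, it should print the same checksum that md5sum reports for that file.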

Consider the following example, which determines whether a downloaded copy of the file clivi_A1.186101-200012.nc from experiment CM2.1U-D4_1860-2000-AllForc_H2 matches the version on the GFDL Data Portal. The relevant checksum file is nomads.gfdl.noaa.gov/dods-data/gfdl_cm2_1/CM2.1U-D4_1860-2000-AllForc_H2/pp/atmos/ts/monthly/checksum_report. (When downloading CM2.x netCDF files from the GFDL Data Portal, you may want to routinely download the relatively small checksum_report files as well, to make checking for corruption easier.)

The checksum_report files contain four columns of information. When this FAQ was written, the first two columns of the first line of the checksum_report file referenced above were

2005_03_03__17_23 gfdl_cm2_1/CM2.1U-D4_1860-2000-AllForc_H2/pp/atmos/ts/monthly

The first column gives the time at which the checksum was computed, and the second gives the directory location. The third and fourth columns give the netCDF file name and the checksum value that md5sum generated for that file; in this example they are

clivi_A1.186101-200012.nc    6de977d9ec1cda2b8f990cb87a90324a

From these columns, one can determine that on the afternoon of 3 March 2005 the checksum calculated for the file clivi_A1.186101-200012.nc in that GFDL Data Portal directory was 6de977d9ec1cda2b8f990cb87a90324a. That checksum value was computed at GFDL using the command

md5sum clivi_A1.186101-200012.nc
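To read a checksum_report entry programmatically, a line can be split into the four columns described in this FAQ. The Python sketch below assumes that the columns are separated by whitespace, that no column contains embedded spaces, and that the timestamp always follows the YYYY_MM_DD__HH_MM pattern seen in the example (2005_03_03__17_23); these are inferences from the single example shown, not a documented format. The names ChecksumEntry and parse_report_line are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class ChecksumEntry(NamedTuple):
    computed_at: datetime  # column 1: when the checksum was computed
    directory: str         # column 2: directory on the GFDL Data Portal
    filename: str          # column 3: netCDF file name
    md5: str               # column 4: md5sum hex digest


# Assumed from the example timestamp 2005_03_03__17_23 (YYYY_MM_DD__HH_MM).
# The report does not state a time zone, so the parsed datetime is naive.
TIMESTAMP_FORMAT = "%Y_%m_%d__%H_%M"


def parse_report_line(line):
    """Split one checksum_report line into its four whitespace-separated
    columns (assumes no column contains embedded spaces)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    stamp, directory, filename, md5 = fields
    return ChecksumEntry(
        computed_at=datetime.strptime(stamp, TIMESTAMP_FORMAT),
        directory=directory,
        filename=filename,
        md5=md5.lower(),
    )
```

Applied to the example line, computed_at comes out as 17:23 on 3 March 2005, which is the "afternoon" reading given in the text.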

If you run the same command on your Linux machine for your downloaded copy of clivi_A1.186101-200012.nc, it should return the identical checksum value. If the two checksum values (the one we computed and stored in the checksum_report file, and the one you generate with md5sum for the downloaded file) are identical, you have confirmed that no corruption occurred during the download process.
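The comparison can also be automated. The following Python sketch looks up the expected checksum for a downloaded file in a locally saved copy of checksum_report and compares it with the MD5 digest of the file. It assumes the four whitespace-separated columns described in this FAQ, matches on the file name in the third column, and returns the first matching entry; the function names md5_hex, expected_md5, and verify are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """MD5 of a file as lowercase hex, the same value md5sum prints."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_md5(report_path, filename):
    """Return the checksum recorded for `filename` in checksum_report, or
    None if it is not listed. Assumes four whitespace-separated columns:
    timestamp, directory, file name, checksum."""
    with open(report_path, encoding="ascii", errors="replace") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 4 and fields[2] == filename:
                return fields[3].lower()
    return None


def verify(data_path, report_path):
    """True if the local file's MD5 matches the value in checksum_report."""
    name = Path(data_path).name
    expected = expected_md5(report_path, name)
    if expected is None:
        raise KeyError(f"{name} is not listed in {report_path}")
    return md5_hex(data_path) == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_download.py <netCDF file> <checksum_report>
    print("OK" if verify(sys.argv[1], sys.argv[2]) else "MISMATCH")
```

A MISMATCH result corresponds to the case where the two checksums differ, and does not by itself indicate which cause applies.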

If the checksums differ, the cause is likely one of the following:

  1. A newer version of the netCDF file has been placed on the GFDL Data Portal since you downloaded an older version. We may replace a file on our server to fix a problem found in an earlier version or simply to offer a file with enhanced metadata. If a newer version of the file is available, we suggest you download it.
  2. The copy of the netCDF file you downloaded was corrupted during the download process or after it was stored on your machine. In that case, you should download the netCDF file again.
  3. There is a remote chance that the GFDL checksum_report is out of date: it may contain an obsolete checksum value for a file we have replaced, while you have in fact downloaded the correct, most recent version of the netCDF file. To rule out this possibility, confirm that the modification date of the checksum_report file on the GFDL Data Portal is not earlier than the modification date of the netCDF file of interest in the same directory. If the checksum_report is out of date, please notify us at the GFDL.Climate.Model.Info email address, and we will update the checksum_report file so that checksums for that directory can be compared properly.
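The modification-date check in item 3 can be sketched in Python by asking the web server for each file's Last-Modified header with an HTTP HEAD request. This assumes that the portal serves the files over HTTP, supports HEAD requests, and returns a Last-Modified header; the http:// scheme and the function names remote_last_modified and report_may_be_stale are our assumptions, and the portal's current address may differ from the one given in this FAQ.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Convert an HTTP date such as 'Thu, 03 Mar 2005 17:23:00 GMT'
    into a timezone-aware datetime."""
    return parsedate_to_datetime(value)


def remote_last_modified(url, timeout=30):
    """Return the Last-Modified time reported by the server for `url`,
    or None if the header is absent. Assumes the server supports HEAD."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        value = resp.headers.get("Last-Modified")
    return parse_http_date(value) if value else None


def report_may_be_stale(report_mtime, data_mtime):
    """True if checksum_report was last modified before the netCDF file,
    the case in which the report might hold an obsolete checksum."""
    return report_mtime < data_mtime
```

In practice one would call remote_last_modified once for the directory's checksum_report URL and once for the netCDF file's URL in the same directory, then pass both results to report_may_be_stale; a result of True is the situation in which GFDL asks to be notified.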

Questions related to the GFDL CM2.x models may be directed to GFDL.Climate.Model.Info at noaa dot gov.