MRGHIT

Latest: MRGHIT v1.00 (22AUG13)


Introduction

mrghit.f is a FORTRAN77 program that merges two or more HITRAN line data files into a single file, ordered by increasing wavenumber (e.g. to incorporate the molecule-specific HITEMP files into HITRAN 2012).

The tricky part is identifying and avoiding duplicating lines which are common to both datasets but not necessarily at identical wavenumbers.

This program assumes that all data from the second-named file will be used and that any duplicate lines in the first-named file will be removed, without any distinction between molecules (with more than two files, later-named files take priority over earlier ones). So, for example, you can't use it to replace just the ozone data in HITRAN 2008 with the ozone from HITRAN 2012. If you wanted to do this you could first run subhit to extract the ozone data from HITRAN 2012, then use mrghit to insert it into HITRAN 2008.

Note that this program won't work on the old (pre-2004) HITRAN data.

Installing mrghit

First download the source code: [mrghit.f]

Then compile with any FORTRAN77 compiler, e.g.

f77 mrghit.f -o mrghit

Running mrghit

To run the program, simply type mrghit and respond to the prompts.

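A typical run might look like this (the user types the mrghit command and the responses to the wavenumber range, File# and Output filename prompts):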

mrghit
R-MRGHIT: Running MRGHIT v1.00
Wavenumber range (cm-1) [<CR>=all]: 700 750
Input list of up to 10 files to be merged, ending with <CR>
File# 1: HITRAN2012.par
File# 2: 01_700-800_HITEMP2010.par
File# 3: 02_625-750_HITEMP2010.par
File# 4: <CR>
Output filename: newfile.par
I-MRGHIT: Record# 100000 Wavenumber= 702.031700
I-MRGHIT: Record# 200000 Wavenumber= 704.078800
I-MRGHIT: Record# 300000 Wavenumber= 706.133800
I-MRGHIT: Record# 400000 Wavenumber= 708.166400
...
I-MRGHIT: Record# 2200000 Wavenumber= 744.471000
I-MRGHIT: Record# 2300000 Wavenumber= 746.516100
I-MRGHIT: Record# 2400000 Wavenumber= 748.555000
I-MRGHIT Summary:
No.lines:  included  excluded
File# 1      133377      2912
File# 2     1936083         0
File# 3      401338         0
Total:      2470798      2912
R-MRGHIT: Successful completion

The program takes a few minutes to complete.

Notes

  1. The first line tells you which version (1.00) of the program is being executed.
  2. Then follows the dialogue which asks for
    1. Limits for the spectral range. The user either just types <CR> to obtain the full range contained in the input files, or enters the required lower, upper wavenumbers (700-750 in this case).
    2. The input files, in order of increasing priority in the case of duplicate lines (i.e. all lines in the last-named file will be used). The list is terminated with a <CR> response.
      • If more than 10 files are required, change the MAXFIL parameter in mrghit.f and recompile.
      • There may be a pause after entering each filename as the program reads sequentially through the data to locate the first record within the required output spectral range.
      • If a file has no overlap with the required spectral range, a warning message is printed at this point and the program continues.
    3. The name of the output file, here newfile.par.
  3. The program then prints a progress message each time a further 100 000 records of input data have been read.
  4. The program ends with a summary giving, for each input file, the number of records (i.e. spectral lines) included in the output and the number rejected as duplicates. The final line gives the totals: the first number is the number of records in the merged file, the second is the number of records excluded.
  5. The program also generates a file excluded.par which contains all the excluded records (filename is fixed, set by the FILEXC parameter in mrghit.f).
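The merging step itself (leaving duplicate checks aside) can be illustrated with a short Python sketch. This is not the FORTRAN77 code in mrghit.f, just an illustration of the idea: each input is assumed to be already ordered by increasing wavenumber, the wavenumber is read from characters 4:15 of each record, and the requested range is treated as inclusive at both ends (an assumption). The names wavenumber and merge_by_wavenumber are hypothetical.

```python
import heapq


def wavenumber(record):
    # Characters 4:15 (1-based, inclusive) hold the wavenumber,
    # which is the Python slice [3:15].
    return float(record[3:15])


def merge_by_wavenumber(files, wn_min=None, wn_max=None):
    """Merge lists of records, each already sorted by wavenumber.

    wn_min/wn_max of None means "no limit", like answering <CR>
    to the wavenumber range prompt. Inclusive limits are an assumption.
    """
    def in_range(records):
        for rec in records:
            wn = wavenumber(rec)
            if wn_min is not None and wn < wn_min:
                continue  # still below the requested range
            if wn_max is not None and wn > wn_max:
                break     # input is sorted, so nothing later can qualify
            yield rec

    # heapq.merge interleaves the sorted inputs without re-sorting them.
    return list(heapq.merge(*(in_range(f) for f in files), key=wavenumber))
```

Reading sequentially until the first in-range record is found is also why there can be a pause after each filename is entered.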

Duplicate Lines

Within mrghit, two lines are treated as duplicates only if all of the following criteria are met:
  1. (obviously) Molecule and Isotope ID have to be identical (characters 1:3 in the HITRAN record)
  2. Wavenumber separation (read from characters 4:15) has to be less than 0.0001 cm-1, set by parameter WNORNG in mrghit.f. If you increase this significantly you'll probably get a warning when you run the code similar to:
    W-MRGHIT: MAXBUF too small to check for duplicates over wavenumber range .000100[cm-1]
    in which case you have to also increase MAXBUF (this slows down the program since it increases the number of records which have to be compared).
  3. Vibrational and Rotational quanta (characters 68:127) have to be identical. This relies on the suppliers of the individual datasets being exactly consistent in their notation, including spaces. Alter the LOGICAL FUNCTION LMATCH, included in the mrghit.f source, if you want to amend these criteria.
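The three criteria above can be written down compactly. The following Python sketch is only an illustration of the test as described here, not a transcription of LMATCH from mrghit.f; the function name is_duplicate is hypothetical, and the character positions and the 0.0001 cm-1 tolerance are taken from the list above.

```python
WNORNG = 0.0001  # cm-1, same value as the WNORNG parameter described above


def is_duplicate(rec_a, rec_b, tolerance=WNORNG):
    """Return True if two HITRAN records meet all three duplicate criteria."""
    # 1. Molecule and isotope ID: characters 1:3 -> slice [0:3]
    same_species = rec_a[0:3] == rec_b[0:3]
    # 2. Wavenumber separation: characters 4:15 -> slice [3:15]
    separation = abs(float(rec_a[3:15]) - float(rec_b[3:15]))
    close_enough = separation < tolerance
    # 3. Vibrational and rotational quanta: characters 68:127 -> slice [67:127]
    #    Plain string comparison, so a single differing space means "no match".
    same_quanta = rec_a[67:127] == rec_b[67:127]
    return same_species and close_enough and same_quanta
```

Because the quanta are compared as raw text, two records describing the same transition with different spacing would not be treated as duplicates, which is the consistency caveat mentioned in criterion 3.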

Code Switches

Near the start of the 'EXECUTABLE CODE' in mrghit.f are three logical switches which the user can alter to change the performance of the code.
DUPCHK
.TRUE. (in provided code): the program will check for, and exclude, duplicate lines
.FALSE. : no checks, all input databases are merged in their entirety. This will also override the other two flags.
DUPSLF (see also Internal Duplicate Checks)
.FALSE. (in provided code): no checks for duplicates within each input dataset (assuming that such screening has already been carried out by the compiler of the database).
.TRUE. : check for duplicates within all input files. If two lines within a database are found to match then the code (arbitrarily) selects the first (lower wavenumber) line and rejects the second.
DUPFIL
.TRUE. (in provided code): create an additional file containing all the rejected duplicate lines (name excluded.par, filename set by parameter FILEXC in mrghit.f).

Internal Duplicate Checks

mrghit can also be used with a single input database file as a means of filtering out duplicate lines. This requires changing the DUPSLF switch to .TRUE. and recompiling.
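As a rough illustration of what such a single-file filter does, here is a Python sketch (again, not the FORTRAN77 source). It assumes the records are ordered by increasing wavenumber and uses the duplicate criteria described under Duplicate Lines; the function name filter_internal_duplicates is hypothetical, and the sliding window of recently kept records is only my guess at the role played by the MAXBUF buffer.

```python
from collections import deque


def filter_internal_duplicates(records, tolerance=0.0001):
    """Split sorted records into (kept, excluded), keeping the first of each match."""
    kept, excluded = [], []
    window = deque()  # kept records still within `tolerance` of the current one
    for rec in records:
        wn = float(rec[3:15])
        # Drop buffered records that are now too far below in wavenumber.
        while window and wn - float(window[0][3:15]) >= tolerance:
            window.popleft()
        # Same molecule/isotope ID and identical quanta as a buffered record?
        if any(rec[0:3] == k[0:3] and rec[67:127] == k[67:127] for k in window):
            excluded.append(rec)  # reject the second (higher wavenumber) line
        else:
            kept.append(rec)
            window.append(rec)
    return kept, excluded
```

A wider tolerance means more records stay in the window for each comparison, which is why increasing WNORNG slows things down.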

You may be wondering (a) why this is set to .FALSE. in the supplied code, and/or (b) what would happen if you ran this test with the HITRAN 2012 database.

The answer to (b) is that it identifies and rejects 43 lines as duplicates, the first of which is for molecule 23 (HCN) where it finds a pair of lines with identical vib/rot levels at 2.971603 and 2.971652 cm-1 (although different line strengths). Now, I have no idea whether this is an error or there really are two such distinct lines, but I'm giving the benefit of the doubt to the HITRAN compilers, which answers (a).

Version History

v1.00 (22AUG13)
Original code.