MRGHIT

Source: [Fortran90 (20Mar20)] [Python (07Jan25)]
07JAN25

Introduction

MRGHIT is a program, available in both Fortran90 and Python, to merge two or more HITRAN files (160-character .par files) into a single file, for example:
  1. Incorporating the molecule-specific HITEMP files into HITRAN 2016
  2. Merging HITRAN and GEISA data, assuming GEISA has been converted to HITRAN format, e.g. using GEIHIT

The tricky part is identifying and avoiding duplicating lines which are common to both datasets but not necessarily at identical wavenumbers.

This program assumes that all data from the second-named file will be used, and any duplicate lines in the first-named file removed, without any distinction between molecules. So, for example, you can't use it to replace just the ozone data in HITRAN 2012 with the ozone data from HITRAN 2016. If you wanted to do this, you could first run SUBHIT to extract the ozone data from HITRAN 2016, then use MRGHIT to insert it into HITRAN 2012.

Installing MRGHIT

Download the source code: [mrghit.f90] or [mrghit.py]

If using the Fortran version, compile with any generic Fortran compiler, e.g.

gfortran mrghit.f90 -o mrghit

which will create the executable mrghit

Running MRGHIT

To run the program, type mrghit and respond to the prompts. The example below uses the Fortran version; the Python version is essentially the same.

A typical run might be as follows (user responses follow each prompt, <CR> indicates the Return/Enter key, and the bracketed numbers refer to the Notes below):

mrghit
R-MRGHIT: Running MRGHIT v2.01
Wavenumber range (cm-1) [<CR>=all]: 700 750            [1]
Input files to be merged, <CR>=finished                [2]
File# 1: HITRAN2016.par
File# 2: 01_700-800_HITEMP2010.par                     [3]
File# 3: 02_625-750_HITEMP2010.par
File# 4: <CR>
Output filename: newfile.par
I-MRGHIT: Record# 100000 Wavenumber= 702.022100
I-MRGHIT: Record# 200000 Wavenumber= 704.060800
I-MRGHIT: Record# 300000 Wavenumber= 706.109100
I-MRGHIT: Record# 400000 Wavenumber= 708.133200
...
I-MRGHIT: Record# 2200000 Wavenumber= 744.321500
I-MRGHIT: Record# 2300000 Wavenumber= 746.355500
I-MRGHIT: Record# 2400000 Wavenumber= 748.388600
I-MRGHIT Summary:                                      [4]
No.lines:   included   excluded
File# 1       131890      12512
File# 2      1936086          0
File# 3       401338          0
Total:       2469711      12512
STOP R-MRGHIT: Successful completion

The program takes a few minutes to complete.

Notes

  1. The program first asks for the limits of the output spectral range. Typing just <CR> produces a file spanning the full range contained in the input files.

  2. The input filenames are provided in order of increasing priority in the case of duplicate lines (i.e. all lines in last-named file will be used). The list is terminated with a <CR> response.

  3. If an input file contains no overlap with the required spectral range, a warning message is printed at this point and the program continues.

  4. The program ends with a summary giving, for each input file, the number of records (i.e. spectral lines) included in the output and the number rejected as duplicates. The final line gives the totals: the first number is the number of records in the merged file, the second is the number of records excluded.

  5. The program also generates a file excluded.par which contains all the excluded records (filename is fixed, set by the FILEXC parameter in mrghit.f90).
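
The priority rule in Notes 2 and 4 can be illustrated with a short Python sketch. This is not the code in mrghit.py: the function names merge_by_priority and toy_is_dup are hypothetical, the records are toy (wavenumber, molecule, quanta) tuples rather than real .par records, the comparison is brute force, and sorting the output by wavenumber is an assumption based on the progress messages in the example run.

```python
# Illustrative sketch only: records are toy (wavenumber, molecule, quanta)
# tuples, not real 160-character HITRAN records.

def toy_is_dup(a, b, tol=0.01):
    """Hypothetical duplicate test: same molecule and quanta, close wavenumbers."""
    return a[1:] == b[1:] and abs(a[0] - b[0]) < tol

def merge_by_priority(files, is_dup=toy_is_dup):
    """Merge record lists given in order of increasing priority.

    Returns (merged, excluded, counts), where counts[i] is the
    (included, excluded) pair for the i-th input file.
    """
    merged, excluded, counts = [], [], []
    for i, records in enumerate(files):
        # Every record in a later-named file outranks this file's records.
        higher = [rec for later in files[i + 1:] for rec in later]
        n_in = n_out = 0
        for rec in records:
            if any(is_dup(rec, h) for h in higher):
                excluded.append(rec)   # would go to excluded.par
                n_out += 1
            else:
                merged.append(rec)
                n_in += 1
        counts.append((n_in, n_out))
    merged.sort(key=lambda rec: rec[0])  # assumed: output in wavenumber order
    return merged, excluded, counts

# Made-up data: file1 has the lowest priority, file3 the highest.
file1 = [(700.001, "02", "a"), (700.500, "01", "b"), (701.000, "03", "c")]
file2 = [(700.502, "01", "b"), (700.900, "01", "d")]
file3 = [(700.003, "02", "a")]
merged, excluded, counts = merge_by_priority([file1, file2, file3])
for n, (n_in, n_out) in enumerate(counts, start=1):
    print(f"File# {n} {n_in:8d} {n_out:8d}")
```

In this toy case two records of the first file are displaced by matching records in the later files, mirroring the included/excluded columns of the summary.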

Duplicate Lines

Within MRGHIT the definition of a duplicate line is that all of the following criteria are met:
  1. (obviously) Molecule and Isotope ID have to be identical (characters 1:3 in the HITRAN record)
  2. Wavenumber separation (read from characters 4:15) has to be less than 0.01 cm-1, set by parameter WNORNG under GLOBAL CONSTANTS near the top of mrghit.f90. If you increase this it slows down the program since it increases the number of records which have to be compared.
  3. Vibrational and Rotational quanta (characters 68:127) have to be identical. This relies on the suppliers of the individual datasets being exactly consistent in their notation, including spaces. Alter the LOGICAL FUNCTION LMATCH, included in the mrghit.f90 source, if you want to amend these criteria.
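
The three criteria above can be written as a small Python sketch. This is not the LMATCH function from mrghit.f90: the names is_duplicate and make_record are hypothetical, and make_record only builds dummy records for illustration. The character positions are those quoted in the list, converted to Python's 0-based slices.

```python
# Hypothetical sketch of the duplicate test, not the LMATCH source.
WNORNG = 0.01  # cm-1, maximum wavenumber separation for a duplicate

def is_duplicate(rec_a: str, rec_b: str, tol: float = WNORNG) -> bool:
    """Apply the three criteria to two 160-character HITRAN records."""
    # Characters 1:3 (1-based) hold the molecule and isotope ID.
    if rec_a[0:3] != rec_b[0:3]:
        return False
    # Characters 4:15 hold the wavenumber in cm-1.
    if abs(float(rec_a[3:15]) - float(rec_b[3:15])) >= tol:
        return False
    # Characters 68:127 hold the vibrational and rotational quanta,
    # compared as raw text, so a difference in spacing breaks the match.
    return rec_a[67:127] == rec_b[67:127]

def make_record(mol_iso: str, wavenumber: float, quanta: str) -> str:
    """Build a dummy 160-character record (illustration only)."""
    head = f"{mol_iso:>3}{wavenumber:12.6f}"       # characters 1:15
    return (head.ljust(67) + quanta.ljust(60)).ljust(160)

# Made-up records: only the fields used by the test are filled in.
a = make_record("231", 700.001000, "0 0 0 R 1")
b = make_record("231", 700.003000, "0 0 0 R 1")
c = make_record("231", 700.003000, "0 0 0  R 1")   # extra space in quanta
print(is_duplicate(a, b), is_duplicate(a, c))
```

The third record shows why the match depends on suppliers using exactly the same notation: a single extra space in the quanta field is enough for the lines to be treated as distinct.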

Code Switches

Under 'GLOBAL CONSTANTS' near the top of mrghit.f90 are three logical switches which the user can alter to change the behaviour of the code.
DUPCHK
  .TRUE. (as provided): the program will check for, and exclude, duplicate lines.
  .FALSE.: no checks, all input databases are merged in their entirety. This will also override the other two flags.
DUPSLF (see also Internal Duplicate Checks)
  .FALSE. (as provided): no checks for duplicates within each input dataset (assuming that such screening has already been carried out by the compiler of the database).
  .TRUE.: check for duplicates in all input files. If two lines within a database are found to match then the code (arbitrarily) selects the first (lower wavenumber) line and rejects the second.
DUPFIL
  .TRUE. (as provided): create an additional file containing all the rejected duplicate lines (name excluded.par, filename set by parameter FILEXC in mrghit.f90).

Internal Duplicate Checks

MRGHIT can also be used with a single input database file as a means of filtering out duplicate lines. This requires changing the DUPSLF switch to .TRUE. and recompiling.

You may be wondering (a) why this is set to .FALSE. in the supplied code, and/or (b) what would happen if you ran this test with the HITRAN 2012 database.

The answer to (b) is that it identifies and rejects 43 lines as duplicates, the first of which is for molecule 23 (HCN) where it finds a pair of lines with identical vib/rot levels at 2.971603 and 2.971652 cm-1 (although different line strengths). Now, I have no idea whether this is an error or there really are two such distinct lines, but I'm giving the benefit of the doubt to the HITRAN compilers, which answers (a).
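
As a rough illustration of what the internal check does with such a pair, here is a minimal Python sketch. It is not taken from mrghit.py: the function name drop_internal_duplicates is hypothetical, records are simplified to (wavenumber, molecule, quanta) tuples, and the quanta strings are placeholders. The two HCN wavenumbers are the ones quoted above; the third record is made up.

```python
# Hypothetical sketch of an internal duplicate filter (DUPSLF-style).
def drop_internal_duplicates(records, tol=0.01):
    """Keep the lower-wavenumber line of each matching pair.

    records: iterable of (wavenumber, molecule, quanta) tuples.
    Returns (kept, rejected).
    """
    kept, rejected = [], []
    for rec in sorted(records, key=lambda r: r[0]):
        is_dup = False
        # Scan back through kept records that lie within tol cm-1.
        for prev in reversed(kept):
            if rec[0] - prev[0] >= tol:
                break                      # earlier records are even further away
            if prev[1:] == rec[1:]:        # same molecule and same quanta
                is_dup = True
                break
        (rejected if is_dup else kept).append(rec)
    return kept, rejected

lines = [
    (2.971603, "23", "levels-A"),   # HCN, wavenumber from the text
    (2.971652, "23", "levels-A"),   # HCN, identical levels, 0.000049 cm-1 away
    (2.971700, "23", "levels-B"),   # made-up line with different levels
]
kept, rejected = drop_internal_duplicates(lines)
print(len(kept), "kept,", len(rejected), "rejected")
```

Because the records are scanned in wavenumber order, only the short window of preceding lines within the tolerance needs checking, which is also why a larger tolerance means more comparisons.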

Version History

07JAN25
Add Python version
20MAR20
Use ADVANCE='NO' and TYPE structures.
08APR18
Recoded as F90. Also removes limitations on internal array sizes.
22AUG13
Original (F77) code.