HAR file I/O

equilibria.babel.har provides a pure-Python reader and writer for GEMPACK HAR and .prm files. It needs no GEMPACK installation and no harpy3 runtime dependency; the entire round trip stays in Python.

The module supports the four header types that the GTAP and PEP fixtures use:

Token	Meaning	Notes
`1CFULL`	1-D character set	Multi-record element lists handled
`REFULL`	Real dense N-D array	Repeated set names (e.g. `REG × REG`) OK
`RESPSE`	Real sparse N-D array	1-based Fortran flat indices
`2IFULL`	2-D integer dense array	Hard `TypeError` for non-`int32` dtype
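
To make the "1-based Fortran flat indices" note for `RESPSE` concrete, the sketch below decodes such an index into a 0-based index tuple. It assumes column-major ordering (the first index varies fastest), which is what Fortran storage implies. The helper names `fortran_unravel` and `sparse_to_cells` are hypothetical and are not part of equilibria.babel.har.

```python
def fortran_unravel(flat_1based, shape):
    """Convert a 1-based column-major flat index into a 0-based index tuple."""
    total = 1
    for dim in shape:
        total *= dim
    if not 1 <= flat_1based <= total:
        raise IndexError(f"flat index {flat_1based} outside 1..{total}")

    remainder = flat_1based - 1  # shift to 0-based
    index = []
    for dim in shape:
        # In column-major order the first axis varies fastest,
        # so peel the axes off from left to right.
        remainder, position = divmod(remainder, dim)
        index.append(position)
    return tuple(index)


def sparse_to_cells(shape, entries):
    """Map (flat_1based, value) pairs to {index_tuple: value}."""
    return {fortran_unravel(flat, shape): value for flat, value in entries}


# Two stored values in a 2 x 3 array (illustrative data only).
cells = sparse_to_cells((2, 3), [(2, 5.0), (6, 7.5)])
```

Here flat index 2 lands on row 1, column 0, and flat index 6 on row 1, column 2; every cell not listed would be zero in the dense array.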

Reading a HAR

from equilibria.babel.har import read_har

data = read_har("basedata.har")          # dict[name -> HeaderArray]
vdpp = data["VDPP"]
print(vdpp.long_name, vdpp.array.shape, vdpp.set_names)
print(vdpp.set_elements)                 # [[...COMM...], [...REG...]]

Every value is a HeaderArray carrying:

array — the dense NumPy array;
long_name — the GEMPACK long name (≤ 70 chars);
set_names, set_elements — the set descriptor;
coeff_name — the GEMPACK coefficient name (often equal to the header name).

Round-trip mutation (alter-tax tariff shock)

A common workflow in GTAP work is to load a baseline HAR, mutate one or more headers, and write the result back for the solver to consume.

from equilibria.babel.har import read_har, write_har

data = read_har("baserate.har")

# 10% uniform bilateral import-tariff shock
data["rTMS"].array[...] *= 1.10

write_har("baserate_shocked.har", data)

write_har writes atomically: it writes to a temporary file in the target directory and then moves that file into place with os.replace. If emission fails at any point, the destination is left untouched.
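
The temporary-file-plus-os.replace pattern can be sketched with the standard library alone. This is a generic illustration of the technique, not the code inside write_har: the function name `atomic_write_bytes` is hypothetical, and whether write_har also calls `os.fsync` is an assumption this document does not confirm.

```python
import os
import tempfile


def atomic_write_bytes(path, payload):
    """Write payload to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so that os.replace
    # stays on one filesystem and can swap the file in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # assumption: push data to disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temp file and leave the destination as it was.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the destination is only touched by the final `os.replace`, a failure while writing leaves any existing file at `path` intact.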

Building a HAR from scratch

For new datasets, use the HarWriter builder. It registers sets up front, detects conflicts (re-adding a set with different elements raises), and verifies that every array reference points to a known set.

import numpy as np
import pandas as pd
from equilibria.babel.har import HarWriter

with HarWriter("synthetic.har") as w:
    # Sets first
    w.add_set("REG",  ["USA", "ROW"])
    w.add_set("COMM", ["AGR", "MFG"])

    # numpy array
    w.add_array(
        "VDPP",
        np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32),
        set_names=["COMM", "REG"],
        long_name="domestic private purchases",
    )

    # pandas DataFrame — set names taken from index / column names
    df = pd.DataFrame(
        [[100.0, 200.0], [300.0, 400.0]],
        index=pd.Index(["USA", "ROW"], name="REG"),
        columns=pd.Index(["AGR", "MFG"], name="COMM"),
    )
    w.add_dataframe("REGY", df, long_name="regional output")

The context manager flushes to disk on __exit__; if you prefer to manage the writer's lifetime explicitly, call w.close() yourself.
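
The set bookkeeping that HarWriter is described as doing (conflict detection and known-set checks) can be pictured with a small stand-in class. Everything here is hypothetical: `SetRegistry`, `check_array`, the exception types, the acceptance of an identical re-registration, and the axis-length check are assumptions made for illustration, not the HarWriter implementation.

```python
class SetRegistry:
    """Hypothetical stand-in for the set checks described above."""

    def __init__(self):
        self._sets = {}

    def add_set(self, name, elements):
        elements = tuple(elements)
        existing = self._sets.get(name)
        # Re-adding the same elements is treated as harmless (assumption);
        # re-adding different elements is a conflict.
        if existing is not None and existing != elements:
            raise ValueError(f"set {name!r} already registered with different elements")
        self._sets[name] = elements

    def check_array(self, header, shape, set_names):
        if len(shape) != len(set_names):
            raise ValueError(f"header {header!r}: {len(shape)} axes but {len(set_names)} set names")
        for axis, (size, name) in enumerate(zip(shape, set_names)):
            if name not in self._sets:
                raise KeyError(f"header {header!r} references unknown set {name!r}")
            # Assumption: each axis length must equal the size of its set.
            if size != len(self._sets[name]):
                raise ValueError(f"header {header!r}: axis {axis} has length {size}, "
                                 f"set {name!r} has {len(self._sets[name])} elements")


registry = SetRegistry()
registry.add_set("REG", ["USA", "ROW"])
registry.add_set("COMM", ["AGR", "MFG"])
registry.check_array("VDPP", (2, 2), ["COMM", "REG"])  # passes silently
```

Registering sets before arrays, as in the HarWriter example, is what lets a check like this run at the moment each array is added rather than at flush time.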

Validation

The writer is exercised by four layers of tests:

L3 — Semantic round-trip. For every fixture, read_har → write_har → read_har returns identical HeaderArray objects (same array values, same shape, same sets). Exact on all six GTAP fixtures shipped with equilibria.
L5 — Oracle goldens. A sandboxed dev workflow runs harpy3 over every fixture and dumps per-header statistics to JSON (tests/babel/har/golden/). CI re-reads those JSONs (without installing harpy3) and asserts the writer output matches.
L7 — Integration. End-to-end tests cover (a) loading writer output with GTAPBenchmarkValues, (b) an alter-tax rTMS × 1.10 round trip, (c) a pandas-DataFrame export, and (d) reading the writer output back with harpy3 when it happens to be installed locally.
L4 — Byte-exact (documented xfails). Compares the SHA-256 of the writer output against that of the GEMPACK-produced source file. See the next section.
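
A byte-exact comparison of the kind the last layer describes can be sketched with hashlib. The helper names `sha256_of_file` and `is_byte_exact` are hypothetical; the actual test code in tests/babel/har/test_byte_exact.py may be organised differently.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_exact(original_path, rewritten_path):
    """True only if the two files contain identical bytes."""
    return sha256_of_file(original_path) == sha256_of_file(rewritten_path)
```

A digest comparison only answers yes or no; locating where two files diverge needs a byte-level diff.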

Byte-exact vs semantic equality

Writing the result of read_har(p) back out with write_har does not reproduce p byte-for-byte for the six GTAP reference fixtures. The divergence is structural, not semantic: two pieces of GEMPACK metadata are discarded on read and cannot be reconstructed.

1CFULL per-header element width. GEMPACK pads set elements to a per-header width (12 chars for short codes, 44+ for version strings). read_har strips trailing spaces, so the original width is gone by the time the writer sees the data.
Sparse vs dense storage. GEMPACK picks RESPSE vs REFULL via internal heuristics; read_har densifies on load, so the sparse-vs-dense signal isn’t available.
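
The element-width point can be shown in a few lines. The sketch below pads the same two elements to the 12- and 44-character widths mentioned above and then strips them again. It is a simplified model with hypothetical helper names; it ignores the record framing of a real HAR file.

```python
def pad_elements(elements, width):
    """Encode set elements as fixed-width, space-padded ASCII fields."""
    return b"".join(e.ljust(width).encode("ascii") for e in elements)


def strip_elements(raw, width):
    """Decode fixed-width fields and drop trailing spaces."""
    text = raw.decode("ascii")
    return [text[i:i + width].rstrip() for i in range(0, len(text), width)]


narrow = pad_elements(["USA", "ROW"], 12)
wide = pad_elements(["USA", "ROW"], 44)

# Both encodings decode to the same element list...
same_elements = strip_elements(narrow, 12) == strip_elements(wide, 44)
# ...but the bytes on disk differ, and the stripped list no longer
# records which width produced it.
same_bytes = narrow == wide
```

After stripping, nothing in the element list distinguishes the two encodings, so a writer has to pick a width of its own.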

These divergences are documented as xfails in tests/babel/har/test_byte_exact.py with sibling .diff sidecars listing the exact offsets. They have no functional impact — any conforming HAR reader (ours, harpy3, RunGTAP, ViewHAR) decodes the writer output identically to the original.

Clean-room provenance

Both the reader and the writer are clean-room reimplementations. They were developed by inspecting the on-disk byte layout of HAR files produced by GEMPACK / RunGTAP and by reading public format documentation. No code was viewed, copied, translated, or otherwise derived from harpy3 / harpy (GPLv3) or from GEMPACK source.

harpy3 is used only as a black-box oracle in a sandboxed dev harness (scripts/har/oracle_check.py). It is never imported by any equilibria source and never runs in CI.

See the top-level NOTICE file for the full clean-room statement.