HAR file I/O¶
equilibria.babel.har provides a pure-Python reader and writer for
GEMPACK HAR and .prm files. It needs no GEMPACK installation and no
harpy3 runtime dependency: the entire round trip stays in Python.
The module supports the four header types that the GTAP and PEP fixtures use:
| Token | Meaning | Notes |
|---|---|---|
| 1CFULL | 1-D character set | Multi-record element lists handled |
| REFULL | Real dense N-D array | Repeated set names (e.g. …) |
| RESPSE | Real sparse N-D array | 1-based Fortran flat indices |
| 2IFULL | 2-D integer dense array | Hard … |
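To make the "1-based Fortran flat indices" note concrete, here is a minimal sketch of how such an index maps to per-dimension positions. It assumes standard Fortran column-major order (the first dimension varies fastest). The function name fortran_flat_to_index is hypothetical and is not part of equilibria.babel.har.

```python
def fortran_flat_to_index(flat, shape):
    """Convert a 1-based column-major flat index to a 0-based index tuple.

    Hypothetical helper for illustration only; not the library's API.
    """
    size = 1
    for n in shape:
        size *= n
    if not 1 <= flat <= size:
        raise IndexError(f"flat index {flat} out of range 1..{size}")

    remaining = flat - 1  # switch to 0-based before decomposing
    index = []
    for n in shape:
        # The first dimension varies fastest, so peel it off first.
        remaining, pos = divmod(remaining, n)
        index.append(pos)
    return tuple(index)
```

For a 2 × 3 array, flat index 3 lands on row 0, column 1, because the two rows of column 0 are counted first.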
Reading a HAR¶
from equilibria.babel.har import read_har
data = read_har("basedata.har") # dict[name -> HeaderArray]
vdpp = data["VDPP"]
print(vdpp.long_name, vdpp.array.shape, vdpp.set_names)
print(vdpp.set_elements) # [[...COMM...], [...REG...]]
Every value is a HeaderArray carrying:
- array: the dense NumPy array;
- long_name: the GEMPACK long name (≤ 70 chars);
- set_names, set_elements: the set descriptor;
- coeff_name: the GEMPACK coefficient name (often equal to the header name).
Round-trip mutation (alter-tax tariff shock)¶
A common GTAP workflow is to load a baseline HAR, mutate one or more headers, and write the result back for the solver to consume.
from equilibria.babel.har import read_har, write_har
data = read_har("baserate.har")
# 10% uniform bilateral import-tariff shock
data["rTMS"].array[...] *= 1.10
write_har("baserate_shocked.har", data)
write_har writes atomically: it emits to a temporary file in the
target directory and then moves it into place with os.replace. If
emission fails at any point, the destination is left untouched.
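The temp-file-then-replace pattern can be sketched with the standard library alone. This is an illustration of the general technique, not the actual write_har implementation; the function name atomic_write_bytes and the ".tmp" suffix are assumptions.

```python
import os
import tempfile


def atomic_write_bytes(path, payload):
    """Write payload to path so readers never observe a partial file.

    Hypothetical helper illustrating the pattern; not the library's code.
    """
    # Create the temp file next to the destination so os.replace stays
    # on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Emission failed: remove the temp file; the destination is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Keeping the temporary file in the target directory matters: a rename across filesystems is not atomic and may fail outright.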
Building a HAR from scratch¶
For new datasets, use the HarWriter builder. It registers sets up
front, detects conflicts (re-adding a set with different elements
raises), and verifies that every array reference points to a known set.
import numpy as np
import pandas as pd
from equilibria.babel.har import HarWriter
with HarWriter("synthetic.har") as w:
    # Sets first
    w.add_set("REG", ["USA", "ROW"])
    w.add_set("COMM", ["AGR", "MFG"])
    # numpy array
    w.add_array(
        "VDPP",
        np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32),
        set_names=["COMM", "REG"],
        long_name="domestic private purchases",
    )
    # pandas DataFrame: set names taken from index / column names
    df = pd.DataFrame(
        [[100.0, 200.0], [300.0, 400.0]],
        index=pd.Index(["USA", "ROW"], name="REG"),
        columns=pd.Index(["AGR", "MFG"], name="COMM"),
    )
    w.add_dataframe("REGY", df, long_name="regional output")
The context manager flushes to disk on __exit__; if you prefer
explicit lifetime, call w.close() yourself.
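The set bookkeeping described for the builder (conflict detection and reference checks) can be pictured with a toy registry. This is a sketch under stated assumptions, not HarWriter's internals: the class name SetRegistry, the method check_array, the exception types, and the dimension-size check are all hypothetical.

```python
class SetRegistry:
    """Toy stand-in for the set bookkeeping a HAR builder needs."""

    def __init__(self):
        self._sets = {}

    def add_set(self, name, elements):
        elements = list(elements)
        known = self._sets.get(name)
        # Re-adding identical elements is harmless; different ones conflict.
        if known is not None and known != elements:
            raise ValueError(
                f"set {name!r} already registered with different elements"
            )
        self._sets[name] = elements

    def check_array(self, shape, set_names):
        if len(shape) != len(set_names):
            raise ValueError("one set name is required per array dimension")
        for size, name in zip(shape, set_names):
            if name not in self._sets:
                raise KeyError(f"unknown set {name!r}")
            # Extra check added by this sketch (an assumption): each
            # dimension must be as long as the set it refers to.
            if size != len(self._sets[name]):
                raise ValueError(
                    f"dimension of size {size} does not match set {name!r}"
                )
```

Registering sets before arrays is what makes these checks possible: by the time an array arrives, every name it references can be resolved immediately.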
Validation¶
The writer is exercised by four layers of tests:
- L3: Semantic round-trip. For every fixture, read_har → write_har → read_har returns identical HeaderArray objects (same array values, same shape, same sets). Exact on all six GTAP fixtures shipped with equilibria.
- L5: Oracle goldens. A sandboxed dev workflow runs harpy3 over every fixture and dumps per-header statistics to JSON (tests/babel/har/golden/). CI re-reads those JSON files (without installing harpy3) and asserts that the writer output matches.
- L7: Integration. End-to-end tests cover (a) loading writer output with GTAPBenchmarkValues, (b) an alter-tax rTMS × 1.10 round trip, (c) a pandas-DataFrame export, and (d) reading the writer output back with harpy3 when it happens to be installed locally.
- L4: Byte-exact (documented xfails). Compares the SHA-256 of the writer output with that of the GEMPACK source file. See the next section.
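The oracle-golden idea, comparing per-header statistics stored as JSON instead of importing the oracle in CI, might look roughly like the sketch below. The choice of statistics (shape, sum, min, max), the tolerance, and the function names are assumptions; the actual golden file schema is not described here.

```python
import json
import math


def summarize(values, shape):
    """Reduce one header to a few JSON-friendly statistics (assumed set)."""
    return {
        "shape": list(shape),
        "sum": math.fsum(values),
        "min": min(values),
        "max": max(values),
    }


def matches_golden(stats, golden, rel_tol=1e-6):
    """Compare freshly computed statistics against a stored golden entry."""
    if stats["shape"] != golden["shape"]:
        return False
    return all(
        math.isclose(stats[key], golden[key], rel_tol=rel_tol, abs_tol=1e-12)
        for key in ("sum", "min", "max")
    )


# Dev side: the oracle's view of a header is dumped once to JSON.
golden_json = json.dumps({"VDPP": summarize([1.0, 2.0, 3.0, 4.0], (2, 2))})

# CI side: only the JSON is needed, not the oracle itself.
golden = json.loads(golden_json)
```

The design point is that the JSON file decouples CI from the oracle: the comparison needs only the stored numbers.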
Byte-exact vs semantic equality¶
Reading a file p with read_har and writing the result back with
write_har does not reproduce p byte-for-byte for the six GTAP
reference fixtures. The divergence is structural, not semantic: two
pieces of GEMPACK metadata are discarded on read and cannot be
reconstructed.
- 1CFULL per-header element width. GEMPACK pads set elements to a per-header width (12 chars for short codes, 44+ for version strings). read_har strips trailing spaces, so the original width is gone by the time the writer sees the data.
- Sparse vs dense storage. GEMPACK picks RESPSE vs REFULL via internal heuristics; read_har densifies on load, so the sparse-vs-dense signal isn't available.
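Both losses can be reproduced in miniature. The sketch below uses plain strings and lists as stand-ins for on-disk records; the real HAR layout is more involved, and the helper names read_elements and densify are hypothetical.

```python
def read_elements(raw, width):
    """Split a fixed-width block into elements, stripping the padding."""
    return [raw[i:i + width].rstrip() for i in range(0, len(raw), width)]


def densify(size, entries):
    """Expand (1-based flat index, value) pairs into a dense list."""
    dense = [0.0] * size
    for flat, value in entries:
        dense[flat - 1] = value
    return dense


names = ["USA", "ROW"]
narrow = "".join(name.ljust(12) for name in names)  # 12-char padding
wide = "".join(name.ljust(44) for name in names)    # 44-char padding

# Different bytes on disk, identical elements after reading.
assert narrow != wide
assert read_elements(narrow, 12) == read_elements(wide, 44) == names

# A sparse record and a dense record decode to the same values.
dense_on_disk = [0.0, 5.0, 0.0, 0.0]
sparse_on_disk = [(2, 5.0)]
assert densify(4, sparse_on_disk) == dense_on_disk
```

In both cases the decoded data is the same, so a writer that sees only the decoded form cannot tell which encoding the source file used.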
These divergences are documented as xfails in
tests/babel/har/test_byte_exact.py with sibling .diff sidecars
listing the exact offsets. They have no functional impact — any
conforming HAR reader (ours, harpy3, RunGTAP, ViewHAR) decodes the
writer output identically to the original.
Clean-room provenance¶
Both the reader and the writer are clean-room reimplementations.
They were developed by inspecting the on-disk byte layout of HAR files
produced by GEMPACK / RunGTAP and by reading public format
documentation. No code was viewed, copied, translated, or otherwise
derived from harpy3 / harpy (GPLv3) or from GEMPACK source.
harpy3 is used only as a black-box oracle in a sandboxed dev
harness (scripts/har/oracle_check.py). It is never imported by any
equilibria source and never runs in CI.
See the top-level NOTICE file for the full clean-room statement.