Benchmarks

This page reports variable-by-variable parity between the Python equilibria GTAP Standard 7 implementation and reference GAMS runs, plus wall-time benchmarks where GAMS can run locally. Numbers come from CSVs committed under docs/site/_data/benchmarks/; Read the Docs renders this page from those files because it has no GAMS/PATH installed. Regenerate locally with:

make benchmarks           # all datasets (parity + MCP wall-time)
make benchmarks-nus333    # NUS333 only (also produces local parity + timing)
make benchmarks-nlp       # NLP wall-time: Python IPOPT vs GAMS local IPOPT

The default number of timing runs is BENCH_RUNS=5 (override on the make command line).

Each parity row reports, for one (dataset, phase, variable) triple, how many Pyomo Var cells match GAMS within tol_rel=1e-3 / tol_abs=1e-6 and the worst absolute / relative error observed. The __SUMMARY__ rows in the underlying CSV hold per-phase totals.
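
To make the matching rule concrete, here is a minimal Python sketch of how one parity row could be computed. It is not the project's actual comparison code: the names cell_matches and parity_row, the dict-of-cells layout, the region/sector keys, and the way the two tolerances are combined (a cell passes if it is within tol_abs, or within tol_rel of the GAMS value) are all assumptions made for illustration.

```python
TOL_REL = 1e-3  # relative tolerance quoted on this page
TOL_ABS = 1e-6  # absolute tolerance quoted on this page


def cell_matches(py_val, gams_val, tol_rel=TOL_REL, tol_abs=TOL_ABS):
    """Return True if one cell agrees within either tolerance (assumed rule)."""
    abs_err = abs(py_val - gams_val)
    if abs_err <= tol_abs:
        return True
    # Relative error is measured against the GAMS value (an assumption).
    return abs_err <= tol_rel * abs(gams_val)


def parity_row(py_cells, gams_cells):
    """Summarise one variable: match/diverge/missing counts and worst errors."""
    match = diverge = missing = 0
    worst_abs = worst_rel = 0.0
    for key, gams_val in gams_cells.items():
        if key not in py_cells:
            missing += 1  # cell exists in GAMS but not on the Python side
            continue
        py_val = py_cells[key]
        abs_err = abs(py_val - gams_val)
        worst_abs = max(worst_abs, abs_err)
        if gams_val != 0:
            # Relative error is skipped where the reference value is zero.
            worst_rel = max(worst_rel, abs_err / abs(gams_val))
        if cell_matches(py_val, gams_val):
            match += 1
        else:
            diverge += 1
    return {"match": match, "diverge": diverge, "missing": missing,
            "worst_abs": worst_abs, "worst_rel": worst_rel}


# Hypothetical cells keyed by (region, sector); values are made up.
gams = {("USA", "agr"): 1.0, ("EU", "agr"): 2.0,
        ("ROW", "agr"): 0.0, ("USA", "mfg"): 5.0}
py = {("USA", "agr"): 1.0005, ("EU", "agr"): 2.01, ("ROW", "agr"): 5e-7}

row = parity_row(py, gams)
print(row)
```

In this made-up example two cells match (one on the relative test, one on the absolute test), one diverges, and one is missing on the Python side, which mirrors the Match, Diverge and Missing columns in the tables on this page.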

Hardware sensitivity: wall-time numbers depend on CPU, memory and filesystem. Parity (cell-level matching vs GAMS) is deterministic and identical across platforms, but solve times vary. Each section below is labelled with the host that produced it. When comparing across machines, compare only the Python / GAMS-local ratios, not the absolute times.

Coverage matrix

The authoritative parity-coverage matrix (dataset × kind × ifSUB × phase, with per-row gap thresholds and CI status) is generated by scripts/gtap/coverage_matrix.py; see GTAP 7 Parity Coverage Matrix.

GTAP Standard 7: 9 sectors × 10 regions

Reference: src/equilibria/templates/reference/gtap/output/COMP.gdx (rate-scaled 10% imptx shock, if_sub=False, rorflex=10).

Generated 2026-05-11T01:38:29Z from commit 1ba8d6d.

Parity vs GAMS NEOS reference

Phase	Vars matched	Cells	Match	Diverge	Missing	Match rate	Residual	Solve time
base	138/138	59958	59958	0	0	100.00% ✓ match	2.22e-11	6.57s
shock	138/138	59978	59978	0	0	100.00% ✓ match	6.02e-13	7.59s

ℹ️ GAMS-local parity not available for 9x10. The model has ~10k equations and exceeds the GAMS community-license limit of 2500 rows/cols for nonlinear models. Only the NEOS reference run is used for 9x10.

GTAP Standard 7: NUS333 (3 sectors × 3 regions × 3 factors)

Reference: output/nus333_neos/out.gdx (NEOS job 18744693, power-scaled 10% imptx shock, residual region ROW).

Generated 2026-05-12T00:09:18Z from commit e5b9385.

Parity vs GAMS NEOS reference

Phase	Vars matched	Cells	Match	Diverge	Missing	Match rate	Residual	Solve time
base	138/138	1304	1304	0	0	100.00% ✓ match	1.98e-11	0.32s
shock	138/138	1310	1310	0	0	100.00% ✓ match	2.08e-07	0.32s

Parity vs GAMS local

Phase	Vars matched	Cells	Match	Diverge	Missing	Match rate	Residual	Solve time
base	138/138	1304	1304	0	0	100.00% ✓ match	1.98e-11	0.32s
shock	138/138	1310	1310	0	0	100.00% ✓ match	2.08e-07	0.32s

Wall-time benchmark

Median / min / max / mean across the runs in nus333_timing.csv. Both sides first solve once from a cold state; that warm-up run is discarded, and the solve is then re-run N times. Lower is better.

Solver	N	Median	Min	Max	Mean
Python equilibria (PATH C API, nonlinear full)	5	0.644s	0.608s	0.702s	0.643s
GAMS local (comp_nus333.gms, PATH via GAMS 53)	5	0.848s	0.769s	0.917s	0.831s

⤷ Median ratio Python / GAMS-local: 0.760×
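
The timing procedure described above (one discarded warm-up, then N timed runs summarised as median / min / max / mean) can be sketched in Python as follows. This is an illustration under stated assumptions, not the harness behind make benchmarks: time_runs and summarize are hypothetical names, the solve callable stands in for whichever solver is being timed, and the sample durations are made up rather than taken from nus333_timing.csv.

```python
import statistics
import time


def time_runs(solve, n_runs=5):
    """Call solve() once as a discarded warm-up, then time n_runs calls."""
    solve()  # warm-up: its duration is deliberately not recorded
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        solve()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the same summary columns as the table: N, median, min, max, mean."""
    return {
        "n": len(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.fmean(durations),
    }


# Stand-in workload; a real run would call the model solve here instead.
timed = time_runs(lambda: sum(range(1000)), n_runs=3)

# Made-up durations in seconds, used only to show the summary shape.
demo = summarize([0.61, 0.64, 0.70, 0.65, 0.62])
print(demo)
```

The median is reported first because it is less sensitive than the mean to a single slow run, for example one delayed by filesystem caching.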

NLP wall-time (Python IPOPT vs GAMS local IPOPT)

The MCP path uses PATH, which the GAMS community license caps at 1000 rows, so anything larger than ~NUS333 must go to NEOS and cannot be timed head-to-head locally. The NLP path uses IPOPT, an open-source solver that GAMS does not license-cap, so both sides run on the same host up to gtap7_15x10. Python solves the full base→check→shock sequence with EQUILIBRIA_GTAP_SOLVE_NLP=1; GAMS runs the same bundle with ifMCP=0 and option nlp=ipopt. The table below is read from nlp_timing.csv; regenerate it with make benchmarks-nlp.

Generated 2026-07-20T15:32:52Z from commit 8e55b9f. Warm-up run discarded; N timed runs per side. Lower is better.

Dataset	Mode	N	Python median	GAMS median	Python / GAMS
gtap7_3x3pure	pure	5	0.331s	0.809s	0.41×
gtap7_3x3altertax	altertax	5	0.320s	0.878s	0.36×
gtap7_3x4altertax	altertax	5	0.461s	1.015s	0.45×
gtap7_5x5pure	pure	5	1.078s	1.244s	0.87×
gtap7_5x5altertax	altertax	5	2.046s	2.027s	1.01×
gtap7_10x7pure	pure	5	13.050s	7.113s	1.83×
gtap7_10x7altertax	altertax	5	29.293s	21.352s	1.37×
gtap7_15x10altertax	altertax	5	494.149s	no local ref	—

⤷ A ratio ≤ 1× means Python is at least as fast as GAMS-local on that row. Rows marked "no local ref" are timed on the Python side only: no local GAMS NLP reference was generated for them, but IPOPT still solves them in-process (the historical large-model hang was specific to PATH, not IPOPT).
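
As a worked check of the Python / GAMS column, the short Python sketch below recomputes the ratio from the medians printed in a few rows of the table above and applies the ≤ 1× reading. The ROWS list, the ratio and verdict helpers, and the use of None for a missing local reference are illustrative choices, not part of the benchmark tooling.

```python
# (dataset, Python median in s, GAMS median in s or None if no local reference)
ROWS = [
    ("gtap7_3x3pure", 0.331, 0.809),
    ("gtap7_5x5altertax", 2.046, 2.027),
    ("gtap7_10x7pure", 13.050, 7.113),
    ("gtap7_15x10altertax", 494.149, None),
]


def ratio(python_s, gams_s):
    """Python / GAMS median ratio, or None when GAMS was not timed locally."""
    if gams_s is None:
        return None
    return python_s / gams_s


def verdict(r):
    """Apply the reading used on this page: a ratio <= 1 favours Python."""
    if r is None:
        return "Python-only timing"
    return "Python at least as fast" if r <= 1.0 else "GAMS-local faster"


ratios = {name: ratio(p, g) for name, p, g in ROWS}
for name, r in ratios.items():
    shown = "n/a" if r is None else f"{r:.2f}x"
    print(f"{name}: {shown} ({verdict(r)})")
```

Because the ratio divides two times measured on the same host, it is the figure that stays comparable when the benchmarks are regenerated on different hardware.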