# Running FLARE on FormulationBench FLARE consumes pairs of MILP formulations — a markdown description and a runnable Gurobi script per formulation — and decides whether one is a constructive reformulation of the other. The [FormulationBench](https://formulation-bench.henryrobbins.com/) dataset ships both artifacts for every formulation, so it is the easiest way to drive FLARE end-to-end. ## Prerequisites - `formulation-bench` and `milp-flare` installed in the same environment. - Docker running and the `flare-agent:latest` image built (see {doc}`docker`). - A harness credential available on the host (see {doc}`../agent_harness/index`). ## Verifying a dataset pair ```python from pathlib import Path from formulation_bench import Dataset from milp_flare import FLARE, FormulationInput from milp_flare.harness import ClaudeCodeHarness ds = Dataset.load() p1 = ds.problem("p1") a = p1.formulation("a") b = p1.formulation("b") harness = ClaudeCodeHarness(model="claude-opus-4-7", effort="medium") flare = FLARE(harness=harness) a_in = FormulationInput( formulation_md=a.render_markdown(), solve_py=a.gen_solve_py() ) b_in = FormulationInput( formulation_md=b.render_markdown(), solve_py=b.gen_solve_py() ) result = flare.verify(a_in, b_in, output_path=Path("runs/p1_a_b")) print("is_reformulation:", result.is_reformulation) print("duration_s:", result.duration_s) print("cost_usd:", result.cost_usd) ``` `FormulationInput` carries the two artifacts the agent needs. `Formulation.render_markdown()` and `Formulation.gen_solve_py()` produce them directly from the dataset — see the {fb}`FormulationBench API reference `. ## Lean definitions FLARE writes Lean files against the same `MILPFormulation` / `MILPReformulation` structures used by FormulationBench. They are documented in the FormulationBench docs: - {fb}`MILPFormulation ` - {fb}`MILPReformulation ` A copy of `Common.lean` (and a minimal Lake skeleton) is bundled with `milp_flare` and copied into the agent working directory at runtime. ## Inspecting the run artifacts `output_path` is populated with everything FLARE produced: ``` runs/p1_a_b/ ├── config.json # Harness + model configuration ├── result.json # Final verdict, token usage, cost └── wd/ # Agent working directory (bind-mounted into the container) ├── A/ │ ├── formulation.md # Input written by FLARE │ ├── solve.py # Input written by FLARE │ └── Formulation.lean # Output written by the agent ├── B/ │ ├── formulation.md │ ├── solve.py │ └── Formulation.lean ├── Reformulation.lean # The proof produced by the agent ├── agent_output.jsonl # Stream of agent turns (tail -f live) └── compile_log.txt # Output of the post-hoc Lean compile ``` `result.json` records the individual sub-checks behind the verdict: whether each `Formulation.lean` was written and compiled, whether `Reformulation.lean` contains a `def : MILPReformulation`, whether it compiled, and whether it is `sorry`-free. ## Using FLARE on a non-dataset pair `FormulationInput` does not depend on `formulation_bench` — build the two inputs yourself from any markdown + `solve.py` pair and pass them to `FLARE.verify`. The markdown should follow the formulation template documented in {fb}`/en/latest/schema.html`.