FLARE

class milp_flare.flare.FLARE(harness)[source]

Agentic MILP reformulation verifier.

FLARE (Formulation-Level Automated Reformulation Evaluation) uses an LLM-based agent and the Lean proof assistant to verify MILP reformulations according to the FormulationBench definition of reformulation. See the FLARE Paper for more details.

Parameters:
harnessHarness

The agent harness to use for auto-formalization and proof synthesis. See Agent Harnesses for the available harnesses.

Attributes:
harnessHarness

The configured agent harness.

Examples

Use FLARE to verify if formulation b of problem p1 from FormulationBench is a reformulation of formulation a (also see Running FLARE on FormulationBench):

from pathlib import Path
from formulation_bench import Dataset
from milp_flare import FLARE, FormulationInput
from milp_flare.harness import ClaudeCodeHarness

ds = Dataset.load()
a = ds.problems[1].formulations["a"]
b = ds.problems[1].formulations["b"]

harness = ClaudeCodeHarness(model="claude-opus-4-7")
flare = FLARE(harness=harness)
result = flare.verify(
    FormulationInput(a.render_markdown(), a.gen_solve_py()),
    FormulationInput(b.render_markdown(), b.gen_solve_py()),
    output_path=Path("runs/p1_a_b"),
)
get_config_dict()[source]

Return a dictionary with the FLARE configuration.

Returns:
configdict[str, Any]

Harness, image, model, and reasoning configuration. Forwarded directly from Harness.get_config_dict().

verify(a, b, output_path)[source]

Run FLARE on a pair of MILP formulations.

Run FLARE to determine if formulation b is a reformulation of formulation a according to the FormulationBench definition of reformulation. This method creates the agent working directory (see below), triggers the agent, and evaluates the agent output. Finally, it populates output_path with:

  • The agent working directory (wd/)

  • The FLARE configuration (config.json)

  • The result dictionary (result.json)

The agent working directory contains descriptions of each formulation, the agent prompt prompt.txt (see Prompts), and the necessary Lake environment files.

wd/
├── A/
│   ├── formulation.md
│   ├── solve.py
│   └── Formulation.lean   # written by agent
├── B/
│   ├── formulation.md
│   ├── solve.py
│   └── Formulation.lean   # written by agent
├── Reformulation.lean     # written by agent
├── prompt.txt
├── Common.lean
├── lake-manifest.json
├── lakefile.toml
└── lean-toolchain
Parameters:
aFormulationInput

Inputs for formulation A.

bFormulationInput

Inputs for formulation B (the candidate reformulation of A).

output_pathpathlib.Path

Directory to populate with run artifacts.

Returns:
resultFLAREResult

The verdict, duration, cost, and per-check metadata.

Examples

Run FLARE on formulations a and b and inspect the result:

flare = FLARE(harness=harness)
result = flare.verify(a, b, output_path=Path("runs/a_b"))
result.is_reformulation
True
result.metadata["form_a_written"]
True
result.cost_usd
1.49
result.duration_s
322

See Running FLARE on FormulationBench for more example usage.

class milp_flare.flare.FormulationInput(formulation_md, solve_py)[source]

Formulation input for the FLARE agent.

Attributes:
formulation_mdstr

Markdown description of the formulation. Typically produced by Formulation.render_markdown() from FormulationBench.

solve_pystr

Python solver script with Gurobi implementation of the formulation. FLARE does not execute this script; it is only used as an additional reference for the agent. Typically produced by Formulation.gen_solve_py() from FormulationBench.

Examples

Construct a FormulationInput from formulation p1.a from FormulationBench:

>>> from formulation_bench import Dataset
>>> from milp_flare import FormulationInput
>>> ds = Dataset.load()
>>> a = ds.problems[1].formulations["a"]
>>> inp = FormulationInput(a.render_markdown(), a.gen_solve_py())

>>> print(inp.formulation_md)
# Amusement Park Ticket Machines

## Problem Description

An amusement park is installing cash-based machines and card-only machines...

>>> print(inp.solve_py)
import json
from gurobipy import Model, GRB
import argparse


def main(params_path: str, solution_path: str) -> None:
...
class milp_flare.flare.FLAREResult(is_reformulation, duration_s=None, cost_usd=None, metadata=<factory>)[source]

Result from FLARE.

Attributes:
is_reformulationbool

Final verdict. True if all sub-checks pass: both Formulation.lean files compile, Reformulation.lean compiles and contains a def : MILPReformulation, and the proof is sorry-free using only the STANDARD_AXIOMS.

duration_sfloat, optional

Wall-clock duration of the agent run, in seconds.

cost_usdfloat, optional

Estimated USD cost of the agent run. None when the harness does not report cost.

metadatadict[str, Any]

Per-check breakdown and harness run metadata: form_a_written, form_a_compiled, form_b_written, form_b_compiled, proof_compiled, milp_reform_found, sorry_free, no_new_axioms, axioms, agent_decision, and token counts.

milp_flare.flare.STANDARD_AXIOMS = frozenset({'Classical.choice', 'Quot.sound', 'propext'})

Axioms permitted in a verified reformulation proof. All of Mathlib is built on these three; a proof depending on anything else is not trusted by FLARE.