`harness`¶

Base¶

class milp_flare.harness.base.Harness(model, effort='medium')[source]¶

Base class for FLARE agent harness.

FLARE uses an agent harness to auto-formalize MILP formulations in Lean and do automated formal proof synthesis (AFPS) of reformulation certificates. This is the base class for a FLARE agent harness. It has methods to configure the agent working directory, run the agent in a Docker container, and return a configuration dictionary.

Parameters:

modelstr: Model identifier (e.g., "claude-opus-4-7", "gpt-5.5"). See harness subclasses for supported models.
effortstr, default "medium": Reasoning effort level ("low", "medium", "high"). See harness subclasses for supported effort levels.

Attributes:

namestr: Name of the agent harness (e.g., "claude_code").
modelstr: Model identifier this harness is configured to use.
effortstr: Reasoning effort level this harness is configured to use.

configure_wd(wd)[source]¶

Configure the agent working directory with necessary files for the harness.

The agent working directory needs the following:

An agent command script (agent.sh) that is called by the Docker container entrypoint to launch the agent. The script typically calls an agent CLI in non-interactive mode.
Any configuration files necessary to enable the lean-lsp-mcp MCP server.
Custom Agent Skills for auto-formalization of MILP formulations and automated formal proof synthesis for reformulation (see Agent Skills).

Parameters:

wdpathlib.Path: The agent working directory (on the host). Must already exist.

Raises:

RuntimeError: If wd does not exist.

get_config_dict()[source]¶

Return a dictionary with the harness configuration.

Returns:

configdict[str, Any]: Harness name, Docker image, model, and effort.

run(wd)[source]¶

Run the agent in a Docker container.

Spawns a Docker container with the image specified by IMAGE and bind-mounts wd into the container. The container entrypoint calls the agent command script which launches the agent. Agent output is written to wd/agent_output.jsonl.

The Docker image is expected to be built (see Installation).

Parameters:

wdpathlib.Path: The agent working directory to bind-mount at /workspace/wd in the container.

Returns:

resultHarnessRunResult: Duration, cost, token counts, and stop reason.

class milp_flare.harness.base.HarnessRunResult(duration_s, cost_usd, input_tokens, output_tokens, stop_reason)[source]¶

Result of a harness run returned by Harness.run().

Attributes:

duration_sfloat: Wall-clock duration of the agent run, in seconds.
cost_usdfloat, optional: Estimated USD cost of the agent run, or None if cost information is not available (e.g., model is missing from COST_PER_MTOK).
input_tokensint: Total input (prompt) tokens reported by the agent stream. Used by COST_PER_MTOK to estimate cost_usd when not directly reported by the agent.
output_tokensint: Total output (completion) tokens reported by the agent stream. Used by COST_PER_MTOK to estimate cost_usd when not directly reported by the agent.
stop_reasonstr, optional: Final stop reason reported by the agent (e.g., "end_turn", "max_tokens"). None if not reported.

milp_flare.harness.base.IMAGE = 'flare-agent:latest'¶: The name of the Docker image containing the agent environment. This image is expected to be built prior to running FLARE. See Installation.

Agent Harnesses¶

class milp_flare.harness.claude_code.ClaudeCodeHarness(model, effort='medium')[source]¶

Bases: Harness

Claude Code agent harness for FLARE.

Use the Claude Code CLI as an agent harness. Authentication is provided by a long-lived OAuth token (CLAUDE_CODE_OAUTH_TOKEN) generated via claude setup-token and is billed against a Claude subscription. See Claude Code for setup instructions.

Warning

Starting June 15, 2026, Claude Agent SDK and claude -p usage no longer counts towards the Claude plan’s usage limits. It will instead charge a separate Agent SDK monthly credit. See this article.

Parameters:

modelstr: Claude model identifier. Only supports models that are supported by the Claude Code CLI (e.g., "claude-opus-4-7", "claude-sonnet-4-6"). See Claude Code Docs for up-to-date model information.
effortstr, default "medium": Reasoning effort level ("low", "medium", "high", "xhigh", "max"). See Claude Code Docs for supported effort levels for each model.

Attributes:

namestr: Name of the agent harness: "claude_code".
modelstr: Model identifier this harness is configured to use.
effortstr: Reasoning effort level this harness is configured to use.

Examples

Configure Claude Code agent harness with Claude Opus 4.7 and high effort:

>>> from milp_flare import FLARE
>>> from milp_flare.harness import ClaudeCodeHarness
>>> harness = ClaudeCodeHarness(model="claude-opus-4-7", effort="high")
>>> print(json.dumps(harness.get_config_dict(), indent=2))
{
  "harness": "claude_code",
  "image": "flare-agent:latest",
  "model": "claude-opus-4-7",
  "effort": "high"
}

class milp_flare.harness.codex.CodexHarness(model, effort='medium')[source]¶

Bases: Harness

Codex agent harness for FLARE.

Use the Codex CLI as an agent harness. Authentication is provided by running codex login on the host to create a ~/.codex directory with the necessary credentials; this directory is bind-mounted read-write into the Docker container. See Codex for setup instructions.

Parameters:

modelstr: OpenAI model identifier. Only supports models that are supported by the Codex CLI (e.g., "gpt-5.4", "gpt-5.5"). See Codex Docs for up-to-date model information.
effortstr, default "medium": Reasoning effort level ("none", "low", "medium", "high", "xhigh"). See Codex Docs for supported effort levels.

Attributes:

namestr: Name of the agent harness: "codex".
modelstr: Model identifier this harness is configured to use.
effortstr: Reasoning effort level this harness is configured to use.

Examples

Configure Codex agent harness with GPT-5.4 and high effort:

>>> from milp_flare import FLARE
>>> from milp_flare.harness import CodexHarness
>>> harness = CodexHarness(model="gpt-5.4", effort="high")
>>> print(json.dumps(harness.get_config_dict(), indent=2))
{
  "harness": "codex",
  "image": "flare-agent:latest",
  "model": "gpt-5.4",
  "effort": "high"
}

class milp_flare.harness.opencode.OpenCodeHarness(model, effort='medium', provider=None)[source]¶

Bases: Harness

OpenCode agent harness for FLARE.

Use the OpenCode CLI as an agent harness. Authentication is provided by API key. The following API keys are automatically forwarded from the host into the container if they are set:

ANTHROPIC_API_KEY for Anthropic models
OPENAI_API_KEY for OpenAI models
GOOGLE_API_KEY for Google models
DEEPSEEK_API_KEY for DeepSeek models

See OpenCode for setup instructions.

Parameters:

modelstr: Model identifier passed to the underlying CLI. See OpenCode Docs for supported providers and models.
effortstr, default "medium": Reasoning effort level ("low", "medium", "high"). Supported reasoning effort levels vary by provider and model.
providerstr, optional: The OpenCode model provider to use. By default, the provider is inferred from the model name. See OpenCode Docs for supported providers.

Attributes:

namestr: Name of the agent harness: "opencode".
modelstr: Model identifier this harness is configured to use.
effortstr: Reasoning effort level this harness is configured to use.
providerstr: Resolved provider name this harness is configured to use.

Examples

Configure OpenCode agent harness with DeepSeek V4 Pro and high effort:

>>> from milp_flare import FLARE
>>> from milp_flare.harness import OpenCodeHarness
>>> harness = OpenCodeHarness(model="deepseek-v4-pro", effort="high")
>>> print(json.dumps(harness.get_config_dict(), indent=2))
{
  "harness": "opencode",
  "image": "flare-agent:latest",
  "model": "deepseek-v4-pro",
  "effort": "high",
  "provider": "deepseek"
}

Pricing¶

milp_flare.harness.cost.COST_PER_MTOK = {'claude-haiku-4-5': (1, 5.0), 'claude-opus-4-6': (5.0, 25.0), 'claude-opus-4-7': (5.0, 25.0), 'claude-sonnet-4-5': (3.0, 15.0), 'claude-sonnet-4-6': (3.0, 15.0), 'deepseek-v4-flash': (0.14, 0.28), 'deepseek-v4-pro': (1.74, 3.48), 'gpt-4.1': (2.0, 8.0), 'gpt-4o': (2.5, 10.0), 'gpt-4o-mini': (0.15, 0.6), 'gpt-5.4': (2.5, 15.0), 'gpt-5.4-mini': (0.75, 4.5), 'gpt-5.4-nano': (0.2, 1.25), 'gpt-5.5': (5.0, 30.0)}¶

Cost per million tokens (input, output). Used to compute cost_usd when the agent harness does not report cost directly. This is a fallback and is likely to become outdated quickly. See pricing documentation for the most up-to-date pricing information:

harness¶

Base¶

Agent Harnesses¶

Pricing¶

`harness`¶