Harness engineering - providing the backpressure your coding agents need

The more I have worked with coding agents over the last couple of months, the more I have realized that the problem isn’t simply that Claude Code or other tools move fast. Even with state-of-the-art practices like planning, small tasks and iterative development via small commits (all of which are common sense by now, in my opinion), there was still no deterministic way to build out the harness you need to leverage coding agents in an agentic world.

I see a lot of developers using agents to implement features and thinking about specs and task management, but not focusing on moving themselves from the “in the loop” seat to the “on the loop” seat (more on the loop thing can be read here).

That’s the point. The problem is that the agent moves fast (edit, commit, PR, edit, commit, PR) and does not care about your manual review speed. So you end up with a string of commits where the first three are green, the fourth breaks the build, and Claude is already on the next one before you have even opened the PR to review its code.

This is the classic producer–consumer mismatch. Claude is the producer. You are the consumer. Without backpressure, the producer just keeps going.

When you let an AI agent write and commit code on your behalf, you need a mechanism that stops bad code from entering your repository, not just in CI after the fact, but before the commit even happens. That means pushing resistance back upstream, as close to the source as possible. The simplest solution in the past was to give your coding agent a non-deterministic instruction to execute quality gate scripts and keep an eye on the output. Nothing guaranteed that this instruction would not be swallowed by the amount of context injected into the current session (more on context engineering in my upcoming blog post).

To deterministically prevent your coding agent from committing code that doesn’t fit through the harness, you can rely on hooks, which are supported by multiple coding agents (e.g. Claude Code, GitHub Copilot). To allow fine-grained control over tool usage, Claude Code has offered the option to preselect commands using an if condition since March 2026. In combination, this provides a streamlined, deterministic way to ensure that your agent harness is executed exactly when it needs to be and that the quality gate always runs.

The term agent harness comes up more and more in the agentic software engineering world (Mitchell Hashimoto, Birgitta Böckeler). In this blog post, I’d like to delve a little deeper and use Claude Code as an example to show how to set up a bulletproof agent harness for any software project you have in mind.

The Hook Configuration for Claude Code

Claude Code supports PreToolUse hooks — shell commands that run before Claude executes a specific tool. Wire them up in .claude/settings.json:

{
	"hooks": {
		"PreToolUse": [
			{
				"matcher": "Bash",
				"hooks": [
					{
						"type": "command",
						"command": "npx block-no-verify"
					},
					{
						"type": "command",
						"if": "Bash(git commit*)",
						"command": "$CLAUDE_PROJECT_DIR/.claude/hooks/pre-commit-check.sh"
					}
				]
			}
		]
	}
}

Two hooks, two jobs.

block-no-verify runs on every Bash call and intercepts any git commit --no-verify attempt before it lands. Without it, Claude can and will bypass your gate if it decides that’s the path of least resistance — adding --no-verify is a completely valid shell command, and an AI agent under pressure to make progress will find it. One line of config closes that escape hatch permanently.

pre-commit-check.sh uses the if field to pattern-match against the actual command being run. Bash(git commit*) means the gate script only fires when Claude is about to commit — not on every shell invocation. No wasted cycles on pnpm install or grep. The filtering happens before your script even loads. Without if, your hook runs on everything. Fine for a lightweight check, genuinely painful for a 45-second CI suite triggered by ls.
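To make the first hook's job more concrete, here is a minimal sketch of the kind of check such a guard could perform. It is not how block-no-verify is implemented. The function names skips_git_hooks and hook_exit_code are hypothetical, and the matching is deliberately naive (a -n inside a commit message would also trigger it). In a real hook the command string would come from the JSON payload Claude Code passes on stdin, for example via jq -r '.tool_input.command', which assumes jq is installed.

```shell
#!/usr/bin/env bash
# Hypothetical guard, for illustration only (not block-no-verify's code).
# Returns 0 if the command looks like a git commit that skips hooks.
skips_git_hooks() {
  local cmd="$1"
  # The flag as its own word: --no-verify, or its short form -n.
  local flag_re='(^|[[:space:]])(--no-verify|-n)([[:space:]]|$)'
  [[ "$cmd" == *"git commit"* ]] || return 1
  [[ "$cmd" =~ $flag_re ]]
}

# Exit status a PreToolUse hook would use: 2 blocks the call, 0 lets it through.
hook_exit_code() {
  if skips_git_hooks "$1"; then
    echo "Blocked: do not bypass hooks with --no-verify; fix the failing checks instead." >&2
    return 2
  fi
  return 0
}
```

A hook script built on this would end with something like hook_exit_code "$cmd" || exit 2, so that the bypass attempt never reaches git.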

The Gate Script

The script runs the individual checks for your project (formatting, linting, build and test execution) sequentially and collects all failures before reporting, so your coding agent always sees the full picture, not just the first error that tripped the gate. Depending on your project, you should carefully select the tests you want to execute. The more tests run inside your harness, the longer the feedback loop takes to provide the desired backpressure on your coding agent.

The Exit Code Contract for Claude Code

The key mechanism is exit code 2. Claude Code interprets exit 2 from a PreToolUse hook as a hard block: the tool call is cancelled entirely. Exit 1 is only a non-blocking error, so the tool call still goes ahead; 2 stops execution. Claude cannot proceed with the commit until all five checks pass.

When exit 2 fires, Claude Code feeds the hook’s stderr back to Claude, which sees the formatted error list and loops back to fix the problems before trying again. So make sure your gate writes its report to stderr, not stdout. The quality of that output matters. A cryptic message means Claude spends a turn guessing. A clear “Lint failed: 3 errors in src/utils.ts” means it goes straight there. This is why you should also care about the content and format of the feedback provided by the agent harness. Claude is reading it too.
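One way to keep that feedback short and actionable is to capture each check's output and only surface the tail of it on failure. The following is a sketch under assumptions: run_and_summarize is a hypothetical helper, and GATE_MAX_LINES is an invented environment variable, not a Claude Code setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check quietly; on failure, print a short,
# actionable summary to stderr and return 2 (the blocking exit code).
run_and_summarize() {
  local name="$1"; shift
  local max_lines="${GATE_MAX_LINES:-20}"   # assumed knob, defaults to 20 lines
  local log
  log="$(mktemp)"
  # Capture stdout and stderr of the check so a passing check stays silent.
  if "$@" >"$log" 2>&1; then
    rm -f "$log"
    return 0
  fi
  {
    printf '%s failed. Last %s lines of output:\n' "$name" "$max_lines"
    tail -n "$max_lines" "$log"
  } >&2
  rm -f "$log"
  return 2
}
```

Used as run_and_summarize "Lint" pnpm lint, a passing check costs no context at all, while a failing one reports its name plus the most recent lines of tool output.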

Ensure Execution On New Environments

If the dependencies for your harness aren’t installed, the checks fail with cryptic “package not found” errors rather than useful feedback. Include a guard that checks whether the needed dependencies are installed and provides useful feedback for your coding agent.

This matters in fresh environments — new clone, CI setup, Claude Code web sessions — where the agent might attempt a commit before the project is fully initialized.
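A guard of this kind could look roughly like the sketch below. The function name check_gate_deps and the exact messages are hypothetical; the list of required commands is whatever your own gate needs.

```shell
#!/usr/bin/env bash
# Hypothetical guard: verify the gate's prerequisites before running any check.
# Usage: check_gate_deps <project_dir> <command>...
check_gate_deps() {
  local project_dir="$1"; shift
  local missing=()
  local tool
  # Every required command must be resolvable on PATH.
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("command not on PATH: $tool")
  done
  # Installed packages are assumed to live in node_modules.
  [ -d "$project_dir/node_modules" ] || missing+=("node_modules not found; run 'pnpm install' first")
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "Quality gate cannot run yet:" >&2
    printf '  - %s\n' "${missing[@]}" >&2
    return 2
  fi
  return 0
}
```

Calling check_gate_deps "$CLAUDE_PROJECT_DIR" pnpm node || exit 2 at the top of the gate turns a confusing tool failure into a one-line instruction the agent can act on.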

Why This Actually Matters

Without this gate, an AI agent operates in an optimistic loop: it writes code, commits it, and only discovers problems when CI fails minutes later, at which point it may have already built further work on top of broken code. The backpressure approach inverts this. The agent cannot commit broken code, which forces it to fix issues in the same turn before moving on. It also beats quality gates that are merely injected via context, because those are executed non-deterministically.

It also protects you as the reviewer. When you look at what your coding agent did, you can trust that whatever landed in git at minimum formats correctly, passes linting, type-checks, builds, and has passing (selected) tests. The commit history becomes a reliable baseline rather than a stream of fix: broken build patches.

Motivation Behind This Post

The moment that sold me on this setup: Claude was three commits into a refactor, everything looked clean in the diffs, and the gate caught a TypeScript error in a file it hadn’t touched — a type that had depended on something quietly removed two commits earlier. Without the gate that would have merged, broken staging, and taken an hour to trace back. With it, Claude fixed the problem in the same session, in under a minute.

Forty-five seconds of backpressure. One hour of debugging avoided.

Example Quality Harness For Claude Code

The following bash script implements an agent harness for an Astro web app project.

Check	Tool	What it catches
Format	Prettier	Inconsistent code style
Lint	ESLint + `eslint-plugin-astro` + `@typescript-eslint`	Code quality, TS rule violations
Astro check	`astro check`	TypeScript type errors inside `.astro` files
Build	`astro build`	Any error that breaks the production build
Unit tests	Vitest	Regressions in business logic

Playwright e2e tests are intentionally excluded. They’re slow and environment-dependent — better suited for the push pipeline than a pre-commit gate. Backpressure should slow Claude down, not bring it to a halt.

#!/usr/bin/env bash
# Pre-commit quality gate — runs all CI checks except Playwright (e2e)
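# Exit code 2 blocks the commit when Claude Code is the caller.

set -euo pipefail

# On exit 2, Claude Code feeds the hook's stderr (not stdout) back to the agent,
# so route all output, including the checks' own output, to stderr.
exec 1>&2

cd "$CLAUDE_PROJECT_DIR"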
 
if [ ! -d "$CLAUDE_PROJECT_DIR/node_modules" ]; then
  echo -e "\033[0;31m✗\033[0m node_modules not found — run 'pnpm install' first"
  exit 2
fi
 
BOLD='\033[1m'
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RESET='\033[0m'
 
pass()   { echo -e "${GREEN}✓${RESET} $1"; }
fail()   { echo -e "${RED}✗${RESET} $1"; }
header() { echo -e "\n${BOLD}${YELLOW}▶ $1${RESET}"; }
 
ERRORS=()
 
run_check() {
  local name="$1"
  shift
  header "$name"
  if "$@"; then
    pass "$name passed"
  else
    fail "$name failed"
    ERRORS+=("$name")
  fi
}
 
run_check "Format check"  pnpm format:check
run_check "Lint"          pnpm lint
run_check "Astro check"   pnpm astro check
run_check "Build"         pnpm build
run_check "Unit tests"    pnpm exec vitest run
 
echo ""
if [ ${#ERRORS[@]} -gt 0 ]; then
  echo -e "${RED}${BOLD}Commit blocked — the following checks failed:${RESET}"
  for err in "${ERRORS[@]}"; do
    echo -e "  ${RED}• $err${RESET}"
  done
  echo ""
  exit 2
fi
 
echo -e "${GREEN}${BOLD}All checks passed — proceeding with commit.${RESET}"

The script sets set -euo pipefail at the top, but run_check absorbs individual failures. This is intentional: an early lint failure shouldn’t hide a broken build. Because each check runs as the condition of an if statement, errexit does not apply to it. run_check catches the non-zero exit and pushes the check name to ERRORS[] instead of bailing immediately.
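The interaction between set -e and if can be seen in isolation with a small experiment. This is a standalone sketch, not part of the gate script; the variable names bare and guarded are only for illustration.

```shell
#!/usr/bin/env bash
# Bare failing command under set -e: the child shell aborts before the echo.
bare="$(bash -c 'set -e; false; echo reached' || true)"

# The same failure as an if condition: errexit is suppressed, the else branch
# records it, and the script carries on to the final echo.
guarded="$(bash -c 'set -e; if false; then :; else echo recorded; fi; echo reached')"

printf 'bare:    [%s]\n' "$bare"
printf 'guarded: [%s]\n' "$guarded"
```

Here bare stays empty because the child shell exits at the failing command, while guarded contains both recorded and reached. run_check relies on the second behaviour.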

$CLAUDE_PROJECT_DIR is set automatically by Claude Code. Use it instead of hardcoding paths or relying on the current working directory. Skip this and things will seem fine until someone runs the hook from a subdirectory and spends twenty minutes debugging a node_modules error that was never about node_modules.
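If you also want to run the gate by hand, outside of Claude Code, the variable will not be set. One possible fallback, sketched here under the assumption that the project is a git repository, is to use the repository root instead. The function name resolve_project_dir is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer CLAUDE_PROJECT_DIR, fall back to the git
# repository root so the gate can be run manually from any subdirectory.
resolve_project_dir() {
  if [ -n "${CLAUDE_PROJECT_DIR:-}" ]; then
    printf '%s\n' "$CLAUDE_PROJECT_DIR"
    return 0
  fi
  local root
  if root="$(git rev-parse --show-toplevel 2>/dev/null)"; then
    printf '%s\n' "$root"
    return 0
  fi
  echo "Cannot locate the project root: CLAUDE_PROJECT_DIR is unset and this is not a git repository." >&2
  return 2
}
```

In the gate script this would replace the plain cd with something like cd "$(resolve_project_dir)" || exit 2.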