Self-Improvement

Self-improvement uses feedback from prior runs to improve future runs through reviewed changes to prompts, tools, retrieval, policies, tests, or skills.

Use When

You have run logs, eval failures, and review processes.
Improvements are applied through versioned artifacts.
Humans or automated gates can approve behavior changes.
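The "versioned artifacts" and "approve behavior changes" conditions can be made concrete with a small registry. This is a minimal sketch, assuming an append-only store for a single artifact such as a system prompt. `ArtifactRegistry` and its methods are hypothetical names chosen for illustration, and a real system would persist versions durably rather than in memory.

```typescript
// Hypothetical append-only registry for one improvable artifact (for example a system prompt).
// Version content is never edited in place; activate and rollback only move the active pointer.
interface ArtifactVersion {
  version: number;
  content: string;
  approvedBy: string | null; // reviewer or automated gate; null until approved
}

class ArtifactRegistry {
  private readonly versions: ArtifactVersion[] = [];
  private activeIndex = -1; // index of the version served in production; -1 means none

  // Record a proposed change as a new version without serving it.
  propose(content: string): number {
    const version = this.versions.length + 1;
    this.versions.push({ version, content, approvedBy: null });
    return version;
  }

  approve(version: number, approver: string): void {
    this.lookup(version).approvedBy = approver;
  }

  // Only approved versions can be served, so unreviewed edits never reach production.
  activate(version: number): void {
    if (this.lookup(version).approvedBy === null) {
      throw new Error(`version ${version} is not approved`);
    }
    this.activeIndex = version - 1;
  }

  // Fall back to the newest approved version older than the active one.
  rollback(): number {
    for (let i = this.activeIndex - 1; i >= 0; i--) {
      if (this.versions[i].approvedBy !== null) {
        this.activeIndex = i;
        return this.versions[i].version;
      }
    }
    throw new Error('no earlier approved version to roll back to');
  }

  current(): ArtifactVersion {
    if (this.activeIndex < 0) throw new Error('no active version');
    return this.versions[this.activeIndex];
  }

  private lookup(version: number): ArtifactVersion {
    const v = this.versions[version - 1];
    if (v === undefined) throw new Error(`unknown version ${version}`);
    return v;
  }
}

const prompts = new ArtifactRegistry();
const v1 = prompts.propose('You are a helpful assistant.');
prompts.approve(v1, 'reviewer');
prompts.activate(v1);
const v2 = prompts.propose('You are a helpful assistant. Cite your sources.');
// prompts.activate(v2) would throw here: v2 has not been approved yet.
prompts.approve(v2, 'eval-gate');
prompts.activate(v2);
prompts.rollback();
console.log(prompts.current().version); // 1
```

Keeping approval as a precondition of `activate`, rather than a separate convention, is what rules out the "silently rewrites its own instructions" case listed under Avoid When.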

Avoid When

The agent silently rewrites its own instructions in production.
No eval suite exists to catch regressions.
The feedback signal is noisy or easy to game.

Architecture

Use this diagram to read Self-Improvement as a system boundary, not only as a code shape. The key ownership question is who owns the loop's state: the loop controller owns progress, budgets, stop conditions, and recovery state.

Figure: Self-improvement release loop

Read it as a release process: feedback can propose changes, but evals, approval, staged rollout, monitoring, and rollback decide what becomes a new version.
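The release process can be sketched as two decision functions: a gate before any traffic and a stage check during rollout. This is a minimal sketch under stated assumptions. `CandidateChange`, `gate`, `nextStage`, the staged-traffic fractions, and the error-budget check are hypothetical, and the numbers in the example are illustrative inputs, not measured results. A real pipeline would read eval results, approvals, and monitoring signals from its own systems.

```typescript
// Hypothetical proposal to change one versioned artifact
// (prompt, tool, retrieval config, policy, test, or skill).
interface CandidateChange {
  artifact: string;
  fromVersion: string;
  toVersion: string;
  evalPassRate: number;     // candidate pass rate on the eval suite, 0..1
  baselinePassRate: number; // current version's pass rate on the same suite
  approved: boolean;        // human or automated approval recorded
}

type Decision =
  | { action: 'reject'; reason: string }
  | { action: 'promote'; stage: number } // index into ROLLOUT_STAGES
  | { action: 'rollback'; reason: string };

// Hypothetical traffic fractions for a staged rollout.
const ROLLOUT_STAGES = [0.05, 0.25, 1.0];

// Before any traffic: the candidate must not regress the evals and must be approved.
function gate(change: CandidateChange): Decision {
  if (change.evalPassRate < change.baselinePassRate) {
    return { action: 'reject', reason: `eval regression: ${change.evalPassRate} < ${change.baselinePassRate}` };
  }
  if (!change.approved) {
    return { action: 'reject', reason: 'missing approval' };
  }
  return { action: 'promote', stage: 0 };
}

// During rollout: advance while the monitored error rate stays within budget,
// otherwise roll back to fromVersion. The last stage keeps full traffic.
function nextStage(stage: number, observedErrorRate: number, errorBudget: number): Decision {
  if (observedErrorRate > errorBudget) {
    return { action: 'rollback', reason: `error rate ${observedErrorRate} exceeds budget ${errorBudget}` };
  }
  return { action: 'promote', stage: Math.min(stage + 1, ROLLOUT_STAGES.length - 1) };
}

const example: CandidateChange = {
  artifact: 'prompts/system',
  fromVersion: 'v12',
  toVersion: 'v13',
  evalPassRate: 0.91,
  baselinePassRate: 0.88,
  approved: true,
};
console.log(gate(example).action);            // promote
console.log(nextStage(0, 0.02, 0.01).action); // rollback: error rate over budget
```

Feedback only ever produces a `CandidateChange`; whether it becomes the new version is decided entirely by these checks.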

System Shape

Pattern boundary: a controller repeatedly chooses the next step, executes it, observes the result, and decides whether to continue.
State owner: the loop controller owns progress, budgets, stop conditions, and recovery state.
Primary artifact: reflection-and-self-improvement-pattern/ contains the runnable reference implementation and examples.
Operational promise: feedback from prior runs improves future runs only through reviewed, versioned changes to prompts, tools, retrieval, policies, tests, or skills.

Core Protocol

Initialize goal state, constraints, budgets, and stop conditions.
Choose the next action from the current state instead of assuming the whole path upfront.
Execute the action through a validated tool, worker, or local function.
Observe the result and update state with evidence, errors, and remaining work.
Stop, retry, re-plan, or escalate according to explicit policy.
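The protocol above can be sketched as a controller loop. This is a minimal sketch under stated assumptions: `runLoop`, `LoopState`, `LoopLimits`, and the stop-reason names are hypothetical. Choosing the next action and executing it are injected as functions, so the controller owns only state, budgets, trace, and the stop reason. The cost budget is checked before each step, so the final step may overshoot it.

```typescript
// Hypothetical stop reasons the controller can report.
type StopReason = 'done' | 'max_iterations' | 'budget_exhausted' | 'escalate';

interface Observation {
  ok: boolean;   // whether the step succeeded
  done: boolean; // whether the goal is met
  cost: number;  // cost of this step in arbitrary units
  note: string;
}

interface LoopState {
  goal: string;
  iteration: number;
  spent: number;
  consecutiveFailures: number;
  trace: string[]; // one entry per decision/action/observation
}

interface LoopLimits {
  maxIterations: number;
  maxCost: number;
  maxConsecutiveFailures: number;
}

async function runLoop(
  goal: string,
  limits: LoopLimits,
  chooseAction: (state: LoopState) => string,
  execute: (action: string) => Promise<Observation>,
): Promise<{ stop: StopReason; state: LoopState }> {
  const state: LoopState = { goal, iteration: 0, spent: 0, consecutiveFailures: 0, trace: [] };
  while (true) {
    // Hard limits are checked before every step.
    if (state.iteration >= limits.maxIterations) return { stop: 'max_iterations', state };
    if (state.spent >= limits.maxCost) return { stop: 'budget_exhausted', state };

    const action = chooseAction(state); // decide from current state, not a fixed plan
    const obs = await execute(action);  // run through a validated tool or worker
    state.iteration += 1;
    state.spent += obs.cost;
    state.consecutiveFailures = obs.ok ? 0 : state.consecutiveFailures + 1;
    state.trace.push(`#${state.iteration} ${action} ok=${obs.ok} ${obs.note}`);

    if (obs.ok && obs.done) return { stop: 'done', state };
    if (state.consecutiveFailures >= limits.maxConsecutiveFailures) return { stop: 'escalate', state };
    // Otherwise retry or re-plan on the next iteration via chooseAction.
  }
}

// Scripted worker: fails once, then succeeds on the third step.
const script: Observation[] = [
  { ok: false, done: false, cost: 1, note: 'timeout' },
  { ok: true, done: false, cost: 1, note: 'partial' },
  { ok: true, done: true, cost: 1, note: 'complete' },
];
let step = 0;
runLoop(
  'summarize failing evals',
  { maxIterations: 5, maxCost: 10, maxConsecutiveFailures: 2 },
  (s) => (s.consecutiveFailures > 0 ? 'retry' : 'next'),
  async () => script[step++],
).then(({ stop, state }) => console.log(stop, state.iteration)); // prints "done 3"
```

Returning the stop reason alongside the full state keeps failed steps visible in the trace instead of folding them into a single success or failure flag.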

Implementation Notes

Keep the pattern boundary explicit: inputs, state, side effects, and outputs should be visible.
Validate model-produced decisions before they affect tools, users, or durable state.
Emit enough trace data to debug failures after the run.
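The validation and trace notes can be sketched together: parse a model's proposed decision, check it against an allow-list before anything acts on it, and record a trace event either way. This is a minimal sketch assuming the model returns JSON with `action` and `reason` fields. The action names, `parseDecision`, `emit`, and the in-memory `trace` array are hypothetical.

```typescript
// Hypothetical allow-list of actions a model decision may request.
const ALLOWED_ACTIONS = ['search', 'edit_prompt_draft', 'run_evals', 'stop'] as const;
type Action = (typeof ALLOWED_ACTIONS)[number];

interface Decision {
  action: Action;
  reason: string;
}

interface TraceEvent {
  at: string;
  kind: 'decision' | 'rejected';
  detail: string;
}

// In-memory trace; a real system would ship these events to its logging backend.
const trace: TraceEvent[] = [];

function emit(kind: TraceEvent['kind'], detail: string): void {
  trace.push({ at: new Date().toISOString(), kind, detail });
}

// Validate raw model output before it can reach a tool, user, or durable state.
function parseDecision(raw: string): Decision | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    emit('rejected', `not JSON: ${raw.slice(0, 80)}`);
    return null;
  }
  if (typeof data !== 'object' || data === null) {
    emit('rejected', 'not an object');
    return null;
  }
  const { action, reason } = data as Record<string, unknown>;
  if (typeof action !== 'string' || !(ALLOWED_ACTIONS as readonly string[]).includes(action)) {
    emit('rejected', `unknown action: ${String(action)}`);
    return null;
  }
  if (typeof reason !== 'string' || reason.length === 0) {
    emit('rejected', 'missing reason');
    return null;
  }
  const decision: Decision = { action: action as Action, reason };
  emit('decision', JSON.stringify(decision));
  return decision;
}

parseDecision('{"action":"run_evals","reason":"prompt draft changed"}'); // returns a Decision
parseDecision('{"action":"deploy_to_prod","reason":"looks good"}');     // returns null; rejection is traced
```

Rejections are traced rather than silently dropped, so a debugging session after the run can see what the model asked for and why it was refused.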

Failure Modes

The pattern is applied where a simpler deterministic workflow would be better.
State, tool calls, or model decisions are not observable enough to debug.
The system lacks clear stop, retry, or escalation behavior.

Evaluation Strategy

Test success cases, partial failure, repeated failure, budget exhaustion, and bad intermediate observations.
Assert that the loop stops for the right reason and does not hide failed steps.
Measure completion rate, number of iterations, recovery quality, cost, and latency.
Include cases that exercise each “Use When” condition, confirming the pattern is the right fit there.
Include negative cases from “Avoid When” so the system chooses a simpler or safer pattern when appropriate.
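The measurements above can be aggregated from per-case run records and compared against the current version before a change ships. This is a minimal sketch, assuming each eval case records whether it completed, its actual and expected stop reason, iterations, cost, and latency. `RunRecord`, `summarize`, and `regressions` are hypothetical names, and p95 latency uses the nearest-rank method.

```typescript
// Hypothetical record of one eval case run against one version of the agent.
interface RunRecord {
  caseId: string;
  completed: boolean;
  stopReason: string;
  expectedStopReason: string; // e.g. 'done' for success cases, 'budget_exhausted' for budget cases
  iterations: number;
  cost: number;
  latencyMs: number;
}

interface Summary {
  completionRate: number;
  correctStopRate: number; // share of cases that stopped for the expected reason
  meanIterations: number;
  totalCost: number;
  p95LatencyMs: number;
}

function summarize(runs: RunRecord[]): Summary {
  if (runs.length === 0) throw new Error('empty eval suite');
  const n = runs.length;
  const latencies = runs.map((r) => r.latencyMs).sort((a, b) => a - b);
  // Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
  const p95 = latencies[Math.ceil(0.95 * n) - 1];
  return {
    completionRate: runs.filter((r) => r.completed).length / n,
    correctStopRate: runs.filter((r) => r.stopReason === r.expectedStopReason).length / n,
    meanIterations: runs.reduce((s, r) => s + r.iterations, 0) / n,
    totalCost: runs.reduce((s, r) => s + r.cost, 0),
    p95LatencyMs: p95,
  };
}

// Names of metrics on which the candidate is worse than the baseline beyond the tolerance.
function regressions(baseline: Summary, candidate: Summary, tolerance = 0): string[] {
  const out: string[] = [];
  if (candidate.completionRate < baseline.completionRate - tolerance) out.push('completionRate');
  if (candidate.correctStopRate < baseline.correctStopRate - tolerance) out.push('correctStopRate');
  return out;
}
```

In a release pipeline, a non-empty result from `regressions` would block promotion; cost and latency are reported alongside so reviewers can weigh them.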

Production Checklist

Set hard iteration, cost, and time limits.
Persist state after meaningful steps if the run can be interrupted.
Make retries idempotent or add compensation.
Expose trace events for each decision, action, observation, and stop reason.
Define human escalation for ambiguous, high-risk, or policy-blocked work.
Keep the source bundle, generated chapter, tests, and deployment artifact in the same release.
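For the persistence and retry items, here is a minimal sketch assuming each side-effecting step has a stable idempotency key (here a hypothetical `run-id:step-name` string) and its result is checkpointed before the loop moves on. `runOnce` is a hypothetical helper, and the in-memory `Map` stands in for durable storage. It skips steps whose results were already recorded; a crash between the side effect and the checkpoint would still repeat the step, so the downstream service must also honor the key, or a compensating action is needed.

```typescript
// In-memory stand-in for a durable checkpoint store keyed by idempotency key.
const completed = new Map<string, unknown>();

// Run a side-effecting step at most once per key once it has succeeded,
// retrying transient failures up to maxAttempts.
async function runOnce<T>(key: string, step: () => Promise<T>, maxAttempts = 3): Promise<T> {
  if (completed.has(key)) return completed.get(key) as T; // replay after interruption: skip the side effect
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await step();
      completed.set(key, result); // checkpoint before moving on
      return result;
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`step ${key} failed after ${maxAttempts} attempts: ${String(lastError)}`);
}

let sends = 0;
const notifyReviewer = async (): Promise<string> => {
  sends += 1;
  return 'sent';
};
runOnce('run-42:notify-reviewer', notifyReviewer)
  .then(() => runOnce('run-42:notify-reviewer', notifyReviewer))
  .then(() => console.log(sends)); // prints 1: the second call reuses the stored result
```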

Code Walkthrough

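Read the excerpt as the smallest executable expression of the reflection half of the pattern: within a single run, the agent drafts an answer, critiques it, and revises it. It does not implement the cross-run release loop (evals, approval, staged rollout, rollback). The surrounding chapter explains the design constraints; the code shows where those constraints become concrete interfaces, state, validation, or control flow.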

Source Code

These excerpts show the implementation shape. The complete code is available in the download bundle and repository source.

`reflection-and-self-improvement-pattern/autogen_typescript_example/reflection_agent.ts`

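import 'dotenv/config'; // load .env into process.env before any module code reads it
import axios from 'axios';
import * as readline from 'node:readline/promises'; // promise-based readline (Node 17+)

const MISTRAL_API_KEY = process.env.MISTRAL_API_KEY;
const MISTRAL_API_URL = 'https://api.mistral.ai/v1/chat/completions';

if (!MISTRAL_API_KEY) {
  console.error('Please set MISTRAL_API_KEY in your .env file');
  process.exit(1);
}

type Role = 'system' | 'user' | 'assistant';
interface ChatMessage {
  role: Role;
  content: string;
}

// Send the full conversation to the chat completions endpoint and return the reply text.
async function askMistral(messages: ChatMessage[]): Promise<string> {
  const response = await axios.post(
    MISTRAL_API_URL,
    {
      model: 'mistral-tiny',
      messages,
    },
    {
      headers: {
        'Authorization': `Bearer ${MISTRAL_API_KEY}`,
        'Content-Type': 'application/json',
      },
    }
  );
  return response.data.choices[0].message.content.trim();
}

async function main(): Promise<void> {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  try {
    const userInput = await rl.question('Ask the agent a question: ');
    const messages: ChatMessage[] = [
      { role: 'system', content: 'You are a helpful assistant that reflects on your answers and tries to improve them if possible.' },
      { role: 'user', content: userInput },
    ];

    // First response
    const answer = await askMistral(messages);
    console.log('\nInitial Answer:\n', answer);

    // Reflection step: append the draft, then ask for a critique-and-revise pass.
    // The instruction is sent as a 'user' turn: chat APIs commonly expect the last
    // message to come from the user, and a trailing 'system' message can be rejected.
    messages.push({ role: 'assistant', content: answer });
    messages.push({ role: 'user', content: 'Reflect on your previous answer. Was it correct, clear, and complete? If not, revise and improve it.' });
    const reflection = await askMistral(messages);
    console.log('\nReflected/Improved Answer:\n', reflection);
  } finally {
    // Close the prompt even if an API call fails, so the process can exit.
    rl.close();
  }
}

main().catch((err) => {
  console.error('Reflection run failed:', err instanceof Error ? err.message : err);
  process.exitCode = 1;
});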
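The excerpt runs exactly one reflection pass and prints whatever comes back, with no check on whether the revision is better. The following is a minimal sketch of bounding that step, assuming a critic prompt that replies with the literal token `APPROVED` when no change is needed (a hypothetical convention, not part of the excerpt). The model call is injected as a `Complete` function, so `askMistral` or a stub can be passed in, and `maxRounds` acts as the hard iteration limit. `reflectUntilApproved` is a hypothetical name.

```typescript
type Role = 'system' | 'user' | 'assistant';
interface ChatMessage {
  role: Role;
  content: string;
}
// Any chat-completion call with this shape can be injected.
type Complete = (messages: ChatMessage[]) => Promise<string>;

// Hypothetical convention: the critic replies with exactly this token when no revision is needed.
const APPROVED = 'APPROVED';

async function reflectUntilApproved(
  question: string,
  complete: Complete,
  maxRounds = 3,
): Promise<{ answer: string; rounds: number; approved: boolean }> {
  let answer = await complete([{ role: 'user', content: question }]);
  for (let round = 1; round <= maxRounds; round++) {
    // Critique step: ask for approval or a list of problems.
    const critique = await complete([
      { role: 'user', content: `Question: ${question}\n\nAnswer: ${answer}\n\nIf the answer is correct, clear, and complete, reply ${APPROVED}. Otherwise list the problems.` },
    ]);
    if (critique.trim() === APPROVED) return { answer, rounds: round, approved: true };
    // Revise step: rewrite the draft using the critique.
    answer = await complete([
      { role: 'user', content: `Question: ${question}\n\nDraft: ${answer}\n\nProblems: ${critique}\n\nWrite an improved answer.` },
    ]);
  }
  return { answer, rounds: maxRounds, approved: false };
}

// Stub model: the critic rejects the first draft and approves the revision.
let calls = 0;
const stub: Complete = async () => {
  calls += 1;
  if (calls === 1) return 'draft answer';
  if (calls === 2) return 'Missing an example.';
  if (calls === 3) return 'revised answer with an example';
  return APPROVED;
};
reflectUntilApproved('What is reflection?', stub).then((r) => console.log(r.approved, r.rounds)); // prints "true 2"
```

Returning `approved: false` after `maxRounds` gives the caller an explicit stop reason to log or escalate, instead of treating the last revision as accepted.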

`reflection-and-self-improvement-pattern/langgraph_python_example/reflection_agent.py`