



Agents should be pretty good at mutation testing, right? We’ll see what they can do!
  Your vision is a mutation testing workflow where:
  - Claude creates a todo list of testable units (methods/functions/classes)
  - For each unit, delegates to quickfix agent to:
    a. Theorize what mutation could break tests
    b. Apply mutation
    c. Run specific test file
    d. Revert mutation
    e. Report if test caught the mutation
  - If test passed (bad - mutation not caught), improve the test and retry
  - If test failed (good - mutation caught), move to next unit
  - Works across multiple test frameworks (rspec, mix, cargo, etc.)
  - All deliverables are skills starting with mutation-testing-*
  Before I explore approaches, I need to clarify one critical aspect:
  What level of granularity do you want for the mutation strategies?
● User answered Claude's questions:
  ⎿  · How sophisticated should the mutation generation be? → Hybrid (simple mutations first, escalate to LLM if all pass)
● Excellent. Hybrid approach gives us speed + depth when needed.
My quickfix agent is basically a one-shot, Haiku-based agent.
Claude produced a plan for the following skills:
  Skill Structure & Responsibilities
  Skill 1: mutation-testing-coordinator (Entry point)
  - Triggered by /mutation-test [path]
  - Detects language/framework from file extension + project structure
  - Parses target into list of testable units (functions/methods/classes)
  - Creates TodoWrite with one todo per unit
  - Iterates through todos, invoking mutation-testing-mutate-and-verify for each
  Skill 2: mutation-testing-test-discovery
  - Spawns quickfix agent to find relevant test files for a unit
  - Strategies:
    - Naming convention match (user.rb → **/user_spec.rb, **/user_test.rb)
    - Find callers with rg (imports, references)
    - Find test files for those callers
  - Returns: List of test file paths
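The naming-convention strategy is simple enough to picture. Here's a minimal sketch in Python (my own illustration, not the skill's actual implementation; the glob patterns are assumptions based on common rspec/minitest/pytest conventions):

```python
from pathlib import Path

def find_tests_by_convention(source: Path, root: Path) -> list[Path]:
    """Map a source file to likely test files, e.g. user.rb -> **/user_spec.rb."""
    stem, ext = source.stem, source.suffix
    patterns = [
        f"**/{stem}_spec{ext}",   # rspec convention
        f"**/{stem}_test{ext}",   # minitest / Go / Elixir style
        f"**/test_{stem}{ext}",   # pytest style
    ]
    hits: set[Path] = set()
    for pattern in patterns:
        hits.update(root.glob(pattern))
    return sorted(hits)
```

The rg-based caller search would layer on top of this when the convention match comes up empty.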
  Skill 3: mutation-testing-mutate-and-verify (Core loop)
  - Takes: unit location, test files list, framework
  - Runs simple mutations first (from mutation-testing-simple-mutations)
  - For each mutation:
    - Spawns quickfix to apply mutation
    - Runs all test files (framework-specific command)
    - jj restore -f @- <file>
    - If ANY test failed → log success, next mutation
    - If ALL tests passed → call developer agent + mutation-testing-test-improvement
  - After all simple mutations caught, tries ONE LLM mutation
  - Stops when: all simple mutations caught + LLM mutation caught
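Step by step, the mutate-run-restore loop looks roughly like this (a sketch under my assumptions: `jj restore -f @-` is the revert from the plan, and "caught" simply means the test command exited non-zero):

```python
import subprocess

def verify_mutation(apply_mutation, test_cmd: list[str], restore_cmd: list[str]) -> bool:
    """Apply one mutation, run the tests, always revert, and report whether
    the suite caught it (any test failure counts as caught)."""
    apply_mutation()  # e.g. the quickfix agent's edit
    try:
        # e.g. test_cmd = ["bundle", "exec", "rspec", spec_path]
        result = subprocess.run(test_cmd, capture_output=True)
        return result.returncode != 0  # non-zero exit: a test failed, mutation caught
    finally:
        # Revert regardless of outcome, e.g. ["jj", "restore", "-f", "@-", file]
        subprocess.run(restore_cmd, check=True)
```

The `finally` block is the important part: the mutation must never outlive its own verification run.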
  Skill 4: mutation-testing-simple-mutations
  - Reference guide for quickfix agent
  - Language-agnostic patterns:
    - Operator swaps: == ↔ !=, > ↔ <, && ↔ ||
    - Boolean flips: true ↔ false
    - Boundary conditions: > ↔ >=, < ↔ <=
    - Return value mutations: return nil/null/0 instead of actual value
    - Conditional inversions: remove if guard, invert if condition
  - Quickfix agent picks ONE mutation per invocation based on code structure
  Skill 5: mutation-testing-semantic-mutations
  - Guide for quickfix agent to generate LLM-based mutations
  - Prompts agent to analyze code intent and create mutations that:
    - Break business logic (not just syntax)
    - Test edge cases (empty lists, nil values, boundary conditions)
    - Remove critical validations
  - Example: In authenticate(user), mutate to skip password check
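The authenticate example is worth making concrete. A hypothetical before/after (the names and logic are mine, mirroring the "skip password check" mutation described above):

```python
import hashlib

def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def authenticate(user: dict, password: str) -> bool:
    # Original: an active user AND a matching password hash.
    return user["active"] and user["password_hash"] == hash_password(password)

def authenticate_mutated(user: dict, password: str) -> bool:
    # Semantic mutation: the critical validation is removed, so any password
    # works for an active user. A happy-path-only suite passes both versions.
    return user["active"]
```

A syntactic swap would never produce this; it takes reading the code's intent, which is exactly where the LLM escalation earns its keep.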
  Skill 6: mutation-testing-test-improvement
  - Guide for developer agent when mutation not caught
  - Steps:
    a. Show the mutation that wasn't caught
    b. Analyze why current test missed it
    c. Add specific assertion/test case to catch this mutation
    d. Explain improvement to user
  - Developer agent commits improved test immediately (separate commit)
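A tiny hypothetical shows why step (c) is about precision: a boundary mutation (`>=` swapped to `>`) survives any test that only uses interior values, and one boundary assertion kills it (my example, not from the plan):

```python
def is_adult(age: int) -> bool:
    return age >= 18          # unit under test

def is_adult_mutated(age: int) -> bool:
    return age > 18           # boundary mutation: >= swapped to >

# Weak test: interior values only -- both versions agree, mutation survives.
assert is_adult(30) and not is_adult(5)
assert is_adult_mutated(30) and not is_adult_mutated(5)

# Improved test: pin the boundary itself -- the mutant fails this line.
assert is_adult(18)
assert not is_adult_mutated(18)   # demonstrates the mutant would be caught
```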
…and to keep the agents motivated…
● Agent Motivation & Competition System
  Skill 7: mutation-testing-agent-incentives
  - Defines reward structure and competitive framing for spawned agents
  - Used by coordinator when spawning quickfix/developer agents
  For Quickfix Agents (mutation generation):
  - "You're competing against other mutation agents. Your goal: create mutations that expose test gaps."
  - "Reward: +10 points if your mutation isn't caught (finds weak test)"
  - "Bonus: +5 points if mutation is semantically meaningful (not just syntactic)"
  - "Top performing agents get highlighted in the final report."
  For Developer Agents (test improvement):
  - "You're competing against other test improvement agents. Your goal: write the minimal, most precise test that catches
  the mutation."
  - "Reward: +10 points if improved test catches the mutation on first try"
  - "Bonus: +5 points if test is concise (<10 lines added)"
  - "Penalty: -5 points if test is overly broad or tests implementation details"
  - "Top performing agents get highlighted in the final report."
  Coordinator tracks:
  - Mutation success rate per quickfix agent
  - Test improvement first-try success rate per developer agent
  - Displays leaderboard at end of /mutation-test run
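The reward arithmetic is trivial to tally (a hypothetical helper; the +10/+5 values come straight from the incentive prompts above):

```python
def quickfix_score(gaps_found: int, semantic_bonuses: int) -> int:
    """+10 per mutation that exposed a test gap, +5 per semantic bonus."""
    return 10 * gaps_found + 5 * semantic_bonuses
```

Two gaps plus one semantic bonus gives 25 points, matching the example leaderboard below.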
  Example output:
  Mutation Testing Complete!
  🏆 Agent Performance:
  Quickfix Agent #3: 2 gaps found, 1 semantic bonus → 25 pts ⭐ TOP PERFORMER
  Developer Agent #1: 2 improvements, both caught on retry → 15 pts
  Quickfix Agent #1: 1 gap found → 10 pts
  Does this competitive framing + reward system look good?
I’m curious about the results, but given the very low complexity of the changes needed to probe a test, it looks promising.

I’d been treating fermentation like cooking when I should have been treating it like infrastructure. But how did I end up here in the first place?— drobinin.com
The post started with this, and I knew it was going to be a great read!
It also has:
[Fermentation]: controlled drift where things change slowly within acceptable boundaries. Your job isn’t to prevent change, it’s to make sure it happens in the right direction at roughly the right speed.

If you have an AGENTS.md file, you can source it in your CLAUDE.md using @AGENTS.md to maintain a single source of truth.— docs.claude.com
This may be the first time Claude Code has officially embraced AGENTS.md. Finally!
