Agents should be pretty good at mutation testing, right? We’ll see what they can do!
Claude Code plan output
Your vision is a mutation testing workflow where:
- Claude creates a todo list of testable units (methods/functions/classes)
- For each unit, delegates to quickfix agent to:
  a. Theorize what mutation could break tests
  b. Apply mutation
  c. Run specific test file
  d. Revert mutation
  e. Report if test caught the mutation
- If test passed (bad - mutation not caught), improve the test and retry
- If test failed (good - mutation caught), move to next unit
- Works across multiple test frameworks (rspec, mix, cargo, etc.)
- All deliverables are skills starting with mutation-testing-*
Before I explore approaches, I need to clarify one critical aspect:
What level of granularity do you want for the mutation strategies?
● User answered Claude's questions:
⎿ · How sophisticated should the mutation generation be? → Hybrid (simple mutations first, escalate to LLM if all pass)
● Excellent. Hybrid approach gives us speed + depth when needed.
My quickfix agent is basically one-shot and Haiku-based.
Claude produced a plan for the following skills:
Claude Code plan output
Skill Structure & Responsibilities
Skill 1: mutation-testing-coordinator (Entry point)
- Triggered by /mutation-test [path]
- Detects language/framework from file extension + project structure
- Parses target into list of testable units (functions/methods/classes)
- Creates TodoWrite with one todo per unit
- Iterates through todos, invoking mutation-testing-mutate-and-verify for each
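To make the detection step concrete, here's a minimal Ruby sketch of the kind of extension-to-framework table the coordinator could consult (the mapping and `detect_framework` are my own illustration, not part of Claude's plan):

```ruby
# Hypothetical mapping from source file extension to framework + test command.
FRAMEWORKS = {
  ".rb" => { framework: "rspec", test_cmd: "bundle exec rspec" },
  ".ex" => { framework: "mix",   test_cmd: "mix test" },
  ".rs" => { framework: "cargo", test_cmd: "cargo test" },
}.freeze

def detect_framework(path)
  FRAMEWORKS.fetch(File.extname(path)) do
    raise ArgumentError, "no framework mapping for #{path}"
  end
end

detect_framework("app/models/user.rb")
# => { framework: "rspec", test_cmd: "bundle exec rspec" }
```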
Skill 2: mutation-testing-test-discovery
- Spawns quickfix agent to find relevant test files for a unit
- Strategies:
  - Naming convention match (user.rb → **/user_spec.rb, **/user_test.rb)
  - Find callers with rg (imports, references)
  - Find test files for those callers
- Returns: List of test file paths
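As a rough Ruby sketch of those two strategies (the helper names are hypothetical; only the rg part comes from the plan):

```ruby
require "shellwords"

# Strategy 1: naming-convention match (user.rb -> **/user_spec.rb, **/user_test.rb).
def tests_by_convention(source_path)
  base = File.basename(source_path, ".*")
  Dir.glob(["**/#{base}_spec.rb", "**/#{base}_test.rb"])
end

# Strategy 2: find callers with rg, then run the same lookup on each caller.
def callers_of(base)
  `rg -l #{Shellwords.escape(base)} --type ruby`.split("\n")
end

tests_by_convention("app/models/user.rb")
# => ["spec/models/user_spec.rb"]
```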
Skill 3: mutation-testing-mutate-and-verify (Core loop)
- Takes: unit location, test files list, framework
- Runs simple mutations first (from mutation-testing-simple-mutations)
- For each mutation:
  - Spawns quickfix to apply mutation
  - Runs all test files (framework-specific command)
  - Reverts with jj restore -f @- <file>
  - If ANY test failed → log success, next mutation
  - If ALL tests passed → call developer agent + mutation-testing-test-improvement
- After all simple mutations caught, tries ONE LLM mutation
- Stops when: all simple mutations caught + LLM mutation caught
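Stripped of the agent plumbing, the core loop reads roughly like this in Ruby; `apply_mutation!` and `improve_test!` are stand-ins for the quickfix and developer agents, and the `jj restore` call is the revert step from the plan:

```ruby
def mutate_and_verify(unit, test_files, test_cmd)
  SIMPLE_MUTATIONS.each do |mutation|
    apply_mutation!(unit, mutation)                 # quickfix agent's job
    # system returns false when any test fails, i.e. the mutation was caught.
    survived = system("#{test_cmd} #{test_files.join(' ')}")
    system("jj", "restore", "-f", "@-", unit.file)  # revert the mutation
    if survived
      improve_test!(unit, mutation)                 # developer agent's job
      redo                                          # retry the same mutation
    end
  end
end
```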
Skill 4: mutation-testing-simple-mutations
- Reference guide for quickfix agent
- Language-agnostic patterns:
  - Operator swaps: == ↔ !=, > ↔ <, && ↔ ||
  - Boolean flips: true ↔ false
  - Boundary conditions: > ↔ >=, < ↔ <=
  - Return value mutations: return nil/null/0 instead of actual value
  - Conditional inversions: remove if guard, invert if condition
- Quickfix agent picks ONE mutation per invocation based on code structure
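To see why one tiny edit is enough, here's a boundary-condition mutation in Ruby (my example, not from the plan):

```ruby
# Original: exactly 18 counts as adult.
def adult?(user)
  user.age >= 18
end

# Mutated: >= swapped to >. A spec that checks age 18 exactly catches
# this; a spec that only checks age 30 lets the mutation survive.
def adult?(user)
  user.age > 18
end
```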
Skill 5: mutation-testing-semantic-mutations
- Guide for quickfix agent to generate LLM-based mutations
- Prompts agent to analyze code intent and create mutations that:
  - Break business logic (not just syntax)
  - Test edge cases (empty lists, nil values, boundary conditions)
  - Remove critical validations
- Example: In authenticate(user), mutate to skip password check
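In Ruby, that authenticate example might look like this (hypothetical code):

```ruby
# Original: an active user AND a valid password are both required.
def authenticate(user, password)
  user.active? && user.valid_password?(password)
end

# Semantic mutation: the password check is dropped. Any test that only
# authenticates with a correct password still passes, so it won't catch this.
def authenticate(user, _password)
  user.active?
end
```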
Skill 6: mutation-testing-test-improvement
- Guide for developer agent when mutation not caught
- Steps:
  a. Show the mutation that wasn't caught
  b. Analyze why current test missed it
  c. Add specific assertion/test case to catch this mutation
  d. Explain improvement to user
- Developer agent commits improved test immediately (separate commit)
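Continuing the boundary example from Skill 4, the improvement could be a single pinned-boundary spec (hypothetical RSpec):

```ruby
RSpec.describe "adult?" do
  it "treats exactly 18 as adult" do
    # Pins the boundary value, so the >= -> > mutation now fails this example.
    expect(adult?(User.new(age: 18))).to be true
  end
end
```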
…and to keep the agents motivated…
Claude Code plan output
● Agent Motivation & Competition System
Skill 7: mutation-testing-agent-incentives
- Defines reward structure and competitive framing for spawned agents
- Used by coordinator when spawning quickfix/developer agents
For Quickfix Agents (mutation generation):
- "You're competing against other mutation agents. Your goal: create mutations that expose test gaps."
- "Reward: +10 points if your mutation isn't caught (finds weak test)"
- "Bonus: +5 points if mutation is semantically meaningful (not just syntactic)"
- "Top performing agents get highlighted in the final report."
For Developer Agents (test improvement):
- "You're competing against other test improvement agents. Your goal: write the minimal, most precise test that catches
the mutation."
- "Reward: +10 points if improved test catches the mutation on first try"
- "Bonus: +5 points if test is concise (<10 lines added)"
- "Penalty: -5 points if test is overly broad or tests implementation details"
- "Top performing agents get highlighted in the final report."
Coordinator tracks:
- Mutation success rate per quickfix agent
- Test improvement first-try success rate per developer agent
- Displays leaderboard at end of /mutation-test run
Example output:
Mutation Testing Complete!
🏆 Agent Performance:
Quickfix Agent #3: 2 gaps found, 1 semantic bonus → 25 pts ⭐ TOP PERFORMER
Developer Agent #1: 2 improvements, both caught on retry → 15 pts
Quickfix Agent #1: 1 gap found → 10 pts
Does this competitive framing + reward system look good?
I’m curious about the result, but given the very low complexity of the changes required to test something, it looks promising.