Testing, Testing, and More Testing: A Four-Tier Strategy for Agent Workflows

💡 In the Agent era, errors are produced and spread faster than ever. Traditional testing processes can’t keep up with development speed anymore. This article proposes a four-tier testing strategy—Unit, API, Smoke, and E2E—with detailed timing for each tier within Agent workflows, helping developers balance efficiency and quality.

Agent has made code generation incredibly fast, but it’s also amplified something else exponentially: errors are produced faster and spread faster too. Previously I could make three logic changes in a day; now an Agent does it in minutes. A bug used to be a single function gone wrong, but now the Agent might produce:

• The function itself is fine, but the interface contract changed • It runs locally, but breaks after merging to branch • Individual features work, but break other chains when released

So testing can no longer be defined as a post-development process—it needs to be part of the Agent development orchestration. This article shares how I approach this now.

Four Types of Testing

My primary definitions are: unit tests, API tests, smoke tests, and E2E tests. These different test levels need to run at different times, otherwise they’ll slow down development. Unit tests catch low-level errors; API tests catch collaboration errors; smoke tests catch integration incidents; E2E tests catch user experience failures.

Test Layer The Real Question It Answers What Happens If Missing
Unit Test Did this local change break the most basic logic? Bugs slip through at the cheapest stage
API Test Can modules still collaborate per contract? Individual features work, but integration fails
Smoke Test After merging, are core paths still alive? Looks mergeable, but explodes on release
E2E Test Does the full user journey still work? Engineering works, but user paths break

When Each Test Should Run

Test Type Best Timing Primary Goal Worst Misuse
Unit Test Run immediately after Agent completes a local function/module change Catch local logic errors fast Using it to verify entire business flows
API Test Before submitting feature branch Verify module contracts, I/O, dependencies Over-relying on mocks, testing fake APIs
Smoke Test Before PR merge or before merging to release Confirm core paths survive Cramming too many scenarios
E2E Test Release candidate, before critical launch Verify real user journeys Making it the daily dev inner loop

Earlier tests should be faster, narrower, and cheaper; later tests should be fewer, heavier, and closer to production.

When Should Unit Tests Run?

Unit tests aren’t for pre-release—they’re part of the Agent development inner loop.

For these types of local, deterministic, cheap-feedback tasks, unit tests should run immediately:

• Pure function changes • Rule logic changes • State transition changes • Schema mapping changes • Data cleaning logic changes • Tool parameter assembly • Import/export field handling

In my Agent workflow, unit tests are attached to two points: run automatically after each local implementation, and run again before Agent finishes this round of modifications. These tests need to be fast, so they’re best added by the Agent automatically, triggered after local logic changes, with commits for each small feature point and pre-commit hooks to run related test files.

Change Type When to Run Unit Tests
New pure function Run immediately after writing
Modify existing rule Run related cases immediately
Fix bug Add regression test first, then code changes, then run
Refactor implementation (no behavior change) Run immediately after change

When Should API Tests Run?

With limited context leading to memory inaccuracy, or multiple agents modifying simultaneously, these issues frequently occur:

• Parameter name changed, caller still uses old field • Return structure changed, downstream parser not updated • Tool call order looks fine, but state semantics changed completely • DB write succeeded, but API response contract changed • One module added default value, another module falsely judges success

So before submitting a feature branch or opening a PR, a complete API test is needed.

My API tests cover:

• Frontend-backend API contracts • Tool call input/output contracts • DB read/write boundaries • Webhook/event payloads • File import/export structures • Prompt output schemas • Agent step state transfer formats

API tests are usually slower than unit tests, but not requiring CI yet. So I prioritize: after Agent finishes feature self-test, let it complete all API test code and verify. Trigger via hook before pre-push, and ideally run again before PR creation.

When Should Smoke Tests Run?

Smoke tests verify whether the core flow breaks after changes enter integration state. Since I manage worktree tests myself, smoke tests run at only two points: before PR merges to worktree, and before merging to the test release version.

Smoke tests basically only check:

• Application can start • Core pages can open and return data • Key APIs return normally • 1~2 main paths can run through • Key dependencies (DB/cache/queue/model service) not broken

One note: don’t write it as a mini E2E. This layer doesn’t need comprehensiveness—it needs key paths alive. I define it in CI to run on every PR/merge, and on every release.

When Should E2E Tests Run?

E2E tests are expensive but also an important guard. However, they shouldn’t be the Agent daily development inner loop. Best timing is usually after merging to release, recording frequently-missed high-risk main paths, especially for core feature modules that might have frequent changes.

E2E should verify user paths are established, ensuring business flows are smooth from start to end. My approach is adding nightly scheduled tasks in CI for more complete E2E suites. Only add local runs for high-risk changes.

My Current Flow: From Feature Issue to Release, Which Tests Should Run?

Based on my development rhythm, here’s how I designed it:

Stage Goal Required Tests
Create feature issue Define acceptance criteria Write test strategy first, don’t run yet
Agent develops local logic Catch low-level errors fast Unit tests
Feature complete Verify module collaboration API/contract tests
Open PR / prepare merge Prevent pollution to main Smoke tests + necessary regression
Merge to release candidate Verify integration stability Release smoke test
Before production Verify key user paths Key E2E

Test Section That Should Be in an Issue Template

## Feature
Support importing new vendor CSV format

## Risks
- Field mapping changes
- Null value handling
- List page display after import
- Old format compatibility

## Required tests
- [ ] Unit tests: field mapping / null handling / schema validation
- [ ] API tests: import API response / DB write result
- [ ] Smoke tests: list page viewable after import
- [ ] E2E: only on release candidate - "upload CSV → import success → list visible" main path

## Merge gate
- Unit tests pass
- API tests pass
- smoke passes

This isn’t the complete template—it’s an addition to my previous issue template. Check previous articles for the full template.

All test files need to be written by the Agent during development. This article focuses on timing for each test type. These logics can all be defined via hooks or Agent skills. I feel I’m already the slowest part of the development process, so I write more tests to reduce my workload.