Testing, Testing, and More Testing: A Four-Tier Strategy for Agent Workflows

💡 In the Agent era, errors are produced and spread faster than ever. Traditional testing processes can’t keep up with development speed anymore. This article proposes a four-tier testing strategy—Unit, API, Smoke, and E2E—with detailed timing for each tier within Agent workflows, helping developers balance efficiency and quality.

Agents have made code generation incredibly fast, but they have also amplified something else: errors are produced faster and spread faster too. Previously I might make three logic changes in a day; now an Agent makes them in minutes. A bug used to be a single function gone wrong, but now an Agent’s change might look like this:

• The function itself is fine, but the interface contract changed
• It runs locally, but breaks after merging to the branch
• Individual features work, but break other chains when released

So testing can no longer be defined as a post-development process—it needs to be part of the Agent development orchestration. This article shares how I approach this now.

Four Types of Testing

I work with four tiers: unit tests, API tests, smoke tests, and E2E tests. Each tier needs to run at a different time; otherwise the tests slow development down. Unit tests catch low-level errors; API tests catch collaboration errors; smoke tests catch integration incidents; E2E tests catch user experience failures.

Test Layer	The Real Question It Answers	What Happens If Missing
Unit Test	Did this local change break the most basic logic?	Bugs slip through at the cheapest stage
API Test	Can modules still collaborate per contract?	Individual features work, but integration fails
Smoke Test	After merging, are core paths still alive?	Looks mergeable, but explodes on release
E2E Test	Does the full user journey still work?	Engineering works, but user paths break

When Each Test Should Run

Test Type	Best Timing	Primary Goal	Worst Misuse
Unit Test	Run immediately after Agent completes a local function/module change	Catch local logic errors fast	Using it to verify entire business flows
API Test	Before submitting feature branch	Verify module contracts, I/O, dependencies	Over-relying on mocks, testing fake APIs
Smoke Test	Before PR merge or before merging to release	Confirm core paths survive	Cramming too many scenarios
E2E Test	Release candidate, before critical launch	Verify real user journeys	Making it the daily dev inner loop

Earlier tests should be faster, narrower, and cheaper; later tests should be fewer, heavier, and closer to production.

When Should Unit Tests Run?

Unit tests aren’t for pre-release—they’re part of the Agent development inner loop.

For these types of local, deterministic, cheap-feedback tasks, unit tests should run immediately:

• Pure function changes
• Rule logic changes
• State transition changes
• Schema mapping changes
• Data cleaning logic changes
• Tool parameter assembly
• Import/export field handling

In my Agent workflow, unit tests are attached to two points: they run automatically after each local implementation, and again before the Agent finishes its current round of modifications. These tests need to be fast, so it works best to have the Agent add them automatically and trigger them after local logic changes, with one commit per small feature point and a pre-commit hook that runs the related test files.

Change Type	When to Run Unit Tests
New pure function	Run immediately after writing
Modify existing rule	Run related cases immediately
Fix bug	Add regression test first, then code changes, then run
Refactor implementation (no behavior change)	Run immediately after change
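
One way to keep the pre-commit hook fast is to run only the test files related to what changed. Here is a minimal sketch of that selection step. The naming convention (`src/.../mapping.py` maps to `tests/test_mapping.py`) and the function name `related_test_paths` are assumptions for illustration, not something a particular tool prescribes.

```python
from pathlib import PurePosixPath


def related_test_paths(changed_files, test_dir="tests"):
    """Map changed source files to the test files a pre-commit hook should run.

    Assumed (hypothetical) convention: src/pkg/mapping.py -> tests/test_mapping.py.
    """
    selected = []
    for name in changed_files:
        path = PurePosixPath(name)
        if path.suffix != ".py":
            continue  # non-Python changes have no unit tests under this convention
        if path.name.startswith("test_"):
            candidate = str(path)  # a changed test file is run as-is
        else:
            candidate = str(PurePosixPath(test_dir) / f"test_{path.name}")
        if candidate not in selected:
            selected.append(candidate)  # keep order, drop duplicates
    return selected
```

In a hook, the input could come from `git diff --cached --name-only`, and the result could be passed to the test runner. The sketch does not check that the candidate files exist, so a real hook would likely filter out missing paths first.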

When Should API Tests Run?

When context is limited and the Agent’s memory of the code becomes inaccurate, or when multiple agents modify the code simultaneously, these issues occur frequently:

• A parameter name changed, but the caller still uses the old field
• The return structure changed, but the downstream parser was not updated
• The tool call order looks fine, but the state semantics changed completely
• The DB write succeeded, but the API response contract changed
• One module added a default value, and another module wrongly treats the result as success

So before submitting a feature branch or opening a PR, a complete API test is needed.

My API tests cover:

• Frontend-backend API contracts
• Tool call input/output contracts
• DB read/write boundaries
• Webhook/event payloads
• File import/export structures
• Prompt output schemas
• Agent step state transfer formats

API tests are usually slower than unit tests, but they do not need CI yet. So my priority is: after the Agent finishes self-testing the feature, have it write all the API test code and verify that it passes. I trigger these tests via a pre-push hook, and ideally run them again before creating the PR.
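
To make the idea of a contract check concrete, here is a minimal sketch that compares a response payload against an expected set of fields and types. The contract `IMPORT_RESPONSE_CONTRACT` and its field names are hypothetical; a real project might use a schema library instead of this hand-rolled check.

```python
# Hypothetical contract for an import endpoint: field name -> expected type.
IMPORT_RESPONSE_CONTRACT = {"status": str, "imported_rows": int, "errors": list}


def contract_violations(payload, contract):
    """Return a list of human-readable contract violations (empty means OK)."""
    problems = []
    # Every contracted field must be present with the expected type.
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    # Fields outside the contract are reported too, since they often signal a rename.
    for field in payload:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems
```

A renamed field shows up as one missing field plus one unexpected field, which is exactly the "parameter name changed, caller still uses old field" case described above.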

When Should Smoke Tests Run?

Smoke tests verify whether the core flow breaks after changes enter an integration state. Since I manage worktree tests myself, smoke tests run at only two points: before a PR merges into the worktree, and before merging into the test release version.

Smoke tests basically only check:

• Application can start
• Core pages can open and return data
• Key APIs return normally
• 1-2 main paths can run through
• Key dependencies (DB/cache/queue/model service) not broken

One note: don’t write it as a mini E2E. This layer doesn’t need comprehensiveness—it needs key paths alive. I define it in CI to run on every PR/merge, and on every release.
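
A smoke suite at this level can be as small as a list of probes. The sketch below assumes the application exposes HTTP endpoints for its core paths; the URLs in `SMOKE_CHECKS` are hypothetical placeholders, and the `probe` parameter exists so the runner can be exercised without a live server.

```python
from urllib.request import urlopen


def http_ok(url, timeout=5):
    """Probe one URL; any exception or non-2xx status counts as failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:
        return False


def run_smoke(checks, probe=http_ok):
    """Run each (name, url) check and return the names that failed."""
    return [name for name, url in checks if not probe(url)]


# Hypothetical endpoints; replace with the real core paths.
SMOKE_CHECKS = [
    ("app starts", "http://localhost:8000/health"),
    ("list page returns data", "http://localhost:8000/api/items"),
    ("database reachable", "http://localhost:8000/health/db"),
]
```

A CI step could call `run_smoke(SMOKE_CHECKS)` and fail the job when the returned list is non-empty. Keeping the list this short is deliberate: the goal is key paths alive, not coverage.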

When Should E2E Tests Run?

E2E tests are expensive, but they are also an important guard. However, they shouldn’t be part of the Agent’s daily development inner loop. The best timing is usually after merging to release, covering the high-risk main paths that are most often missed, especially in core feature modules that change frequently.

E2E tests should verify that user paths hold together, ensuring business flows run smoothly from start to end. My approach is to add nightly scheduled tasks in CI that run the more complete E2E suites, and to run E2E locally only for high-risk changes.
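
Deciding what counts as a high-risk change can itself be automated. Here is a rough sketch, assuming high-risk areas can be described by path patterns; the patterns in `HIGH_RISK_PATTERNS` are hypothetical and would differ per project.

```python
from fnmatch import fnmatch

# Hypothetical high-risk areas; each project would maintain its own list.
HIGH_RISK_PATTERNS = ["src/importer/*", "src/export/*", "migrations/*"]


def needs_local_e2e(changed_files, patterns=HIGH_RISK_PATTERNS):
    """Return True when any changed file falls under a high-risk pattern."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in patterns
    )
```

An Agent skill or hook could call this with the list of changed files and run the local E2E suite only when it returns `True`, leaving everything else to the nightly run.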

My Current Flow: From Feature Issue to Release, Which Tests Should Run?

Based on my development rhythm, here’s how I designed it:

Stage	Goal	Required Tests
Create feature issue	Define acceptance criteria	Write test strategy first, don’t run yet
Agent develops local logic	Catch low-level errors fast	Unit tests
Feature complete	Verify module collaboration	API/contract tests
Open PR / prepare merge	Prevent pollution to main	Smoke tests + necessary regression
Merge to release candidate	Verify integration stability	Release smoke test
Before production	Verify key user paths	Key E2E
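
The table above can also be written down as data, so that a hook or Agent skill can check whether a stage's required tests have passed. This is only a sketch: the stage and tier identifiers are illustrative names chosen to mirror the table, not part of any existing tool.

```python
# Stage -> test tiers required at that stage, mirroring the table above.
REQUIRED_TESTS = {
    "create_issue": [],  # test strategy is written, nothing runs yet
    "local_logic": ["unit"],
    "feature_complete": ["api"],
    "open_pr": ["smoke", "regression"],
    "release_candidate": ["release_smoke"],
    "before_production": ["e2e"],
}


def missing_tests(stage, passed):
    """Return the tiers required at `stage` that are not in `passed`."""
    return [tier for tier in REQUIRED_TESTS[stage] if tier not in passed]


def may_advance(stage, passed):
    """A stage is cleared only when none of its required tiers are missing."""
    return not missing_tests(stage, passed)
```

For example, `missing_tests("open_pr", {"smoke"})` reports that the regression tier has not passed yet, which is the kind of message an Agent can act on directly.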

Test Section That Should Be in an Issue Template

## Feature
Support importing new vendor CSV format

## Risks
- Field mapping changes
- Null value handling
- List page display after import
- Old format compatibility

## Required tests
- [ ] Unit tests: field mapping / null handling / schema validation
- [ ] API tests: import API response / DB write result
- [ ] Smoke tests: list page viewable after import
- [ ] E2E: only on release candidate - "upload CSV → import success → list visible" main path

## Merge gate
- Unit tests pass
- API tests pass
- Smoke tests pass

This isn’t the complete template—it’s an addition to my previous issue template. Check previous articles for the full template.

All test files need to be written by the Agent during development. This article focuses on the timing for each test type. All of this logic can be defined via hooks or Agent skills. I feel I’m already the slowest part of the development process, so I write more tests to reduce my workload.