Most QA automation is built around scripts.

A script logs in, clicks a button, checks for expected text, and reports pass or fail. That is useful. It also misses a lot of the reality inside modern software.

MyTidyBiz is a SaaS-based operating system for cleaning companies that supports the full quote-to-cash workflow: quote requests, sales follow-up by SMS or email, customer acceptance, manager review, cleaner scheduling, schedule changes, property details, completed work reports, invoices, tips, and cleaner payments.

MyTidyBiz role handoffs across customer, sales, manager, and cleaner workflows
MyTidyBiz handoffs across customers, sales managers, operations managers, cleaners, messages, scheduling, payment, and backend state.

The platform manages a complex dance between customers, sales managers, operations managers, and cleaners communicating through email, SMS, and internal messages. What matters for QA is that every arrow is a handoff: messages, schedules, permissions, job status, customer expectations, payments, and backend records all have to stay consistent. Many of the important defects do not live on one screen. They happen between roles.

So we created OAQA, pronounced "Oak-A": Orchestrated Agentic Quality Assurance. It is our internal role-based autonomous AI QA system: a coordinated QA platform that behaves more like a disciplined test team.

The Problem With Fragmented QA

Take the MyTidyBiz booking process. Broken into pieces, each step looks straightforward:

A traditional automated test can validate each bullet in isolation. The hard part is maintaining context between them: the same customer, quote, property description, accepted booking, assigned cleaners, schedule changes, invoice, tip, and cleaner payout all have to stay connected.

In many QA processes, a human tester is the thing holding that context together. They remember what the customer requested, what sales promised, what the manager scheduled, and what the cleaner actually saw. That works, but it is expensive, slow to repeat, and hard to turn into durable release evidence.

The hardest defects usually sit in those context gaps.

The UI may look correct while the database state is wrong. An invitation may send successfully but create the wrong permissions. A cleaner may see the right page but the wrong booking. A manager action may appear to save while a background process fails quietly.

When that happens, a pass/fail result is not enough. The engineering team needs evidence. What role was acting? What step failed? What changed in the database? What do the logs say? Which boundary is probably broken?

That is the gap we are trying to close.

The OAQA (OakA) Model

We use multiple AI agents operating as defined personas:

OAQA model showing a central orchestrator connecting sales manager, operations manager, customer, and cleaner personas
OAQA keeps workflow context in the orchestrator while each role persona stays bounded.

The orchestrator owns the test run. It defines the workflow, validates environment readiness, assigns bounded work to each persona, checks the evidence that comes back, and produces the final report.

The role personas stay in their lanes. Each role uses isolated browser/session context and controlled test identities so actions do not bleed across personas. They know how to do their things and what to expect: if a button does not exist to let them submit a form, they will report that and move on.

The key advantage is context continuity. Each persona only needs to understand its own role, but the orchestrator maintains the larger workflow context across all of them. It knows what the customer requested, what the sales manager quoted, what the operations manager scheduled, what the cleaner accepted, what changed later, and what the system recorded underneath.

The design goal is simple: make the system act less like one giant script and more like a small QA team that knows who is responsible for what.

What A Test Run Looks Like

The process can be managed by communicating with the orchestrator persona through whatever surface has been implemented for the team: a user interface, email, Slack, Microsoft Teams, Telegram, an API call, a DevOps message, or another workflow trigger. The interface matters less than the operating model. The orchestrator receives the request, understands the target workflow or release context, and starts coordinating the run.

OAQA follows this general execution pattern:

OAQA execution workflow showing bootstrap, role call, repeated test runs and reports, and a final wrap-up
OAQA starts with setup and role readiness, loops through bounded test runs and reports, then produces the final overall wrap-up.

During the test runs, the orchestrator assigns one bounded action at a time. For example:

The worker agents do not just report "done." They return evidence: screenshots, observed state, URLs, relevant messages, and notes about anything unexpected. The orchestrator uses that evidence to decide the next step or to stop the run and produce a useful failure report.

That distinction matters. The system is not trying to maximize clicks. It is trying to maximize reliable signal.

Output For Humans And AI Teams

The output of an OAQA run is designed for two audiences.

A human and a computer both reading QA reports from the same evidence-rich OAQA output
OAQA reports are designed for human review and AI-assisted development handoff.

The first audience is the human team. Product managers, QA leads, and developers need a clear report they can trust: what workflow was tested, which personas participated, what passed, what failed, what evidence was collected, and where the likely failure boundary is. That report supports release decisions, triage, and stakeholder communication.

The second audience is the AI-assisted development team. If AI tools are going to help fix the problem, they need more than a failing test name. They need the complete workflow context and observations from the run:

That makes the OAQA report useful as both a human QA artifact and an AI development handoff. A person can review it, challenge it, and prioritize it. An AI development agent can consume it as context for debugging, code review, or repair work. This added context saves time and tokens.

How This Benefits The SDLC

This changes QA from a late-stage bottleneck into a repeatable delivery capability.

For product and engineering teams, the benefits show up in a few practical ways.

Better cross-role coverage. The system exercises workflows the way real users experience them: across roles, emails, permissions, and state transitions.

Lower manual QA load. Repetitive smoke and regression checks can run without requiring a person to manually replay the same workflow every release.

Clearer defect evidence. Failures come with context: role, step, expected state, observed state, screenshots, logs, and database clues where available.

Faster diagnosis. Engineers get a narrower failure boundary instead of a vague "the test failed" message.

Reusable regression capability. Once a workflow is modeled, it can become part of a release smoke pack or scheduled regression run.

The point is not to remove human judgment from QA. The point is to stop spending human judgment on repetitive replay when it can be spent on risk, edge cases, product quality, and release decisions.

Self-Evolving QA Workflows

Another advantage is that OAQA can support self-evolving QA workflows.

QA workflows evolving from checklists to scripts to connected workflow tests to self-updating release-aware QA packs
OAQA workflows can evolve from static checks into release-aware QA packs that adapt as the product changes.

After each pull request, the orchestrator can use user story acceptance criteria, PR comments, commit notes, and code changes as inputs. A specialized product owner persona reviews those inputs from the feature perspective. That persona is aware of the product goals and objectives, so it can reason about what changed, which roles are affected, what customer or business outcome matters, what workflow context is at risk, and what evidence should be collected.

The orchestrator can then use the product owner persona's findings to propose test scenarios and workflow updates: new checks for new features, revised steps for changed behavior, retired checks for removed functionality, and targeted regression runs for the personas most likely to be affected.

That keeps the QA model connected to the product as it evolves. Instead of a regression suite slowly drifting away from the real application, the test workflows can be updated alongside the code changes that created the need for them. The OAQA workflow pack can also be versioned with the software release, so teams know which personas, checks, and evidence requirements belong to each product version. For sensitive areas, those updates can still go through human review before becoming part of the durable QA pack.

Implementation Details That Matter

The useful part of this model is not just the prompts. It is the operating structure around the agents.

We keep a shared QA workspace separate from the application repo. That keeps test plans, role instructions, evidence requirements, and run reports organized without turning the product codebase into an experimental scratchpad.

Each persona gets isolated runtime context. Separate browser profiles and controlled identities make it possible to test realistic cross-role behavior without contaminating sessions.

The orchestrator publishes role context before execution. Each worker knows its identity, its permitted surface area, and the exact workflow it is responsible for. The system is intentionally role-bound.

We also separate reasoning from execution. The orchestrator uses the stronger reasoning model because it is responsible for sequencing, validation, and diagnosis. The worker agents can use lower-cost models because their work is bounded and repetitive. That keeps the cost profile practical.

Finally, the system is evidence-driven. A role agent is not allowed to claim success without returning observable proof. The orchestrator is responsible for validating that proof against the larger run.

What We Still Keep Guarded

Agent-based QA is powerful, but it needs boundaries.

Simple road guardrails representing disciplined boundaries for OAQA
OAQA is designed around broad context and narrow authority.

One reasonable concern is that the orchestrator knows a lot. That is intentional, but it does not mean the orchestrator should have unlimited authority. OAQA is designed around a simple principle: broad context, narrow authority.

The orchestrator can coordinate personas, compare evidence, read approved logs or state, and produce reports. It should not mutate infrastructure, override business rules, access production data casually, or act outside the boundaries defined for the run. The orchestrator is only an observer and reporter, not an operator.

We do not want agents mutating infrastructure. We do not want them improvising outside their assigned role. We do not want them handling production data casually. We do not want a confident narrative in place of evidence.

So the system is designed with guardrails:

The best use of autonomous QA agents is not unchecked autonomy. It is disciplined autonomy inside a well-designed process.

Why This Is Commercially Interesting

AI development tools generate more code, faster. The bottleneck is shifting. Teams can now produce features, fixes, and refactors at a pace that traditional QA processes were not designed to absorb. There is simply more code moving through the system, and every extra change creates more places where workflow context can break.

That is why the output layer matters commercially. OAQA does not just say that a test failed. It creates the evidence package needed for both human review and AI-assisted repair, which makes QA a faster feedback loop instead of a slower gate at the end of development.

This model is a strong fit for platforms with:

If your team is dealing with regression bottlenecks, multi-role workflow complexity, or too much time lost reproducing defects, this is exactly the kind of problem this model is designed to solve.

Reach out if your team needs assistance. We can help you stand up your own OAQA system, or we can host a virtual QA team that you can interact with through Slack, Microsoft Teams, or your preferred workflow surface.

Want to test your workflows this way?

CloudZoom can help map your critical product workflows, design role-based QA personas, and build a practical OAQA system around your real environment.

TALK THROUGH YOUR QA WORKFLOWS