Agile & BDD for QA Engineers — Scrum, Shift-Left & Quality Ownership

How QA Fits in Agile

Traditional software development placed QA at the end of the process. Development built the software for weeks or months, and QA received it just before the release deadline to find as many defects as possible before shipping. This model treated quality as a filter — something applied at the exit to catch what development missed.

Agile fundamentally inverts this relationship. In a well-functioning Agile team, QA is not a phase that happens after development; it is a mindset that runs throughout the entire sprint. The QA engineer participates from the moment a story is conceived, challenges requirements for testability before a single line of code is written, tests alongside development as features are built, and contributes to retrospectives to improve the process for the next sprint.

This shift from gatekeeper to quality owner is the most important conceptual change for QA engineers moving from Waterfall to Agile. The gatekeeper model gives QA power through approval authority. The quality owner model gives QA influence through early collaboration. The quality owner model is far more effective — and far more demanding. It requires QA engineers to have strong communication skills, domain knowledge, and the ability to work at the speed of the sprint.

The continuous testing mindset means testing never stops. Between sprint ceremonies, during development, in pull request review, in CI pipelines, in exploratory sessions — quality verification is constant rather than periodic. This requires automation for repetitive regression, manual exploratory sessions for new functionality, and API testing woven into the development cycle itself.

Scrum Cycle — Architecture Diagram

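The Scrum cycle is a repeating loop. In this example each sprint lasts two weeks, a common choice (the Scrum Guide limits a sprint to one month or less). QA participates actively in every phase:

Backlog → Sprint Planning → Sprint (2 weeks: Dev + QA parallel) → Sprint Review → Retrospective → Release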

The critical detail in this diagram is "Dev + QA parallel." In Agile, QA does not wait for all stories to be complete before testing begins. As soon as a story is marked ready for testing, QA picks it up — often while development is still working on the next story. This parallel workflow compresses the testing timeline and surfaces bugs while the developer still has the context to fix them quickly.

Scrum Ceremonies for QA

Sprint Planning

Sprint Planning is where the team selects stories from the backlog and commits to completing them in the upcoming sprint. QA's specific contributions in Sprint Planning:

Effort estimation input: QA estimates testing effort alongside development. A story that looks like 3 points of development work might be 5 points total when testing complexity is included. Complex integration scenarios, test data setup, and environment dependencies all add testing time that is invisible unless QA speaks up in planning.
Testability risk flagging: "This story requires third-party API access that is not available in our test environment" — surfacing this in planning prevents a sprint-end surprise where a story cannot be verified.
Dependency identification: Which stories must be completed before others can be tested? QA sees integration dependencies that may not be obvious from a development perspective.

Backlog Refinement

Refinement is where QA delivers the most upstream value. In a refinement session, QA challenges stories for testability — the property that makes a story verifiable in a time-constrained sprint environment.

Challenging requirements: "This story says the page should load quickly. What is the measurable definition of quickly? Under 2 seconds on a 4G connection? Under 5 seconds on 3G?" Vague requirements cannot be tested.
Identifying missing acceptance criteria: "What happens if the user's session expires mid-checkout? What should the system show if the payment gateway is down?" These edge cases need to be in the story's acceptance criteria before development begins.
Raising testability issues: "This story requires a user with admin + read-only permissions simultaneously. That combination does not exist in our test data. We need to resolve that before we can accept this into a sprint."

Daily Standup

QA's standup answers three questions with a QA lens: what did I test yesterday (and what are the results), what am I testing today, and what is blocking me? The key QA-specific items to surface: stories blocked waiting for a bug fix (don't test around a known bug and report false confidence), environment issues that affect multiple stories, and bugs severe enough to warrant immediate developer attention outside the normal sprint flow.

Sprint Review

In the Sprint Review, QA contributes by presenting the test results for the sprint — what passed, what failed, what was not reached, and what bugs were found. This is not just a green/red dashboard; it is context. "We executed 47 test cases, 43 passed, 4 are in 'In Progress' status pending bug fixes. We found 11 bugs; 8 are resolved, 2 are deferred to next sprint, 1 is a P1 being addressed in a hotfix."

Retrospective

Retrospectives are where QA can advocate for process improvements: testing environments that were unstable (action: dedicated environment stability sprint), stories that lacked acceptance criteria (action: mandatory AC before refinement sign-off), automation that would have caught a regression earlier (action: add to automation backlog with priority). QA's retrospective contributions should be concrete and actionable, not complaints.

Acceptance Criteria — The QA Foundation

Acceptance criteria (AC) define the conditions under which a story can be considered done from a product perspective. Well-written AC is the single most important enabler of effective sprint testing. Without clear AC, QA is testing against an implicit standard that may differ between the developer's, QA's, and the product owner's mental models.

Given/When/Then format

BDD-style acceptance criteria use the Given/When/Then structure:

Given [precondition or starting state]
When [action the user takes]
Then [expected outcome]

Example — User login story:

Given I am on the login page and I have a valid account, When I enter correct credentials and click Login, Then I am redirected to the dashboard and my username is shown in the header.
Given I am on the login page, When I enter an incorrect password three times, Then my account is locked and I see a message telling me to contact support.
Given I am logged in on device A, When I log in on device B simultaneously, Then my session on device A is invalidated within 60 seconds.

Testable vs untestable AC

Untestable: "The system should be user-friendly." — No measurable criterion. Cannot be verified.
Testable: "A new user with no prior experience should complete the registration flow in under 3 minutes on first attempt." — Specific, measurable, verifiable.
Untestable: "Checkout should be fast." — Subjective, no benchmark.
Testable: "The checkout completion API call should respond within 2 seconds for 95% of requests under 100 concurrent users." — Measurable, can be tested with performance tooling.
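Once latency samples exist, the testable checkout criterion reduces to a percentile comparison. The sketch below assumes the samples (in seconds) were collected by a load-test run at 100 concurrent users; `percentile`, `meets_p95_target`, and the sample values are hypothetical, and most load-testing tools report percentiles directly. It only illustrates why the criterion is verifiable.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def meets_p95_target(latencies_s: list[float], limit_s: float = 2.0) -> bool:
    """AC check: at least 95% of checkout calls responded within limit_s seconds."""
    return percentile(latencies_s, 95) <= limit_s


# Hypothetical samples from a load-test run (seconds).
one_slow = [0.8] * 18 + [1.9, 3.5]   # 19 of 20 calls within 2 s
two_slow = [0.8] * 18 + [3.1, 3.5]   # only 18 of 20 calls within 2 s
print(meets_p95_target(one_slow))   # True
print(meets_p95_target(two_slow))   # False
```

With nearest-rank percentiles, "p95 ≤ 2 s" is exactly equivalent to "at least 95% of requests finished within 2 s", which is why the criterion can pass or fail without debate.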

Definition of Done (DoD)

The Definition of Done is the team's agreement on what conditions must be met before a story is considered complete. It is distinct from acceptance criteria — AC is story-specific, DoD applies to every story. QA's contributions to the DoD typically include:

Unit tests written and passing (developer responsibility, but QA verifies coverage is adequate)
API contracts verified by integration tests
All acceptance criteria tested and passing
At least one exploratory testing session completed for the feature
Regression suite run; no new failures introduced
Code reviewed and merged to main branch
Deployed to staging environment
Documentation updated if the feature changes user-facing behaviour
Zero P1 or P2 bugs open against this story

The DoD should be visible to the entire team — posted in the team wiki, in the Jira board description, or on the team's physical wall. When a developer says a story is "done," it means it meets every item on the DoD list, not just that coding is complete.
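Because the DoD applies to every story, it can also be expressed as a checklist evaluated the same way each time, for example by a board automation or a review step. This is a minimal sketch, assuming the story's status is available as simple flags. The `Story` fields and `unmet_dod_items` helper are hypothetical, and the checks cover only a subset of the list above.

```python
from dataclasses import dataclass


@dataclass
class Story:
    # Hypothetical status flags; a real team would pull these from its tracker and CI.
    ac_total: int
    ac_passed: int
    exploratory_sessions: int
    regression_new_failures: int
    merged_to_main: bool
    deployed_to_staging: bool
    open_p1_p2_bugs: int


# Each DoD item pairs a readable description with a check against the story.
DOD_CHECKS = [
    ("All acceptance criteria tested and passing",
     lambda s: s.ac_total > 0 and s.ac_passed == s.ac_total),  # no AC means not done
    ("At least one exploratory session completed", lambda s: s.exploratory_sessions >= 1),
    ("Regression suite run with no new failures", lambda s: s.regression_new_failures == 0),
    ("Code reviewed and merged to main", lambda s: s.merged_to_main),
    ("Deployed to staging", lambda s: s.deployed_to_staging),
    ("Zero open P1/P2 bugs", lambda s: s.open_p1_p2_bugs == 0),
]


def unmet_dod_items(story: Story) -> list[str]:
    """Return the DoD items this story does not yet satisfy; an empty list means done."""
    return [name for name, check in DOD_CHECKS if not check(story)]


story = Story(ac_total=3, ac_passed=3, exploratory_sessions=0, regression_new_failures=0,
              merged_to_main=True, deployed_to_staging=True, open_p1_p2_bugs=1)
print(unmet_dod_items(story))
# ['At least one exploratory session completed', 'Zero open P1/P2 bugs']
```

Returning the unmet items, rather than a bare true/false, gives the team something concrete to discuss when a story is declared "done" too early.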

Shift-Left Testing

Shift-left means moving testing activities earlier in the development lifecycle — toward the left side of the timeline, where requirements and design live, rather than the right side where QA has traditionally sat.

Concrete shift-left practices:

Requirements review: QA participates in requirements review sessions, flagging ambiguity, missing edge cases, and untestable statements before any development begins.
Design review: QA reviews technical design documents for testability — are there seams in the architecture where unit testing is possible? Are the API contracts defined clearly enough to test independently?
Test case drafting during refinement: High-level test scenarios are drafted in the refinement session, before the sprint starts. When development picks up the story, QA already has a draft of what will be tested — this reduces the "what exactly should I test?" ambiguity at the end of the sprint.
Code review participation: QA reviews pull requests with a testing lens — not for code quality, but for testability, error handling, and logging adequacy. "This function catches the exception but swallows it — when it fails, we won't know why."
Unit test review: QA verifies that developer-written unit tests cover the business-critical paths, not just the implementation-easy paths.
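The swallowed-exception review comment is easier to see side by side. This is a minimal Python sketch. `charge_card`, `PaymentError`, and the `checkout` logger name are hypothetical stand-ins for whatever the pull request actually touches.

```python
import logging
from typing import Optional

logger = logging.getLogger("checkout")  # hypothetical logger name


class PaymentError(Exception):
    pass


def charge_card(amount_cents: int) -> str:
    # Hypothetical gateway call that fails for non-positive amounts.
    if amount_cents <= 0:
        raise PaymentError(f"invalid amount: {amount_cents}")
    return "txn-123"


def checkout_swallowed(amount_cents: int) -> Optional[str]:
    # What QA flags in review: the failure disappears, the caller silently
    # gets None, and nothing in the logs explains why.
    try:
        return charge_card(amount_cents)
    except PaymentError:
        pass
    return None


def checkout_reviewed(amount_cents: int) -> str:
    # Testable alternative: log with context (including the traceback),
    # then re-raise so callers and tests can see the failure.
    try:
        return charge_card(amount_cents)
    except PaymentError:
        logger.exception("charge failed for amount_cents=%s", amount_cents)
        raise
```

A unit test for the failure path, the kind of business-critical path the unit test review bullet asks for, can assert that `checkout_reviewed(0)` raises `PaymentError`. Against `checkout_swallowed` the same test would fail, which is the point of the review comment.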

From Virtusa: Our mobile QA team adopted shift-left practices incrementally over six months. The first change was QA attending backlog refinement — previously we had been excluded. Within two sprints, the number of stories arriving in QA with missing or ambiguous acceptance criteria dropped from roughly 40% to under 10%. The second change was drafting exploratory test charters during refinement. By the time a story was ready for testing, we had a prepared investigation plan rather than starting from scratch. Sprint velocity increased because QA was unblocked from the first day of testing.

Three Amigos — The Pre-Sprint Alignment Session

The Three Amigos session (also called "3As" or "Story Kickoff") brings together three perspectives before development begins on a story: the developer who will build it, the QA engineer who will test it, and the product owner who defined it. Each sees the story from a fundamentally different angle, and alignment requires all three perspectives in the same conversation.

What each participant brings:

Product Owner: The business intent — what user problem does this solve? What does success look like from the user's perspective? What are the non-negotiable behaviours?
Developer: Technical constraints and implementation approach — what is easy to implement vs complex? What existing systems does this interact with? What edge cases arise from the technical approach?
QA: Testing perspective — what scenarios need to be covered? What data states need to be tested? What are the failure modes? Are there testability concerns with the proposed implementation?

Running an effective 3As meeting: time-box to 30 minutes, prepare examples and counter-examples before the meeting, end with agreed acceptance criteria that all three parties sign off on, and document the output in the story's AC field immediately.

BDD in Practice — Gherkin and Cucumber

Behaviour-Driven Development (BDD) extends the Given/When/Then acceptance criteria format into executable specifications. The process: acceptance criteria written in Given/When/Then become Gherkin feature files, which are linked to step definitions (code that implements the test logic), which are executed by a framework like Cucumber (JVM/Ruby), Behave (Python), or SpecFlow (.NET).

Writing Gherkin from acceptance criteria is a translation exercise, not a technical skill. Product owners should be able to read (and ideally write) Gherkin. Example conversion:

Feature: User Login

  Scenario: Successful login with valid credentials
    Given I am on the login page
    And I have a valid account with username "user@example.com"
    When I enter my credentials and click Login
    Then I should be redirected to the dashboard
    And I should see "user@example.com" in the header

  Scenario: Account lock after three failed attempts
    Given I am on the login page
    And I have a valid account with username "user@example.com"
    When I enter an incorrect password 3 times
    Then I should see "Account locked" message
    And the message should tell me to contact support
    And I should not be able to attempt login again

Involving the PO in feature file review ensures the executable specification matches the intended behaviour. If the PO cannot read the Gherkin and confirm it represents the feature correctly, the feature file is too technical — rewrite it in plain language. BDD's value is the living documentation it produces, not the automation it enables (though that matters too).
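Cucumber, Behave, and SpecFlow all bind each Gherkin step to a step definition by matching the step text against a pattern. The toy runner below is not any of those frameworks. It is a hypothetical, dependency-free sketch of the same idea: it strips the Given/When/Then/And keyword, matches the remaining text roughly as Cucumber does, and runs a version of the account-lock scenario against an in-memory `FakeAuth` service with an assumed lock threshold of three attempts.

```python
import re

# Step-definition registry: (compiled pattern, function) pairs.
STEPS = []


def step(pattern):
    """Decorator that registers a step definition for step text matching `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return register


class FakeAuth:
    """Hypothetical in-memory login service that locks after 3 failed attempts."""
    MAX_ATTEMPTS = 3

    def __init__(self):
        self.failures = 0
        self.message = ""

    def login(self, password):
        if self.failures >= self.MAX_ATTEMPTS:
            self.message = "Account locked"
            return False
        if password == "correct-password":
            return True
        self.failures += 1
        if self.failures >= self.MAX_ATTEMPTS:
            self.message = "Account locked"
        return False


@step(r"I am on the login page")
def on_login_page(ctx):
    ctx["auth"] = FakeAuth()


@step(r"I enter an incorrect password (\d+) times")
def enter_wrong_password(ctx, times):
    for _ in range(int(times)):
        ctx["auth"].login("wrong-password")


@step(r'I should see "([^"]*)" message')
def should_see_message(ctx, text):
    assert ctx["auth"].message == text, f"got {ctx['auth'].message!r}"


@step(r"I should not be able to attempt login again")
def login_is_blocked(ctx):
    # Even the correct password is refused once the account is locked.
    assert ctx["auth"].login("correct-password") is False


def run_scenario(text):
    """Run each step line through the first matching step definition."""
    ctx = {}
    for line in text.strip().splitlines():
        # Strip the keyword; like Cucumber, matching uses only the step text.
        body = re.sub(r"^\s*(Given|When|Then|And|But)\s+", "", line)
        for pattern, func in STEPS:
            match = pattern.match(body)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"undefined step: {body!r}")
    return ctx


SCENARIO = """
Given I am on the login page
When I enter an incorrect password 3 times
Then I should see "Account locked" message
And I should not be able to attempt login again
"""

ctx = run_scenario(SCENARIO)
print("scenario passed; failed attempts recorded:", ctx["auth"].failures)
```

The PO only ever reads the `SCENARIO` text. The regex patterns and `FakeAuth` are the technical layer underneath, which is why the feature file can stay in plain language.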

Test Pyramid in Agile

The test pyramid (coined by Mike Cohn) defines the ideal balance of test types: many unit tests at the base, fewer integration tests in the middle, and a small number of end-to-end tests at the top. In Agile, this balance directly affects sprint velocity.

A team that inverts the pyramid — many slow E2E tests, few unit tests — suffers in Agile: slow CI pipelines mean developers wait 45 minutes for feedback on a code change. Flaky E2E tests create false failures that erode trust in the test suite. The CI pipeline becomes a bottleneck rather than an accelerator.

Unit tests (base): Fast, isolated, run in milliseconds. Developers write these. QA advocates for coverage of business logic branches, not just happy paths. Target: 70%+ of test volume.
Integration tests (middle): Test API contracts, database interactions, and service-to-service communication. QA typically owns these. Target: 20% of test volume.
E2E tests (top): Full user journey through the UI. Slow, fragile, expensive to maintain. Reserve for the 10–15 most critical user journeys that cannot be covered lower in the pyramid. Target: 10% of test volume.

In a sprint context, the practical rule is this: if a test can be written at a lower level of the pyramid, it should be. A login validation test at the unit level checks the same validation rules as an E2E login test, runs orders of magnitude faster, and does not require a live environment.
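A unit-level login validation check might look like the sketch below. It rests on stated assumptions: `validate_login_form` and its rules (an email containing "@" and a password of at least 8 characters) are hypothetical, and it uses Python's built-in `unittest`, so it needs no browser, server, or test environment.

```python
import unittest


def validate_login_form(email: str, password: str) -> list[str]:
    """Hypothetical validation rules; returns error messages (empty list means valid)."""
    errors = []
    if not email or "@" not in email:
        errors.append("Enter a valid email address")
    if len(password) < 8:
        errors.append("Password must be at least 8 characters")
    return errors


class ValidateLoginFormTest(unittest.TestCase):
    def test_valid_input_has_no_errors(self):
        self.assertEqual(validate_login_form("user@example.com", "s3cretpass"), [])

    def test_missing_at_sign_is_rejected(self):
        self.assertIn("Enter a valid email address",
                      validate_login_form("user.example.com", "s3cretpass"))

    def test_short_password_is_rejected(self):
        self.assertIn("Password must be at least 8 characters",
                      validate_login_form("user@example.com", "short"))

    def test_empty_form_reports_both_errors(self):
        self.assertEqual(len(validate_login_form("", "")), 2)


if __name__ == "__main__":
    # Run the suite without calling sys.exit, so it also works inside other scripts.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidateLoginFormTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Each branch of the validation logic gets its own test in milliseconds. An E2E test is then only needed to confirm that the UI actually calls this logic.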

Bug Workflow in Agile

Not every bug has the same lifecycle in Agile. Where a bug lives depends on its severity and when it is found:

Fix in sprint: P1 bugs (system crash, data loss, critical feature broken) found during the current sprint should be fixed before the sprint closes, even if that means reducing scope. A sprint that ships with a known P1 is a failed sprint.
Next sprint backlog: P2 bugs found late in the sprint that would require significant rework to fix in the current sprint. Add to the product backlog and prioritise for next sprint. Inform the PO for priority decision.
Tech debt backlog: Minor bugs, UX issues, performance edge cases that do not affect primary user journeys. Log, label as tech debt, and address in a dedicated tech debt sprint or alongside related features.

Severity classification in Agile context:

P1 — Blocker: Core user journey broken, no workaround. Sprint must not close without resolution.
P2 — Critical: Major feature broken, workaround exists but is painful. Address before next release.
P3 — Major: Feature partially broken, workaround is acceptable. Address within two sprints.
P4 — Minor: Cosmetic issue, edge case, low-impact. Log and address opportunistically.
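Writing the routing rules down as a small decision function makes the team's agreed criteria explicit, so they are not left to end-of-sprint judgment. This is a minimal sketch. The `Bug` fields and `route_bug` are hypothetical. Sending P2 bugs that are found early or are cheap to fix to "fix in sprint" is an inference from the next-sprint rule above. Sending P3 bugs to the next-sprint backlog is an assumption based on the "address within two sprints" guideline.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P1 = 1  # Blocker
    P2 = 2  # Critical
    P3 = 3  # Major
    P4 = 4  # Minor


@dataclass
class Bug:
    severity: Severity
    found_late_in_sprint: bool = False
    needs_significant_rework: bool = False


def route_bug(bug: Bug) -> str:
    """Return 'fix in sprint', 'next sprint backlog', or 'tech debt backlog'."""
    if bug.severity is Severity.P1:
        # A sprint must not close with a known P1, even if scope has to shrink.
        return "fix in sprint"
    if bug.severity is Severity.P2:
        # Late P2s needing significant rework go to the PO for next-sprint priority.
        if bug.found_late_in_sprint and bug.needs_significant_rework:
            return "next sprint backlog"
        return "fix in sprint"
    if bug.severity is Severity.P3:
        return "next sprint backlog"  # assumption: keeps it inside the two-sprint window
    return "tech debt backlog"


print(route_bug(Bug(Severity.P1)))  # fix in sprint
print(route_bug(Bug(Severity.P2, found_late_in_sprint=True,
                    needs_significant_rework=True)))  # next sprint backlog
print(route_bug(Bug(Severity.P4)))  # tech debt backlog
```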

From Amazon: Working on Alexa device firmware, we distinguished between "ship-blocking" and "known-issue" bugs with explicit criteria: any bug that affected the primary use case of the device (wake word detection, response playback, smart home control) was ship-blocking regardless of workaround existence. This clear delineation prevented the common pattern of bugs being classified as "acceptable workaround" to avoid sprint spillover. The classification criteria were agreed in the Definition of Done, not decided per-bug at end-of-sprint under time pressure.

Quality Metrics in Agile

Agile QA metrics should be actionable and sprint-cadenced — not annual or quarterly reports, but weekly indicators that guide next sprint decisions:

Test coverage %: Percentage of in-scope stories whose documented acceptance criteria have all been tested. Target: 100%.
Bug escape rate: Bugs found in production divided by bugs found in QA. Should trend toward zero. Any bug found in production is a QA process improvement opportunity.
Bugs found per sprint by severity: Track P1/P2/P3/P4 counts. A spike in P1 bugs suggests a systemic issue — rushed development, unstable environment, inadequate unit testing.
Automation coverage growth: Percentage of regression covered by automation, tracked sprint-over-sprint. Target: grow by 5–10 percentage points per sprint in early automation maturity phases.
Mean time to detect (MTTD): Average time from a bug being introduced to it being found in QA. Lower MTTD comes from shift-left practices and faster CI runs.
Test execution time (CI pipeline): How long the full test suite takes to run. If this exceeds 15 minutes, developers are not running the suite before committing — invest in parallelisation or test optimisation.
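Two of these metrics are a ratio and an average over data most trackers already hold. This is a minimal sketch, assuming each bug record carries the time the bug was introduced (for example, when the offending commit was merged), the time it was detected, and where it was found. The `BugRecord` shape and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BugRecord:
    introduced: datetime  # assumed known, e.g. merge time of the causing commit
    detected: datetime
    found_in: str         # "qa" or "production"


def bug_escape_rate(bugs: list[BugRecord]) -> float:
    """Bugs found in production divided by bugs found in QA."""
    qa = sum(1 for b in bugs if b.found_in == "qa")
    prod = sum(1 for b in bugs if b.found_in == "production")
    if qa == 0:
        return 0.0 if prod == 0 else float("inf")
    return prod / qa


def mean_time_to_detect(bugs: list[BugRecord]) -> timedelta:
    """Average time from introduction to detection, for bugs caught in QA."""
    qa_bugs = [b for b in bugs if b.found_in == "qa"]
    if not qa_bugs:
        raise ValueError("no QA-detected bugs")
    total = sum((b.detected - b.introduced for b in qa_bugs), timedelta())
    return total / len(qa_bugs)


# Hypothetical sprint data.
t0 = datetime(2024, 3, 4, 9, 0)
bugs = [
    BugRecord(t0, t0 + timedelta(hours=4), "qa"),
    BugRecord(t0, t0 + timedelta(hours=20), "qa"),
    BugRecord(t0, t0 + timedelta(days=9), "production"),
]
print(bug_escape_rate(bugs))      # 0.5
print(mean_time_to_detect(bugs))  # 12:00:00
```

Tracking both numbers per sprint shows whether shift-left changes are working: MTTD should fall, and the escape rate should trend toward zero.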

Mob Testing

Mob testing extends the bug bash concept into a structured, facilitated session where the entire team tests together simultaneously. Unlike a bug bash (which is event-based and often pre-release), mob testing can be run at any point in the sprint and has a stronger facilitation structure.

Setup for a mob testing session:

One driver operates the keyboard and mouse, following the mob's direction
One navigator directs the driver — "navigate to the settings page, change the notification preference"
All others observe, note findings, and suggest next actions
Roles rotate every 15 minutes
All bugs are filed in real time by observers
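The rotation rule is mechanical enough to precompute before the session, so nobody has to track it while testing. This is a minimal sketch. `mob_rotation` and the participant names are placeholders. Having the navigator become the next driver is an assumption: it is a common mob convention, but the setup above does not specify it.

```python
def mob_rotation(participants: list[str], session_minutes: int, slot_minutes: int = 15):
    """Yield (start_minute, driver, navigator) for each rotation slot.

    A final, shorter slot is included if session_minutes is not a
    multiple of slot_minutes.
    """
    if len(participants) < 2:
        raise ValueError("a mob needs at least a driver and a navigator")
    n = len(participants)
    for slot, start in enumerate(range(0, session_minutes, slot_minutes)):
        driver = participants[slot % n]
        navigator = participants[(slot + 1) % n]  # this navigator drives in the next slot
        yield start, driver, navigator


for start, driver, navigator in mob_rotation(["Dev A", "Dev B", "QA", "PO"], 60):
    print(f"{start:>2} min  driver={driver:<6} navigator={navigator}")
```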

Mob testing benefits: knowledge sharing is the primary value — junior team members observe expert testers' mental models in action; developers see how their features are used in ways they did not anticipate; product owners see real user journey friction they had not considered. The secondary benefit is bug finding — mob testing typically has a 20–30% higher bug-find rate per hour than individual testing on the same feature.

Waterfall QA vs Agile QA vs DevOps QA

Dimension	Waterfall QA	Agile QA	DevOps QA
When testing happens	After development phase completes — one test phase	Throughout the sprint — parallel with development	Continuously — automated testing in every CI pipeline run
Documentation	Extensive — formal test plan, test cases, sign-off documents	Lightweight — acceptance criteria, DoD, sprint test report	Minimal formal docs — test code is the documentation
Release cadence	Months — big-bang releases after full test cycle	Sprints (2 weeks) — potentially releasable each sprint	Continuous — multiple releases per day in mature teams
Automation reliance	Low — manual testing dominates	Medium-High — automation for regression, manual for new features	Very High — automation is the primary quality gate
Bug discovery timing	Late — bugs found weeks after code written	Fast — bugs found same sprint as development	Very fast — bugs found within hours of commit
QA's relationship to dev	Separate phase — QA receives a build from dev	Collaborative — QA and dev work in same sprint	Integrated — QA practices embedded in dev workflow

Best Practices for QA in Agile

Be in the Three Amigos: If your team does not run 3As sessions, advocate for them. QA participation at story conception reduces rework more than any other single practice.
Challenge every story without AC: No acceptance criteria means the story is not ready for development. Do not let it enter the sprint. This is a quality advocacy act, not obstruction.
Automate within the same sprint as development: If a story is developed in Sprint 5, the automation should be in Sprint 5 or at latest Sprint 6. Automation debt compounds rapidly — stories from six months ago are much harder to automate because the context is gone and the codebase has changed.
Make the Definition of Done visible: Post it on the team wiki, in the Jira board, in the team Slack channel. If developers do not know what "done" means, they cannot meet the standard.
Treat automation failures as P1 bugs: A failing automated test that the team ignores is worse than no automated test. Address failures immediately — triage as genuine failure (bug) or flakiness (engineering debt to fix).
Communicate in sprint, not at sprint end: If QA is blocked — environment down, story not testable, bug blocking further testing — surface it in standup, not in the sprint review. Surprises at sprint review indicate a communication process problem.
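Triage between a genuine failure and flakiness often starts with reruns. A test that fails every time points to a bug. A test that passes on some reruns with no code change points to flakiness. This is a minimal sketch of that heuristic, in which `classify_failure` and the rerun count are hypothetical. A test that fails consistently for an environmental reason would still come out as "genuine failure", so treat the result as a starting point for investigation, not a verdict.

```python
from typing import Callable


def classify_failure(test: Callable[[], None], reruns: int = 5) -> str:
    """Rerun a test that just failed: every rerun failing -> 'genuine failure', otherwise 'flaky'.

    Only assertion failures are counted; other exceptions propagate so that
    broken test code is not silently mislabelled.
    """
    failures = 0
    for _ in range(reruns):
        try:
            test()
        except AssertionError:
            failures += 1
    return "genuine failure" if failures == reruns else "flaky"


def always_fails():
    # Stand-in for a test hitting a real bug: the expectation never holds.
    assert sorted([3, 1, 2]) == [3, 2, 1]


calls = {"n": 0}


def sometimes_fails():
    # Deterministic stand-in for a timing-dependent test: fails on every other run.
    calls["n"] += 1
    assert calls["n"] % 2 == 0


print(classify_failure(always_fails))     # genuine failure
print(classify_failure(sometimes_fails))  # flaky
```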


From Experience — Virtusa: At Virtusa, I was embedded in Agile ceremonies from day one — sprint planning, daily standups, backlog refinement, and retrospectives. The most valuable QA contribution in sprint planning was consistently pushing back on stories with no acceptance criteria. A story without clear acceptance criteria is a story without a definition of done, and these invariably cause defect spikes in the final days of a sprint. Formalising a "no ticket enters sprint without AC" team norm eliminated an entire class of end-of-sprint scrambles.
