Exploratory Testing — Session-Based QA Guide

What is Exploratory Testing?

Exploratory testing is a simultaneous process of learning about the system under test, designing tests on the fly, and executing them — all in the same cognitive activity. It was defined by Cem Kaner as "a style of software testing that emphasises the personal freedom and responsibility of the individual tester to continually optimise the quality of his/her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project."

This is not ad-hoc testing. Ad-hoc has no structure, no accountability, and no documentation. Exploratory testing is structured — it uses charters, time-boxing, session notes, and debriefs. The difference is that test design happens during execution rather than weeks before, guided by what the tester learns in real time.

Exploratory testing fills the gaps that scripted test cases cannot reach. Scripts cover known risks. Exploratory testing surfaces the unknown unknowns — the bugs nobody thought to write a test case for because nobody imagined them possible.

When I tested Alexa device firmware at Amazon, scripted regression covered the happy paths and known edge cases. Exploratory sessions consistently surfaced the interactions nobody had thought to specify: what happens when you interrupt an in-progress Alexa response with another wake word? What does the device do when Wi-Fi drops during active audio playback? Those are the questions exploratory testing answers.

Scripted vs Exploratory vs Ad-hoc Testing

These three approaches are often conflated. They sit at very different points on the structure spectrum:

Scripted testing — Test cases are written in advance, typically in a TMS like TestRail or Zephyr. Steps, expected results, and pass/fail criteria are defined before execution begins. Repeatability is the primary value: the same person (or a different person) can run the same test months later and get a comparable result.
Exploratory testing — Tests are designed and executed simultaneously, guided by a charter. The tester brings expertise, curiosity, and heuristics. Results are documented in session notes. Structure exists (charter, time-box, debrief) but the test design emerges from the exploration itself.
Ad-hoc testing — Unstructured, undocumented, undirected. A tester clicks around with no particular goal. Bugs may be found, but they cannot be reproduced systematically because there was no record of what was done. Ad-hoc is often mistakenly called "exploratory."

The key distinction: Scripted tests tell you if the system does what you expected. Exploratory tests tell you if the system does what you didn't expect. Both are necessary. At Virtusa, our mobile test strategy used scripted regression for release sign-off and exploratory sessions in the days before release to catch the issues the scripts missed. That combination consistently delivered higher bug-finding rates than either approach alone.

Testing Approach Layers — Architecture Diagram

The flow below shows how a session-based exploratory test moves from requirements to a risk map:

Requirements → Charter Design → Explore Session → Observations / Bugs Found / Session Notes → Debrief → Risk Map

Each stage represents a deliberate phase. The charter focuses the session. The explore phase generates raw data. Observations, bugs, and notes are captured in parallel. The debrief synthesises findings. The risk map records which areas still need attention and becomes the input for the next session's charters.

Session-Based Test Management (SBTM)

Session-Based Test Management (SBTM) was developed by Jonathan and James Bach to bring accountability and metrics to exploratory testing. It addresses the core criticism of exploratory testing — "you can't measure it, you can't manage it" — by structuring sessions around three pillars:

The charter — a focused mission for the session. What area are you exploring? What resources do you have? What questions are you trying to answer?
Time-boxing — sessions run for a fixed duration, typically 60–90 minutes. This prevents open-ended wandering and creates a predictable unit of work that managers can plan around.
The debrief — a structured conversation between the tester and a test lead at the end of the session. It covers what was tested, what bugs were found, what risks remain, and what should be chartered next.

SBTM transforms exploratory testing from an invisible activity into a measurable one. You can track sessions per area, bugs per session, coverage against charters, and debrief-derived risk areas. This gives management the visibility they need while preserving the cognitive freedom that makes exploratory testing effective.

A typical SBTM session structure: 15 minutes setup, 60–70 minutes exploration, 15 minutes documentation, 20 minutes debrief. Total: 110–120 minutes, or roughly 2 hours per session including debrief. A tester might run 2–3 focused sessions per day.
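The arithmetic behind that total is easy to check, and the same few lines help when planning a day of sessions. This is a minimal sketch: the phase names and minute ranges come from the structure above, while `PHASES` and `session_budget` are hypothetical names chosen for illustration.

```python
# Phase durations in minutes as (minimum, maximum), taken from the SBTM structure above.
PHASES: dict[str, tuple[int, int]] = {
    "setup": (15, 15),
    "exploration": (60, 70),
    "documentation": (15, 15),
    "debrief": (20, 20),
}


def session_budget(phases: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return the (minimum, maximum) length of one session in minutes."""
    shortest = sum(low for low, _ in phases.values())
    longest = sum(high for _, high in phases.values())
    return shortest, longest


shortest, longest = session_budget(PHASES)
print(f"One session: {shortest}-{longest} minutes")
for sessions_per_day in (2, 3):
    # Convert the daily total to hours for planning.
    print(f"{sessions_per_day} sessions: "
          f"{sessions_per_day * shortest / 60:.1f}-{sessions_per_day * longest / 60:.1f} hours")
```

With these numbers, three sessions occupy 5.5 to 6 hours of the working day.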

Writing Test Charters

The charter template: "Explore [area] with [resources/tools] to discover [information]."

Charters should be specific enough to focus attention but open enough to allow discovery. A charter that specifies every step is a test script. A charter that says "test the login page" is too vague.
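One way to keep charters consistent is to store the three slots of the template separately and render the sentence on demand. A minimal sketch, assuming Python 3.9+; the `Charter` class and its field names are hypothetical, and the example fills the template with the authentication charter from the examples below, lightly rephrased to fit the slots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Charter:
    """A charter built from the template 'Explore [area] with [resources] to discover [information]'."""

    area: str
    resources: str
    information: str

    def __post_init__(self) -> None:
        # An empty slot (as in "test the login page") leaves the session without a focus.
        for slot in ("area", "resources", "information"):
            if not getattr(self, slot).strip():
                raise ValueError(f"charter is missing its '{slot}' slot")

    def render(self) -> str:
        """Fill the three slots of the template into one sentence."""
        return f"Explore {self.area} with {self.resources} to discover {self.information}."


password_reset = Charter(
    area="the password reset flow",
    resources="valid and expired tokens on a mobile browser",
    information="any state management issues or UI inconsistencies",
)
print(password_reset.render())
```

Keeping the slots separate makes it easy to sort or filter charters by area. The empty-slot check is only a rough guard: a filled-in slot can still be too broad, which is a judgement the tester has to make.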

Five real charter examples across different feature types:

Authentication: "Explore the password reset flow with valid and expired tokens using a mobile browser to discover any state management issues or UI inconsistencies."
E-commerce checkout: "Explore the checkout flow with multiple items, mixed in-stock and out-of-stock, using promo codes to discover how the system handles partial fulfilment and pricing edge cases."
Push notifications (mobile): "Explore notification delivery using background app state, low battery mode, and Do Not Disturb settings on Android 13 to discover any notification suppression or duplicate delivery issues."
API contract: "Explore the /orders endpoint with malformed JSON payloads, missing required fields, and oversized request bodies using Postman to discover how the API handles bad input — status codes, error messages, and server stability."
Alexa skill: "Explore Alexa multi-turn conversation handling with rapid follow-up intents during network latency to discover timeout behaviour, partial response delivery, and session context retention."
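For the API contract charter, preparing the bad inputs before the session starts keeps the time-box focused on observing responses rather than typing payloads. A minimal sketch, assuming a JSON body whose required fields are `customer_id` and `items` and an optional `note` field (all hypothetical; substitute the real contract for `/orders`). `max_body_bytes` stands for whatever size limit the API is believed to enforce, which is itself an assumption to confirm. The code only builds request bodies; sending them from Postman or a script is left out.

```python
import json

# Hypothetical valid payload for /orders; replace with the endpoint's real contract.
VALID_ORDER = {"customer_id": "c-123", "items": [{"sku": "ABC-1", "qty": 1}]}
REQUIRED_FIELDS = ("customer_id", "items")


def bad_order_bodies(max_body_bytes: int) -> dict[str, str]:
    """Return named request bodies for the three kinds of bad input in the charter."""
    valid = json.dumps(VALID_ORDER)
    bodies = {
        # Malformed JSON: a truncated document and a trailing comma.
        "truncated_json": valid[:-1],
        "trailing_comma": valid[:-1] + ",}",
    }
    # Missing required fields: drop one field at a time.
    for name in REQUIRED_FIELDS:
        partial = {key: value for key, value in VALID_ORDER.items() if key != name}
        bodies[f"missing_{name}"] = json.dumps(partial)
    # Oversized body: pad a hypothetical "note" field past the assumed size limit.
    padded = dict(VALID_ORDER, note="x" * (max_body_bytes + 1))
    bodies["oversized"] = json.dumps(padded)
    return bodies


for label, body in bad_order_bodies(max_body_bytes=1_000_000).items():
    print(f"{label}: {len(body.encode('utf-8'))} bytes")
```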

From Amazon: When testing Alexa-enabled devices, I used charter-based exploration specifically for network interruption scenarios. The device's behaviour during poor Wi-Fi — partial audio responses, queued commands, wake-word sensitivity changes — was too complex and context-dependent to fully cover with scripted tests. Charters like "explore device behaviour during Wi-Fi reconnection events across different Alexa command types" surfaced three separate bugs in a single 75-minute session that our 200+ automated tests had never caught.

Test Heuristics

Heuristics are mental models that guide exploratory testers toward fertile areas. They do not tell you what to test — they remind you what to think about.

SFDPOT (San Francisco Depot)

Developed by James Bach, SFDPOT covers six testing dimensions:

S — Structure: Everything that makes up the product — code, schema, files, data, hardware. What are the components and how do they connect?
F — Function: Things the system does — features, capabilities, user actions. What can a user do, and what happens when they do it?
D — Data: Everything the product processes — input data, output data, stored data. What are the boundaries, special characters, empty values, extremely long values?
P — Platform: The environment the system runs on — OS, browser, device, network, third-party services. What happens on older OS versions, different browsers, low-end devices?
O — Operations: How the system will be used in practice — installation, configuration, backup, upgrade, security. What happens when you upgrade mid-session?
T — Time: Anything related to time — timing, concurrency, deadlines, expiration, race conditions. What happens at midnight? After a session expires? When two users hit the same resource simultaneously?
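Because a heuristic is a reminder rather than a procedure, one lightweight way to use SFDPOT is to turn it into a list of prompts for a specific feature while mapping out a session. A minimal sketch; the question wording is paraphrased from the list above, and `SFDPOT` and `sfdpot_prompts` are hypothetical names.

```python
# The six SFDPOT dimensions with one sample question each, paraphrased from the list above.
SFDPOT = {
    "Structure": "What components make up {feature}, and how do they connect?",
    "Function": "What can a user do with {feature}, and what happens when they do it?",
    "Data": "What boundary, empty, special-character, or very long values can {feature} receive?",
    "Platform": "How does {feature} behave on older OS versions, other browsers, and low-end devices?",
    "Operations": "What happens to {feature} during installation, configuration, or an upgrade mid-session?",
    "Time": "What happens to {feature} at midnight, after a session expires, or under concurrent use?",
}


def sfdpot_prompts(feature: str) -> list[str]:
    """Return one prompt per SFDPOT dimension for the feature, in mnemonic order."""
    return [f"{dimension}: {question.format(feature=feature)}"
            for dimension, question in SFDPOT.items()]


for prompt in sfdpot_prompts("the password reset flow"):
    print(prompt)
```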

HICCUPS

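HICCUPS (later extended to HICCUPPS) is a set of consistency oracles associated with James Bach and Michael Bolton. It helps you recognise a problem when there is no detailed specification to compare against: behaviour that is inconsistent with the product's History, its Image, Comparable products, Claims made about it, User expectations, the Product itself, or Statutes (relevant laws and standards) is a likely bug. The extended version adds Purpose.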

Goldilocks

Test with values that are too small, too large, and just right. For text fields: 0 characters, exactly 1 character, exactly the limit, the limit + 1 characters, and hostile input such as SQL injection attempts. For numbers: -1, 0, 1, max integer, max+1, and floats where integers are expected.
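The Goldilocks values are easy to generate ahead of a session so none get skipped under time pressure. A minimal sketch: `limit` is the field's documented maximum length, `max_value` is whatever maximum the numeric field or its backing column accepts (a 32-bit signed maximum is used only as an example), and both function names are hypothetical.

```python
def text_length_cases(limit: int) -> dict[str, str]:
    """Inputs for a text field whose maximum length is `limit` characters."""
    return {
        "empty": "",
        "one_char": "a",
        "at_limit": "a" * limit,
        "over_limit": "a" * (limit + 1),
        # A classic SQL injection probe: the field should store or reject it, never execute it.
        "sql_injection": "' OR '1'='1",
    }


def integer_cases(max_value: int) -> list[float]:
    """Numeric probes: -1, 0, 1, the maximum, one past it, and a float where an int is expected."""
    return [-1, 0, 1, max_value, max_value + 1, 1.5]


# Example: a 20-character username field and a field backed by a 32-bit signed integer.
for name, value in text_length_cases(limit=20).items():
    print(f"{name}: {len(value)} chars")
print(integer_cases(max_value=2**31 - 1))
```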

Running an Exploratory Session — Step by Step

A well-run session has five distinct phases. The time allocations below add up to roughly 90–100 minutes; observation runs alongside exploration rather than taking its own slot:

Setup (10 min): Review the charter. Open the environment. Prepare your screen recorder (start it now, not after you find something interesting). Open a notes document — Markdown works well. Set a timer for 60 minutes of exploration.
Explore (60 min): Follow the charter. Begin with the most critical area first — if you run out of time, you will have covered the highest risk. As you explore, think aloud (this is especially valuable in paired sessions). Follow interesting threads but note when you are deviating from the charter so you can charter that thread separately.
Observe (ongoing during exploration): Notice unexpected behaviour, slow responses, UI glitches, inconsistent error messages, data that does not persist, interactions between features. Note everything — even things you are not sure are bugs.
Document (10 min): Immediately after exploration ends, write up your session report while it is fresh. Record: what you explored, what you found, what you did not get to, what questions remain.
Debrief (10–20 min): Meet with your test lead or a peer. Walk through your notes. Determine which observations are bugs, which need more investigation, and which should become new charters. The debrief is where individual findings become team knowledge.

Bug Bash Facilitation

A bug bash is a structured group exploratory testing event where the entire team — including developers, product managers, designers, and sometimes customer support — tests the product simultaneously for a defined period. It is one of the most effective ways to surface a wide range of bugs in a short time.

Setting up a bug bash

Duration: 2–3 hours is ideal. Shorter than 2 hours and the group does not fully settle in. Longer than 3 hours and focus drops sharply.
Environment: Use a dedicated test environment, ideally a copy of production data (anonymised). Have test accounts and sample data pre-prepared for every participant.
Scope: Define which features or areas are in scope. For a pre-release bug bash, focus on the features shipping in this release plus integration points with existing features.

Inviting non-QA participants

Non-QA participants find bugs that QA misses because they use the product differently. Developers test the code they just wrote — their assumptions are the same as the implementation. Product managers test against their own mental model of the feature — surfacing requirement interpretation gaps. Customer support tests the scenarios they know break (from customer tickets) — surfacing known pain points.

Brief non-QA participants on how to file bugs: steps to reproduce, expected vs actual, screenshot. Keep the filing process lightweight — if it is too formal, participants stop reporting.

Charter cards for non-QA participants

Hand each participant a charter card: a physical or digital card with a specific area to explore. This prevents 10 people all testing the same login page. Sample charters for a checkout feature bug bash: payment flow with international cards, order editing after placement, email confirmation accuracy, mobile cart behaviour on slow connection, accessibility keyboard navigation through checkout.
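Dealing the cards can be done with a simple round-robin, so every area gets an owner before any area gets a second pair of eyes. A minimal sketch; `assign_charter_cards` and the participant names are hypothetical, and the charters are the checkout examples above.

```python
from itertools import cycle


def assign_charter_cards(participants: list[str], charters: list[str]) -> dict[str, list[str]]:
    """Deal charters to participants round-robin so each charter goes to exactly one person."""
    if not participants:
        raise ValueError("need at least one participant")
    cards: dict[str, list[str]] = {person: [] for person in participants}
    # cycle() repeats the participant list; zip() stops when the charters run out.
    for person, charter in zip(cycle(participants), charters):
        cards[person].append(charter)
    return cards


checkout_charters = [
    "payment flow with international cards",
    "order editing after placement",
    "email confirmation accuracy",
    "mobile cart behaviour on slow connection",
    "accessibility keyboard navigation through checkout",
]
for person, cards in assign_charter_cards(["Asha", "Ben", "Chen"], checkout_charters).items():
    print(f"{person}: {cards}")
```

With more people than charters, the extra participants receive an empty list, which is a cue to write more charters rather than double up on one area.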

Debrief and de-duplication

End the bug bash with a 30-minute group debrief. Each participant shares their top 3 findings. QA leads de-duplicate: compare reports, merge duplicates, assign severity. Within 24 hours, every filed bug should be triaged and either accepted or rejected with a comment. Fast triage maintains participant morale for future bug bashes.
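QA leads de-duplicate by reading the reports, but a rough first pass can group reports that look alike. A minimal sketch, assuming each report carries a feature area and a one-line summary; the normalisation (lower-casing, dropping punctuation and a few filler words, ignoring word order) is a deliberately crude, hypothetical heuristic, and a human still decides which groups are real duplicates.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Words ignored when comparing summaries; an arbitrary illustrative list.
FILLER = {"the", "a", "an", "is", "when", "on", "in"}


@dataclass(frozen=True)
class BugReport:
    reporter: str
    area: str
    summary: str


def dedup_key(report: BugReport) -> tuple[str, frozenset[str]]:
    """Area plus the set of meaningful summary words, ignoring case, punctuation, and order."""
    words = re.findall(r"[a-z0-9]+", report.summary.lower())
    return report.area.lower(), frozenset(w for w in words if w not in FILLER)


def group_candidates(reports: list[BugReport]) -> list[list[BugReport]]:
    """Group reports with identical keys; groups of two or more are likely duplicates."""
    groups: dict[tuple[str, frozenset[str]], list[BugReport]] = defaultdict(list)
    for report in reports:
        groups[dedup_key(report)].append(report)
    return list(groups.values())


reports = [
    BugReport("Asha", "checkout", "Promo code is ignored on the payment page"),
    BugReport("Ben", "Checkout", "promo code ignored on payment page!"),
    BugReport("Chen", "email", "Confirmation email shows wrong total"),
]
for group in group_candidates(reports):
    print([r.reporter for r in group], group[0].summary)
```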

From Viasat: During network equipment testing, we ran quarterly bug bashes that included network operations engineers alongside QA. The operations team's knowledge of real-world network conditions — their intuitions about what traffic patterns stress the equipment — surfaced failure modes in simulated throughput degradation that purely functional QA had never found. Cross-functional bug bashes consistently deliver higher value than QA-only sessions.

Tools for Exploratory Testing

Exploratory testing is deliberately low-tool — the primary tool is the tester's mind. But a few utilities make sessions significantly more effective:

Screen recording: QuickTime (macOS), OBS Studio (cross-platform), or Loom. Start recording at the beginning of every session, not just when you find something. Video is invaluable for reproducing hard-to-repeat bugs and for debrief evidence.
Session notes in Markdown: A plain Markdown file in your text editor is all you need. Structure: Charter, Environment, Start Time, Observations (numbered list), Bugs (with brief repro steps), Notes/Questions. Plain text is fast to write during exploration.
Miro or physical whiteboards: For mind-mapping the test area before starting. Draw the feature, its inputs, its integrations, its states. This mental model makes the exploration more systematic.
Rapid Reporter: A lightweight Windows tool designed specifically for session-based exploratory testing. Records notes categorised as Test, Bug, Question, or Issue in real time with timestamps.
Browser developer tools: The Network tab for watching API calls during user actions, the Console for JavaScript errors, the Application tab for local storage and cookies. Non-negotiable for web exploratory testing.
Proxy tools (Charles Proxy, mitmproxy): Intercept and inspect network traffic between your mobile device and the server. Essential for API contract exploratory testing on mobile apps.
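The idea behind Rapid Reporter, timestamped notes tagged by category, can be approximated with a short script that produces the Markdown structure described above. This is a hypothetical sketch, not Rapid Reporter's own format: the categories follow the description above, while `SessionNotes`, its methods, and the mapping of categories to sections are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Note categories as described for Rapid Reporter above.
CATEGORIES = ("Test", "Bug", "Question", "Issue")
# Assumed mapping from categories to the sections of the Markdown notes file.
SECTIONS = {"Observations": ("Test", "Issue"), "Bugs": ("Bug",), "Notes/Questions": ("Question",)}


@dataclass
class SessionNotes:
    charter: str
    environment: str
    start: datetime = field(default_factory=datetime.now)
    entries: list[tuple[datetime, str, str]] = field(default_factory=list)

    def note(self, category: str, text: str, when: Optional[datetime] = None) -> None:
        """Record one timestamped note; unknown categories are rejected."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r}")
        self.entries.append((when if when is not None else datetime.now(), category, text))

    def to_markdown(self) -> str:
        """Render Charter, Environment, Start Time, then numbered lists per section."""
        lines = [
            "# Charter", self.charter,
            "## Environment", self.environment,
            "## Start Time", f"{self.start:%Y-%m-%d %H:%M}",
        ]
        for title, categories in SECTIONS.items():
            lines.append(f"## {title}")
            picked = [(t, text) for t, cat, text in self.entries if cat in categories]
            for number, (t, text) in enumerate(picked, start=1):
                lines.append(f"{number}. [{t:%H:%M}] {text}")
        return "\n".join(lines)


notes = SessionNotes(
    charter="Explore the password reset flow with expired tokens",
    environment="build 1.4.2, Android 13, Chrome",  # hypothetical environment
    start=datetime(2024, 5, 1, 9, 0),
)
notes.note("Test", "Requested reset with an expired token", when=datetime(2024, 5, 1, 9, 5))
notes.note("Bug", "Expired token shows a blank page", when=datetime(2024, 5, 1, 9, 12))
print(notes.to_markdown())
```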

Documenting Findings — The Session Report

Every exploratory session should produce a session report. This is the accountability mechanism that distinguishes exploratory testing from ad-hoc. The report does not need to be long — a well-structured half-page is sufficient.

Session report template:

Charter: The exact charter used for this session
Tester: Your name (important for traceability)
Date & Duration: When and how long
Environment: Build version, device, OS, browser version
Bugs Found: Short description + bug tracker ID for each
Coverage Items: What you tested — the areas and scenarios actually explored
Risks / Open Questions: Areas you did not reach, questions needing answers, suspected issues needing more investigation
Notes: Anything else worth recording — observations that are not bugs but might become relevant
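Because the report is the accountability mechanism, it is worth checking that it is complete before the debrief. A minimal sketch, assuming the fields of the template above; the class, its field names, and the specific completeness rules (for example, that every bug must carry a tracker ID) are illustrative choices, not part of SBTM itself. An empty bug list is allowed, since a session can legitimately find nothing.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SessionReport:
    charter: str
    tester: str
    session_date: date
    duration_minutes: int
    environment: str                 # build version, device, OS, browser version
    coverage_items: list[str]        # areas and scenarios actually explored
    bugs: list[tuple[str, str]] = field(default_factory=list)  # (description, tracker ID)
    risks_open_questions: list[str] = field(default_factory=list)
    notes: str = ""

    def problems(self) -> list[str]:
        """Return gaps that would weaken the report as an accountability record."""
        issues = []
        for name in ("charter", "tester", "environment"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if self.duration_minutes <= 0:
            issues.append("duration must be positive")
        if not self.coverage_items:
            issues.append("no coverage items recorded")
        for description, tracker_id in self.bugs:
            if not tracker_id.strip():
                issues.append(f"bug without tracker ID: {description}")
        return issues


report = SessionReport(
    charter="Explore the checkout flow with promo codes to discover pricing edge cases",
    tester="Asha",
    session_date=date(2024, 5, 1),
    duration_minutes=90,
    environment="build 3.2.0, iPhone 14, iOS 17, Safari",  # hypothetical
    coverage_items=["single promo code", "promo code with out-of-stock item"],
    bugs=[("Promo discount applied twice", "")],
)
print(report.problems())
```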

Tester credibility is built on documentation quality. A tester who files 10 well-documented bugs beats one who files 30 poorly-described bugs every time. Developers can act on clear reports; vague reports get rejected or ignored.

Paired Exploratory Testing

Paired exploratory testing — two testers working together on the same session — delivers disproportionate value relative to the resource cost.

The driver/observer model works as follows: the driver operates the keyboard and mouse, making decisions about what to test next. The observer watches, takes notes, and verbalises observations — "did you notice the loading spinner disappeared before the data appeared?" The roles swap every 20–30 minutes.
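Writing the swaps down as a schedule makes them harder to forget mid-session. A minimal sketch, assuming a fixed swap interval (25 minutes by default, inside the 20–30 minute range above); `pair_schedule` and the tester names are hypothetical.

```python
def pair_schedule(first: str, second: str, session_minutes: int,
                  swap_every: int = 25) -> list[tuple[int, int, str, str]]:
    """Return (start, end, driver, observer) blocks, swapping roles every `swap_every` minutes."""
    if swap_every <= 0:
        raise ValueError("swap_every must be positive")
    blocks = []
    driver, observer = first, second
    for start in range(0, session_minutes, swap_every):
        end = min(start + swap_every, session_minutes)
        blocks.append((start, end, driver, observer))
        driver, observer = observer, driver  # swap roles for the next block
    return blocks


for start, end, driver, observer in pair_schedule("Asha", "Ben", session_minutes=60):
    print(f"{start:>2}-{end:>2} min: {driver} drives, {observer} observes")
```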

Benefits of pairing:

Knowledge transfer: A senior tester pairs with a junior; the junior observes expert heuristic thinking in action. This transfers tacit knowledge that documentation cannot.
Cognitive coverage: Two minds notice different things. One tester's blind spots are another's focus area.
Bug quality: Bugs found in pairs have better repro steps because two people witnessed the issue and can confirm the sequence of events.
Reduced false positives: "Did that just happen?" is answered immediately. One tester's misinterpretation is corrected in real time by the observer.

Pairing is especially valuable when testing new feature areas where neither tester has deep context, or when testing high-stakes areas before a major release.

When to Use Exploratory Testing

Exploratory testing delivers the most value in specific situations:

New feature release: Scripts do not exist yet. The feature's behaviour is not fully known. Exploratory sessions build the mental model that future scripts will be based on.
After major refactors: The code changed significantly but the feature behaviour should not have. Exploration discovers regression bugs that scripted tests miss, because scripted tests only check the behaviour they were written to check.
Before release: As a final sanity check after scripted regression passes. The feeling that "the scripts all passed but something is off" is a signal to charter exploratory sessions on the areas of concern.
When scripted tests pass but something still feels wrong: Trust the intuition. Charter it, explore it, document what you find.
New team member onboarding: Exploratory testing of a product is one of the fastest ways to build system knowledge. Pair a new QA engineer with a senior for exploratory sessions across key feature areas in their first two weeks.

Metrics for Exploratory Testing

SBTM provides the framework for tracking exploratory testing metrics at the session level:

Coverage items explored per session: How many distinct areas, scenarios, or states were exercised. Tracks breadth of exploration.
Bugs found per session: Absolute count. Track across sessions to see if exploration is still productive or has plateaued (suggesting coverage is sufficient).
Session-to-bug ratio: Total sessions divided by total bugs found. A rising ratio over time (more sessions needed to find each bug) suggests the product is becoming more stable, or that the exploration is becoming less deep.
Risk areas remaining: Charters identified but not yet executed. This is the exploratory testing backlog — it gives management visibility into what coverage is outstanding.
Debrief-identified charters: New charters generated from debrief discussions. A high number here indicates the exploration is productive — each session is generating new questions rather than confirming stable behaviour.
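These metrics need only a handful of numbers per session. A minimal sketch, assuming each session record holds its coverage-item count, bug count, and the number of new charters proposed in its debrief; the `Session` class, `sbtm_metrics`, and the week-by-week data are hypothetical and the numbers are made up. The session-to-bug ratio is computed as total sessions divided by total bugs, so it grows when each bug takes more sessions to find.

```python
from dataclasses import dataclass


@dataclass
class Session:
    area: str
    coverage_items: int  # distinct areas, scenarios, or states exercised
    bugs: int
    new_charters: int    # charters proposed during the debrief


def sbtm_metrics(sessions: list[Session], backlog_size: int) -> dict[str, float]:
    """Summarise a batch of sessions using the metrics listed above."""
    if not sessions:
        raise ValueError("need at least one session")
    count = len(sessions)
    total_bugs = sum(s.bugs for s in sessions)
    return {
        "coverage_items_per_session": sum(s.coverage_items for s in sessions) / count,
        "bugs_per_session": total_bugs / count,
        # Total sessions divided by total bugs; infinite if no bugs were found.
        "session_to_bug_ratio": count / total_bugs if total_bugs else float("inf"),
        "risk_areas_remaining": backlog_size,  # charters identified but not yet executed
        "debrief_charters": sum(s.new_charters for s in sessions),
    }


# Made-up numbers for two consecutive weeks of sessions.
week_1 = [Session("checkout", 12, 4, 3), Session("search", 9, 2, 2)]
week_2 = [Session("checkout", 14, 1, 1), Session("search", 11, 1, 0)]
for label, batch, backlog in (("week 1", week_1, 7), ("week 2", week_2, 4)):
    print(label, sbtm_metrics(batch, backlog_size=backlog))
```

In this made-up data the session-to-bug ratio climbs from about 0.33 to 1.0 between the two weeks, meaning each bug took more sessions to find.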

Scripted vs Exploratory vs Ad-hoc Testing

Dimension	Scripted Testing	Exploratory Testing	Ad-hoc Testing
Structure	High — pre-written steps and expected results	Medium — charter, time-box, debrief	None — click and hope
Documentation	Full — test cases, execution logs, pass/fail records	Session notes, session reports, debrief notes	None — findings may not be recorded
Repeatability	High — same tester or different tester, same result	Medium — charter is repeatable; approach varies	Low — hard to reproduce exactly
Bug-finding rate	Low in mature areas; high for regression	High — especially for edge cases and unknown unknowns	Variable — depends entirely on tester luck and intuition
Skill required	Low-Medium — follow the script	High — heuristics, system knowledge, adaptability	Low — no formal method required
Best for	Regression, compliance, audit-ready testing	New features, pre-release, complex interactions	Quick sanity checks, personal exploration

Best Practices for Exploratory Testing

Always write a charter: No charter means ad-hoc, not exploratory. Even a one-line charter transforms the session from random clicking into directed investigation.
Time-box every session: 60–90 minutes is optimal. Shorter and you do not reach depth; longer and cognitive fatigue reduces effectiveness. Use a timer — stop when it goes off.
Debrief within 30 minutes of the session: Memory fades fast. The observations that were clear at the end of the session become vague three hours later. Debrief immediately.
Record the screen from the first minute: Not after you find something interesting — from the very beginning. You will find something interesting in the first five minutes that you did not record.
Share findings immediately: Don't sit on bugs waiting for a formal report cycle. File immediately, notify the developer, share the screen recording. Fast feedback enables fast fixes.
Rotate charter areas across the team: If the same QA engineer always explores the same module, coverage reflects that person's blind spots. Rotate deliberately.
Track your charter backlog: Every debrief generates new charters. Maintain a prioritised list. The highest-risk charters not yet executed should be the first sessions of the next test cycle.
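A prioritised backlog maps naturally onto a priority queue. A minimal sketch using Python's `heapq` (a min-heap, so risk is stored negated); the `CharterBacklog` class, the 1 to 5 risk scale, and the example charters are hypothetical. Among charters of equal risk, the one added first comes out first.

```python
import heapq
from itertools import count


class CharterBacklog:
    """Prioritised charter backlog: highest risk first, oldest first among equal risks."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._order = count()  # insertion counter used to break ties

    def add(self, charter: str, risk: int) -> None:
        # heapq pops the smallest item, so store the negated risk to pop the riskiest first.
        heapq.heappush(self._heap, (-risk, next(self._order), charter))

    def next_sessions(self, n: int) -> list[str]:
        """Remove and return up to n charters for the next test cycle."""
        return [heapq.heappop(self._heap)[2] for _ in range(min(n, len(self._heap)))]

    def __len__(self) -> int:
        return len(self._heap)


backlog = CharterBacklog()
# Risk scores on a hypothetical 1 (low) to 5 (high) scale agreed in the debrief.
backlog.add("Explore Wi-Fi reconnection across Alexa command types", risk=5)
backlog.add("Explore settings export formats", risk=2)
backlog.add("Explore wake word during an in-progress response", risk=5)
print(backlog.next_sessions(2))
```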


From Experience — Amazon: At Amazon's Device OS team, the bar for defect reports was exceptionally high. Every bug required the exact build number, reproduction rate (e.g. "3 out of 5 attempts"), full environment configuration, and a video wherever possible. It felt like overhead at first — but it meant any developer could pick up a ticket and reproduce the issue immediately without back-and-forth. During Alexa and Echo release cycles, this discipline directly reduced the triage loop from days to hours. A well-written bug report is not documentation overhead — it is the fastest path to a fix.
