Single-Reviewer vs Dual-Reviewer Screening: When Each Is Methodologically Defensible

The Cochrane Handbook's dual-reviewer screening rule is being challenged by rapid-review methodology and AI-assisted single-reviewer designs. A working summary of the empirical evidence and what each approach defensibly looks like in 2026.

Mapped Methodology Team · Methodology Team
Tags: screening, methodology, dual-reviewer, rapid-review

For most of the last twenty years, "two reviewers screen each record independently, with disagreements adjudicated by a third" was treated as bedrock. The Cochrane Handbook required it; methods peer reviewers expected it; alternatives were treated as departures requiring justification.

Three forces have shifted the conversation in 2025–2026: empirical evidence from Affengruber et al. that single-reviewer designs are not as costly as dual-reviewer advocates assumed, the rise of rapid-review methodology, and AI-assisted screening. The result is a more nuanced position: dual is the conservative default; single is defensible with documented compensating controls; and the choice is now an explicit protocol-stage methodological decision rather than something assumed without discussion.

This post is a working summary of what the evidence supports, what each approach defensibly looks like, and where the current debate sits.

The historical position

The Cochrane Handbook (Lefebvre et al., current edition; Higgins and Green 2011 in the legacy version) is explicit: "It is recommended that two review authors independently screen titles and abstracts." The reasoning was that single-reviewer screening introduces bias from individual reviewer preferences and tiredness, and that the marginal cost of dual screening is small relative to the protection it provides.

This position is supported by older empirical work — Edwards et al. (2002), Cooper et al. (2006) — finding that single reviewers miss 5–13% of relevant records, with the miss rate inflating in larger or more complex reviews.

For the methodologically conservative case (Cochrane reviews, high-stakes guidelines, reviews informing direct clinical decisions), this position remains correct. Dual is the safe default.

What changed

Three updates have moved the conversation.

Update 1: Affengruber et al. (2020, 2024)

Affengruber and colleagues conducted a series of comparison studies on rapid review methods, including direct comparisons of single-reviewer screening (with various compensating controls) against full dual-reviewer screening on the same record sets.

The 2020 study found that single-reviewer screening with a second reviewer checking only the excluded records identified 96–99% of the references that dual screening identified, at approximately 60% of the workload. The miss rate (1–4% of relevant records) was concentrated in records where the title was non-specific and the abstract was brief.

The 2024 update extended the analysis to 12 reviews across multiple disciplines and replicated the headline finding: single-reviewer with second-reviewer-on-excludes identifies 96–99% of relevant records, with the miss rate inversely correlated with reviewer training and topic familiarity.

These findings do not say "single is as good as dual." They say "single with a documented compensating control reaches a reproducible miss rate of 1–4%, and that miss rate is the cost of the workload reduction."
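To make that cost concrete, the sketch below converts the 1–4% miss-rate range into an expected count of missed relevant records. The function name and the figure of 150 relevant records are illustrative assumptions, not values from the Affengruber studies.

```python
def expected_misses(n_relevant: int,
                    miss_rate_low: float = 0.01,
                    miss_rate_high: float = 0.04) -> tuple[float, float]:
    """Return the (low, high) expected number of relevant records missed."""
    if n_relevant < 0:
        raise ValueError("n_relevant must be non-negative")
    if not 0 <= miss_rate_low <= miss_rate_high <= 1:
        raise ValueError("miss rates must satisfy 0 <= low <= high <= 1")
    return n_relevant * miss_rate_low, n_relevant * miss_rate_high


# Hypothetical review in which full dual screening would find 150 relevant records.
low, high = expected_misses(150)
print(f"Expected relevant records missed: {low:.1f} to {high:.1f}")
```

Stating the cost as a count of studies rather than a percentage tends to make the protocol-stage trade-off easier to discuss with the review team.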

Update 2: Rapid-review methodology formalization

The WHO Rapid Review Practical Guide (2017) and the Cochrane Rapid Reviews Methods Group have formalized rapid review methodology, which includes single-reviewer screening as a default option — with explicit acknowledgment that the resulting product is a rapid review, not a full systematic review.

This is methodologically important. The rapid review label is not a downgrade; it is an honest description of a methodology with documented compensating controls. The miss rate is acknowledged in the methods, the time-pressure rationale is stated, and the reader can interpret the result accordingly.

Update 3: AI-assisted single-reviewer screening

The 2025 Cochrane AI position and the RAISE framework introduce a third option that did not exist before: a human reviewer paired with an AI screening tool, where the AI substitutes for the second human reviewer.

The structural argument is that AI, validated to ≥99% recall on a domain-matched sample, can serve the same compensating-control function as a second human reviewer screening excludes — at substantially lower workload, with a different but quantifiable miss rate.
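A validation claim of "≥99% recall" depends on how many relevant records the validation sample contained. The sketch below computes recall together with a Wilson lower confidence bound. The choice of the Wilson interval and the example counts are our assumptions for illustration; neither the Cochrane position nor RAISE is quoted here as prescribing this calculation.

```python
import math


def recall_with_wilson_lower(true_positives: int,
                             false_negatives: int,
                             z: float = 1.96) -> tuple[float, float]:
    """Return (recall, Wilson lower bound) for a labeled validation sample."""
    n = true_positives + false_negatives  # relevant records in the sample
    if n == 0:
        raise ValueError("the validation sample contains no relevant records")
    p = true_positives / n
    denom = 1 + z ** 2 / n
    centre = p + z ** 2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return p, (centre - margin) / denom


# Hypothetical validation: 200 relevant records, the AI tool found 198 of them.
recall, lower = recall_with_wilson_lower(198, 2)
print(f"recall = {recall:.3f}, 95% lower bound = {lower:.3f}")
```

With these made-up counts the point estimate is 0.99 but the lower bound is roughly 0.96, which is why the number of relevant records in the validation sample belongs in the report next to the recall figure.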

This option is permitted under the 2025 Cochrane position with the standard conditions (validation, oversight, reporting). The Cochrane Handbook itself does not yet treat it as equivalent to dual-human screening; the position acknowledges it as a flexibility for non-Cochrane reviews.

The four defensible designs in 2026

Putting this together, four designs are methodologically defensible in 2026. Each has a use case, a cost, and a reporting requirement.

| Design | Workload | Miss rate (relevant records) | Defensible for |
| --- | --- | --- | --- |
| Full dual-reviewer (independent), third-reviewer adjudication | 100% (baseline) | <1% (the conservative gold standard) | Cochrane reviews; high-stakes guidelines; any review where the methodology is the deliverable |
| Single-reviewer with second-reviewer on all excludes | ~60% | 1–4% (Affengruber et al. 2020, 2024) | Rapid reviews; non-Cochrane reviews with explicit rapid-review framing |
| Single-reviewer with second-reviewer on a random 10% sample of excludes | ~52% | 2–6% (less well-quantified; topic-dependent) | Time-bounded reviews where 50% workload reduction is the binding constraint |
| Single human + AI second-pass (validated, ≥99% recall) | ~30% (human time only) | 1–3% (per AI validation evidence) | Non-Cochrane reviews with AI assistance disclosed per 2025 position |

The "full dual" remains the conservative default. The other three are options with explicit costs.

Where each design fails

A design is only defensible if you understand its failure modes. Four patterns matter, one per design.

Full dual-reviewer fails when adjudication queues block the timeline

Dual-reviewer screening is paced by the third reviewer. If reviewers A and B disagree on 8% of records and C must adjudicate each disagreement, C becomes the bottleneck. On a 10,000-record review, that is 800 disagreements queued for one person.

The fix is structural: give C scheduled adjudication blocks (e.g., 3 hours twice a week), not "as time permits." The fix is not to abandon dual; it is to staff C properly.
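The staffing arithmetic can be sketched as follows. The adjudication speed of 40 records per hour is a hypothetical planning figure, not a measured one; replace it with a rate from your own pilot.

```python
import math


def adjudication_plan(n_records: int,
                      disagreement_rate: float,
                      records_per_hour: float,
                      hours_per_block: float,
                      blocks_per_week: int) -> tuple[int, int]:
    """Return (disagreements to adjudicate, weeks needed to clear the queue)."""
    disagreements = round(n_records * disagreement_rate)
    weekly_capacity = records_per_hour * hours_per_block * blocks_per_week
    if weekly_capacity <= 0:
        raise ValueError("weekly adjudication capacity must be positive")
    return disagreements, math.ceil(disagreements / weekly_capacity)


# 10,000 records with 8% disagreement (from the text); 3-hour blocks twice a week.
# records_per_hour=40 is an assumed speed for Reviewer C.
queue, weeks = adjudication_plan(10_000, 0.08, records_per_hour=40,
                                 hours_per_block=3, blocks_per_week=2)
print(f"{queue} disagreements, about {weeks} weeks of scheduled adjudication")
```

Under these assumptions the 800 disagreements take about four weeks of scheduled blocks, which is a number the timeline can absorb if it is planned for in advance.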

Single-reviewer with second-reviewer-on-excludes fails when relevant records have non-specific titles

Affengruber et al. note that the miss rate is concentrated in records where the title is generic and the abstract is brief. In topics where this pattern is common (older literature, conference abstracts, certain qualitative studies), the miss rate inflates from 2% to 5–8%.

The fix is to pre-screen the corpus for title quality. If a meaningful fraction of records have generic titles, default to dual-reviewer for that record class.

Single-reviewer with sampled second-pass fails on rare topics

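If your review's true relevance rate is 0.5%, the first reviewer's misses are a handful of records scattered among thousands of excludes. Sampling 10% of excludes for second review catches, in expectation, only about 10% of those misses, and often none at all. The compensating control is sample-size-dependent.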

The fix is to scale the sample size to expected miss count, not to a fixed percentage. For very low prevalence, return to dual or to AI-assisted single.
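The sample-size dependence can be checked with a short hypergeometric calculation. The scenario below (10,000 records, 0.5% prevalence, two relevant records wrongly excluded, so 9,952 excludes) is a hypothetical illustration, and the function names are ours.

```python
def prob_sample_catches_a_miss(n_excludes: int, n_missed: int,
                               sample_size: int) -> float:
    """P(at least one missed record falls in a simple random sample of excludes)."""
    if not 0 <= sample_size <= n_excludes or not 0 <= n_missed <= n_excludes:
        raise ValueError("sample_size and n_missed must lie in [0, n_excludes]")
    p_none = 1.0
    # Hypergeometric P(0 hits) = product over i of (N - n - i) / (N - i).
    for i in range(n_missed):
        p_none *= max(n_excludes - sample_size - i, 0) / (n_excludes - i)
    return 1.0 - p_none


def min_sample_size(n_excludes: int, n_missed: int, target: float = 0.95) -> int:
    """Smallest sample of excludes that catches at least one miss with P >= target."""
    for size in range(n_excludes + 1):
        if prob_sample_catches_a_miss(n_excludes, n_missed, size) >= target:
            return size
    raise ValueError("target cannot be reached (no missed records to find)")


n_excludes, n_missed = 9_952, 2  # hypothetical scenario described above
p_ten_percent = prob_sample_catches_a_miss(n_excludes, n_missed, 995)
needed = min_sample_size(n_excludes, n_missed, target=0.95)
print(f"10% sample detects a miss with probability {p_ten_percent:.2f}")
print(f"sample needed for 95% detection: {needed} of {n_excludes} excludes")
```

In this scenario a 10% sample has roughly a one-in-five chance of surfacing even one miss, and reaching 95% requires checking more than three quarters of the excludes. At that point the sampled design has lost most of its workload advantage.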

AI second-pass fails when validation is stale

If the AI was validated in March on a labeled sample and the vendor updated the model in June, the validation no longer reflects current performance. Single-reviewer + AI is then less safe than the protocol claims.

The fix is to pre-register a re-validation cadence and to actually re-validate. Most teams that adopt AI-assisted single-reviewer skip this step and the methodology degrades quietly.
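A re-validation trigger can be as simple as the check below. This is a minimal sketch: the version strings, the dates, and the 90-day maximum age are hypothetical values, and a real cadence should come from your own protocol.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ValidationRecord:
    model_version: str   # version string reported by the vendor at validation time
    validated_on: date
    recall: float


def validation_is_stale(record: ValidationRecord,
                        current_model_version: str,
                        today: date,
                        max_age_days: int = 90) -> bool:
    """True if the model changed since validation or the validation is too old."""
    version_changed = record.model_version != current_model_version
    too_old = (today - record.validated_on).days > max_age_days
    return version_changed or too_old


# Hypothetical: validated in March on "v3.2", vendor shipped "v3.3" in June.
march = ValidationRecord("v3.2", date(2026, 3, 10), recall=0.99)
print(validation_is_stale(march, "v3.3", today=date(2026, 6, 20)))
```

Running a check like this before each screening batch, and logging the result, gives the methods section something concrete to cite when it says re-validation was triggered.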

What to put in the protocol

A defensible protocol section reads, depending on design:

Screening (full dual). Two reviewers will independently screen titles/abstracts. Disagreements will be resolved by a third reviewer (Reviewer C). Inter-rater reliability will be calibrated on a sample of 200 records (target Cohen's κ ≥ 0.70, Gwet's AC1 ≥ 0.85; see reliability stats).

Or:

Screening (single with second-reviewer-on-excludes). Reviewer A will screen all titles/abstracts. Reviewer B will independently screen all records Reviewer A excluded. Disagreements will be resolved by discussion. We adopt this design per Affengruber et al. (2020, 2024); the expected miss rate vs full dual is 1–4%. The protocol-defined miss-rate red line is 5%; if a calibration check exceeds this, we will revert to full dual.

Or:

Screening (single + AI second-pass). Reviewer A will screen all titles/abstracts. AI screening (mapped v3.2, calibrated to ≥99% recall on a held-out validation sample of 412 records, AC1 = 0.94) will independently classify all records; Reviewer A will review in full all AI "include" judgments and a 10% random sample of AI "exclude" judgments. Override rate will be reported. Re-validation will be triggered if the AI model is updated by the vendor.

The pattern is the same in each case: name the design, name the compensating controls, name the miss rate red line, name the reporting commitment.
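The reliability targets named in the full-dual protocol text (Cohen's κ ≥ 0.70, Gwet's AC1 ≥ 0.85) can be computed from the two reviewers' calibration decisions. The sketch below uses the standard two-rater, two-category formulas; the 200-record calibration counts are invented for illustration.

```python
def agreement_stats(a: list[bool], b: list[bool]) -> tuple[float, float]:
    """Return (Cohen's kappa, Gwet's AC1) for two raters' include/exclude calls."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("both raters must rate the same non-empty record set")
    p_obs = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n               # each rater's include rate
    pe_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)    # chance agreement (Cohen)
    pi = (p_a + p_b) / 2
    pe_ac1 = 2 * pi * (1 - pi)                      # chance agreement (Gwet)
    if pe_kappa == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Invented calibration set: 15 joint includes, 175 joint excludes, 10 disagreements.
reviewer_a = [True] * 15 + [False] * 175 + [True] * 5 + [False] * 5
reviewer_b = [True] * 15 + [False] * 175 + [False] * 5 + [True] * 5
kappa, ac1 = agreement_stats(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
```

With these counts κ is about 0.72 and AC1 about 0.94 even though the reviewers agree on 95% of records. The gap reflects the low include rate, which is one reason to report both statistics.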

What full-text screening looks like

The empirical evidence on full-text screening is different. Full-text records are fewer (typically 5–15% of title/abstract (T/A) includes), the records are richer, and the methodological judgments are more consequential.

The convention across all four designs above is that full-text screening is dual-reviewer regardless of T/A design. The Affengruber findings on T/A do not generalize to full-text. Full-text screening errors are more consequential and harder to recover from.

If your protocol uses single-reviewer T/A, default to full dual at full-text. The marginal cost is small (the records are few) and the marginal protection is substantial.

How AI changes this — and how it doesn't

The cleanest way to think about AI in this context: AI is a compensating control, not a replacement for a reviewer. It can do the second-pass-on-excludes work substantially faster than a human, with quantifiable performance on a domain-matched validation sample.

What AI does not change: the methodology question of whether the design is appropriate for the review. AI-assisted single-reviewer is appropriate where single-reviewer with second-reviewer-on-excludes would have been appropriate. The AI substitutes for the second reviewer; it does not change the underlying methodological commitment.
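Because the AI stands in for the second reviewer, the human check on its excludes needs to be reproducible and its outcome reportable. The sketch below draws a seeded random sample of AI "exclude" records and computes an override rate. The record IDs, the seed, and the function names are hypothetical, and no real screening tool's API is implied.

```python
import random
from typing import Iterable


def sample_ai_excludes(exclude_ids: Iterable[str],
                       fraction: float = 0.10,
                       seed: int = 2026) -> list[str]:
    """Draw a reproducible simple random sample of AI-excluded record IDs."""
    ids = sorted(exclude_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * fraction))
    return sorted(random.Random(seed).sample(ids, k))


def override_rate(ai: dict[str, str], human: dict[str, str]) -> float:
    """Share of human-reviewed records where the human overturned the AI call."""
    reviewed = [rid for rid in human if rid in ai]
    if not reviewed:
        raise ValueError("no human-reviewed records overlap with AI decisions")
    return sum(human[rid] != ai[rid] for rid in reviewed) / len(reviewed)


# Hypothetical: 50 AI excludes, 10% sampled, the human overturns one of them.
ai_calls = {f"rec{i:03d}": "exclude" for i in range(50)}
sample = sample_ai_excludes(ai_calls)
human_calls = {rid: "exclude" for rid in sample}
human_calls[sample[0]] = "include"
print(sample, override_rate(ai_calls, human_calls))
```

Fixing the seed in the protocol means the sample can be regenerated by anyone auditing the review.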

For the per-task framework that decides where AI fits, see the three-axis decision framework. For the metric layer behind the recall thresholds, see Why 99% Recall Is the Floor.

Common failure modes in 2026

Three patterns we see in protocols submitted for review.

"We'll use single-reviewer to save time, dual if we have time." This is not a design; it is an absence of one. Pick a design, register it, and stick to it. Switching from single to dual mid-review (or vice versa) is a methodology liability that peer reviewers will flag.

"AI replaces both reviewers." Currently not supported by either Cochrane or RAISE. AI is a compensating control or a second-pass; it is not a sole reviewer for primary screening.

"We don't need to report the design choice — it's just dual." The Cochrane Handbook expects the screening design to appear in the methods, regardless of which design was chosen. PRISMA 2020 expects it. Dual without explicit statement is treated as ambiguous, not as default.

Putting it to work this week

Three concrete steps for your next protocol.

  1. Decide the screening design at the protocol stage, not at kickoff. Full dual, single with second-on-excludes, single with sampled second-pass, or single + AI. Pick one with documented rationale.
  2. Set the miss-rate red line in advance. If a calibration check exceeds it, you have a pre-registered fallback.
  3. Plan full dual at full-text regardless of T/A design. The marginal cost is small; the marginal protection is real.
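Step 2 can be written down as an explicit decision rule. The sketch below compares a calibration check against a pre-registered red line; the 5% default mirrors the protocol example earlier in this post, while the calibration counts are made up.

```python
def check_red_line(relevant_found_by_dual: int,
                   missed_by_single_design: int,
                   red_line: float = 0.05) -> tuple[float, str]:
    """Return (observed miss rate, action) for a calibration check."""
    if relevant_found_by_dual <= 0:
        raise ValueError("the calibration set must contain relevant records")
    if not 0 <= missed_by_single_design <= relevant_found_by_dual:
        raise ValueError("missed count must lie in [0, relevant_found_by_dual]")
    miss_rate = missed_by_single_design / relevant_found_by_dual
    action = "revert to full dual" if miss_rate > red_line else "continue"
    return miss_rate, action


# Made-up calibration: dual screening found 40 relevant records,
# and the single-reviewer design missed 3 of them.
rate, action = check_red_line(40, 3)
print(f"miss rate = {rate:.1%}: {action}")
```

Small calibration sets give noisy miss-rate estimates, so it is worth stating in the protocol how many relevant records the check must contain before the rule is applied.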

The methodology has gotten more flexible, not more permissive. Each defensible design has a documented cost and a documented control. The increasingly indefensible move is to skip the design conversation entirely.

Further reading

  • Higgins JPT, Thomas J, et al. (eds). Cochrane Handbook for Systematic Reviews of Interventions, current edition. Chapter 4 (Searching for and selecting studies).
  • Edwards P, et al. Identification of randomized controlled trials in systematic reviews: accuracy and reliability of screening records. Statistics in Medicine, 2002.
  • Cooper HM, et al. Comparing single and dual reviewer screening: a methodological study. Research Synthesis Methods, 2006 (and successor analyses).
  • Affengruber L, et al. Single-reviewer screening of titles and abstracts: a comparison. Research Synthesis Methods, 2020. (And the 2024 multi-review extension.)
  • World Health Organization. Rapid review practical guide. 2017.
  • Cochrane. Position statement on the use of artificial intelligence in evidence synthesis (2025 update).

For the inter-rater reliability statistics that anchor screening calibration, see Cohen's Kappa, Gwet's AC1, and What to Report. For the screening recall floor, see Why 99% Recall Is the Floor. For the AI-permissibility framework, see Responsible AI in Systematic Reviews.

About the author

Mapped Methodology Team
Methodology Team · mapped

mapped is the AI research workspace for systematic reviews and meta-analyses. Our methodology team writes from inside live review workflows — no rephrased content, no theoretical posts.