Risk of Bias Assessment: Choosing Between RoB 2, ROBINS-I, NOS, and QUADAS-2

Risk of bias assessment is a mandatory step in any systematic review. It determines how much confidence you can place in the results of included studies. But with several tools available — each designed for different study designs — choosing the right one can be confusing.

This guide explains when to use each of the four major risk of bias tools and what each assessment involves.

Why Risk of Bias Matters

Systematic reviews aggregate evidence from multiple studies. If some of those studies have methodological flaws — inadequate randomization, unblinded outcome assessment, selective reporting — the pooled result can be misleading.

Risk of bias assessment identifies these flaws study by study. It feeds directly into:

GRADE assessment: Risk of bias is the first domain evaluated when rating the certainty of evidence
Sensitivity analyses: Excluding high-risk studies and checking whether results change
Subgroup analyses: Comparing effect estimates across low-risk and high-risk studies
Clinical decision-making: Guideline panels weight evidence partly on the basis of risk of bias
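
As a rough illustration of the sensitivity analysis mentioned above, the sketch below pools effect estimates with a fixed-effect inverse-variance model, then repeats the pooling after excluding studies rated high risk. The study names, effect sizes, and standard errors are invented purely for illustration, and a real review would normally use a dedicated meta-analysis package and often a random-effects model.

```python
import math

# Hypothetical data: (label, log risk ratio, standard error, overall risk of bias).
studies = [
    ("Study A", -0.30, 0.10, "low"),
    ("Study B", -0.25, 0.15, "some concerns"),
    ("Study C", -0.80, 0.20, "high"),
    ("Study D", -0.20, 0.12, "low"),
]

def pool_fixed_effect(rows):
    """Return the inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, _, se, _ in rows]
    weighted_sum = sum(w * row[1] for w, row in zip(weights, rows))
    total_weight = sum(weights)
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)

# Primary analysis: all studies.
all_est, all_se = pool_fixed_effect(studies)

# Sensitivity analysis: drop studies judged to be at high risk of bias.
not_high = [row for row in studies if row[3] != "high"]
sens_est, sens_se = pool_fixed_effect(not_high)

print(f"All studies:         {all_est:.3f} (SE {all_se:.3f})")
print(f"Excluding high risk: {sens_est:.3f} (SE {sens_se:.3f})")
```

If the two pooled estimates differ materially, the conclusion depends on the high-risk studies and should be reported with that caveat.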

Skipping risk of bias assessment, or performing it superficially, is one of the most common reasons review authors receive major-revision requests from journals.

The Four Major Tools

RoB 2 — Randomized Controlled Trials

The Cochrane Risk of Bias tool version 2 (RoB 2) is the standard for assessing randomized controlled trials. It evaluates bias through five domains:

Bias arising from the randomization process: Was the allocation sequence random? Was allocation concealed?
Bias due to deviations from intended interventions: Were participants and personnel aware of the assigned intervention? Were there deviations that could affect the outcome?
Bias due to missing outcome data: Was outcome data available for all or nearly all participants?
Bias in measurement of the outcome: Could the outcome assessment have been influenced by knowledge of the intervention?
Bias in selection of the reported result: Were multiple outcome measurements, analyses, or timepoints available, and was the reported result selected from among these?

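Each domain is judged as "Low risk," "Some concerns," or "High risk." The overall risk of bias for each study is at least as severe as its worst domain judgment: a study is low risk overall only if every domain is low risk, and high risk overall if any domain is high risk. Reviewers may also rate a study high risk overall when several domains raise some concerns.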
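
The overall judgment described above can be sketched as a small function. The function name, the domain keys, and the `escalate_multiple_concerns` flag are assumptions of this sketch. The flag reflects that RoB 2 guidance allows reviewers to rate a study high risk overall when several domains raise some concerns, which is a judgment call rather than a mechanical rule.

```python
# Severity order for RoB 2 judgments, from least to most severe.
SEVERITY = {"low": 0, "some concerns": 1, "high": 2}

def rob2_overall(domain_judgments, escalate_multiple_concerns=False):
    """Derive an overall RoB 2 judgment from a {domain: judgment} mapping.

    By default the overall judgment is the most severe domain judgment.
    If escalate_multiple_concerns is True, two or more domains rated
    "some concerns" (with none rated "high") are escalated to "high".
    """
    judgments = list(domain_judgments.values())
    worst = max(judgments, key=lambda j: SEVERITY[j])
    n_concerns = judgments.count("some concerns")
    if worst == "some concerns" and escalate_multiple_concerns and n_concerns > 1:
        return "high"
    return worst

# Hypothetical trial with two domains raising some concerns.
trial = {
    "randomization": "low",
    "deviations": "some concerns",
    "missing data": "low",
    "measurement": "some concerns",
    "reported result": "low",
}
print(rob2_overall(trial))
print(rob2_overall(trial, escalate_multiple_concerns=True))
```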

Use RoB 2 when: Your included studies are randomized controlled trials (parallel-group, crossover, or cluster-randomized).

ROBINS-I — Non-Randomized Studies of Interventions

ROBINS-I (Risk Of Bias In Non-randomized Studies – of Interventions) assesses bias in non-randomized studies that estimate the effect of an intervention. It was updated to version 2 in November 2025.

ROBINS-I evaluates seven domains:

Bias due to confounding: Were important confounders measured and controlled?
Bias in selection of participants into the study: Was selection into the study related to both the intervention and the outcome?
Bias in classification of interventions: Was intervention status well defined and determined at the start of follow-up?
Bias due to deviations from intended interventions: Were there deviations from intended interventions that were unbalanced between groups?
Bias due to missing data: Was missing outcome data adequately handled?
Bias in measurement of outcomes: Was the outcome measurement influenced by knowledge of intervention status?
Bias in selection of the reported result: Was there selective reporting?

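Each domain is judged as "Low risk," "Moderate risk," "Serious risk," or "Critical risk," with "No information" available when a study reports too little to support a judgment. ROBINS-I uses a hypothetical target trial as the benchmark: each study is compared with the randomized trial that would ideally have been conducted to answer the same question.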

Use ROBINS-I when: Your included studies are non-randomized studies evaluating the effect of an intervention (cohort studies, controlled before-after studies, interrupted time series with a comparison group).

Newcastle-Ottawa Scale (NOS) — Cohort and Case-Control Studies

The Newcastle-Ottawa Scale is a simpler assessment tool for observational studies. It uses a star-based system across three broad categories:

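Selection (maximum 4 stars): for cohort studies, representativeness of the exposed cohort, selection of the non-exposed cohort, ascertainment of exposure, and demonstration that the outcome was not present at the start of the study; for case-control studies, case definition, representativeness of cases, and selection and definition of controls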
Comparability (maximum 2 stars): comparability of cohorts or cases/controls on the basis of design or analysis
Outcome/Exposure (maximum 3 stars): assessment of outcome (or exposure for case-control), follow-up duration and adequacy

A study can receive a maximum of 9 stars. A common convention treats 7–9 stars as good quality, 4–6 as fair, and 0–3 as poor, although these cut-offs are not part of the scale itself and vary between reviews.
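
The star arithmetic can be sketched as follows. The function names and category keys are hypothetical, and the good/fair/poor cut-offs are simply the ones quoted above, so adjust them if your protocol specifies different thresholds.

```python
# Maximum stars per NOS category, as listed above.
NOS_MAX = {"selection": 4, "comparability": 2, "outcome": 3}

def nos_total(stars):
    """Validate per-category stars and return the total (0 to 9)."""
    for category, maximum in NOS_MAX.items():
        value = stars[category]
        if not 0 <= value <= maximum:
            raise ValueError(f"{category}: expected 0-{maximum} stars, got {value}")
    return sum(stars[category] for category in NOS_MAX)

def nos_quality(total):
    """Map a star total to the quality label used in this guide."""
    if total >= 7:
        return "good"
    if total >= 4:
        return "fair"
    return "poor"

# Hypothetical cohort study.
study = {"selection": 3, "comparability": 1, "outcome": 2}
total = nos_total(study)
print(total, nos_quality(total))
```

Reporting the per-category stars alongside the total keeps the assessment transparent, since two studies with the same total can differ in where they lose stars.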

NOS is less granular than ROBINS-I but faster to complete and widely accepted by journals. It has separate versions for cohort studies and case-control studies.

Use NOS when: You are assessing observational cohort or case-control studies, particularly when the review includes many studies and a faster assessment is needed, or when the journal specifically requests NOS.

QUADAS-2 — Diagnostic Test Accuracy Studies

QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies, version 2) assesses bias and applicability in studies evaluating diagnostic tests. It uses four domains:

Patient selection: Was a consecutive or random sample of patients enrolled? Was a case-control design avoided? Did the study avoid inappropriate exclusions?
Index test: Were the index test results interpreted without knowledge of the reference standard? Was there a pre-specified threshold?
Reference standard: Is the reference standard likely to correctly classify the target condition? Were results interpreted without knowledge of the index test?
Flow and timing: Was there an appropriate interval between the index test and reference standard? Did all patients receive the same reference standard? Were all patients included in the analysis?

Each domain is assessed for both risk of bias and applicability concerns (except flow and timing, which is risk of bias only).
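
One way to capture this asymmetry in a data-extraction script is a record type that accepts applicability judgments for only the first three domains. The class and field names below are hypothetical. The "low", "high", and "unclear" ratings follow the judgment options used by QUADAS-2.

```python
from dataclasses import dataclass

DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")
# Flow and timing is assessed for risk of bias only.
APPLICABILITY_DOMAINS = DOMAINS[:3]
RATINGS = {"low", "high", "unclear"}

@dataclass
class Quadas2Assessment:
    study: str
    risk_of_bias: dict    # one rating per domain in DOMAINS
    applicability: dict   # one rating per domain in APPLICABILITY_DOMAINS

    def __post_init__(self):
        if set(self.risk_of_bias) != set(DOMAINS):
            raise ValueError("risk_of_bias must cover all four domains")
        if set(self.applicability) != set(APPLICABILITY_DOMAINS):
            raise ValueError("applicability covers the first three domains only")
        for rating in (*self.risk_of_bias.values(), *self.applicability.values()):
            if rating not in RATINGS:
                raise ValueError(f"unknown rating: {rating!r}")

# Hypothetical diagnostic accuracy study.
record = Quadas2Assessment(
    study="Study X",
    risk_of_bias={d: "low" for d in DOMAINS},
    applicability={d: "unclear" for d in APPLICABILITY_DOMAINS},
)
print(record.study, len(record.risk_of_bias), len(record.applicability))
```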

Use QUADAS-2 when: Your systematic review evaluates the diagnostic accuracy of a test, reporting measures such as sensitivity, specificity, and summary ROC (SROC) curves. These are diagnostic test accuracy (DTA) reviews.

Decision Tree: Which Tool Should You Use?

What type of studies does your review include?
│
├─ Randomized controlled trials
│  └─ Use RoB 2
│
├─ Non-randomized studies of interventions
│  ├─ Need detailed, domain-level assessment?
│  │  └─ Use ROBINS-I
│  └─ Need a faster, star-based assessment?
│     └─ Use NOS
│
├─ Diagnostic accuracy studies
│  └─ Use QUADAS-2
│
└─ Mixed study designs
   └─ Use the appropriate tool for each design
      (e.g., RoB 2 for RCTs + NOS for cohort studies)
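
The same decision tree can be written as a small lookup function, which is handy when a review spreadsheet records the design of each included study. The design codes ("rct", "nrsi", "dta") and the function names are assumptions made for this sketch.

```python
def choose_tool(design, need_detailed=True):
    """Return the risk of bias tool for one study design, following the tree above."""
    if design == "rct":
        return "RoB 2"
    if design == "nrsi":
        # Non-randomized studies of interventions: detailed vs. faster assessment.
        return "ROBINS-I" if need_detailed else "NOS"
    if design == "dta":
        return "QUADAS-2"
    raise ValueError(f"unknown study design: {design!r}")

def tools_for_review(designs, need_detailed=True):
    """Mixed-design reviews: pick the appropriate tool for each design present."""
    return {design: choose_tool(design, need_detailed) for design in sorted(set(designs))}

print(tools_for_review(["rct", "nrsi", "rct"], need_detailed=False))
```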

Presenting Risk of Bias Results

Traffic Light Plots

Traffic light plots show the risk of bias judgment for each domain of each study using color-coded cells (green for low risk, yellow for some concerns, red for high risk). They provide a study-level overview and are standard in Cochrane reviews.

Summary Bar Charts

Summary bar charts show the proportion of studies at each risk of bias level for each domain. They provide a quick overview of the overall risk of bias across the review.
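
The numbers behind a summary bar chart are simple per-domain proportions. The sketch below computes them from a hypothetical set of RoB 2 judgments. The trial names, the shortened domain labels, and the function name are illustrative, and the resulting dictionary could be passed to any plotting library as stacked bars.

```python
from collections import Counter

LEVELS = ("low", "some concerns", "high")

def summarize_by_domain(assessments):
    """Return {domain: {level: proportion of studies}} for a summary bar chart."""
    domains = list(next(iter(assessments.values())))
    summary = {}
    for domain in domains:
        counts = Counter(judgments[domain] for judgments in assessments.values())
        total = sum(counts.values())
        summary[domain] = {level: counts[level] / total for level in LEVELS}
    return summary

# Hypothetical judgments for four trials across two domains.
assessments = {
    "Trial 1": {"randomization": "low", "missing data": "low"},
    "Trial 2": {"randomization": "some concerns", "missing data": "low"},
    "Trial 3": {"randomization": "high", "missing data": "low"},
    "Trial 4": {"randomization": "low", "missing data": "high"},
}
summary = summarize_by_domain(assessments)
for domain, proportions in summary.items():
    print(domain, proportions)
```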

Both visualizations are expected by most journals and are required by Cochrane.

ROBINS-I Version 2 Updates (November 2025)

The updated ROBINS-I V2 includes several important changes:

Refined signaling questions with clearer guidance
Improved handling of time-varying confounding
Better alignment with target trial emulation frameworks
Clearer guidance on distinguishing "moderate" from "serious" risk

If you are starting a new review in 2026, use ROBINS-I V2 rather than the original 2016 version.

Risk of Bias Assessment in mapped

mapped supports all four tools with guided assessment workflows:

RoB 2: Domain-by-domain assessment with signaling questions for each included RCT
ROBINS-I: Full seven-domain assessment with ROBINS-I V2 signaling questions
NOS: Star-based assessment with separate forms for cohort and case-control studies
QUADAS-2: Four-domain assessment with both risk of bias and applicability judgments

The tool is automatically selected based on your study type. Pairwise reviews default to RoB 2, DTA reviews use QUADAS-2, and prognostic reviews can use NOS or ROBINS-I. When a review includes mixed study designs, you can apply different tools to different studies within the same project.

Results are visualized as traffic light plots and summary bar charts, and they feed directly into the GRADE assessment step.