Data Extraction

Extract data, not headaches

No more squinting at PDFs with a spreadsheet open in a split screen. mapped's Multimodal Extraction Engine reads research papers — tables, figures, footnotes, scanned pages — and outputs structured, validated data straight into Google Sheets.

Updated April 2026

mapped's Data Extraction transforms research-paper PDFs into structured spreadsheets. The Multimodal Extraction Engine reads documents visually and textually — handling complex tables, multi-level headers, merged cells, footnoted annotations, split-page tables, landscape orientation, and scanned (OCR) PDFs. Every extracted value is flagged with a High / Medium / Low confidence tier and reviewed by a human before it enters your dataset. Extracted data flows live into Google Sheets for collaborative editing. Extraction accuracy improved by 15% in January 2026 with upgraded models.

Systematic Data Collection

SP · Sarah
DEX – IVUS vs Angiography PCI
Step 5 / 6 · Extract & Review
D2 · Egypt
Study ID | Author | Year | Country | Study Design | Total N | Follow-up
Bendary et al. 2024 | Ahmed Bendary | 2024 | Egypt | RCT, single-centre | 181 | 12 months
Chieffo et al. 2013 | A. Chieffo | 2013 | Multinational | RCT, open-label | 284 | 24 months
Diletti et al. 2026 | R. Diletti | 2026 | European (multi) | RCT, event-driven | 2020 | 19.0 months
Gao et al. 2021 | X.-F. Gao | 2021 | China | RCT, multicentre | 1448 | 3 years
Hong et al. 2015 | S.-J. Hong | 2015 | Korea | RCT, multicentre | 1400 | 1 year
IVUS-ACS 2024 | X. Li | 2024 | China · Italy · Pakistan | RCT, multicentre | 3505 | 1 year
Jakabčín et al. 2009 | J. Jakabčín | 2010 | Czech Republic | RCT, prospective | 210 | 18 months
Kim et al. 2013 | J.-S. Kim | 2013 | Korea | RCT, multicentre | 543 | 12 months
Kim et al. 2015 | B.-K. Kim | 2015 | Korea | RCT, multicentre | 402 | 12 months
Liu et al. 2018 | X.-M. Liu | 2018 | China | RCT, single-centre | 348 | 1 year
Tan et al. 2015 | Q. Tan | 2015 | China | RCT, single-centre | 123 | 2 years
Testa et al. 2026 | L. Testa | 2026 | Multinational | RCT, open-label | 806 | 2.9 years
Wan 2014 · Median/IQR → Mean/SD
Unit conversion · mmol/L ↔ mg/dL
Confidence 9699% · 12 studies · 2 flagged · 53 to review
© 2026 Mapped Technologies LLC. All rights reserved.
Multimodal Extraction Engine
Complex tables: multi-level headers, merged cells, split pages
Scanned PDF (OCR) support
Live Google Sheets sync
High / Medium / Low confidence-tiered human review
Custom extraction templates

Key Capabilities

Multimodal PDF Extraction

The Multimodal Extraction Engine reads PDFs visually and textually — not just OCR'd text. It understands that a number under a '95% CI' header is a confidence interval, not a p-value. Tables, figures, and supplementary text are processed as document structure, not flattened pages.

Complex Table Handling

Multi-level column headers map values to the right variable. Merged spanning cells apply correctly to rows beneath. Footnotes and asterisk annotations are captured with their explanations. Tables continuing across pages are unified. Landscape orientation works without configuration.
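
To make the multi-level header idea concrete, here is a minimal Python sketch of one way stacked header rows could be collapsed into a single label per column. It is an illustration only, not mapped's implementation: the function name flatten_headers, the " / " separator, and the assumption that a merged cell shows up as a label followed by blank cells are all hypothetical.

```python
def flatten_headers(header_rows):
    """Collapse stacked header rows into one label per column.

    Assumption (hypothetical): a merged, spanning header cell appears as its
    label followed by empty strings for the columns it covers.
    """
    last_index = len(header_rows) - 1
    filled_rows = []
    for i, row in enumerate(header_rows):
        filled, current = [], ""
        for cell in row:
            if cell:
                current = cell
            # Only upper rows span columns; blanks in the bottom row stay blank.
            filled.append(current if i < last_index else cell)
        filled_rows.append(filled)

    labels = []
    for parts in zip(*filled_rows):
        kept = []
        for part in parts:
            # Skip blanks and immediate repeats such as ("Outcome", "Outcome").
            if part and (not kept or kept[-1] != part):
                kept.append(part)
        labels.append(" / ".join(kept))
    return labels


header_rows = [
    ["Outcome", "IVUS", "", "Angiography", ""],
    ["", "Events", "Total", "Events", "Total"],
]
print(flatten_headers(header_rows))
```

With labels like "IVUS / Events", each number in the table body can be tied to both its study arm and its variable rather than to a bare column position.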

Scanned-Document Support

Older trials and grey literature often arrive as scanned PDFs. mapped's extraction pipeline combines OCR with the multimodal engine, so a scanned 1995 RCT report extracts cleanly into the same structured template as a born-digital 2026 paper. Scanned-PDF support was added in January 2026.

Live Google Sheets Integration

Extracted data flows directly into Google Sheets. Your extraction team edits collaboratively in real time with Google's native version history, comments, and conditional formatting. Export to Excel, CSV, or PDF whenever you need a snapshot. No separate tooling, no learning curve.

Confidence-Tiered Human Review

Every extracted value carries a confidence tier. High = unambiguous, just confirm. Medium = extracted with some context ambiguity, careful review. Low = uncertain (often complex formatting), manual verification required. No value enters the final dataset without explicit human confirmation. The audit trail records who confirmed what, and when.
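
The review rule above can be sketched in a few lines of Python. This is a hypothetical model, not mapped's data structures: the class ExtractedValue, the REVIEW_ACTION table, and the function final_dataset are names invented for illustration. It shows only the two properties the text states: nothing reaches the final dataset unconfirmed, and each confirmation records who and when.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Review action per tier, paraphrasing the tiers described above.
REVIEW_ACTION = {
    "High": "confirm",
    "Medium": "careful review",
    "Low": "manual verification required",
}


@dataclass
class ExtractedValue:
    field_name: str
    value: str
    tier: str  # "High", "Medium", or "Low"
    confirmed_by: Optional[str] = None
    confirmed_at: Optional[datetime] = None

    def confirm(self, reviewer: str) -> None:
        """Record the audit trail: who confirmed the value, and when."""
        self.confirmed_by = reviewer
        self.confirmed_at = datetime.now(timezone.utc)


def final_dataset(values):
    """Keep only values a human has explicitly confirmed, whatever the tier."""
    return [v for v in values if v.confirmed_by is not None]


values = [
    ExtractedValue("Total N", "181", "High"),
    ExtractedValue("Follow-up", "12 months", "Low"),
]
values[0].confirm("Sarah")
print([v.field_name for v in final_dataset(values)])
```

Note that the tier changes how much scrutiny a value gets, not whether it needs confirmation: even a High value stays out of the final dataset until someone confirms it.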

Custom Extraction Templates

Standardize what gets extracted across all studies: characteristics, population, intervention, comparator, outcomes, effect measures, quality indicators. Templates are project-level so the entire team extracts consistently — and adding a new field mid-review backfills it across already-extracted studies, not just new ones.
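
As a rough picture of what a mid-review backfill involves, the Python sketch below adds a field to a project-level template and queues every already-extracted study for that field. The function add_template_field and the dictionary layout are assumptions made for illustration; they are not mapped's API.

```python
def add_template_field(template, studies, new_field):
    """Add a field to the project template and queue existing studies for it.

    Returns the IDs of already-extracted studies that now need the new field.
    A value of None marks "not yet extracted" (a convention assumed here).
    """
    if new_field in template:
        return []
    template.append(new_field)
    queued = []
    for study_id, record in studies.items():
        if new_field not in record:
            record[new_field] = None
            queued.append(study_id)
    return queued


template = ["Country", "Total N"]
studies = {
    "Gao et al. 2021": {"Country": "China", "Total N": 1448},
    "Hong et al. 2015": {"Country": "Korea", "Total N": 1400},
}
print(add_template_field(template, studies, "Follow-up"))
```

Because the template lives at project level, the new field reaches studies extracted before it existed, so earlier and later records end up with the same columns.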

Proof

Every value, back to its source.

Source quote. Page reference. Confidence rating. Cited conversion method — yes, including Wan et al. 2014, Scenario 3. Every cell. Every time. Reviewer queries close themselves.

Outcomes · Cryoballoon · Day 1 Haptoglobin (g/L)
0.77 ± 0.66

“Haptoglobin (median, g/L) for CBA group at Day 1 was approximately 0.8 (range bars approximately 0.3 to 1.2). Figure 1 text: 'Haptoglobin (median, g/L) P < 0.001 … CBA group at Day 1.'”

Page 3, Figure 1 · Medium · 90%

Median (IQR) → Mean ± SD

Method: Wan et al. 2014, Scenario 3 (Q1/M/Q3)
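
For readers who want to check such a conversion by hand, here is a short Python sketch of the Wan et al. 2014 Scenario 3 estimators: the mean is (Q1 + M + Q3) / 3 and the SD is (Q3 − Q1) divided by twice the standard normal quantile at (0.75n − 0.125) / (n + 0.25). The function name and the sample size n = 50 are illustrative assumptions; the excerpt above does not state the group size, and this is not necessarily how mapped computes the value.

```python
from statistics import NormalDist


def wan2014_scenario3(q1, median, q3, n):
    """Estimate (mean, sd) from Q1, median, Q3 and sample size n.

    Formulas from Wan et al. 2014, Scenario 3 (Q1/M/Q3):
      mean ~ (q1 + median + q3) / 3
      sd   ~ (q3 - q1) / (2 * z),  z = Phi^-1((0.75n - 0.125) / (n + 0.25))
    """
    if n < 2:
        raise ValueError("n must be at least 2 (z is zero when n = 1)")
    mean = (q1 + median + q3) / 3
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


# Approximate figure readings quoted above; n = 50 is a hypothetical group size.
mean, sd = wan2014_scenario3(q1=0.3, median=0.8, q3=1.2, n=50)
print(round(mean, 2), round(sd, 2))
```

With the approximate readings 0.3, 0.8, and 1.2, the mean comes out at about 0.77, which matches the value shown above. The SD depends on n, so it can only be checked exactly once the group size is known.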

Frequently asked questions

What is mapped's Data Extraction?
It's an AI-powered pipeline that turns research-paper PDFs into structured spreadsheets. The Multimodal Extraction Engine reads tables, figures, footnotes, and scanned pages, then routes every value through human confirmation with confidence-tiered review and live Google Sheets sync.
Does mapped handle scanned PDFs?
Yes. The pipeline runs OCR on scanned documents and feeds the result through the same multimodal engine. Older trials and grey literature extract into the same structured template as born-digital papers. This was added in the January 2026 release alongside a 15% accuracy improvement.
How does mapped compare to EPPI-Reviewer for extraction?
EPPI-Reviewer is a respected full-pipeline tool with strong qualitative-research features. mapped focuses on AI-multimodal PDF extraction with live Google Sheets collaboration and per-value confidence tiers — a smoother fit for medical and quantitative reviews. EPPI-Reviewer is stronger on qualitative coding; mapped is stronger on PDF-to-table accuracy and team collaboration.
Who is Data Extraction for?
Anyone extracting structured data from research papers — systematic reviews, meta-analyses, scoping reviews, evidence syntheses. Particularly valuable for teams of 2+ where collaborative editing matters, and for reviews with complex tables that take an hour each to transcribe manually.
How much does Data Extraction cost?
The free tier includes basic data extraction for one active project. The Mapped Project tier (list $119/project, currently $79 launch pricing) unlocks the full Multimodal Extraction Engine, scanned-PDF support, custom templates, and live Google Sheets sync. Custom Enterprise plans add unlimited projects. See mappedresearch.com/pricing.
Can I export extracted data to Excel or statistical software?
Yes. Extracted data lives in Google Sheets natively, and exports cleanly to Excel, CSV, or PDF. From there it flows into R, Stata, SPSS, or directly into mapped's own meta-analysis module without re-entry.

Comparing tools? See how mapped stacks up against EPPI-Reviewer on the workflow you actually run.

Mapped vs EPPI-Reviewer

Ready to get started?

Create your free account and begin your first systematic review.