Data Extraction

Transform research PDFs into structured, editable spreadsheets with AI-powered extraction, Google Sheets integration, and human validation.

PDFs in. Structured spreadsheets out. No more manual copy-paste marathons.

Data extraction is the bridge between your included studies and your analysis. It's also one of the most error-prone steps in a systematic review: manually transcribing values from PDFs into spreadsheets introduces mistakes that propagate through your entire analysis. mapped's Multimodal Extraction Engine reads both the text and the visual layout of each paper, much as a human reader would, and outputs structured data for you to validate.


How AI Extraction Works

mapped's Multimodal Extraction Engine processes PDFs visually and textually:

  1. Upload your included studies — PDF or full-text format
  2. Define your extraction template — specify which data points you need (study characteristics, interventions, outcomes, effect sizes, confidence intervals, etc.)
  3. AI extracts data — the engine reads the full document, including tables, figures, and supplementary text
  4. Review and validate — every extracted value is presented for human confirmation before entering your dataset
  5. Export to spreadsheet — validated data flows into your extraction table

The engine doesn't just scrape text; it interprets document structure. For example, a number in a table cell under a "95% CI" header is read as a confidence interval, not a p-value.
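
To make the idea of structure-aware reading concrete, here is a minimal sketch of how a column header can decide what a cell's numbers mean. It is not mapped's implementation: the function name `interpret_cell`, the header rules, and the returned dictionary shape are all assumptions made for illustration.

```python
import re

# Illustrative only: hypothetical helper, not mapped's actual engine.
_NUM = r"-?\d*\.?\d+"
# Two numbers separated by "to", a comma, a semicolon, an en dash, or a hyphen.
_CI = re.compile(rf"({_NUM})\s*(?:to|,|;|–|-)\s*({_NUM})")
# An optional comparison sign followed by a non-negative number.
_P = re.compile(r"([<>=]?)\s*(\d*\.?\d+)")


def interpret_cell(header: str, cell: str) -> dict:
    """Return a typed reading of `cell`, chosen by its column `header`."""
    h = header.strip().lower()
    if "ci" in h.split() or "confidence interval" in h:
        m = _CI.search(cell)
        if m:
            return {
                "kind": "confidence_interval",
                "lower": float(m.group(1)),
                "upper": float(m.group(2)),
            }
    elif h in {"p", "p value", "p-value"}:
        m = _P.search(cell)
        if m:
            return {
                "kind": "p_value",
                "operator": m.group(1) or "=",
                "value": float(m.group(2)),
            }
    # Anything the header rules do not cover is kept as raw text.
    return {"kind": "text", "raw": cell}


ci = interpret_cell("95% CI", "0.98 to 1.60")
p = interpret_cell("p-value", "<0.001")
```

The same string, such as "0.04", would be typed differently depending on which column it sits in, which is the distinction plain text scraping tends to lose.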


Complex Table Handling

Research papers are notorious for complex tables: multi-level headers, merged cells, footnotes with asterisks, and data split across multiple pages. mapped handles:

  • Multi-level column headers — correctly mapping values to the right variable
  • Merged cells — understanding that a spanning cell applies to all rows beneath it
  • Footnotes and annotations — capturing the symbols and their explanations
  • Split tables — tables that continue across multiple pages are unified
  • Landscape tables — orientation doesn't affect extraction
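
The table cases above can be sketched with a few small normalization steps. This is a simplified illustration under stated assumptions, not a description of how mapped works internally: it assumes a table has already been read into lists of strings, that merged cells show up as empty strings, and that continuation pages repeat the header rows. The function names are hypothetical.

```python
def flatten_headers(header_rows):
    """Combine stacked header rows into one label per column.

    An empty header cell inherits the label to its left, which is how a
    horizontally merged header cell is assumed to appear here.
    """
    filled = []
    for row in header_rows:
        last, out = "", []
        for cell in row:
            last = cell or last
            out.append(last)
        filled.append(out)
    # Join the levels for each column, skipping blanks and repeated labels.
    return [
        " / ".join(dict.fromkeys(part for part in column if part))
        for column in zip(*filled)
    ]


def join_pages(pages, header_rows):
    """Concatenate body rows, dropping header rows repeated on each page."""
    n = len(header_rows)
    body = []
    for page in pages:
        start = n if page[:n] == header_rows else 0
        body.extend(page[start:])
    return body


def fill_down(rows, columns):
    """Copy a value into the empty cells beneath it (vertically merged cells)."""
    previous, result = {}, []
    for row in rows:
        new_row = list(row)
        for c in columns:
            if new_row[c] == "":
                new_row[c] = previous.get(c, "")
            previous[c] = new_row[c]
        result.append(new_row)
    return result


# Made-up example data: one table split across two pages.
header = [
    ["Study", "Intervention", "", "Control", ""],
    ["", "Mean", "SD", "Mean", "SD"],
]
page1 = header + [
    ["Smith 2019", "12.1", "3.4", "14.0", "3.9"],
    ["", "11.8", "3.1", "13.6", "4.2"],  # same study, second timepoint
]
page2 = header + [["Lee 2021", "10.2", "2.8", "12.9", "3.0"]]

columns = flatten_headers(header)
rows = fill_down(join_pages([page1, page2], header), columns=[0])
```

After these steps every row carries its study label, and every value sits under a single unambiguous column name such as "Intervention / SD".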

Google Sheets Integration

Extracted data flows directly into Google Sheets, enabling:

  • Real-time collaboration — your extraction team can review, edit, and validate data simultaneously
  • Version history — Google's automatic versioning means you never lose an edit
  • Familiar interface — no learning curve; it's the spreadsheet environment researchers already know
  • Formula support — add calculations, conditional formatting, or data validation rules
  • Export flexibility — download as Excel, CSV, or PDF at any time

Human Validation Loop

AI extracts. You verify. This is non-negotiable.

Every AI-extracted value is flagged with a confidence indicator:

Confidence | What it means                                          | Action needed
High       | Clear, unambiguous value in the source                 | Quick confirmation
Medium     | Value extracted but context is ambiguous               | Careful review against source
Low        | Uncertain extraction, possibly from complex formatting | Manual verification required

No value enters your final dataset without explicit human confirmation. This loop ensures that your meta-analysis input is clean, accurate, and defensible.
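
The rule that nothing enters the dataset without confirmation can be expressed as a simple gate. The sketch below is one possible way to model it, with hypothetical class and function names; ordering the review queue from least to most certain is an assumption of this example, not a documented mapped behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Mirrors the confidence table above.
ACTION_NEEDED = {
    Confidence.HIGH: "Quick confirmation",
    Confidence.MEDIUM: "Careful review against source",
    Confidence.LOW: "Manual verification required",
}


@dataclass
class ExtractedValue:
    study: str
    field: str
    value: str
    confidence: Confidence
    confirmed_by: Optional[str] = None  # reviewer name once a human confirms


def review_queue(values):
    """Unconfirmed values, least certain first (an assumed ordering)."""
    order = {Confidence.LOW: 0, Confidence.MEDIUM: 1, Confidence.HIGH: 2}
    pending = [v for v in values if v.confirmed_by is None]
    return sorted(pending, key=lambda v: order[v.confidence])


def final_dataset(values):
    """Only human-confirmed values pass the gate, whatever their confidence."""
    return [v for v in values if v.confirmed_by is not None]


# Made-up example values.
values = [
    ExtractedValue("Smith 2019", "sample_size", "120", Confidence.HIGH),
    ExtractedValue("Smith 2019", "sd", "3.4", Confidence.LOW),
]
before = final_dataset(values)            # nothing confirmed yet
first_to_review = review_queue(values)[0]
first_to_review.confirmed_by = "reviewer_a"
after = final_dataset(values)
```

Note that a high-confidence value is still excluded until someone confirms it; confidence only changes how much scrutiny the reviewer is asked to apply.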


Extraction Template Builder

mapped provides a structured template builder so you can standardize what gets extracted across all studies:

  • Study characteristics — author, year, country, setting, study design
  • Population — sample size, demographics, inclusion criteria
  • Intervention details — type, dose, duration, delivery method
  • Comparator details — control group specifics
  • Outcomes — primary and secondary, measurement methods, timepoints
  • Effect measures — means, standard deviations, odds ratios, hazard ratios, confidence intervals
  • Quality indicators — funding source, conflict of interest declarations

Custom fields can be added for domain-specific data points. The template ensures consistency across your entire team.
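
As a rough illustration of what a standardized template buys you, the sketch below defines a few fields and checks an extracted row against them. The `TemplateField` structure, the field names, and the `validate_row` helper are hypothetical; mapped's template builder is configured through its interface, and its internal format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateField:
    name: str
    group: str            # e.g. "Study characteristics", "Effect measures"
    kind: type = str      # expected type of the value
    required: bool = True


# A small, hypothetical template covering a few of the groups listed above.
TEMPLATE = [
    TemplateField("author", "Study characteristics"),
    TemplateField("year", "Study characteristics", int),
    TemplateField("sample_size", "Population", int),
    TemplateField("mean", "Effect measures", float),
    TemplateField("sd", "Effect measures", float),
    TemplateField("funding_source", "Quality indicators", required=False),
]


def validate_row(template, row):
    """Return a list of problems found in one study's extracted row."""
    problems = []
    for f in template:
        raw = row.get(f.name, "")
        if raw == "":
            if f.required:
                problems.append(f"{f.name}: missing")
            continue
        try:
            f.kind(raw)  # int("12o") or float("abc") raises ValueError
        except ValueError:
            problems.append(f"{f.name}: expected {f.kind.__name__}, got {raw!r}")
    return problems


row = {"author": "Smith", "year": "2019", "sample_size": "12o",
       "mean": "12.1", "sd": "3.4"}
problems = validate_row(TEMPLATE, row)
```

Here the letter "o" typed in place of a zero is caught because every team member's rows are checked against the same field definitions.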


Handling Supplementary Materials

Researchers increasingly publish detailed data in supplementary files, appendices, and online-only materials. mapped's extraction engine processes supplementary PDFs alongside the main manuscript, so values reported only in a supplement are not overlooked.


Why This Step Matters

"Garbage in, garbage out" applies directly to meta-analysis. A single mistyped standard deviation or incorrectly transcribed sample size can distort your pooled estimate. mapped's combination of AI extraction, human validation, and structured templates ensures that the data feeding your analysis is accurate, complete, and traceable back to its source.
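
To see how much one transcription error can matter, the sketch below pools three mean differences with standard fixed-effect inverse-variance weighting, once with correct values and once with a single standard deviation mistyped as 30 instead of 3.0. The study numbers are invented purely for illustration, and the function names are hypothetical.

```python
def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference between two groups and its variance."""
    md = m1 - m2
    var = sd1 ** 2 / n1 + sd2 ** 2 / n2
    return md, var


def pooled_fixed_effect(studies):
    """Inverse-variance weighted average of (effect, variance) pairs."""
    weights = [1.0 / var for _, var in studies]
    weighted_sum = sum(w * md for w, (md, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


# Invented data: (treatment mean, SD, n, control mean, SD, n)
study_a = (10.0, 4.0, 50, 12.0, 4.0, 50)
study_b = (11.0, 5.0, 40, 12.0, 5.0, 40)
study_c = (9.0, 3.0, 60, 13.0, 3.0, 60)
study_c_typo = (9.0, 30.0, 60, 13.0, 3.0, 60)  # SD 3.0 mistyped as 30

correct = pooled_fixed_effect(
    [mean_difference(*s) for s in (study_a, study_b, study_c)]
)
with_typo = pooled_fixed_effect(
    [mean_difference(*s) for s in (study_a, study_b, study_c_typo)]
)
print(f"correct: {correct:.2f}, with typo: {with_typo:.2f}")
```

With these invented numbers the pooled mean difference moves from about -3.0 to about -1.7. The mistyped SD inflates that study's variance, so the most precise study loses almost all of its weight and the pooled estimate drifts toward the other two.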


Next step: With your data extracted, move to Quality Assessment →