MetaGraph: ETH Zurich’s 'Google for DNA' Lets Scientists Search Nearly 600 Million Sequences
ETH Zurich researchers have built MetaGraph, an open-source "Google for DNA" that consolidates nearly 600 million sequences (≈21 million GB, or about 21 petabytes) into a single searchable index. The system converts raw reads into error-corrected graphs and compresses the data by about 300× on average; some very large datasets shrink from ~100 TB to ~10 GB, a roughly 10,000-fold reduction. Because scientists can query the index instead of downloading raw files, searches are fast and inexpensive. Roughly half of public sequencing data is already indexed, and the rest is expected by the end of 2025.
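To give a feel for what searching an index instead of raw files means, here is a minimal Python sketch of a toy k-mer index. It is an illustration only, not MetaGraph's implementation: the article does not describe MetaGraph's internal data structures, and the function names (`kmers`, `build_index`, `search`), the sample labels and the parameter `min_fraction` are all hypothetical. The idea shown is that each short substring of length k is mapped once to the samples that contain it, so a later query only touches the index.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample labels whose reads contain it.

    `samples` is a dict of {label: [read, read, ...]} (toy data, not a real format).
    """
    index = defaultdict(set)
    for label, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(label)
    return index


def search(index, query, k, min_fraction=1.0):
    """Return {label: fraction of the query's k-mers found in that sample}.

    Only samples matching at least `min_fraction` of the query k-mers are kept.
    The raw reads are never consulted here, only the index.
    """
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        for label in index.get(kmer, ()):
            hits[label] += 1
    total = len(query_kmers)
    return {
        label: count / total
        for label, count in hits.items()
        if count / total >= min_fraction
    }


# Toy data: two hypothetical samples with a few short reads each.
samples = {
    "sample_A": ["ACGTACGTGA", "TTGACCA"],
    "sample_B": ["GGGTACGTTT"],
}
index = build_index(samples, k=4)

exact = search(index, "ACGTGA", k=4)                       # full match required
partial = search(index, "ACGTGA", k=4, min_fraction=0.3)   # partial matches allowed
```

In this toy run, `exact` contains only `sample_A`, while `partial` also reports `sample_B`, which shares one of the three query k-mers. A real system would additionally need error correction and a compressed representation, neither of which this sketch attempts.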
