MetaGraph: ETH Zurich’s 'Google for DNA' Lets Scientists Search Nearly 600 Million Sequences

Nov 24, 2025

ETH Zurich researchers have built MetaGraph, an open-source "Google for DNA" that consolidates nearly 600 million sequences (≈21 million GB) into a single searchable index. The system converts raw reads into error-corrected graphs and achieves an average compression of about 300×; in the most favorable cases, datasets of ~100 TB shrink to ~10 GB, a reduction of roughly 10,000×. MetaGraph lets scientists query vast collections without downloading raw files, making searches fast and inexpensive. Roughly half of public sequencing data is already indexed, with the rest expected by the end of 2025.

DNA sequencing has revolutionized our understanding of cancer, neurodegenerative disorders and many other conditions, but it has also produced a flood of data. Public archives now contain petabytes of raw reads, making large-scale search and comparative analysis slow, expensive and technically challenging. To tackle this problem, researchers at ETH Zurich have developed MetaGraph, a searchable index that consolidates vast DNA and RNA datasets into a single, efficient resource.

What MetaGraph is

MetaGraph is an open-source, full-text searchable index that brings together nearly 600 million distinct sequences and roughly 21 million gigabytes (~21 PB) of sequencing data. Described by Professor Gunnar Rätsch of ETH Zurich as a "Google for DNA," the project is presented in a paper in Nature and aims to make massive sequence collections quickly queryable without requiring users to download terabytes of raw files.

How it works

The system converts raw read data into error-corrected, refined graphs and merges them into a unified index. By organizing sequence data and metadata with advanced mathematical graph structures and removing redundancies, MetaGraph achieves dramatic compression: about 300× on average, with some datasets reduced far more (for example, the team reports compressing certain ~100 TB collections down to ~10 GB, a factor of roughly 10,000). The index preserves searchability while shrinking storage needs substantially.
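To make the idea concrete, here is a minimal toy sketch in Python. It is not MetaGraph's code or API. It assumes a k-mer based representation (the article only says "graph structures"), and it stands in for error correction with a simple abundance cutoff. All names and parameter values (`K`, `MIN_COUNT`, `build_index`, `query`, the sample labels) are hypothetical.

```python
from collections import Counter, defaultdict

K = 5          # k-mer length (illustrative; chosen small so the example is readable)
MIN_COUNT = 2  # ASSUMPTION: k-mers seen fewer times in a sample are treated as errors


def kmers(seq, k=K):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def clean_kmers(reads, k=K, min_count=MIN_COUNT):
    """Return the k-mers that occur at least min_count times within one sample."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, n in counts.items() if n >= min_count}


def build_index(samples, k=K, min_count=MIN_COUNT):
    """Merge per-sample k-mer sets into one map: k-mer -> set of sample labels.

    Each distinct k-mer is stored once, no matter how many reads contain it,
    which is where the redundancy removal comes from in this toy model.
    """
    index = defaultdict(set)
    for label, reads in samples.items():
        for km in clean_kmers(reads, k, min_count):
            index[km].add(label)
    return dict(index)


def query(index, seq, k=K):
    """Return, for each sample label, the fraction of the query's k-mers it contains."""
    query_kmers = list(kmers(seq, k))
    hits = Counter()
    for km in query_kmers:
        for label in index.get(km, ()):
            hits[label] += 1
    return {label: n / len(query_kmers) for label, n in hits.items()}


# Hypothetical input: two samples; the third read of sample_A carries one mismatch.
samples = {
    "sample_A": ["ACGTACGTGA", "ACGTACGTGA", "ACGTACCTGA"],
    "sample_B": ["TTGACCAGTT", "TTGACCAGTT"],
}
index = build_index(samples)
print(len(index))                  # 12 distinct k-mers kept
print(query(index, "ACGTACGTGA"))  # {'sample_A': 1.0}
```

The k-mers introduced by the mismatching read appear only once, so the cutoff drops them, and the search runs against the merged index rather than the raw reads. A production system would use a far more compact encoding than a Python dictionary.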

What’s included

The indexed material spans sequences from viruses, bacteria and other microbes, fungi, plants, animals (metazoa) and humans, including human gut metagenomes and raw metagenomic datasets. About half of the world’s publicly available sequencing data is already indexed, and the team expects to add the remainder of the public collections by the end of 2025.

Practical benefits

Instead of downloading large datasets before searching them, researchers can query the compressed index directly. This reduces time, bandwidth and storage costs: an individual query can cost just a few cents, and the full public index can fit on a handful of hard drives, with estimated infrastructure costs on the order of $2,500. MetaGraph is designed to scale so that search performance remains high as the archive grows.
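A quick back-of-the-envelope check shows why a "handful of hard drives" is plausible. The sketch below uses the figures quoted in this article and makes two assumptions: that the roughly 300× average applies across the whole ~21 PB, and that one commodity drive holds 20 TB.

```python
import math

RAW_PB = 21            # raw sequencing data covered, in petabytes (from the article)
AVG_COMPRESSION = 300  # average compression factor (from the article)
DRIVE_TB = 20          # ASSUMPTION: capacity of one commodity hard drive, in TB
TB_PER_PB = 1000       # decimal units
GB_PER_TB = 1000

# Estimated index size if the average ratio held for the whole archive.
index_tb = RAW_PB * TB_PER_PB / AVG_COMPRESSION
drives = math.ceil(index_tb / DRIVE_TB)

# Ratio for the best-case example quoted by the team: ~100 TB down to ~10 GB.
best_case_ratio = (100 * GB_PER_TB) / 10

print(f"{index_tb:.0f} TB index on {drives} drives")  # 70 TB index on 4 drives
print(f"best case: {best_case_ratio:.0f}x")           # best case: 10000x
```

These numbers are only an order-of-magnitude estimate. Actual index sizes depend on the dataset, since the reported compression varies widely around the average.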

Who will use it and why it matters

MetaGraph is open source and intended for a broad audience — academic researchers, pharmaceutical companies, educators and potentially private users. As Dr. André Kahles of ETH Zurich’s Biomedical Informatics Group noted, search engines often find unexpected uses; as sequencing becomes cheaper and routine, tools like MetaGraph could enable everyday applications such as quickly identifying plant species or tracking antimicrobial-resistance genes.

Examples and next steps

Faster, cheaper search could accelerate workflows that rely on large-scale comparisons, from mapping viral genomes (as in SARS-CoV-2 surveillance) to evolutionary studies. The MetaGraph project provides an Open Data repository and web examples that allow users to try queries and view visualizations of proteins and resistance genes.

Bottom line

MetaGraph lowers the barrier to working with enormous sequencing archives by compressing data into a searchable index that preserves utility while cutting cost and time. By making these resources easier to explore, the platform could speed discovery across genetics, infectious disease research and biodiversity studies.
