Genome Statistics

Genome research links DNA variation to health, from inherited risk to treatment decisions, across cancer and other complex diseases. Methods such as genome-wide association studies, pharmacogenomics, and prenatal whole-genome sequencing compare variants across people. This page also covers non-coding regions, structural variants, and copy number changes, along with how the human genome was mapped and quantified, to show how findings translate into medical care.

Key Takeaways

Genome-wide association studies link 7,000 SNPs to disease risk
Pharmacogenomics identifies 300 actionable variants for 100+ drugs
Prenatal whole-genome sequencing detects 13% more pathogenic variants than microarrays
The human genome contains an estimated 20,000-25,000 protein-coding genes
Non-coding RNAs number over 20,000 in the human genome including lncRNAs and miRNAs
Pseudogenes in humans total around 14,000, mostly processed pseudogenes
The common single nucleotide polymorphisms (SNPs) number over 10 million in the human genome with minor allele frequency >1%
Structural variants (SVs) affect 20-50 kb per individual, totaling 1-2% of genome difference
Copy number variations (CNVs) cover 12% of the human genome across populations
The human genome contains approximately 3.2 billion base pairs of DNA sequence
The haploid human genome size is measured at 3,054,815,472 base pairs in the GRCh38.p14 assembly
Like other eukaryotes, humans have linear chromosomes: 22 autosomes plus 2 sex chromosomes (X and Y), for 24 distinct chromosomes
Human Genome Project officially completed in 2003 with 99% coverage at 1x depth
The first human genome sequence cost $2.7 billion and took 13 years
Illumina HiSeq platform enabled 100x coverage human genomes for under $1,000 by 2015

From 7,000 disease-linked SNPs to 1000 Genomes-scale sequencing, modern genomics is accelerating actionable insights for care.

01 · Category

Applications And Impacts · 20 stats

Genome-wide association studies link 7,000 SNPs to disease risk

Pharmacogenomics identifies 300 actionable variants for 100+ drugs

Prenatal whole-genome sequencing detects 13% more pathogenic variants than microarrays

Cancer precision medicine matches therapies to mutations in 30% of advanced cases

Polygenic risk scores predict 10-20% lifetime risk for coronary artery disease

CRISPR-Cas9 gene editing corrected 80% of sickle cell mutations in stem cells

Non-invasive prenatal testing (NIPT) screens 99.9% of trisomy 21 cases from cell-free DNA

Carrier screening panels detect 85% of cystic fibrosis carriers in Caucasians

Whole genome sequencing reduces neonatal ICU diagnosis time from months to days in 40% of cases

Forensic DNA phenotyping predicts eye color with 90% accuracy from SNPs

Genomic selection in cattle breeding increased milk yield by 100 kg/year

Genetically modified Bt corn reduced pesticide use by 37% globally

Human genome editing trials toward an HIV cure safely edited CCR5 in 12 patients

Ancestry DNA tests trace 80% of Ashkenazi Jewish ancestry accurately

Metagenomics sequenced 200,000 microbial genomes from the human gut microbiome

AlphaFold predicted structures for 200 million protein sequences from genomes

Liquid biopsy ctDNA detects 87% of stage I cancers via genome sequencing

Gene drive mosquitoes with edited genomes reduced malaria vector populations by 99% in trials

Direct-to-consumer genetic testing reached 30 million users by 2023

Genome editing in rice increased yield by 20% via promoter swaps

Interpretation

Applications And Impacts Interpretation

Across Applications and Impacts, genome tools are delivering tangible clinical gains, from linking 7,000 SNPs to disease risk and matching therapies in 30% of advanced cancers to improving prenatal variant detection by 13% and editing 80% of sickle cell mutations in stem cells.
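
The polygenic risk scores listed in this category are commonly computed as a weighted sum of risk-allele counts. The sketch below is a minimal illustration under that assumption; the variant IDs, effect weights, and genotype are hypothetical placeholders, not values from any study.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per variant)."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Hypothetical per-allele effect sizes (for example, log odds ratios from a GWAS).
weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}
# Hypothetical genotype: copies of the effect allele carried at each variant.
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}

score = polygenic_risk_score(dosages, weights)
print(round(score, 2))
```

A raw score like this is only meaningful relative to a reference population, so in practice it would be compared against the score distribution of a matched cohort before being read as elevated or reduced risk.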

02 · Category

Gene Content · 24 stats

The human genome contains an estimated 20,000-25,000 protein-coding genes

Non-coding RNAs number over 20,000 in the human genome including lncRNAs and miRNAs

Pseudogenes in humans total around 14,000, mostly processed pseudogenes

The average human gene spans 27 kb with 9 exons on average

Histone genes in humans number about 100, clustered on chromosomes 1 and 6

Humans have 391 functional olfactory receptor genes, part of a family of 800+ genes

MHC genes on chromosome 6 number over 200, highly polymorphic

HOX gene clusters in humans consist of 39 genes across 4 clusters

Immunoglobulin genes on chromosome 14 comprise hundreds of variable, diversity, and joining segments

T-cell receptor genes number over 100 loci across multiple chromosomes

G-protein-coupled receptor (GPCR) genes total 816 in humans

Kinase genes number approximately 518 in the human kinome

Zinc finger genes exceed 700 in humans, largest transcription factor family

Humans have 57 functional cytochrome P450 genes

Collagen genes number 28 in humans

The fruit fly genome encodes about 14,000 protein-coding genes

Arabidopsis has 27,655 protein-coding genes

The yeast S. cerevisiae has 6,300 genes, of which 5,500 are protein-coding

E. coli has 4,300 genes, mostly protein-coding

Mouse genome has 22,000 protein-coding genes

The rice genome encodes 41,000 genes

Wheat genome has ~110,000 genes due to polyploidy

Chimpanzee genome has ~19,000 protein-coding genes

C. elegans has 20,400 protein-coding genes

Interpretation

Gene Content Interpretation

Gene content in the human genome extends well beyond protein coding: 20,000 to 25,000 protein-coding genes sit alongside more than 20,000 non-coding RNAs and about 14,000 pseudogenes.
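
One way to compare the gene counts in this category across species is gene density, the number of genes per megabase. The sketch below pairs the counts listed here with the genome sizes given in the Genome Size And Structure category of this page. It is a rough illustration only: the counts mix protein-coding and total gene figures as the page reports them, and the human value uses the lower bound of the 20,000-25,000 range.

```python
# (gene count, genome size in base pairs), as listed on this page.
genomes = {
    "Human": (20_000, 3_200_000_000),       # lower bound of the quoted range
    "Mouse": (22_000, 2_800_000_000),
    "Fruit fly": (14_000, 180_000_000),
    "C. elegans": (20_400, 100_000_000),
    "S. cerevisiae": (6_300, 12_000_000),   # total genes, not only protein-coding
    "E. coli": (4_300, 4_600_000),          # total genes, mostly protein-coding
}


def genes_per_mb(gene_count, genome_bp):
    """Gene density expressed as genes per million base pairs."""
    return gene_count / (genome_bp / 1_000_000)


density = {name: genes_per_mb(*values) for name, values in genomes.items()}

# Print from the sparsest genome to the densest.
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:14s} {value:8.1f} genes/Mb")
```

With these inputs the compact microbial genomes come out far denser than the mammalian ones, which is consistent with the large repetitive and intronic fractions reported for the human genome.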

03 · Category

Genetic Variation · 20 stats

The common single nucleotide polymorphisms (SNPs) number over 10 million in the human genome with minor allele frequency >1%

Structural variants (SVs) affect 20-50 kb per individual, totaling 1-2% of genome difference

Copy number variations (CNVs) cover 12% of the human genome across populations

The nucleotide diversity π in humans is 0.001 between any two individuals

African populations have 60% higher SNP density than non-Africans

HLA region shows highest polymorphism with over 20,000 alleles cataloged

Mobile element insertions vary by 1,500 events per individual genome

Inversions larger than 1 kb occur at 12,000-15,000 per diploid genome

Microsatellite repeat variations contribute to 3% of human genetic diversity

Somatic mutations in cancer genomes average 100-1,000 per tumor exome

The chimpanzee-human divergence is 1.23% at aligned sites

Archaic admixture from Neanderthals contributes 1-2% of non-African genomes

Denisovan admixture reaches up to 5% in some Oceanian populations

Rare variants (<0.1% MAF) constitute 86% of SNPs in human populations

gnomAD database catalogs 676,000 exomes with 3.1 million loss-of-function variants

A population bottleneck reduced the effective human population size to about 10,000 individuals ~70,000 years ago

Fst genetic differentiation between continents averages 0.11

GWAS identified 12,000 trait-associated loci across 3,300 traits

Polygenic risk scores explain up to 20% heritability for height in Europeans

CRISPR off-target mutations occur at rates below 0.1% in edited genomes

Interpretation

Genetic Variation Interpretation

Genetic variation across humans is widespread, from over 10 million common SNPs and CNVs spanning about 12% of the genome to HLA polymorphism exceeding 20,000 alleles. At the same time, nucleotide diversity stays low at π = 0.001 between any two individuals, and African populations show 60% higher SNP density than non-Africans.
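
Nucleotide diversity (π) is the average number of differences per site between pairs of sequences. The sketch below computes it for a toy alignment and then scales the quoted π = 0.001 to the roughly 3.2 billion base pair genome cited on this page. The toy sequences are invented for illustration, and the function assumes aligned, equal-length sequences with no gaps or missing calls.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site for aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_diffs / (len(pairs) * length)


def expected_pairwise_differences(pi, genome_bp):
    """Expected number of differing sites between two haploid genomes."""
    return pi * genome_bp


# Invented toy alignment: three sequences, ten sites each.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
toy_pi = nucleotide_diversity(toy)

# Scale the page's quoted values: pi = 0.001 and about 3.2 billion base pairs.
diffs = expected_pairwise_differences(0.001, 3_200_000_000)
print(f"toy pi = {toy_pi:.4f}; expected differences = {diffs:,.0f}")
```

Under these figures, two haploid human genomes would be expected to differ at roughly 3.2 million sites.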

04 · Category

Genome Size And Structure · 29 stats

The human genome contains approximately 3.2 billion base pairs of DNA sequence

The haploid human genome size is measured at 3,054,815,472 base pairs in the GRCh38.p14 assembly

Like other eukaryotes, humans have linear chromosomes: 22 autosomes plus 2 sex chromosomes (X and Y), for 24 distinct chromosomes

The total length of the human genome is about 6.4 billion base pairs when considering the diploid state

Mitochondrial DNA in humans contributes an additional 16,569 base pairs to the total genomic content

The largest human chromosome, chromosome 1, spans 249 million base pairs

Chromosome Y is among the smallest human chromosomes, with about 59 million base pairs

Introns make up roughly 25% of the human genome, while exons constitute about 1.5%

Repetitive DNA elements occupy over 50% of the human genome, including LINEs and SINEs

The human genome has approximately 1.5% of its sequence coding for proteins

Centromeric regions in the human genome total around 4-5% of the chromosomal length

Telomeres in human chromosomes consist of TTAGGG repeats averaging 5-15 kb in length

Heterochromatin comprises about 30% of the human genome, often gene-poor

The effective genome size after masking repeats is about 2.5 billion bp for mapping purposes

Human genome GC content averages 40.9% across all chromosomes

The genome of the fruit fly Drosophila melanogaster is 180 million base pairs

Arabidopsis thaliana genome size is 135 million base pairs with 5 chromosomes

Baker's yeast Saccharomyces cerevisiae genome is 12 million base pairs across 16 chromosomes

The Escherichia coli K-12 genome is a single circular chromosome of 4.6 million base pairs

The mouse genome Mus musculus is 2.8 billion base pairs, highly similar to human

Rice Oryza sativa genome is 430 million base pairs with over 400 Mb euchromatin

The wheat genome Triticum aestivum is approximately 17 billion base pairs, hexaploid

Corn Zea mays genome size is 2.3 billion base pairs

The chimpanzee genome Pan troglodytes is 3.0 billion base pairs, 98.8% identical to human

Neanderthal genome draft size matches modern humans at ~3.1 billion bp

The bacterial genome of Mycobacterium tuberculosis is 4.4 million base pairs

Caenorhabditis elegans genome is 100 million base pairs with 6 chromosomes

The pufferfish Takifugu rubripes genome is 400 million base pairs, compact vertebrate genome

Plasmodium falciparum malaria parasite genome is 23 million base pairs

Interpretation

Genome Size And Structure Interpretation

Within the Genome Size And Structure category, the haploid human genome measures 3,054,815,472 base pairs in GRCh38.p14, and the diploid genome totals about 6.4 billion base pairs. Because each cell carries two copies of the nuclear genome, spread across 24 distinct linear chromosomes, the DNA content roughly doubles, and mitochondrial DNA adds a further 16,569 base pairs.
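
The size figures in this category can be checked with simple arithmetic. The sketch below doubles both haploid figures quoted here and converts two of the percentage statistics into base pairs. It treats the diploid genome as exactly twice the haploid figure, which is a simplification that ignores the size difference between the X and Y chromosomes.

```python
HAPLOID_ASSEMBLY_BP = 3_054_815_472  # GRCh38.p14 figure quoted on this page
HAPLOID_ROUNDED_BP = 3_200_000_000   # "approximately 3.2 billion" figure
MITOCHONDRIAL_BP = 16_569


def diploid_bp(haploid_bp):
    """Diploid size, assuming two equal copies of the haploid genome."""
    return 2 * haploid_bp


def share_in_bp(genome_bp, percent):
    """Convert a percentage of the genome into a base pair count."""
    return round(genome_bp * percent / 100)


diploid_rounded = diploid_bp(HAPLOID_ROUNDED_BP)
diploid_assembly = diploid_bp(HAPLOID_ASSEMBLY_BP)
coding_bp = share_in_bp(HAPLOID_ROUNDED_BP, 1.5)   # protein-coding share
repeat_bp = share_in_bp(HAPLOID_ROUNDED_BP, 50)    # repetitive elements, lower bound

print(f"diploid from rounded figure:  {diploid_rounded:,} bp")
print(f"diploid from assembly figure: {diploid_assembly:,} bp")
print(f"plus mitochondrial DNA:       {diploid_assembly + MITOCHONDRIAL_BP:,} bp")
print(f"protein-coding sequence:      {coding_bp:,} bp")
print(f"repetitive sequence:          {repeat_bp:,} bp")
```

The 6.4 billion figure follows from doubling the rounded 3.2 billion estimate; doubling the assembly length instead gives about 6.1 billion base pairs.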

05 · Category

Sequencing Projects · 19 stats

Human Genome Project officially completed in 2003 with 99% coverage at 1x depth

The first human genome sequence cost $2.7 billion and took 13 years

Illumina HiSeq platform enabled 100x coverage human genomes for under $1,000 by 2015

The 1000 Genomes Project sequenced 2,504 individuals from 26 populations at 6x coverage

UK Biobank sequenced exomes of 500,000 participants at 30x depth

The Cancer Genome Atlas (TCGA) generated 2.5 petabytes of genomic data from 11,000 tumors

Earth BioGenome Project aims to sequence all 1.8 million eukaryotic species by 2028

The first bacterial genome, Haemophilus influenzae, sequenced in 1995 at 1.8 Mb

Human ENCODE project mapped functional elements across 30% of the genome using multiple assays

The Neanderthal genome sequenced from three individuals at 1.3x average coverage in 2010

PacBio long-read sequencing achieved N50 contig size of 13 Mb for human genome CHM13 assembly

Oxford Nanopore MinION sequenced entire human genome in real-time at 30x coverage

The Telomere-to-Telomere (T2T) consortium completed the first fully gap-free human genome in 2022

All of Us Research Program plans to sequence 1 million diverse U.S. genomes

BGI sequenced the first individual human genome (YH) in 2008 using SOAP assembler

The rice genome fully sequenced in 2005 by Beijing Institute of Genomics at 95% coverage

Mouse genome sequenced by Celera and public consortium in 2002 at 7x coverage

The human reference GRCh38 released in 2013 incorporating 75 new assemblies

GTEx project sequenced RNA from 54 tissues across 948 donors

Interpretation

Sequencing Projects Interpretation

Sequencing has scaled rapidly. The Human Genome Project reached 99% coverage at 1x depth over 13 years at a cost of $2.7 billion, whereas the UK Biobank has sequenced 500,000 exomes at 30x and TCGA has generated 2.5 petabytes from 11,000 tumors. Sequencing efforts now deliver much higher depth and far larger cohorts at dramatically lower cost and on faster timelines.
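
The "x" figures in this category (1x, 6x, 30x) are sequencing depths: total sequenced bases divided by genome size. The sketch below shows that relationship, along with the expected breadth of coverage under an idealized model in which reads land uniformly at random (a Lander-Waterman style Poisson approximation). The 150 bp read length is a hypothetical value chosen for illustration, and real sequencing runs deviate from the uniform model.

```python
import math


def mean_depth(num_reads, read_length_bp, genome_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length_bp / genome_bp


def reads_needed(target_depth, read_length_bp, genome_bp):
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_depth * genome_bp / read_length_bp)


def expected_breadth(depth):
    """Expected fraction of bases covered at least once (Poisson model)."""
    return 1.0 - math.exp(-depth)


GENOME_BP = 3_200_000_000  # approximate human genome size quoted on this page
READ_LENGTH_BP = 150       # hypothetical short-read length

reads_30x = reads_needed(30, READ_LENGTH_BP, GENOME_BP)
print(f"reads for 30x: {reads_30x:,}")
print(f"depth check:   {mean_depth(reads_30x, READ_LENGTH_BP, GENOME_BP):.1f}x")
for depth in (1, 6, 30):
    print(f"{depth:>2}x depth -> expected breadth {expected_breadth(depth):.4f}")
```

Depth and breadth are separate quantities: depth describes how many times an average base is read, while breadth describes what fraction of the genome is read at all.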

Reference

Cite This Report

This report is designed to be cited. We maintain stable URLs and versioned verification dates. Copy the format appropriate for your publication below.

APA

Emilia Santos. (2026, February 13). Genome Statistics. Gitnux. https://gitnux.org/genome-statistics

MLA

Emilia Santos. "Genome Statistics." Gitnux, 13 Feb 2026, https://gitnux.org/genome-statistics.

Chicago

Emilia Santos. 2026. "Genome Statistics." Gitnux. https://gitnux.org/genome-statistics.
