Key Takeaways
- The human genome consists of approximately 3.1 billion base pairs of DNA
- There are about 20,000-25,000 protein-coding genes in the human genome
- Non-coding RNA genes make up around 10% of the human genome
- Whole genome sequencing cost was $2.7 billion for the Human Genome Project in 2003
- By 2023, the cost of human genome sequencing dropped to $562
- Illumina NovaSeq X Plus can sequence more than 20,000 genomes per year at 30x coverage
- The 1000 Genomes Project sequenced 2,504 individuals
- dbSNP database contains 1 billion+ variants as of 2023
- The ENCODE project mapped functional elements first in a pilot covering 1% of the genome, then across the whole genome
- The average human heterozygosity is 0.1% or 1 in 1,000 bases
- The 1000 Genomes Project cataloged about 84.7 million SNPs, most of them rare
- Structural variants cover 25 Mb per human genome
- BRCA1 mutations confer about a 72% risk of breast cancer by age 80 (about 69% for BRCA2)
- CFTR deltaF508 mutation causes 70% of cystic fibrosis cases in Caucasians
- APC mutations underlie 80% of familial adenomatous polyposis
Human genome sequencing has become dramatically faster, cheaper, and more comprehensive, enabling breakthroughs in medicine and agriculture.
Applied Genomics
- Maize genome size is 2.3 Gb with 32,000 genes
- Rice genome sequenced at 430 Mb with 37,000 genes
- CRISPR improved wheat yield by 20% via gene editing
- GMO Bt corn reduces insecticide use by 37%
- Soybean genome has 1.1 billion bases and 46,000 genes
- Cattle genome project identified 22,000 genes
- Dog genome reveals 19,000 genes similar to human
- Arabidopsis thaliana genome is 135 Mb with 27,000 genes
- Golden rice with beta-carotene boosts vitamin A in rice
- Salmonella typhimurium genome 4.9 Mb used for vaccine development
- Yeast synthetic genome project rewrote 16 chromosomes
- Mosquito genome editing reduces malaria transmission 99% in labs
- Pig genome aids xenotransplantation with 25 edits
- Banana genome sequencing combats Panama disease
- CRISPR tomatoes with GABA boost flavor and shelf life
- The JCVI-syn3.0 minimal genome, derived from Mycoplasma mycoides, has 473 genes for synthetic biology
- Coronavirus genome 30 kb sequenced for vaccine design
- Cotton genome polyploidy decoded for fiber improvement
- Chicken genome has 1.05 Gb and aids avian flu research
- Genomic selection increases dairy cattle milk yield 100 kg/yr
- Virus-resistant papaya saved Hawaiian industry via transgene
- Atlantic salmon genome duplicated aids aquaculture
- Sugarcane genome 10 Gb sequenced for biofuel
- Genomic prediction accuracy 70% for pig growth traits
- Fungus-resistant wine grapes via CRISPR
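The genome sizes and gene counts quoted above can be compared through a rough average spacing per gene. The following is a minimal sketch using only figures from this list; the helper name `kb_per_gene` is illustrative, and the calculation ignores gene length, polyploidy, and repeat content.

```python
# Genome size (base pairs) and gene count, as quoted in the list above.
genomes = {
    "Arabidopsis thaliana": (135e6, 27_000),
    "Rice": (430e6, 37_000),
    "Soybean": (1.1e9, 46_000),
    "Maize": (2.3e9, 32_000),
}


def kb_per_gene(size_bp, n_genes):
    """Average genome span per gene in kilobases (genome size / gene count)."""
    return size_bp / n_genes / 1_000


spacing = {name: kb_per_gene(size, genes) for name, (size, genes) in genomes.items()}

# Print from the most gene-dense genome to the least gene-dense one.
for name, kb in sorted(spacing.items(), key=lambda item: item[1]):
    print(f"{name}: {kb:.1f} kb per gene")
```

On these figures, maize has roughly fourteen times more sequence per gene than Arabidopsis, even though the two gene counts are similar.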
Disease Genomics
- BRCA1 mutations confer about a 72% risk of breast cancer by age 80 (about 69% for BRCA2)
- CFTR deltaF508 mutation causes 70% of cystic fibrosis cases in Caucasians
- APC mutations underlie 80% of familial adenomatous polyposis
- HTT CAG repeat expansions of 36 or more cause Huntington's disease
- FMR1 CGG repeat >200 leads to fragile X syndrome in 1/4,000 males
- TP53 mutations in 50% of all cancers
- KRAS mutations drive 30% of colorectal cancers
- EGFR mutations occur in 10-15% of non-small cell lung cancers in Western populations and in roughly 30-50% in East Asian populations
- PCSK9 loss-of-function variants reduce LDL by 30%
- Factor V Leiden mutation increases VTE risk 5-fold
- GBA mutations increase Parkinson's risk 5-10 fold
- APP/PSEN1 mutations cause 5% early-onset Alzheimer's
- LDLR mutations cause 90% familial hypercholesterolemia cases
- SMN1 deletions cause 95% spinal muscular atrophy
- DMD deletions in 65% Duchenne muscular dystrophy
- Polygenic risk scores explain 20% schizophrenia heritability
- GWAS identified 100+ loci for type 2 diabetes
- Heritability of height is about 80%; GWAS have identified roughly 12,000 associated variants
- Coronary artery disease PRS predicts 10% risk variance
- Somatic JAK2 V617F in 95% polycythemia vera
- CALR mutations in 25% essential thrombocythemia
- FLT3-ITD in 30% acute myeloid leukemia
- IDH1/2 mutations in 75% low-grade gliomas
- PTEN loss in 40-50% endometrial cancers
- MSI-high in 15% colorectal cancers responsive to immunotherapy
- TERT promoter mutations in 70% melanomas
- Genome-wide association studies link 500+ loci to breast cancer risk
- Alpha-1 antitrypsin deficiency from PI*Z allele in 1/2,500 Europeans
- Hemochromatosis HFE C282Y homozygotes 0.4% in Northern Europe
- Genome editing corrects 60% of DMD mutations in mice
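Homozygote frequencies such as those quoted for HFE C282Y and PI*Z can be converted into approximate allele and carrier frequencies. The sketch below assumes Hardy-Weinberg equilibrium, which is a simplification for real populations, and it treats the 1/2,500 figure for alpha-1 antitrypsin deficiency as the PI*Z homozygote frequency, which is also an assumption. The function name is illustrative.

```python
import math


def hardy_weinberg_from_homozygotes(homozygote_freq):
    """Return (allele frequency q, carrier frequency 2pq) under Hardy-Weinberg equilibrium."""
    q = math.sqrt(homozygote_freq)  # homozygotes occur at frequency q^2
    p = 1.0 - q
    return q, 2.0 * p * q


# HFE C282Y: 0.4% homozygotes in Northern Europe (figure from the list above).
q_hfe, carriers_hfe = hardy_weinberg_from_homozygotes(0.004)

# PI*Z: 1 in 2,500 Europeans, assumed here to be homozygotes.
q_piz, carriers_piz = hardy_weinberg_from_homozygotes(1 / 2500)

print(f"HFE C282Y: allele frequency {q_hfe:.3f}, carriers {carriers_hfe:.1%}")
print(f"PI*Z:      allele frequency {q_piz:.3f}, carriers {carriers_piz:.1%}")
```

Under these assumptions, a rare recessive condition implies a far larger pool of heterozygous carriers than the homozygote count alone suggests.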
Genetic Variation
- The average human heterozygosity is 0.1% or 1 in 1,000 bases
- The 1000 Genomes Project cataloged about 84.7 million SNPs, most of them rare
- Structural variants cover 25 Mb per human genome
- Inversions affect 1% of the human genome per individual
- Mobile element insertions number 100+ de novo per generation
- Tandem repeats vary in 10% of human disease loci
- African populations have 19% more genetic diversity than Europeans
- Neanderthal admixture contributes 1-2% DNA to non-Africans
- Denisovan DNA in Oceanians up to 5%
- Mutation rate is 1.2 x 10^-8 per base per generation
- De novo mutations average 60-70 per diploid genome
- Loss-of-function variants tolerated in 100 genes per person
- HLA alleles number 20,000+ in human population
- ABO blood group polymorphism affects 20% frequency variation globally
- Lactase persistence allele frequency 90% in Northern Europeans
- Sickle cell allele frequency 10-20% in malaria-endemic Africa
- CCR5-delta32 mutation frequency 10% in Europeans
- Copy number variable regions >1 kb span about 12% of the genome across surveyed populations
- Microsatellite instability in 15% of colorectal cancers
- Haplotype blocks average 22 kb in Europeans
- Fst genetic differentiation between continents averages 0.11
- Mitochondrial haplogroups divide populations with 50% variance
- Y-chromosome haplogroups show 80% population structure
- Runs of homozygosity >1Mb in 10% of outbred individuals
- Segmental duplications cover 5% of human genome
- Karyotype abnormalities occur in 0.5-1% of newborns
- Trinucleotide repeats expand in 40+ disorders like Huntington's
- Somatic mutations accumulate 10^4 per cell per year post-puberty
- Driver mutations in cancer average 2-8 per tumor
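Several figures in this list can be cross-checked with simple arithmetic. The sketch below multiplies the quoted per-base mutation rate by the diploid genome size, and the quoted heterozygosity by the haploid genome size. The 3.1 billion base pair genome size is the approximate value used elsewhere in this article; all inputs are rounded, so the results are order-of-magnitude checks only.

```python
HAPLOID_GENOME_BP = 3.1e9  # approximate haploid genome size quoted in this article
MUTATION_RATE = 1.2e-8     # mutations per base per generation
HETEROZYGOSITY = 0.001     # 0.1%, i.e. 1 in 1,000 bases

# Expected new mutations in a child: rate x bases in a diploid genome.
de_novo = MUTATION_RATE * 2 * HAPLOID_GENOME_BP

# Expected heterozygous sites in one individual.
het_sites = HETEROZYGOSITY * HAPLOID_GENOME_BP

print(f"expected de novo mutations per generation: about {de_novo:.0f}")
print(f"expected heterozygous sites: about {het_sites / 1e6:.1f} million")
```

The first result comes out slightly above the 60-70 range quoted above, which is about as close as rounded inputs allow; the second matches the figure of a few million SNPs per genome.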
Genome Structure
- The human genome consists of approximately 3.1 billion base pairs of DNA
- There are about 20,000-25,000 protein-coding genes in the human genome
- Non-coding RNA genes make up around 10% of the human genome
- The human genome has over 3 million single nucleotide polymorphisms (SNPs)
- Introns account for approximately 25% of the human genome
- The average gene density in the human genome is one gene per 100,000 base pairs
- Euchromatin regions comprise about 92% of the human genome
- The human genome contains around 1,800 ribosomal RNA genes
- Telomeres in humans consist of 5-15 kilobases of TTAGGG repeats
- Centromeres in human chromosomes average 1-4 Mb in size
- The Y chromosome has about 57 million base pairs; chromosome 21 is the smallest human chromosome at about 47 million
- Chromosome 1 is the largest human chromosome with 249 million base pairs
- Mitochondrial DNA in humans is 16,569 base pairs long
- The human genome has approximately 200,000 copy number variations (CNVs)
- Pseudogenes number around 14,000 in the human genome
- The haploid human genome size is 3,054,815,472 base pairs according to the T2T-CHM13 assembly
- Repeat elements constitute 50% of the human genome
- Alu elements number over 1 million in the human genome
- LINE-1 elements make up 17% of the human genome
- The human genome has 23 pairs of chromosomes
- Exons comprise only 1.5% of the human genome
- The p53 gene spans 20 kb with 11 exons
- BRCA1 gene is 81 kb long with 24 exons
- The HOXB gene cluster, one of four human HOX clusters, spans about 100 kb on chromosome 17
- Immunoglobulin heavy chain locus is 1.25 Mb on chromosome 14
- The major histocompatibility complex (MHC) spans 3.6 Mb on chromosome 6
- The alpha-globin gene cluster is 28 kb on chromosome 16
- Beta-globin locus control region is 10 kb upstream
- The dystrophin gene is the largest known human gene at 2.4 Mb
- Titin gene (TTN) has 363 exons and spans 282 kb
- The human genome has 19,000 lncRNA genes
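The percentages in this list are easier to compare once converted to absolute sequence lengths. The sketch below applies the quoted fractions to the approximate 3.1 billion base pair genome size; the fractions are rounded figures from this list, so the outputs are approximations.

```python
GENOME_BP = 3.1e9  # approximate haploid genome size quoted above

# Fractions of the genome as quoted in the list above.
fractions = {
    "exons": 0.015,
    "LINE-1 elements": 0.17,
    "introns": 0.25,
    "repeat elements": 0.50,
}

# Convert each fraction to megabases of sequence.
megabases = {name: frac * GENOME_BP / 1e6 for name, frac in fractions.items()}

for name, mb in megabases.items():
    print(f"{name}: about {mb:,.1f} Mb")
```

On these numbers, LINE-1 elements alone occupy more than ten times as much sequence as all exons combined.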
Genomic Databases
- The 1000 Genomes Project sequenced 2,504 individuals
- dbSNP database contains 1 billion+ variants as of 2023
- The ENCODE project mapped functional elements first in a pilot covering 1% of the genome, then across the whole genome
- GENCODE annotates 59,000+ human genes
- ClinVar has 2 million+ variant pathogenicity assertions
- gnomAD v4 aggregates variants from 807,162 individuals (730,947 exomes and 76,215 genomes)
- UCSC Genome Browser hosts 50+ assemblies
- Ensembl database covers 500+ species
- RefSeq has 300,000+ reference sequences
- GTEx v8 analyzes eQTLs across 49 tissues from 838 donors (948 donors were sampled overall)
- Roadmap Epigenomics profiled 111 reference epigenomes
- The 100,000 Genomes Project sequenced 100,000 genomes from about 85,000 participants with cancer or rare disease
- UK Biobank genotyped 500,000 participants
- All of Us Research Program aims for 1 million diverse genomes
- TCGA analyzed 11,000+ tumor samples across 33 cancers
- ICGC sequenced 2,500 cancer genomes initially
- GEO database has 5 million+ samples
- SRA stores 40 petabases of sequencing data
- COSMIC catalogs 37 million coding mutations in cancer
- OMIM documents 8,000+ Mendelian disorders
- GWAS Catalog lists 6,000+ studies with 250,000+ associations
- STRING database has 2.4 billion interactions for 12,000 species
- Reactome pathways number 2,800 for human
- KEGG has 18,000 pathways across organisms
- Pfam database classifies 19,000 families
- UniProt has 570,000 reviewed protein entries
- PDB holds 200,000+ macromolecular structures
- AlphaFold predicted structures for all 20,000 human proteins
- Human Protein Atlas maps 20,000 proteins in 47 tissues
- DepMap CRISPR screens 1,000+ cancer cell lines
- CCLE profiles genomics of 1,400 cancer cell lines
Sequencing Technology
- Whole genome sequencing cost was $2.7 billion for the Human Genome Project in 2003
- By 2023, the cost of human genome sequencing dropped to $562
- Illumina NovaSeq X Plus can sequence more than 20,000 genomes per year at 30x coverage
- Oxford Nanopore Flongle flow cells yield up to 2.8 Gb; MinION flow cells yield up to about 50 Gb in 72 hours
- PacBio HiFi reads achieve 99.9% accuracy for 15-20 kb reads
- CRISPR-Cas9 editing efficiency reaches 80% in human cells
- Single-cell RNA-seq profiles 10,000+ cells per run with 10x Genomics
- Long-read sequencing assembles 99% of human genome including centromeres
- Third-generation sequencing error rate improved to <1% in 2022
- BGISEQ-500 sequences 75 Gb per run
- Ion Torrent S5 sequences 15 Gb in 7 hours
- Hi-C chromatin mapping captures 1 billion contacts per diploid genome
- Optical genome mapping detects 90% of SVs missed by short-reads
- Visium spatial transcriptomics captures expression in 55 μm spots; Visium HD reaches 2 μm resolution
- ATAC-seq identifies 100,000+ open chromatin regions per cell type
- ChIP-seq peaks average 500-1000 bp for histone marks
- RNA-seq detects 150,000 transcripts in human cells
- Whole exome sequencing covers 98% of coding regions at 20x depth
- Nanopore direct RNA sequencing reads full-length transcripts without fragmentation
- Linked-read sequencing phases 90% of human haplotypes
- Ultra-long reads >100 kb enable telomere-to-telomere assemblies
- Base editing efficiency >50% for C-to-T transitions
- Prime editing could in principle correct up to 89% of known pathogenic human variants without double-strand breaks (DSBs)
- Illumina iSeq 100 sequences about 4 million reads per run
- Element Biosciences AVITI achieves Q40 accuracy
- MGI Tech DNBSEQ-T7 produces 12 Tb per run
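The coverage and cost figures above translate into raw sequencing output and a fold change with a few multiplications. The sketch below assumes the approximate 3.1 billion base pair genome size used in this article and treats coverage as total sequenced bases divided by genome size, ignoring duplicates and unmapped reads. The function name is illustrative.

```python
GENOME_BP = 3.1e9  # approximate haploid genome size quoted in this article


def bases_for_coverage(depth, genome_bp=GENOME_BP):
    """Total bases to sequence for a given average depth (depth x genome size)."""
    return depth * genome_bp


# Raw output needed for one genome at 30x, in gigabases.
gb_per_genome = bases_for_coverage(30) / 1e9

# Raw output for 20,000 genomes per year at 30x, in petabases.
pb_per_year = 20_000 * bases_for_coverage(30) / 1e15

# Cost reduction from the Human Genome Project figure to the 2023 figure.
fold_drop = 2.7e9 / 562

print(f"30x genome: about {gb_per_genome:.0f} Gb of raw sequence")
print(f"20,000 genomes per year: about {pb_per_year:.2f} Pb")
print(f"cost reduction: about {fold_drop / 1e6:.1f} million-fold")
```

A throughput of 20,000 genomes per year therefore corresponds to nearly two petabases of raw sequence from a single instrument.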