Key Takeaways
- The human genome contains approximately 3.2 billion base pairs of DNA sequence
- The haploid human genome size is measured at 3,054,815,472 base pairs in the GRCh38.p14 assembly
- Like other eukaryotes, humans have linear chromosomes: 22 autosomes plus the X and Y sex chromosomes, for 24 distinct chromosomes
- The human genome contains an estimated 20,000-25,000 protein-coding genes
- Non-coding RNAs number over 20,000 in the human genome including lncRNAs and miRNAs
- Pseudogenes in humans total around 14,000, mostly processed pseudogenes
- The Human Genome Project was declared complete in 2003, covering about 99% of the euchromatic genome at 99.99% accuracy
- The first human genome sequence cost $2.7 billion and took 13 years
- Illumina's HiSeq X Ten platform brought the cost of a 30x-coverage human genome down to roughly $1,000 by 2014-2015
- More than 10 million common single nucleotide polymorphisms (SNPs), those with minor allele frequency >1%, are found in the human genome
- Structural variants (SVs) affect 20-50 kb per individual, totaling 1-2% of genome difference
- Copy number variations (CNVs) cover 12% of the human genome across populations
- Genome-wide association studies link 7,000 SNPs to disease risk
- Pharmacogenomics identifies 300 actionable variants for 100+ drugs
- Prenatal whole-genome sequencing detects 13% more pathogenic variants than microarrays
The human genome contains billions of base pairs, thousands of genes, and vast repetitive regions.
Applications and Impacts
- Genome-wide association studies link 7,000 SNPs to disease risk
- Pharmacogenomics identifies 300 actionable variants for 100+ drugs
- Prenatal whole-genome sequencing detects 13% more pathogenic variants than microarrays
- Cancer precision medicine matches therapies to mutations in 30% of advanced cases
- Polygenic risk scores predict 10-20% lifetime risk for coronary artery disease
- CRISPR-Cas9 gene editing corrected 80% of sickle cell mutations in stem cells
- Non-invasive prenatal testing (NIPT) screens 99.9% of trisomy 21 cases from cell-free DNA
- Carrier screening panels detect 85% of cystic fibrosis carriers in Caucasians
- Whole genome sequencing reduces neonatal ICU diagnosis time from months to days in 40% of cases
- Forensic DNA phenotyping predicts eye color with 90% accuracy from SNPs
- Genomic selection in cattle breeding increased milk yield by 100 kg/year
- Genetically modified crops such as Bt corn (transgenic rather than genome-edited) reduced pesticide use by 37% globally
- A human genome editing trial toward an HIV cure safely edited CCR5 in 12 patients
- Ancestry DNA tests trace 80% of Ashkenazi Jewish ancestry accurately
- Metagenomics sequenced 200,000 microbial genomes from human gut microbiome
- AlphaFold predicted structures for 200 million protein sequences from genomes
- Liquid biopsy ctDNA detects 87% of stage I cancers via genome sequencing
- Gene-drive mosquitoes with edited genomes reduced malaria vector populations by 99% in trials
- Direct-to-consumer genetic testing reached 30 million users by 2023
- Genome editing in rice increased yield by 20% via promoter swaps
Gene Content
- The human genome contains an estimated 20,000-25,000 protein-coding genes
- Non-coding RNAs number over 20,000 in the human genome including lncRNAs and miRNAs
- Pseudogenes in humans total around 14,000, mostly processed pseudogenes
- The average human gene spans 27 kb with 9 exons on average
- Histone genes in humans number about 100, clustered on chromosomes 1 and 6
- Humans have 391 functional olfactory receptor genes, part of a family of 800+ genes
- MHC genes on chromosome 6 number over 200, highly polymorphic
- HOX gene clusters in humans consist of 39 genes across 4 clusters
- Immunoglobulin genes on chromosome 14 total hundreds in variable/diversity/joining segments
- T-cell receptor genes number over 100 loci across multiple chromosomes
- G-protein coupled receptors (GPCRs) genes total 816 in humans
- Kinase genes number approximately 518 in the human kinome
- Zinc finger genes exceed 700 in humans, largest transcription factor family
- Cytochrome P450 genes total 57 functional in humans
- Humans have 28 collagen types, encoded by more than 40 genes
- The fruit fly genome encodes about 14,000 protein-coding genes
- Arabidopsis has 27,655 protein-coding genes
- Yeast S. cerevisiae has 6,300 genes, 5,500 protein-coding
- E. coli has 4,300 genes, mostly protein-coding
- Mouse genome has 22,000 protein-coding genes
- The rice genome encodes 41,000 genes
- Wheat genome has ~110,000 genes due to polyploidy
- Chimpanzee genome has ~19,000 protein-coding genes
- C. elegans has 20,400 protein-coding genes
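Gene counts are easier to compare once they are divided by genome size. The sketch below computes genes per megabase using the gene counts listed above and the genome sizes quoted in the Genome Size and Structure section. The human entry uses the lower bound of the 20,000-25,000 range, and the E. coli entry uses its total gene count, so the results are rough comparisons only; the dictionary and function names are illustrative.

```python
# (gene count, genome size in Mb), both as quoted in this document.
GENOMES = {
    "E. coli": (4_300, 4.6),        # total genes, mostly protein-coding
    "Yeast": (5_500, 12),
    "C. elegans": (20_400, 100),
    "Fruit fly": (14_000, 180),
    "Arabidopsis": (27_655, 135),
    "Mouse": (22_000, 2_800),
    "Human": (20_000, 3_200),       # lower bound of the quoted range
    "Wheat": (110_000, 17_000),
}


def genes_per_mb(gene_count: int, size_mb: float) -> float:
    """Gene density in genes per million base pairs."""
    return gene_count / size_mb


density = {name: genes_per_mb(*pair) for name, pair in GENOMES.items()}

# Rank species from most to least gene-dense.
ranked = sorted(density, key=density.get, reverse=True)
for name in ranked:
    print(f"{name}: {density[name]:.1f} genes/Mb")
```

With these inputs the compact bacterial and yeast genomes come out far denser than the mammalian and wheat genomes, which is consistent with the large repetitive fraction reported for the latter.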
Genetic Variation
- More than 10 million common single nucleotide polymorphisms (SNPs), those with minor allele frequency >1%, are found in the human genome
- Structural variants (SVs) affect 20-50 kb per individual, totaling 1-2% of genome difference
- Copy number variations (CNVs) cover 12% of the human genome across populations
- The nucleotide diversity π in humans is 0.001 between any two individuals
- African populations have 60% higher SNP density than non-Africans
- HLA region shows highest polymorphism with over 20,000 alleles cataloged
- Mobile element insertions vary by 1,500 events per individual genome
- Inversions larger than 1 kb occur at 12,000-15,000 per diploid genome
- Microsatellite repeat variations contribute to 3% of human genetic diversity
- Somatic mutations in cancer genomes average 100-1,000 per tumor exome
- The chimpanzee-human divergence is 1.23% at aligned sites
- Archaic admixture from Neanderthals contributes 1-2% of non-African genomes
- Denisovan DNA admixture reaches up to 5% in some Oceanian populations
- Rare variants (<0.1% MAF) constitute 86% of SNPs in human populations
- gnomAD database catalogs 676,000 exomes with 3.1 million loss-of-function variants
- A population bottleneck reduced the human population to about 10,000 individuals ~70,000 years ago
- Fst genetic differentiation between continents averages 0.11
- GWAS identified 12,000 trait-associated loci across 3,300 traits
- Polygenic risk scores explain up to 20% heritability for height in Europeans
- CRISPR off-target mutations occur at rates below 0.1% in edited genomes
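The nucleotide diversity figure above (π = 0.001) is the average fraction of sites that differ between two sequences. As a minimal sketch, assuming equal-length aligned sequences with no gaps, π can be computed as the mean pairwise difference per site. The toy sequences and function name are hypothetical; the last line scales the quoted π to the approximately 3.2 billion bp haploid genome size given in this document.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean per-site difference over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment, for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi_toy = nucleotide_diversity(toy)
print(f"toy pi = {pi_toy:.4f}")

# Scale the quoted pi = 0.001 to a ~3.2 billion bp haploid genome.
expected_differences = round(0.001 * 3_200_000_000)
print(f"expected differences between two genomes: {expected_differences:,}")
```

Under these assumptions, two haploid genomes would differ at roughly 3.2 million sites, which gives a sense of scale for the variant counts listed above.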
Genome Size and Structure
- The human genome contains approximately 3.2 billion base pairs of DNA sequence
- The haploid human genome size is measured at 3,054,815,472 base pairs in the GRCh38.p14 assembly
- Like other eukaryotes, humans have linear chromosomes: 22 autosomes plus the X and Y sex chromosomes, for 24 distinct chromosomes
- The total length of the human genome is about 6.4 billion base pairs when considering the diploid state
- Mitochondrial DNA in humans contributes an additional 16,569 base pairs to the total genomic content
- The largest human chromosome, chromosome 1, spans 249 million base pairs
- Chromosome 21 is the smallest human chromosome at about 47 million base pairs; chromosome Y spans about 57 million base pairs
- Introns make up roughly 25% of the human genome, while exons constitute about 1.5%
- Repetitive DNA elements occupy over 50% of the human genome, including LINEs and SINEs
- The human genome has approximately 1.5% of its sequence coding for proteins
- Centromeric regions in the human genome total around 4-5% of the chromosomal length
- Telomeres in human chromosomes consist of TTAGGG repeats averaging 5-15 kb in length
- Heterochromatin comprises about 30% of the human genome, often gene-poor
- The effective genome size after masking repeats is about 2.5 billion bp for mapping purposes
- Human genome GC content averages 40.9% across all chromosomes
- The genome of the fruit fly Drosophila melanogaster is 180 million base pairs
- Arabidopsis thaliana genome size is 135 million base pairs with 5 chromosomes
- Baker's yeast Saccharomyces cerevisiae genome is 12 million base pairs across 16 chromosomes
- Escherichia coli K-12 genome is 4.6 million base pairs, circular chromosome
- The mouse genome Mus musculus is 2.8 billion base pairs, highly similar to human
- Rice Oryza sativa genome is 430 million base pairs with over 400 Mb euchromatin
- The wheat genome Triticum aestivum is approximately 17 billion base pairs, hexaploid
- Corn Zea mays genome size is 2.3 billion base pairs
- The chimpanzee genome Pan troglodytes is 3.0 billion base pairs, 98.8% identical to human
- Neanderthal genome draft size matches modern humans at ~3.1 billion bp
- The bacterial genome of Mycobacterium tuberculosis is 4.4 million base pairs
- Caenorhabditis elegans genome is 100 million base pairs with 6 chromosomes
- The pufferfish Takifugu rubripes genome is 400 million base pairs, compact vertebrate genome
- Plasmodium falciparum malaria parasite genome is 23 million base pairs
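Several of the human figures above follow from one another by simple arithmetic. The sketch below uses only numbers quoted in this section to derive the diploid size and the approximate number of coding and repetitive base pairs; the constant and function names are illustrative.

```python
# Figures quoted in the list above; the variable names are illustrative.
HAPLOID_BP = 3_200_000_000   # approximate haploid genome size
CODING_FRACTION = 0.015      # about 1.5% protein-coding sequence
REPEAT_FRACTION = 0.50       # "over 50%" repeats, so this is a lower bound
MITO_BP = 16_569             # mitochondrial genome length


def derived_sizes(haploid_bp: int) -> dict:
    """Derive dependent quantities from the haploid genome length."""
    return {
        "diploid_bp": 2 * haploid_bp,
        "coding_bp": round(haploid_bp * CODING_FRACTION),
        "min_repeat_bp": round(haploid_bp * REPEAT_FRACTION),
        "mito_ratio": MITO_BP / haploid_bp,
    }


sizes = derived_sizes(HAPLOID_BP)
for name, value in sizes.items():
    print(name, value)
```

The coding portion works out to about 48 million base pairs, while the mitochondrial genome is only a few millionths of the nuclear genome's length.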
Sequencing Projects
- The Human Genome Project was declared complete in 2003, covering about 99% of the euchromatic genome at 99.99% accuracy
- The first human genome sequence cost $2.7 billion and took 13 years
- Illumina's HiSeq X Ten platform brought the cost of a 30x-coverage human genome down to roughly $1,000 by 2014-2015
- The 1000 Genomes Project sequenced 2,504 individuals from 26 populations at 6x coverage
- UK Biobank sequenced exomes of 500,000 participants at 30x depth
- The Cancer Genome Atlas (TCGA) generated 2.5 petabytes of genomic data from 11,000 tumors
- Earth BioGenome Project aims to sequence all 1.8 million eukaryotic species by 2028
- The first bacterial genome, Haemophilus influenzae, sequenced in 1995 at 1.8 Mb
- Human ENCODE project mapped functional elements across 30% of the genome using multiple assays
- The Neanderthal genome sequenced from three individuals at 1.3x average coverage in 2010
- PacBio long-read sequencing achieved N50 contig size of 13 Mb for human genome CHM13 assembly
- Oxford Nanopore MinION sequenced entire human genome in real-time at 30x coverage
- The Telomere-to-Telomere (T2T) consortium completed the first fully gap-free human genome in 2022
- All of Us Research Program plans to sequence 1 million diverse U.S. genomes
- BGI sequenced the first Asian individual human genome (YH) in 2008 using its SOAP software
- A map-based rice genome sequence with 95% coverage was completed in 2005 by the International Rice Genome Sequencing Project
- Mouse genome sequenced by Celera and public consortium in 2002 at 7x coverage
- The human reference GRCh38 released in 2013 incorporating 75 new assemblies
- GTEx project sequenced RNA from 54 tissues across 948 donors
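The coverage figures quoted above (1.3x, 6x, 30x) refer to mean sequencing depth: total sequenced bases divided by genome length. As a rough sketch, the code below computes mean depth, the number of reads needed for a target depth, and the fraction of bases expected to receive no reads under a simple Poisson (Lander-Waterman style) model. The 150 bp read length and the uniform-sampling assumption are illustrative and do not come from the projects listed; the cost per base pair uses the $2.7 billion and approximately 3.2 billion bp figures from this document.

```python
import math

GENOME_BP = 3_200_000_000   # approximate haploid human genome size
READ_LEN = 150              # hypothetical short-read length in bp


def mean_coverage(n_reads: int, read_len: int, genome_bp: int) -> float:
    """Mean depth = total sequenced bases / genome length."""
    return n_reads * read_len / genome_bp


def reads_needed(target_cov: float, read_len: int, genome_bp: int) -> int:
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_cov * genome_bp / read_len)


def fraction_uncovered(cov: float) -> float:
    """Poisson probability that a given base is covered by zero reads."""
    return math.exp(-cov)


n30 = reads_needed(30, READ_LEN, GENOME_BP)
print(f"reads for 30x: {n30:,}")

for depth in (1.3, 6, 30):
    print(f"{depth}x leaves ~{fraction_uncovered(depth):.2%} of bases unread")

# Cost per base pair of the first human genome sequence.
cost_per_bp = 2.7e9 / GENOME_BP
print(f"first genome: ${cost_per_bp:.2f} per base pair")
```

Under this idealized model a 1.3x genome leaves over a quarter of bases unread, which is why low-coverage projects rely on pooling data across many individuals or on imputation.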






