Key Takeaways
- GPT-3 pre-training compute: 3.14 × 10^23 FLOP.
- PaLM 540B pre-training compute: approximately 2.5 × 10^24 FLOP.
- LLaMA 65B pre-training compute: 1.2 × 10^24 FLOP (see the 6·N·D sanity check after this list).
- GPT-3 dataset size: approximately 300 billion tokens.
- PaLM 540B dataset size: 780 billion tokens.
- LLaMA 65B dataset size: 1.4 trillion tokens.
- GPT-3 training cost estimate: $4.6 million.
- PaLM 540B training cost: approximately $8 million.
- LLaMA 65B training cost: roughly $1–4 million at typical public-cloud rates, given the ~1.0 million A100 GPU-hours reported for the 65B run (see the cost and energy sketch after this list).
- GPT-3 parameter count: 175 billion.
- PaLM parameter count: 540 billion.
- LLaMA parameter count: 65 billion.
- GPT-3 training energy: 1,287 MWh.
- PaLM 540B training energy: ~10,000 MWh estimate.
- LLaMA 65B training energy: 784 MWh.
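
As a quick consistency check on the compute figures above, the widely used approximation C ≈ 6·N·D (training FLOP ≈ 6 × parameters × training tokens) can be compared against the reported totals. The sketch below is a rough cross-check under that rule of thumb, not an exact accounting; the parameter and token counts are taken from the takeaways above.

```python
# Rough cross-check of reported pre-training compute via C ~= 6 * N * D.
# Parameter counts, token counts, and reported FLOP are from the takeaways above.

MODELS = {
    # name: (parameters, training tokens, reported pre-training FLOP)
    "GPT-3 175B": (175e9, 300e9, 3.14e23),
    "PaLM 540B": (540e9, 780e9, 2.5e24),
    "LLaMA 65B": (65e9, 1.4e12, 1.2e24),
}

for name, (params, tokens, reported) in MODELS.items():
    estimate = 6 * params * tokens
    print(f"{name}: 6*N*D ~= {estimate:.2e} FLOP "
          f"(reported {reported:.2e}, ratio {estimate / reported:.2f})")
```

For GPT-3 and PaLM the approximation lands within a few percent of the reported figures (≈3.15 × 10^23 and ≈2.53 × 10^24 FLOP). For LLaMA 65B it gives ≈5.5 × 10^23 FLOP, about half the 1.2 × 10^24 figure above; the larger number is consistent with counting the A100s' peak throughput over the reported ~1.0 million GPU-hours rather than model FLOP, though that attribution is an assumption.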
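The cost and energy takeaways can be cross-checked the same way from GPU-hours. The sketch below uses the ~1.02 million A100 GPU-hours reported in the LLaMA paper for the 65B model; the $2.00 per GPU-hour cloud rate, the 400 W per-GPU power draw, and the 1.1 PUE are illustrative assumptions, not figures from this report.

```python
# Back-of-envelope training cost and energy for LLaMA 65B from GPU-hours.
# The GPU-hour count is from the LLaMA paper; price, power draw, and PUE are
# illustrative assumptions, not figures reported in this document.

GPU_HOURS = 1_022_362        # A100-80GB GPU-hours reported for the 65B model
PRICE_PER_GPU_HOUR = 2.00    # USD, assumed public-cloud on-demand rate
GPU_POWER_W = 400            # assumed average power draw per GPU
PUE = 1.1                    # assumed datacenter power usage effectiveness

cost_usd = GPU_HOURS * PRICE_PER_GPU_HOUR
energy_mwh = GPU_HOURS * GPU_POWER_W * PUE / 1e6   # W*h -> MWh

print(f"Estimated cost:   ${cost_usd:,.0f}")
print(f"Estimated energy: {energy_mwh:.0f} MWh")
```

At these assumed rates the run comes out to roughly $2 million and about 450 MWh. The 784 MWh figure above is consistent with a higher effective power assumption (roughly 700 W per GPU at the node level), so the two estimates differ mainly in how much overhead is counted.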
These AI training statistics cover compute, dataset sizes, parameter counts, training costs, and energy consumption across models. The full report is organized into the following sections, each paired with an interpretation:
- Compute Resources
- Dataset Sizes
- Energy Consumption
- Parameter Counts
- Training Costs






