Key Takeaways
- Phi-3-mini has 3.8 billion parameters and outperforms many models more than twice its size on code benchmarks such as HumanEval
- Gemma-2B contains a nominal 2 billion parameters and is optimized for lightweight, on-device deployment
- TinyLlama-1.1B has 1.1 billion parameters and was trained on 3 trillion tokens
- Phi-3-mini was trained on 3.3 trillion tokens, reportedly at a cost under $10M
- Gemma-2B was trained on 2 trillion tokens using TPUv5e (the 6-trillion-token run was Gemma-7B)
- TinyLlama-1.1B was trained on 3 trillion tokens using only 16 A100-40G GPUs over roughly 90 days
- Phi-3-mini, quantized to 4 bits, generates more than 12 tokens/second running natively on an iPhone 14 (A16 Bionic); see the memory sketch below for why it fits at all
- A quantized Gemma-2B generates 20+ tokens/second on a single GPU (see the 4-bit loading sketch below)
- TinyLlama-1.1B generates around 50 tokens/second on an A100 GPU (see the throughput sketch below)
- Phi-3-mini scores 68.8% on MMLU (5-shot)
- Gemma-2B scores 42.3% on MMLU 5-shot (the 64.3% figure belongs to Gemma-7B)
- TinyLlama-1.1B scores roughly 59% on HellaSwag zero-shot; on MMLU it remains near the ~25% random baseline, as expected at this scale
- Phi-3-mini is available through Azure AI, reportedly at roughly one-tenth the serving cost of Llama-2-70B
- Gemma-2B can be embedded in Android apps for on-device AI (e.g., via the MediaPipe LLM Inference API)
- TinyLlama sees more than 1 million monthly downloads on Hugging Face
Small language models thus vary widely in parameter count, training scale, inference speed, benchmark performance, and deployment footprint; the sketches below show how a few of these figures translate into practice.
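A quick back-of-envelope check makes the on-device claims concrete. The sketch below estimates weight memory from the parameter counts listed above; it ignores KV cache, activations, and runtime overhead, so real footprints are somewhat larger.

```python
# Back-of-envelope weight-memory estimate for the three models above.
# Ignores KV cache, activations, and runtime overhead, so real usage
# is somewhat higher; the point is the order of magnitude.

MODELS = {
    "Phi-3-mini": 3.8e9,
    "Gemma-2B": 2.0e9,
    "TinyLlama-1.1B": 1.1e9,
}

BYTES_PER_PARAM = {
    "fp16": 2.0,   # 16-bit floats: 2 bytes per weight
    "int4": 0.5,   # 4-bit quantization: half a byte per weight
}

for name, params in MODELS.items():
    for fmt, bpp in BYTES_PER_PARAM.items():
        gib = params * bpp / 2**30
        print(f"{name:>15} ({fmt}): ~{gib:.1f} GiB of weights")
```

At 4 bits, Phi-3-mini's weights come to roughly 1.8 GiB, which is why a recent iPhone can hold the model in memory at all.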
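The tokens-per-second figures above depend heavily on hardware, precision, and batch size. The following is a minimal sketch for measuring throughput yourself with Hugging Face transformers and the public TinyLlama-1.1B-Chat-v1.0 checkpoint; the timing scaffold is illustrative, not the methodology behind the numbers above.

```python
# Minimal tokens-per-second measurement with Hugging Face transformers.
# Throughput varies with hardware, precision, and batch size, so treat
# any single number as indicative rather than definitive.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

inputs = tokenizer("Small language models are", return_tensors="pt").to(device)

# Warm-up pass so one-time CUDA setup doesn't skew the timing.
model.generate(**inputs, max_new_tokens=8)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec on {device}")
```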
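The "quantized" figures above typically refer to 4-bit weights. Below is a minimal sketch of loading Gemma-2B with NF4 quantization via transformers and bitsandbytes; it assumes a CUDA GPU, and note that google/gemma-2b is a gated checkpoint, so you must accept its license on Hugging Face and authenticate before downloading.

```python
# Loading a model with 4-bit (NF4) quantized weights via bitsandbytes.
# Requires a CUDA GPU and the bitsandbytes package; google/gemma-2b is
# a gated checkpoint (accept the license and log in to Hugging Face first).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

prompt = "Explain why small language models matter:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```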






