Key Takeaways
- DeepSeek-V2 utilizes a Mixture-of-Experts (MoE) architecture with 236 billion total parameters and 21 billion activated parameters per token.
- DeepSeek-V2's transformer has 60 layers; its Multi-head Latent Attention (MLA) reduces the KV cache by 93.3% relative to DeepSeek 67B.
- DeepSeek-Coder-V2-Base shares this MoE design: 236B total parameters, 21B activated per token, with 162 experts (2 shared and 160 routed) per MoE layer.
- DeepSeek-V2 was trained on 8.1 trillion tokens including 1.5T high-quality filtered data.
- DeepSeek-Coder-V2 was further pre-trained with an additional 6T tokens of code-heavy data (10.2T tokens of total exposure), covering 338 programming languages.
- DeepSeek LLM 67B was trained on 2T tokens using 512 H800 GPUs over 2.8M GPU hours.
- DeepSeek-V2 achieves 78.5% on MMLU benchmark for 5-shot evaluation.
- DeepSeek-Coder-V2 scores 90.2% pass@1 on the HumanEval code-generation benchmark.
- DeepSeek LLM 67B reaches 73.8% on MMLU and 82.6% on GSM8K math benchmark.
- DeepSeek-V2 has over 500K downloads on HuggingFace within first month of release.
- DeepSeek-Coder-V2 models accumulated 1.2M downloads on HuggingFace by Q3 2024.
- DeepSeek API platform serves over 10B tokens daily to 100K+ developers.
- DeepSeek-V2 outperforms Llama 3 70B by 5.2% on MMLU benchmark.
- DeepSeek-Coder-V2 beats GPT-4-Turbo by 12.1% on HumanEval coding metric.
- DeepSeek LLM 67B surpasses Qwen2 72B by 3.4% average on Open LLM Leaderboard.
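The total-versus-activated parameter split in the takeaways above is the defining property of MoE routing: a router scores all experts per token, but only a small top-k subset actually runs. A minimal sketch with toy dimensions (the layer sizes, expert count, top_k, and softmax-over-selected gating here are illustrative assumptions, not DeepSeek's implementation):

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer. Only the top-k experts' weights are
# used for a given token, so activated parameters << total parameters.
# All dimensions below are illustrative, not DeepSeek-V2's configuration.

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """x: (d,) token hidden state; expert_weights: list of (d, d) matrices."""
    scores = router_weights @ x                   # one router score per expert
    top = np.argsort(scores)[-top_k:]             # indices of the top-k experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                          # softmax over selected experts
    # Only the selected experts' parameters participate for this token.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), experts, router, top_k=2)
print(y.shape)
```

With `top_k=2` of 8 equal-size experts, only a quarter of the expert parameters touch each token; scaled up, the same mechanism is how a 236B-parameter model activates only 21B per token.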
Taken together, these figures cover DeepSeek's model architecture, training data, benchmark performance, and adoption.
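The KV-cache reduction cited above comes from caching a small compressed latent per layer instead of full per-head keys and values. A back-of-envelope sketch of that arithmetic (head count, head dimension, and latent width are assumed for illustration; the exact published percentage depends on the baseline model and configuration):

```python
# Back-of-envelope KV-cache arithmetic for a latent-compression scheme in
# the spirit of MLA. The dimensions are illustrative assumptions, not
# DeepSeek-V2's exact configuration.

n_layers = 60          # DeepSeek-V2 transformer layers (from the takeaways)
n_heads = 128          # assumed attention head count
head_dim = 128         # assumed per-head dimension
latent_dim = 512       # assumed compressed KV latent width

# Standard multi-head attention caches K and V for every head in every layer.
mha_cache_per_token = n_layers * 2 * n_heads * head_dim
# Latent compression caches one small vector per layer instead.
mla_cache_per_token = n_layers * latent_dim

reduction = 1 - mla_cache_per_token / mha_cache_per_token
print(f"cache reduction: {reduction:.1%}")
```

The point is structural: the cache shrinks from `2 * n_heads * head_dim` elements per layer to a single `latent_dim`-wide vector, which is why the savings land in the 90%+ range.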