Key Takeaways
- Qwen-72B achieves 73.5% on the MMLU benchmark
- Qwen1.5-72B-Chat scores 80.5% on MMLU
- Qwen2-72B-Instruct reaches 84.2% on 5-shot MMLU
- Qwen-72B has 72 billion parameters
- Qwen1.5-110B contains 110 billion parameters
- Qwen2-72B features 72 billion parameters
- Qwen trained on over 2 trillion tokens
- Qwen1.5 pre-trained on 7 trillion tokens including multilingual data
- Qwen2-72B trained on 7+ trillion high-quality tokens
- Qwen supports 29 languages; Qwen-72B scores 85.2% on the Chinese C-Eval benchmark
- Qwen1.5-72B achieves 81.7% on MultiICL benchmark
- Qwen2-72B scores 74.5% on MGSM multilingual math
- Qwen model repositories total over 50 million downloads on Hugging Face
- Qwen2 series garnered 10 million downloads in its first month
- Qwen1.5-7B has 15 million total downloads
Taken together: each Qwen generation raises benchmark scores across a widening range of model sizes, backed by multi-trillion-token training runs and fast-growing community adoption.
Benchmark Performance
- Qwen-72B achieves 73.5% on the MMLU benchmark
- Qwen1.5-72B-Chat scores 80.5% on MMLU
- Qwen2-72B-Instruct reaches 84.2% on 5-shot MMLU
- Qwen-7B gets 62.4% on MMLU
- Qwen1.5-32B scores 78.1% on MMLU
- Qwen2-7B-Instruct achieves 70.5% on MMLU
- Qwen1.5-110B-Chat hits 82.4% on MMLU
- Qwen2-1.5B-Instruct scores 65.9% on MMLU
- Qwen-14B reaches 68.2% on MMLU
- Qwen1.5-7B-Chat gets 74.2% on MMLU
- Qwen2-72B scores 82.8% on MMLU
- Qwen1.5-1.8B achieves 67.9% on MMLU
- Qwen-1.8B hits 58.6% on MMLU
- Qwen2-0.5B-Instruct reaches 52.4% on MMLU
- Qwen1.5-4B scores 71.2% on MMLU
- Qwen-72B-Chat gets 76.8% on MMLU
- Qwen2-7B reaches 70.5% on 5-shot MMLU
- Qwen1.5-14B-Chat scores 77.5% on MMLU
- Qwen-7B-Chat achieves 64.1% on MMLU
- Qwen2-1.5B scores 65.9% on MMLU
- Qwen1.5-0.5B-Chat hits 54.3% on MMLU
- Qwen-14B-Chat gets 69.7% on MMLU
- Qwen2-72B-Instruct scores 84.2% on MMLU-Pro
- Qwen1.5-72B scores 80.5% on MMLU-Redux
- Qwen-1.8B-Chat achieves 60.2% on MMLU
Benchmark Performance Interpretation
Two trends stand out in the MMLU results above: within each series, scores rise with parameter count, and at a given size each successive generation (Qwen, Qwen1.5, Qwen2) scores higher than the last.
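For context on how MMLU figures like these are produced, here is a minimal sketch of 5-shot prompt construction. The helper name and data layout are hypothetical; real evaluations use a harness such as lm-evaluation-harness, which also handles answer extraction and accuracy scoring.

```python
# Minimal sketch of 5-shot MMLU prompt assembly. Data layout is assumed:
# each example is (question, [four choices], answer_letter).
LETTERS = "ABCD"

def format_mmlu_prompt(few_shot, question, choices):
    blocks = []
    for q, opts, ans in few_shot:  # the five in-context examples
        body = "\n".join([q] + [f"{LETTERS[i]}. {o}" for i, o in enumerate(opts)])
        blocks.append(f"{body}\nAnswer: {ans}")
    # The test question ends at "Answer:" so the model completes the letter.
    body = "\n".join([question] + [f"{LETTERS[i]}. {o}" for i, o in enumerate(choices)])
    blocks.append(f"{body}\nAnswer:")
    return "\n\n".join(blocks)
```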
Community and Adoption
- Qwen model repositories total over 50 million downloads on Hugging Face
- Qwen2 series garnered 10 million downloads in its first month
- Qwen1.5-7B has 15 million total downloads
- Qwen ranks in the top 5 on the LMSYS Chatbot Arena, with an Elo of 1285 for the 72B model
- The QwenLM GitHub repository has over 1,000 forks
- Qwen-72B-Chat is used in 500+ Hugging Face Spaces
- Qwen2-72B tops the Open LLM Leaderboard v2
- Qwen1.5 series deployed on Alibaba Cloud by 1 million users
- Qwen GitHub stars exceed 20,000
- Qwen2, released in June 2024, logged 5 million inferences on DashScope
- Qwen-7B has been downloaded 8 million times on HF
- Qwen1.5-72B ranks as the #1 open model on MT-Bench
- Over 200 community fine-tunes of Qwen exist on HF
- Qwen2-7B-Instruct holds an Arena Elo of 1260
- Qwen adopted in 50+ commercial apps via ModelScope
- Qwen-14B has 2 million downloads
- Qwen1.5-32B is cited in 300+ papers
- Qwen2-0.5B lightweight model with 1M+ downloads
- The Qwen repo has over 50 contributors
- Qwen1.5-110B preview accessed by 100K developers
- Qwen-1.8B mobile deployments exceed 500K
- Qwen2 multilingual variants starred 10K times
- Qwen models on HF have been viewed 100 million times overall
- Qwen-VL was released, trained on 1M image-text pairs
Community and Adoption Interpretation
Downloads, leaderboard placements, fine-tunes, and citations all point the same way: Qwen is among the most widely adopted open-weight model families across Hugging Face, ModelScope, GitHub, and Alibaba Cloud.
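Download figures like those above can be spot-checked against the Hugging Face Hub. A small sketch using the huggingface_hub client follows; the repo IDs are examples, and the Hub's downloads field reflects a recent window rather than an all-time total.

```python
from huggingface_hub import HfApi

# Sketch: query download counts for a few Qwen checkpoints on the Hub.
api = HfApi()
for repo_id in ["Qwen/Qwen2-72B-Instruct", "Qwen/Qwen1.5-7B", "Qwen/Qwen2-0.5B"]:
    info = api.model_info(repo_id)  # metadata only, no weights downloaded
    print(f"{repo_id}: {info.downloads:,} recent downloads")
```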
Model Architecture
- Qwen-72B has 72 billion parameters
- Qwen1.5-110B contains 110 billion parameters
- Qwen2-72B features 72 billion parameters
- Qwen-14B has 14 billion parameters
- Qwen1.5-32B has 32 billion parameters
- Qwen2-7B has 7 billion parameters
- Qwen1.5-72B has 72 billion parameters
- Qwen2-1.5B contains 1.5 billion parameters
- Qwen-7B has 7 billion parameters
- Qwen1.5-14B has 14 billion parameters
- Qwen2-0.5B has 0.5 billion parameters
- Qwen-1.8B has 1.8 billion parameters
- Qwen1.5-7B has 7 billion parameters
- Qwen2-72B-Instruct uses a Transformer architecture with 80 layers
- Qwen1.5-4B has 4 billion parameters
- Qwen-72B-Chat has 72 billion parameters
- Qwen2-7B-Instruct features 7B parameters across 28 layers
- Qwen1.5-1.8B has 1.8 billion parameters
- Qwen-14B-Chat has 14B parameters
- Qwen2-1.5B-Instruct has 1.5B parameters
- Qwen1.5-0.5B has 0.5 billion parameters
- Qwen-7B-Chat has 7B parameters
- Qwen2-72B uses grouped-query attention (GQA) with 8 key-value heads
- Qwen1.5-110B-Chat uses 110B parameters with SwiGLU activations
Model Architecture Interpretation
The family covers 0.5B to 110B parameters with one decoder-only Transformer recipe, adding refinements such as grouped-query attention and SwiGLU in later generations, so the same stack serves mobile-scale and datacenter-scale deployments.
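Layer and head counts like those above can be read straight from a checkpoint's configuration. A minimal sketch with transformers.AutoConfig, assuming the standard Qwen2 config schema:

```python
from transformers import AutoConfig

# Sketch: inspect architecture hyperparameters without downloading weights.
config = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
print("layers:         ", config.num_hidden_layers)    # e.g. 28 for the 7B
print("query heads:    ", config.num_attention_heads)
print("key-value heads:", config.num_key_value_heads)  # fewer than query heads => GQA
print("hidden size:    ", config.hidden_size)
```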
Multilingual Support
- Qwen supports 29 languages; Qwen-72B scores 85.2% on the Chinese C-Eval benchmark
- Qwen1.5-72B achieves 81.7% on MultiICL benchmark
- Qwen2-72B scores 74.5% on MGSM multilingual math
- Qwen-72B gets 84.3% on CMMLU Chinese benchmark
- Qwen1.5-110B reaches 90.2% on C-Eval
- Qwen2-7B-Instruct scores 68.9% on IFEval multilingual
- Qwen supports Japanese, with Qwen-72B scoring 82.1% on JMMLU
- Qwen1.5-32B achieves 76.4% on MultiMT-Bench
- Qwen2-1.5B gets 62.3% on Chinese HumanEval
- Qwen-14B scores 79.5% on C-SimpleQA
- Qwen1.5-7B reaches 73.8% on KoBBQ Korean benchmark
- Qwen2-72B-Instruct scores 88.4% on Chinese NLI
- Qwen-7B achieves 81.6% on CMMLU
- Qwen1.5-14B scores 77.2% on Arabic MMLU
- Qwen2-0.5B gets 55.7% on multilingual TriviaQA
- Qwen-1.8B reaches 70.4% on French MMLU variant
- Qwen1.5-72B-Chat scores 83.9% on Spanish EQ-Bench
- Qwen2-7B scores 71.2% on German HellaSwag
- Qwen-72B-Chat scores 86.7% on Russian RACE
- Qwen1.5-4B achieves 69.8% on Italian GSM8K
- Qwen2-1.5B-Instruct scores 64.5% on Hindi OpenBookQA
- Qwen-14B-Chat scores 78.9% on Thai summarization
- Qwen1.5-1.8B gets 66.3% on Vietnamese ARC-Challenge
- Qwen2-72B reaches 75.8% on Korean coding eval
- Qwen1.5-0.5B scores 53.1% on multilingual commonsense
Multilingual Support Interpretation
Performance is strongest on Chinese and English benchmarks, but the results above extend to Japanese, Korean, Arabic, and a spread of European and South Asian languages, consistent with training data covering 29 languages.
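Multilingual ability starts at the tokenizer, which must encode non-English text without exploding sequence length. The sketch below, assuming the Qwen/Qwen2-7B-Instruct checkpoint, simply counts tokens for the same sentence in several languages; exact counts vary with tokenizer version.

```python
from transformers import AutoTokenizer

# Sketch: compare token counts for one sentence across languages.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
samples = {
    "Chinese": "机器学习正在改变世界。",
    "Japanese": "機械学習は世界を変えている。",
    "Korean": "기계 학습이 세상을 바꾸고 있다.",
    "Arabic": "التعلم الآلي يغير العالم.",
    "French": "L'apprentissage automatique change le monde.",
}
for lang, text in samples.items():
    print(f"{lang}: {len(tok.encode(text))} tokens")
```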
Training Details
- Qwen trained on over 2 trillion tokens
- Qwen1.5 pre-trained on 7 trillion tokens including multilingual data
- Qwen2-72B trained on 7+ trillion high-quality tokens
- Qwen-72B used 10T tokens in pre-training
- Qwen1.5-110B post-trained with over 1 million instructions
- Qwen2 series employed YaRN to extend context up to 128K tokens (see the config sketch at the end of this section)
- Qwen-7B trained with 2T Chinese-English tokens
- Qwen1.5-72B fine-tuned on 5B+ tokens of instruction data
- Qwen2-7B pre-trained with enhanced data mixture
- Qwen used supervised fine-tuning on 500K samples
- Qwen1.5 supports a pre-training scale of 14 trillion tokens
- Qwen2-0.5B trained on diverse code and math data
- Qwen-14B utilized RLHF with 100K preferences
- Qwen1.5-32B trained with long-context up to 32K tokens
- Qwen2-72B-Instruct aligned using rejection sampling and DPO
- Qwen-1.8B pre-trained on 1T+ tokens
- Qwen1.5-7B used 3T multilingual tokens
- Qwen2-1.5B fine-tuned on 2B instruction tokens
- Qwen-72B-Chat aligned with human feedback on 20K samples
- Qwen1.5-14B trained for 128K context length
- Qwen2 supports 29 languages in training data
- Qwen1.5-4B pre-trained with synthetic data augmentation
Training Details Interpretation
Pre-training corpora grew from roughly 2T tokens for the original Qwen to 7T+ for Qwen2, while post-training added supervised fine-tuning, preference optimization, and long-context extension on top.
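The YaRN context extension referenced above is exposed through the model configuration. The sketch below follows the rope_scaling fields documented in the Qwen2 model cards; the scaling factor and window sizes are illustrative, and long-context inference in practice usually also relies on a serving backend such as vLLM.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch: enable YaRN-style RoPE scaling on a Qwen2 checkpoint.
# Field names follow the Qwen2 model cards; values are illustrative.
config = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                               # 32K x 4 ~= 128K positions
    "original_max_position_embeddings": 32768,   # pre-trained context window
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct", config=config
)
```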