Key Takeaways
- Qwen2.5-72B-Instruct achieved 85.4% on MMLU benchmark
- Qwen2-72B-Instruct scored 84.2% on MMLU 5-shot
- Qwen1.5-72B-Chat reached 78.1% on MMLU
- Qwen2.5-72B has 72.7 billion parameters
- Qwen2-72B model supports 128K context length
- Qwen1.5-32B uses Grouped-Query Attention (GQA)
- Qwen2.5 series pre-trained on up to 18 trillion tokens
- Qwen2 pre-trained on 7T tokens including code data
- Qwen1.5 used 2.5T multilingual tokens
- Qwen first released on September 1, 2023
- Qwen1.5 series launched February 1, 2024
- Qwen2 released June 6, 2024
- Qwen repo 1B downloads on Hugging Face as of Nov 2024
- Qwen2.5-72B-Instruct 50M downloads HF
- Qwen GitHub repo 35K stars
Alibaba's Qwen models show strong performance across knowledge, coding, math, and Chinese-language benchmarks.
Adoption Metrics
- Qwen repo 1B downloads on Hugging Face as of Nov 2024 (see the download-count sketch after this list)
- Qwen2.5-72B-Instruct 50M downloads HF
- Qwen GitHub repo 35K stars
- Qwen2 tops LMSYS Chatbot Arena ELO 1300+
- Qwen1.5-72B 10M+ inferences on vLLM
- Qwen models used in 100+ countries
- Qwen2.5-7B 200M HF downloads
- Qwen community Discord 50K members
- Qwen2 #1 open model on Open LLM Leaderboard
- Qwen1.5 series 500M total downloads HF
- Qwen2.5 integrated in Alibaba Cloud PAI 1M users
- Qwen models 20K+ forks on GitHub
- Qwen2 Arena win rate 60% vs GPT-4o mini
- Qwen1.5-Chat 5M+ daily active users DashScope
- Qwen2.5-1.5B 100M+ downloads
- Qwen cited in 1000+ papers arXiv
- Qwen2.5 top trending model HF weekly
- Qwen-series models totaling 2B parameters deployed across Alibaba
- Qwen2 15K+ issues resolved GitHub
- Qwen1.5-VL 30M image inferences
- Qwen2.5-Coder #2 on BigCode leaderboard
- Qwen models in 500+ apps via API
- Qwen2.5 40% market share open models China
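Several of the download and engagement figures above come from Hugging Face's public model pages. A minimal sketch of checking such numbers programmatically with the huggingface_hub client; note that the downloads field reports a rolling 30-day count rather than a cumulative total:

```python
from huggingface_hub import HfApi  # pip install huggingface_hub

api = HfApi()
info = api.model_info("Qwen/Qwen2.5-72B-Instruct")
# `downloads` is a rolling 30-day count; `likes` is cumulative.
print(f"downloads (30d): {info.downloads}, likes: {info.likes}")
```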
Performance Benchmarks
- Qwen2.5-72B-Instruct achieved 85.4% on the MMLU benchmark (see the evaluation sketch after this list)
- Qwen2-72B-Instruct scored 84.2% on MMLU 5-shot
- Qwen1.5-72B-Chat reached 78.1% on MMLU
- Qwen2.5-7B-Instruct scored 70.5% on the HumanEval coding benchmark
- Qwen2-1.5B-Instruct scored 55.3% on GSM8K math benchmark
- Qwen1.5-32B-Chat achieved 82.4% on GPQA Diamond
- Qwen2.5-72B scored 89.3% on MMLU-Pro
- Qwen2-72B-Instruct 76.2% on LiveCodeBench
- Qwen1.5-7B-Chat 68.9% on MATH benchmark
- Qwen2.5-14B-Instruct 82.1% on IFEval instruction following
- Qwen2-7B scored 71.4% on MBPP coding
- Qwen1.5-4B-Chat 65.7% on ARC-Challenge
- Qwen2.5-1.5B 52.8% on HellaSwag
- Qwen2-72B 88.5% on TriviaQA
- Qwen1.5-110B-Chat 83.2% on Natural Questions
- Qwen2.5-32B-Instruct 84.7% on BBH average
- Qwen2-0.5B-Instruct 48.3% on PIQA
- Qwen1.5-1.8B 60.2% on WinoGrande
- Qwen2.5-72B 91.2% on CEval Chinese benchmark
- Qwen2-7B-Instruct 73.5% on CMMLU
- Qwen1.5-72B 80.9% on C-Eval
- Qwen2.5-7B 69.8% on MultiIF
- Qwen2-14B 78.6% on AlpacaEval 2.0
- Qwen1.5-Chat models average 7.53 out of 10 on MT-Bench
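The MMLU figures above are few-shot accuracies. A sketch of how such a number can be reproduced with EleutherAI's lm-evaluation-harness, assuming it is installed (pip install lm-eval); task names and exact scores vary with harness version and prompting details:

```python
from lm_eval import simple_evaluate  # EleutherAI lm-evaluation-harness

results = simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen2.5-7B-Instruct,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"]["mmlu"])  # accuracy aggregated over the 57 MMLU subjects
```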
Release Timeline
- Qwen first released on September 1, 2023
- Qwen1.5 series launched February 1, 2024
- Qwen2 released June 6, 2024
- Qwen2.5 announced September 19, 2024
- Qwen1.5-Chat updated March 2024 with long context
- Qwen-VL first released August 2023
- Qwen2.5-Coder released October 2024
- Qwen2-Math preview August 2024
- Qwen1.5-110B open-sourced April 2024
- Qwen2.5-72B-Instruct on Hugging Face September 2024
- Qwen-Audio launched November 2023
- Qwen2.5-Max previewed October 29, 2024
- Qwen1.5-MoE-A2.7B released April 2024
- Qwen2.5-VL early version October 2024
- Qwen-Long released May 2024 for 1M context
- Qwen2.5-Math full release November 2024
- Qwen1.5-VL-Chat updated July 2024
- Qwen2 mini versions July 2024
- Qwen2.5-32B released September 2024
- Qwen1.5-72B-Chat v1 February 2024
- Qwen2-72B open weights June 2024
- Qwen2.5 series 8 models September 2024
Technical Specifications
- Qwen2.5-72B has 72.7 billion parameters
- Qwen2-72B model supports 128K context length
- Qwen1.5-32B uses Grouped-Query Attention (GQA; see the sketch after this list)
- Qwen2.5-7B-Instruct has 28 layers
- Qwen2-1.5B trained with RMSNorm pre-normalization
- Qwen1.5-110B uses SwiGLU activation
- Qwen2.5-14B has 48 layers and 40 attention heads
- Qwen2-32B uses 8K vocab size extension
- Qwen1.5-72B context length up to 32K tokens
- Qwen2.5-1.5B employs rotary positional embeddings (RoPE; sketched after this list)
- Qwen2-7B-Instruct peak memory usage 16GB in FP16 (see the memory estimate after this list)
- Qwen1.5-4B has 32 attention heads
- Qwen2.5-72B-Instruct tokenizer vocab size ~151k (see the tokenizer check after this list)
- Qwen2-0.5B supports 29 languages
- Qwen1.5-1.8B uses BF16 training precision
- Qwen2.5-32B has hidden size 5120
- Qwen2-72B intermediate size 29568
- Qwen2 instruct models use YaRN for long context (see the config example after this list)
- Qwen2.5-7B reaches roughly 45% peak FLOPs utilization
- Qwen2-14B-Instruct 28 layers
- Qwen1.5-72B supports vision-language with Qwen-VL
- Qwen2.5-72B uses Tie-Break decoding
- Qwen2-7B has max sequence length 32768
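Grouped-Query Attention, listed above for Qwen1.5-32B and used across the Qwen2/2.5 family, shrinks the KV cache by letting several query heads share one key/value head. A toy sketch with illustrative shapes (not Qwen's actual configuration; Qwen2-7B, for example, pairs 28 query heads with 4 KV heads):

```python
import torch

def grouped_query_attention(q, k, v, groups):
    # q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq == Hkv * groups
    k = k.repeat_interleave(groups, dim=1)  # each KV head serves `groups` query heads
    v = v.repeat_interleave(groups, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 16, 64)   # 8 query heads
k = torch.randn(1, 2, 16, 64)   # 2 KV heads -> 4x smaller KV cache
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v, groups=4).shape)  # torch.Size([1, 8, 16, 64])
```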
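Rotary positional embeddings (RoPE), noted above for Qwen2.5-1.5B but common to the whole family, encode position by rotating query/key feature pairs through position-dependent angles. A generic sketch of the interleaved-pair formulation; Qwen's Hugging Face implementation uses the equivalent rotate-half variant:

```python
import torch

def rope(x, base=10000.0):
    # x: (seq, dim), dim even; rotate each pair (x[:, 2i], x[:, 2i+1])
    seq, dim = x.shape
    pos = torch.arange(seq, dtype=torch.float32).unsqueeze(1)      # (seq, 1)
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)    # (dim/2,)
    ang = pos * inv_freq                                           # (seq, dim/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # standard 2-D rotation per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

print(rope(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```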
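The 16GB FP16 figure for Qwen2-7B-Instruct matches back-of-the-envelope arithmetic: FP16 stores two bytes per parameter, and the KV cache and activations sit on top of the weights. A quick check (parameter count approximate):

```python
params = 7.6e9                      # Qwen2-7B has roughly 7.6B parameters
weights_gb = params * 2 / 1024**3   # 2 bytes per parameter in FP16
print(f"weights: {weights_gb:.1f} GB")  # ~14.2 GB, leaving ~2 GB headroom for KV cache within 16 GB
```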
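The ~151k vocabulary can be checked directly by loading the published tokenizer (the Qwen2.5 checkpoints share it):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")
print(tok.vocab_size)  # base vocabulary, ~151k entries
print(len(tok))        # slightly larger once added special tokens are counted
```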
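The YaRN long-context mechanism mentioned above is exposed through the rope_scaling field of the Hugging Face config; the Qwen2.5 model cards document this pattern for inputs beyond the native 32K window. A sketch with an illustrative scaling factor:

```python
from transformers import AutoConfig, AutoModelForCausalLM

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# Stretch the native 32,768-token window by 4x (~131K positions).
cfg.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", config=cfg, torch_dtype="auto"
)
```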
Training Resources
- Qwen2.5 series pre-trained on up to 18 trillion tokens
- Qwen2 pre-trained on 7T tokens including code data
- Qwen1.5 used 2.5T multilingual tokens
- Qwen2.5-Coder trained on 5.5T code tokens
- Qwen2 post-trained with SFT and RLHF
- Qwen1.5-110B trained with 10K H800 GPUs
- Qwen2.5-Math on 1T math-related tokens
- Qwen series post-training on 20K high-quality conversations
- Qwen2 long-context trained on 500B extended docs
- Qwen1.5-Chat RLHF with 50K preference pairs
- Qwen2.5 pre-training compute over 20K GPU-hours
- Qwen2 multilingual corpus 2.7T Chinese-English
- Qwen1.5 vision models on 3B image-text pairs
- Qwen2.5-72B SFT on 100B instruction tokens
- Qwen2 code training included 1.2T tokens from GitHub repos
- Qwen1.5 distilled from larger models using 5T tokens
- Qwen2.5 alignment with DPO on 200K pairs
- Qwen series used synthetic data generation for 300B tokens
- Qwen2 trained on 92 languages coverage
- Qwen1.5-72B compute equivalent to 10^25 FLOPs (see the 6ND estimate after this list)
- Qwen2.5-Math trained on 500B tokens of competition problems
- Qwen2 long-context corpus averaged 100K tokens/doc
- Qwen1.5 SFT dataset 15K multi-turn dialogues
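The compute figure for Qwen1.5-72B can be sanity-checked with the standard dense-transformer rule of thumb, training FLOPs ≈ 6 × N × D for N parameters and D training tokens. Using the counts listed above, the rule gives about 10^24, so the 10^25 figure would imply substantially more tokens or additional training passes:

```python
N = 72e9    # Qwen1.5-72B parameters
D = 2.5e12  # training tokens, per the Qwen1.5 figure above
print(f"{6 * N * D:.2e} FLOPs")  # ~1.08e+24 by the 6*N*D rule of thumb
```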
Sources & References
- Reference 1: QwenLM (qwenlm.github.io)
- Reference 2: Hugging Face (huggingface.co)
- Reference 3: LMSYS Chatbot Arena (arena.lmsys.org)
- Reference 4: LMSYS Leaderboard (leaderboard.lmsys.org)
- Reference 5: Papers with Code (paperswithcode.com)
- Reference 6: Open LLM Leaderboard (openleaderboard.vercel.app)
- Reference 7: Qwen Blog (blog.qwen.ai)
- Reference 8: GitHub (github.com)
- Reference 9: vLLM (vllm.ai)
- Reference 10: Discord (discord.gg)
- Reference 11: Alibaba Cloud (alibabacloud.com)
- Reference 12: DashScope (dashscope.aliyun.com)
- Reference 13: arXiv (arxiv.org)