Key Takeaways
- GPT-4o supports a context window of 128,000 tokens for input.
- Claude 3.5 Sonnet has a 200,000-token context window.
- Gemini 1.5 Pro offers a context window of up to 1 million tokens.
- GPT-3.5 Turbo has a 16,385-token context window.
- Llama 3.1 8B processes 50 tokens/second on an A100 GPU.
- Mistral 7B Instruct achieves an inference speed of 70 tokens/second.
- GPT-4 Turbo processes input at 4,000 tokens/second.
- Llama 3.1 405B requires 810 GB of VRAM for a 128k context.
- Mixtral 8x22B uses 140 GB of RAM at FP16 for its full context.
- RAG systems built with LlamaIndex reduce context size by 70% via retrieval.
- LangChain RAG pipelines achieve a 25% accuracy boost on HotpotQA.
- FAISS index retrieval latency averages 5 ms for 1M documents.
- Llama 3.1 scores 88.6% on MMLU with a 128k context.
- GPT-4o achieves 88.7% on the MMLU benchmark.
- Claude 3.5 Sonnet scores 59.4% on GPQA.
Model context protocol statistics span five areas: context window sizes, token processing speeds, memory (VRAM) requirements, RAG metrics, and benchmark scores.
Benchmark Performance Scores
Benchmark Performance Scores Interpretation
Context Window Capacities
Context Window Capacities Interpretation
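One way to read the context window figures from the takeaways is as a simple budget check: does a given prompt fit in a given model's window? The sketch below encodes the four window sizes listed above. The dictionary keys are illustrative labels rather than official API model identifiers, and the assumption that reserved output tokens count against the same window may not hold for every provider.

```python
# Context window sizes in tokens, taken from the takeaways above.
# The keys are illustrative labels, not official API model identifiers.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-pro": 1_000_000,
    "gpt-3.5-turbo": 16_385,
}


def fits_in_context(model: str, prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if the prompt plus the reserved output budget fits in the window.

    Assumption: output tokens share the window with input tokens.
    """
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOWS[model]


def models_that_fit(prompt_tokens: int, reserved_output_tokens: int = 0) -> list[str]:
    """List every model whose window can hold the request, sorted by label."""
    return sorted(
        model
        for model in CONTEXT_WINDOWS
        if fits_in_context(model, prompt_tokens, reserved_output_tokens)
    )


print(models_that_fit(150_000))  # ['claude-3.5-sonnet', 'gemini-1.5-pro']
```

A 150,000-token prompt exceeds the 128,000-token and 16,385-token windows, so only the two larger windows remain.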
Memory Consumption Stats
Memory Consumption Stats Interpretation
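The 810 GB figure quoted for Llama 3.1 405B is consistent with a weights-only estimate at FP16: 405 billion parameters at 2 bytes each. The sketch below shows that arithmetic, using decimal gigabytes (1 GB = 10^9 bytes). It also includes the commonly used key-value cache formula, since a long context adds cache memory on top of the weights. The architecture values that formula needs (layer count, KV heads, head dimension) are not given in the source, so they are left as parameters rather than filled in.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str = "fp16") -> float:
    """Memory for model weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_value: int = 2) -> float:
    """Key-value cache size for one sequence, in decimal gigabytes.

    The leading factor of 2 accounts for storing both keys and values.
    All architecture arguments must come from the model's own documentation.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value / 1e9


# 405 billion parameters at FP16 (2 bytes each), weights only.
print(weight_memory_gb(405_000_000_000))  # 810.0
```

Actual VRAM use also depends on activations, batch size, and runtime overhead, none of which this estimate covers.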
Retrieval Augmentation Metrics
Retrieval Augmentation Metrics Interpretation
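The takeaway about retrieval reducing context size can be illustrated with a toy retriever: keep only the chunks most relevant to the query and measure how much of the original text is dropped. This is a minimal sketch, not the LlamaIndex or LangChain API. The keyword-overlap scoring, the whitespace word count used as a stand-in for tokens, and the sample chunks are all assumptions made for illustration.

```python
def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how many query words they share, and keep the top k.

    Keyword overlap is a toy stand-in for embedding similarity.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def context_reduction(chunks: list[str], selected: list[str]) -> float:
    """Fraction of the original context removed, counting whitespace-separated words."""
    total = sum(len(chunk.split()) for chunk in chunks)
    kept = sum(len(chunk.split()) for chunk in selected)
    return 1 - kept / total


# Hypothetical sample corpus.
chunks = [
    "FAISS builds vector indexes for similarity search",
    "Context windows limit how many tokens a model reads",
    "Retrieval selects relevant chunks before prompting the model",
    "Benchmarks such as MMLU measure general knowledge",
]
selected = top_k_chunks("how does retrieval limit context tokens", chunks, k=2)
print(f"{context_reduction(chunks, selected):.0%} of the context was dropped")
```

The reduction a real pipeline achieves depends on chunk size, the value of k, and the corpus, so the 70% figure above should be read as one reported result rather than a constant.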
Token Processing Speeds
Token Processing Speeds Interpretation
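Throughput figures such as those in the takeaways translate into wall-clock time by simple division. The sketch below applies that to the rates quoted above (4,000 input tokens/second, and 50 or 70 generated tokens/second). It assumes constant throughput and ignores batching, network latency, and queueing, so the results are rough estimates only. The function names are illustrative.

```python
def prefill_seconds(prompt_tokens: int, input_tokens_per_second: float) -> float:
    """Time to ingest a prompt at a constant input rate."""
    return prompt_tokens / input_tokens_per_second


def generation_seconds(output_tokens: int, output_tokens_per_second: float) -> float:
    """Time to generate a response at a constant output rate."""
    return output_tokens / output_tokens_per_second


# Reading a full 128,000-token prompt at 4,000 tokens/second.
print(prefill_seconds(128_000, 4_000))  # 32.0

# Generating 1,000 tokens at 50 tokens/second.
print(generation_seconds(1_000, 50))  # 20.0
```

At 70 tokens/second the same 1,000-token response would take a little over 14 seconds, which shows how a modest difference in throughput adds up on long outputs.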






