LLMs (large language models) are large neural networks, usually transformer-based, trained to take human language as input and produce human-like language as output. They work by probabilistic next-token prediction, and at sufficient scale they exhibit emergent reasoning behavior.
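Probabilistic next-token prediction is a simple loop. The model scores every candidate token, a softmax turns the scores into probabilities, one token is picked and appended, and the loop repeats. The toy sketch below replaces the transformer with a hypothetical lookup table (`TOY_LOGITS`). A real model scores a vocabulary of tens of thousands of tokens and conditions on the entire preceding context, not just the previous token.

```python
import math
import random

# Hypothetical toy "model": next-token scores (logits) keyed by the previous
# token only. A real LLM computes these with a transformer over the full context.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0, "churn": -1.0},
    "the": {"customer": 2.5, "dashboard": 1.5, "the": -2.0},
    "a": {"dashboard": 2.0, "customer": 0.5},
    "customer": {"churn": 3.0, "</s>": 0.5},
    "dashboard": {"</s>": 2.0, "churn": 0.0},
    "churn": {"</s>": 3.0},
}

def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def next_token(prev, temperature, rng):
    logits = TOY_LOGITS[prev]
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token (deterministic).
        return max(logits, key=logits.get)
    # Sampling: higher temperature flattens the distribution.
    probs = softmax({t: v / temperature for t, v in logits.items()})
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens])[0]

def generate(temperature=0.0, seed=None, max_tokens=10):
    """Autoregressive loop: predict, append, feed back in, stop at </s>."""
    rng = random.Random(seed)
    out, prev = [], "<s>"
    for _ in range(max_tokens):
        tok = next_token(prev, temperature, rng)
        if tok == "</s>":
            break
        out.append(tok)
        prev = tok
    return " ".join(out)

print(generate(temperature=0))            # greedy: "the customer churn"
print(generate(temperature=1.0, seed=7))  # sampled: varies with the seed
```

With `temperature=0` the loop always returns the same text. With sampling, the output changes from run to run unless the random seed is fixed.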
| Model Name | Company | Year | Parameters (Est.) | Context Length (tokens) | Performance (MT-Bench / MMLU / HumanEval) | Multimodal |
|---|---|---|---|---|---|---|
| Claude 3 Opus | Anthropic | 2024 | Undisclosed | 200k | MT-Bench: 9.9 / MMLU: 86.8% / HumanEval: 84.9% | Yes |
| GPT-4 Turbo / GPT-4.5 | OpenAI | 2023 (GPT-4.5: 2025) | ~1.8T (MoE; unconfirmed estimate for GPT-4) | 128k | MT-Bench: ~9.9 / MMLU: ~87% / HumanEval: ~83% | Yes |
| Gemini 1.5 Pro | Google DeepMind | 2024 | Undisclosed | 1M | MT-Bench: ~9.7 / MMLU: ~86% / HumanEval: ~80% | Yes |
| LLaMA 3 70B | Meta | 2024 | 70B | 8k | MT-Bench: ~8.9 / MMLU: 83.2% / HumanEval: ~74% | No |
| Grok-1.5 | xAI | 2024 | Undisclosed (Grok-1: ~314B, MoE) | 128k | MT-Bench: ~8.7 / MMLU: ~80% / HumanEval: ~72% | No |
| Mistral Large | Mistral | 2024 | Undisclosed | 32k | MT-Bench: 8.6 / MMLU: ~81% / HumanEval: ~69% | No |
| Mixtral 8x7B | Mistral | 2023 | 46.7B total, ~12.9B active per token (MoE) | 32k | MT-Bench: ~8.3 / MMLU: ~70.6% / HumanEval: ~40% | No |
| Command R+ | Cohere | 2024 | 104B | 128k | MT-Bench: ~8.4 / MMLU: ~79% / HumanEval: ~66% | No |
| Phi-3-mini | Microsoft | 2024 | 3.8B | 4k / 128k (two variants) | MT-Bench: ~8.4 / MMLU: ~69% | No |
| LLaMA 3.1 405B (previewed as "LLaMA 3 400B") | Meta | 2024 | 405B | 128k | MMLU: ~88% / HumanEval: ~89% | No |
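Several models in the table use a mixture-of-experts (MoE) design. In these models, the "total" parameter count and the "active per token" count differ, because a router sends each token through only a few experts. The back-of-the-envelope sketch below uses Mixtral 8x7B's published figures: about 46.7B total parameters, about 12.9B used per token, 8 experts, and top-2 routing. From these it recovers a rough split between shared weights and per-expert weights. It assumes all experts are the same size and that every token uses all non-expert weights (attention, embeddings). `moe_split` is a hypothetical helper written for this illustration.

```python
# Back-of-the-envelope parameter split for a mixture-of-experts model.
# Assumptions (illustrative): identical experts; all non-expert ("shared")
# weights are used by every token; the router activates top_k experts per token.

def moe_split(total_b, active_b, n_experts, top_k):
    """Solve  shared + n_experts*E = total  and  shared + top_k*E = active."""
    per_expert = (total_b - active_b) / (n_experts - top_k)
    shared = active_b - top_k * per_expert
    return shared, per_expert

# Mixtral 8x7B: 46.7B total, 12.9B active per token, 8 experts, top-2 routing.
shared, per_expert = moe_split(46.7, 12.9, n_experts=8, top_k=2)
print(f"shared ≈ {shared:.2f}B, per expert ≈ {per_expert:.2f}B")
# prints: shared ≈ 1.63B, per expert ≈ 5.63B

# "8x7B" is a name, not 8 * 7 = 56B: every expert reuses the same shared weights.
print(f"shared + one expert ≈ {shared + per_expert:.1f}B")
# prints: shared + one expert ≈ 7.3B
```

Under these assumptions, one "expert path" comes to roughly 7.3B parameters, which is consistent with the "7B" in the name. The real split is only approximate because it depends on the two published totals being exact.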
What LLMs do better than existing software
Understanding human intent from natural-language prompts. Traditional software is limited to structured inputs: forms, commands, and precisely defined APIs that must be called correctly. LLMs can take vague requests such as “build me a dashboard showing customer churn trends” and turn them into concrete steps, which fundamentally changes the UX of existing software.
Handling unstructured data. LLMs can ingest PDFs, emails, and (for multimodal models) images, rather than requiring data in a fixed schema.
Reasoning and task decomposition. This is still an emerging ability: LLMs can break problems into steps, suggest what data is needed, and choose tools (APIs, databases, search).
Generating (a) code and (b) documents such as Word documents or PowerPoint presentations.
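Vague requests and tool choice usually connect to existing software through a narrow interface. The model turns a request into a structured tool call, and ordinary code validates and executes that call. The minimal sketch below assumes the model has already been prompted to reply with JSON, and no real LLM is called. The tool `build_dashboard`, its parameters, the `TOOLS` registry, and `fake_model_output` are all hypothetical.

```python
import json

# Hypothetical structured API: the kind of precisely defined call that
# traditional software expects. Not a real product's API.
def build_dashboard(metric: str, group_by: str, period: str) -> str:
    return f"dashboard({metric} by {group_by}, {period})"

# Hypothetical registry of tools the model is allowed to choose from.
TOOLS = {
    "build_dashboard": {
        "fn": build_dashboard,
        "required": {"metric", "group_by", "period"},
    },
}

def dispatch(model_output: str) -> str:
    """Validate a model-proposed tool call (JSON text) before running it."""
    call = json.loads(model_output)  # malformed JSON raises a ValueError subclass
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    spec = TOOLS.get(call.get("tool"))
    if spec is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("arguments", {})
    missing = spec["required"] - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    # Pass only the declared parameters; extra keys from the model are ignored.
    return spec["fn"](**{k: args[k] for k in spec["required"]})

# Stand-in for what a model might return for
# "build me a dashboard showing customer churn trends".
fake_model_output = json.dumps({
    "tool": "build_dashboard",
    "arguments": {"metric": "customer_churn_rate", "group_by": "month",
                  "period": "last_12_months"},
})
print(dispatch(fake_model_output))
# prints: dashboard(customer_churn_rate by month, last_12_months)
```

In this design, the model never calls the API directly. Everything it proposes passes through a validation step owned by the application, which rejects unknown tools and missing arguments before any code runs.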
Where LLMs are actually weak
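Accuracy / hallucination (fluent but false or invented statements)
Determinism (the same prompt can produce different outputs when sampling is used)
Real-time data access (without tools)
Precise computation (exact arithmetic, counting)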
Long multi-step workflows without orchestration
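In practice, several of these weaknesses are handled outside the model by an orchestration layer. The minimal sketch below uses a scripted stand-in (`scripted_model`, hypothetical) in place of a real LLM call. The model only decides what to do next. Exact arithmetic is delegated to a deterministic `calculator` tool, and a step budget stops a multi-step loop from running forever. The action format (`"action"` / `"input"`) is an assumption made for this illustration.

```python
import ast
import operator

# Exact arithmetic tool: the number comes from code, not from the model.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str):
    """Evaluate +, -, *, / and unary minus over numeric literals only."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

# Hypothetical scripted "model": a real orchestrator would call an LLM here.
def scripted_model(history):
    if not any(role == "tool" for role, _ in history):
        return {"action": "calculator", "input": "1250 * 42 / 1000 + 17"}
    return {"action": "final", "input": f"Result: {history[-1][1]}"}

def orchestrate(task, model, max_steps=5):
    history = [("user", task)]
    for _ in range(max_steps):  # hard cap on steps
        step = model(history)
        if step["action"] == "final":
            return step["input"]
        if step["action"] != "calculator":
            raise ValueError(f"unknown action: {step['action']!r}")
        history.append(("tool", calculator(step["input"])))
    raise RuntimeError("step budget exhausted without a final answer")

print(orchestrate("What is 4.2% of 1250, plus 17?", scripted_model))
# prints: Result: 69.5
```

The calculator parses the expression with `ast` instead of calling `eval`. As a result, anything other than plain arithmetic, such as function calls or attribute access, is rejected rather than executed.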
LLMs in different verticals
1. General-Purpose / Chat Assistants (Horizontal)
| Company | Product | Underlying Model | Primary Use | Differentiation |
|---|---|---|---|---|
| OpenAI | ChatGPT | GPT-4.1 / GPT-4o | General chat, reasoning | Best ecosystem + tool use |
| Anthropic | Claude | Claude 3.x | Long-context chat | Strong safety + context window |
| Google | Gemini | Gemini 1.5 | Multimodal assistant | Native search + YouTube |
| Microsoft | Copilot | GPT-4 (Azure) | Enterprise chat | Deep M365 integration |
| Meta | Meta AI | LLaMA 3 | Consumer chat | Distribution via WhatsApp |
2. Coding / Software Development
| Company | Product | Model | Target User | Notes |
|---|---|---|---|---|
| OpenAI | Codex | Codex / GPT-4 | Developers | Code-native reasoning |
| GitHub | Copilot | GPT-4 | Developers | IDE-embedded, massive adoption |
| Anthropic | Claude for Code | Claude 3 | Backend / infra | Strong refactoring |
| Google | Gemini Code Assist | Gemini | Enterprise devs | GCP + IDEs |
| Replit | Replit AI | GPT-4 / Claude | Solo builders | End-to-end app build |
3. Enterprise Productivity / Knowledge Work
| Company | Product | Vertical | Differentiation |
|---|---|---|---|
| Microsoft | Copilot for M365 | Docs, Excel, Email | Deep workflow lock-in |
| Google | Gemini for Workspace | Docs, Sheets | Search + data leverage |
| OpenAI | GPTs (Custom) | Internal tools | Low-code AI apps |
| Notion | Notion AI | Knowledge management | Context-aware writing |
4. Medical / Healthcare (Regulated & Semi-Regulated)