Gemini (Google) - capabilities
The long-context heavyweight and image powerhouse of the big three: reach for Gemini when you need to feed in an entire document or generate slide-grade visuals. It is a standard member of our three-model toolkit alongside Claude and ChatGPT. This is our view from 500+ client engagements, and capabilities evolve quickly.
Best at
- Huge context windows (~1M tokens) - reading entire documents end-to-end
- Full-context injection alongside Claude for long memos and contracts
- Image generation via Nano Banana Pro (a reasoning image model, 4K, editable)
- Accessible programmatically via the AI Studio API + MCP
- Strong general reasoning (Gemini 3.x) in the frontier pack
Capability snapshot
| Capability | Verdict | What that means |
|---|---|---|
| Long documents / context | ✅ Leads | Our go-to for ~1M-token context and true full-context reading of whole documents, not just scanning. |
| Image generation | ✅ Strong | Nano Banana Pro "thinks before it creates" and follows prompts precisely - slide-grade visuals. |
| AI agents / API | ✅ Strong | Exposes an AI Studio API that Wouter wires into MCPs so agents can generate images on command. |
| Overall capability | ✅ Strong | A genuine frontier model - Wouter recommends trying it for a month just to experience what's possible. |
| Reasoning | 🟡 Capable | Strong in the frontier pack; Wouter also uses it as the live example of how any LLM can guess wrong on hard maths. |
| Cross-model use | 🟡 Capable | Run next to Claude and ChatGPT as a standard multi-model practice (challenge them against each other). |
| Workspace integration / grounding | 🟡 Capable | Part of the Google ecosystem; direct workshop evidence on Workspace/deep research is thinner. |
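The cross-model practice in the table (run the models side by side and challenge them against each other) can be sketched in a few lines. This is a minimal illustration, not our production setup: `challenge` and the stub callables are hypothetical names, and each callable stands in for whatever real API client you would plug in for Gemini, Claude, or ChatGPT.

```python
from collections import Counter
from typing import Callable, Dict


def challenge(prompt: str, models: Dict[str, Callable[[str], str]]) -> dict:
    """Send the same prompt to every model and flag disagreement.

    `models` maps a label to any function that takes a prompt and returns
    an answer string. These are placeholders (an assumption of this sketch)
    for real API calls.
    """
    answers = {name: ask(prompt).strip() for name, ask in models.items()}
    # Find the most common answer and how many models gave it.
    top_answer, votes = Counter(answers.values()).most_common(1)[0]
    return {
        "answers": answers,
        "agreed": votes == len(answers),
        # Models whose answer differs from the most common one.
        "dissenters": sorted(n for n, a in answers.items() if a != top_answer),
    }


if __name__ == "__main__":
    # Stub models with canned answers, purely for demonstration.
    stubs = {
        "gemini": lambda p: "42",
        "claude": lambda p: "41",
        "chatgpt": lambda p: "41",
    }
    print(challenge("What is 6 * 7 - 1?", stubs))
```

Agreement between models is not proof of correctness; a disagreement is simply a cheap signal that a human should check the answer, which matters most for hard calculations.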
In Wouter's words
Gemini is actually very good at huge context limits.
What is so special about Gemini Nano Banana Pro is that it's a reasoning image model - it thinks before it creates, which means it follows your prompt much more precisely.
Watch-outs
- Like all LLMs, it guesses on complex calculations - Wouter uses Gemini as the live example of a confidently-wrong answer on hard maths.
- Big context is double-edged: don't dump everything in, or the model loses context awareness; truly massive inputs still hit a limit.
- Image editing degrades after about four or five edit iterations on a slide.
- Our evidence on Workspace integration is thinner than for Claude/ChatGPT - treat our Workspace notes as general practice, not hard Gemini-specific claims.
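The big-context watch-out above can be made concrete with a rough budgeting pass before anything is sent to the model. The sketch below is illustrative only: the four-characters-per-token ratio is a crude heuristic for English text rather than Gemini's actual tokenizer, the budget figure simply mirrors the ~1M-token window mentioned earlier, and `estimate_tokens` and `select_documents` are hypothetical helper names.

```python
from typing import List, Tuple

# Assumption: ~4 characters per token is a rough rule of thumb for English,
# not the model's real tokenizer. Use the provider's token counter for accuracy.
CHARS_PER_TOKEN = 4

# Assumption: illustrative budget based on the ~1M-token window in the text.
DEFAULT_BUDGET_TOKENS = 1_000_000


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: character count divided by 4, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def select_documents(
    docs: List[Tuple[str, str]], budget: int = DEFAULT_BUDGET_TOKENS
) -> Tuple[List[str], List[str], int]:
    """Pick documents in priority order until the token budget is used up.

    `docs` is a list of (name, text) pairs, most important first.
    Returns (selected names, skipped names, estimated tokens used).
    """
    selected, skipped, used = [], [], 0
    for name, text in docs:
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(name)
            used += cost
        else:
            skipped.append(name)
    return selected, skipped, used


if __name__ == "__main__":
    docs = [("contract", "a" * 400), ("memo", "b" * 800), ("appendix", "c" * 400)]
    print(select_documents(docs, budget=250))
```

Ordering the list by importance is the design choice that matters here: the discipline is in deciding what goes first, and the budget check only makes the cut-off visible.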
Our take
We treat Gemini as a genuine frontier model and a default member of our three-model toolkit - the one to reach for when context size matters, because it actually reads every page rather than skimming. We're especially keen on Nano Banana Pro as a reasoning image model for slide-grade visuals, and on the AI Studio API that lets us wire Gemini into agents and MCPs. The honest caveat: big context only helps when you're disciplined about what you feed it, and like any model it can confidently guess wrong on hard maths.
Just Gemini - I can tell you it's mind-blowing what's possible these days.
- Wouter van Haaften, WAIMAKERS
