Frontier-Model Notes: Picking a Brain for My Agent

Translation in progress

An ongoing log of testing frontier models (Claude, GPT, Gemini, DeepSeek, MiniMax, and Zhipu GLM), not just as raw models but also through each vendor's own chatbot and agent products. The goal started out practical: find the right brain to wire into Nova, my agent running on OpenClaw, and learn which model fits which scenario. I compare context handling, multimodality, cost, quality on hard tasks, response speed, and specific generation skills (image, video, voice, text). It's a living document, not a finished scorecard, and it's honest about its limits: I deliberately did not benchmark coding (I leave coding to Gemini and Claude), and because about 80% of my use is in Chinese, I made no deliberate Chinese vs. English comparison.

The Chinese version below covers the motivation, the evaluation dimensions and their boundaries, my (admittedly informal) methodology, a real usage history for each model, and the findings I keep revising.


