AI Champions League - Which LLM to embed into a digital process?


DESCRIPTION – A framework and report for comparing LLMs across three dimensions: Speed (Time to First Token, throughput in tokens per second), Efficiency (input and output token consumption, context window utilisation, pricing model), and Risk (temperature, confidence from log probabilities, hallucination rate). The goal is to determine which model performs best and to understand why. To that end, every chess move is audited together with the above metadata, so that each model's deductive and inductive reasoning and its style of output can be analysed.
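
As an illustration of how the Speed and Efficiency figures could be derived from a single logged move, here is a minimal TypeScript sketch. It is not the app's actual code: the `MoveMetrics` and `Pricing` shapes, all field names, and the choice to measure throughput over the interval between the first and last token are assumptions made for this example.

```typescript
// Hypothetical per-move record; field names are assumptions for illustration.
interface MoveMetrics {
  requestSentMs: number; // timestamp when the prompt was sent
  firstTokenMs: number;  // timestamp when the first streamed token arrived
  lastTokenMs: number;   // timestamp when the final token arrived
  inputTokens: number;   // tokens consumed by the prompt
  outputTokens: number;  // tokens produced by the model
}

// Hypothetical pricing model: US dollars per one million tokens.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Time to First Token: delay between sending the request and the first token.
function timeToFirstTokenMs(m: MoveMetrics): number {
  return m.firstTokenMs - m.requestSentMs;
}

// Throughput: output tokens per second, measured over the generation window.
// Returns 0 when the window is empty (for example, a non-streamed response).
function tokensPerSecond(m: MoveMetrics): number {
  const generationMs = m.lastTokenMs - m.firstTokenMs;
  return generationMs > 0 ? (m.outputTokens * 1000) / generationMs : 0;
}

// Cost of one move under a per-million-token pricing model.
function costUsd(m: MoveMetrics, p: Pricing): number {
  return (
    (m.inputTokens * p.inputPerMillionUsd +
      m.outputTokens * p.outputPerMillionUsd) / 1e6
  );
}

// Share of the context window used by prompt plus completion (0 to 1).
function contextUtilisation(m: MoveMetrics, contextWindowTokens: number): number {
  return (m.inputTokens + m.outputTokens) / contextWindowTokens;
}
```

Keeping these as pure functions over a logged record means the same move log can be re-scored later, for example when a provider changes its pricing, without replaying the game.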

TOOLING – I built this app (see iframe below) and deployed it as a Docker container on Google Cloud Platform using Cloud Run. Core technologies include React (frontend framework), TypeScript and JavaScript (languages), Vite (build tool), Tailwind CSS (styling), chess.js (game logic), and several supporting libraries. All results and log files are stored on the server. Note: you will need to provide your own API keys to run the app, as it's not cheap to run these models in autoplay.


Google, OpenAI, Anthropic, DeepSeek, Mistral, IBM

GCP Serverless App URL

AI Comparison Report Access