04 · Consilium

One debate, two flagship AIs, one tested answer.

Consilium is a debate arena for hard questions. Ask one, and two of the strongest AI models go at it, one as host, the other as invited expert, until they agree or make clear exactly where they clash. You read along, steer, and can add a third model. What's left is a tested answer, with conclusions you can export and share straight away.

In production

Role

Originator & builder

Timeline

In production, extended modularly

Team

Solo build

Stack

Claude + Gemini · Hono · SQLite · Railway

Diagram: a debate over several rounds, with Claude as host and Gemini as expert, ending in a tested answer: consensus or a clear difference.
Claude as host, a second model as expert. You read along, steer, and add a third voice if you want.
01 · the problem

One model gives you one opinion, not the test.

Ask an AI for a hard judgement and you get one answer, full of confidence, but you've no idea whether it holds up or where it wobbles. A model agrees easily, and every model has its own blind spot. For a real decision that's too thin.

Consilium grew out of my own need. I was building a photo pipeline with Gemini and groping in the dark about the right approach. What if Claude and Gemini could work it out together? That's how the idea was born, and it grew into a full opinion panel for my hardest questions.

01

One perspective

One model, one lens. You see the answer, but not the assumptions under it or the place where it breaks.

02

No pushback

An AI nods along easily. Without someone arguing the other way, you miss the very spot where the answer is weak.

03

Models think differently

Claude weighs things differently from Gemini or GPT. Different training, different angles. One voice doesn't make use of that.

Put two of the strongest models against each other, let them fight it out, and what's left is tested from both sides.

02 · the debate

Not one opinion. An answer that survives the friction.

Two models each take a position and go back and forth for several rounds. What both still stand behind at the end is the tested answer. Where they keep clashing, you read that too.

Claude · host

"Ignoring 30% of your revenue from one client because it doesn't fit the roadmap is a luxury you can't afford right now."

GPT · expert

"That's exactly why you should be careful. Exclusive bespoke work fragments your product and deepens a dependency that's already large."

Consensus · 3 rounds

Build it as product value, not as client size. Decide on the feature for everyone, not on this one client.

The crux is in the follow-up question. After each round there's not just an interim standing, but also the question that opens the next layer. That's what makes a debate a deep dive instead of a round of opinions.
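A minimal sketch of that round loop, to make the mechanism concrete. All names here (`Speaker`, `Judge`, `runDebate`) are hypothetical and not taken from Consilium's code, and who produces the interim standing is my assumption: I've modelled it as a separate judge function. Real model calls would be asynchronous; the stubs are synchronous so the sketch runs without API keys.

```typescript
// Hypothetical types: none of these names come from Consilium's real code.
type Role = "host" | "expert";

interface Turn {
  role: Role;
  text: string;
}

interface RoundResult {
  round: number;
  turns: Turn[];
  interimStanding: string; // where the debate stands after this round
  followUp: string | null; // question that opens the next layer, null once settled
}

// A speaker is any function that answers a prompt (assumed synchronous here).
type Speaker = (prompt: string) => string;

// The judge reads one round and decides the standing and the next question.
type Judge = (
  round: number,
  turns: Turn[],
) => { interimStanding: string; followUp: string | null };

function runDebate(
  question: string,
  host: Speaker,
  expert: Speaker,
  judge: Judge,
  maxRounds: number,
): RoundResult[] {
  const results: RoundResult[] = [];
  let prompt = question;
  for (let round = 1; round <= maxRounds; round++) {
    const hostText = host(prompt);
    // The expert sees the host's position and responds to it.
    const expertText = expert(`${prompt}\nHost said: ${hostText}`);
    const turns: Turn[] = [
      { role: "host", text: hostText },
      { role: "expert", text: expertText },
    ];
    const { interimStanding, followUp } = judge(round, turns);
    results.push({ round, turns, interimStanding, followUp });
    if (followUp === null) break; // consensus or a clearly named difference
    prompt = followUp; // the follow-up question drives the next round
  }
  return results;
}

// Stub speakers and a stub judge that settles in round 3.
const host: Speaker = (p) => `Host on "${p}"`;
const expert: Speaker = (p) => `Expert on "${p.split("\n")[0]}"`;
const judge: Judge = (round) =>
  round < 3
    ? { interimStanding: `open after round ${round}`, followUp: `Follow-up ${round}` }
    : { interimStanding: "consensus", followUp: null };

const debate = runDebate("Do we build bespoke for our biggest client?", host, expert, judge, 10);
console.log(debate.map((r) => `${r.round}: ${r.interimStanding}`).join("\n"));
```

The point of the sketch is the last line of the loop: each round answers the previous round's follow-up question rather than the original one, which is what turns a round of opinions into a deep dive.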

03 · up close

The application itself, in production.

Five screens, each with its piece of the story above it: how it works, and what it solves. Consilium is at its best on desktop, so I'm leaving the mobile version out here.

01 · New debate

Pick an arena, ask your question.

A debate starts with focus. Pick the arena, from product and strategy to society, and frame what you really want to know. Optionally, pass in your current best answer and have it picked apart.

Key takeaway

A good question in the right context yields a sharper debate.

Consilium — New debate
Start a new debate · arena
General · free exploration, technology
Product & Strategy · building, positioning, go-to-market
Technology & AI · models, architecture, ethics
Society · policy, ethics, the long term
What do you want to discuss?
Our biggest client, worth 30% of revenue, is asking for an exclusive bespoke feature that doesn't fit the roadmap.
Start
02 · Model selector

Who do you put against each other?

Fully modular. Pick two flagship models, each with its own strengths, and add a third if you want. Claude, GPT, Gemini, Llama, you assemble the panel that fits your question.

Key takeaway

Different strengths against each other give a sharper, tested answer.

Consilium — Model selector
Who do you put against each other? · pick two
Claude · Anthropic

Nuanced and thorough. Strong on trade-offs and long lines of reasoning.

nuance · reasoning
GPT · OpenAI

Broad and fast. Strong on synthesis and clear structure.

synthesis · breadth
Gemini · Google

Multimodal and factual. Strong on current context and large data.

multimodal · facts
Llama · Meta

Open weights. Fully open and able to run locally.

open source · local
03 · The debate

Read along while they fight it out.

Round by round the models take a position, rebut each other and sharpen the question. You read along, steer, and see exactly how the answer takes shape.

Key takeaway

No black box: you see the reasoning, not just the outcome.

Consilium — The debate

Do we build bespoke for our biggest client?

Claude · argues for building
vs
GPT · argues for holding focus
Round 1 · opening position
Claude · build

30% of your revenue isn't a detail, it's your right to exist. A strong relationship compounds: they stay, they introduce, they grow with you.

GPT · hold focus

One client steering your product is the start of a consultancy, not a product. 30% with one client isn't strength, it's a single point of failure.

04 · Summary

The tested answer, plus where they clashed.

At the end there's a clear answer, with a degree of agreement, the points where the models came together and, just as honestly, the points where they still differ. Ready to export and share.

Key takeaway

A conclusion you can trust, because you see how sharp the consensus is.

Consilium — Summary
The tested answer · high consensus

Don't build it as exclusive bespoke work under their deadline. Decide on product value, not on client size, and productise the feature for everyone, on your terms and timing.

86%
Where they agreed
Decide on product value, not on client size.
Generalising is fine, exclusive bespoke work isn't.
Where they clashed
Does the revenue risk weigh heavier than the damage to your roadmap?
Can you say no without losing the client?
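To show what "ready to export" could look like, here is a small sketch that turns a summary like the one above into Markdown. The `DebateSummary` shape, the label thresholds and the choice of Markdown as export format are all assumptions for illustration. The text doesn't say how the degree of agreement is computed, so the sketch simply takes it as an input.

```typescript
// Hypothetical shape of a debate summary; field names are assumptions.
interface DebateSummary {
  question: string;
  answer: string;
  agreement: number; // degree of agreement as a fraction between 0 and 1
  agreed: string[];
  clashed: string[];
  rounds: number;
}

// Label thresholds are illustrative, not Consilium's real cut-offs.
function consensusLabel(agreement: number): string {
  if (agreement >= 0.8) return "high consensus";
  if (agreement >= 0.5) return "partial consensus";
  return "clear difference";
}

// Render the summary as a Markdown document that can be shared as-is.
function toMarkdown(s: DebateSummary): string {
  const pct = Math.round(s.agreement * 100);
  const lines = [
    `# ${s.question}`,
    "",
    `**Tested answer** (${consensusLabel(s.agreement)}, ${pct}%, ${s.rounds} rounds)`,
    "",
    s.answer,
    "",
    "## Where they agreed",
    ...s.agreed.map((p) => `- ${p}`),
    "",
    "## Where they clashed",
    ...s.clashed.map((p) => `- ${p}`),
  ];
  return lines.join("\n");
}

const summary: DebateSummary = {
  question: "Do we build bespoke for our biggest client?",
  answer: "Decide on product value, not on client size, and productise the feature for everyone.",
  agreement: 0.86,
  agreed: [
    "Decide on product value, not on client size.",
    "Generalising is fine, exclusive bespoke work isn't.",
  ],
  clashed: [
    "Does the revenue risk weigh heavier than the damage to your roadmap?",
    "Can you say no without losing the client?",
  ],
  rounds: 3,
};

console.log(toMarkdown(summary));
```

Keeping the clashes in the export is deliberate: a reader who only gets the answer loses exactly the information that tells them how far to trust it.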
05 · Archive

Every conclusion neatly kept.

Every debate lands in the archive, findable, manageable and shareable. A growing library of tested answers to your hardest questions.

Key takeaway

Your decisions and the reasoning behind them are preserved, not lost.

Consilium — Archive
Archive · tested answers
Do we build bespoke for our biggest client?
product & strategy · Claude × GPT · 3 rounds
86%
Which model for our RAG pipeline?
technology & AI · Claude × Gemini · 5 rounds
92%
Remote-first or hybrid after the growth?
work & life · Claude × GPT · 4 rounds
71%
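"Findable" could work roughly like this. The sketch below filters the three archive entries shown above by text, arena or model. The `ArchiveEntry` fields and the `findDebates` helper are hypothetical, and where the real archive sits in SQLite, this version uses an in-memory array so it runs without dependencies.

```typescript
// Hypothetical archive record; field names are assumptions.
interface ArchiveEntry {
  question: string;
  arena: string;
  models: [string, string];
  rounds: number;
  agreement: number; // degree of agreement in percent
}

interface ArchiveQuery {
  text?: string; // case-insensitive match on the question
  arena?: string;
  model?: string;
}

// The three entries shown in the archive screen above.
const archive: ArchiveEntry[] = [
  { question: "Do we build bespoke for our biggest client?", arena: "product & strategy", models: ["Claude", "GPT"], rounds: 3, agreement: 86 },
  { question: "Which model for our RAG pipeline?", arena: "technology & AI", models: ["Claude", "Gemini"], rounds: 5, agreement: 92 },
  { question: "Remote-first or hybrid after the growth?", arena: "work & life", models: ["Claude", "GPT"], rounds: 4, agreement: 71 },
];

// Return matching debates, strongest consensus first.
// filter() returns a new array, so the sort does not reorder the input.
function findDebates(entries: ArchiveEntry[], query: ArchiveQuery): ArchiveEntry[] {
  const text = query.text?.toLowerCase();
  const { arena, model } = query;
  return entries
    .filter((e) => text === undefined || e.question.toLowerCase().includes(text))
    .filter((e) => arena === undefined || e.arena === arena)
    .filter((e) => model === undefined || e.models.includes(model))
    .sort((a, b) => b.agreement - a.agreement);
}

console.log(findDebates(archive, { model: "GPT" }).map((e) => e.question));
```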
04 · under the hood

Modular, and deliberately kept clean.

Every model is a plug: Claude, GPT, Gemini, Llama. You pick host and expert and set the number of rounds, ten by default, and the debate runs until the models reach consensus or clearly name the difference. After each round comes an interim standing with a follow-up question, the engine under the deep dive. Conclusions go into the archive, exportable and shareable. Consilium belongs to the same family as Open Brain, the layer beneath everything, but I deliberately don't feed the debate from it: a judgement has to form cleanly, without polluted context. The one integration that does make sense as a next step is writing conclusions back to Open Brain.

Claude · Gemini · GPT · Hono · SQLite · Railway · Configurable rounds
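A sketch of what "every model is a plug" could mean in code: one shared interface, a registry, and a panel assembled from a small config with ten rounds as the default. The names (`ModelPlug`, `assemblePanel`, the model ids) are hypothetical, and the rule that host and expert must differ is my reading of "pick two", not something the text states. The `respond` stubs are synchronous placeholders where a real adapter would call the vendor API.

```typescript
// Hypothetical plug interface: each model adapter exposes the same shape.
interface ModelPlug {
  id: string;
  vendor: string;
  respond: (prompt: string) => string; // placeholder for a real, async API call
}

interface PanelConfig {
  host: string;
  expert: string;
  third?: string; // optional third voice
  maxRounds?: number;
}

interface Panel {
  host: ModelPlug;
  expert: ModelPlug;
  third: ModelPlug | null;
  maxRounds: number;
}

const DEFAULT_MAX_ROUNDS = 10; // ten rounds by default, as described above

const registry = new Map<string, ModelPlug>();

function register(plug: ModelPlug): void {
  registry.set(plug.id, plug);
}

function lookup(id: string): ModelPlug {
  const plug = registry.get(id);
  if (!plug) throw new Error(`Unknown model: ${id}`);
  return plug;
}

function assemblePanel(config: PanelConfig): Panel {
  // Assumption: a debate needs two different models.
  if (config.host === config.expert) {
    throw new Error("Host and expert must be different models");
  }
  const maxRounds = config.maxRounds ?? DEFAULT_MAX_ROUNDS;
  if (!Number.isInteger(maxRounds) || maxRounds < 1) {
    throw new Error("maxRounds must be a positive integer");
  }
  return {
    host: lookup(config.host),
    expert: lookup(config.expert),
    third: config.third ? lookup(config.third) : null,
    maxRounds,
  };
}

// Register stub plugs for the four models named in the text.
const vendors: Array<[string, string]> = [
  ["claude", "Anthropic"],
  ["gpt", "OpenAI"],
  ["gemini", "Google"],
  ["llama", "Meta"],
];
for (const [id, vendor] of vendors) {
  register({ id, vendor, respond: (prompt) => `[${id}] ${prompt}` });
}

const panel = assemblePanel({ host: "claude", expert: "gemini" });
console.log(panel.host.id, "vs", panel.expert.id, "for up to", panel.maxRounds, "rounds");
```

With this shape, adding a model means registering one more plug. The debate logic itself never needs to know which vendor sits behind it.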
05 · the setup
2 · flagship models against each other, host and expert
+1 · third model added for a broader picture
10 · rounds by default, until consensus or clear difference
5 · arenas, from strategy to society

Every major model is selectable, and it's often precisely the differences that make it interesting. They're clearly trained differently, and those other angles are exactly what opens a question up.

06 · in production

The real screens.

Not dressed up, just proof that it runs. This is Consilium as it stands day to day.

Real capture · assets/consilium-01-samenvatting.png
Summary. The tested answer.

Real capture · assets/consilium-02-nieuwdebat.png
New debate. Arena and question.

Real capture · assets/consilium-03-debat.png
The debate. Round by round.

Real capture · assets/consilium-04-modelselector.png
Model selector. Who debates.

07 · the value

An opinion panel for your hardest decisions.

Born out of my own need, grown into a tool I wouldn't want to do without. The same holds for an organisation: for a heavy choice you don't want an AI that bends to you. You want pushback, deliberation and an answer that passes the test. You're no longer alone with one model.

Pushback built in

No AI that just nods along.

Two models keep each other sharp. The weak spot surfaces because someone actively argues the other way.

Tested, not guessed

An answer with a degree of agreement.

You see not just the conclusion, but also how sharp the consensus is and where the difference stayed.

Fully modular

Your panel, your rounds.

Pick the models that fit, add a third, set the number of rounds. The panel follows the question.

Shareable & kept

Every conclusion exportable.

Answers go neatly into the archive, ready to share and to find again when the question comes back.

Part of the same family as Open Brain, the layer beneath everything, deliberately kept separate to keep the debate clean.

A hard question? I'll happily have it fought out live.

Consilium runs in production. Give me a real question, and we'll put two models against each other so you can see a tested answer take shape live.