Live Experiment / Model Behavior Playground

Blooming

What happens when you stop treating AI models as interchangeable and start working with their personalities?

[Screenshot: Model Behavior Playground, GPT vs Claude]

The Problem

GPT and Claude don't think the same way. Anyone who uses both knows this. Claude tends to be more cautious and structured. GPT tends to be more generative and exploratory. As a daily user of both, I naturally started splitting tasks between them based on these personality differences.

But the workflow was painful. Every time I switched models, I had to copy-paste the entire context. Conversations didn't carry over. Insights from one session were invisible to the next. I was doing the integration work that should have been automatic.

And underneath the workflow friction, a deeper question kept surfacing: if these models genuinely behave differently, what happens when you make them work together? Does the output get better? Does it get worse? Does "personality" even matter for quality, or is it just a feeling?


The Approach

I built Blooming as a single interface where multiple models coexist. Not a model comparison tool. A playground for observing how models behave differently, and whether those differences can be leveraged.
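The "coexist" part hinges on a shared context that every model reads from and writes to. A minimal sketch of that idea, with stubbed model functions standing in for real Claude and GPT API calls (all names here are hypothetical, not Blooming's actual code):

```python
# Sketch of a shared context layer: every model call reads the same
# accumulated history and appends its own reply, so insights from one
# model are visible to the next. Stub lambdas replace real API calls.
from dataclasses import dataclass, field

@dataclass
class SharedContext:
    """Cross-model, cross-session context store."""
    entries: list = field(default_factory=list)

    def add(self, model: str, text: str) -> None:
        self.entries.append({"model": model, "text": text})

    def as_prompt(self) -> str:
        # Flatten the history so any model sees what the others said.
        return "\n".join(f"[{e['model']}] {e['text']}" for e in self.entries)

def ask(model_fn, model_name: str, prompt: str, ctx: SharedContext) -> str:
    """Send prompt plus shared context to one model, record its reply."""
    full = ctx.as_prompt() + "\n" + prompt if ctx.entries else prompt
    reply = model_fn(full)  # in a real system: a Claude or GPT API call
    ctx.add(model_name, reply)
    return reply

# Stub "models" so the sketch runs without network access.
claude = lambda p: f"claude saw {len(p)} chars"
gpt = lambda p: f"gpt saw {len(p)} chars"

ctx = SharedContext()
ask(claude, "claude", "Outline the problem.", ctx)
ask(gpt, "gpt", "Critique the outline above.", ctx)  # sees claude's reply
```

The point of the sketch is the data flow, not the API details: the second call's prompt already contains the first model's answer, which is exactly the copy-paste step the old workflow forced me to do by hand.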

The core experiments I'm running:


Demo

Architecture
The user's prompt, together with a context folder, goes to both the Claude API and the GPT API in parallel. The two responses then enter a debate / cross-question stage. A persistent context layer (shared folder context, cross-session knowledge accumulation) feeds every call. An observation layer tracks reasoning differences, quality shifts, and personality-driven behavior patterns, and a summary panel visualizes the accumulated knowledge.
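The debate / cross-question stage can be sketched as a simple loop: each model answers, then each is asked to challenge the other's latest answer. Stub functions stand in for the real API calls, and the function and variable names are illustrative, not Blooming's actual implementation:

```python
# Sketch of the debate / cross-question stage. Each round, every model
# is prompted with the other's most recent answer and asked to push back.
def debate(prompt, model_a, model_b, rounds=2):
    """Collect a transcript of alternating answers and challenges."""
    transcript = []
    answer_a = model_a(prompt)
    answer_b = model_b(prompt)
    transcript.append(("A", answer_a))
    transcript.append(("B", answer_b))
    for _ in range(rounds):
        # Cross-question: each model responds to the other's last turn.
        answer_a = model_a(f"Challenge this answer: {answer_b}")
        answer_b = model_b(f"Challenge this answer: {answer_a}")
        transcript.append(("A", answer_a))
        transcript.append(("B", answer_b))
    return transcript

# Stub "models" so the sketch runs offline.
model_a = lambda p: f"A replies to: {p[:30]}"
model_b = lambda p: f"B replies to: {p[:30]}"
log = debate("Is personality measurable?", model_a, model_b, rounds=1)
```

Whether this loop actually improves output quality, or just produces politer disagreement, is one of the open questions the observation layer is meant to answer.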

What I'm Observing

What's Next

Try Blooming live →