AI Product + Model Behavior

I translate research into products people actually use.

I build at the edge of research, product, and real-world use. After shipping systems across fintech, government, commerce, and AI, I'm now focused on how AI behaves in the contexts where people actually use it.
Harvard Design Engineer 2026 · 8+ years Product · AI Behavior Evaluation

About
The short version

I'm a product builder and founder with 8+ years of experience shipping systems used by tens of millions of people across fintech, government, commerce, and AI. My work has always sat between complex systems and real-world adoption: finding the research, translating it into product decisions, and shipping things people actually use.

At Harvard MDE and Scale AI, that focus led me to model behavior. I became interested in a gap that benchmarks often miss: AI can be fluent, safe, and factually correct, yet still fail in the social contexts where people actually use it, such as workplaces, healthcare, public services, education, and other high-context domains.

I'm now building tools for understanding and improving how AI behaves in real-world human contexts, combining model behavior evaluation, human-centered research, and product systems.

The next AI product bottleneck is not just capability. It is behavior in context.

52M
People reached through national-scale digital products
8+ yrs
Product leadership across fintech, government, commerce, and AI
45+
Frontier models evaluated across reasoning, multimodal, and behavior tasks
200+
Evaluations of model behavior against human expectations

Projects
What I built
Kairos · 2025-26

Model behavior infrastructure that takes AI from accurate to contextually reliable, making socially misaligned answers observable and steerable.

AI Evaluation · Model Behavior · Cross-Cultural
MemeSonic · 2026

How should a meme sound? Modeling image-text conflict to understand memes and turn their affect into expressive voice.

Multimodal · Affective Computing · Voice AI
Blooming · 2026

A model behavior playground that continuously tests how different models and versions respond, shared as an open channel.

Model Behavior · Playground · GPT vs Claude
ChoLab · 2025

A cholera-detecting device that cuts a $1,000 lab test to $10, handing testing and agency back to local communities. iF Design Award Gold.

Biotech · Design Engineering · India
Medly · 2024

An OTC AI pharmacist agent that closes the information gap around everyday medication. 500 users from zero, seed funded.

AI Agent · Healthcare · 0 to 1
AIRQUA · 2024

An atmospheric water generator harvesting about 300ml of clean water per hour from humid post-flood air. Harvard President's Challenge, 1st Place.

Hardware · Social Impact · Thailand

Experience
Where I built things
2025
Scale AI
Researcher, Humanity's Last Exam
Analyzed behavioral patterns across 45+ frontier AI systems. Built a structured taxonomy for evaluation and prioritization. Classified failure modes by type, complexity, and trigger conditions. A/B tested rubric formats for SFT annotation quality. Experienced firsthand the gap between benchmarks and real user expectations, which became the motivation for Kairos.
AI Research · Model Evaluation
2024 - 2026
Harvard University
Master in Design Engineering
Built 4 ventures across hardware, biotech, and AI. Teaching Fellow for Product Management and HCI courses. Campus Ambassador for Anthropic Claude: organized 5 non-technical hackathons and industry-academia exchange sessions, and published guides on using Claude with non-technical students. Gained firsthand experience of how real students interact with and feel about AI. XR Club President (60 members; 600+-participant conference with Meta, Nvidia, Samsung).
Design Engineering · Teaching Fellow · Claude Ambassador
2020 - 2024
Toss (Viva Republica)
Lead Product Owner, promoted to Domain Leader
Korea's leading fintech platform. Joined at 300 employees; the company grew to 4,000. Became the youngest Domain Leader. Led vision, strategy, and roadmap across 5 product verticals and 50+ cross-functional teams. Drove 15M+ user inflow and improved MoM retention from 70% to 85% through behavioral cohort segmentation and rapid A/B experimentation. Digital Platform Government kickoff member: led digital government services for 52M citizens, including tax payments, vaccine certification, and public document issuance. Received a Ministerial Commendation under the Presidential Digital Platform Government initiative. Opened a new revenue category through an investment ad platform in a regulatory sandbox.
Product Management · Growth · Government
2017 - 2020
LG Fashion (LF Corp)
E-Commerce Product Manager
Full-stack PM across a 7-domain e-commerce platform: search/SEO, product discovery, checkout, payments, returns, customer service. Embedded in the call center for 2 years: turned direct user pain into 130+ product fixes and reduced the inquiry rate from 15% to 5%, generating ~$770K/month in savings. Launched Korea's first ML-based Virtual Fitting service end-to-end, coordinating with a Berlin AI research team. Showcased at CES 2019.
Product Management · ML/AI · E-Commerce

Approach
How I work
01
"I start where the user is, not where the system is."
At LG Fashion, I sat in the call center for two years to see what users actually went through. That became 130+ product fixes. At Toss, user interviews and behavioral data drove every growth experiment. At Scale AI, I looked past benchmarks to ask what real users experience. Kairos started the same way: not from model architecture, but from what 50+ users across three cultures actually expect.
LG Fashion call center · Toss UT/IDI · Scale AI · Kairos 50+ users
02
"I move to where I don't belong."
Fashion e-commerce to fintech to government to AI. Each time, a domain I'd never worked in. At Toss, I deliberately rotated through public services, user growth, and investment, choosing unfamiliar territory each time. The pattern isn't restlessness. It's how I learn fastest and find what others miss.
4 domains · 4 Toss silos · 4 Harvard ventures
03
"I build to learn."
Four ventures at Harvard in two years across hardware, biotech, and AI. Thailand, India, Dubai, Germany. Not to build four companies, but to repeat 0-to-1 in different contexts until the pattern became clear: every domain has a version of the same question. Is the technology actually helping the person on the other side?
AIRQUA Thailand · ChoLab India · Medly US · Kairos cross-cultural

Recognition
Awards and honors
Berkman Klein Center Incubator
AI Behavior · Public-Interest Technology
Harvard Berkman Klein Center
2026 · AI behavior evaluation venture
iF Design Award Gold
Design Engineering · Health Innovation
iF International Forum Design
2026 · ChoLab
Harvard President's Innovation Challenge, 1st Place
Social Impact · Climate Resilience
Awarded by President Alan Garber
2025 · AIRQUA
Ministerial Commendation
National-Scale Implementation · Public Sector
Ministry of the Interior and Safety
2022 · Digital Platform Government
Global Top 100 Prototype
Social Impact · Future Design
Dubai Future Foundation
2025 · ChoLab
1st Place, Global AI Health-Tech Competition
AI Health · Product Design
MIT PathCheck Foundation
2024 · Medly

Research
Recent papers
Whose Expectations Do LLMs Follow? Evaluating Socially Situated Appropriateness in AI Advice
AI Behavior · Human Evaluation · Model Alignment
Working paper / under review · 2026
A framework for evaluating when an AI response is technically correct but socially misaligned with the expectations, constraints, and contexts of real users. This work studies the gap between benchmark performance and human expectations, and explores lightweight steering methods for improving model behavior in context.
MemeSonic: Affective Meme Audio Generation
Multimodal AI · Affective Computing · Voice AI
Research paper · 2026
An incongruity-aware framework for meme understanding and expressive speech generation. This work models image-text conflict as a continuous signal, uses adaptive fusion strategies across meme understanding tasks, and shows that aligned expressive speech can improve tri-modal sentiment understanding.
Cleaning the Right Coasts: An Auditable Framework for Equity-Aware Marine Debris Allocation
Applied AI · Machine Learning · Social Impact
Workshop submission · 2026
An applied research project on an auditable, equity-aware framework for deciding which coasts to prioritize for marine debris cleanup. The work reflects my broader interest in translating technical research into systems for public and planetary impact.

Writing
Thinking out loud
I write about AI model behavior, cultural bias, and what it means for AI to treat people well.
I Know This. So What?
Apr 16, 2026
When AI gives you a safe answer and you still leave with nothing. A response can be harmless and still useless.
The Question I Can't Stop Thinking About
Mar 9, 2026
Who decides what "helpful" means? From 8 years of growth PM to AI research, the question that connects everything.
Neutral Isn't Neutral
Mar 7, 2026
Sycophancy isn't a single message. It's a shift. And what counts as a shift depends on where you're from.
Read all posts on Substack →

Contact
Let's talk

I'm always happy to talk about AI model behavior, user-facing products, AI safety, or just the sketch of a new idea. If any of that sounds interesting to you, reach out.