Stanford Medicine’s AI Clinical Coach app (AI CliC) promotes conversation and reflection on critical thinking during clinical training and patient care. It uses AI to analyze patient case presentations and provide attending physicians and trainees with feedback and coaching prompts to foster what we call a "Thinking Mindset Revolution," sorely needed in healthcare.

We aim to systematically improve the thinking habits trainees need to master during patient care. The app employs our "Thinking Habits Matrix," a framework we created based on educational research that breaks abstract concepts such as self-reflection, metacognition, and critical thinking into six concrete "thinking habits": self-reflection on frames of mind, knowledge, problem-definition, strategy, solution, and interpretation.

In this product design deep dive, I'll share some thoughts on the background, development, outcomes, and my role in creating AI Clinical Coach.
The Need for Thinking Skills Education

This diagram from our colleagues at the Stanford Graduate School of Education summarizes their finding that experts across dozens of STEM fields use many of the same problem-solving techniques and that all of them require self-reflection. My colleagues Shima Salehi, Marcos Rojas, Sharon Chen, and Kathleen Gutierrez have demonstrated that this model of "expert problem-solving" extends to clinical practice as well.

Thinking Skills Are The Key to Expert Decision-Making, Yet They're Rarely Taught
It is both intuitively and empirically true that "thinking skills" such as critical thinking, metacognition, self-regulation, and self-reflection distinguish expert problem-solvers from novices across many fields. Yet thinking skills aren't explicitly taught in medical training, or indeed almost anywhere, leaving the development of these important skills to chance, talent, and trial and error.

There are several barriers to teaching thinking skills: institutional inertia, physician bandwidth, and cognitive overload. A key barrier is the lack of a shared language and frameworks for discussing "thinking." For instance, there is no universal definition of critical thinking, and terms are often confused. "Clinical reasoning" is taught but mainly emphasizes doing over thinking. Even experts struggle to articulate the skills they use. If we can't discuss thinking, how can we teach it?
The Future of Thinking in the AI Era
Medical students and their patients deserve what we call "metacognitive equity": all physicians should have the opportunity to develop thinking skills during training. Variation in thinking skills has been a known issue in medical training for decades. These differences translate into variable care quality, and poor care leads to harmful, expensive errors, one of the leading sources of unnecessary cost in healthcare.

From the beginning, medical education has centered on knowledge above all else: the more you know, the better you are. There are many reasons for this — easier knowledge testing and teaching, for example — but the "knowledge mindset" of medical training is becoming obsolete. Even today, generative AI is quickly becoming the primary source of knowledge in clinical workplaces. Every physician I know uses some sort of AI tool, such as OpenEvidence, to support their decision-making.

If humans are going to remain relevant in this era, we believe we need to re-center medical education on a "thinking mindset" culture, where the emphasis is on developing the skills of interpretation and judgment needed to apply generated knowledge.

This belief has led us to form the AI Clinical Coach team to actively research the implications of AI on thinking skills through both quantitative and product-led research, testing solutions in real clinical environments.
AI Clinical Coach & the Thinking Habits Matrix

The AI Clinical Coach app in 2025.

The Thinking Habit Matrix
Self-reflective questions for each thinking habit
AI CliC is an app developed to study how thinking-skills training affects residents and fellows early in their clinical work, and to foster a culture of discussing thinking during patient care. We believe studying the affordances that thinking education requires in clinical settings is as important as studying its effects. The app helps attendings coach trainees to improve their "thinking habits" using the Thinking Habits Matrix, whose six behaviors serve as a heuristic for detecting abilities like self-reflection, metacognition, critical thinking, and self-regulation.

For the 2025 pilot, we focused on patient case presentations, a key coaching interaction in which trainees report patient details and attendings listen, question, and provide feedback, often ten or more times daily. These interactions are short and rushed.

AI CliC records, transcribes, and analyzes presentations to detect thinking habits, generating a “Thinking Habits Report.” This report rates the trainee’s thinking, summarizes the case, highlights strengths, and suggests coaching questions for discussion and improvement. The app tracks thinking over time, helping identify blind spots and growth.
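As a rough illustration of the pipeline just described, here is a minimal Python sketch with transcription and habit rating stubbed out. The function names, report fields, and 1-to-5 rating scale are my assumptions for illustration, not the app's actual implementation.

```python
from dataclasses import dataclass, field

# The six thinking habits from the Thinking Habits Matrix.
HABITS = ["frames of mind", "knowledge", "problem-definition",
          "strategy", "solution", "interpretation"]

@dataclass
class ThinkingHabitsReport:
    case_summary: str
    ratings: dict                                   # habit name -> 1-5 rating
    strengths: list = field(default_factory=list)
    coaching_questions: list = field(default_factory=list)

def transcribe(audio: bytes) -> str:
    # Stub: a real implementation would call a speech-to-text service.
    return "Trainee presents a 7-year-old with fever and a rash ..."

def rate_habits(transcript: str) -> dict:
    # Stub: a real implementation would prompt a language model
    # to score each habit based on the transcript.
    return {habit: 3 for habit in HABITS}

def build_report(audio: bytes) -> ThinkingHabitsReport:
    transcript = transcribe(audio)
    ratings = rate_habits(transcript)
    strengths = [h for h, r in ratings.items() if r >= 4]
    questions = [f"How would you reflect on your {h} in this case?"
                 for h, r in ratings.items() if r <= 3]
    return ThinkingHabitsReport(transcript[:80], ratings, strengths, questions)
```

The real system adds the longitudinal tracking described below; this sketch only covers a single presentation.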
My Role
In 2023, I developed Clinical Coach independently as part of my role at Stanford to pilot new technologies. Using ChatGPT, I quickly created a prototype in about a week that included three patients—Albert Einstein, Dorothy Gale, and Homer Simpson—where users addressed their complaints and received feedback via a rubric I designed. Dr. Sharon Chen, a pediatric infectious diseases professor, joined as principal investigator. Inspired by her research on critical thinking, we shifted focus from simulated role-play to coaching trainees on thinking skills during actual clinical work with patients.

Flash forward to now. We've run a pilot study and are expanding our research and product development in several directions in 2026. Here are some of my key contributions to the project:

• Product design lead (UX/UI, information architecture, interaction design)
• Project vision, strategy, budgeting, and funding
• Prompt engineering and model behavior shaping
• Team formation and leadership
• Small language model (SLM) product development

Design Process
Below I'll share some highlights from our design process, including discovery of key constraints and risks, design decisions we made, and the experience of designing alongside our pilot users.
Constraints & Risks
Collaboration with our subject-matter experts, literature review, and direct user observation revealed core constraints and risks that informed our design decisions. Here are a few:

Cognitive overload is a systemic healthcare issue with wide-ranging effects, including on teaching during clinical care: when working with patients, education is often deprioritized. Asking attendings to teach trainees a new topic on top of everything else was a challenge in itself.

Even the most experienced attending physicians found it challenging to articulate their thinking processes, let alone detect and teach them to trainees.

Attending physicians may teach a given resident or fellow for a single week or less. User feedback suggested that this inconsistent learning experience prevented continuity of evaluation and growth.

The time constraints and expectations of our users were also important: if it didn't provide value fast, it wouldn't be used.

The risk of exposing patient data by sending it to AI models made a secure solution non-negotiable.

Patient safety was another obvious concern: if the models suggested inappropriate care, could that result in patient harm?
A Participatory Design Process
One unique aspect of our design process was our participatory design methodology. Rather than treat our users as objects of study, we included them in the process as active participants. As I mentioned earlier, we knew our users had to learn about thinking and thinking habits to practice and teach them. To do so, we held five workshops where we both taught and received input and feedback on thinking habits and how to use AI CliC. The workshops, led by Tela Vessa and Dr. Chen, fostered the sense of shared ownership and community we aimed for, and led to a variety of insights, including changes to the language in the Thinking Habits Matrix and a more minimal, faster Clinical Coach experience.

Workshop activities like this helped us teach about thinking habits, while also getting feedback on the language and activities we were integrating into the app.

Why AI?
Our team didn't set out to create an AI product; our focus is on learning. However, AI offered capabilities that addressed several of our problems at once.

1. It reduced cognitive load by processing many prompts instantly, acting like a teaching assistant.

2. AI analysis enabled asynchronous coaching without losing detail; unlike a human in a stressful context, the model doesn't forget.

3. AI CliC provides longitudinal analysis across multiple sessions, helping users track trends in learners' thinking habits, especially when time is limited.

A user journey map I created early on that aligned our user experience with the AI workflow I designed.

Key Decisions
I want to highlight just a few design decisions we made to address the core constraints and risks I brought up earlier:

Lack of Shared Language/Frameworks: It was clear that onboarding our users to the Thinking Habits Matrix and the concept of thinking skills was a prerequisite for creating a product they could use to teach those skills. This was, in part, why we arranged workshop classes with our users.

Lack of Continuity: While our app isn't for grading or formal evaluation, tracking trainees' thinking habits over time was essential. So, we created a color-coded dashboard visible on learners' profiles to track their thinking habits at a glance.
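A minimal sketch of how such an at-a-glance view could be computed, assuming per-session habit ratings on a 1-to-5 scale; the thresholds and traffic-light colors are illustrative, not the app's actual scheme.

```python
def habit_color(avg_rating: float) -> str:
    # Map an average rating to a traffic-light color.
    # Thresholds are illustrative assumptions.
    if avg_rating >= 4.0:
        return "green"
    if avg_rating >= 2.5:
        return "yellow"
    return "red"

def dashboard(sessions: list[dict]) -> dict:
    """sessions: one {habit: rating} dict per recorded presentation."""
    colors = {}
    for habit in sessions[0]:
        avg = sum(s[habit] for s in sessions) / len(sessions)
        colors[habit] = habit_color(avg)
    return colors
```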

Time Constraints: Coaching interactions were often under a minute, so our application needed to deliver results skimmable in under 30 seconds. We minimized report text and used simple visuals, balancing model intelligence with speed. This required a flexible backend that could switch to new models.
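One common way to get that backend flexibility is to hide the model behind a small interface so implementations can be swapped without touching the rest of the app. A hedged sketch, with hypothetical class and method names:

```python
from abc import ABC, abstractmethod

class ChatModel(ABC):
    """Minimal interface the report pipeline depends on."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class FastModel(ChatModel):
    def complete(self, prompt: str) -> str:
        return "quick summary"      # stub for a low-latency model

class SmartModel(ChatModel):
    def complete(self, prompt: str) -> str:
        return "detailed analysis"  # stub for a higher-quality model

def generate_report(model: ChatModel, transcript: str) -> str:
    # The caller chooses the speed/intelligence trade-off by
    # passing in a different ChatModel implementation.
    return model.complete(f"Analyze thinking habits in: {transcript}")
```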

Patient Privacy: We collaborated with our infrastructure team to use SecureGPT, Stanford Health Care's AI platform, for secure access to consumer large language models. Our interface with SecureGPT is REDCap, a data platform used across research projects at Stanford. This gave us free data partitioning features and set us up to conduct research.

Patient Safety: Incorrect clinical advice can be deadly and costly, and can damage physicians' trust. To prioritize safety, Clinical Coach was designed not to make medical recommendations, reserving decision-making support for attending physicians. AI CliC only generates content about thinking habits.
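A guardrail like this can be sketched as a system-prompt constraint plus a lightweight scope check on outputs. The prompt wording and blocked-phrase list below are illustrative assumptions, not the production safeguard.

```python
# Confine the model to thinking habits via the system prompt,
# then reject any output that drifts into clinical recommendations.
SYSTEM_PROMPT = (
    "You are a coaching assistant. Comment ONLY on the trainee's thinking "
    "habits. Never recommend diagnoses, tests, or treatments."
)

# Illustrative deny-list; a real safeguard would be more robust.
BLOCKED_PHRASES = [
    "i recommend prescribing",
    "the diagnosis is",
    "start treatment with",
]

def is_in_scope(output: str) -> bool:
    lowered = output.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)
```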

AI Clinical Coach tracks learners' thinking habits over time.

Our Prototype Experience

Three of our core pages: the learner profile, the recording screen, and the Thinking Habits Report.

Below is a walkthrough of the prototype experience we created for our pilot users:
Product Differentiators
Designed for research: secure data partitioning and analysis tools through its integration with REDCap.

Refined prompts: Our prompts have undergone nearly 100 iterations through an analytical process that includes peer review of the outputs by our subject-matter experts.

Automated refinements are coming: Our coaching questions are rateable in the app, enabling us to refine the outputs based on user feedback in our planned small language model.

Built by subject-matter experts and field-tested: I believe our greatest asset is our community of experts. Our primary subject-matter experts, Dr. Chen and Dr. Kathleen Gutierrez, and our pilot cohort have provided invaluable input on the design and language of the application and the Thinking Habits Matrix.

Small Language Model in 2026: I am developing a small language model trained to detect thinking habits, targeting <3B parameters. A local model will address many of today's privacy concerns and enable our project to deliver greater value and improve over time.
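To show how the in-app ratings mentioned above could feed the planned model, here is a hypothetical sketch that serializes one rated coaching question as a JSONL training record; the field names and rating scale are assumptions, not our actual data schema.

```python
import json

def to_training_record(question: str, transcript_excerpt: str,
                       user_rating: int) -> str:
    """Serialize one rated coaching question as a JSONL training example."""
    return json.dumps({
        "input": transcript_excerpt,   # the case presentation context
        "output": question,            # the generated coaching question
        "rating": user_rating,         # e.g. a 1-5 in-app rating
    })
```

Records like this, accumulated from pilot users, could then be filtered by rating to build a preference dataset for fine-tuning.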

Example coaching questions generated by AI Clinical Coach.

Outcomes
Our pilot study ran from February through October 2025. We began by running five workshops on thinking habits. In that initial period, we collected baseline data on our users' thinking-skills knowledge and teaching habits, provided them with paper "Thinking Habits Matrix" cards to test in their clinical workspaces, and then delivered our AI Clinical Coach app for extended use during the summer and fall.

We followed up with our users in a few ways: structured postmortem interviews that I designed, AI Clinical Coach "diaries" our users submitted after using the app, and post-intervention evaluations of their thinking skills and teaching habits.

The feedback we received was invaluable. It supported our hypothesis that a shared language of thinking habits enables communication and education about the topic. Trainees and attendings found it fun and rewarding to explicitly work on thinking habits during clinical practice.

The biggest challenge we encountered was structural: every attending had difficulty simply remembering to use the application during their work, citing existing cognitive overload as the main factor. Our most frequent users adopted the strategy of using the application at the end of the service week or in the evenings as a personal coaching tool.
Reflections
Creating AI Clinical Coach and developing it over the last few years has been a lot of fun. It has opened my eyes to an unacknowledged problem that will only become more important in the AI era: how do we teach critical thinking?

Our 2025 work exposed many areas for improvement while also validating the basic concept of our product. In the coming year, we're focusing on:

• Making the app more useful as a personal reflection tool
• Enhancing the "meta-analysis," or longitudinal evaluation, of users' thinking habits
• Adding a "general problem-solving mode" so users can habitually apply the app to a variety of problems
• Developing gamification features
• Creating a small language model for detecting thinking habits
• Publishing a "thinking habits" online course
• Expanding our research efforts

It's going to be a busy year!
