Real-Time Voice AI for Medical Oral Boards: Sub-Second Clinical Guidance
Built a voice-activated AI clinical assistant that transcribed oral board exam conversations in real time, continuously generated GPT-4o clinical reasoning, and delivered cached responses through an earpiece with sub-second latency.

The Client
RoundSmarter is a healthcare AI company founded by a practicing physician who manages 6 nursing homes with 1,500 patient encounters per month. The founder also serves as Chief Innovation Officer at Madrina, the largest physician staffing company in the US for post-acute care, with 1,000 doctors and 6 million patient encounters per year. His core product, RoundSmarter, is an AI documentation tool for nursing home physicians, with trials launching in early 2025.
The founder had an ambitious proof-of-concept in mind: demonstrate that AI could provide real-time clinical reasoning assistance during medical oral board exams. The demo targets were program directors at Yale and the University of Miami’s Physical Medicine & Rehabilitation residency programs, people who are difficult to schedule and who grant exactly one shot. The demo had to be flawless.
The Challenge
Medical oral board exams are high-pressure, structured conversations. A case presenter describes a patient scenario, and the physician must synthesize clinical information, identify concerns, ask targeted questions, and propose management plans—all in real time. The challenge was building an AI system that could participate in this flow invisibly: listening to the conversation, reasoning about the clinical content continuously, and delivering guidance through an earpiece only when the physician naturally asks for it.
The technical constraints were severe. The system had to run completely hands-free for 3+ hours of back-to-back cases with no UI, no buttons, no manual interaction—just voice triggers embedded in natural speech. Triggers had to release pre-computed responses instantly, not initiate computation, because any perceptible latency would break the demo’s credibility. The system needed to handle multiple cases cleanly with no context bleed between patients, survive network hiccups silently, and run on a dedicated Mac Mini with a professional microphone and Bluetooth earpiece.
Oral board etiquette added another constraint: silence signals attentiveness, and over-talking is penalized. The AI could never speak without being explicitly triggered, and responses had to be concise enough to absorb and relay naturally.
Our Solution
We built a Python asyncio application orchestrating three concurrent processing loops behind a 5-state deterministic state machine. The ASR loop streams audio continuously to Deepgram’s nova-2-medical model for real-time transcription optimized for medical terminology. The compute loop runs every 4 seconds during active case intake, sending the current transcript to GPT-4o with two specialized clinical prompts—one for analysis (synthesis, concerns, diagnostics, targeted questions) and one for action plans (problem list, safety, rehabilitation, disposition). Both responses are cached and continuously updated as more of the conversation is transcribed.
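A minimal sketch of what that compute loop can look like in asyncio, assuming hypothetical `state`, `transcript_buffer`, and `cache` objects owned by the main runtime; the prompt texts and helper names are illustrative, not the production prompts:

```python
# Sketch only: periodic refresh of both cached clinical responses.
# `state`, `transcript_buffer`, and `cache` are assumed shared objects.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

ANALYSIS_PROMPT = (
    "You assist during an oral board exam. From the transcript, give a concise "
    "synthesis, key concerns, next diagnostics, and targeted questions."
)
PLAN_PROMPT = (
    "You assist during an oral board exam. From the transcript, give a concise "
    "problem list, safety issues, rehabilitation plan, and disposition."
)

async def ask_gpt(system_prompt: str, transcript: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content

async def compute_loop(state, transcript_buffer, cache, interval: float = 4.0):
    """Every `interval` seconds during case intake, refresh both cached responses."""
    while True:
        await asyncio.sleep(interval)
        if state.current != "CASE_INTAKE":  # only recompute while a case is being presented
            continue
        text = transcript_buffer.render()   # recent verbatim speech plus summarized history
        try:
            analysis, plan = await asyncio.gather(
                ask_gpt(ANALYSIS_PROMPT, text),
                ask_gpt(PLAN_PROMPT, text),
            )
            cache["analysis"], cache["plan"] = analysis, plan
        except Exception:
            pass  # keep the stale cache; never surface an error mid-demo
```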
Three trigger phrases control the system through fuzzy substring matching to accommodate natural speech variation: “new case” resets context and begins fresh intake, “let me think” releases the cached clinical analysis through OpenAI TTS to the earpiece, and “what I would do is” releases the cached management plan. The key architectural insight is that triggers release pre-computed results rather than initiating computation—eliminating perceived latency entirely.
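A minimal sketch of how that fuzzy matching might be done over the latest transcript tail, assuming the three phrases map to hypothetical state-machine actions; the similarity threshold is illustrative:

```python
# Sketch only: fuzzy substring matching of trigger phrases against recent ASR text.
from difflib import SequenceMatcher

TRIGGERS = {
    "new case": "RESET_CONTEXT",
    "let me think": "SPEAK_ANALYSIS",
    "what i would do is": "SPEAK_PLAN",
}

def detect_trigger(recent_text: str, threshold: float = 0.85) -> str | None:
    """Slide each trigger phrase across the normalized tail of the transcript
    and return its action if any window is similar enough."""
    text = " ".join(recent_text.lower().split())[-200:]  # only the recent tail matters
    for phrase, action in TRIGGERS.items():
        w = len(phrase)
        for start in range(max(1, len(text) - w + 1)):
            chunk = text[start:start + w]
            if SequenceMatcher(None, phrase, chunk).ratio() >= threshold:
                return action
    return None
```

The threshold trades off tolerance for ASR slips (for example, “let me thing” still scores above 0.85 against “let me think”) against false triggers from ordinary speech.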
We implemented a rolling transcript buffer with bounded memory (50K characters) using a recent-verbatim/older-summarized strategy to support 3+ hour sessions without degradation. Error handling follows a silent resilience philosophy: Deepgram disconnects trigger automatic reconnection with exponential backoff, OpenAI failures fall back to stale cache, and TTS failures result in silence rather than error messages—because interrupting the demo with an error is worse than a missed response.
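A minimal sketch of that bounded buffer, keeping the 50K-character ceiling from above; the split point between verbatim and summarized text and the `summarize` helper (for example, a short GPT-4o call) are illustrative assumptions:

```python
# Sketch only: recent speech stays verbatim, older speech is folded into a summary.
MAX_CHARS = 50_000       # hard ceiling on held transcript text
VERBATIM_CHARS = 20_000  # assumed size of the word-for-word tail

class RollingTranscript:
    def __init__(self) -> None:
        self.summary = ""   # compressed form of older conversation
        self.recent = ""    # verbatim tail of the conversation

    def append(self, text: str) -> None:
        self.recent += " " + text

    async def compact(self, summarize) -> None:
        """When the verbatim tail grows too large, fold its older half into the
        running summary so total memory stays under MAX_CHARS."""
        if len(self.recent) <= VERBATIM_CHARS:
            return
        cut = len(self.recent) // 2
        older, self.recent = self.recent[:cut], self.recent[cut:]
        self.summary = await summarize(self.summary + "\n" + older)
        self.summary = self.summary[-(MAX_CHARS - VERBATIM_CHARS):]  # enforce the ceiling

    def render(self) -> str:
        """The text the compute loop sends to GPT-4o."""
        return (
            f"Earlier in the session (summarized): {self.summary}\n\n"
            f"Recent conversation (verbatim): {self.recent}"
        )
```

In this sketch, the “new case” trigger would simply swap in a fresh RollingTranscript, which is one way to guarantee no context bleed between patients.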
The project was delivered as a $3,500 fixed-price engagement across 4 milestones, each with clear acceptance criteria: core runtime and state machine, audio capture and ASR pipeline, background processing and speak-on-cue, and final integration with stability hardening.
The Impact
The system achieved its primary goal: a flawless, hands-free clinical AI demo capable of running unattended for 3+ hours across multiple back-to-back cases. Trigger-to-speech latency was sub-second thanks to the pre-computation architecture—the physician says “let me think” and hears clinical guidance immediately, creating a seamless conversational flow indistinguishable from natural deliberation.
The proof-of-concept validated a broader thesis for the client: that real-time AI clinical reasoning is technically feasible and practically useful in medical education contexts. This demonstration opens the door to integration with RoundSmarter’s clinical data lake for RAG-enriched guidance and potential commercialization. The academic partnerships with Yale and University of Miami residency programs position the technology at the intersection of medical education and AI innovation—a space where the client’s unique combination of clinical practice, AI expertise, and institutional relationships creates a significant competitive advantage.

