This project demonstrates how to build a real-time AI application using the Azure OpenAI Realtime API. The demo app features a language coach and a medical form assistant. The language coach allows users to practice speaking a language and get instant feedback on their pronunciation, while the medical form assistant helps users fill out medical forms by conversing with them using their voice.
Language Coach
Medical Form Assistant
- Clone the project.
- Create a `gpt-realtime` model deployment in Azure AI Foundry.
- Rename `.env.example` to `.env` in the root of the project.
- Add your `gpt-realtime` endpoint to `OPENAI_ENDPOINT` and your key to `OPENAI_API_KEY`. You can get those values from Azure AI Foundry.
```
OPENAI_API_KEY=
OPENAI_MODEL=gpt-realtime
OPENAI_ENDPOINT=
OPENAI_API_VERSION=2025-04-01-preview
BACKEND=azure
```
Note: If you'd like to use OpenAI instead of Azure OpenAI, add your OpenAI API key to `OPENAI_API_KEY`, leave `OPENAI_ENDPOINT` blank, and remove the value for `BACKEND`.
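As a rough sketch of how these settings fit together, the server can derive the realtime WebSocket URL from `BACKEND`, `OPENAI_ENDPOINT`, `OPENAI_MODEL`, and `OPENAI_API_VERSION`. The URL shapes below follow the public Azure OpenAI and OpenAI Realtime endpoints, but `realtimeUrl` is a hypothetical helper for illustration, not the app's actual code:

```javascript
// Sketch: build the realtime WebSocket URL from the .env values above.
// The URL formats are assumptions based on the public Realtime API docs;
// check the server code in this repo for the exact form it uses.
function realtimeUrl({ backend, endpoint, model, apiVersion }) {
  if (backend === 'azure') {
    // Azure OpenAI routes by resource endpoint, deployment name, and API version.
    const host = endpoint.replace(/^https?:\/\//, '').replace(/\/$/, '');
    return `wss://${host}/openai/realtime?api-version=${apiVersion}&deployment=${model}`;
  }
  // OpenAI uses a single well-known host and selects by model name.
  return `wss://api.openai.com/v1/realtime?model=${model}`;
}

console.log(realtimeUrl({
  backend: 'azure',
  endpoint: 'https://my-resource.openai.azure.com',
  model: 'gpt-realtime',
  apiVersion: '2025-04-01-preview',
}));
```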
- Run `npm install` in the `client` and `server` directories.
- Run `npm run dev` in the `server` directory.
- Run `npm start` in the `client` directory.
- Click the `Connect` button in the browser to get started, allow access to your microphone, and start speaking.
- Click the `Disconnect` button to stop the session.
If you'd like to use the more secure "keyless" approach with Azure OpenAI, run the following command to add the OpenAI Contributor role to your user principal. Install the Azure CLI if you don't have it on your machine already.
```bash
az role assignment create \
    --role "Cognitive Services OpenAI Contributor" \
    --assignee-object-id "<USER_PRINCIPAL_ID>" \
    --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>" \
    --assignee-principal-type User
```
Replace the placeholders in the command above with your subscription ID, resource group, and user principal ID (`assignee-object-id`).
- Run `az login` and select your target subscription.
- Get your subscription ID by running `az account list --query "[?isDefault].id" -o tsv`.
- Find your user principal ID by running `az ad signed-in-user show --query id -o tsv` or `az rest --method GET --url "https://graph.microsoft.com/v1.0/me" --query "id"`.
You can then remove the `OPENAI_API_KEY` value from your `.env` file.
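For reference, here's a minimal sketch of what keyless auth can look like on the server: a Microsoft Entra token for the Cognitive Services scope replaces the API key in the `Authorization` header. `realtimeAuthHeaders` is a hypothetical helper (this demo's server may do it differently), and a real run would pass `new DefaultAzureCredential()` from `@azure/identity` instead of the stub shown here:

```javascript
// Sketch: keyless auth for Azure OpenAI. With the role assignment above,
// a token for the Cognitive Services scope replaces OPENAI_API_KEY.
const COGNITIVE_SERVICES_SCOPE = 'https://cognitiveservices.azure.com/.default';

// credential is any object with an async getToken(scope) method,
// e.g. new DefaultAzureCredential() from @azure/identity.
async function realtimeAuthHeaders(credential) {
  const { token } = await credential.getToken(COGNITIVE_SERVICES_SCOPE);
  return { Authorization: `Bearer ${token}` };
}

// Demo with a stub credential so the sketch runs without Azure access:
const stubCredential = { getToken: async () => ({ token: 'example-token' }) };
realtimeAuthHeaders(stubCredential).then((headers) => console.log(headers.Authorization));
```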
The following diagram illustrates the WebSocket communication flow in the `RTSession` class, showing how client messages are processed and relayed to the OpenAI Realtime API.
- Client: This is you—the user interacting with the app via your browser. It sends audio or text inputs (like saying “Hello” or typing a question) to kick things off. It’s written using Angular.
- RealTime Session: The Node.js code where the main action takes place – it manages the flow. It uses a client WebSocket to receive your inputs and send back responses, while a RealTime AI WebSocket connects to the OpenAI API. The logic block processes messages, ensuring everything runs smoothly between the client and the AI.
- OpenAI RealTime API: This is the brains of the operation. It receives audio/text from the Realtime Session, processes it with the `gpt-realtime` model, and sends back audio/text responses. The app supports calling either OpenAI or Azure OpenAI.
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 75}}}%%
graph TD
    A[Client] -->|Sends audio/text| B[Client WebSocket]
    subgraph Realtime_Session["Realtime Session"]
        B[Client WebSocket]
        C[Logic]
        D[Realtime AI WebSocket]
    end
    E[OpenAI Realtime API]
    B <-->|Receives responses| C
    C <-->|Processes messages| D
    D <-->|Sends audio/text to Azure OpenAI| E
    E -->|Sends audio/text responses| D
    classDef sessionLabel font-size:20px;
    class Realtime_Session sessionLabel
    style A fill:#f9f,stroke:#333,stroke-width:2px,font-size:20px;
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#dfd,stroke:#333,stroke-width:2px
    style D fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#f9f,stroke:#333,stroke-width:2px
```
Thanks to Steve Sanderson for the initial inspiration for this demo.
If you get stuck or have any questions about building AI apps, join:
If you have product feedback or errors while building visit: