SAN FRANCISCO: OpenAI on Thursday added three voice models to its application programming interface, expanding its tools for developers building software that can handle spoken interactions in real time. The release includes GPT-Realtime-2 for live voice conversations, GPT-Realtime-Translate for speech translation and GPT-Realtime-Whisper for streaming transcription. The company said the models are designed to let applications listen, respond and complete tasks during a conversation, moving beyond basic speech recognition or text generation alone.

GPT-Realtime-2 is the centerpiece of the update and is positioned as OpenAI’s first voice model with GPT-5-class reasoning. According to the company, it can manage more complex requests, keep context across longer sessions, recover from interruptions and use multiple tools while a conversation continues. OpenAI also said the model’s context window has been expanded to 128,000 tokens from 32,000, giving developers more room to support extended interactions and more detailed task flows inside voice-based products.
The other two models focus on translation and transcription. GPT-Realtime-Translate is built to translate speech from more than 70 input languages into 13 output languages while keeping pace with the speaker, a feature aimed at customer support, education, events and other multilingual settings. GPT-Realtime-Whisper is a low-latency speech-to-text model that transcribes spoken audio as it happens, allowing developers to build live captions, meeting notes and other workflow tools that depend on continuous transcription.
OpenAI expands developer voice tools
OpenAI said companies already testing the models include Zillow, Priceline and Deutsche Telekom. In examples provided with the launch, Zillow is using the technology in a housing assistant that can respond to detailed spoken requests, while Deutsche Telekom is testing multilingual customer support experiences. Priceline was cited as a company working on voice-based travel planning tools that can help users search, modify bookings and receive travel updates through spoken interaction rather than typed prompts.
The models are available through OpenAI’s Realtime API, and the company said developers can test them in its playground. Pricing starts at $32 per 1 million audio input tokens for GPT-Realtime-2, with output audio priced separately at $64 per 1 million tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper is priced at $0.017 per minute. The announcement places the products directly inside OpenAI’s existing developer platform rather than as standalone consumer features.
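As a rough illustration of the pricing quoted above, the arithmetic can be sketched as follows. This is a hypothetical back-of-the-envelope estimate, not an official OpenAI calculator; the token and minute counts in the example are illustrative assumptions.

```python
# Rates as quoted in the announcement (USD).
GPT_REALTIME_2_INPUT_PER_M = 32.00   # per 1 million audio input tokens
GPT_REALTIME_2_OUTPUT_PER_M = 64.00  # per 1 million audio output tokens
TRANSLATE_PER_MIN = 0.034            # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017              # GPT-Realtime-Whisper, per minute

def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of a GPT-Realtime-2 session from its token counts."""
    return (input_tokens / 1_000_000 * GPT_REALTIME_2_INPUT_PER_M
            + output_tokens / 1_000_000 * GPT_REALTIME_2_OUTPUT_PER_M)

def per_minute_cost(minutes: float, rate: float) -> float:
    """Estimated cost for the per-minute models (translation, transcription)."""
    return minutes * rate

# A hypothetical session consuming 500k input and 250k output audio tokens:
print(round(realtime2_cost(500_000, 250_000), 2))   # 32.0
# One hour of streaming transcription with GPT-Realtime-Whisper:
print(round(per_minute_cost(60, WHISPER_PER_MIN), 3))  # 1.02
```

Output tokens cost twice as much as input tokens under the quoted rates, so spoken responses dominate the bill for chatty sessions, while the translation and transcription models bill by wall-clock minute regardless of token volume.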
Safety measures outlined
Alongside the release, the company detailed safeguards tied to the Realtime API. OpenAI said it uses active classifiers on realtime sessions and can halt certain conversations if they are found to violate harmful-content rules. Developers can also add their own controls through the company’s software tools. OpenAI said its usage policies prohibit using outputs for spam, deception or other harmful purposes, and it requires developers to make clear when end users are interacting with artificial intelligence unless that is already obvious from context.
The launch builds on OpenAI’s broader expansion of audio and realtime tools over the past year, including earlier updates to its Realtime API and speech models. This release consolidates those capabilities into a single package focused on live voice interaction, pairing reasoning with translation and transcription for developers. With the latest update, OpenAI is widening the set of voice functions available through its API for customer service, travel, enterprise workflows and multilingual communications. – By Content Syndication Services.
