Transcribe audio files to text using multiple providers:
import { transcribe, Media } from '@runflow-ai/sdk';

// Standalone function (default: OpenAI Whisper)
const result = await transcribe({
  audioUrl: 'https://example.com/audio.ogg',
  language: 'pt',
});
console.log(result.text); // "Olá, como vai?"

// Using specific provider
const result2 = await transcribe({
  audioUrl: 'https://example.com/audio.ogg',
  provider: 'deepgram',
  language: 'pt',
});

// Or via Media class
const result3 = await Media.transcribe({
  audioUrl: 'https://example.com/audio.ogg',
  provider: 'openai',
});
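Because the provider is selected per call, one possible pattern is to try providers in order and keep the first result that succeeds. The sketch below is not part of the SDK: `transcribeWithFallback` and the `TranscribeOptions`, `TranscribeResult`, and `TranscribeFn` types are hypothetical, and it assumes the transcription function rejects when a provider fails, which the text above does not confirm.

```typescript
// Hypothetical shapes for illustration; the real SDK types may differ.
type TranscribeOptions = { audioUrl: string; provider?: string; language?: string };
type TranscribeResult = { text: string };
type TranscribeFn = (options: TranscribeOptions) => Promise<TranscribeResult>;

// Try each provider in order and return the first successful transcription,
// tagged with the provider that produced it.
// Assumption: transcribeFn rejects (throws) when a provider fails.
async function transcribeWithFallback(
  transcribeFn: TranscribeFn,
  options: Omit<TranscribeOptions, 'provider'>,
  providers: string[],
): Promise<TranscribeResult & { provider: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const result = await transcribeFn({ ...options, provider });
      return { ...result, provider };
    } catch (err) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

If the SDK's `transcribe` accepts these options, it could be passed as the first argument, for example `transcribeWithFallback(transcribe, { audioUrl, language: 'pt' }, ['openai', 'deepgram'])`.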
Configure agents to automatically handle audio and image files. When a user sends a voice message, it’s transcribed before processing. When they send an image, it’s analyzed with vision capabilities.
import { Agent, openai } from '@runflow-ai/sdk';

const agent = new Agent({
  name: 'WhatsApp Assistant',
  instructions: 'You are a helpful assistant.',
  model: openai('gpt-4o'),
  media: {
    transcribeAudio: true,
    processImages: true,
    audioProvider: 'openai',
    audioLanguage: 'pt',
  },
});

// Audio files are automatically transcribed before processing
const result = await agent.process({
  message: '',
  file: {
    url: 'https://zenvia.com/storage/audio.ogg',
    contentType: 'audio/ogg',
    caption: 'Voice message',
  },
});

// Images are automatically processed as multimodal
const result2 = await agent.process({
  message: 'What is in this image?',
  file: {
    url: 'https://example.com/image.jpg',
    contentType: 'image/jpeg',
  },
});
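The examples above distinguish audio from images by the file's `contentType`. How the SDK routes files internally is not shown here, but an application may want a similar check before calling `agent.process`, for instance to reject unsupported attachments early. A minimal sketch, where `classifyMedia` and `MediaKind` are hypothetical names that are not part of the SDK:

```typescript
// Hypothetical helper, not part of the SDK.
type MediaKind = 'audio' | 'image' | 'other';

// Classify a MIME type by its top-level type, ignoring parameters
// such as "; codecs=opus" and differences in letter case.
function classifyMedia(contentType: string): MediaKind {
  const base = (contentType.split(';')[0] ?? '').trim().toLowerCase();
  if (base.startsWith('audio/')) return 'audio';
  if (base.startsWith('image/')) return 'image';
  return 'other';
}
```

Stripping parameters matters in practice because voice notes often arrive with a content type such as `audio/ogg; codecs=opus` rather than a bare `audio/ogg`.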
A complete WhatsApp agent that handles text, voice messages, and photos. Users can send a voice message to explain their issue or a photo of a damaged product.
import { Agent, openai } from '@runflow-ai/sdk';

// Note: createTicketTool is assumed to be defined or imported elsewhere in your project.

export const whatsappAgent = new Agent({
  name: 'WhatsApp Support',
  instructions: `You are a customer support agent for WhatsApp.

## Behavior
- Respond in the customer's language
- Be concise: WhatsApp messages should be short
- When the customer sends a voice message, you'll receive the transcription. Respond naturally
- When the customer sends a photo, analyze it and respond accordingly

## Tools
- Use create-ticket when the issue needs human follow-up
- If a customer sends a photo of a damaged product, create a ticket with priority 'high'`,
  model: openai('gpt-4o'),
  memory: { maxTurns: 30 },
  media: {
    transcribeAudio: true,
    processImages: true,
    audioProvider: 'openai',
    audioLanguage: 'pt',
  },
  tools: {
    createTicket: createTicketTool,
  },
  observability: 'full',
});
For WhatsApp agents, always enable both transcribeAudio and processImages. Users frequently send voice messages instead of typing, especially on mobile.
Audio transcription adds latency to your agent’s response (typically 1-3 seconds depending on audio length). Consider tracking transcription time with track() to monitor performance.
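One way to measure that latency is to wrap the transcription call in a small timing helper and forward the measurement to your metrics call. The sketch below deliberately does not call `track()`, since its signature is not shown here. `timed` and `TimingEvent` are hypothetical names, and the `record` callback is where a call to `track()` (or any other metrics function) would go.

```typescript
// Hypothetical event shape; adapt it to whatever your metrics call expects.
type TimingEvent = { label: string; durationMs: number; ok: boolean };

// Run an async operation, then report how long it took and whether it succeeded.
// The clock is injectable so the helper can be tested deterministically.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
  record: (event: TimingEvent) => void,
  now: () => number = Date.now,
): Promise<T> {
  const start = now();
  let ok = false;
  try {
    const value = await fn();
    ok = true;
    return value;
  } finally {
    // Runs on both success and failure, so failed transcriptions are measured too.
    record({ label, durationMs: now() - start, ok });
  }
}
```

A call might look like `timed('transcription', () => transcribe({ audioUrl }), (event) => console.log(event))`, with the `console.log` replaced by your own tracking call.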