Process audio, images, and other media types automatically.

Audio Transcription

Transcribe audio files to text using multiple providers:
import { transcribe, Media } from '@runflow-ai/sdk';

// Standalone function (default: OpenAI Whisper)
const result = await transcribe({
  audioUrl: 'https://example.com/audio.ogg',
  language: 'pt',
});

console.log(result.text); // "Olá, como vai?"

// Using specific provider
const result2 = await transcribe({
  audioUrl: 'https://example.com/audio.ogg',
  provider: 'deepgram', // openai | deepgram | assemblyai | google (see Supported Providers for availability)
  language: 'pt',
});

// Or via Media class
const result3 = await Media.transcribe({
  audioUrl: 'https://example.com/audio.ogg',
  provider: 'openai',
});

Supported Providers

| Provider   | Status       | Description              |
|------------|--------------|--------------------------|
| openai     | ✅ Available | OpenAI Whisper (default) |
| deepgram   | 🔜 Coming    | Deepgram                 |
| assemblyai | 🔜 Coming    | AssemblyAI               |
| google     | 🔜 Coming    | Google Speech-to-Text    |
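
Since only `openai` is listed as available, callers may want to guard against requesting a provider that has not shipped yet. The following is a minimal sketch, assuming the availability shown in the table above; `PROVIDER_AVAILABLE` and `resolveProvider` are hypothetical names for illustration and are not part of `@runflow-ai/sdk`.

```typescript
// Hypothetical helper, NOT part of @runflow-ai/sdk.
type Provider = 'openai' | 'deepgram' | 'assemblyai' | 'google';

// Availability as listed in the Supported Providers table (assumption:
// this map is maintained by hand and updated as providers ship).
const PROVIDER_AVAILABLE: Record<Provider, boolean> = {
  openai: true,
  deepgram: false,
  assemblyai: false,
  google: false,
};

// Return the requested provider if it is available; otherwise fall back
// to the documented default, 'openai'.
function resolveProvider(requested?: Provider): Provider {
  if (requested !== undefined && PROVIDER_AVAILABLE[requested]) {
    return requested;
  }
  return 'openai';
}
```

The resolved value could then be passed as the `provider` option to `transcribe`, so a request for `'deepgram'` falls back to `'openai'` until the table changes.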

Agent with Auto Media Processing

Configure agents to automatically process media files:
import { Agent, openai } from '@runflow-ai/sdk';

const agent = new Agent({
  name: 'WhatsApp Assistant',
  instructions: 'You are a helpful assistant.',
  model: openai('gpt-4o'),
  
  // Auto media processing
  media: {
    transcribeAudio: true,    // Transcribe audio files automatically
    processImages: true,      // Process images as multimodal (GPT-4o Vision)
    audioProvider: 'openai',  // Transcription provider
    audioLanguage: 'pt',      // Default language for transcription
  },
});

// Audio files are automatically transcribed before processing
const result = await agent.process({
  message: '',  // Can be empty when the file contains audio
  file: {
    url: 'https://zenvia.com/storage/audio.ogg',
    contentType: 'audio/ogg',
    caption: 'Voice message',  // Optional
  },
});

// Images are automatically processed as multimodal
const result2 = await agent.process({
  message: 'What is in this image?',
  file: {
    url: 'https://example.com/image.jpg',
    contentType: 'image/jpeg',
  },
});
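
To make the behaviour of the two calls above easier to reason about, the sketch below shows one way the choice between transcription and multimodal image handling could be derived from `contentType` and the `media` flags. It is illustrative only and is not the SDK's internal logic; `IncomingFile`, `MediaFlags`, and `planMediaAction` are hypothetical names.

```typescript
// Illustrative only: NOT the SDK's actual implementation.
interface IncomingFile {
  url: string;
  contentType: string;
  caption?: string;
}

interface MediaFlags {
  transcribeAudio?: boolean;
  processImages?: boolean;
}

type MediaAction = 'transcribe' | 'vision' | 'passthrough';

function planMediaAction(file: IncomingFile, media: MediaFlags = {}): MediaAction {
  // Drop parameters such as "; codecs=opus" and normalise case.
  const mime = (file.contentType.split(';')[0] ?? '').trim().toLowerCase();

  if (mime.startsWith('audio/') && media.transcribeAudio === true) {
    return 'transcribe';
  }
  if (mime.startsWith('image/') && media.processImages === true) {
    return 'vision';
  }
  // Flag disabled or unrecognised media type: leave the file untouched.
  return 'passthrough';
}
```

Under this sketch, an `audio/ogg` file is only transcribed when `transcribeAudio` is enabled, which matches the documented default of `false` for both flags.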

Media Config Options

| Option          | Type    | Description                                |
|-----------------|---------|--------------------------------------------|
| transcribeAudio | boolean | Auto-transcribe audio (default: false)     |
| processImages   | boolean | Auto-process images (default: false)       |
| audioLanguage   | string  | Language code (pt, en, es, etc.)           |
| audioProvider   | string  | openai \| deepgram \| assemblyai \| google |
| audioModel      | string  | Provider-specific model                    |
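
For projects that build the `media` object dynamically, a typed shape can catch misspelled option names at compile time. The following is a sketch under the assumption that the options table above is complete; `MediaConfig`, `AudioProvider`, and `withMediaDefaults` are hypothetical names, and the SDK may export its own types.

```typescript
// Hypothetical types mirroring the options table; not taken from the SDK.
type AudioProvider = 'openai' | 'deepgram' | 'assemblyai' | 'google';

interface MediaConfig {
  transcribeAudio?: boolean;
  processImages?: boolean;
  audioLanguage?: string;
  audioProvider?: AudioProvider;
  audioModel?: string;
}

// Apply only the defaults the table documents (both flags default to false).
// Other options are passed through unchanged, including when they are unset.
function withMediaDefaults(config: MediaConfig = {}): MediaConfig {
  return {
    ...config,
    transcribeAudio: config.transcribeAudio ?? false,
    processImages: config.processImages ?? false,
  };
}
```

For example, `withMediaDefaults({ audioLanguage: 'pt' })` keeps the language and leaves both processing flags disabled.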

Next Steps