Stream responses in real time using processStream(). It supports content chunks, thinking/reasoning, tool calls, and memory persistence.

Basic Streaming

const stream = await agent.processStream({
  message: 'Tell me a story',
  sessionId: 'session_123',
});

for await (const chunk of stream) {
  if (chunk.type === 'content') {
    process.stdout.write(chunk.data.content);
  }
}

Chunk Types

The stream can emit the following chunk types:

Type             | Description                                 | Data
-----------------|---------------------------------------------|-------------------------------------------
content          | Text response from the model                | { content: string, done: boolean }
thinking         | Reasoning/thinking content                  | { content: string, done: boolean }
internal_process | Tool call start/complete, memory load/save  | { processType, status, process, result? }
done             | Stream complete                             | { message, metadata }
error            | Error occurred                              | { error: string }
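
All five types can be handled in a single loop. A sketch, assuming the data shapes in the table above and a stream obtained from processStream() as in the example earlier:
let metadata;

for await (const chunk of stream) {
  switch (chunk.type) {
    case 'content':
      process.stdout.write(chunk.data.content);
      break;
    case 'thinking':
      console.log('[Thinking]', chunk.data.content);
      break;
    case 'internal_process':
      console.log(`[${chunk.data.processType}] ${chunk.data.status}`);
      break;
    case 'done':
      metadata = chunk.data.metadata; // final message + metadata, per the table above
      break;
    case 'error':
      console.error('Stream error:', chunk.data.error);
      break;
  }
}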

Streaming with Thinking

When thinking is enabled, reasoning content arrives as separate chunks before the final response:
const agent = new Agent({
  name: 'Analyst',
  model: anthropic('claude-sonnet-4-6'),
  modelConfig: {
    thinking: { type: 'enabled', budgetTokens: 5000 }
  }
});

const stream = await agent.processStream({ message: 'Why is the sky blue?' });

for await (const chunk of stream) {
  switch (chunk.type) {
    case 'thinking':
      console.log('[Thinking]', chunk.data.content);
      break;
    case 'content':
      process.stdout.write(chunk.data.content);
      break;
    case 'internal_process':
      if (chunk.data.status === 'started') {
        console.log(`[${chunk.data.processType}] started...`);
      }
      break;
  }
}
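
Because thinking chunks arrive before the answer, you can buffer the two separately, for example to render the reasoning as a collapsible block. A sketch over a fresh stream from the same agent:
let reasoning = '';
let answer = '';

for await (const chunk of stream) {
  // thinking chunks complete before the first content chunk arrives
  if (chunk.type === 'thinking') reasoning += chunk.data.content;
  if (chunk.type === 'content') answer += chunk.data.content;
}

console.log('--- Reasoning ---\n' + reasoning);
console.log('--- Answer ---\n' + answer);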

Streaming with Tool Calls

Tool calls are reported as internal_process chunks. The agent handles the tool loop automatically:
const agent = new Agent({
  name: 'Assistant',
  model: openai('gpt-4o'),
  tools: {
    get_weather: {
      name: 'get_weather',
      description: 'Get weather for a location',
      parameters: {
        location: { type: 'string', description: 'City name', required: true }
      },
      execute: async ({ location }) => {
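        // Demo only: returns fixed data; a real tool would look up `location`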
        return { temp: 22, condition: 'sunny' };
      }
    }
  }
});

const stream = await agent.processStream({ message: 'Weather in Tokyo?' });

for await (const chunk of stream) {
  if (chunk.type === 'content') {
    process.stdout.write(chunk.data.content);
  } else if (chunk.type === 'internal_process') {
    const proc = chunk.data;
    if (proc.processType === 'tool_call' && proc.status === 'started') {
      console.log(`\nCalling tool: ${proc.process.name}`);
    }
    if (proc.processType === 'tool_call' && proc.status === 'completed') {
      console.log(`Tool result: ${JSON.stringify(proc.result)}`);
    }
  }
}
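
Once the tool loop completes, the done chunk carries the assembled final response. A sketch, assuming the { message, metadata } shape from the table above, added to the same loop:
for await (const chunk of stream) {
  // ...handle content and internal_process chunks as shown above...
  if (chunk.type === 'done') {
    console.log('\nFinal message:', chunk.data.message);
    console.log('Metadata:', chunk.data.metadata);
  }
}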

Streaming with Memory

Memory is loaded automatically before the model is called and saved after the stream completes:
const agent = new Agent({
  name: 'Chat',
  model: openai('gpt-4o'),
  memory: { maxTurns: 20 }
});

// Memory chunks appear as internal_process
const stream = await agent.processStream({
  message: 'Continue our conversation',
  sessionId: 'session_abc',
});

for await (const chunk of stream) {
  if (chunk.type === 'internal_process' && chunk.data.processType === 'memory_load') {
    console.log('Memory loaded:', chunk.data.result?.messagesCount, 'messages');
  }
  if (chunk.type === 'content') {
    process.stdout.write(chunk.data.content);
  }
}
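
Saving is symmetric. Assuming memory_save chunks follow the same internal_process shape as memory_load, you could also confirm in the same loop when the turn has been persisted:
for await (const chunk of stream) {
  // memory_save is assumed here to mirror the memory_load shape shown above
  if (
    chunk.type === 'internal_process' &&
    chunk.data.processType === 'memory_save' &&
    chunk.data.status === 'completed'
  ) {
    console.log('Turn persisted for session_abc');
  }
}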

LLM Standalone Streaming

You can also stream directly from an LLM, without an agent:
const llm = LLM.anthropic('claude-sonnet-4-6', {
  thinking: { type: 'enabled', budgetTokens: 3000 }
});

for await (const chunk of llm.generateStream('Explain quantum computing')) {
  if (chunk.thinking) {
    console.log('[Think]', chunk.thinking);
  }
  if (chunk.text) {
    process.stdout.write(chunk.text);
  }
}
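
To collect the full response rather than print it incrementally, you can wrap the generator in a small helper. A sketch around the same llm instance and the generateStream() call above:
async function collectText(prompt: string): Promise<string> {
  let text = '';
  for await (const chunk of llm.generateStream(prompt)) {
    if (chunk.text) text += chunk.text; // accumulate answer text; thinking chunks are skipped
  }
  return text;
}

const answer = await collectText('Explain quantum computing');
console.log(answer);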

Testing in Prompt Studio

You can test streaming behavior directly in the Portal’s Prompt Studio:
  1. Open Prompts and select or create a prompt
  2. Click the config icon and enable Thinking
  3. Send a message — you’ll see the thinking content appear as a collapsible block above the response

Next Steps

  - Reasoning: Extended thinking for complex tasks
  - Memory: Conversation persistence