The LLM module lets you call language models directly — no agents, no memory, no tools. Use it when you need a single LLM call for tasks like classification, data extraction, translation, or content generation.

When to Use LLM vs Agent

Scenario | Use
Conversation with memory and tools | Agent
Single classification or categorization | LLM Standalone
Extract structured data from text | LLM Standalone
Generate content (emails, summaries) | LLM Standalone
Translate text | LLM Standalone
Pre-process input before an agent | LLM Standalone

Basic Usage

import { LLM } from '@runflow-ai/sdk';

const llm = LLM.openai('gpt-4o', {
  temperature: 0.7,
  maxTokens: 2000,
});

const response = await llm.generate('What is the capital of Brazil?');
console.log(response.text);
console.log('Tokens:', response.usage);

With System Prompt

Use a system prompt to control the LLM’s behavior:
const response = await llm.generate(
  'The product arrived broken and I want my money back.',
  {
    system: `Classify the customer message into exactly one category:
- REFUND_REQUEST
- TECHNICAL_ISSUE
- GENERAL_QUESTION
- COMPLAINT
- PRAISE

Respond with ONLY the category name, nothing else.`,
    temperature: 0,
  }
);

console.log(response.text); // "REFUND_REQUEST"
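Even at temperature 0, a model can occasionally surround the label with whitespace, punctuation, or different casing, so it is safer to normalize the response before branching on it. A defensive sketch (the category list mirrors the prompt above; the `normalizeCategory` helper and its fallback choice are illustrative, not part of the SDK):

```typescript
const CATEGORIES = [
  'REFUND_REQUEST',
  'TECHNICAL_ISSUE',
  'GENERAL_QUESTION',
  'COMPLAINT',
  'PRAISE',
] as const;
type Category = (typeof CATEGORIES)[number];

// Strip whitespace/punctuation and casing drift, then match against the known set.
function normalizeCategory(raw: string): Category {
  const cleaned = raw.trim().toUpperCase().replace(/[^A-Z_]/g, '');
  return (CATEGORIES as readonly string[]).includes(cleaned)
    ? (cleaned as Category)
    : 'GENERAL_QUESTION'; // safe fallback when the model drifts off-format
}
```

Route on `normalizeCategory(response.text)` instead of the raw string so a stray newline never breaks a `switch`.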

With Messages

For multi-turn prompts or few-shot examples:
const response = await llm.generate([
  {
    role: 'system',
    content: `Extract structured data from customer messages. Return valid JSON only.`,
  },
  {
    role: 'user',
    content: 'My name is João, email joao@test.com, I need help with order ORD-789',
  },
]);

const data = JSON.parse(response.text);
// { name: "João", email: "joao@test.com", orderId: "ORD-789" }
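`JSON.parse` throws on malformed output, and models sometimes wrap JSON in markdown code fences, so production code should guard the parse. A minimal sketch (the `parseCustomer` helper and its field names are illustrative, not part of the SDK):

```typescript
interface ExtractedCustomer {
  name: string;
  email: string;
  orderId: string;
}

// Parse model output defensively: strip optional code fences, catch bad JSON,
// and reject objects that are missing the expected string fields.
function parseCustomer(raw: string): ExtractedCustomer | null {
  const cleaned = raw.replace(/^```(?:json)?\s*|\s*```$/g, '').trim();
  try {
    const data = JSON.parse(cleaned);
    if (
      typeof data?.name === 'string' &&
      typeof data?.email === 'string' &&
      typeof data?.orderId === 'string'
    ) {
      return data;
    }
  } catch {
    // fall through: malformed JSON
  }
  return null;
}
```

A `null` return lets the caller retry the generation or fall back, instead of crashing on an unexpected response.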

Streaming

For real-time output (long responses, content generation):
const stream = llm.generateStream('Write a product description for a wireless headphone');

for await (const chunk of stream) {
  if (!chunk.done) {
    process.stdout.write(chunk.text);
  }
}
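If you also need the complete text once streaming finishes (for logging or storage), accumulate the chunks as they arrive. A small sketch, assuming the `{ text, done }` chunk shape used in the loop above (`collectStream` is a hypothetical helper, not an SDK API):

```typescript
// Write each chunk to stdout as it arrives, and return the full text at the end.
async function collectStream(
  stream: AsyncIterable<{ text: string; done: boolean }>
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    if (!chunk.done) {
      process.stdout.write(chunk.text);
      full += chunk.text;
    }
  }
  return full;
}
```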

Available Models

import { LLM } from '@runflow-ai/sdk';

// OpenAI
const gpt4 = LLM.openai('gpt-4o', { temperature: 0.7 });
const gpt4mini = LLM.openai('gpt-4o-mini', { temperature: 0.3 });

// Anthropic (Claude)
const claude = LLM.anthropic('claude-sonnet-4-20250514', {
  temperature: 0.9,
  maxTokens: 4000,
});

// AWS Bedrock
const bedrockClaude = LLM.bedrock('anthropic.claude-3-5-sonnet-20241022-v2:0', {
  temperature: 0.8,
});

// Groq (ultra-fast inference)
const fast = LLM.groq('llama-3.3-70b-versatile', { temperature: 0.3 });

// Google Gemini
const flash = LLM.gemini('gemini-2.5-flash', { temperature: 0.5 });

// Custom (OpenAI-compatible: Ollama, vLLM, LiteLLM, etc.)
const local = LLM.custom('llama3', 'Ollama Local', { temperature: 0.7 });

See LLM Providers for all supported providers and configuration options.

Real-World Example: Intent Classifier Tool

A common pattern is using LLM Standalone inside a tool to classify intent before the agent decides what to do:
tools/classify-intent.ts
import { createTool, LLM } from '@runflow-ai/sdk';
import { z } from 'zod';

const classifier = LLM.openai('gpt-4o-mini', { temperature: 0 });

export const classifyIntentTool = createTool({
  id: 'classify-intent',
  description: 'Classify customer message intent',
  inputSchema: z.object({
    message: z.string().describe('The customer message to classify'),
  }),
  execute: async ({ context }) => {
    try {
      const response = await classifier.generate(context.message, {
        system: `Classify the message into one category:
- ORDER_STATUS: asking about an order, delivery, or tracking
- REFUND: requesting money back or return
- TECHNICAL: product issue or bug report
- BILLING: payment, invoice, or charge question
- GENERAL: anything else

Respond with JSON: { "intent": "CATEGORY", "confidence": 0.0-1.0 }`,
      });

      return JSON.parse(response.text);
    } catch (error) {
      return { intent: 'GENERAL', confidence: 0 };
    }
  },
});

Real-World Example: Pre-Processing in main.ts

Use LLM Standalone to pre-process or enrich input before passing it to your agent:
main.ts
import { LLM } from '@runflow-ai/sdk';
import { identify, track } from '@runflow-ai/sdk/observability';
import { supportAgent } from './agent';

const classifier = LLM.openai('gpt-4o-mini', { temperature: 0 });

async function detectLanguage(text: string): Promise<string> {
  const response = await classifier.generate(text, {
    system: 'Detect the language of this text. Respond with only the ISO 639-1 code (e.g., "pt", "en", "es").',
  });
  return response.text.trim().toLowerCase();
}

export async function main(input: any) {
  if (!input?.message) {
    return { error: 'message is required' };
  }

  identify(input.email || input.phone || 'anonymous');

  // Pre-process: detect language
  const language = await detectLanguage(input.message);

  const result = await supportAgent.process({
    message: input.message,
    sessionId: input.sessionId,
  });

  track('message_processed', { language });

  return { message: result.message, language };
}
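`detectLanguage` trusts the model to return a bare ISO 639-1 code, but a model can occasionally reply with a full sentence like "The language is pt". A defensive sketch (the `sanitizeLanguageCode` helper and its `'en'` fallback are assumptions, not part of the SDK):

```typescript
// Accept only a bare two-letter ISO 639-1 code; anything else gets the fallback.
function sanitizeLanguageCode(raw: string, fallback = 'en'): string {
  const code = raw.trim().toLowerCase();
  return /^[a-z]{2}$/.test(code) ? code : fallback;
}
```

Passing `detectLanguage`'s result through this check keeps junk values out of your `track` events.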

Real-World Example: Content Generation

Generate structured content without needing an agent:
tools/generate-email.ts
import { createTool, LLM } from '@runflow-ai/sdk';
import { z } from 'zod';

const writer = LLM.openai('gpt-4o', { temperature: 0.7 });

export const generateEmailTool = createTool({
  id: 'generate-email',
  description: 'Generate a professional email based on context',
  inputSchema: z.object({
    to: z.string().describe('Recipient name'),
    subject: z.string().describe('Email subject'),
    context: z.string().describe('What the email should communicate'),
    tone: z.enum(['formal', 'friendly', 'urgent']).describe('Email tone'),
  }),
  execute: async ({ context }) => {
    try {
      const response = await writer.generate(
        `Write an email to ${context.to} about: ${context.context}`,
        {
          system: `You are a professional email writer.
Tone: ${context.tone}
Subject: ${context.subject}

Write the email body only (no subject line, no "From/To" headers).
Keep it concise — 2-3 paragraphs max.`,
        }
      );

      return {
        success: true,
        subject: context.subject,
        body: response.text,
        tokensUsed: response.usage?.totalTokens,
      };
    } catch (error) {
      return {
        success: false,
        error: error instanceof Error ? error.message : 'Failed to generate email',
      };
    }
  },
});

Next Steps

Agents

When you need memory and tools

Tools

Use LLM inside tools

Media Processing

Process audio and images

Best Practices

Tips for effective agents