RPA / Browser Automation

RPA (Robotic Process Automation) lets your agents navigate websites, fill forms, click buttons, extract data, and download files using a real browser. It is built on top of Playwright.

Installation

Playwright is an optional peer dependency. Install it alongside the browser binaries:

npm install playwright
npx playwright install chromium

Or use the CLI shortcut:

rf rpa install

Check your setup:

rf rpa status

Quick Start

The simplest way to use RPA is with createBrowserTool, which manages the browser lifecycle automatically.

import { createBrowserTool } from '@runflow-ai/sdk/rpa';
import { z } from 'zod';

const scrapeTool = createBrowserTool({
  id: 'scrape-products',
  description: 'Scrape product listings from a website',
  inputSchema: z.object({
    url: z.string().url(),
  }),
  browser: {
    headless: true,
    screenshotsDir: './screenshots',
  },
  execute: async ({ context, browser }) => {
    const page = browser.page;
    await page.goto(context.url);
    
    const products = await page.$$eval('.product', (els) =>
      els.map((el) => ({
        name: el.querySelector('.name')?.textContent?.trim(),
        price: el.querySelector('.price')?.textContent?.trim(),
      }))
    );

    await browser.screenshot('products-page');
    return { products };
  },
});

Then add it to your agent:

import { Agent } from '@runflow-ai/sdk';
import { openai } from '@runflow-ai/sdk/models';

const agent = new Agent({
  name: 'scraper',
  instructions: 'You scrape product data from websites.',
  model: openai('gpt-4o'),
  tools: { scrapeTool },
});

createBrowserTool

Factory function that wraps your browser logic into a Runflow tool with automatic lifecycle management.

import { createBrowserTool } from '@runflow-ai/sdk/rpa';
import { z } from 'zod';

const myTool = createBrowserTool({
  id: 'tool-id',
  description: 'What this tool does (shown to LLM)',
  inputSchema: z.object({ /* ... */ }),
  outputSchema: z.object({ /* ... */ }),  // optional
  browser: {
    headless: true,
    viewport: { width: 1440, height: 900 },
    timeout: 30000,
    screenshotsDir: './screenshots',
  },
  execute: async ({ context, browser, projectId, companyId, userId, sessionId }) => {
    const page = browser.page;
    // your automation logic
    return { /* result */ };
  },
});

What it handles for you:

Launches the browser before your execute runs
Closes the browser after (even on errors)
Takes an error screenshot automatically if screenshotsDir is configured
Attaches RPA trace data to the output for observability
Validates input/output with Zod schemas
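
The launch, error screenshot, and close guarantees listed above follow the familiar try/catch/finally pattern. The sketch below illustrates that pattern with a stand-in session type. It is not the SDK's implementation: FakeSession, withSession, and the 'error' screenshot name are hypothetical names chosen for illustration.

```typescript
// Illustrative only: a stand-in for a browser session, not the SDK's BrowserSession.
interface FakeSession {
  launch(): Promise<void>;
  close(): Promise<void>;
  screenshot(name: string): Promise<string>;
}

// Runs `fn` with a launched session, screenshots on failure, and always closes.
async function withSession<T>(
  session: FakeSession,
  fn: (s: FakeSession) => Promise<T>,
): Promise<T> {
  await session.launch();
  try {
    // `return await` keeps the rejection inside this try block.
    return await fn(session);
  } catch (err) {
    // Best-effort error screenshot; a failure here must not mask the original error.
    await session.screenshot('error').catch(() => undefined);
    throw err;
  } finally {
    // Runs on both success and failure.
    await session.close();
  }
}
```

The point of the pattern is that a thrown error still reaches the caller, but only after the screenshot attempt and the close call have run.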

Browser Configuration

Option	Type	Default	Description
`headless`	`boolean`	`true`	Run browser without visible window
`viewport`	`{ width, height }`	`1440x900`	Browser viewport size
`timeout`	`number`	`30000`	Default timeout in milliseconds
`acceptDownloads`	`boolean`	`true`	Allow file downloads
`slowMo`	`number`	-	Slow down actions by N ms (useful for debugging)
`screenshotsDir`	`string`	-	Directory to save screenshots
`userAgent`	`string`	-	Custom user agent string
`locale`	`string`	-	Browser locale (e.g., `pt-BR`)
`timezoneId`	`string`	-	Timezone (e.g., `America/Sao_Paulo`)
`extraHTTPHeaders`	`Record<string, string>`	-	Custom HTTP headers
`launchArgs`	`string[]`	-	Extra Chromium launch arguments
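
To make the defaults in the table concrete, the sketch below layers caller overrides on top of the documented default values. BrowserConfigSketch, DOCUMENTED_DEFAULTS, and resolveBrowserConfig are hypothetical names used for illustration only, and the SDK's own merging logic may differ.

```typescript
// Illustrative subset of the options from the table above (not the SDK's type).
interface BrowserConfigSketch {
  headless: boolean;
  viewport: { width: number; height: number };
  timeout: number;
  acceptDownloads: boolean;
  slowMo?: number;
  screenshotsDir?: string;
  locale?: string;
  timezoneId?: string;
}

// Defaults as documented in the table; options listed with "-" have no default.
const DOCUMENTED_DEFAULTS: BrowserConfigSketch = {
  headless: true,
  viewport: { width: 1440, height: 900 },
  timeout: 30000,
  acceptDownloads: true,
};

// Overrides win over defaults; the viewport is replaced as a whole object.
function resolveBrowserConfig(
  overrides: Partial<BrowserConfigSketch> = {},
): BrowserConfigSketch {
  return { ...DOCUMENTED_DEFAULTS, ...overrides };
}

// Only `headless` and `locale` differ from the defaults here.
const resolvedConfig = resolveBrowserConfig({ headless: false, locale: 'pt-BR' });
```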

BrowserSession

For more control, use BrowserSession directly. This is useful when you need multiple pages, a custom lifecycle, or manual tracing.

import { BrowserSession } from '@runflow-ai/sdk/rpa';

const session = new BrowserSession({
  headless: true,
  viewport: { width: 1920, height: 1080 },
  screenshotsDir: './screenshots',
});

await session.launch();

const page = session.page;
await page.goto('https://example.com');

// Take a screenshot
await session.screenshot('home-page');

// Traced action (appears in observability)
const title = await session.traced('get-title', async () => {
  return page.title();
});

await session.close();

Key Methods

Method	Description
`launch(config?)`	Start the browser
`close()`	Close browser and cleanup
`screenshot(name)`	Take a full-page screenshot, returns file path
`waitForNavigation(urlPattern, timeout?)`	Wait for URL to match a RegExp
`waitForSelector(selector, timeout?)`	Wait for a CSS selector to appear
`getContent()`	Get page HTML content
`newPage()`	Open a new page/tab
`traced(action, fn, meta?)`	Wrap an operation in a traced span

Properties

Property	Type	Description
`page`	`Page`	Current Playwright page
`isLaunched`	`boolean`	Whether the browser is running
`artifacts`	`BrowserSessionArtifact[]`	Screenshots, downloads, PDFs created
`actionSpans`	`BrowserActionSpan[]`	All traced actions

Observability

Every BrowserSession tracks actions and artifacts. Get a summary with:

const trace = session.getTraceSummary();
// {
//   totalActions: 5,
//   totalDurationMs: 3200,
//   actions: [{ action: 'login', durationMs: 1200, ... }, ...],
//   artifacts: [{ type: 'screenshot', name: 'home', path: '...', ... }],
//   errors: []
// }

When using createBrowserTool, this trace is automatically attached to the tool output as _rpaTrace.
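
As an illustration of consuming this summary, the sketch below picks the slowest traced action. The TraceSummarySketch type mirrors only the fields visible in the example comment above. It is an assumption rather than the SDK's exported type, and slowestAction is a hypothetical helper.

```typescript
// Assumed shape, mirroring only the fields shown in the example comment above.
interface TraceActionSketch {
  action: string;
  durationMs: number;
}

interface TraceSummarySketch {
  totalActions: number;
  totalDurationMs: number;
  actions: TraceActionSketch[];
}

// Returns the traced action with the largest duration, or undefined if there are none.
function slowestAction(summary: TraceSummarySketch): TraceActionSketch | undefined {
  return summary.actions.reduce<TraceActionSketch | undefined>(
    (slowest, current) =>
      slowest === undefined || current.durationMs > slowest.durationMs ? current : slowest,
    undefined,
  );
}

// Hand-written sample data standing in for a real session.getTraceSummary() result.
const sampleSummary: TraceSummarySketch = {
  totalActions: 2,
  totalDurationMs: 1500,
  actions: [
    { action: 'login', durationMs: 1200 },
    { action: 'search', durationMs: 300 },
  ],
};

const slowestTraced = slowestAction(sampleSummary); // the 'login' entry
```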

High-Level Actions

Helper functions for common browser patterns. Import them from @runflow-ai/sdk/rpa.

login

Automate login flows with smart field detection.

import { login } from '@runflow-ai/sdk/rpa';

await login(page, {
  url: 'https://app.example.com/login',
  username: 'user@example.com',
  password: 'secret123',
  waitAfterLogin: /dashboard/,  // wait until URL matches
});

Options:

Option	Type	Default	Description
`url`	`string`	required	Login page URL
`username`	`string`	required	Username/email value
`password`	`string`	required	Password value
`usernameSelector`	`string`	auto-detect	CSS selector for username field
`passwordSelector`	`string`	auto-detect	CSS selector for password field
`submitSelector`	`string`	auto-detect	CSS selector for submit button
`waitAfterLogin`	`RegExp \| string`	-	URL pattern to wait for after login
`timeout`	`number`	`30000`	Timeout in ms

Auto-detection works for most login pages. It finds the first text input for username, the password input, and the submit button by common labels (Login, Entrar, Sign In, etc.).
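
To give a feel for label-based submit detection, the sketch below checks a button's text against the labels named above. The matching rule (case-insensitive, whitespace-collapsed equality) is an assumption and the SDK's actual detection may work differently. COMMON_SUBMIT_LABELS and looksLikeSubmitLabel are hypothetical names.

```typescript
// Labels named in the text above; the SDK's full list ("etc.") is not reproduced here.
const COMMON_SUBMIT_LABELS: string[] = ['login', 'entrar', 'sign in'];

// Assumed matching rule: case-insensitive comparison after collapsing whitespace.
function looksLikeSubmitLabel(buttonText: string): boolean {
  const normalized = buttonText.trim().replace(/\s+/g, ' ').toLowerCase();
  return COMMON_SUBMIT_LABELS.some((label) => label === normalized);
}
```

When a page uses a label outside the detected set, passing submitSelector (and, if needed, usernameSelector and passwordSelector) bypasses auto-detection.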

fillForm

Fill multiple form fields with flexible locators.

import { fillForm } from '@runflow-ai/sdk/rpa';

await fillForm(page, [
  { selector: '#name', value: 'John Doe' },
  { label: 'Email', value: 'john@example.com' },
  { role: 'combobox', label: 'Country', value: 'Brazil', type: 'select' },
  { selector: '#terms', value: 'true', type: 'check' },
]);

FormField options:

Option	Type	Description
`selector`	`string`	CSS selector
`role`	`string`	Aria role (`textbox`, `combobox`, `checkbox`, etc.)
`label`	`string`	Accessible label or placeholder
`nth`	`number`	Index when multiple elements match
`value`	`string`	Value to set
`type`	`'fill' \| 'select' \| 'check' \| 'uncheck'`	Action type (default: `fill`)

Locator priority: selector > role + label > role > label.
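
The priority order can be read as a simple cascade. The sketch below shows which strategy would be chosen for a given field. FormFieldSketch, LocatorStrategy, and pickLocatorStrategy are hypothetical names, and throwing when no locator is provided is an assumption rather than documented behavior.

```typescript
// Illustrative subset of the FormField options from the table above.
interface FormFieldSketch {
  selector?: string;
  role?: string;
  label?: string;
}

type LocatorStrategy = 'selector' | 'role+label' | 'role' | 'label';

// Picks the strategy in the documented order: selector > role + label > role > label.
function pickLocatorStrategy(field: FormFieldSketch): LocatorStrategy {
  if (field.selector) return 'selector';
  if (field.role && field.label) return 'role+label';
  if (field.role) return 'role';
  if (field.label) return 'label';
  // Assumption: a field with no locator information is treated as an error.
  throw new Error('FormField needs a selector, role, or label');
}
```

For example, a field that sets both selector and label is located by the selector alone, so the label is ignored.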

clickButton

Click a button by its visible label.

import { clickButton } from '@runflow-ai/sdk/rpa';

await clickButton(page, 'Submit');

waitAndClick

Wait for an element to appear, then click it.

import { waitAndClick } from '@runflow-ai/sdk/rpa';

await waitAndClick(page, '.modal-confirm-button', 5000);

extractTable

Extract an HTML table into structured data.

import { extractTable } from '@runflow-ai/sdk/rpa';

const rows = await extractTable(page, 'table.results');
// [
//   { "Name": "Product A", "Price": "$10", "Stock": "42" },
//   { "Name": "Product B", "Price": "$25", "Stock": "7" },
// ]

Returns an array of objects where keys are column headers.
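
The header-keyed shape can be illustrated without a browser. The sketch below builds the same kind of records from plain header and cell arrays. rowsToRecords is a hypothetical helper, not part of the SDK, and filling missing cells with an empty string is an assumption.

```typescript
// Builds one record per body row, keyed by the header at the same column index.
function rowsToRecords(headers: string[], rows: string[][]): Record<string, string>[] {
  return rows.map((cells) => {
    const record: Record<string, string> = {};
    headers.forEach((header, i) => {
      // Assumption: a missing cell becomes an empty string.
      record[header] = cells[i] ?? '';
    });
    return record;
  });
}

// Same data as the example output above, expressed as raw headers and cells.
const tableRecords = rowsToRecords(
  ['Name', 'Price', 'Stock'],
  [
    ['Product A', '$10', '42'],
    ['Product B', '$25', '7'],
  ],
);
```

Note that every value is a string, so numeric columns such as Stock need explicit conversion before arithmetic.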

extractText

Extract text content from matching elements.

import { extractText } from '@runflow-ai/sdk/rpa';

const titles = await extractText(page, 'h2.title');
// ["First Title", "Second Title", "Third Title"]

downloadFile

Click an element to trigger a download and wait for it to complete.

import { downloadFile } from '@runflow-ai/sdk/rpa';

const filePath = await downloadFile(page, '#export-btn', './downloads');
// "./downloads/report.xlsx"

screenshotPage

Take a full-page screenshot.

import { screenshotPage } from '@runflow-ai/sdk/rpa';

const path = await screenshotPage(page, './screenshots/page.png');

waitForResponse

Wait for a network response matching a URL pattern.

import { waitForResponse } from '@runflow-ai/sdk/rpa';

const response = await waitForResponse(page, /api\/products/, 10000);

Full Example: CRM Login + Data Extraction

import { Agent } from '@runflow-ai/sdk';
import { openai } from '@runflow-ai/sdk/models';
import { createBrowserTool, login, extractTable } from '@runflow-ai/sdk/rpa';
import { z } from 'zod';

const crmScrapeTool = createBrowserTool({
  id: 'crm-contacts',
  description: 'Login to CRM and extract contact list',
  inputSchema: z.object({
    searchTerm: z.string().describe('Term to search in CRM'),
  }),
  browser: {
    headless: true,
    screenshotsDir: './screenshots',
    locale: 'pt-BR',
    timezoneId: 'America/Sao_Paulo',
  },
  execute: async ({ context, browser }) => {
    const page = browser.page;

    // 1. Login
    await browser.traced('login', () =>
      login(page, {
        url: 'https://crm.example.com/login',
        username: process.env.CRM_USER!,
        password: process.env.CRM_PASS!,
        waitAfterLogin: /contacts/,
      })
    );

    // 2. Search
    await browser.traced('search', async () => {
      await page.fill('#search-input', context.searchTerm);
      await page.click('#search-button');
      await page.waitForSelector('table.contacts');
    });

    // 3. Extract data
    const contacts = await browser.traced('extract', () =>
      extractTable(page, 'table.contacts')
    );

    await browser.screenshot('results');

    return { contacts, total: contacts.length };
  },
});

const agent = new Agent({
  name: 'crm-agent',
  instructions: 'You extract contact data from the CRM system.',
  model: openai('gpt-4o'),
  tools: { crmScrapeTool },
});

export default agent;

Agent-Level RPA Config

You can also configure RPA at the agent level:

const agent = new Agent({
  name: 'scraper',
  instructions: '...',
  model: openai('gpt-4o'),
  tools: { scrapeTool },
  rpa: {
    enabled: true,
    browser: {
      headless: true,
      viewport: { width: 1440, height: 900 },
    },
    screenshotOnError: true,
    artifactsDir: './rpa-artifacts',
  },
});

Option	Type	Default	Description
`enabled`	`boolean`	`false`	Enable RPA capability for this agent
`browser`	`BrowserSessionConfig`	-	Default browser config for all tools
`maxConcurrentPages`	`number`	-	Limit concurrent browser pages
`screenshotOnError`	`boolean`	`false`	Auto-screenshot on errors
`artifactsDir`	`string`	-	Directory for all RPA artifacts

Agents with RPA tools are automatically detected during deploy and receive the rpa capability flag. This routes them to RPA-enabled workers with Chromium pre-installed.

Debugging

Use slowMo and headless: false during development to watch the browser:

const tool = createBrowserTool({
  // ...
  browser: {
    headless: false,
    slowMo: 500,  // 500ms delay between actions
    screenshotsDir: './debug-screenshots',
  },
  // ...
});
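
One way to avoid editing the tool between debugging and normal runs is to derive the browser options from a flag. The sketch below is a minimal illustration: DebugBrowserOptions and browserOptionsFor are hypothetical names, and the non-debug screenshotsDir value is only an example.

```typescript
// Illustrative subset of the browser options used in this section.
interface DebugBrowserOptions {
  headless: boolean;
  slowMo?: number;
  screenshotsDir: string;
}

// Returns a visible, slowed-down browser when `debug` is true, headless otherwise.
function browserOptionsFor(debug: boolean): DebugBrowserOptions {
  return debug
    ? { headless: false, slowMo: 500, screenshotsDir: './debug-screenshots' }
    : { headless: true, screenshotsDir: './screenshots' };
}
```

The result can be passed as the browser option of createBrowserTool, with the flag read from an environment variable of your choosing.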

Use rf test to run your agent locally with a visible browser.
