RPA (Robotic Process Automation) lets your agents navigate websites, fill forms, click buttons, extract data, and download files using a real browser. It is built on top of Playwright.
Installation
Playwright is an optional peer dependency. Install it alongside the browser binaries:
```bash
npm install playwright
npx playwright install chromium
```
Or use the CLI shortcut:
Check your setup:
Quick Start
The simplest way to use RPA is with `createBrowserTool`, which manages the browser lifecycle automatically.
```typescript
import { createBrowserTool } from '@runflow-ai/sdk/rpa';
import { z } from 'zod';

const scrapeTool = createBrowserTool({
  id: 'scrape-products',
  description: 'Scrape product listings from a website',
  inputSchema: z.object({
    url: z.string().url(),
  }),
  browser: {
    headless: true,
    screenshotsDir: './screenshots',
  },
  execute: async ({ context, browser }) => {
    const page = browser.page;
    await page.goto(context.url);

    // Collect the name and price from every .product element
    const products = await page.$$eval('.product', (els) =>
      els.map((el) => ({
        name: el.querySelector('.name')?.textContent?.trim(),
        price: el.querySelector('.price')?.textContent?.trim(),
      }))
    );

    await browser.screenshot('products-page');
    return { products };
  },
});
```
Then add it to your agent:
```typescript
import { Agent } from '@runflow-ai/sdk';
import { openai } from '@runflow-ai/sdk/models';

const agent = new Agent({
  name: 'scraper',
  instructions: 'You scrape product data from websites.',
  model: openai('gpt-4o'),
  tools: { scrapeTool },
});
```
createBrowserTool
A factory function that wraps your browser logic in a Runflow tool with automatic lifecycle management.
```typescript
import { createBrowserTool } from '@runflow-ai/sdk/rpa';
import { z } from 'zod';

const myTool = createBrowserTool({
  id: 'tool-id',
  description: 'What this tool does (shown to the LLM)',
  inputSchema: z.object({ /* ... */ }),
  outputSchema: z.object({ /* ... */ }), // optional
  browser: {
    headless: true,
    viewport: { width: 1440, height: 900 },
    timeout: 30000,
    screenshotsDir: './screenshots',
  },
  execute: async ({ context, browser, projectId, companyId, userId, sessionId }) => {
    const page = browser.page;
    // your automation logic
    return { /* result */ };
  },
});
```
What it handles for you:
- Launches the browser before your `execute` function runs
- Closes the browser afterwards, even if `execute` throws
- Takes an error screenshot automatically if `screenshotsDir` is configured
- Attaches RPA trace data to the output for observability
- Validates the input (and the output, if `outputSchema` is set) with Zod schemas
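The lifecycle guarantees above boil down to a launch / try / finally pattern. The sketch below shows that ordering against a minimal, hypothetical `SessionLike` interface. `runWithLifecycle` and the `'error'` screenshot name are illustrative assumptions, not the SDK's implementation.

```typescript
// Hypothetical minimal interface: only the methods this sketch needs.
interface SessionLike {
  launch(): Promise<void>;
  screenshot(name: string): Promise<string>;
  close(): Promise<void>;
}

// Illustrative only: mirrors the documented guarantees, not the SDK source.
async function runWithLifecycle<T>(
  session: SessionLike,
  screenshotsDir: string | undefined,
  execute: () => Promise<T>,
): Promise<T> {
  await session.launch(); // the browser is up before execute runs
  try {
    return await execute();
  } catch (err) {
    // Take an error screenshot only when a screenshots directory is configured.
    if (screenshotsDir) {
      try {
        await session.screenshot('error'); // hypothetical screenshot name
      } catch {
        // Ignore screenshot failures so the original error is the one reported.
      }
    }
    throw err;
  } finally {
    await session.close(); // runs on both success and failure
  }
}
```

The `finally` block is what guarantees cleanup. The inner `try` keeps a failing screenshot from masking the error that actually broke the automation.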
Browser Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `headless` | `boolean` | `true` | Run the browser without a visible window |
| `viewport` | `{ width, height }` | `1440x900` | Browser viewport size |
| `timeout` | `number` | `30000` | Default timeout in milliseconds |
| `acceptDownloads` | `boolean` | `true` | Allow file downloads |
| `slowMo` | `number` | - | Slow down each action by N ms (useful for debugging) |
| `screenshotsDir` | `string` | - | Directory to save screenshots in |
| `userAgent` | `string` | - | Custom user agent string |
| `locale` | `string` | - | Browser locale (e.g., `pt-BR`) |
| `timezoneId` | `string` | - | Timezone (e.g., `America/Sao_Paulo`) |
| `extraHTTPHeaders` | `Record<string, string>` | - | Custom HTTP headers |
| `launchArgs` | `string[]` | - | Extra Chromium launch arguments |
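The following sketch shows how a partial config could be layered over the defaults in the table. The `BrowserConfig` type and the `resolveBrowserConfig` helper are hypothetical. Only the option names and default values come from the table.

```typescript
// Option names from the table above; this type itself is hypothetical.
interface BrowserConfig {
  headless: boolean;
  viewport: { width: number; height: number };
  timeout: number;
  acceptDownloads: boolean;
  slowMo?: number;
  screenshotsDir?: string;
  userAgent?: string;
  locale?: string;
  timezoneId?: string;
  extraHTTPHeaders?: Record<string, string>;
  launchArgs?: string[];
}

// Default values taken from the table.
const DEFAULTS: BrowserConfig = {
  headless: true,
  viewport: { width: 1440, height: 900 },
  timeout: 30000,
  acceptDownloads: true,
};

// Hypothetical helper: options you pass win, and unset options fall back to defaults.
// This is a shallow merge, so passing `viewport` replaces the whole object.
function resolveBrowserConfig(partial: Partial<BrowserConfig> = {}): BrowserConfig {
  return { ...DEFAULTS, ...partial };
}
```

For example, `resolveBrowserConfig({ headless: false })` keeps the 30000 ms timeout and the 1440x900 viewport and only changes `headless`.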
BrowserSession
For more control, use `BrowserSession` directly. This is useful when you need multiple pages, a custom lifecycle, or manual tracing.
```typescript
import { BrowserSession } from '@runflow-ai/sdk/rpa';

const session = new BrowserSession({
  headless: true,
  viewport: { width: 1920, height: 1080 },
  screenshotsDir: './screenshots',
});

await session.launch();
try {
  const page = session.page;
  await page.goto('https://example.com');

  // Take a screenshot
  await session.screenshot('home-page');

  // Traced action (appears in observability)
  const title = await session.traced('get-title', async () => {
    return page.title();
  });
} finally {
  // A manual session is not closed for you, so close it even if a step above throws
  await session.close();
}
```
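In concept, `traced(action, fn, meta?)` times an operation and records a span whether the operation succeeds or fails. The standalone sketch below illustrates that idea. The `Span` shape (`action`, `durationMs`, `error`, `meta`) is an assumption loosely modeled on the trace summary under Observability, not the SDK's `BrowserActionSpan` type.

```typescript
// Assumed span shape for illustration; the real BrowserActionSpan may differ.
interface Span {
  action: string;
  durationMs: number;
  error?: string;
  meta?: Record<string, unknown>;
}

const spans: Span[] = [];

// Hypothetical stand-in for session.traced(): time fn, always record a span, rethrow errors.
async function traced<T>(
  action: string,
  fn: () => Promise<T> | T,
  meta?: Record<string, unknown>,
): Promise<T> {
  const start = Date.now();
  let error: string | undefined;
  try {
    return await fn();
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err; // the caller still sees the failure
  } finally {
    spans.push({ action, durationMs: Date.now() - start, error, meta });
  }
}
```

Recording the span in `finally` means failed steps show up in the trace too, which is usually where you need timing data most.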
Key Methods
| Method | Description |
|---|---|
| `launch(config?)` | Start the browser |
| `close()` | Close the browser and clean up |
| `screenshot(name)` | Take a full-page screenshot; returns the file path |
| `waitForNavigation(urlPattern, timeout?)` | Wait for the URL to match a RegExp |
| `waitForSelector(selector, timeout?)` | Wait for a CSS selector to appear |
| `getContent()` | Get the page's HTML content |
| `newPage()` | Open a new page/tab |
| `traced(action, fn, meta?)` | Wrap an operation in a traced span |
Properties
| Property | Type | Description |
|---|---|---|
| `page` | `Page` | Current Playwright page |
| `isLaunched` | `boolean` | Whether the browser is running |
| `artifacts` | `BrowserSessionArtifact[]` | Screenshots, downloads, and PDFs created during the session |
| `actionSpans` | `BrowserActionSpan[]` | All traced actions |
Observability
Every BrowserSession tracks actions and artifacts. Get a summary with:
```typescript
const trace = session.getTraceSummary();
// {
//   totalActions: 5,
//   totalDurationMs: 3200,
//   actions: [{ action: 'login', durationMs: 1200, ... }, ...],
//   artifacts: [{ type: 'screenshot', name: 'home', path: '...', ... }],
//   errors: []
// }
```
When you use `createBrowserTool`, this trace is automatically attached to the tool output as `_rpaTrace`.
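The summary fields can be derived from the recorded spans and artifacts. The minimal sketch below makes three assumptions: `totalDurationMs` is the sum of action durations, `errors` collects the messages of failed spans, and artifact types are `'screenshot' | 'download' | 'pdf'`. The SDK may compute these differently (for example, as wall-clock time), and `summarize` is a hypothetical name.

```typescript
interface ActionSpan {
  action: string;
  durationMs: number;
  error?: string;
}

// Assumed artifact kinds, based on "Screenshots, downloads, PDFs".
interface Artifact {
  type: 'screenshot' | 'download' | 'pdf';
  name: string;
  path: string;
}

// Same top-level fields as the getTraceSummary() output shown above.
interface TraceSummary {
  totalActions: number;
  totalDurationMs: number;
  actions: ActionSpan[];
  artifacts: Artifact[];
  errors: string[];
}

// Hypothetical: sums span durations and collects "action: message" for failed spans.
function summarize(actions: ActionSpan[], artifacts: Artifact[]): TraceSummary {
  const errors: string[] = [];
  let totalDurationMs = 0;
  for (const span of actions) {
    totalDurationMs += span.durationMs;
    if (span.error !== undefined) errors.push(`${span.action}: ${span.error}`);
  }
  return { totalActions: actions.length, totalDurationMs, actions, artifacts, errors };
}
```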
High-Level Actions
Helper functions for common browser patterns. Import from @runflow-ai/sdk/rpa.
login
Automate login flows with smart field detection.
```typescript
import { login } from '@runflow-ai/sdk/rpa';

await login(page, {
  url: 'https://app.example.com/login',
  username: 'user@example.com',
  password: 'secret123',
  waitAfterLogin: /dashboard/, // wait until URL matches
});
```
Options:
| Option | Type | Default | Description |
|---|---|---|---|
| `url` | `string` | required | Login page URL |
| `username` | `string` | required | Username/email value |
| `password` | `string` | required | Password value |
| `usernameSelector` | `string` | auto-detect | CSS selector for the username field |
| `passwordSelector` | `string` | auto-detect | CSS selector for the password field |
| `submitSelector` | `string` | auto-detect | CSS selector for the submit button |
| `waitAfterLogin` | `RegExp \| string` | - | URL pattern to wait for after login |
| `timeout` | `number` | `30000` | Timeout in ms |
Auto-detection works for most login pages. It uses the first text input as the username field and the password input as the password field, and it finds the submit button by common labels (Login, Entrar, Sign In, etc.).
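One way to picture the submit-button heuristic is as a match between visible button text and a list of known labels. The sketch below is hypothetical. `SUBMIT_LABELS` contains only the examples named above, and the case-insensitive exact match is an assumption rather than the SDK's actual rule.

```typescript
// Only the labels named in the docs; the SDK's real list is likely longer.
const SUBMIT_LABELS = ['login', 'entrar', 'sign in'];

// Hypothetical: return the index of the first button whose trimmed,
// lower-cased text equals a known submit label, or -1 if none match.
function findSubmitButton(buttonTexts: string[]): number {
  for (let i = 0; i < buttonTexts.length; i++) {
    const text = buttonTexts[i].trim().toLowerCase();
    if (SUBMIT_LABELS.indexOf(text) !== -1) return i;
  }
  return -1;
}
```

If auto-detection picks the wrong element on a particular page, pass `usernameSelector`, `passwordSelector`, or `submitSelector` explicitly.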
fillForm
Fill multiple form fields with flexible locators.
```typescript
import { fillForm } from '@runflow-ai/sdk/rpa';

await fillForm(page, [
  { selector: '#name', value: 'John Doe' },
  { label: 'Email', value: 'john@example.com' },
  { role: 'combobox', label: 'Country', value: 'Brazil', type: 'select' },
  { selector: '#terms', value: 'true', type: 'check' },
]);
```
FormField options:
| Option | Type | Description |
|---|---|---|
| `selector` | `string` | CSS selector |
| `role` | `string` | ARIA role (`textbox`, `combobox`, `checkbox`, etc.) |
| `label` | `string` | Accessible label or placeholder |
| `nth` | `number` | Index to use when multiple elements match |
| `value` | `string` | Value to set |
| `type` | `'fill' \| 'select' \| 'check' \| 'uncheck'` | Action type (default: `fill`) |
Locator priority: selector > role + label > role > label.
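The priority rule can be written as a small decision function. This sketch only reports which strategy would apply to a field and does not touch a browser. The strategy names and the error for a field with no locator are assumptions made for illustration.

```typescript
// Subset of the documented FormField options that affect locator choice.
interface FieldLocatorOptions {
  selector?: string;
  role?: string;
  label?: string;
}

// Hypothetical strategy names, one per rung of the documented priority.
type Strategy = 'selector' | 'role+label' | 'role' | 'label';

// Applies the documented order: selector > role + label > role > label.
function pickStrategy(field: FieldLocatorOptions): Strategy {
  if (field.selector) return 'selector';
  if (field.role && field.label) return 'role+label';
  if (field.role) return 'role';
  if (field.label) return 'label';
  // Assumption: a field with none of these cannot be located.
  throw new Error('FormField needs a selector, role, or label');
}
```

Applied to the `fillForm` example above, `#name` and `#terms` resolve by selector, `Email` by label, and `Country` by role plus label.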
clickButton
Click a button by its visible label.
```typescript
import { clickButton } from '@runflow-ai/sdk/rpa';

await clickButton(page, 'Submit');
```
waitAndClick
Wait for an element to appear, then click it.
```typescript
import { waitAndClick } from '@runflow-ai/sdk/rpa';

await waitAndClick(page, '.modal-confirm-button', 5000);
```
extractTable
Extract an HTML table into structured data.
```typescript
import { extractTable } from '@runflow-ai/sdk/rpa';

const rows = await extractTable(page, 'table.results');
// [
//   { "Name": "Product A", "Price": "$10", "Stock": "42" },
//   { "Name": "Product B", "Price": "$25", "Stock": "7" },
// ]
```
Returns an array of objects where keys are column headers.
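After the header and cell text have been read from the page, turning them into keyed objects is a simple zip. The sketch below shows that step using the example rows above. `rowsToObjects` is hypothetical. The documentation does not say how the SDK handles cells without a header, so the positional `column_N` fallback here is an assumption.

```typescript
// Hypothetical: zip header texts with each row's cell texts.
function rowsToObjects(headers: string[], rows: string[][]): Record<string, string>[] {
  return rows.map((cells) => {
    const obj: Record<string, string> = {};
    cells.forEach((cell, i) => {
      // Assumption: a cell with a missing or empty header gets a positional key.
      const key = headers[i] || `column_${i + 1}`;
      obj[key] = cell;
    });
    return obj;
  });
}
```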
extractText
Extract text content from matching elements.
```typescript
import { extractText } from '@runflow-ai/sdk/rpa';

const titles = await extractText(page, 'h2.title');
// ["First Title", "Second Title", "Third Title"]
```
downloadFile
Click an element to trigger a download and wait for it to complete.
```typescript
import { downloadFile } from '@runflow-ai/sdk/rpa';

const filePath = await downloadFile(page, '#export-btn', './downloads');
// "./downloads/report.xlsx"
```
screenshotPage
Take a full-page screenshot.
```typescript
import { screenshotPage } from '@runflow-ai/sdk/rpa';

const path = await screenshotPage(page, './screenshots/page.png');
```
waitForResponse
Wait for a network response matching a URL pattern.
```typescript
import { waitForResponse } from '@runflow-ai/sdk/rpa';

const response = await waitForResponse(page, /api\/products/, 10000);
```
Full Example
```typescript
import { Agent } from '@runflow-ai/sdk';
import { openai } from '@runflow-ai/sdk/models';
import { createBrowserTool, login, extractTable } from '@runflow-ai/sdk/rpa';
import { z } from 'zod';

const crmScrapeTool = createBrowserTool({
  id: 'crm-contacts',
  description: 'Login to CRM and extract contact list',
  inputSchema: z.object({
    searchTerm: z.string().describe('Term to search in CRM'),
  }),
  browser: {
    headless: true,
    screenshotsDir: './screenshots',
    locale: 'pt-BR',
    timezoneId: 'America/Sao_Paulo',
  },
  execute: async ({ context, browser }) => {
    const page = browser.page;

    // 1. Login
    await browser.traced('login', () =>
      login(page, {
        url: 'https://crm.example.com/login',
        username: process.env.CRM_USER!,
        password: process.env.CRM_PASS!,
        waitAfterLogin: /contacts/,
      })
    );

    // 2. Search
    await browser.traced('search', async () => {
      await page.fill('#search-input', context.searchTerm);
      await page.click('#search-button');
      await page.waitForSelector('table.contacts');
    });

    // 3. Extract data
    const contacts = await browser.traced('extract', () =>
      extractTable(page, 'table.contacts')
    );

    await browser.screenshot('results');
    return { contacts, total: contacts.length };
  },
});

const agent = new Agent({
  name: 'crm-agent',
  instructions: 'You extract contact data from the CRM system.',
  model: openai('gpt-4o'),
  tools: { crmScrapeTool },
});

export default agent;
```
Agent-Level RPA Config
You can also configure RPA at the agent level:
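```typescript
import { Agent } from '@runflow-ai/sdk';
import { openai } from '@runflow-ai/sdk/models';

// scrapeTool is the tool defined in the Quick Start
const agent = new Agent({
  name: 'scraper',
  instructions: '...',
  model: openai('gpt-4o'),
  tools: { scrapeTool },
  rpa: {
    enabled: true,
    browser: {
      headless: true,
      viewport: { width: 1440, height: 900 },
    },
    screenshotOnError: true,
    artifactsDir: './rpa-artifacts',
  },
});
```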
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | `boolean` | `false` | Enable RPA capability for this agent |
| `browser` | `BrowserSessionConfig` | - | Default browser config for all tools |
| `maxConcurrentPages` | `number` | - | Limit on concurrent browser pages |
| `screenshotOnError` | `boolean` | `false` | Take a screenshot automatically on errors |
| `artifactsDir` | `string` | - | Directory for all RPA artifacts |
Agents with RPA tools are automatically detected during deploy and receive the rpa capability flag. This routes them to RPA-enabled workers with Chromium pre-installed.
Debugging
Use `slowMo` and `headless: false` during development to watch the browser:
```typescript
const tool = createBrowserTool({
  // ...
  browser: {
    headless: false,
    slowMo: 500, // 500ms delay between actions
    screenshotsDir: './debug-screenshots',
  },
  // ...
});
```
Use `rf test` to run your agent locally with a visible browser.
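To switch between debugging and normal runs without editing the tool definition, you could derive the `browser` options from a flag. This is a minimal sketch. `browserOptions` and its `debug` parameter are hypothetical, and only option names documented on this page are used.

```typescript
// Documented browser options used in this section.
interface DebuggableBrowserOptions {
  headless: boolean;
  slowMo?: number;
  screenshotsDir?: string;
}

// Hypothetical helper: a visible, slowed-down browser while debugging,
// and a headless, full-speed browser otherwise.
function browserOptions(debug: boolean): DebuggableBrowserOptions {
  return debug
    ? { headless: false, slowMo: 500, screenshotsDir: './debug-screenshots' }
    : { headless: true, screenshotsDir: './screenshots' };
}
```

You might then write `browser: browserOptions(process.env.DEBUG_RPA === '1')` in `createBrowserTool`, where `DEBUG_RPA` is a made-up environment variable name.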