Batch Operations

The batch operate SDK helpers let you process multiple URLs in parallel. They manage the lifecycle of browser sessions and windows automatically, handling creation and cleanup behind the scenes so you can focus on your application logic.

This guide explains how to use these helpers, which are available in both the Node.js and Python Airtop SDKs.

Python SDK

Note that batch operations are currently only supported by the AsyncAirtop client.

Basic Usage

import asyncio

from airtop import AsyncAirtop, types, BatchOperationUrl, BatchOperationInput, BatchOperationResponse, BatchOperateConfig
from dataclasses import dataclass
from typing import Optional, List, Any, Dict

async def main():
    client = AsyncAirtop(api_key="your-api-key")

    # Define URLs to process
    urls = [
        BatchOperationUrl(url="https://example.com/page1"),
        BatchOperationUrl(url="https://example.com/page2"),
        BatchOperationUrl(url="https://example.com/page3")
    ]

    # Define your operation function
    async def operation(input: BatchOperationInput) -> BatchOperationResponse:
        # Example: Run a custom query on the page
        result = await client.windows.page_query(
            session_id=input.session_id,
            window_id=input.window_id,
            prompt="What is the main idea of this page?"
        )

        return BatchOperationResponse(
            data=result.data.model_response,
            should_halt_batch=False,
            additional_urls=[]
        )

    # Execute batch operation
    results = await client.batch_operate(
        urls=urls,
        operation=operation,
    )

    # Results will be a list containing the responses from each operation
    # Example:
    # [
    #     "The main idea of this page is about cloud computing services and their benefits for enterprise...",
    #     "This page discusses machine learning algorithms and their applications in data science...",
    #     "The page focuses on cybersecurity best practices for organizations..."
    # ]

# Run the async entry point
asyncio.run(main())

Advanced Features

Error Handling

By default, the Batch Operate helpers will not automatically retry failed operations. However, you can implement this or other custom error handling by providing an on_error callback.

async def handle_error(error: BatchOperationError):
    print(f"Error occurred: {error.error}")
    print(f"Session ID: {error.session_id}")
    print(f"Window ID: {error.window_id}")
    print(f"URLs affected: {error.operation_urls}")

config = BatchOperateConfig(
    on_error=handle_error
)
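One way to build a retry on top of on_error is to record the URLs of each failed operation and submit them again in a second batch. The sketch below assumes the error object exposes operation_urls as in the handler above. The names make_failure_collector and FakeBatchError are hypothetical and used only for illustration; FakeBatchError merely stands in for the SDK's error type so the snippet runs on its own.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, List, Tuple


@dataclass
class FakeBatchError:
    """Hypothetical stand-in for the SDK's error object (not the real class)."""
    error: Any
    operation_urls: List[Any] = field(default_factory=list)


def make_failure_collector() -> Tuple[Callable[[Any], Awaitable[None]], List[Any]]:
    """Return an on_error-style callback plus the list it appends failed URLs to."""
    failed_urls: List[Any] = []

    async def on_error(error: Any) -> None:
        # Remember every URL that belonged to the failed operation.
        failed_urls.extend(error.operation_urls or [])

    return on_error, failed_urls


async def demo() -> List[Any]:
    on_error, failed_urls = make_failure_collector()
    # Simulate the helper reporting one failed operation.
    await on_error(
        FakeBatchError(error="timeout", operation_urls=["https://example.com/page2"])
    )
    return failed_urls


print(asyncio.run(demo()))
```

With the real SDK, the callback would be passed as BatchOperateConfig(on_error=on_error). Once the first batch_operate call returns, a non-empty failed_urls list could be passed as urls to a second call.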

Session Configuration

You can customize the session configuration for all sessions created during the batch operation.

Note that the persistProfile option is not supported for batch operations. Instead, we recommend running any steps that require a persistent profile in a separate session first, and then passing that profile ID in the session configuration for the batch operation.

session = await client.sessions.create()

# ... Your custom logic, e.g. authentication ...

# Terminate the session to persist the profile
await client.sessions.terminate(session.data.id)

config = BatchOperateConfig(
    session_config=types.SessionConfigV1(
        timeout_minutes=10,
        base_profile_id=session.data.profile_id  # Use the profile ID from the previous session
    )
)

results = await client.batch_operate(
    urls=urls,
    operation=operation,
    config=config
)

Halting Operations

You can stop the batch processing at any point by returning should_halt_batch=True:

async def operation(input: BatchOperationInput) -> BatchOperationResponse:
    # ... your custom logic ...

    if some_condition:
        return BatchOperationResponse(
            data=result,
            should_halt_batch=True  # This will stop processing remaining URLs
        )

    return BatchOperationResponse(data=result)
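As an illustration of what some_condition might look like, the sketch below halts once a fixed number of results has been collected. ResultBudget is a hypothetical helper, not part of the SDK, and it assumes all operations run on a single asyncio event loop so a plain counter is enough.

```python
class ResultBudget:
    """Counts collected results and reports when a fixed limit is reached."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.collected = 0

    def record(self) -> bool:
        """Register one result; return True once the limit has been reached."""
        self.collected += 1
        return self.collected >= self.limit


budget = ResultBudget(limit=2)
flags = [budget.record() for _ in range(3)]
print(flags)  # [False, True, True]
```

Inside an operation, the return value could be used directly, for example should_halt_batch=budget.record().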

Controlling Concurrency

Note: We recommend keeping max_windows_per_session=1 (the default) for optimal stability and performance. While you can experiment with multiple windows per session, single-window sessions are generally more reliable and easier to manage. Increase max_concurrent_sessions instead if you need more parallelization.

You can control the number of concurrent sessions and windows per session:

config = BatchOperateConfig(
    max_concurrent_sessions=30,  # Default value is 30
    max_windows_per_session=1    # Default value is 1
)
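To make the effect of a concurrency cap concrete, here is a generic asyncio sketch that bounds parallel work with a semaphore. It illustrates the general technique only and makes no claim about how the SDK implements its limits; run_bounded and its parameters are hypothetical names. With the settings above, the number of windows open at once would presumably be bounded by max_concurrent_sessions × max_windows_per_session, which is 30 × 1 = 30 with the defaults.

```python
import asyncio


async def run_bounded(num_tasks: int, max_concurrent: int) -> int:
    """Run num_tasks dummy jobs, never more than max_concurrent at once.

    Returns the highest number of jobs observed running simultaneously.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real page work
            running -= 1

    await asyncio.gather(*(job() for _ in range(num_tasks)))
    return peak


peak = asyncio.run(run_bounded(num_tasks=10, max_concurrent=3))
print(peak)
```

The returned peak never exceeds max_concurrent, no matter how many jobs are queued.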

Dynamic URL Addition

Your operation can discover and add new URLs to process during execution. The original batch_operate call will not complete until all operations have completed, including the new additions.

async def operation(input: BatchOperationInput) -> BatchOperationResponse:
    # ... your scraping logic ...

    new_urls = [
        BatchOperationUrl(url="https://example.com/discovered1"),
        BatchOperationUrl(url="https://example.com/discovered2")
    ]

    return BatchOperationResponse(
        data=result,
        additional_urls=new_urls
    )
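Because the batch does not complete until every added URL has been processed, pages that link to one another could keep the batch growing. Whether the helper deduplicates URLs is not stated here, so the sketch below assumes it does not and filters discovered URLs against a set of already-seen ones. filter_unseen is a hypothetical helper, and the trailing-slash normalization is a deliberately simple choice.

```python
from typing import Iterable, List, Set


def filter_unseen(discovered: Iterable[str], seen: Set[str]) -> List[str]:
    """Return discovered URLs not processed before, marking them as seen."""
    fresh: List[str] = []
    for url in discovered:
        normalized = url.rstrip("/")  # treat ".../a" and ".../a/" as the same page
        if normalized not in seen:
            seen.add(normalized)
            fresh.append(url)
    return fresh


seen = {"https://example.com/page1"}
fresh = filter_unseen(
    [
        "https://example.com/page1/",
        "https://example.com/discovered1",
        "https://example.com/discovered1",
    ],
    seen,
)
print(fresh)  # ['https://example.com/discovered1']
```

Each surviving URL could then be wrapped as BatchOperationUrl(url=...) and returned in additional_urls.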

Node.js SDK

Basic Usage

import {
  Airtop,
  BatchOperationUrl,
  BatchOperationInput,
  BatchOperationResponse,
  BatchOperateConfig
} from '@airtop/sdk';

async function main() {
  const client = new Airtop({ apiKey: "your-api-key" });

  // Define URLs to process
  const urls: BatchOperationUrl[] = [
    { url: "https://example.com/page1" },
    { url: "https://example.com/page2" },
    { url: "https://example.com/page3" }
  ];

  // Define your operation function
  const operation = async (input: BatchOperationInput): Promise<BatchOperationResponse> => {
    const { windowId, sessionId } = input;

    // Example: Run a custom query on the page
    const result = await client.windows.pageQuery({
      sessionId,
      windowId,
      prompt: "What is the main idea of this page?"
    });

    return {
      data: result.data.modelResponse,
      shouldHaltBatch: false,
      additionalUrls: []
    };
  };

  // Execute batch operation
  const results = await client.batchOperate({
    urls,
    operation
  });

  // Results will be an array containing the responses from each operation
  // Example results:
  // [
  //   "This page is about cloud computing services and infrastructure",
  //   "The main topic is machine learning and AI applications",
  //   "This page discusses data analytics and visualization tools"
  // ]
}

// Run the async entry point
main().catch(console.error);

Advanced Features

Error Handling

By default, the Batch Operate helpers will not automatically retry failed operations. However, you can implement this or other custom error handling by providing an onError callback.

const config: BatchOperateConfig = {
  onError: async (error: BatchOperationError) => {
    console.error(`Error occurred: ${error.error}`);
    console.error(`Session ID: ${error.sessionId}`);
    console.error(`Window ID: ${error.windowId}`);
    console.error(`URLs affected:`, error.operationUrls);
  }
};

Session Configuration

You can customize the session configuration for all sessions created during the batch operation.

Note that the persistProfile option is not supported for batch operations. Instead, we recommend running any steps that require a persistent profile in a separate session first, and then passing that profile ID in the session configuration for the batch operation.

// Create and set up the initial session
const session = await client.sessions.create();

// ... Your custom logic, e.g. authentication ...

// Terminate the session to persist the profile
await client.sessions.terminate({ sessionId: session.data.id });

const config: BatchOperateConfig = {
  sessionConfig: {
    timeoutMinutes: 10,
    baseProfileId: session.data.profileId // Use the profile ID from the previous session
  }
};

const results = await client.batchOperate({
  urls,
  operation,
  config
});

Halting Operations

You can stop the batch processing at any point by returning shouldHaltBatch: true:

const operation = async (input: BatchOperationInput): Promise<BatchOperationResponse> => {
  // ... your custom logic ...

  if (someCondition) {
    return {
      data: result,
      shouldHaltBatch: true // This will stop processing remaining URLs
    };
  }

  return { data: result };
};

Controlling Concurrency

Note: We recommend keeping maxWindowsPerSession=1 (the default) for optimal stability and performance. While you can experiment with multiple windows per session, single-window sessions are generally more reliable and easier to manage. Increase maxConcurrentSessions instead if you need more parallelization.

You can control the number of concurrent sessions and windows per session:

const config: BatchOperateConfig = {
  maxConcurrentSessions: 30, // Default value is 30
  maxWindowsPerSession: 1    // Default value is 1
};

Dynamic URL Addition

Your operation can discover and add new URLs to process during execution. The original batchOperate call will not complete until all operations have completed, including the new additions.

const operation = async (input: BatchOperationInput): Promise<BatchOperationResponse> => {
  // ... your custom logic ...

  return {
    data: result,
    additionalUrls: [
      { url: "https://example.com/discovered1" },
      { url: "https://example.com/discovered2" }
    ]
  };
};