Monitor a website for changes

Overview

This recipe demonstrates how to use Airtop to create an intelligent web agent that can monitor and extract information from websites on a schedule. The agent can handle authentication, navigate through pages, and use AI to analyze and compare content changes over time.

The script creates a browser session, handles login states, extracts structured data using AI, and stores the results in a local database. It’s particularly useful for monitoring job listings, product prices, or any web content that changes regularly.

Prerequisites

To get started, ensure you have:

  • Python 3.10 or higher
  • Poetry 2.1.1 or higher
  • An Airtop API key. You can get one for free.

Getting Started

  1. Clone the examples-python repository and navigate to the example directory:
$cd examples/web_agent
  2. Install dependencies using Poetry:
$poetry install
  3. Configure your environment:
$poetry env activate
>cp .env.example .env

Add your Airtop API key to the .env file:

# file examples/web_agent/.env
AIRTOP_API_KEY=<YOUR_API_KEY>
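
The worker reads this value at startup. As a minimal sketch of the preamble, assuming python-dotenv is used and that these names are exported from the top level of the airtop package:

# Sketch of the script preamble; python-dotenv usage is an assumption
import os
from dotenv import load_dotenv
from airtop import Airtop, PageQueryConfig, SessionConfigV1

load_dotenv()  # reads AIRTOP_API_KEY from the local .env file
AIRTOP_API_KEY = os.getenv("AIRTOP_API_KEY")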

Script Walkthrough

1. Initialize the Airtop Client

The script begins by setting up the Airtop client with your API key:

client = Airtop(api_key=AIRTOP_API_KEY)

2. Session Management

The agent creates a browser session tied to a profile name, so that browser state can be persisted for future use:

profile_name = input("Enter a profile name. If no profile exists with this name, one will be created: ").strip()

configuration = SessionConfigV1(
    timeout_minutes=10,
    profile_name=profile_name,
)
session = client.sessions.create(configuration=configuration)

If no profile exists with that name, Airtop creates one, and your browsing session state, including login credentials, cookies, and cache, is automatically saved under it for future use.

You can also create a profile manually in the Airtop Portal and then use the same name in the script.
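
The later snippets reference a browser window in this session. In the full script one is opened on the page being monitored, along these lines (TARGET_URL stands in for the monitored page):

# Open a browser window in the session, pointed at the monitored page
window = client.windows.create(session.data.id, url=TARGET_URL)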

3. Authentication Handling

The script checks whether the browser profile is already logged in before continuing:

logged_in_schema = IS_LOGGED_IN_OUTPUT_SCHEMA
is_logged_in_response = client.windows.page_query(
    session_id,
    window.data.window_id,
    prompt=IS_LOGGED_IN_PROMPT,
    configuration=PageQueryConfig(output_schema=logged_in_schema),
)
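
IS_LOGGED_IN_PROMPT and IS_LOGGED_IN_OUTPUT_SCHEMA are defined in the example repository. Conceptually, the schema returns a boolean flag and the script branches on it. A sketch of that branch, where the field name and the model_response attribute are assumptions:

import json

# The "isLoggedIn" field name and model_response attribute are illustrative
result = json.loads(is_logged_in_response.data.model_response)
if not result.get("isLoggedIn"):
    # Let the user log in through the session's live view; the profile
    # saves the credentials for future runs
    input("Log in via the live view, then press Enter to continue...")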

4. AI-Powered Data Extraction

The agent uses structured prompts and schemas to extract specific information:

extract_data_response = client.windows.page_query(
    session_id,
    window.data.window_id,
    prompt=EXTRACT_DATA_PROMPT,
    configuration=PageQueryConfig(output_schema=EXTRACT_DATA_OUTPUT_SCHEMA),
)
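
EXTRACT_DATA_PROMPT and EXTRACT_DATA_OUTPUT_SCHEMA also live in the example repository. As an illustration only, a schema for monitoring job listings might look like the following; all field names here are hypothetical:

import json

# Hypothetical JSON schema describing the data to extract,
# serialized as a JSON string for the query configuration
EXTRACT_DATA_OUTPUT_SCHEMA = json.dumps({
    "type": "object",
    "properties": {
        "listings": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "location": {"type": "string"},
                    "url": {"type": "string"},
                },
            },
        },
    },
})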

5. Data Storage and Comparison

Results are stored in a local SQLite database and compared with previous runs:

old_content = retrieve_previous_result(TARGET_URL, EXTRACT_DATA_PROMPT)
if old_content:
    prompt_content_response = client.windows.page_query(
        session.data.id,
        window_info.data.window_id,
        prompt=comparison_prompt(old_content, formatted_json),
    )
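
retrieve_previous_result and its storage counterpart are small SQLite helpers defined in the repository. A minimal sketch of the idea, with the table and column names as assumptions:

import sqlite3

DB_PATH = "results.db"  # illustrative; the example repo defines its own path

def store_result(url: str, prompt: str, content: str) -> None:
    # Append the latest extraction so future runs can diff against it
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results (url TEXT, prompt TEXT, content TEXT)"
        )
        conn.execute(
            "INSERT INTO results (url, prompt, content) VALUES (?, ?, ?)",
            (url, prompt, content),
        )

def retrieve_previous_result(url: str, prompt: str):
    # Return the most recent stored content for this URL/prompt pair, if any
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results (url TEXT, prompt TEXT, content TEXT)"
        )
        row = conn.execute(
            "SELECT content FROM results WHERE url = ? AND prompt = ? "
            "ORDER BY rowid DESC LIMIT 1",
            (url, prompt),
        ).fetchone()
    return row[0] if row else None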

Running the Agent

Execute the script with:

$poetry run python worker.py
># or, with the Poetry environment activated:
>python worker.py

The agent will:

  1. Create a browser session with a new or existing profile
  2. Handle authentication if needed
  3. Navigate to the target URL
  4. Extract data using AI
  5. Compare results with previous runs (if available)
  6. Store results in the local database for future comparisons

Best Practices and Considerations

Session Persistence

  • For convenience, create a Profile using our tool in the Airtop Portal
  • Reuse your saved profile name to carry login details over to future runs

Data Extraction

  • Define clear output schemas for structured data
  • Use specific prompts for targeted information
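
For example, a targeted prompt names the exact fields it wants rather than asking for everything on the page; something along these lines (illustrative only):

# Illustrative prompt; the real one lives in the example repository
EXTRACT_DATA_PROMPT = (
    "List every job posting on this page. For each posting, return the "
    "title, location, and link, and nothing else."
)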

Error Handling

The script includes robust error handling for:

  • Session creation failures
  • Authentication issues
  • Data extraction errors
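
In practice this amounts to wrapping the run in try/except/finally so the browser session is always cleaned up. A sketch, assuming the SDK's sessions.terminate method:

session = None
try:
    session = client.sessions.create(configuration=configuration)
    # ... open the window, query the page, store results ...
except Exception as exc:
    print(f"Run failed: {exc}")
finally:
    # Always release the browser session, even on failure
    if session is not None and session.data is not None:
        client.sessions.terminate(session.data.id)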

Scheduling and Automation

The agent can be scheduled to run regularly using:

  • Cron jobs (see the example below)
  • Task schedulers
  • CI/CD pipelines
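
For example, a crontab entry that runs the worker every hour might look like this (the path is a placeholder):

# run the monitoring worker at the top of every hour
0 * * * * cd /path/to/examples/web_agent && poetry run python worker.py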

For detailed scheduling instructions, refer to How to schedule a worker.

Summary

This recipe demonstrates how to use Airtop to create web agents that monitor websites and extract data automatically. You can use it to track changes to web pages, monitor prices, or collect competitive data. Since the agent can maintain login sessions, it works well for monitoring content that requires authentication.