Monitor a website for changes
Overview
This recipe demonstrates how to use Airtop to create an intelligent web agent that can monitor and extract information from websites on a schedule. The agent can handle authentication, navigate through pages, and use AI to analyze and compare content changes over time.
The script creates a browser session, handles login states, extracts structured data using AI, and stores the results in a local database. It’s particularly useful for monitoring job listings, product prices, or any web content that changes regularly.
Prerequisites
To get started, ensure you have:
- Python 3.10 or higher
- Poetry 2.1.1 or higher
- An Airtop API key. You can get one for free.
Getting Started
- Clone the examples-python repository and navigate to the example directory:
- Install dependencies using Poetry:
- Configure your environment:
Add your Airtop API key to the .env file:
Script Walkthrough
1. Initialize the Airtop Client
The script begins by setting up the Airtop client with your API key:
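As a minimal sketch of this step, the helper below reads the key from the environment and fails early if it is missing. The variable name `AIRTOP_API_KEY` and the helper `load_api_key` are assumptions for illustration, not names confirmed by the recipe. It also assumes the values in `.env` have already been loaded into the process environment, for example by a dotenv loader.

```python
import os


def load_api_key(env=None, var_name="AIRTOP_API_KEY"):
    """Return the API key from the environment, or raise if it is not set.

    `var_name` is an assumed variable name; use whatever your .env file defines.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key


# Assumed usage with the Airtop SDK (not executed here):
#   from airtop import Airtop
#   client = Airtop(api_key=load_api_key())
```

Failing at startup with a clear message is usually easier to diagnose than an authentication error from the first API call.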
2. Session Management
The agent creates a browser session under a named profile, so the session state can be persisted for future use:
If the profile doesn’t exist, Airtop creates a new one. Your browsing session state, including login credentials, cookies, and cache, is then saved automatically under this profile name for future use.
You can also create a profile manually in the Airtop Portal and then use the same name in the script.
3. Authentication Handling
The script checks whether the session is already authenticated before it continues:
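One way to implement such a check is to ask the AI a yes/no question about the page and parse a structured reply. The sketch below shows only the parsing side. The prompt text, the `isLoggedIn` field, and the `is_logged_in` helper are hypothetical, and the reply is assumed to arrive as a JSON string.

```python
import json

# Hypothetical prompt and schema; the recipe's actual wording may differ.
LOGIN_PROMPT = "Is the user currently logged in to this site? Answer using the schema."
LOGIN_SCHEMA = {
    "type": "object",
    "properties": {"isLoggedIn": {"type": "boolean"}},
    "required": ["isLoggedIn"],
}


def is_logged_in(model_response: str) -> bool:
    """Parse the model's JSON reply; treat malformed replies as not logged in."""
    try:
        parsed = json.loads(model_response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and parsed.get("isLoggedIn") is True
```

Treating an unparseable reply as "not logged in" is the conservative choice here: the agent asks for a login rather than extracting data from a sign-in page.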
4. AI-Powered Data Extraction
The agent uses structured prompts and schemas to extract specific information:
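To illustrate what a prompt and schema pair might look like, here is a sketch for the job-listing case mentioned in the overview. The prompt wording, the field names (`title`, `location`, `url`), and the `parse_items` helper are all assumptions. The schema is an ordinary JSON Schema dictionary, and the reply is assumed to be a JSON object that follows it.

```python
import json

# Hypothetical prompt; adjust it to the page being monitored.
EXTRACT_PROMPT = (
    "List the job postings visible on this page. "
    "For each posting, return its title, location, and link."
)

# Hypothetical output schema describing the structure the AI should return.
EXTRACT_SCHEMA = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "location": {"type": "string"},
                    "url": {"type": "string"},
                },
                "required": ["title", "url"],
            },
        }
    },
    "required": ["items"],
}


def parse_items(model_response: str) -> list[dict]:
    """Return the extracted items, keeping only those with the required fields."""
    data = json.loads(model_response)
    required = EXTRACT_SCHEMA["properties"]["items"]["items"]["required"]
    return [
        item
        for item in data.get("items", [])
        if isinstance(item, dict) and all(item.get(field) for field in required)
    ]
```

Dropping items that lack a required field keeps incomplete records out of later comparisons, at the cost of silently skipping them. Logging the skipped items may be preferable in practice.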
5. Data Storage and Comparison
Results are stored in a local SQLite database and compared with previous runs:
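The storage and comparison step can be sketched with Python's built-in `sqlite3` module. The table layout, the function names, and the choice to identify items by a `url` key are assumptions made for this example, not the recipe's actual schema.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open the results database and create the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "url TEXT NOT NULL, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "payload TEXT NOT NULL)"
    )
    return conn


def latest_items(conn, url):
    """Return the items stored by the most recent run for `url`, or None."""
    row = conn.execute(
        "SELECT payload FROM results WHERE url = ? ORDER BY id DESC LIMIT 1",
        (url,),
    ).fetchone()
    return json.loads(row[0]) if row else None


def save_and_diff(conn, url, items, key="url"):
    """Store this run's items and report what changed since the previous run."""
    previous = latest_items(conn, url)
    conn.execute(
        "INSERT INTO results (url, payload) VALUES (?, ?)",
        (url, json.dumps(items)),
    )
    conn.commit()
    if previous is None:
        # First run: nothing to compare against, so every item counts as new.
        return {"added": items, "removed": []}
    old_keys = {item[key] for item in previous}
    new_keys = {item[key] for item in items}
    return {
        "added": [item for item in items if item[key] not in old_keys],
        "removed": [item for item in previous if item[key] not in new_keys],
    }
```

Passing a file path such as `results.db` to `open_db` instead of the in-memory default keeps results between runs, which is what makes the comparison possible.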
Running the Agent
Execute the script with:
The agent will:
- Create a browser session with a new or existing profile
- Handle authentication if needed
- Navigate to the target URL
- Extract data using AI
- Compare results with previous runs (if available)
- Store results in the local database for future comparisons
Best Practices and Considerations
Session Persistence
- For convenience, create a profile using our tool in the Airtop Portal
- Use your saved profile name to keep your login details for future runs
Data Extraction
- Define clear output schemas for structured data
- Use specific prompts for targeted information
Error Handling
The script includes robust error handling for:
- Session creation failures
- Authentication issues
- Data extraction errors
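The three failure cases above can be handled with a pattern like the following sketch, which always releases the session once it has been created. The four callables are hypothetical stand-ins for the script's own steps, so this shows the control flow only, not the recipe's actual code.

```python
import logging

logger = logging.getLogger("monitor")


def run_monitor(create_session, check_auth, extract, terminate):
    """Run one monitoring pass and always release the session.

    The four callables are hypothetical stand-ins for the script's own steps.
    Returns the extracted data, or None if any step failed.
    """
    try:
        session = create_session()
    except Exception:
        logger.exception("Session creation failed")
        return None
    try:
        if not check_auth(session):
            logger.error("Not authenticated; log in and run again")
            return None
        return extract(session)
    except Exception:
        logger.exception("Monitoring run failed")
        return None
    finally:
        # Runs on success and failure alike, so sessions are not left open.
        terminate(session)
```

Returning `None` on failure lets a scheduled run skip the comparison step without overwriting the last good result.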
Scheduling and Automation
The agent can be scheduled to run regularly using:
- Cron jobs
- Task schedulers
- CI/CD pipelines
For detailed scheduling instructions, refer to How to schedule a worker.
Summary
This recipe demonstrates how to use Airtop to create web agents that monitor websites and extract data automatically. You can use it to track changes to web pages, monitor prices, or collect competitive data. Since the agent can maintain login sessions, it works well for monitoring content that requires authentication.