Page Interactions
Control browser interactions using LLMs
You can use Airtop’s interaction methods to control browser actions like clicking, typing, and hovering. These methods use AI to interpret natural-language descriptions of elements, eliminating the need for complex selectors or XPaths.
Usage Examples
First, you’ll need to create a session and window as shown in previous guides.
Clicking Elements
Use the click method to interact with clickable elements on the page.
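As a rough sketch of what a click call might look like, the snippet below passes a plain-language element description to a click method. The RecordingWindows class, the parameter names (session_id, window_id, element_description), and the placeholder IDs are assumptions made for illustration, not confirmed Airtop SDK signatures; the stand-in only records calls so the example runs offline.

```python
class RecordingWindows:
    """Offline stand-in for a browser-window API (hypothetical, not the real SDK)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def click(self, session_id: str, window_id: str, element_description: str) -> None:
        # A real client would ask the service to locate and click the element.
        self.calls.append(("click", element_description))


windows = RecordingWindows()
# Describe the target in plain language instead of a CSS selector or XPath.
windows.click(
    "session-123",  # placeholder session ID
    "window-456",   # placeholder window ID
    element_description="The blue Submit button at the bottom of the form",
)
```

With a real client, the session and window IDs would come from the session and window created in the previous guides.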
Example descriptions you could use:
- “The blue Submit button at the bottom of the form”
- “The ‘Read more’ link in the article”
- “The shopping cart icon in the navigation bar”
Many websites have dynamic pages whose layout changes in response to user actions. It’s good practice to add a delay after interactions that trigger animations or loading states. See Handling Dynamic Content for more details.
Typing Text
Use the type method to input text into form fields.
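A typing call might be sketched as follows: the element description identifies the field, and a separate argument carries the text to enter. The RecordingWindows class, the parameter names (element_description, text), and the placeholder IDs are illustrative assumptions rather than the documented SDK signature.

```python
class RecordingWindows:
    """Offline stand-in for a browser-window API (hypothetical, not the real SDK)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def type(self, session_id: str, window_id: str, element_description: str, text: str) -> None:
        # A real client would focus the described field and enter the text.
        self.calls.append(("type", element_description, text))


windows = RecordingWindows()
windows.type(
    "session-123",  # placeholder session ID
    "window-456",   # placeholder window ID
    element_description="The search input field in the top navigation bar",
    text="wireless headphones",
)
```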
Hovering Over Elements
Use the hover method to trigger hover states on elements.
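A hover call could follow the same shape as a click, taking only a description of the element to hover over. As before, RecordingWindows, the parameter names, and the placeholder IDs are assumptions for illustration, not the confirmed SDK interface.

```python
class RecordingWindows:
    """Offline stand-in for a browser-window API (hypothetical, not the real SDK)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def hover(self, session_id: str, window_id: str, element_description: str) -> None:
        # A real client would move the pointer over the described element.
        self.calls.append(("hover", element_description))


windows = RecordingWindows()
# Hovering a menu label is a typical way to reveal a dropdown before the next step.
windows.hover(
    "session-123",  # placeholder session ID
    "window-456",   # placeholder window ID
    element_description="The 'Products' dropdown menu in the main navigation",
)
```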
Best Practices
Element Descriptions
When describing elements, be as specific as possible:
✅ Good descriptions:
- “The blue ‘Submit’ button at the bottom of the contact form”
- “The search input field in the top navigation bar”
- “The ‘Products’ dropdown menu in the main navigation”
❌ Avoid vague descriptions:
- “The button”
- “The input”
- “The menu”
Handling Dynamic Content
Add a delay after interactions that trigger animations or loading states. This helps prevent errors in which the agent clicks the wrong element because the page layout is still changing.
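One simple way to add such a delay is to pause right after the interaction returns. The sketch below wraps any interaction in a hypothetical helper, interact_then_wait, that runs it and then sleeps; the helper name, the default delay, and the recorded events are illustrative assumptions, and a suitable delay depends on the page.

```python
import time
from typing import Callable


def interact_then_wait(
    action: Callable[[], None],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run one interaction, then pause so animations or loading states can settle."""
    action()
    sleep(delay_seconds)


events: list[str] = []

# Stand-in for a real click on an element that opens an animated menu.
interact_then_wait(
    lambda: events.append("click: The 'Products' dropdown menu"),
    delay_seconds=0.05,
)
# The next interaction starts only after the pause has elapsed.
events.append("click: next element")
```

The sleep parameter is injectable so the helper can be tested without real waiting; in normal use the default time.sleep applies.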
Current Limitations
- Interactions must be performed sequentially
- Elements must be visible in the current viewport
- Complex multi-step interactions require separate commands
- Accuracy can decrease for larger viewport sizes. Keep your browser windows below 1080p (1920x1080) for best results.
Common Use Cases
- Form automation
- Navigation testing
- UI interaction validation
- Interactive web scraping
- End-to-end testing
- Workflow automation
These interaction methods can be chained together to build complex user flows while keeping the code readable and maintainable.
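To illustrate chaining, the sketch below groups several single interactions into one function that reads like the user flow it performs. Everything named here (RecordingWindows, submit_contact_form, the parameter names, the element descriptions, and the placeholder IDs) is hypothetical; the stand-in records calls instead of driving a browser.

```python
class RecordingWindows:
    """Offline stand-in for a browser-window API (hypothetical, not the real SDK)."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, ...]] = []

    def click(self, session_id: str, window_id: str, element_description: str) -> None:
        self.calls.append(("click", element_description))

    def type(self, session_id: str, window_id: str, element_description: str, text: str) -> None:
        self.calls.append(("type", element_description, text))


def submit_contact_form(
    windows: RecordingWindows, session_id: str, window_id: str, email: str
) -> None:
    """Run one user flow as a readable sequence of single interactions."""
    # Each step is a separate call and starts only after the previous one returns.
    windows.click(session_id, window_id, "The 'Contact' link in the main navigation")
    windows.type(session_id, window_id, "The email input field in the contact form", email)
    windows.click(
        session_id, window_id, "The blue 'Submit' button at the bottom of the contact form"
    )


windows = RecordingWindows()
submit_contact_form(windows, "session-123", "window-456", email="ada@example.com")
```

Keeping each step as its own call matches the limitation that interactions run sequentially, and it leaves room to insert a delay after any step that triggers an animation or loading state.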