Page Interactions

Control browser interactions using LLMs

You can use Airtop’s interaction methods to naturally control browser actions like clicking, typing, and hovering. These methods use AI to understand natural language descriptions of elements, eliminating the need for complex selectors or XPaths.

Usage Examples

First, you’ll need to create a session and window as shown in previous guides.

const session = await client.sessions.create();
const window = await client.windows.create(session.data.id, { url: "https://google.com/finance/" });
const sessionId = session.data.id;
const windowId = window.data.windowId;

Clicking Elements

Use the click method to interact with clickable elements on the page:

const result = await client.windows.click(sessionId, windowId, {
  elementDescription: "The 'Compare Markets' button near the top left of the page"
});

Example descriptions you could use:

  • "The blue 'Submit' button at the bottom of the form"
  • "The 'Read more' link in the article"
  • "The shopping cart icon in the navigation bar"

Many websites update their layout dynamically in response to events such as clicks or data loading. It's good practice to add a delay after interactions that trigger animations or loading states. See Handling Dynamic Content below for details.
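Because a click often triggers an animation or page load, one option is to wrap the click and the pause into a single call. The sketch below is a hypothetical helper, not part of the Airtop SDK: `ClickClient` is a minimal structural type assumed to match the `client.windows.click(sessionId, windowId, { elementDescription })` call shown above, and the 3000 ms default settle time is an arbitrary choice.

```typescript
// Minimal structural type for the one call this helper needs. It is assumed
// to match the client.windows.click(...) call shown in this guide; the real
// SDK types are not reproduced here.
interface ClickClient {
  windows: {
    click(
      sessionId: string,
      windowId: string,
      request: { elementDescription: string },
    ): Promise<unknown>;
  };
}

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: click an element, then pause so that animations or
// loading states triggered by the click can settle before the next step.
async function clickAndWait(
  client: ClickClient,
  sessionId: string,
  windowId: string,
  elementDescription: string,
  settleMs = 3000,
): Promise<unknown> {
  const result = await client.windows.click(sessionId, windowId, { elementDescription });
  await sleep(settleMs);
  return result;
}
```

With the client and IDs from the setup step, a call might look like `await clickAndWait(client, sessionId, windowId, "The 'Compare Markets' button near the top left of the page");`. Taking the client as a parameter also makes the helper easy to exercise with a mock.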

Typing Text

Use the type method to input text into form fields:

const result = await client.windows.type(sessionId, windowId, {
  elementDescription: "The search input field at the top of the page",
  text: "What to search for",
  pressEnterKey: true // Optional: press Enter after typing
});
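Filling a multi-field form means calling `type` once per field, in order. The sketch below is a hypothetical `fillForm` helper, not an Airtop API: `TypeClient` is a minimal structural type assumed to match the `client.windows.type(...)` call above, and the choice to press Enter only on the last field (which submits many, but not all, forms) is an illustrative design decision.

```typescript
// Minimal structural type assumed to match the client.windows.type(...) call
// shown in this guide.
interface TypeClient {
  windows: {
    type(
      sessionId: string,
      windowId: string,
      request: { elementDescription: string; text: string; pressEnterKey?: boolean },
    ): Promise<unknown>;
  };
}

// One field to fill: a natural-language description plus the text to enter.
interface FieldInput {
  elementDescription: string;
  text: string;
}

// Hypothetical helper: type into each field in order, awaiting each call
// before starting the next. Enter is pressed only after the last field, and
// only when submitWithEnter is true.
async function fillForm(
  client: TypeClient,
  sessionId: string,
  windowId: string,
  fields: FieldInput[],
  submitWithEnter = false,
): Promise<void> {
  let position = 0;
  for (const field of fields) {
    position++;
    const isLast = position === fields.length;
    await client.windows.type(sessionId, windowId, {
      elementDescription: field.elementDescription,
      text: field.text,
      pressEnterKey: submitWithEnter && isLast,
    });
  }
}
```

For example, `fillForm(client, sessionId, windowId, [{ elementDescription: "The 'Name' input field in the contact form", text: "Ada" }, { elementDescription: "The 'Email' input field in the contact form", text: "ada@example.com" }], true)` would type both values and press Enter after the email.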

Hovering Over Elements

Use the hover method to trigger hover states on elements:

const result = await client.windows.hover(sessionId, windowId, {
  elementDescription: "The dropdown menu in the navigation bar"
});
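A common reason to hover is to open a menu whose items only appear on hover, then click one of them. The following is a hypothetical sketch, not an Airtop API: `HoverClickClient` is a minimal structural type assumed to match the `hover` and `click` calls in this guide, and the 1000 ms default wait for the menu to open is an assumption you would tune per site.

```typescript
// Minimal structural type assumed to match the hover and click calls shown
// in this guide.
interface HoverClickClient {
  windows: {
    hover(sessionId: string, windowId: string, request: { elementDescription: string }): Promise<unknown>;
    click(sessionId: string, windowId: string, request: { elementDescription: string }): Promise<unknown>;
  };
}

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: hover over a menu to reveal its items, give the menu
// time to open, then click one of the revealed items. Returns the click result.
async function hoverThenClick(
  client: HoverClickClient,
  sessionId: string,
  windowId: string,
  menuDescription: string,
  itemDescription: string,
  openDelayMs = 1000,
): Promise<unknown> {
  await client.windows.hover(sessionId, windowId, { elementDescription: menuDescription });
  await sleep(openDelayMs);
  return client.windows.click(sessionId, windowId, { elementDescription: itemDescription });
}
```

Because the item is only visible after the hover, describing it relative to the open menu (for example "The 'Pricing' link in the open 'Products' menu") keeps the description specific.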

Best Practices

Element Descriptions

When describing elements, be as specific as possible:

✅ Good descriptions:

  • "The blue 'Submit' button at the bottom of the contact form"
  • "The search input field in the top navigation bar"
  • "The 'Products' dropdown menu in the main navigation"

❌ Avoid vague descriptions:

  • "The button"
  • "The input"
  • "The menu"
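If descriptions are generated from data rather than written by hand, a small builder can keep them in the specific shape of the good examples (color, visible label, element type, location). This is a hypothetical helper for illustration only; the `ElementParts` fields and the rule that a description needs at least a label or a location are assumptions, not Airtop requirements.

```typescript
// Parts of an element description. Only `kind` is required.
interface ElementParts {
  kind: string; // element type, e.g. "button", "search input field"
  label?: string; // visible text, e.g. "Submit"
  color?: string; // e.g. "blue"
  location?: string; // e.g. "at the bottom of the contact form"
}

// Hypothetical helper: build "The <color> '<label>' <kind> <location>",
// skipping missing parts. Throws when neither a label nor a location is given,
// since the result would be as vague as "The button".
function describeElement({ kind, label, color, location }: ElementParts): string {
  if (!label && !location) {
    throw new Error(`Description for "${kind}" is too vague: add a label or a location`);
  }
  const words: string[] = ["The"];
  if (color) words.push(color);
  if (label) words.push(`'${label}'`);
  words.push(kind);
  if (location) words.push(location);
  return words.join(" ");
}
```

For instance, `describeElement({ kind: "button", label: "Submit", color: "blue", location: "at the bottom of the contact form" })` returns `"The blue 'Submit' button at the bottom of the contact form"`.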

Handling Dynamic Content

Add a delay after interactions that trigger animations or loading states:

// Wait 3 seconds (3000 ms)
await new Promise((resolve) => setTimeout(resolve, 3000));

This can help prevent errors in which the agent clicks the wrong element because the page content was still changing.
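A fixed delay can be combined with a retry for interactions that fail outright while the page is still loading. This is a swapped-in technique, not something the guide or the Airtop SDK prescribes: `retryWithDelay` is a hypothetical helper, and it only helps when the call actually throws. It cannot detect a click that succeeded on the wrong element, so the delay before interacting still matters.

```typescript
// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: run `action`; if it throws, wait `delayMs` for the page
// to settle and try again, up to `attempts` tries in total. Rethrows the last
// error if every attempt fails.
async function retryWithDelay<T>(
  action: () => Promise<T>,
  attempts = 3,
  delayMs = 3000,
): Promise<T> {
  if (attempts < 1) throw new RangeError("attempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // No need to wait after the final failed attempt.
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

A call might look like `await retryWithDelay(() => client.windows.click(sessionId, windowId, { elementDescription: "The 'Compare Markets' button near the top left of the page" }));`.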

Current Limitations

  • Interactions must be performed sequentially
  • Elements must be visible in the current viewport
  • Complex multi-step interactions require separate commands
  • Accuracy can decrease for larger viewport sizes. Keep your browser windows below 1080p (1920x1080) for best results.
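Because interactions must run one at a time and multi-step interactions need separate commands, a flow can be written as a list of steps executed strictly in order. This is a hypothetical sketch: `Step`, `runSteps`, and `InteractionClient` are illustrative names, with `InteractionClient` assumed to match the `click`, `type`, and `hover` calls shown earlier in this guide.

```typescript
// Minimal structural type assumed to match the interaction calls in this guide.
interface InteractionClient {
  windows: {
    click(sessionId: string, windowId: string, request: { elementDescription: string }): Promise<unknown>;
    hover(sessionId: string, windowId: string, request: { elementDescription: string }): Promise<unknown>;
    type(
      sessionId: string,
      windowId: string,
      request: { elementDescription: string; text: string; pressEnterKey?: boolean },
    ): Promise<unknown>;
  };
}

// One step of a flow; each non-wait step becomes a separate command.
type Step =
  | { action: "click"; elementDescription: string }
  | { action: "hover"; elementDescription: string }
  | { action: "type"; elementDescription: string; text: string; pressEnterKey?: boolean }
  | { action: "wait"; ms: number };

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run steps one after another. Awaiting inside for...of keeps them sequential;
// launching them together with Promise.all would break that ordering.
async function runSteps(
  client: InteractionClient,
  sessionId: string,
  windowId: string,
  steps: Step[],
): Promise<void> {
  for (const step of steps) {
    switch (step.action) {
      case "click":
        await client.windows.click(sessionId, windowId, { elementDescription: step.elementDescription });
        break;
      case "hover":
        await client.windows.hover(sessionId, windowId, { elementDescription: step.elementDescription });
        break;
      case "type":
        await client.windows.type(sessionId, windowId, {
          elementDescription: step.elementDescription,
          text: step.text,
          pressEnterKey: step.pressEnterKey,
        });
        break;
      case "wait":
        await sleep(step.ms);
        break;
    }
  }
}
```

Including explicit `wait` steps after actions that load new content applies the advice from Handling Dynamic Content within the same list.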

Common Use Cases

  • Form automation
  • Navigation testing
  • UI interaction validation
  • Interactive web scraping
  • End-to-end testing
  • Workflow automation

These interaction methods can be chained together to create complex user flows while maintaining readable and maintainable code.
