Page Interactions | Airtop

Airtop’s interaction methods let you control the browser with actions like clicking, typing, and hovering. These methods use AI to interpret natural-language descriptions of elements, so you don’t need complex selectors or XPaths.

Usage Examples

First, create a session and window as shown in previous guides. The examples that follow assume sessionId and windowId hold the IDs of that session and window.

const session = await client.sessions.create();
const window = await client.windows.create(session.data.id, { url: "https://google.com/finance/" });

Clicking Elements

Use the click method to interact with clickable elements on the page:

const result = await client.windows.click(sessionId, windowId, {
  elementDescription: "The 'Compare Markets' button near the top left of the page"
});

Example descriptions you could use:

“The blue Submit button at the bottom of the form”
“The ‘Read more’ link in the article”
“The shopping cart icon in the navigation bar”

Many websites update their layout dynamically in response to user actions. It’s good practice to add a delay after interactions that trigger animations or loading states. See Handling Dynamic Content for more details.

Right Click + Double Click

To perform a right click or a double click, set clickType in the request’s configuration to "rightClick" or "doubleClick".

const result = await client.windows.click(sessionId, windowId, {
  elementDescription: "The 'Compare Markets' button near the top left of the page",
  configuration: {
    clickType: "rightClick"
  }
});

Typing Text

Use the type method to input text into form fields:

const result = await client.windows.type(sessionId, windowId, {
  elementDescription: "The search input field at the top of the page",
  text: "What to search for",
  pressEnterKey: true // Optional: press Enter after typing
});

Hovering Over Elements

Use the hover method to trigger hover states on elements:

const result = await client.windows.hover(sessionId, windowId, {
  elementDescription: "The dropdown menu in the navigation bar"
});

Best Practices

Element Descriptions

When describing elements, be as specific as possible:

✅ Good descriptions:

“The blue ‘Submit’ button at the bottom of the contact form”
“The search input field in the top navigation bar”
“The ‘Products’ dropdown menu in the main navigation”

❌ Avoid vague descriptions:

“The button”
“The input”
“The menu”
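
A quick heuristic check can catch vague descriptions before they are sent. The sketch below is illustrative only: isLikelyVague and its word-count threshold are assumptions of this example, not part of the Airtop SDK, and a passing description is not guaranteed to be specific enough.

```typescript
// Hypothetical helper (not part of the Airtop SDK): flags descriptions that
// are probably too vague to identify a single element on the page.
const FILLER_WORDS = ["the", "a", "an"];

function isLikelyVague(description: string, minMeaningfulWords = 3): boolean {
  const words = description
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0 && FILLER_WORDS.indexOf(w) === -1);
  // Fewer than `minMeaningfulWords` non-filler words usually means the
  // description names only an element type ("The button") with no label,
  // color, or location to disambiguate it.
  return words.length < minMeaningfulWords;
}

console.log(isLikelyVague("The button")); // true
console.log(isLikelyVague("The blue 'Submit' button at the bottom of the contact form")); // false
```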

Handling Dynamic Content

It’s common for interactions to trigger a page reload or navigation, in which case subsequent interactions might occur while the page is still loading. You can use the waitForNavigation parameter to wait for the page to load before performing further interactions.

Set waitForNavigation to true on the interaction that triggers the navigation:

const result = await client.windows.click(sessionId, windowId, {
  elementDescription: "The 'Compare Markets' button near the top left of the page",
  waitForNavigation: true
});

If the interaction doesn’t trigger a navigation but still changes the page (for example, by starting an animation or a loading state), add a delay before the next interaction:

// Wait a few seconds
await new Promise((resolve) => setTimeout(resolve, 3000));

This helps prevent errors in which the agent clicks the wrong element because the page content was still changing.
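
The two waiting strategies can be wrapped in one helper. This is a minimal sketch, not an SDK feature: clickAndSettle, its options, and the ClickClient type are hypothetical names introduced here, and the type only mirrors the shape of the click calls shown in this guide.

```typescript
// Minimal structural types for the one call this sketch uses. They mirror the
// click examples in this guide; the real SDK types are richer.
interface ClickParams {
  elementDescription: string;
  waitForNavigation?: boolean;
}

interface ClickClient {
  windows: {
    click(sessionId: string, windowId: string, params: ClickParams): Promise<unknown>;
  };
}

// Hypothetical helper: click, then either rely on waitForNavigation or pause
// for a fixed delay so animations and loading states can finish.
async function clickAndSettle(
  client: ClickClient,
  sessionId: string,
  windowId: string,
  elementDescription: string,
  options: { navigates: boolean; settleMs?: number },
): Promise<unknown> {
  const params: ClickParams = { elementDescription };
  if (options.navigates) {
    params.waitForNavigation = true;
  }
  const result = await client.windows.click(sessionId, windowId, params);
  if (!options.navigates) {
    // No navigation expected: give the page time to update in place.
    const ms = options.settleMs ?? 3000;
    await new Promise((resolve) => setTimeout(resolve, ms));
  }
  return result;
}
```

With a real client, a call might look like clickAndSettle(client, sessionId, windowId, "The 'Compare Markets' button near the top left of the page", { navigates: true }). The 3000 ms default matches the delay used above and may need tuning per site.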

Current Limitations

Interactions must be performed sequentially
Elements must be visible in the current viewport
Complex multi-step interactions require separate commands
Accuracy can decrease for larger viewport sizes. Keep your browser windows below 1080p (1920x1080) for best results.
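
A simple guard can enforce the viewport recommendation before a window is created. The function below is a hypothetical sketch that reads the limit as "at most 1920x1080"; that reading, and the name isRecommendedViewport, are assumptions of this example.

```typescript
// Hypothetical guard (not part of the Airtop SDK). The recommended limit is
// interpreted here as "no larger than 1920x1080".
const MAX_WIDTH = 1920;
const MAX_HEIGHT = 1080;

function isRecommendedViewport(width: number, height: number): boolean {
  return width <= MAX_WIDTH && height <= MAX_HEIGHT;
}

console.log(isRecommendedViewport(1280, 720)); // true
console.log(isRecommendedViewport(2560, 1440)); // false
```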

Common Use Cases

Form automation
Navigation testing
UI interaction validation
Interactive web scraping
End-to-end testing
Workflow automation

These interaction methods can be chained together to create complex user flows while maintaining readable and maintainable code.
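
One way to chain interactions is to describe a flow as data and run the steps strictly in order. The sketch below is illustrative: Step, InteractionClient, and runFlow are hypothetical names, and the types only mirror the click, hover, and type calls shown in this guide rather than the SDK's actual type definitions.

```typescript
// Structural types mirroring the calls shown in this guide; the real SDK
// types are richer. `Step` and `runFlow` are hypothetical names.
type Step =
  | { action: "click"; elementDescription: string }
  | { action: "hover"; elementDescription: string }
  | { action: "type"; elementDescription: string; text: string; pressEnterKey?: boolean };

interface InteractionClient {
  windows: {
    click(sessionId: string, windowId: string, params: { elementDescription: string }): Promise<unknown>;
    hover(sessionId: string, windowId: string, params: { elementDescription: string }): Promise<unknown>;
    type(
      sessionId: string,
      windowId: string,
      params: { elementDescription: string; text: string; pressEnterKey?: boolean },
    ): Promise<unknown>;
  };
}

// Runs the steps one after another, since interactions must be performed
// sequentially. A step that throws stops the flow and propagates the error.
async function runFlow(
  client: InteractionClient,
  sessionId: string,
  windowId: string,
  steps: Step[],
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const step of steps) {
    switch (step.action) {
      case "click":
        results.push(await client.windows.click(sessionId, windowId, { elementDescription: step.elementDescription }));
        break;
      case "hover":
        results.push(await client.windows.hover(sessionId, windowId, { elementDescription: step.elementDescription }));
        break;
      case "type":
        results.push(
          await client.windows.type(sessionId, windowId, {
            elementDescription: step.elementDescription,
            text: step.text,
            pressEnterKey: step.pressEnterKey,
          }),
        );
        break;
    }
  }
  return results;
}
```

A flow could then be written as an array of steps, for example a hover on "The dropdown menu in the navigation bar" followed by a type step into "The search input field at the top of the page". Keeping the flow as data makes it easy to read and to extend with delays between steps where the page is dynamic.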