Querying a Page
Interact with a page using LLMs
You can use the pageQuery method to interact with a page using LLMs. You might want to use this method to scrape a page for specific information, or to ask a more general question about the page.
Examples:
- On a company’s website, ask if they have a certain job opening.
- On a foreign news website, ask for a translation.
- On a product page, ask for a list of products with their prices converted to different currencies.
- etc.
Usage example
First, you’ll need to create a session.
Next, you’ll need to create a window and load a URL.
Finally, you can query the page.
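The three steps above can be sketched as follows. This is only an illustration of the ordering, not the SDK's actual API: the FakeClient class and the method names create_session, create_window, and page_query are placeholders invented for this sketch, so consult the SDK reference for the real calls and signatures.

```python
from dataclasses import dataclass, field


@dataclass
class FakeClient:
    """In-memory stand-in for an SDK client. Every method name is a placeholder."""

    calls: list = field(default_factory=list)

    def create_session(self) -> str:
        self.calls.append("create_session")
        return "session-1"

    def create_window(self, session_id: str, url: str) -> str:
        self.calls.append("create_window")
        return "window-1"

    def page_query(self, session_id: str, window_id: str, prompt: str) -> str:
        self.calls.append("page_query")
        return f"(placeholder response to: {prompt})"


def query_page(client, url: str, prompt: str) -> str:
    """Create a session, open a window on `url`, then query the loaded page."""
    session_id = client.create_session()
    window_id = client.create_window(session_id, url)
    return client.page_query(session_id, window_id, prompt)


client = FakeClient()
answer = query_page(
    client,
    "https://example.com/careers",
    "Is there an opening for a data engineer?",
)
```

The stub returns a placeholder string rather than a real model response; it only shows that a query needs both a session and a window that has already loaded the URL.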
Example output:
Paginated Results
If you’re scraping a paginated page, Airtop will automatically handle pagination for you. You just need to pass the followPaginationLinks: true option and specify the number of pages or results you want to scrape.
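As a rough sketch, a paginated query might pair a prompt that states an explicit limit with the followPaginationLinks option. Only the option name comes from the text above. The helper name build_paginated_query, the configuration key, the overall dictionary shape, and the choice to state the limit in the prompt are assumptions made for illustration.

```python
def build_paginated_query(goal: str, max_results: int) -> dict:
    """Pair a prompt that states an explicit stopping point with the pagination option."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    prompt = (
        f"{goal} "
        f"Follow the pagination links and stop after collecting {max_results} results."
    )
    return {
        "prompt": prompt,
        # Option name taken from the text above; where it sits in the real
        # request is an assumption of this sketch.
        "configuration": {"followPaginationLinks": True},
    }


request = build_paginated_query(
    "List the title of each job posting on this careers page.", 30
)
```

The snippet only assembles the request; it does not contact Airtop or produce any query output.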
Example output:
Using JSON Schemas
You can use JSON schemas to guide the AI’s response and force it to return JSON. This can be useful if you want a structured response that is more suitable for automated processing.
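For illustration, a small draft-07 schema for the job-opening example might look like the sketch below. The property names hasOpening and matchingTitles are made up for this example, and how the serialized schema is handed to pageQuery is not shown here, so check the SDK reference for the exact parameter.

```python
import json

# Draft-07 schema describing the structured answer we want back.
job_search_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "hasOpening": {
            "type": "boolean",
            "description": "True if the page lists at least one matching job opening.",
        },
        "matchingTitles": {
            "type": "array",
            "description": "Titles of the matching openings, exactly as shown on the page.",
            "items": {"type": "string"},
        },
    },
    # Only the yes/no answer is mandatory; titles may be unavailable.
    "required": ["hasOpening"],
    "additionalProperties": False,
}

# Serialize the schema so it can be sent along with the prompt.
schema_text = json.dumps(job_search_schema, indent=2)
```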
Prompting Tips
Like any LLM-based tool, the quality of the results depends heavily on the quality of the prompt. Here are some tips to get the best results:
Basic Prompting Tips
- Provide the AI with some context by telling it a little bit about the web page or content it’s looking at.
- Be clear about your goals and what you want the AI to do.
- If completing the request requires loading more content than is initially visible (e.g. paginated results, infinite scrolling, or “Load More” controls), be sure to include a clear limit on when the AI should stop. It can also be helpful to be explicit about how more content should be loaded.
- Include a draft-07 JSON schema if you want a structured response that is more suitable for automated processing.
- Include a few guiding examples of how you would like the AI to respond in different scenarios. Good examples can significantly improve effectiveness and consistency.
- If you feel like the LLM isn’t being as diligent as you’d like about evaluating a particular content field, try explicitly including that field in your output schema even if you don’t need it. This tends to force the LLM to pay more attention.
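One way to apply these tips consistently is to assemble the prompt from labeled parts. The sketch below is a hypothetical helper, not part of any SDK: the function name build_prompt and the Context/Goal/Limit labels are illustrative choices, and the bookstore page is an invented example.

```python
def build_prompt(context: str, goal: str, limit: str, examples: list[str]) -> str:
    """Assemble a prompt from context, goal, stopping limit, and guiding examples."""
    lines = [
        f"Context: {context}",
        f"Goal: {goal}",
        f"Limit: {limit}",
    ]
    if examples:
        lines.append("Examples of the expected response:")
        lines.extend(f"- {example}" for example in examples)
    return "\n".join(lines)


prompt = build_prompt(
    context="This is the search results page of an online bookstore.",
    goal="Return the title and price of each book listed.",
    limit="Click the 'Load More' button until at least 20 books are visible, then stop.",
    examples=['{"title": "Example Book", "price": "12.99 USD"}'],
)
```

Keeping the limit as its own part makes it harder to forget, and it states both when to stop and how more content should be loaded.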
Tips for using JSON Schemas:
- Take advantage of the description fields to give the AI additional clarity and instructions about how to populate the property, and consider even adding examples (especially if you would prefer certain formatting).
- Don’t mark a property as required unless you’re certain it will always be possible to provide. If the AI feels compelled to provide data that doesn’t exist, it’s very likely to hallucinate it.
- Be sure to include a valid way for the AI to report back failure if it cannot fulfill the objective. If your schema does not allow the AI to report failure and something happens, it may feel compelled to return a natural language response instead, or even hallucinate results in order to honor the schema and your request to use it for responses.
- You can sometimes use schema constraints to guide the AI response. For example, if you find that it includes an empty string or array for an optional property when unavailable, and you’d rather see that property omitted instead, you can add a constraint of minLength: 1 (or minItems: 1 for an array). Of course, make sure those properties are not marked as required.
- Most major LLMs are quite good at generating JSON schemas from examples (or even natural language descriptions) if you’d rather not write them by hand.
- Note that some JSON schema features are not supported by the structured outputs API. For example, the oneOf keyword is not supported. If you receive an error that the AI response does not match the output schema, you may need to revise your schema.
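The schema tips above can be combined in a single sketch. The property names (products, name, price, error) are invented for a product-page example, and find_keyword is a hypothetical helper for spotting a keyword such as oneOf before sending a schema; it is not an Airtop feature.

```python
product_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "description": "Products found on the page, in page order.",
            "items": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Product name as displayed on the page.",
                    },
                    "price": {
                        "type": "string",
                        "description": "Price with currency code, e.g. '19.99 USD'.",
                        # Nudges the model to omit the property rather than return "".
                        "minLength": 1,
                    },
                },
                # `price` is deliberately optional: not every listing shows one.
                "required": ["name"],
            },
        },
        "error": {
            "type": "string",
            "description": "If the request cannot be fulfilled, explain why here.",
            "minLength": 1,
        },
    },
    # No top-level `required` list, so the model may return only `error`.
}


def find_keyword(node, keyword: str, path: str = "$") -> list[str]:
    """Return the path of every dictionary key equal to `keyword`.

    A property that happens to be named like the keyword is reported too.
    """
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == keyword:
                hits.append(child)
            hits.extend(find_keyword(value, keyword, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_keyword(item, keyword, f"{path}[{index}]"))
    return hits


unsupported = find_keyword(product_schema, "oneOf")
```

An empty result means the schema does not use oneOf anywhere; a non-empty list shows where the schema would need revising.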