Find LinkedIn profiles of Y Combinator companies’ employees
Overview
This recipe demonstrates how to use Airtop to automate finding talent from companies accepted into Y Combinator batches. It’s a great example of how to use Airtop’s APIs to create a multi-step process that can execute sequentially or in parallel.
The instructions below walk you through creating a script that connects to Airtop, gets the list of YC batches, prompts the user to select one, gets the companies in that batch, opens a web browser so the user can log in to LinkedIn, and collects the profiles of employees linked to those companies.
Demo
A live demo of this recipe is available here. You can sign up to create an API key for free and try it out yourself!
Prerequisites
To get started, ensure you have:
- Node.js installed on your system.
- PNPM package manager installed. See here for installation steps.
- A Node version manager (preferably NVM).
- An Airtop API key. You can get one for free.
Getting Started
- Clone the repository
Start by cloning the source code from GitHub:
- Install dependencies
Run the following command to install the necessary dependencies, including the Airtop SDK:
Running the Script
To run the script, go to the examples/yc-batch-company-employees directory and run the following command in your terminal:
Script walkthrough
The script executes the following tasks in order:
- Initializes the Airtop client
First, we initialize the AirtopClient using your provided API key. This client will be used to create browser sessions and interact with the page content.
Under the hood, the AirtopService uses the AirtopClient to interact with the Airtop API.
- Create a Browser Session
Creating a browser session allows us to connect to and control a cloud-based browser. The API accepts an optional profileName parameter, which can be used to reuse a user’s previously provided sign-in credentials. If no profileName is given, a fresh session is created and the profile will be saved on session termination.
The profile name is required for this recipe, because the script later needs a profile that is logged in to LinkedIn to extract employee information.
- Connect to the Browser
The script creates a window in the cloud browser and opens the provided link (in this case, the YC companies page).
- Uses Airtop’s pageQuery API to get the list of batches to filter companies from
You can use the pageQuery method to interact with a page using LLMs. You might want to use this method to scrape a page for specific information, or even ask a more general question about the page. In this case, we specify a prompt to extract the list of batches from the YC page and return it in a JSON format for later processing.
For a great dev experience and improved reliability, the pageQuery API allows the separation of the prompt and the output schema. Refer to this page for prompting tips.
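As an illustration of keeping the prompt and the output schema separate, here is a minimal sketch. The prompt wording, the constant names, and the schema fields (batches, error) are assumptions made for this example and are not taken from the recipe's source. How the schema is passed to pageQuery depends on the SDK version, so check the API reference before relying on it.

```typescript
// Hypothetical prompt: describes WHAT to extract, not the output shape.
const BATCHES_PROMPT: string =
  "List every batch shown in the batch filter of this page. " +
  "If no batches are visible, set `error` to a short explanation.";

// Hypothetical JSON Schema: describes the SHAPE of the answer.
// Field names are illustrative assumptions.
const BATCHES_OUTPUT_SCHEMA = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    batches: { type: "array", items: { type: "string" } },
    error: { type: "string" },
  },
  required: ["batches"],
  additionalProperties: false,
};

// Serialized form, convenient when an API expects the schema as a string
// or as a plain JSON value.
const serializedBatchesSchema: string = JSON.stringify(BATCHES_OUTPUT_SCHEMA);
```

Keeping the two apart means the prompt can be reworded without touching the parsing code, and the schema can be tightened without rewriting the prompt.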
- Uses the pageQuery API to get the list of companies for the selected batch
Similar to the previous step, we define a prompt and a schema to fetch the list of companies for the selected batch page in YC. Defining a good prompt and JSON schema allows us to implement good error handling and parsing logic.
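The error handling and parsing logic mentioned above could look something like the following sketch. It assumes the model's answer arrives as a JSON string with a companies array of { name, link } objects and an optional error string. That shape and the function name parseCompaniesResponse are hypothetical and chosen only for illustration.

```typescript
interface Company {
  name: string;
  link: string;
}

type ParseResult =
  | { ok: true; companies: Company[] }
  | { ok: false; error: string };

// Parses a raw model response defensively: never throws, always returns
// either the valid companies or a human-readable error.
function parseCompaniesResponse(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "Response is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "Response is not a JSON object" };
  }
  const record = data as Record<string, unknown>;
  // An `error` field reported by the model takes precedence (assumed convention).
  if (typeof record.error === "string" && record.error.length > 0) {
    return { ok: false, error: record.error };
  }
  if (!Array.isArray(record.companies)) {
    return { ok: false, error: "Missing `companies` array" };
  }
  // Keep only entries that have both string fields; skip malformed ones.
  const companies: Company[] = [];
  for (const item of record.companies) {
    if (typeof item === "object" && item !== null) {
      const { name, link } = item as Record<string, unknown>;
      if (typeof name === "string" && typeof link === "string") {
        companies.push({ name, link });
      }
    }
  }
  return { ok: true, companies };
}
```

Returning a tagged result instead of throwing lets the caller decide whether to retry the query, skip the batch, or stop.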
- Creates parallel windows to fetch the company’s LinkedIn profile URL from each company’s YC page
To process different pages quickly in parallel, we can use Airtop’s batchOperate API to execute a set of instructions concurrently. In this case, we process each company’s page to extract its LinkedIn URL using Airtop’s pageQuery API. Internally, the batchOperate API creates sessions and windows for each company. You can read more about the batchOperate API here.
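To make the idea of bounded parallel processing concrete, here is a generic sketch of a concurrency-limited map. This is not the batchOperate API and does not reproduce its behavior. It is a plain TypeScript illustration of the pattern, and the names mapWithConcurrency and Settled are hypothetical.

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; reason: unknown };

// Runs `worker` over `items` with at most `limit` calls in flight.
// Results keep the order of `items`; a failure in one item does not
// stop the others.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item (safe: JS is single-threaded)
      try {
        results[index] = { ok: true, value: await worker(items[index] as T) };
      } catch (reason) {
        results[index] = { ok: false, reason };
      }
    }
  }

  // Start up to `limit` runners that pull from the shared queue.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) {
    runners.push(run());
  }
  await Promise.all(runners);
  return results;
}
```

In a script like this one, the worker would be the per-company step (open the page, query it, return the LinkedIn URL), and the limit would keep the number of simultaneous windows under control.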
- Checks if the user is logged in to LinkedIn and, if not, provides a Live View URL to the user
To extract company and employee information from LinkedIn, we need a session with login credentials. The script checks whether the user is logged in to LinkedIn. If not, it provides a Live View URL so the user can log in. The code also saves the profile by terminating the session so that we can reuse the profile in the next steps.
- Create a set of sequential windows to scrape the links from the employees list
The script uses Airtop’s scrapeContent API to get the relevant content of a web page as plain text. This is useful because a company profile page is very predictable, so LLM capabilities might not be needed. scrapeContent is also a faster API than pageQuery.
After getting the page content as text, the script uses a regular expression to find and extract the links from the scraped content.
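The link extraction step can be sketched as follows. This is an illustration under stated assumptions, not the recipe's actual pattern: the regular expression, the function name extractLinks, and the optional hostname filter are all hypothetical.

```typescript
// Extracts unique http(s) URLs from plain text. If `hostname` is given,
// only URLs on that host or one of its subdomains are kept.
function extractLinks(text: string, hostname?: string): string[] {
  // Match until whitespace, quotes, angle brackets, parentheses, or brackets.
  const urlPattern = /https?:\/\/[^\s"'<>()\[\]]+/g;
  const seen = new Set<string>();
  for (const match of text.match(urlPattern) ?? []) {
    // Drop sentence punctuation that was glued to the end of the URL.
    const cleaned = match.replace(/[.,;:!?]+$/, "");
    try {
      const url = new URL(cleaned);
      const hostMatches =
        !hostname ||
        url.hostname === hostname ||
        url.hostname.endsWith("." + hostname);
      if (hostMatches) {
        seen.add(cleaned);
      }
    } catch {
      // Not a parseable URL; ignore it.
    }
  }
  return Array.from(seen);
}

const sample =
  "People: https://www.linkedin.com/company/acme/people/, " +
  "site: https://example.com/x. " +
  "Again: https://www.linkedin.com/company/acme/people/";
const linkedInLinks = extractLinks(sample, "linkedin.com");
```

Validating each match with the URL constructor and de-duplicating with a Set keeps the later steps from visiting the same page twice or failing on a malformed link.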
- Fetches the employee’s profile URL from each of the employees list URLs using Airtop’s batchOperate API for parallel processing
Similar to previous steps, we use Airtop’s batchOperate API and scrapeContent API to get the employee’s profile URL from each of the employees list URLs.
- Clean up
At the end, the script cleans up by closing all sessions that were used during execution.
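One way to structure this kind of cleanup is sketched below. The terminate callback is a stand-in for whatever call actually closes a session in your code. It is injected as a parameter so that the sketch makes no assumption about the SDK's method names, and terminateAll itself is a hypothetical helper.

```typescript
// Terminates every session id once, even if some terminations fail.
// Returns the ids that could not be terminated so the caller can log
// or retry them.
async function terminateAll(
  sessionIds: readonly string[],
  terminate: (id: string) => Promise<unknown>,
): Promise<string[]> {
  const uniqueIds = Array.from(new Set(sessionIds));
  const failed: string[] = [];
  await Promise.all(
    uniqueIds.map(async (id) => {
      try {
        await terminate(id);
      } catch {
        failed.push(id); // keep going: one failure must not block the rest
      }
    }),
  );
  return failed;
}
```

Calling a helper like this from a finally block helps ensure sessions are closed even when an earlier step throws.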
Summary
This recipe demonstrates how Airtop can be used to automate tasks that require multi-page and parallel page flows. It uses several of Airtop’s APIs to scrape different pages and provides an example of session and window management.