Find LinkedIn profiles of Y Combinator companies’ employees

Overview

This recipe demonstrates how to use Airtop to automate finding talent from companies accepted into Y Combinator batches. It’s a great example of how to use Airtop’s APIs to create a multi-step process that can execute sequentially or in parallel.

The instructions below walk you through creating a script that connects to Airtop, gets the list of YC batches, prompts the user to select a batch, gets the list of companies in that batch, opens a web browser to log in to LinkedIn, and gets the employee profiles linked to the companies previously identified.

Demo

A live demo of this recipe is available here. You can sign up to create an API key for free and try it out yourself!

Prerequisites

To get started, ensure you have:

  • Node.js installed on your system.
  • PNPM package manager installed. See here for installation steps.
  • A Node version manager (preferably NVM).
  • An Airtop API key. You can get one for free.

Getting Started

  1. Clone the repository

    Start by cloning the source code from GitHub:

    $ git clone https://github.com/airtop-ai/examples-typescript
    $ cd examples-typescript/examples/yc-batch-company-employees

  2. Install dependencies

    Run the following command to install the necessary dependencies, including the Airtop SDK:

    $ pnpm install

Running the Script

To run the script, go to the examples/yc-batch-company-employees directory and run the following command in your terminal:

$ pnpm run cli

Script walkthrough

The script executes the following tasks in order:

  1. Initializes the Airtop client

First, we initialize the AirtopClient using your provided API key. This client will be used to create browser sessions and interact with the page content.

const airtop = new AirtopService({ apiKey, log });

Under the hood, the AirtopService uses the AirtopClient to interact with the Airtop API.

this.client = new AirtopClient({ apiKey });

  2. Creates a browser session

Creating a browser session allows us to connect to and control a cloud-based browser. The API accepts an optional profileName parameter, which can be used to reuse a user’s previously provided sign-in credentials. If no profileName is given, a fresh session is created and the profile will be saved on session termination.

The profile name is required for this recipe. This is because the script will need a profile logged into LinkedIn later to extract employee information.

/**
 * Creates a new session.
 * @param profileName - The name of the profile to use for the session
 * @returns The created session
 */
async createSession(profileName?: string): Promise<SessionResponse> {
  const session = await this.client.sessions.create({
    configuration: {
      timeoutMinutes: 15, // Terminate the session after 15 mins of inactivity
      profileName,
    },
  });

  this.savedSessions.set(session.data.id, session);

  return session;
}

  3. Connects to the browser

The script creates a window in the cloud browser that opens the provided URL (in this case, the YC companies page).

/**
 * Creates a new window.
 * @param sessionId - The ID of the session to create the window in
 * @param url - The URL to navigate to
 * @returns The created window
 */
async createWindow(sessionId: string, url: string): Promise<WindowIdResponse> {
  const window = await this.client.windows.create(sessionId, { url });

  this.savedWindows.set(window.data.windowId, { sessionId, windowId: window.data.windowId });

  return window;
}

  4. Uses Airtop’s pageQuery API to get the list of batches used to filter companies

You can use the pageQuery method to interact with a page using LLMs. You might want to use this method to scrape a page for specific information, or even ask a more general question about the page. In this case, we specify a prompt to extract the list of batches from the YC page and return it in a JSON format for later processing.

/**
 * Gets the Y Combinator batches from the Y Combinator Company Directory page.
 * @param {string} [sessionId] - The ID of an existing session; a new session is created if omitted
 * @returns {Promise<string[]>} A promise that resolves to an array of batches
 */
async getYcBatches(sessionId?: string): Promise<string[]> {
  this.log.info("Initiating fetch to get YC batches");

  // Get session info if provided, otherwise create a new session
  const session = sessionId
    ? await this.airtop.client.sessions.getInfo(sessionId)
    : await this.airtop.createSession();

  // YC Company Directory window
  const window = await this.airtop.client.windows.create(session.data.id, {
    url: YC_COMPANIES_URL,
  });

  this.log.info("Extracting YC batches");
  // Extract the batches from the YC Company Directory page
  const modelResponse = await this.airtop.client.windows.pageQuery(session.data.id, window.data.windowId, {
    prompt: GET_YC_BATCHES_PROMPT,
    configuration: {
      outputSchema: GET_YC_BATCHES_OUTPUT_SCHEMA,
    },
  });

  if (!modelResponse.data.modelResponse || modelResponse.data.modelResponse === "") {
    throw new Error("No batches found");
  }

  const response = JSON.parse(modelResponse.data.modelResponse) as GetYcBatchesResponse;

  if (response.error) {
    throw new Error(response.error);
  }

  this.log
    .withMetadata({
      batches: response.batches,
    })
    .info("Successfully fetched YC batches");

  return response.batches;
}

To improve reliability and the developer experience, the pageQuery API lets you specify the prompt and the output schema separately. Refer to this page for prompting tips.

/**
 * Prompt to get the list of batches from the Y Combinator Companies page
 */
export const GET_YC_BATCHES_PROMPT = `
You are looking at a startup directory page from Y Combinator. The startups are organized based on industry, region, company size, batch, and others.
Your task is to extract the list of batches from the page.
Batches follow the format: [Season][Year] where:
- Season is F (Fall), S (Spring), or W (Winter)
- Year is a 2-digit number (e.g., 24, 23, 22)
Examples: F24, S24, W24

Only include batches that appear in the dedicated batch filter/selection section of the page.
Return an empty array if no valid batches are found.`;

const GET_YC_BATCHES_SCHEMA = baseSchema.extend({
  batches: z.array(z.string()),
});

export type GetYcBatchesResponse = z.infer<typeof GET_YC_BATCHES_SCHEMA>;

export const GET_YC_BATCHES_OUTPUT_SCHEMA = zodToJsonSchema(GET_YC_BATCHES_SCHEMA);
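
Because the prompt above fixes the batch format (F, S, or W followed by a two-digit year), the parsed model output can be checked again in plain code before it is shown to the user. The following is a minimal sketch, not part of the recipe: parseBatches and BATCH_CODE_PATTERN are hypothetical names, and the optional error field is assumed to come from baseSchema, as the error checks in the recipe code suggest.

```typescript
// Shape of the JSON the model is asked to return.
// Assumption: the optional `error` field is contributed by baseSchema.
interface BatchesPayload {
  batches: string[];
  error?: string;
}

// Batch codes as described in the prompt: F, S, or W followed by a two-digit year.
const BATCH_CODE_PATTERN = /^[FSW]\d{2}$/;

// Hypothetical helper: parses the raw model response and keeps only
// well-formed, unique batch codes, in the order they first appear.
function parseBatches(rawModelResponse: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawModelResponse);
  } catch {
    throw new Error("Model response is not valid JSON");
  }

  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model response is not a JSON object");
  }

  const payload = parsed as Partial<BatchesPayload>;
  if (typeof payload.error === "string" && payload.error !== "") {
    throw new Error(payload.error);
  }
  if (!Array.isArray(payload.batches)) {
    throw new Error("Model response has no 'batches' array");
  }

  const valid = payload.batches
    .filter((batch) => typeof batch === "string")
    .map((batch) => batch.trim())
    .filter((batch) => BATCH_CODE_PATTERN.test(batch));

  return Array.from(new Set(valid));
}
```

A caller could pass modelResponse.data.modelResponse to this helper in place of the bare JSON.parse call, at the cost of silently dropping any batch label that does not match the expected format.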

  5. Uses the pageQuery API to get the list of companies for the selected batch

Similar to the previous step, we define a prompt and a schema to fetch the list of companies for the selected batch page in YC. Defining a good prompt and JSON schema allows us to implement good error handling and parsing logic.

/**
 * Prompt to get the list of companies from the Y Combinator Companies page
 */
export const GET_COMPANIES_IN_BATCH_PROMPT = `
You are looking at a startup directory page from Y Combinator that has been filtered by batch.
Your task is to extract information for the companies listed on the page.

For each company, extract:
1. Name (required): The company name exactly as shown (e.g., "Airtop")
2. Location (optional): The full location if provided (e.g., "San Francisco, CA")
3. Link (required): The full Y Combinator company URL (e.g., "https://ycombinator.com/companies/airtop")

Important:
- Only extract companies that have both a name and a valid YC link
- Return an empty array if no valid companies are found
`;

const GET_COMPANIES_IN_BATCH_SCHEMA = baseSchema.extend({
  companies: z.array(
    z.object({
      name: z.string(),
      location: z.string().optional(),
      link: z.string(),
    }),
  ),
});

export const GET_COMPANIES_IN_BATCH_OUTPUT_SCHEMA = zodToJsonSchema(GET_COMPANIES_IN_BATCH_SCHEMA);
export type Company = z.infer<typeof GET_COMPANIES_IN_BATCH_SCHEMA>["companies"][number];
export type GetCompaniesInBatchResponse = z.infer<typeof GET_COMPANIES_IN_BATCH_SCHEMA>;


/**
 * Gets the companies in a given Y Combinator batch.
 * @param {string} batch - The batch to get companies for
 * @param {string} [sessionId] - The ID of an existing session; a new session is created if omitted
 * @returns {Promise<Company[]>} A promise that resolves to an array of companies
 */
async getCompaniesInBatch(batch: string, sessionId?: string): Promise<Company[]> {
  this.log.info(`Initiating fetch to get companies in Y Combinator batch "${batch}"`);

  const session = sessionId
    ? await this.airtop.client.sessions.getInfo(sessionId)
    : await this.airtop.createSession();

  // YC Company Directory window
  const window = await this.airtop.client.windows.create(session.data.id, {
    url: `${YC_COMPANIES_URL}?batch=${batch}`,
  });

  this.log.info(`Extracting companies in batch "${batch}"`);
  const modelResponse = await this.airtop.client.windows.pageQuery(session.data.id, window.data.windowId, {
    prompt: GET_COMPANIES_IN_BATCH_PROMPT,
    configuration: {
      outputSchema: GET_COMPANIES_IN_BATCH_OUTPUT_SCHEMA,
    },
  });

  if (!modelResponse.data.modelResponse || modelResponse.data.modelResponse === "") {
    throw new Error("No companies found");
  }

  const response = JSON.parse(modelResponse.data.modelResponse) as GetCompaniesInBatchResponse;

  if (response.error) {
    throw new Error(response.error);
  }

  this.log
    .withMetadata({
      companies: response.companies,
    })
    .info("Successfully fetched companies in batch");

  return response.companies;
}
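
The prompt asks the model to return only companies with a name and a valid YC link, and the batch value is interpolated directly into the directory URL. Both points can be backed by small checks in code. The sketch below is illustrative only: buildBatchUrl, keepValidCompanies, and CompanyRecord are hypothetical names, and ASSUMED_YC_COMPANIES_URL is a stand-in because the value of the recipe’s YC_COMPANIES_URL constant is not shown in this walkthrough.

```typescript
// Assumption: stand-in for the recipe's YC_COMPANIES_URL constant, whose value is not shown here.
const ASSUMED_YC_COMPANIES_URL = "https://www.ycombinator.com/companies";

// Hypothetical helper: builds the batch-filtered directory URL with the batch value URL-encoded.
function buildBatchUrl(baseUrl: string, batch: string): string {
  return `${baseUrl}?batch=${encodeURIComponent(batch)}`;
}

// Mirrors the fields of the recipe's Company type.
interface CompanyRecord {
  name: string;
  location?: string;
  link: string;
}

// Accepts links shaped like the example in the prompt, with or without the "www." prefix.
const YC_COMPANY_LINK_PATTERN = /^https:\/\/(www\.)?ycombinator\.com\/companies\/[^\/\s?#]+$/;

// Hypothetical helper: keeps only the companies that have a non-empty name and a YC company link.
function keepValidCompanies(companies: CompanyRecord[]): CompanyRecord[] {
  return companies.filter(
    (company) => company.name.trim() !== "" && YC_COMPANY_LINK_PATTERN.test(company.link),
  );
}
```

Filtering after parsing means a model response that ignores the prompt’s rules produces a shorter list instead of failing later, when the invalid link is opened in a window.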

  6. Creates parallel windows to fetch each company’s LinkedIn profile URL from its YC page

For fast parallel processing of different pages, we can leverage Airtop’s batchOperate API to execute a set of instructions in parallel. In this case, we process each company’s page to extract its LinkedIn URL using Airtop’s pageQuery API. Internally, the batchOperate API creates sessions and windows for each company. You can read more about the batchOperate API here.

/**
 * Gets the LinkedIn profile URLs for a list of companies.
 * @param {Company[]} companies - The companies to get LinkedIn profile URLs for
 * @returns {Promise<BatchOperationUrl[]>} A promise that resolves to an array of LinkedIn profile URLs
 */
async getCompaniesLinkedInProfileUrls(companies: Company[]): Promise<BatchOperationUrl[]> {
  const companyUrls: BatchOperationUrl[] = companies.map((c) => ({ url: c.link }));

  const getProfileUrl = async (input: BatchOperationInput): Promise<BatchOperationResponse<BatchOperationUrl>> => {
    this.log.info(`Scraping for LinkedIn profile URL ${input.operationUrl.url}`);
    const modelResponse = await this.airtop.client.windows.pageQuery(input.sessionId, input.windowId, {
      prompt: GET_COMPANY_LINKEDIN_PROFILE_URL_PROMPT,
      configuration: {
        outputSchema: GET_COMPANY_LINKEDIN_PROFILE_URL_OUTPUT_SCHEMA,
      },
    });

    if (!modelResponse.data.modelResponse || modelResponse.data.modelResponse === "") {
      throw new Error("No LinkedIn profile URL found");
    }

    const response = JSON.parse(modelResponse.data.modelResponse) as GetCompanyLinkedInProfileUrlResponse;

    if (response.error) {
      throw new Error(response.error);
    }

    if (!response.linkedInProfileUrl) {
      throw new Error("Failed to parse LinkedIn profile URL");
    }

    return {
      data: { url: response.linkedInProfileUrl },
    };
  };

  const handleError = async ({ error, operationUrls }: BatchOperationError) => {
    this.log.withError(error).withMetadata({ operationUrls }).error("Error extracting LinkedIn profile URL");
  };

  this.log.info("Getting LinkedIn profile URLs for companies");
  const profileUrls = await this.airtop.client.batchOperate(companyUrls, getProfileUrl, { onError: handleError });

  this.log
    .withMetadata({
      // Log the extracted LinkedIn URLs rather than the input YC page URLs
      linkedInProfileUrls: profileUrls,
    })
    .info("Successfully fetched LinkedIn profile urls for the companies");

  return profileUrls.filter((url) => url.url !== null);
}
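
To make the idea of running one operation over many URLs in parallel more concrete, here is a small conceptual sketch of a concurrency-limited map with a per-item error callback. It is not Airtop’s implementation of batchOperate and does not call the Airtop API; mapWithConcurrency, maxConcurrency, and the shape of the onError argument are all assumptions made for illustration.

```typescript
// Hypothetical error callback: receives the failed item and the error that was thrown.
type OnItemError<I> = (failure: { item: I; error: unknown }) => void;

// Runs `worker` over `items` with at most `maxConcurrency` calls in flight.
// Failed items are reported through `onError` and left out of the result,
// which otherwise preserves the input order.
async function mapWithConcurrency<I, O>(
  items: I[],
  worker: (item: I) => Promise<O>,
  options: { maxConcurrency: number; onError?: OnItemError<I> },
): Promise<O[]> {
  const settled: Array<{ index: number; value: O }> = [];
  let nextIndex = 0;

  // Each lane repeatedly claims the next unprocessed index until none are left.
  const runLane = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      const item = items[index];
      try {
        settled.push({ index, value: await worker(item) });
      } catch (error) {
        options.onError?.({ item, error });
      }
    }
  };

  const laneCount = Math.max(1, Math.min(options.maxConcurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));

  return settled.sort((a, b) => a.index - b.index).map((entry) => entry.value);
}
```

Reporting failures through a callback, as the recipe’s handleError does, lets one broken company page be logged and skipped without aborting the rest of the batch.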

  7. Checks whether the user is logged in to LinkedIn and, if not, provides a Live View URL to the user

To extract company and employee information from LinkedIn, we need a session with login credentials. The script checks whether the user is logged in to LinkedIn; if not, it provides a Live View URL where the user can log in. The code also saves the profile by terminating the session so that the profile can be reused in the next steps.

/**
 * Checks if the user is signed into LinkedIn
 * @param sessionId - The ID of the session
 * @returns Whether the user is signed into LinkedIn
 */
async checkIfSignedIntoLinkedIn(sessionId: string): Promise<boolean> {
  this.log.info("Checking if user is signed into LinkedIn");
  const window = await this.airtop.createWindow(sessionId, LINKEDIN_FEED_URL);

  const modelResponse = await this.airtop.client.windows.pageQuery(sessionId, window.data.windowId, {
    prompt: IS_LOGGED_IN_PROMPT,
    configuration: {
      outputSchema: IS_LOGGED_IN_OUTPUT_SCHEMA,
    },
  });

  if (!modelResponse.data.modelResponse || modelResponse.data.modelResponse === "") {
    throw new Error("No response from LinkedIn");
  }

  const response = JSON.parse(modelResponse.data.modelResponse) as IsLoggedInResponse;

  return response.isLoggedIn;
}


/**
 * Gets the LinkedIn login page Live View URL
 * @param sessionId - The ID of the session
 * @returns The LinkedIn login page Live View URL
 */
async getLinkedInLoginPageLiveViewUrl(sessionId: string): Promise<string> {
  const linkedInWindow = await this.airtop.createWindow(sessionId, LINKEDIN_FEED_URL);

  const windowInfo = await this.airtop.client.windows.getWindowInfo(sessionId, linkedInWindow.data.windowId);

  return windowInfo.data.liveViewUrl;
}
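
The two helpers above could be combined into a single login flow along the following lines. This is a sketch under stated assumptions, not the recipe’s actual wiring: LinkedInLoginDeps, ensureLinkedInLogin, and waitForUserConfirmation are hypothetical, and the dependencies are passed in as an interface so the flow can be exercised without a real browser session.

```typescript
// Hypothetical dependency interface. The first two method names mirror the recipe's helpers;
// waitForUserConfirmation and the overall wiring are assumptions made for this sketch.
interface LinkedInLoginDeps {
  checkIfSignedIntoLinkedIn(sessionId: string): Promise<boolean>;
  getLinkedInLoginPageLiveViewUrl(sessionId: string): Promise<string>;
  waitForUserConfirmation(message: string): Promise<void>; // e.g. a CLI prompt
  terminateSession(sessionId: string): Promise<void>;
}

// Returns true if the user had to be prompted to log in,
// or false if the profile was already logged in.
async function ensureLinkedInLogin(deps: LinkedInLoginDeps, sessionId: string): Promise<boolean> {
  if (await deps.checkIfSignedIntoLinkedIn(sessionId)) {
    return false;
  }

  // Hand the Live View URL to the user so they can log in inside the cloud browser.
  const liveViewUrl = await deps.getLinkedInLoginPageLiveViewUrl(sessionId);
  await deps.waitForUserConfirmation(`Log in to LinkedIn at ${liveViewUrl}, then confirm to continue.`);

  // Terminating the session is what saves the profile for reuse in later steps.
  await deps.terminateSession(sessionId);
  return true;
}
```

Later steps would then start new sessions with the same profileName, which is why the profile name is required for this recipe.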

  8. Creates a set of sequential windows to scrape the links from the employees list

The script uses Airtop’s scrapeContent API to get the relevant content of a web page as plain text. This is useful because a company profile page has a predictable layout, so LLM capabilities might not be needed. scrapeContent is also faster than pageQuery.

After getting the page content as text, the script uses a regular expression to find the links in the scraped content.

/**
 * Extracts the LinkedIn employees search URL from the given text
 * @param text - The text to extract the LinkedIn employees search URL from
 * @returns The LinkedIn employees search URL or null if not found
 */
const extractEmployeeListUrl = (text: string): string | null => {
  // Pattern to match LinkedIn employees search URLs
  const pattern = /https:\/\/www\.linkedin\.com\/search\/results\/people\/\?[^"\s]*/g;

  // Find all search URLs
  const matches = text.match(pattern);

  // Filter to only include URLs with currentCompany parameter
  const employeesUrl = matches?.find((url) => url.includes('currentCompany='));

  return employeesUrl || null;
};


/**
 * Gets the LinkedIn employees list URLs for a given set of companies
 * @param companyLinkedInProfileUrls - The list of company LinkedIn profile URLs
 * @param profileName - The name of the profile that is logged into LinkedIn
 * @returns The list of LinkedIn employees list URLs
 */
async getEmployeesListUrls({
  companyLinkedInProfileUrls,
  profileName,
}: {
  companyLinkedInProfileUrls: BatchOperationUrl[];
  profileName: string;
}): Promise<BatchOperationUrl[]> {
  this.log.info("Attempting to get the list of employees for the companies");

  const getEmployeesListUrl = async (input: BatchOperationInput): Promise<BatchOperationResponse<string>> => {
    const scrapedContent = await this.airtop.client.windows.scrapeContent(input.sessionId, input.windowId);

    // Call the helper defined above directly, since it is shown here as a standalone function
    const url = extractEmployeeListUrl(scrapedContent.data.modelResponse.scrapedContent.text);

    if (!url) {
      throw new Error("No employees list URL found");
    }

    return {
      data: url,
    };
  };

  const handleError = async ({ error, operationUrls, liveViewUrl }: BatchOperationError) => {
    this.log
      .withError(error)
      .withMetadata({
        liveViewUrl,
        operationUrls,
      })
      .error("Error extracting employees list URL for company LinkedIn profile.");
  };

  const employeesListUrls = await this.airtop.client.batchOperate(companyLinkedInProfileUrls, getEmployeesListUrl, {
    onError: handleError,
    sessionConfig: {
      profileName, // Profile logged into LinkedIn
    },
  });

  this.log
    .withMetadata({
      employeesListUrls,
    })
    .info("Successfully fetched employee list URLs for the companies");

  // Filter out any null values and remove duplicates. Deduplicate the URL strings
  // before wrapping them, because a Set compares objects by reference.
  const uniqueUrls = [...new Set(employeesListUrls.filter((url) => url !== null))];

  return uniqueUrls.map((url) => ({ url }));
}
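
As a quick illustration of the regex step, the snippet below runs the same extraction logic against a small piece of made-up scraped text. The sample text, the company ID, and the query parameters in it are hypothetical, and toUniqueOperationUrls is an illustrative helper that is not part of the recipe.

```typescript
// Same extraction logic as the recipe's helper, repeated so this sketch is self-contained.
const extractEmployeeListUrl = (text: string): string | null => {
  const pattern = /https:\/\/www\.linkedin\.com\/search\/results\/people\/\?[^"\s]*/g;
  const matches = text.match(pattern);
  const employeesUrl = matches?.find((url) => url.includes("currentCompany="));
  return employeesUrl || null;
};

// Hypothetical scraped text: the company ID and query parameters are made up.
const sampleScrapedText = [
  "Airtop | LinkedIn",
  "People search https://www.linkedin.com/search/results/people/?keywords=airtop",
  "See all employees https://www.linkedin.com/search/results/people/?currentCompany=%5B%2212345%22%5D&origin=COMPANY_PAGE_CANNED_SEARCH",
].join("\n");

// The keyword search link is skipped because it has no currentCompany parameter.
const employeesListUrl = extractEmployeeListUrl(sampleScrapedText);

// Illustrative helper: removes nulls and duplicate strings first, then wraps each URL.
// A Set of freshly created { url } objects would keep every entry, because objects
// are compared by reference.
function toUniqueOperationUrls(urls: Array<string | null>): Array<{ url: string }> {
  const unique = Array.from(new Set(urls.filter((u): u is string => u !== null)));
  return unique.map((url) => ({ url }));
}
```

Matching up to the first quote or whitespace character keeps the whole query string, including the encoded currentCompany value, which is what identifies the employees search for a specific company.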

  9. Fetches the employee profile URLs from each employees list URL, using Airtop’s batchOperate API for parallel processing

Similar to the previous steps, we use Airtop’s batchOperate and scrapeContent APIs to get the employee profile URLs from each of the employees list URLs.

/**
 * Gets the LinkedIn employees profile URLs for a list of LinkedIn employees list URLs
 * @param employeesListUrls - The list of LinkedIn employees list URLs
 * @param profileName - The name of the profile that is logged into LinkedIn
 * @returns The list of LinkedIn employees profile URLs
 */
async getEmployeesProfileUrls({
  employeesListUrls,
  profileName,
}: { employeesListUrls: BatchOperationUrl[]; profileName: string }): Promise<string[]> {
  this.log.info("Initiating extraction of employee's profile URLs for the employees");

  const getEmployeeProfileUrl = async (input: BatchOperationInput): Promise<BatchOperationResponse<string[]>> => {
    this.log.info(`Scraping content for employee URL: ${input.operationUrl.url}`);
    const scrapedContent = await this.airtop.client.windows.scrapeContent(input.sessionId, input.windowId);

    const newUrls = this.extractEmployeeProfileUrls(scrapedContent.data.modelResponse.scrapedContent.text);

    return {
      data: newUrls,
    };
  };

  const employeesProfileUrls = (
    await this.airtop.client.batchOperate(employeesListUrls, getEmployeeProfileUrl, {
      sessionConfig: {
        profileName, // Profile logged into LinkedIn
      },
    })
  ).flat();

  this.log
    .withMetadata({
      employeesProfileUrls,
    })
    .info("Successfully obtained employee profile URLs");

  return employeesProfileUrls;
}
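
The extractEmployeeProfileUrls helper called above is not shown in this walkthrough. One plausible shape for it, by analogy with the employees list URL extraction, is a regex over the scraped text. The sketch below is an assumption and not the recipe’s implementation: the function name carries a Sketch suffix to make that explicit, and the https://www.linkedin.com/in/ URL prefix is assumed to be what profile links look like in the scraped content.

```typescript
// Hypothetical sketch: collects unique LinkedIn member profile URLs from scraped text.
// Assumption: profile links in the scraped content start with https://www.linkedin.com/in/.
function extractEmployeeProfileUrlsSketch(text: string): string[] {
  // Match the profile slug only, stopping at a slash, query string, fragment,
  // quote, whitespace, or closing bracket.
  const pattern = /https:\/\/www\.linkedin\.com\/in\/[^"\s?#\/)\]]+/g;

  const matches = text.match(pattern) ?? [];

  // The same person usually appears several times on a results page, so deduplicate.
  return Array.from(new Set(matches));
}
```

Stopping at the first slash or question mark strips tracking parameters and trailing path segments, so links to the same profile collapse to a single canonical URL.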

  10. Cleans up

At the end, the script cleans up by terminating every session that was used during execution.

/**
 * Terminates a session.
 * @param sessionId - The ID of the session to terminate
 */
async terminateSession(sessionId: string | undefined): Promise<void> {
  if (!sessionId) {
    return;
  }

  this.log.debug(`Terminating session: ${sessionId}`);

  // Terminate the session
  await this.client.sessions.terminate(sessionId);
}
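
One way to make sure cleanup runs even when an earlier step throws is to wrap the work in try/finally. The sketch below shows that pattern under stated assumptions: runWithCleanup and SessionTerminator are hypothetical names, and the list of open session IDs is maintained by the caller instead of being read from the service.

```typescript
// Hypothetical minimal interface matching the terminateSession helper shown above.
interface SessionTerminator {
  terminateSession(sessionId: string | undefined): Promise<void>;
}

// Runs `task`, then terminates every session in `openSessionIds`,
// whether the task succeeded or threw.
async function runWithCleanup<T>(
  service: SessionTerminator,
  openSessionIds: string[],
  task: () => Promise<T>,
): Promise<T> {
  try {
    return await task();
  } finally {
    for (const sessionId of openSessionIds) {
      try {
        await service.terminateSession(sessionId);
      } catch {
        // Ignore termination failures so one bad session does not block the rest.
      }
    }
  }
}
```

The task adds each session ID to openSessionIds as it creates sessions, so the finally block sees every session that was opened before the failure.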

Summary

This recipe demonstrates how Airtop can be used to automate tasks that require multi-page and parallel page flows. It leverages several of Airtop’s APIs to scrape different pages and provides an example of session and window management.