Extract data from a LinkedIn profile

Overview

This recipe demonstrates how to extract data from a LinkedIn profile. It uses Airtop’s Live View, Profiles, and Page Query APIs to allow a user to log in to LinkedIn and extract data from their profile. Once logged in, an agent can re-use the user’s profile to log in and extract additional data from other LinkedIn profiles autonomously.

Demo

A live demo of this recipe is available here. You can sign up to create an API key for free and try it out yourself!

Running Locally

To get started, ensure you have:

  • Node.js installed on your system.
  • An Airtop API key. You can get one for free.

  1. Clone the repository and install dependencies:
```shell
git clone https://github.com/airtop-ai/examples-typescript.git
cd examples-typescript
pnpm i
turbo build:packages
cd examples/linkedin-data-extraction
pnpm i
```
  2. Run the CLI. This will run the agent on the command line and is useful for quickly testing the agent.
```shell
pnpm run cli
```
  3. (Optional) Run the web application. This will start the Next.js app that is demoed in the video above.
```shell
pnpm dev
```

Walkthrough

Although this recipe includes a fair amount of code for the CLI and the web app, the core logic is simple and lives entirely in the linkedin-extractor.service.ts file. The LinkedInExtractorService class contains all the agent-specific logic. It has methods for initializing a session, checking whether the user is signed in, extracting data from the user's profile, and terminating the session.
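This is a minimal sketch of how those methods might fit together in a driver. The driver creates a session and checks whether the user is signed in. If not, it shows the Live View URL and polls until sign-in succeeds, then extracts the data. The `ExtractorService` interface, `terminateSession`, `waitForSignIn`, `runExtraction`, and the polling parameters are all hypothetical. The method names follow LinkedInExtractorService, but the return shape is simplified, and the recipe's own CLI may wait for sign-in differently.

```typescript
// Hypothetical, simplified view of the service. Method names follow
// LinkedInExtractorService, but initializeSessionAndBrowser is flattened here to
// return plain IDs and the Live View URL instead of { session, windowInfo }.
// terminateSession is a hypothetical name for the class's terminate method.
interface ExtractorService {
  initializeSessionAndBrowser(profileId?: string): Promise<{ sessionId: string; windowId: string; liveViewUrl: string }>;
  checkIfSignedIntoLinkedIn(ids: { sessionId: string; windowId: string }): Promise<boolean>;
  extractLinkedInData(ids: { sessionId: string; windowId: string }): Promise<string>;
  terminateSession(sessionId: string): Promise<void>;
}

// Polls `isSignedIn` until it returns true or `maxAttempts` checks have failed.
async function waitForSignIn(
  isSignedIn: () => Promise<boolean>,
  intervalMs: number,
  maxAttempts: number,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await isSignedIn()) return true;
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false;
}

// Creates a session, waits for the user to sign in through Live View if needed,
// then extracts data. In the recipe, extractLinkedInData itself closes the window
// and terminates the session, so only the timeout path cleans up here.
async function runExtraction(
  service: ExtractorService,
  showLiveViewUrl: (url: string) => void,
  profileId?: string,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<string> {
  const { sessionId, windowId, liveViewUrl } = await service.initializeSessionAndBrowser(profileId);
  const ids = { sessionId, windowId };

  if (!(await service.checkIfSignedIntoLinkedIn(ids))) {
    showLiveViewUrl(liveViewUrl); // the user signs in to LinkedIn through this URL
    const signedIn = await waitForSignIn(() => service.checkIfSignedIntoLinkedIn(ids), intervalMs, maxAttempts);
    if (!signedIn) {
      await service.terminateSession(sessionId);
      throw new Error("User did not sign in before the polling limit");
    }
  }
  return service.extractLinkedInData(ids);
}
```

When a `profileId` from an earlier run is passed, the first check should already succeed. In that case the Live View URL is never shown and extraction runs without user involvement.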

Creating a new session and window

The following code creates a new session and browser window, and generates a Live View URL that the user opens to sign in to LinkedIn.

```typescript
// Method of the LinkedInExtractorService class: `this.client` is the Airtop client and
// `this.log` is a logger. LOGIN_URL is defined elsewhere in the recipe.
async initializeSessionAndBrowser(profileId?: string): Promise<{ session: any; windowInfo: any }> {
  this.log.info("Creating a new session");
  const createSessionResponse = await this.client.sessions.create({
    configuration: {
      timeoutMinutes: 10,
      persistProfile: !profileId, // Only persist a new profile if we do not have an existing profileId
      baseProfileId: profileId,
    },
  });

  const session = createSessionResponse.data;
  this.log.info("Created session", session.id);

  if (!session.cdpWsUrl) {
    throw new Error("Unable to get cdp url");
  }

  this.log.info("Creating browser window");
  const windowResponse = await this.client.windows.create(session.id, { url: LOGIN_URL });

  this.log.info("Getting browser window info");
  const windowInfo = await this.client.windows.getWindowInfo(session.id, windowResponse.data.windowId);

  return {
    session,
    windowInfo, // Includes the Live View URL, which is passed to the user to sign in
  };
}
```
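The `persistProfile`/`baseProfileId` pair is what lets a later run reuse the user's login. The sketch below pulls that rule out as a pure function. The names `buildSessionConfiguration` and `SessionConfiguration` are hypothetical; the field names and values are the ones passed to `sessions.create` above. On the first run there is no profile ID, so the profile the user signs in with is persisted. On later runs the saved profile ID is passed as the base profile, and no new profile is persisted.

```typescript
// Hypothetical helper: field names and values mirror the `configuration`
// object passed to `this.client.sessions.create` in the recipe.
interface SessionConfiguration {
  timeoutMinutes: number;
  persistProfile: boolean;
  baseProfileId?: string;
}

function buildSessionConfiguration(profileId?: string): SessionConfiguration {
  return {
    timeoutMinutes: 10,
    // First run (no saved profile): persist the profile the user signs in with.
    // Later runs: start from the saved profile and do not persist a new one.
    persistProfile: !profileId,
    baseProfileId: profileId,
  };
}
```

For example, `buildSessionConfiguration()` yields `persistProfile: true` with no base profile. `buildSessionConfiguration("saved-profile-id")` yields `persistProfile: false` with `baseProfileId: "saved-profile-id"`.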

Checking if a user is signed in

Checking if a user is signed in is simple and only requires a call to the Page Query API.

```typescript
// Method of the LinkedInExtractorService class.
async checkIfSignedIntoLinkedIn({ sessionId, windowId }: { sessionId: string; windowId: string }): Promise<boolean> {
  this.log.info("Determining whether the user is logged in...");
  const isLoggedInPromptResponse = await this.client.windows.pageQuery(sessionId, windowId, {
    prompt: IS_LOGGED_IN_PROMPT,
    configuration: {
      outputSchema: IS_LOGGED_IN_OUTPUT_SCHEMA,
    },
  });

  this.log.info("Parsing response to determine whether the user is logged in");
  const parsedResponse = JSON.parse(isLoggedInPromptResponse.data.modelResponse);

  // The model reports problems through the schema's `error` field
  if (parsedResponse.error) {
    throw new Error(parsedResponse.error);
  }

  return parsedResponse.isLoggedIn;
}
```
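The code above returns `parsedResponse.isLoggedIn` directly. A response that omits the field therefore yields `undefined` rather than a boolean. Below is a minimal sketch of a stricter parser for the `modelResponse` string. The function name `parseIsLoggedInResponse` is hypothetical. Like the code above, it assumes `modelResponse` holds the JSON text produced for the output schema.

```typescript
// Hypothetical stricter parser for the Page Query `modelResponse` string.
// Unlike a bare JSON.parse, it rejects invalid JSON and responses where
// `isLoggedIn` is missing or not a boolean.
function parseIsLoggedInResponse(modelResponse: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelResponse);
  } catch {
    throw new Error(`Model response is not valid JSON: ${modelResponse}`);
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model response is not a JSON object");
  }
  const { isLoggedIn, error } = parsed as { isLoggedIn?: unknown; error?: unknown };
  // Surface a problem the model reported, as the recipe does
  if (typeof error === "string" && error.length > 0) {
    throw new Error(error);
  }
  if (typeof isLoggedIn !== "boolean") {
    throw new Error("Model response is missing a boolean isLoggedIn field");
  }
  return isLoggedIn;
}
```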

All the prompts and schemas are defined in the consts.ts file.

```typescript
export const IS_LOGGED_IN_PROMPT =
  "This browser is open to a page that either displays a user's LinkedIn profile or prompts the user to log in. Please give me a JSON response matching the schema below.";

export const IS_LOGGED_IN_OUTPUT_SCHEMA = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    isLoggedIn: {
      type: "boolean",
      description: "Use this field to indicate whether the user is logged in.",
    },
    error: {
      type: "string",
      description: "If you cannot fulfill the request, use this field to report the problem.",
    },
  },
};
```
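`IS_LOGGED_IN_OUTPUT_SCHEMA` is a plain JSON Schema (draft-07) object. It declares no `required` array, so both properties are optional, and even an empty object `{}` conforms to it. The sketch below is a hypothetical minimal checker that handles only flat schemas like this one. `checkFlatSchema` and `FlatSchema` are illustrative names, not part of the recipe, and the checker is not a full draft-07 validator. It makes the optionality visible and shows how adding `required` would change the result.

```typescript
// Minimal checker for flat JSON Schemas like IS_LOGGED_IN_OUTPUT_SCHEMA.
// It understands only top-level `properties` with primitive `type`s and an
// optional `required` list; it is NOT a full draft-07 validator.
type FlatSchema = {
  type: "object";
  properties: Record<string, { type: "string" | "boolean" | "number" }>;
  required?: string[];
};

// Returns a list of problems; an empty list means the value conforms.
function checkFlatSchema(value: unknown, schema: FlatSchema): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["value is not an object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in record)) problems.push(`missing required property "${key}"`);
  }
  for (const [key, prop] of Object.entries(schema.properties)) {
    if (key in record && typeof record[key] !== prop.type) {
      problems.push(`property "${key}" should be ${prop.type}`);
    }
  }
  return problems;
}

// Same shape as IS_LOGGED_IN_OUTPUT_SCHEMA, with $schema and descriptions omitted.
const isLoggedInSchema: FlatSchema = {
  type: "object",
  properties: { isLoggedIn: { type: "boolean" }, error: { type: "string" } },
};
```

With this schema, `checkFlatSchema({}, isLoggedInSchema)` returns an empty list. A hypothetical variant with `required: ["isLoggedIn"]` would report the missing property instead.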

Extracting data from a LinkedIn profile

Finally, extracting data from a LinkedIn profile is also done through a call to the Page Query API. At the end of the flow, we close the window and terminate the session.

```typescript
// Method of the LinkedInExtractorService class. TARGET_URL is the LinkedIn profile
// to extract and is defined elsewhere in the recipe.
async extractLinkedInData({ sessionId, windowId }: { sessionId: string; windowId: string }): Promise<string> {
  this.log.info("Extracting data from LinkedIn");

  // Navigate to the target URL
  this.log.info("Navigating to target url");
  await this.client.windows.loadUrl(sessionId, windowId, { url: TARGET_URL });

  this.log.info("Prompting the AI agent, waiting for a response (this may take a few minutes)...");
  const promptContentResponse = await this.client.windows.pageQuery(sessionId, windowId, {
    prompt: EXTRACT_DATA_PROMPT,
    configuration: {
      outputSchema: EXTRACT_DATA_OUTPUT_SCHEMA,
    },
  });

  this.log.info("Got response from AI agent, formatting JSON");
  // Re-serialize the model's JSON with 2-space indentation for readability
  const formattedJson = JSON.stringify(JSON.parse(promptContentResponse.data.modelResponse), null, 2);

  this.log.info("Closing window and terminating session");
  await this.client.windows.close(sessionId, windowId);
  await this.client.sessions.terminate(sessionId);

  this.log.info("Cleanup completed");

  return formattedJson;
}
```
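In `extractLinkedInData`, the window is closed and the session terminated only after the query succeeds. If `loadUrl` or `pageQuery` throws, the session stays open until its configured timeout. Below is a minimal sketch of moving cleanup into a `finally` block. It assumes a client object exposing the `windows.close` and `sessions.terminate` shapes used above. The `CleanupClient` interface and the `withSessionCleanup` helper are hypothetical, not part of the recipe or the Airtop SDK.

```typescript
// Hypothetical minimal client shape: only the two cleanup calls used in the recipe.
interface CleanupClient {
  windows: { close(sessionId: string, windowId: string): Promise<unknown> };
  sessions: { terminate(sessionId: string): Promise<unknown> };
}

// Runs `work`, then always closes the window and terminates the session,
// whether `work` resolved or threw.
async function withSessionCleanup<T>(
  client: CleanupClient,
  { sessionId, windowId }: { sessionId: string; windowId: string },
  work: () => Promise<T>,
): Promise<T> {
  try {
    return await work();
  } finally {
    try {
      await client.windows.close(sessionId, windowId);
    } finally {
      // Terminate the session even if closing the window failed.
      await client.sessions.terminate(sessionId);
    }
  }
}
```

To apply this inside `extractLinkedInData`, the `loadUrl` and `pageQuery` calls would move into `work`, and the explicit `close`/`terminate` calls at the end would no longer be needed.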
The extraction prompt and output schema are also defined in the consts.ts file:

```typescript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

export const EXTRACT_DATA_PROMPT = `Given the LinkedIn profile URL, extract the following information:
Job Information: Include all past and present job titles, company names, dates of employment, job descriptions, and locations.
Education Information: Gather the school names, degrees obtained, field of study, dates attended, and any listed honors or activities.
Mutual Connections: List mutual connections by their names, current job titles, and the company they work for.`;

// Zod schema describing the structure of the extracted profile data
const OUTPUT_SCHEMA = z.object({
  profile_url: z.string(),
  personal_info: z.object({
    name: z.string(),
    headline: z.string(),
    location: z.string(),
  }),
  job_history: z.array(
    z.object({
      title: z.string(),
      company: z.string(),
      location: z.string(),
      start_date: z.string().describe("In YYYY-MM format"),
      end_date: z.string().describe("In YYYY-MM format or 'Present'"),
      description: z.string(),
    }),
  ),
  education: z.array(
    z.object({
      school: z.string(),
      degree: z.string(),
      field_of_study: z.string(),
      start_date: z.string().describe("In YYYY-MM format"),
      end_date: z.string().describe("In YYYY-MM format"),
      honors: z.array(z.string()),
      activities: z.array(z.string()),
    }),
  ),
  mutual_connections: z.array(
    z.object({
      name: z.string(),
      title: z.string(),
      company: z.string(),
    }),
  ),
  error: z.string().optional().describe("If you cannot fulfill the request, use this field to report the problem"),
});

// Convert the zod schema to JSON Schema, the format passed as `outputSchema` to the Page Query API
export const EXTRACT_DATA_OUTPUT_SCHEMA = zodToJsonSchema(OUTPUT_SCHEMA);
```
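Because the output schema is generated from a zod object, the JSON returned by `extractLinkedInData` has a predictable shape. Below is a minimal sketch of consuming it without zod. The `ExtractedProfile` and `JobEntry` types are a hand-written, partial mirror of `OUTPUT_SCHEMA`; within the recipe itself, `z.infer<typeof OUTPUT_SCHEMA>` would derive the full type instead. `currentJobs` is a hypothetical helper. It relies on the schema's convention of `'Present'` as the end date of ongoing roles, and on the fact that zero-padded `YYYY-MM` strings sort chronologically when compared as strings.

```typescript
// Partial, hand-written mirror of OUTPUT_SCHEMA (personal_info, education,
// and mutual_connections are omitted for brevity).
interface JobEntry {
  title: string;
  company: string;
  location: string;
  start_date: string; // "YYYY-MM"
  end_date: string; // "YYYY-MM" or "Present"
  description: string;
}

interface ExtractedProfile {
  profile_url: string;
  job_history: JobEntry[];
  error?: string;
}

// Hypothetical helper: parses the formatted JSON string and returns the ongoing
// roles (end_date "Present"), newest start date first. Throws if the model
// reported an error through the schema's `error` field.
function currentJobs(formattedJson: string): JobEntry[] {
  const profile = JSON.parse(formattedJson) as ExtractedProfile;
  if (profile.error) {
    throw new Error(profile.error);
  }
  return profile.job_history
    .filter((job) => job.end_date === "Present")
    .sort((a, b) => (a.start_date < b.start_date ? 1 : a.start_date > b.start_date ? -1 : 0));
}
```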