Scraping LinkedIn Jobs with AI


I’ve been working in the oil & gas industry for the last 3 years, albeit on the tech side, where the skills are more transferable. It’s been a great challenge and learning experience, but it was never meant to be forever, and the major challenges have been solved. I’ve been looking at getting into the defense industry for a while but am unsure which skills are the most important and sought-after.

I started looking at defense-related jobs that would let me use my broad skillset in mechanical, software, and, to a lesser extent, electrical engineering. If you search “defense” on LinkedIn, you get roles such as systems, product, program, solution, design, test, or RF engineer. Most of these roles would let me use the cross-disciplinary skills I’ve gained. I definitely don’t qualify for RF roles, which are pure electrical engineering, but it’s a field I’ve found interesting ever since developing an RFID tracking system to install on offshore drillships.

After going through these jobs and saving the ones that sounded interesting, my plan was to read through them, pull out the relevant skills, and track how often each one appears. At first the plan was to do this manually, but I had saved 170 jobs, and at that point it became quicker to automate.

I’ve used a JavaScript script on LinkedIn before, run from the browser console, to go through and unsave all my jobs because there’s no way to do that in bulk. That approach wouldn’t work here: there’s no standard format for job descriptions, so code written to parse them would be very brittle. It’s easier to download the job description, send it to an AI, and have it return the skills in a set format that can then be parsed. Off the top of my head, this would likely need a browser driver like Selenium to navigate the website, a scripting language (likely Python for its AI libraries, especially pydantic for response type checking), and an AI API to send the job descriptions to.

The Investigation

LinkedIn tracks the page with a “start” query param that is based on the number of jobs rather than a page number: https://www.linkedin.com/my-items/saved-jobs/?start=160 with 10 jobs shown per page.
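
Since the offset moves in steps of 10, the page URLs can be generated up front. This is only a rough sketch: it assumes the first page responds to start=0 (not checked here), it uses the 170 saved jobs mentioned earlier, and the function name is made up for illustration.

```python
BASE_URL = "https://www.linkedin.com/my-items/saved-jobs/"
JOBS_PER_PAGE = 10  # page size observed on the saved-jobs page


def saved_jobs_page_urls(total_jobs: int) -> list[str]:
    """Return one URL per results page, stepping the start offset by the page size."""
    return [
        f"{BASE_URL}?start={start}"
        for start in range(0, total_jobs, JOBS_PER_PAGE)
    ]


urls = saved_jobs_page_urls(170)
print(len(urls))  # 17 pages
print(urls[-1])   # https://www.linkedin.com/my-items/saved-jobs/?start=160
```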

Now I need to find out whether there’s a way to programmatically go through each post and navigate to it. Checking the page source, there’s the following anchor tag for each title:

<a class="nuXDIvMbeMYWApPugutCOKmVhZzvTYUM " href="https://www.linkedin.com/jobs/view/4216481674/?refId=957de8e7-2b89-4bd4-8699-a8f08e6d6485&amp;trackingId=2S5d%2FZcsTlme9WuofTKkWw%3D%3D&amp;trk=flagship3_job_home_savedjobs" data-test-app-aware-link="">
<!---->Product Engineer - Solidworks<!---->
</a>

I probably don’t need all the reference and tracking IDs. If I simply navigate to https://www.linkedin.com/jobs/view/4216481674/, does it work? Success. Yes it does.
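
Stripping the tracking parameters can be done with Python’s standard urllib.parse module. A minimal sketch, where clean_job_url is a hypothetical helper name and the href is the one from the anchor tag above with the HTML entities already decoded:

```python
from urllib.parse import urlsplit, urlunsplit


def clean_job_url(href: str) -> str:
    """Drop the query string and fragment, keeping only scheme, host and path."""
    parts = urlsplit(href)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


href = (
    "https://www.linkedin.com/jobs/view/4216481674/"
    "?refId=957de8e7-2b89-4bd4-8699-a8f08e6d6485"
    "&trackingId=2S5d%2FZcsTlme9WuofTKkWw%3D%3D"
    "&trk=flagship3_job_home_savedjobs"
)
print(clean_job_url(href))  # https://www.linkedin.com/jobs/view/4216481674/
```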

Parsing the job page

The job description is in a collapsed card titled “About the job” with a “See more” link at the bottom. This appears to be the same for all jobs. Now I need to figure out how to parse it.

Inspecting the collapsed card shows the following:

<article class="jobs-description__container">
    <!-- All the job description details -->
</article>

So I can navigate to this page, copy this entire HTML element, and that should be good enough for the AI to parse and pull out the most relevant skills. Come on LinkedIn, make it at least a little hard. I was initially expecting to have to simulate clicks to access the data, but in hindsight that would have been a poor user experience if the data was fetched from the backend.
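
To illustrate the idea, here is a small sketch that pulls the contents of that article element out of a page’s HTML using only Python’s built-in html.parser. It keeps just the text rather than the raw HTML element, which is a simplification of the plan above. The class and function names are made up, and the sample HTML is a stand-in for the real page (in practice the string might come from something like Selenium’s page_source).

```python
from html.parser import HTMLParser


class JobDescriptionParser(HTMLParser):
    """Collect the text inside <article class="jobs-description__container">."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # nesting depth of <article> tags once inside the target
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "article" and (self.depth or "jobs-description__container" in classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside the target article.
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_description(page_html: str) -> str:
    """Return the job description text, one text node per line."""
    parser = JobDescriptionParser()
    parser.feed(page_html)
    return "\n".join(parser.chunks)


# Stand-in page, not real LinkedIn markup beyond the article class name.
sample = (
    "<html><body><h1>Product Engineer</h1>"
    '<article class="jobs-description__container">'
    "<h2>About the job</h2><p>Experience with SolidWorks and Python.</p>"
    "</article><footer>Ignored</footer></body></html>"
)
print(extract_description(sample))
```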

The Implementation

Now before starting it’s always good to take stock and reassess basic assumptions. In this case, the basic assumption was that it would be faster to write a script than to manually parse these job descriptions. After investigating the issue properly, we’ll need to:

  • Write a Python script to access the jobs web page using Selenium (20 mins)
  • Write code to navigate to each job post and pull out the job description (10 mins)
  • Send that job description HTML to an AI API with a prompt to return a list of skills (30 mins)
  • Write a way to aggregate the data (20 mins)

That adds up to 80 minutes, so let’s say 1.5 hours to do this via scripting and AI. If we optimistically assume 2 minutes per job to do this manually, 170 jobs would take almost 6 hours. It’s a no-brainer in the sense that if I tried to do this manually I would end up blowing my brains out.
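
For the last two steps in the list, here is a rough sketch of checking the AI’s reply and counting skills across jobs. It assumes the prompt asks for JSON shaped like {"skills": [...]}, which is my own choice of format, and it uses the standard library instead of pydantic to stay dependency-free. The function names and sample replies are invented for illustration.

```python
import json
from collections import Counter


def parse_skills(response_text: str) -> list[str]:
    """Validate a reply shaped like {"skills": ["...", ...]} and return the skills."""
    payload = json.loads(response_text)
    skills = payload.get("skills") if isinstance(payload, dict) else None
    if not isinstance(skills, list) or not all(isinstance(s, str) for s in skills):
        raise ValueError("expected a JSON object with a 'skills' list of strings")
    return [s.strip() for s in skills if s.strip()]


def count_skills(responses: list[str]) -> Counter:
    """Count how many job descriptions mention each skill (case-insensitive)."""
    counts: Counter = Counter()
    for text in responses:
        # A set ensures a skill repeated within one job is counted only once.
        counts.update({skill.lower() for skill in parse_skills(text)})
    return counts


# Invented sample replies, one per job description.
responses = [
    '{"skills": ["Python", "SolidWorks"]}',
    '{"skills": ["python", "Systems Engineering"]}',
]
for skill, n in count_skills(responses).most_common():
    print(n, skill)
```

Lowercasing is only a crude way to merge duplicates; variants like “Solidworks CAD” and “SolidWorks” would still be counted separately.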

The code

The code for the script can be found here: AI Job Scraping Script