Automatically Get Top 10 Jobs from LinkedIn Using Python

In this article, we will learn how to automatically get the top 10 jobs from LinkedIn using Python.

How to get data from LinkedIn using Python

Here we are going to use Clicknium to scrape the top 10 jobs from LinkedIn. First, we log in to LinkedIn and search for jobs by a job keyword (a title, a skill, or a company) and a location, and then we take the top 10 jobs in the search results. For each job, we collect the job information: the title, the company name, the size of the company, the post date, the job type, and the link URL. Finally, we save the results into a CSV file.

The steps are as follows:

Login to LinkedIn
Search jobs with the keyword and location
Scrape the information of the top 10 jobs
Save search results into csv file
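Each scraped job becomes one row of the CSV file. As a sketch of that record shape, the dataclass below names the six fields listed above and maps them to the column names the script writes (Job Title, Company Name, Company Size, Post Date, Job Type, Job Link). `JobRecord`, `COLUMN_NAMES` and `to_row` are hypothetical names used only for illustration; the script itself stores each job as a plain dict.

```python
from dataclasses import dataclass, fields

# Maps the (hypothetical) dataclass field names to the CSV column names used by the script
COLUMN_NAMES = {
    "title": "Job Title",
    "company_name": "Company Name",
    "company_size": "Company Size",
    "post_date": "Post Date",
    "job_type": "Job Type",
    "link": "Job Link",
}


@dataclass
class JobRecord:
    # Every field defaults to "" so a value missing on the page stays blank
    title: str = ""
    company_name: str = ""
    company_size: str = ""
    post_date: str = ""
    job_type: str = ""
    link: str = ""

    def to_row(self):
        """Return the record as a dict keyed by the CSV column names, in field order."""
        return {COLUMN_NAMES[f.name]: getattr(self, f.name) for f in fields(self)}
```

Because every field has a default, every row produced this way has the same keys, which keeps the CSV header consistent even when some information is missing for a job.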

Installation

1.1 Python modules

The Clicknium Python module provides methods for automating various types of Windows applications, such as web browsers, Windows desktop applications, Java applications and SAP GUI applications. In this sample we also use the pywin32 module, which provides access to many Windows APIs from Python, to read clipboard data.

Install the Python libraries with the following commands:

pip install clicknium
pip install pywin32
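To confirm that both packages are importable before running the scraper, a quick check with the standard library's `importlib.util.find_spec` can help. This is a minimal sketch: `missing_modules` is a hypothetical helper, and it assumes that `win32clipboard` (one of the modules pywin32 installs) is the part of pywin32 the script needs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "clicknium" comes from `pip install clicknium`;
    # "win32clipboard" is installed by `pip install pywin32`
    missing = missing_modules(["clicknium", "win32clipboard"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```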

1.2 Clicknium Visual Studio Code Extension

The Clicknium VS Code extension can install the Clicknium browser extension into the browser of your choice; Clicknium uses that browser extension to interact with the browser. The VS Code extension also makes it easier to capture, edit and validate elements.

Login to LinkedIn

2.1 Capturing elements with the Clicknium VS Code extension

Besides writing the Python source code that automates the login, the job search and the saving of the data, we also need to capture the web elements in the Chrome browser with the Clicknium VS Code extension. To launch it, press Ctrl+Shift+P to open the command palette, then type and select “clicknium capture”. This opens a capture dialog that lets you record web elements with Ctrl+Click. After capturing the elements in the steps below, click Complete and run the Python source code.

2.2 In this section, we will capture the related elements of the login page

2.3 Open the LinkedIn website in the browser, enter the account username and password, and then click the Sign in button

Python3

from clicknium import clicknium as cc, locator

# Settings loaded from setting.json (see setting.py below)
from setting import Setting

# Create a browser instance with "cc.chrome" (use "cc.edge" for Edge),
# open the specified URL and get the browser tab. With
# is_wait_complete=True, open() waits until the page has loaded
# completely, so no extra time.sleep() is needed
_tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)

# Find the username input box and fill it with the value of
# 'linkedin_login_name' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
    Setting.login_name)

# Find the password input box and fill it with the value of
# 'linkedin_login_password' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
    Setting.login_password)

# Find the Sign in button and click it to log in
_tab.find_element(locator.chrome.linkedin.login.signin).click()

# Wait up to 5 seconds for the "add phone" prompt; wait_appear()
# returns None if it does not appear, so only click "Skip" when it exists
skip_add_phone = _tab.wait_appear(
    locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5)
if skip_add_phone:
    skip_add_phone.click()
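The "add phone" prompt is optional, so its click has to be guarded, and the same guard is useful for any prompt that may or may not appear. Below is a minimal sketch of a reusable helper, assuming (as the `if job_title_ele:` style checks in this script do) that `wait_appear()` returns `None` when the element does not appear within `wait_timeout`. `click_if_appears` is a hypothetical name, not a Clicknium API.

```python
def click_if_appears(tab, element_locator, timeout=5):
    """Click an optional element if it appears within `timeout` seconds.

    Assumes `tab.wait_appear(locator, wait_timeout=...)` returns the element,
    or None when it does not appear in time. Returns True if a click happened.
    """
    element = tab.wait_appear(element_locator, wait_timeout=timeout)
    if element is None:
        # The prompt never showed up; nothing to do
        return False
    element.click()
    return True
```

In the login step this would read `click_if_appears(_tab, locator.chrome.linkedin.login.skip_add_phone)`.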

Search jobs with the keyword and location

3.1 In this section, we will capture the related elements of the job search page

3.2 Switch to the Jobs tab, fill in the job keyword and location, and then click the Search button

Python3

# After the login information is submitted, wait up to 5 seconds
# for the Jobs tab to appear, then click it to switch to Jobs
_tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                 wait_timeout=5).click()

# Wait up to 10 seconds for the job search keyword input box,
# then fill it with the value of 'linkedin_search_job_key'
# in setting.json
_tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                 wait_timeout=10).set_text(Setting.search_job_key)

# Find the job search location input box and fill it with the
# value of 'linkedin_search_job_location' in setting.json
_tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
    Setting.search_job_location)

# Find the Search button and click it to run the search
_tab.find_element(locator.chrome.linkedin.job.job_search).click()
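Filling in the search form mirrors what a user does by hand. As an aside, the same search can often be reached directly by URL. The sketch below builds such a URL with the standard library; the `https://www.linkedin.com/jobs/search/` path and the `keywords`/`location` query parameter names are assumptions about LinkedIn's URL scheme, not something the Clicknium script relies on, and `build_job_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed LinkedIn job search endpoint (not used by the original script)
JOB_SEARCH_BASE = "https://www.linkedin.com/jobs/search/"


def build_job_search_url(keyword, location):
    """Return a job search URL with `keyword` and `location` as query parameters.

    The parameter names "keywords" and "location" are assumptions about
    LinkedIn's URL scheme. urlencode() escapes spaces and special characters.
    """
    query = urlencode({"keywords": keyword, "location": location})
    return f"{JOB_SEARCH_BASE}?{query}"
```

If the assumption holds, the result could be passed to `cc.chrome.open()` instead of clicking through the search form.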

Scrape the information of the top 10 jobs

4.1 In this section, we will capture the elements of the job detail information panel

4.2 Get each job item from the search result list by its index

Python3

# range(1, 11) gets the top 10 jobs; change the upper
# bound to collect a different number of jobs
for i in range(1, 11):

    # Wait up to 5 seconds for the i-th job item to appear;
    # the "index" value is passed as a locator variable
    ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem, {
                           "index": i}, wait_timeout=5)

4.3 Get the title, the company name, the size of the company, the post date, the job type for each job item

Python3

# Initialize the result dict for this job item
details = {}

# Click the job item to show its details
ele.click()

# Wait up to 5 seconds for the job title to appear
job_title_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)

# If the job title exists, save its text into 'details'
if job_title_ele:
    details["Job Title"] = job_title_ele.get_text().strip()

# Wait up to 2 seconds for the company name to appear
job_company_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)

# If the company name exists, save its text into 'details'
if job_company_ele:
    details["Company Name"] = job_company_ele.get_text().strip()

# Wait up to 2 seconds for the company size to appear
company_size_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)

# Keep the company size text only if it mentions "employees";
# otherwise save an empty string
if company_size_ele:
    size_text = company_size_ele.get_text().strip()
    details["Company Size"] = size_text if "employees" in size_text else ""

# Wait up to 2 seconds for the post date to appear
job_post_date_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)

# Keep the post date text only if it contains "ago"
# (for example "2 days ago"); otherwise save an empty string
if job_post_date_ele:
    post_date = job_post_date_ele.get_text().strip()
    details["Post Date"] = post_date if "ago" in post_date else ""

# Wait up to 2 seconds for the job type to appear
job_type_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)

# If the job type exists, save its text into 'details'
if job_type_ele:
    details["Job Type"] = job_type_ele.get_text().strip()
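Company size and post date share the same filtering rule: the text is kept only if it contains a marker word ("employees" or "ago"), and an empty string is saved otherwise, presumably because the same page slot can hold other text. Factoring that rule into a small pure function makes it testable without a browser. This is a minimal sketch; `keep_if_contains` is a hypothetical helper, not part of Clicknium.

```python
def keep_if_contains(text, marker):
    """Return `text` with surrounding whitespace removed if it contains `marker`, else "".

    A None `text` is treated as an empty string.
    """
    cleaned = (text or "").strip()
    return cleaned if marker in cleaned else ""
```

For example, `details["Company Size"] = keep_if_contains(company_size_ele.get_text(), "employees")`.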

4.4 Get job link

4.4.1 Getting clipboard data with pywin32

Python3

# Library for the win32 clipboard API
import win32clipboard

# Get the text currently on the clipboard
def get_clipboard_data():
    try:
        # Open the clipboard; this fails if another program holds it
        win32clipboard.OpenClipboard()
    except Exception:
        return ""
    try:
        # Read the clipboard as Unicode text (a str)
        return win32clipboard.GetClipboardData(win32clipboard.CF_UNICODETEXT)
    except Exception:
        # No text on the clipboard: return an empty string
        return ""
    finally:
        # Close the clipboard only after it was opened successfully
        win32clipboard.CloseClipboard()
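Whatever is on the clipboard ends up in the Job Link column, so stale or unrelated clipboard contents would be saved as a link. A minimal sketch of a guard, assuming the copied job link is an http(s) URL on a linkedin.com host; `as_linkedin_url` is a hypothetical helper. It also accepts bytes, since clipboard text read in the default `CF_TEXT` format may come back as bytes under Python 3.

```python
from urllib.parse import urlparse


def as_linkedin_url(text):
    """Return `text` (stripped) if it looks like an http(s) URL on a linkedin.com host, else ""."""
    if isinstance(text, bytes):
        # Decode bytes leniently; job URLs are expected to be plain ASCII
        text = text.decode("utf-8", errors="replace")
    candidate = (text or "").strip()
    parts = urlparse(candidate)
    host = parts.hostname or ""
    is_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    if parts.scheme in ("http", "https") and is_linkedin:
        return candidate
    return ""
```

For example, `details["Job Link"] = as_linkedin_url(get_clipboard_data())`.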

4.4.2 Click the Share button, click Copy link, and then read the link from the clipboard

Python3

# Library for the delay function
from time import sleep

# Wait up to 2 seconds for the job item's Share button to appear
job_share_btn_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)

# If the Share button exists, click it
if job_share_btn_ele:
    job_share_btn_ele.click()

    # Wait up to 2 seconds for the Copy link button to appear
    copy_link = _tab.wait_appear(
        locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)

    # If Copy link exists, click it to copy the job URL to the clipboard
    if copy_link:
        copy_link.click()

        # Sleep 0.2 seconds so the clipboard is ready
        sleep(0.2)

        # Read the job link from the clipboard and save it into 'details'
        details["Job Link"] = get_clipboard_data()
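The fixed `sleep(0.2)` assumes the clipboard is always ready within 0.2 seconds. An alternative, sketched below, is to poll until the clipboard returns something, up to a time limit. `wait_for` is a hypothetical helper, and the 2-second limit and 0.1-second interval are arbitrary choices.

```python
import time


def wait_for(read, timeout=2.0, interval=0.1):
    """Call `read()` until it returns a truthy value or `timeout` seconds pass.

    Returns the first truthy value, or the last value read when time runs out.
    """
    deadline = time.monotonic() + timeout
    value = read()
    while not value and time.monotonic() < deadline:
        time.sleep(interval)
        value = read()
    return value
```

For example, `details["Job Link"] = wait_for(get_clipboard_data)` in place of the sleep. Clearing the clipboard with `clear_clipboard_data()` before clicking Copy link matters here: otherwise the first read could return the previous job's link.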

Save search results into csv file

5.1 The result CSV file contains one row per job, with the columns Job Title, Company Name, Company Size, Post Date, Job Type and Job Link.

5.2 Use Python's built-in csv module to save the data into a CSV file

Python3

# Library for CSV operations
import csv

# Save a list of dicts into a CSV file
def list_dict_to_csv(dicts, filename="test.csv"):

    # Build the header from the keys of all dicts, in first-seen order,
    # so a dict with a missing or extra field does not break the writer
    keys = list(dict.fromkeys(key for row in dicts for key in row))

    # Open the CSV file; UTF-8 keeps non-ASCII job titles intact
    with open(filename, 'w', newline='', encoding='utf-8') as output_file:

        # Create the DictWriter; missing fields are written as empty cells
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)

        # Write the header row
        dict_writer.writeheader()

        # Write one row per dict
        dict_writer.writerows(dicts)
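`list_dict_to_csv` writes straight to a file path, which makes its output awkward to check without touching the disk. A sketch of a variant that writes to any open text stream lets you test the CSV output with `io.StringIO`; `write_jobs_csv` is a hypothetical name. It builds the header from the keys of all rows rather than only the first one, since `csv.DictWriter` raises `ValueError` when a row contains a key that is not in `fieldnames`.

```python
import csv
import io


def write_jobs_csv(rows, stream):
    """Write a list of dicts to an open text stream as CSV.

    The header is the union of all keys in first-seen order, so a row that
    lacks a field (for example, no "Job Link") gets an empty cell.
    """
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Illustrative rows only, not real scraped data
    jobs = [
        {"Job Title": "Python Developer", "Company Name": "Example Corp"},
        {"Job Title": "Data Engineer", "Job Link": "https://www.linkedin.com/jobs/view/1/"},
    ]
    buffer = io.StringIO()
    write_jobs_csv(jobs, buffer)
    buffer.seek(0)
    for row in csv.DictReader(buffer):
        print(row)
```

To write a real file, open it with `newline=''` as the csv module documentation recommends, for example `with open(Setting.result_csv_file, 'w', newline='', encoding='utf-8') as f: write_jobs_csv(job_list, f)`.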

Below is the complete implementation

6.1 sample.py

Python3

# Library for web automation apis 
# Locator used for selector reference 
from clicknium import clicknium as cc, locator 
  
# Library for delay function 
from time import sleep 
  
# Library for save dict list data into csv file 
from csvutils import list_dict_to_csv 
  
# Library for clear clipboard and get clipboard data 
from clipboard import get_clipboard_data, clear_clipboard_data 
  
# Library for get setting in 'setting.json' file 
from setting import Setting 
  
# Login to LinkedIn page 
# Find input box for username and password, 
# and fill in with the value in setting.json 
# Find submit button, and click it to login 
# Wait 'skip add phone' button if it needs, 
# and click the 'skip' button 
def login(): 
    
    # Find input box for username 
    # Fill in with the key value 
    # 'linkedin_login_name' in setting.json 
    _tab.find_element(locator.chrome.linkedin.login.login_email).set_text( 
        Setting.login_name) 
  
    # Find input box for password 
    # Fill in with the key value 
    # 'linkedin_login_password' in setting.json 
    _tab.find_element(locator.chrome.linkedin.login.login_password).set_text( 
        Setting.login_password) 
  
    # Find submit button, and click it to login 
    _tab.find_element(locator.chrome.linkedin.login.signin).click() 
  
    # Wait up to 5 seconds for the "add phone" prompt; wait_appear()
    # returns None if it does not appear, so only click "Skip" when it exists
    skip_add_phone = _tab.wait_appear(
        locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5)
    if skip_add_phone:
        skip_add_phone.click()
  
  
def search_jobs(): 
    
    # After the login information is submitted, wait up to 5 seconds
    # for the Jobs tab to appear, then click it to switch to Jobs
    _tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                     wait_timeout=5).click()

    # Wait up to 10 seconds for the job search keyword input box,
    # then fill it with the value of 'linkedin_search_job_key'
    # in setting.json
    _tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                     wait_timeout=10).set_text(Setting.search_job_key)
  
    # Find job search location input box 
    # Fill in with the key value 
    # 'linkedin_search_job_location' in setting.json 
    _tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text( 
        Setting.search_job_location) 
  
    # Find the search button, and click it to search 
    _tab.find_element(locator.chrome.linkedin.job.job_search).click() 
  
# Scrape the information of the top 10 jobs 
# For each job item, get the title, 
# the company name, the size of the company, 
# the post date, the job type 
# Save search results into csv file 
def get_job_top10_list(): 
    # Initialize the search result list
    job_list = []

    # Clear the clipboard first
    clear_clipboard_data()

    # range(1, 11) gets the top 10 jobs; change the upper
    # bound to collect a different number of jobs
    for i in range(1, 11):

        # Wait up to 5 seconds for the i-th job item to appear;
        # the "index" value is passed as a locator variable
        ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem, {
                               "index": i}, wait_timeout=5)

        # If the job item exists, click it to show its details
        if ele:
            # Initialize the result dict for this job item
            details = {}
            ele.click()

            # Wait up to 5 seconds for the job title and save its text
            job_title_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)
            if job_title_ele:
                details["Job Title"] = job_title_ele.get_text().strip()

            # Wait up to 2 seconds for the company name
            job_company_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)
            if job_company_ele:
                details["Company Name"] = job_company_ele.get_text().strip()

            # Wait up to 2 seconds for the company size; keep the
            # text only if it mentions "employees"
            company_size_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)
            if company_size_ele:
                size_text = company_size_ele.get_text().strip()
                details["Company Size"] = (
                    size_text if "employees" in size_text else "")

            # Wait up to 2 seconds for the post date; keep the
            # text only if it contains "ago" (e.g. "2 days ago")
            job_post_date_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)
            if job_post_date_ele:
                post_date = job_post_date_ele.get_text().strip()
                details["Post Date"] = post_date if "ago" in post_date else ""

            # Wait up to 2 seconds for the job type
            job_type_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)
            if job_type_ele:
                details["Job Type"] = job_type_ele.get_text().strip()

            # Wait up to 2 seconds for the Share button and click it
            job_share_btn_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)
            if job_share_btn_ele:
                job_share_btn_ele.click()

                # Wait up to 2 seconds for the Copy link button
                copy_link = _tab.wait_appear(
                    locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)
                if copy_link:
                    # Clear the clipboard before copying, so a failed copy
                    # cannot leave the previous job's link behind
                    clear_clipboard_data()
                    copy_link.click()

                    # Sleep 0.2 seconds so the clipboard is ready
                    sleep(0.2)

                    # Read the job link from the clipboard
                    details["Job Link"] = get_clipboard_data()

            # Add this job item's result to the list
            job_list.append(details)
  
    # If it has any results, save into the csv file, 
    # set the file path with the key 
    # value 'result_csv_file' in setting.json 
    if job_list: 
        list_dict_to_csv(job_list, Setting.result_csv_file) 
  
  
if __name__ == "__main__": 
    
    # Create a browser instance with "cc.chrome" (use "cc.edge" for Edge),
    # open the specified URL and get the browser tab. With
    # is_wait_complete=True, open() waits until the page has loaded
    # completely, so no extra time.sleep() is needed
    _tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)

    # Log in only if the username input box is shown; if it is not,
    # the browser is assumed to be signed in to LinkedIn already
    if _tab.is_existing(locator.chrome.linkedin.login.login_email):
        # Login to LinkedIn 
        login() 
  
    # Search jobs with the keyword and location 
    search_jobs() 
  
    # Get top 10 jobs information from search 
    # results and save into csv file 
    get_job_top10_list() 

6.2 csvutils.py

Python3

# Library for CSV operations
import csv

# Save a list of dicts into a CSV file
def list_dict_to_csv(dicts, filename="test.csv"):

    # Build the header from the keys of all dicts, in first-seen order,
    # so a dict with a missing or extra field does not break the writer
    keys = list(dict.fromkeys(key for row in dicts for key in row))

    # Open the CSV file; UTF-8 keeps non-ASCII job titles intact
    with open(filename, 'w', newline='', encoding='utf-8') as output_file:

        # Create the DictWriter; missing fields are written as empty cells
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)

        # Write the header row
        dict_writer.writeheader()

        # Write one row per dict
        dict_writer.writerows(dicts)

6.3 clipboard.py

Python3

# Library for the win32 clipboard API
import win32clipboard

# Clear the clipboard
def clear_clipboard_data():
    # Open the clipboard; if this raises, there is nothing to close
    win32clipboard.OpenClipboard()
    try:
        # Empty the clipboard
        win32clipboard.EmptyClipboard()
    finally:
        # Close the clipboard once it has been opened
        win32clipboard.CloseClipboard()

# Get the text currently on the clipboard
def get_clipboard_data():
    try:
        # Open the clipboard; this fails if another program holds it
        win32clipboard.OpenClipboard()
    except Exception:
        return ""
    try:
        # Read the clipboard as Unicode text (a str)
        return win32clipboard.GetClipboardData(win32clipboard.CF_UNICODETEXT)
    except Exception:
        # No text on the clipboard: return an empty string
        return ""
    finally:
        # Close the clipboard once it has been opened
        win32clipboard.CloseClipboard()

6.4 setting.py

Python3

# Library for json operations api 
import json 
  
class Setting(object): 
           
    # Open the JSON file and load its data
    with open("setting.json", encoding="utf-8") as f:
        data = json.load(f)
  
    # Value set for LinkedIn login username 
    login_name = data['linkedin_login_name'] 
  
    # Value set for LinkedIn login password 
    login_password = data['linkedin_login_password'] 
  
    # Value set for LinkedIn job search keyword 
    search_job_key = data['linkedin_search_job_key'] 
  
    # Value set for LinkedIn job search location 
    search_job_location = data['linkedin_search_job_location'] 
  
    # Value set for csv file path to save search results 
    result_csv_file = data['result_csv_file']
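`Setting` reads each key directly, so a missing key stops the script with a bare `KeyError` for the first missing name only. A sketch of an up-front check that reports every missing or empty key at once is below; `REQUIRED_KEYS`, `missing_settings` and `load_settings` are hypothetical names, and treating an empty string as missing is an assumption.

```python
import json

# Keys that setting.py reads from setting.json
REQUIRED_KEYS = (
    "linkedin_login_name",
    "linkedin_login_password",
    "linkedin_search_job_key",
    "linkedin_search_job_location",
    "result_csv_file",
)


def missing_settings(data):
    """Return the required keys that are absent or empty in `data`."""
    return [key for key in REQUIRED_KEYS if not data.get(key)]


def load_settings(path="setting.json"):
    """Load the JSON settings file and raise one clear error listing every problem."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    missing = missing_settings(data)
    if missing:
        raise ValueError(f"{path}: missing or empty keys: {', '.join(missing)}")
    return data
```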

6.5 setting.json

JSON

{ 
    "linkedin_login_name": "your account username", 
    "linkedin_login_password": "your account password", 
    "linkedin_search_job_key": "your desired job title", 
    "linkedin_search_job_location": "your desired job location", 
    "result_csv_file": "C:\\test\\test.csv"
}
