Automatically Get Top 10 Jobs from LinkedIn Using Python
In this article, we use Clicknium to scrape the top 10 jobs from LinkedIn. First, we log in to LinkedIn and search for jobs by keyword (a title, a skill, or a company) and location, and then take the top 10 jobs from the search results. For each job, we collect the title, the company name, the company size, the post date, the job type, and the link URL. Finally, we save the results to a CSV file.
The steps are as follows:
- Log in to LinkedIn
- Search jobs with the keyword and location
- Scrape the information of the top 10 jobs
- Save the search results to a CSV file
Installation
1.1 Python modules
The Clicknium Python module provides methods to automate various types of applications on Windows, such as web browsers, Windows desktop applications, Java applications, and SAP GUI applications. In this sample, we also use the pywin32 Python module, which provides access to many of the Windows APIs from Python, to read data from the clipboard.
Install the Python libraries with the following commands:
pip install clicknium
pip install pywin32
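A quick way to confirm that both packages are importable before going further is sketched below. The helper name `missing_modules` is hypothetical; it relies only on the standard library's `importlib.util.find_spec` and assumes the top-level import names `clicknium` and `win32clipboard` that are used later in this article.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


# The two import names used in this article (an assumption of this sketch)
missing = missing_modules(["clicknium", "win32clipboard"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are available")
```

If a name is reported as missing, rerun the matching `pip install` command in the same Python environment that will run the script.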
1.2 Clicknium Visual Studio Code Extension
The Clicknium VS Code extension installs the browser extension for the chosen browser; Clicknium uses that browser extension to interact with the browser. The VS Code extension also makes it easier to capture, edit, and validate elements.
Log in to LinkedIn
2.1 Capturing Steps using clicknium VS Code extension
Besides writing the Python code that automates the login, the job search, and the saving of the data, we also need to capture the web elements in the Chrome browser using the Clicknium VS Code extension. To launch the recorder, press Ctrl+Shift+P to open the command palette, then type and select "clicknium capture". This opens a capture dialog that lets you record web elements using Ctrl+Click. After capturing the elements described in the steps below, click Complete and then run the Python code.
2.2 In this section, we will capture the relevant elements of the login page: the username input box, the password input box, the Sign in button, and the skip button of the optional "add phone" prompt
2.3 Open the LinkedIn website in the browser, enter the account username and password, and then click the Sign in button
Python3
from clicknium import clicknium as cc, locator
# Setting (see setting.py in section 6.4) exposes the values in setting.json
from setting import Setting

# Create a browser instance with "cc.chrome"
# (use "cc.edge" for the Edge browser).
# Open the browser at the specified URL and get the browser tab.
# By default, it waits for the page to load completely,
# so no extra time.sleep() is needed.
_tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)

# Find the username input box and fill it with the value
# of the key 'linkedin_login_name' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
    Setting.login_name)

# Find the password input box and fill it with the value
# of the key 'linkedin_login_password' in setting.json
_tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
    Setting.login_password)

# Find the submit button and click it to log in
_tab.find_element(locator.chrome.linkedin.login.signin).click()

# Wait up to 5 seconds for the 'skip add phone' button;
# click it only if it appears
skip_btn = _tab.wait_appear(
    locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5)
if skip_btn:
    skip_btn.click()
Search jobs with the keyword and location
3.1 In this section, we will capture the relevant elements of the job search page: the Jobs tab, the keyword input box, the location input box, and the Search button
3.2 Switch to the Jobs tab, fill in the job keyword and location, and then click the Search button
Python3
# After the login information is submitted, wait up to 5 seconds
# for the Jobs tab to appear, then click it to switch to the job channel
_tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                 wait_timeout=5).click()

# Wait up to 10 seconds for the job keyword input box,
# then fill it with the value of the key
# 'linkedin_search_job_key' in setting.json
_tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                 wait_timeout=10).set_text(Setting.search_job_key)

# Find the job location input box and fill it with the value
# of the key 'linkedin_search_job_location' in setting.json
_tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
    Setting.search_job_location)

# Find the search button and click it to search
_tab.find_element(locator.chrome.linkedin.job.job_search).click()
Scrape the information of the top 10 jobs
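4.1 In this section, we will capture the elements of the search results: the job list item, the job title, the company name, the company size, the post date, the job type, the Share button, and the Copy link button
4.2 Get each job item from the search result list using the index parameter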
Python3
# range(1, 11) gets the top 10 jobs;
# the upper bound can be set to any value
for i in range(1, 11):
    # Wait up to 5 seconds for the job item to appear,
    # and get the element with the given index value
    ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem,
                           {"index": i}, wait_timeout=5)
4.3 Get the title, the company name, the size of the company, the post date, and the job type for each job item
Python3
# Initialize the result dict for this job item
details = {}

# Click the job item
ele.click()

# Wait up to 5 seconds for the job title to appear
job_title_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)
# If the title exists, save its text into 'details'
if job_title_ele:
    details["Job Title"] = job_title_ele.get_text().strip()

# Wait up to 2 seconds for the company name to appear
job_company_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)
# If the company name exists, save its text into 'details'
if job_company_ele:
    details["Company Name"] = job_company_ele.get_text().strip()

# Wait up to 2 seconds for the company size to appear
company_size_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)
# If the company size exists, keep its text only when it
# mentions "employees"; otherwise save an empty string
if company_size_ele:
    size_text = company_size_ele.get_text()
    details["Company Size"] = size_text.strip() if "employees" in size_text else ""

# Wait up to 2 seconds for the post date to appear
job_post_date_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)
# If the post date exists, keep its text only when it
# contains "ago"; otherwise save an empty string
if job_post_date_ele:
    date_text = job_post_date_ele.get_text()
    details["Post Date"] = date_text.strip() if "ago" in date_text else ""

# Wait up to 2 seconds for the job type to appear
job_type_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)
# If the job type exists, save its text into 'details'
if job_type_ele:
    details["Job Type"] = job_type_ele.get_text().strip()
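The company size and post date are kept only when the captured text contains an expected marker word ("employees" or "ago"). That check can be isolated in a small pure function, sketched here. The name `text_if_contains` is hypothetical, and the sample strings are made up for illustration rather than taken from LinkedIn.

```python
def text_if_contains(text, marker):
    """Return the stripped text if it contains `marker`, otherwise ''."""
    if text and marker in text:
        return text.strip()
    return ""


# Made-up sample strings, for illustration only
print(text_if_contains("  1,001-5,000 employees  ", "employees"))
print(text_if_contains("Promoted", "ago"))
```

In the loop above, this would replace the two conditional expressions, for example `details["Post Date"] = text_if_contains(job_post_date_ele.get_text(), "ago")`.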
4.4 Get job link
4.4.1 Getting clipboard data with pywin32
Python3
# Library for the win32 clipboard API
import win32clipboard


# Get clipboard data
def get_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
        # Get the clipboard data and return it
        data = win32clipboard.GetClipboardData()
        return data
    except Exception:
        # If an exception occurs, return an empty string
        return ""
    finally:
        # Close the clipboard
        win32clipboard.CloseClipboard()
4.4.2 Click the Share button and Copy link button, then get data from clipboard
Python3
# Library for the delay function
from time import sleep

# Wait up to 2 seconds for the job item's share button to appear
job_share_btn_ele = _tab.wait_appear(
    locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)
# If the share button exists, click it
if job_share_btn_ele:
    job_share_btn_ele.click()
    # Wait up to 2 seconds for the copy link button to appear
    copy_link = _tab.wait_appear(
        locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)
    # If the copy link button exists, click it to set the clipboard data
    if copy_link:
        copy_link.click()
        # Sleep 0.2 seconds to let the clipboard become ready
        sleep(0.2)
        # Get the job link string and save it into 'details'
        details["Job Link"] = get_clipboard_data()
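A fixed 0.2 second delay may be too short on a slow machine and is wasted time on a fast one. One alternative, sketched below under the assumption that the clipboard was emptied before the Copy link button was clicked, is to poll a reader function until it returns a non-empty value or a timeout expires. The name `wait_for_value` and its parameters are hypothetical.

```python
import time


def wait_for_value(read, timeout=2.0, interval=0.05):
    """Call `read()` repeatedly until it returns a non-empty value.

    Returns the value, or '' if nothing arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read()
        if value:
            return value
        if time.monotonic() >= deadline:
            return ""
        time.sleep(interval)
```

In the scraping loop this could be used as `details["Job Link"] = wait_for_value(get_clipboard_data)`. Polling only helps if the clipboard is cleared before each copy; otherwise the link of the previous job would be returned immediately.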
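Save the search results to a CSV file
5.1 The result CSV file contains one row per job, with the columns Job Title, Company Name, Company Size, Post Date, Job Type, and Job Link
5.2 Use Python's built-in csv module to save the data to a CSV file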
Python3
# Library for csv operations
import csv


# Save the list of dicts into a csv file
def list_dict_to_csv(dicts, filename="test.csv"):
    # Open the csv file and get the file object
    with open(filename, 'w', newline='') as output_file:
        # Get the csv header from the keys of the first dict
        keys = dicts[0].keys()
        # Initialize the DictWriter object
        dict_writer = csv.DictWriter(output_file, keys)
        # Write the header into the csv file
        dict_writer.writeheader()
        # Write the data rows into the csv file
        dict_writer.writerows(dicts)
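The function above takes the header from the first dict only. With its default settings, `csv.DictWriter` raises `ValueError` when a later row contains a key that is not in the header, which can happen if an element was missing for the first job but present for another. A minimal sketch of a more tolerant variant follows; the name `write_dicts` is hypothetical, and it writes to any file-like object so it can be tried without touching the disk.

```python
import csv
import io


def write_dicts(dicts, output_file):
    """Write dicts as CSV, using the union of all keys as the header."""
    fieldnames = []
    for row in dicts:
        for key in row:
            if key not in fieldnames:
                # Keep the order in which keys are first seen
                fieldnames.append(key)
    # restval fills in cells for keys that a row does not have
    writer = csv.DictWriter(output_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(dicts)


# Illustrative rows: the first one has no "Job Type"
buffer = io.StringIO()
write_dicts([{"Job Title": "A"},
             {"Job Title": "B", "Job Type": "Full-time"}], buffer)
print(buffer.getvalue())
```

To write to a file instead, pass the object returned by `open(filename, 'w', newline='')` as `output_file`.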
Below is the complete implementation
6.1 sample.py
Python3
# Library for web automation APIs;
# locator is used to reference the captured selectors
from clicknium import clicknium as cc, locator
# Library for the delay function
from time import sleep
# Helper that saves a list of dicts into a csv file
from csvutils import list_dict_to_csv
# Helpers that clear the clipboard and get clipboard data
from clipboard import get_clipboard_data, clear_clipboard_data
# Settings loaded from the 'setting.json' file
from setting import Setting


# Log in to LinkedIn:
# fill in the username and password from setting.json,
# click the submit button, and click the 'skip' button
# of the 'add phone' prompt if it appears
def login():
    # Find the username input box and fill it with the value
    # of the key 'linkedin_login_name' in setting.json
    _tab.find_element(locator.chrome.linkedin.login.login_email).set_text(
        Setting.login_name)

    # Find the password input box and fill it with the value
    # of the key 'linkedin_login_password' in setting.json
    _tab.find_element(locator.chrome.linkedin.login.login_password).set_text(
        Setting.login_password)

    # Find the submit button and click it to log in
    _tab.find_element(locator.chrome.linkedin.login.signin).click()

    # Wait up to 5 seconds for the 'skip add phone' button;
    # click it only if it appears
    skip_btn = _tab.wait_appear(
        locator.chrome.linkedin.login.skip_add_phone, wait_timeout=5)
    if skip_btn:
        skip_btn.click()


# Search jobs with the keyword and location
def search_jobs():
    # After the login information is submitted, wait up to 5 seconds
    # for the Jobs tab, then click it to switch to the job channel
    _tab.wait_appear(locator.chrome.linkedin.job.jobs_channel,
                     wait_timeout=5).click()

    # Wait up to 10 seconds for the job keyword input box,
    # then fill it with the value of the key
    # 'linkedin_search_job_key' in setting.json
    _tab.wait_appear(locator.chrome.linkedin.job.job_search_key,
                     wait_timeout=10).set_text(Setting.search_job_key)

    # Find the job location input box and fill it with the value
    # of the key 'linkedin_search_job_location' in setting.json
    _tab.find_element(locator.chrome.linkedin.job.job_search_location).set_text(
        Setting.search_job_location)

    # Find the search button and click it to search
    _tab.find_element(locator.chrome.linkedin.job.job_search).click()


# Scrape the information of the top 10 jobs.
# For each job item, get the title, the company name,
# the size of the company, the post date, the job type, and the link.
# Save the search results into a csv file.
def get_job_top10_list():
    # Initialize the search result list
    job_list = []
    # Clear the clipboard data first
    clear_clipboard_data()

    # range(1, 11) gets the top 10 jobs;
    # the upper bound can be set to any value
    for i in range(1, 11):
        # Wait up to 5 seconds for the job item to appear,
        # and get the element with the given index value
        ele = _tab.wait_appear(locator.chrome.linkedin.jobitem.job_listitem,
                               {"index": i}, wait_timeout=5)

        # If the job item exists, click it to get the detail information
        if ele:
            # Initialize the result dict for this job item
            details = {}
            # Click the job item
            ele.click()

            # Wait up to 5 seconds for the job title to appear
            job_title_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_title, wait_timeout=5)
            # If the title exists, save its text into 'details'
            if job_title_ele:
                details["Job Title"] = job_title_ele.get_text().strip()

            # Wait up to 2 seconds for the company name to appear
            job_company_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_company, wait_timeout=2)
            # If the company name exists, save its text into 'details'
            if job_company_ele:
                details["Company Name"] = job_company_ele.get_text().strip()

            # Wait up to 2 seconds for the company size to appear
            company_size_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.company_size, wait_timeout=2)
            # If the company size exists, keep its text only when
            # it mentions "employees"; otherwise save an empty string
            if company_size_ele:
                size_text = company_size_ele.get_text()
                details["Company Size"] = (
                    size_text.strip() if "employees" in size_text else "")

            # Wait up to 2 seconds for the post date to appear
            job_post_date_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_post_date, wait_timeout=2)
            # If the post date exists, keep its text only when
            # it contains "ago"; otherwise save an empty string
            if job_post_date_ele:
                date_text = job_post_date_ele.get_text()
                details["Post Date"] = (
                    date_text.strip() if "ago" in date_text else "")

            # Wait up to 2 seconds for the job type to appear
            job_type_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.job_type, wait_timeout=2)
            # If the job type exists, save its text into 'details'
            if job_type_ele:
                details["Job Type"] = job_type_ele.get_text().strip()

            # Wait up to 2 seconds for the share button to appear
            job_share_btn_ele = _tab.wait_appear(
                locator.chrome.linkedin.jobitem.share_button, wait_timeout=2)
            # If the share button exists, click it
            if job_share_btn_ele:
                job_share_btn_ele.click()
                # Wait up to 2 seconds for the copy link button to appear
                copy_link = _tab.wait_appear(
                    locator.chrome.linkedin.jobitem.copy_link, wait_timeout=2)
                # If the copy link button exists, click it
                # to set the clipboard data
                if copy_link:
                    copy_link.click()
                    # Sleep 0.2 seconds to let the clipboard become ready
                    sleep(0.2)
                    # Get the job link string and save it into 'details'
                    details["Job Link"] = get_clipboard_data()

            # Save the job item's result to the list
            job_list.append(details)

    # If there are any results, save them into the csv file;
    # the file path is the value of the key
    # 'result_csv_file' in setting.json
    if job_list:
        list_dict_to_csv(job_list, Setting.result_csv_file)


if __name__ == "__main__":
    # Create a browser instance with "cc.chrome"
    # (use "cc.edge" for the Edge browser).
    # Open the browser at the specified URL and get the browser tab.
    # By default, it waits for the page to load completely,
    # so no extra time.sleep() is needed.
    _tab = cc.chrome.open("https://www.linkedin.com/", is_wait_complete=True)

    # Check whether a login with username and password is needed.
    # True: the login form is shown, so we need to log in
    # False: the website has remembered the authentication information
    if _tab.is_existing(locator.chrome.linkedin.login.login_email):
        # Log in to LinkedIn
        login()

    # Search jobs with the keyword and location
    search_jobs()

    # Get the top 10 jobs from the search results
    # and save them into the csv file
    get_job_top10_list()
6.2 csvutils.py
Python3
# Library for csv operations
import csv


# Save the list of dicts into a csv file
def list_dict_to_csv(dicts, filename="test.csv"):
    # Open the csv file and get the file object
    with open(filename, 'w', newline='') as output_file:
        # Get the csv header from the keys of the first dict
        keys = dicts[0].keys()
        # Initialize the DictWriter object
        dict_writer = csv.DictWriter(output_file, keys)
        # Write the header into the csv file
        dict_writer.writeheader()
        # Write the data rows into the csv file
        dict_writer.writerows(dicts)
6.3 clipboard.py
Python3
# Library for the win32 clipboard API
import win32clipboard


# Clear clipboard data
def clear_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
        # Empty the clipboard
        win32clipboard.EmptyClipboard()
    finally:
        # Close the clipboard
        win32clipboard.CloseClipboard()


# Get clipboard data
def get_clipboard_data():
    try:
        # Open the clipboard
        win32clipboard.OpenClipboard()
        # Get the clipboard data and return it
        data = win32clipboard.GetClipboardData()
        return data
    except Exception:
        # If an exception occurs, return an empty string
        return ""
    finally:
        # Close the clipboard
        win32clipboard.CloseClipboard()
6.4 setting.py
Python3
# Library for json operations
import json


class Setting:
    # Open the json file and load the json data
    with open("setting.json") as f:
        data = json.load(f)

    # LinkedIn login username
    login_name = data['linkedin_login_name']
    # LinkedIn login password
    login_password = data['linkedin_login_password']
    # LinkedIn job search keyword
    search_job_key = data['linkedin_search_job_key']
    # LinkedIn job search location
    search_job_location = data['linkedin_search_job_location']
    # Path of the csv file used to save the search results
    result_csv_file = data['result_csv_file']
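The `Setting` class reads `setting.json` when the module is imported, so a missing key surfaces as a bare `KeyError` at import time. A small validation step can report every missing key at once. The sketch below is one possible approach; the names `REQUIRED_KEYS`, `validate_settings`, and `load_settings` are hypothetical, and the key names are the five used in this article.

```python
import json

# The five keys this article reads from setting.json
REQUIRED_KEYS = (
    "linkedin_login_name",
    "linkedin_login_password",
    "linkedin_search_job_key",
    "linkedin_search_job_location",
    "result_csv_file",
)


def validate_settings(data):
    """Return `data` unchanged, or raise KeyError naming every missing key."""
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError("Missing setting(s): " + ", ".join(missing))
    return data


def load_settings(path="setting.json"):
    """Load the JSON file at `path` and validate its keys."""
    with open(path, encoding="utf-8") as f:
        return validate_settings(json.load(f))
```

`Setting` could then build its attributes from `load_settings()` instead of indexing the raw dict directly.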
6.5 setting.json
JSON
{
    "linkedin_login_name": "your account username",
    "linkedin_login_password": "your account password",
    "linkedin_search_job_key": "your desired job title",
    "linkedin_search_job_location": "your desired job location",
    "result_csv_file": "C:\\test\\test.csv"
}