Web Scraping using Selenium and Google Colab

Install necessary packages

To begin web scraping using Selenium and Google Colab, we first have to install the necessary packages and modules in our Google Colab environment, since these are not pre-installed there.

The Advanced Package Tool (APT) checks for updates to the list of available software packages and their versions.

Installing the Chromium web driver is an essential step, as it allows our program to interact with the Chrome browser.

!pip install selenium
!apt update
!apt install chromium-chromedriver

Note: This may take some time while it connects to a server. Once connected, you can watch all the necessary libraries being installed.
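Once the install cell finishes, a quick sanity check (a small helper of our own, not part of the installation itself) can confirm that both the selenium package and the chromedriver binary are in place:

```python
import importlib.util
import shutil

# Check that the selenium package is importable
selenium_ok = importlib.util.find_spec("selenium") is not None

# Check that the chromedriver binary is on the PATH (None if missing)
driver_path = shutil.which("chromedriver")

print("selenium installed:", selenium_ok)
print("chromedriver found at:", driver_path)
```

If `selenium installed:` prints `False` or the driver path is `None`, re-run the install cell before continuing.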

Step 1: Import Libraries

In the next step, we import the necessary modules into our program.

Python
from selenium import webdriver
from selenium.webdriver.common.by import By


The By class provides a set of locator strategies that we can use to find web elements on the page.

Step 2: Configure Chrome Options

Now we need to configure our Chrome options.

  • "--headless" allows Chrome to operate without a graphical user interface (GUI).
  • "--no-sandbox" comes in handy in certain environments where sandboxing causes issues. (Sandboxing isolates software processes in a "sandbox" to limit the impact of a security breach.)
  • "--disable-dev-shm-usage" makes Chrome avoid the /dev/shm shared-memory filesystem, which is often too small in containerized environments and can help with resource management.

Python
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
dr = webdriver.Chrome(options=options)


Now we are good to go and can perform web scraping using Selenium and Google Colab with ease. Below is a code snippet demonstrating web scraping in Google Colab.

Step 3: Load the Website for Scraping

Python
dr.get("https://www.w3wiki.org/")  # Website used for scraping

# Displaying the title of the website (here, GFG's website)
print(dr.title, "\n")

# Displaying some GFG articles
c = 1
for i in dr.find_elements(By.CLASS_NAME, 'gfg_home_page_article_meta'):
    print(str(c) + ". ", i.text)
    c += 1

# Quitting the browser
dr.quit()


Output:

w3wiki | A computer science portal for geeks 
1. Roles and Responsibilities of an Automation Test Engineer
2. Top 15 AI Website Builders For UI UX Designers
3. 10 Best UPI Apps for Cashback in 2023
4. POTD Solutions | 31 Oct’ 23 | Move all zeroes to end of array
5. Create Aspect Ratio Calculator using HTML CSS and JavaScript
6. Design HEX To RGB Converter using ReactJS
7. Create a Password Generator using HTML CSS and jQuery
8. Waterfall vs Agile Software Development Model
9. Top 8 Software Development Models used in Industry
10. Create a Random User Generator using jQuery
11. Multiple linear regression analysis of Boston Housing Dataset using R
12. Outlier detection with Local Outlier Factor (LOF) using R
13. NTG Full Form
14. R Program to Check Prime Number
15. A Complete Overview of Android Software Development for Beginners
16. Difference Between Ethics and Morals
17. Random Forest for Time Series Forecasting using R
18. Difference Between Vapor and Gas

Conclusion

In this article we have seen how to use Google Colab for web scraping along with Selenium. Google Colab is a cloud-based, cost-effective platform where we can perform web-related tasks such as web scraping and web automation with Python with ease. The first step in such tasks is installing the necessary packages and libraries, since some of them are not pre-installed in the Colab environment. We have demonstrated how to install those libraries and how to carry out web-related tasks with Selenium and Google Colab, with concise examples for better understanding.



How to do web scraping using selenium and google colab?

Selenium is used for testing, web automation tasks, web scraping tasks, and more. Its WebDriver component drives the browser to perform user actions, while its headless mode runs automation tasks in the background. Google Colaboratory, Google Colab for short, is a cloud-based platform provided by Google for running Python in an environment similar to Jupyter Notebook. It is a great way to work with Selenium, as it provides free access to computing resources and flexible frameworks, including generous RAM (12 GB+) and disk storage. This integration enables web automation, testing, and data extraction. In this article, we use Selenium in Google Colab for web scraping.
