How to get the next page with BeautifulSoup?

In this article, we will see how to get the next page with BeautifulSoup, that is, how to follow the links found on a scraped page and scrape the pages they point to.

Modules Needed

  • BeautifulSoup: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. To install this module, type the command below in the terminal.
pip install bs4
  • requests: This library allows you to send HTTP/1.1 requests extremely easily. To install this module, type the command below in the terminal.
pip install requests

Approach:

Getting the next page with BeautifulSoup means scraping one page first and then following the links found on it. We first request the sample website and create a soup from its content. For every link of interest found in that soup, we call the requests.get() method again on the link's URL and create a soup of that page as well. In this way we can move from one page to the next.

Let’s go through the script step by step:

Step 1: Import all dependencies

from bs4 import BeautifulSoup
import requests

Step 2: Request the page URL with requests. Here sample_website is a variable holding the URL of the page to scrape.

page = requests.get(sample_website)
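
A request can fail or hang, so in practice it may help to wrap the call. The sketch below shows one possible way to do that. The helper name fetch and the 10-second timeout are assumptions made for illustration and are not part of the original script.

```python
import requests


def fetch(url, timeout=10):
    """Return the Response for url, or None if the request fails.

    fetch is a hypothetical helper; the timeout value is an assumption.
    """
    try:
        # timeout stops the call from waiting forever on a slow server
        response = requests.get(url, timeout=timeout)
        # raise an HTTPError for 4xx and 5xx status codes
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        # RequestException is the base class of the errors raised by requests
        print(f"Could not fetch {url!r}: {error}")
        return None
    return response
```

With this helper, page = fetch(sample_website) either returns a usable response or None, so the rest of the script can skip pages that could not be downloaded.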

Step 3: With the BeautifulSoup constructor and the HTML parser, create a soup of the page content.

soup = BeautifulSoup(page.content, 'html.parser')
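
As a minimal sketch of what the soup object gives you, the example below parses a small HTML string instead of a downloaded page. The HTML is made up for illustration. BeautifulSoup accepts either a string or bytes (such as the content of a response) as its first argument.

```python
from bs4 import BeautifulSoup

# made-up HTML standing in for the content of a downloaded page
html = (
    "<html><head><title>Sample page</title></head>"
    "<body><p>Hello</p><a href='/next'>Next</a></body></html>"
)

# build the parse tree with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# the title tag and the first link can now be read from the tree
print(soup.title.string)
print(soup.find("a")["href"])
```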

Step 4: Search the parse tree for links. For each link whose URL we want, use the requests module and BeautifulSoup again to create a soup of the next page. This is how we get the next page with BeautifulSoup.

for link in soup.find_all('a', href=True):

    # follow only the links whose URL
    # contains the "www.w3wiki.net" string
    if "www.w3wiki.net" in link['href']:

        # call the get method to request the next URL
        nextpage = requests.get(link['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # anything on the next page can be scraped;
        # here we print the string of its title tag
        print("next url title : ", nextsoup.find('title').string)
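
The substring check above also matches external URLs that merely mention the domain, and it skips relative links such as /python/. One possible refinement, sketched below, resolves every href against the page URL and compares the host name instead. The function name same_site_links and the sample HTML are hypothetical and only illustrate the idea.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def same_site_links(html, base_url, domain):
    """Return the absolute http(s) links in html whose host equals domain."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        # turn relative hrefs like "/python/" into absolute URLs
        absolute = urljoin(base_url, a["href"])
        parsed = urlparse(absolute)
        is_web_link = parsed.scheme in ("http", "https")
        # compare the host name exactly and skip duplicates
        if is_web_link and parsed.netloc == domain and absolute not in links:
            links.append(absolute)
    return links


# made-up HTML for illustration
sample_html = """
<a href="/python/">Python</a>
<a href="https://www.w3wiki.net/algorithms/">Algorithms</a>
<a href="https://other.example/?ref=www.w3wiki.net">External</a>
<a href="/python/">Python again</a>
<a href="mailto:someone@example.com">Mail</a>
"""

found = same_site_links(sample_html, "https://www.w3wiki.net/", "www.w3wiki.net")
print(found)
```

Each URL in the returned list can then be passed to requests.get() exactly as in the loop above.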


Below is the full implementation:

from bs4 import BeautifulSoup
import requests

# sample website (placeholder URL, replace it
# with the page you want to scrape)
sample_website = "https://www.w3wiki.net/"

# call the get method to request the page
page = requests.get(sample_website)

# create a soup from the page content
# with BeautifulSoup and the HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# search the parse tree with find_all
# for every anchor tag that has an href
for link in soup.find_all('a', href=True):

    # follow only the links whose URL
    # contains the "www.w3wiki.net" string
    if "www.w3wiki.net" in link['href']:

        # call the get method to request the next URL
        nextpage = requests.get(link['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # anything on the next page can be scraped;
        # here we print the string of its title tag
        print("next url title : ", nextsoup.find('title').string)


Output:

next url title :  w3wiki | A computer science portal for Beginner
next url title :  Analysis of Algorithms | Set 1 (Asymptotic Analysis) - w3wiki
next url title :  Analysis of Algorithms | Set 2 (Worst, Average and Best Cases) - w3wiki
next url title :  Analysis of Algorithms | Set 3 (Asymptotic Notations) - w3wiki
next url title :  Analysis of algorithms | little o and little omega notations - w3wiki
next url title :  Lower and Upper Bound Theory - w3wiki
next url title :  Analysis of Algorithms | Set 4 (Analysis of Loops) - w3wiki
next url title :  Analysis of Algorithm | Set 4 (Solving Recurrences) - w3wiki
next url title :  Analysis of Algorithm | Set 5 (Amortized Analysis Introduction) - w3wiki
next url title :  What does 'Space Complexity' mean? - w3wiki
next url title :  Pseudo-polynomial Algorithms - w3wiki
next url title :  Polynomial Time Approximation Scheme - w3wiki
next url title :  A Time Complexity Question - w3wiki
.................................................................
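
On paginated sites, "the next page" often means a single link marked rel="next" rather than every link on the page. The sketch below follows such links page by page. It is only an illustration under stated assumptions: the names find_next_url, crawl_pages and FAKE_SITE are hypothetical, the pages are made-up HTML served from a dictionary so the example runs without network access, and real sites may mark their next link differently.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(html, current_url):
    """Return the absolute URL of the first rel="next" link, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        # html.parser exposes rel as a list of values, e.g. ['next']
        if "next" in (a.get("rel") or []):
            return urljoin(current_url, a["href"])
    return None


def crawl_pages(start_url, fetch_html, max_pages=10):
    """Collect page titles by following rel="next" links from start_url.

    fetch_html is any callable that maps a URL to its HTML text.
    """
    titles = []
    seen = set()
    url = start_url
    # stop on a missing next link, a repeated URL, or the page limit
    while url and url not in seen and len(titles) < max_pages:
        seen.add(url)
        html = fetch_html(url)
        soup = BeautifulSoup(html, "html.parser")
        titles.append(str(soup.title.string) if soup.title else None)
        url = find_next_url(html, url)
    return titles


# made-up pages standing in for a real website
FAKE_SITE = {
    "https://example.com/page/1":
        '<title>Page 1</title><a rel="next" href="/page/2">Next</a>',
    "https://example.com/page/2":
        '<title>Page 2</title><a rel="next" href="3">Next</a>',
    "https://example.com/page/3":
        '<title>Page 3</title>',
}

titles = crawl_pages("https://example.com/page/1", FAKE_SITE.__getitem__)
print(titles)
```

To run the same loop against a live site, fetch_html could be replaced with something like lambda url: requests.get(url, timeout=10).text, keeping max_pages small so the crawl stays bounded.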