Web scraping pagination with Scrapy in Python

What is Pagination in Python?

This project scrapes mobile phone details from the Amazon site and applies pagination. The scraped details are the name and price of each mobile, and pagination is used to scrape every page of results for the searched URL.

Logic behind Pagination

Here, the next_page variable receives the URL of the next page only if a next page is available. If no page is left, next_page is None and the condition is false, so no further request is made.

Python3

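next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
if next_page:
    # next_page is a relative link, so prepend the site root
    abs_url = f"https://www.amazon.in{next_page}"
    # request the next page only when one exists
    yield scrapy.Request(
        url=abs_url,
        callback=self.parse
    )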

Note:

abs_url = f"https://www.amazon.in{next_page}"

The prefix https://www.amazon.in is needed because next_page is a relative path such as /page2. On its own that path is incomplete; the complete URL is https://www.amazon.in/page2.
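The same joining step can also be done with the standard library instead of an f-string. The following is a minimal sketch, assuming the site root from the note above; the helper name to_absolute is hypothetical and only used for illustration. Inside a spider, Scrapy's response.urljoin(next_page) performs a comparable join relative to the page that was just fetched.

```python
from urllib.parse import urljoin

# Site root used in the note above
BASE_URL = "https://www.amazon.in"


def to_absolute(href, base=BASE_URL):
    """Resolve a relative href such as '/page2' against the site root.

    Hypothetical helper: an href that is already absolute is returned unchanged.
    """
    return urljoin(base, href)


print(to_absolute("/page2"))
print(to_absolute("https://www.amazon.in/page3"))
```

Unlike plain string concatenation, urljoin also copes with links that are already absolute, which avoids building a doubled prefix.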

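Fetch the XPath of the details to be scraped

The spider uses the following XPath expressions:

xpath of items: //div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']
xpath of name: .//span[@class='a-size-medium a-color-base a-text-normal']/text()
xpath of price: .//span[@class='a-price-whole']/text()
xpath of next page: //div/div/ul/li[@class='a-last']/a/@href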
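To see how expressions of this kind select the items, the name, the price and the next-page link, here is a small sketch that runs them against a hand-written sample page. The sample markup and its values are invented for illustration, and the standard library's ElementTree (which supports only a subset of XPath) is swapped in for Scrapy's selectors so the snippet runs without a network request. The class names are the ones used in the spider.

```python
import xml.etree.ElementTree as ET

# Invented sample markup that mimics the structure the spider expects
SAMPLE = """
<html><body>
  <div class="s-include-content-margin s-border-bottom s-latency-cf-section">
    <span class="a-size-medium a-color-base a-text-normal">Phone A</span>
    <span class="a-price-whole">9,999</span>
  </div>
  <div class="s-include-content-margin s-border-bottom s-latency-cf-section">
    <span class="a-size-medium a-color-base a-text-normal">Phone B</span>
    <span class="a-price-whole">12,499</span>
  </div>
  <div><div><ul><li class="a-last"><a href="/page2">Next</a></li></ul></div></div>
</body></html>
""".strip()

# ElementTree has no text() or @attr steps, so those parts are read via the API
ITEM = ".//div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']"
NAME = ".//span[@class='a-size-medium a-color-base a-text-normal']"
PRICE = ".//span[@class='a-price-whole']"
NEXT = ".//div/div/ul/li[@class='a-last']/a"

root = ET.fromstring(SAMPLE)

# One dict per product block, mirroring what the spider yields
rows = [
    {"name": item.findtext(NAME), "price": item.findtext(PRICE)}
    for item in root.findall(ITEM)
]

# The next-page link is absent on the last page, so guard against None
next_link = root.find(NEXT)
next_href = next_link.get("href") if next_link is not None else None

print(rows)
print(next_href)
```

The name and price expressions start with a dot so that they are evaluated relative to each product block rather than the whole page.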

Spider Code

The code below defines a Scrapy spider that extracts the product name and price from Amazon's mobile phone search results. The `start_requests` method issues the initial request to the Amazon search URL, and the `parse` method extracts the product details from the HTML response. If a next page of search results is available, `parse` requests it and the scraping continues.

Python3

import scrapy


class MobilesSpider(scrapy.Spider):
    name = 'mobiles'

    # create the initial request object
    def start_requests(self):
        yield scrapy.Request(
            url='https://www.amazon.in/s?k=xiome+mobile+phone&crid'
                '=2AT2IRC7IKO1K&sprefix=xiome%2Caps%2C302&ref=nb_sb_ss_i_1_5',
            callback=self.parse
        )

    # parse products
    def parse(self, response):
        products = response.xpath("//div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']")
        for product in products:
            yield {
                'name': product.xpath(".//span[@class='a-size-medium a-color-base a-text-normal']/text()").get(),
                'price': product.xpath(".//span[@class='a-price-whole']/text()").get()
            }

        # href of the "next" link, or None on the last page
        next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
        if next_page:
            self.logger.info("Next page")
            # next_page is relative, so prepend the site root
            abs_url = f"https://www.amazon.in{next_page}"
            yield scrapy.Request(
                url=abs_url,
                callback=self.parse
            )
        else:
            self.logger.info("No page left")

Scraped Results

Output

Pagination using Scrapy – Web Scraping with Python

Web scraping is a technique for fetching information from websites, and Scrapy is a Python framework for it. Getting data from a simple website is easy: pull the page's HTML and filter it by tags. But what if the data you are trying to fetch is spread across several pages? For example, Amazon's product listings can span multiple pages, and scraping all products successfully requires handling pagination.
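The idea of pagination can be sketched without any scraping library: read a page, collect its items, then follow the link to the next page until there is none. The snippet below is an illustration only; FAKE_SITE and crawl are hypothetical names, and the dictionary stands in for real HTTP responses.

```python
# Hypothetical stand-in for a paginated site: each page holds some items
# and the relative link to the next page (None on the last page).
FAKE_SITE = {
    "/page1": {"items": ["A", "B"], "next": "/page2"},
    "/page2": {"items": ["C", "D"], "next": "/page3"},
    "/page3": {"items": ["E"], "next": None},
}


def crawl(start):
    """Yield every item by following 'next' links until none is left."""
    url = start
    seen = set()  # guards against a page that links back to an earlier one
    while url and url not in seen:
        seen.add(url)
        page = FAKE_SITE[url]
        yield from page["items"]
        url = page["next"]


all_items = list(crawl("/page1"))
print(all_items)
```

A Scrapy spider follows the same pattern, except that each step of the loop becomes a new request whose callback is the parsing method itself.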
