Scraping Reddit Posts

To extract data from Reddit posts, we need the URL of the post. Once we have the URL, we need to create a submission object.

Python3




import praw
import pandas as pd
 
reddit_read_only = praw.Reddit(client_id="",         # your client id
                               client_secret="",      # your client secret
                               user_agent="")        # your user agent
 
# URL of the post
url = "https://www.reddit.com/r/IAmA/comments/m8n4vt/\
im_bill_gates_cochair_of_the_bill_and_melinda/"
 
# Creating a submission object
submission = reddit_read_only.submission(url=url)
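
Once the submission object exists, PRAW fetches its fields lazily the first time an attribute is accessed. The following is a minimal sketch of gathering a few of those fields into a dictionary. The helper name `summarize_submission` is hypothetical, and the stand-in object with made-up values is only there so the snippet runs without Reddit credentials; `title`, `score`, `num_comments`, and `selftext` are attributes of a PRAW submission.

```python
from types import SimpleNamespace


def summarize_submission(submission):
    """Collect a few fields of a PRAW-style submission into a plain dict."""
    return {
        "title": submission.title,
        "score": submission.score,
        "num_comments": submission.num_comments,
        "text": submission.selftext,
    }


# Stand-in object with made-up values so the sketch runs offline.
# With PRAW, pass the `submission` object created above instead.
demo_submission = SimpleNamespace(
    title="Example post",
    score=10,
    num_comments=2,
    selftext="Hello",
)
summary = summarize_submission(demo_submission)
print(summary)
```

With real credentials, `summarize_submission(submission)` would return the same keys filled with the post's actual values.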


We will extract the top-level comments from the post we have selected. To do this, we iterate over submission.comments with a for-loop and append the body of each comment to the post_comments list. The comment collection can also contain MoreComments objects, which we import from praw.models. These are placeholders for comments that have not been loaded yet (the "load more comments" links on the website) and they have no body attribute, so the if-statement in the loop skips them. Finally, we will convert the list into a pandas data frame.

Python3




from praw.models import MoreComments
 
post_comments = []
 
for comment in submission.comments:
    # Skip MoreComments placeholders; they have no body attribute
    if isinstance(comment, MoreComments):
        continue
 
    post_comments.append(comment.body)
 
# creating a dataframe
comments_df = pd.DataFrame(post_comments, columns=['comment'])
comments_df
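
The loop above skips `MoreComments` placeholders and visits only top-level comments. If nested replies are wanted as well, PRAW's comment collection offers `replace_more()` and `list()`. The following is a sketch under assumptions: `collect_comment_rows` is a hypothetical helper name, and the choice of the `id`, `score`, and `body` fields is only illustrative.

```python
def collect_comment_rows(submission, limit=0):
    """Return one dict per comment, including nested replies."""
    # limit=0 removes every MoreComments placeholder without loading it;
    # limit=None would instead try to load all of them, which costs
    # extra API requests and can be slow on large threads.
    submission.comments.replace_more(limit=limit)

    rows = []
    # list() flattens the comment tree into a single sequence
    for comment in submission.comments.list():
        rows.append(
            {"id": comment.id, "score": comment.score, "comment": comment.body}
        )
    return rows
```

Passing the result to `pd.DataFrame(collect_comment_rows(submission))` should give a data frame with `id`, `score`, and `comment` columns.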


Output:

[Image: the list of comments displayed as a pandas data frame]

 


