Scraping Reddit Subreddits

A subreddit's posts can be listed in several sort orders, such as hot, new, top, and controversial. PRAW exposes each of these as a method on the subreddit object, so you can use whichever sort order suits your needs.

Let’s extract some information from the redditdev subreddit.

Python3




import praw
import pandas as pd
 
reddit_read_only = praw.Reddit(client_id="",         # your client id
                               client_secret="",      # your client secret
                               user_agent="")        # your user agent
 
 
subreddit = reddit_read_only.subreddit("redditdev")
 
# Display the name of the Subreddit
print("Display Name:", subreddit.display_name)
 
# Display the title of the Subreddit
print("Title:", subreddit.title)
 
# Display the description of the Subreddit
print("Description:", subreddit.description)


Output:

Name, Title, and Description

Now let’s extract 5 hot posts from the Python subreddit:

Python3




subreddit = reddit_read_only.subreddit("Python")
 
for post in subreddit.hot(limit=5):
    print(post.title)
    print()


Output:

Top 5 hot posts
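
The loop above uses hot, but the other sort orders (new, top, controversial, rising) are available as methods of the same name on a PRAW subreddit object. The following is a minimal sketch of switching between them by name. titles_by_sort and ALLOWED_SORTS are hypothetical names introduced here for illustration, not part of PRAW, and the function assumes it receives a subreddit object like the one created above.

```python
# Hypothetical helper (not part of PRAW): pick a listing method by name.
ALLOWED_SORTS = ("hot", "new", "top", "controversial", "rising")


def titles_by_sort(subreddit, sort="hot", limit=5):
    """Return the titles of up to `limit` posts from the chosen listing."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort order: {sort!r}")
    # Look up subreddit.hot, subreddit.new, ... by name; each accepts `limit`
    listing = getattr(subreddit, sort)
    return [post.title for post in listing(limit=limit)]
```

For example, titles_by_sort(subreddit, "new", limit=5) should return the titles of the five newest posts as a list instead of printing them.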

We will now save the top posts of the Python subreddit in a pandas DataFrame:

Python3




# Scraping the top posts of the past month
# (time_filter is passed by keyword; recent PRAW versions deprecate the positional form)
posts = subreddit.top(time_filter="month")
 
posts_dict = {"Title": [], "Post Text": [],
              "ID": [], "Score": [],
              "Total Comments": [], "Post URL": []
              }
 
for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)
     
    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)
     
    # Unique ID of each post
    posts_dict["ID"].append(post.id)
     
    # The score of a post
    posts_dict["Score"].append(post.score)
     
    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)
     
    # URL of each post
    posts_dict["Post URL"].append(post.url)
 
# Saving the data in a pandas dataframe
top_posts = pd.DataFrame(posts_dict)
top_posts


Output:

top posts of the python subreddit
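
The dictionary-of-lists pattern above can also be written as one row per post, which keeps the column names in a single place. The sketch below assumes the same submission attributes used above (title, selftext, id, score, num_comments, url). post_to_row and posts_to_rows are hypothetical helper names, not PRAW functions.

```python
def post_to_row(post):
    """Map one submission to a flat dict, one key per column."""
    return {
        "Title": post.title,
        "Post Text": post.selftext,
        "ID": post.id,
        "Score": post.score,
        "Total Comments": post.num_comments,
        "Post URL": post.url,
    }


def posts_to_rows(posts):
    """Consume a listing (any iterable of submissions) into a list of rows."""
    return [post_to_row(post) for post in posts]
```

Passing the result to pd.DataFrame, as in pd.DataFrame(posts_to_rows(posts)), should give the same columns as top_posts, since pandas accepts a list of dicts as input.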

Exporting Data to a CSV File:

Python3




import pandas as pd
 
top_posts.to_csv("Top Posts.csv", index=True)


Output:

CSV File of Top Posts
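
If pandas is not needed for anything else, a file with the same columns can be produced with the standard library's csv module. This swaps DataFrame.to_csv for csv.DictWriter, so it is an alternative rather than the method used above. The sketch assumes each post has already been reduced to a dict keyed by the column names from the DataFrame example. rows_to_csv and COLUMNS are hypothetical names.

```python
import csv

# Column order for the output file (same names as the DataFrame columns)
COLUMNS = ["Title", "Post Text", "ID", "Score", "Total Comments", "Post URL"]


def rows_to_csv(rows, handle):
    """Write dict rows to an open text-mode file handle, header first."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

A typical call would open the file with open("Top Posts.csv", "w", newline="", encoding="utf-8") and pass the handle in. Unlike the index=True call above, this writes no index column.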
