Getting Reddit Data with Python

In the previous post, How to Get Submission and Comments with Python Reddit API Wrapper – PRAW, I showed how to use the Python Reddit API Wrapper to get information from Reddit. In this post we review a few more ways to get data from Reddit.

I searched the web and found a Python script on GitHub. It uses the BeautifulSoup library to parse HTML and the urllib.request module to open the Reddit URL.

According to the documentation, the urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.
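
To make this more concrete, here is a minimal sketch of opening a page with urllib.request. It is not the script from GitHub, just an illustration. The function names build_request and fetch_html and the User-Agent string are my own assumptions. A custom User-Agent is set because some sites may throttle or reject requests that use the default Python one.

```python
import urllib.request

# Hypothetical User-Agent string; pick one that identifies your own script.
USER_AGENT = "reddit-data-tutorial/0.1"

def build_request(url):
    """Create a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def fetch_html(url, timeout=10):
    """Open the URL and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

# Building the request does not touch the network yet.
request = build_request("https://www.reddit.com/r/mlquestions/")
print(request.full_url)
print(request.get_header("User-agent"))
```

Calling fetch_html(request.full_url) would then download the page, and the returned text can be passed to an HTML parser such as BeautifulSoup.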

Another example is available at Scraping Reddit with Python and BeautifulSoup 4. It also uses BeautifulSoup to parse the HTML, but it opens the URL with the requests library. The requests module uses urllib3 under the hood and provides a slightly higher-level and simpler API on top of it. Multiple discussions on the web recommend using the requests library.
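
As a rough sketch of this combination (not the code from the linked article), the example below downloads a page with requests and extracts links with BeautifulSoup. The helper names fetch_html and extract_links, the User-Agent string and the sample HTML are all made up for illustration. Reddit's real markup is different and changes over time, so the parsing part would need adjusting for a live page.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent string; use one that identifies your own script.
HEADERS = {"User-Agent": "reddit-data-tutorial/0.1"}

def fetch_html(url):
    """Download a page and return its HTML text (needs network access)."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text

def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]

# Made-up HTML standing in for a downloaded page, so the parsing step
# can be tried without a network connection.
sample_html = """
<html><body>
  <a href="/r/mlquestions/comments/abc/first/">First question</a>
  <a href="/r/mlquestions/comments/def/second/">Second question</a>
  <a>no link here</a>
</body></html>
"""

links = extract_links(sample_html)
for text, href in links:
    print(text, "->", href)
```

For a real page you would call extract_links(fetch_html(url)) and then filter the links you are interested in.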

There are many ways to get information from the web. Instead of scraping web pages, we can use the website's RSS feed (assuming one is available). For this we need feedparser, a Python library for parsing Atom and RSS feeds.
To install feedparser run the command: pip install feedparser

Here is an example of how to use feedparser to get data from a Reddit RSS link:

import feedparser

# Download and parse the subreddit's RSS feed
d = feedparser.parse('https://www.reddit.com/r/mlquestions.rss')

# Print all post titles, with a separator line before every block of 5 posts
for count, post in enumerate(d.entries, start=1):
    if count % 5 == 1:
        print("-----------------------------------------\n")
    print(post.title + "\n")

Thus we reviewed several more ways to get information from Reddit. We can use these methods for other websites too: we just need to replace the URL and, for HTML scraping, adjust the parsing to the site's markup. Below you can find a few more links on how to scrape web pages.
Feel free to leave comments or feedback.

References
1. RedditNewsAggregator
2. Scraping Reddit with Python and BeautifulSoup 4
3. A simple Python feedparser script
4. Requests Documentation
5. What is the practical difference between these two ways of making web connections in Python?
6. Beginners guide to Web Scraping: Part 2 – Build a web scraper for Reddit using Python and BeautifulSoup
7. Scraping Reddit data
