How to Get Submissions and Comments with the Python Reddit API Wrapper – PRAW

According to Alexa [1], people spent more time on Reddit than on Facebook, Instagram or YouTube. Users turn to Reddit to post questions, share content or ideas, and discuss topics, which makes it a valuable source of text data worth extracting automatically. We will look at how to do this with PRAW, the Python Reddit API Wrapper [2].

An example of how to get an API key and use PRAW can be found in How to scrape reddit with python [3]. However, that example does not collect the comments attached to each submission. Comments can carry important information, so I built a Python script with PRAW, modified from the post above, that adds comments and a few minor improvements.

To get comments, we first need to obtain a submission object.
With a submission object, we can collect the text of every comment in the thread as shown below:

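# Replace the "load more comments" placeholders (MoreComments objects),
# which have no .body attribute, with the actual comments
submission.comments.replace_more(limit=None)
comment_body = ""
for comment in submission.comments.list():
    print(comment.body)
    comment_body = comment_body + comment.body + "\n"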

If we want to output only the body of the top-level comments in the thread, we can do:

submission.comments.replace_more(limit=0)  # drop MoreComments placeholders
for top_level_comment in submission.comments:
    print(top_level_comment.body)
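The difference between iterating over submission.comments (top level only) and submission.comments.list() (the whole tree, flattened) can be illustrated without a Reddit connection. The sketch below uses a hypothetical FakeComment class in place of PRAW's Comment and flattens the tree level by level, which matches how PRAW's comment tutorial describes list() (top-level comments first, then second-level ones, and so on); it illustrates the idea and is not PRAW's own code.

```python
from collections import deque


class FakeComment:
    """Hypothetical stand-in for a PRAW Comment: only body and replies."""

    def __init__(self, body, replies=()):
        self.body = body
        self.replies = list(replies)


def flatten_breadth_first(top_level):
    """Return every comment in the tree, level by level (breadth-first)."""
    queue = deque(top_level)
    flat = []
    while queue:
        comment = queue.popleft()
        flat.append(comment)
        queue.extend(comment.replies)  # children are visited after this level
    return flat


# Two top-level comments; the first has a reply, which has its own reply
thread = [
    FakeComment("A", [FakeComment("A.1", [FakeComment("A.1.a")])]),
    FakeComment("B"),
]

print([c.body for c in thread])                         # ['A', 'B']
print([c.body for c in flatten_breadth_first(thread)])  # ['A', 'B', 'A.1', 'A.1.a']
```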

Here is the full Python script that gets Reddit information, including comments. Because we are only downloading data and not changing anything, we do not need a username and password. If you want to modify data on Reddit, you would need to include your login information too.

"""
To install the modules:
pip install praw pandas
"""

import praw
import pandas as pd
from datetime import datetime, timezone


reddit = praw.Reddit(client_id='xxxxxxxx',
                     client_secret='xxxxxxxx',
                     user_agent='personal use script')
# To modify data on Reddit, also pass your login details to praw.Reddit():
#                    username='YOUR_REDDIT_USER_NAME',
#                    password='YOUR_REDDIT_LOGIN_PASSWORD'


def get_yyyy_mm_dd_from_utc(dt):
    # Convert a UTC Unix timestamp to a zero-padded YYYY-MM-DD string
    date = datetime.fromtimestamp(dt, tz=timezone.utc)
    return date.strftime("%Y-%m-%d")



subreddit = reddit.subreddit('learnmachinelearning')


top_subreddit = subreddit.top(limit=998)

topics_dict = {"title": [], "score": [], "id": [], "url": [],
               "comms_num": [], "created": [], "body": [], "z_comments": []}


for submission in top_subreddit:

    # https://www.reddit.com/r/redditdev/comments/46g9ao/using_praw_to_call_reddit_api_need_help/
    topics_dict["title"].append(submission.title)
    topics_dict["score"].append(submission.score)
    topics_dict["id"].append(submission.id)
    topics_dict["url"].append(submission.url)
    topics_dict["comms_num"].append(submission.num_comments)
    topics_dict["created"].append(get_yyyy_mm_dd_from_utc(submission.created_utc))
    topics_dict["body"].append(submission.selftext)

    # https://praw.readthedocs.io/en/latest/tutorials/comments.html
    # Expand every "load more comments" link so that list() returns only
    # Comment objects (this can take a while for large threads)
    submission.comments.replace_more(limit=None)
    comment_body = ""
    for comment in submission.comments.list():
        print(comment.body)
        comment_body = comment_body + comment.body + "\n"
    topics_dict["z_comments"].append(comment_body)

topics_data = pd.DataFrame(topics_dict)

topics_data.to_csv('Reddit_data.csv', index=False) 
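PRAW exposes each submission's creation time as a Unix timestamp, and created_utc holds it in UTC. A minimal sketch of turning such a timestamp into a zero-padded date string (so that dates like 2020-03-05 also sort correctly as text in the CSV) uses a timezone-aware datetime and strftime; the function name utc_timestamp_to_date is only illustrative.

```python
from datetime import datetime, timezone


def utc_timestamp_to_date(ts):
    """Convert a Unix timestamp (seconds, UTC) to a zero-padded YYYY-MM-DD string."""
    # A timezone-aware datetime avoids datetime.utcfromtimestamp(),
    # which is deprecated as of Python 3.12
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


print(utc_timestamp_to_date(1583366400))  # 2020-03-05
```

Reddit returns created_utc as a float, which datetime.fromtimestamp accepts directly.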

References
1. The top 500 sites on the web
2. PRAW
3. How to scrape reddit with python
4. Tutorials
5. Webscraping Reddit — Python Reddit API Wrapper (PRAW) Tutorial for Windows

Python API Example with Wallabag Web Application for Extracting Entries and Quotes


In the previous post, Python API Example with Wallabag Web Application, we explored how to connect to Wallabag via its Web API and create an entry in the Wallabag web application. To do this, we set up the API, obtained a token with a Python script, and then created an entry (added a link).

In this post, we will extract entries through the Web API with a Python script. From each entry we will extract the information we need, such as the entry id. Then, for a given id, we will look at how to extract annotations and quotes.

Wallabag is a read-it-later web application like Pocket or Instapaper. Quotes are pieces of text that we highlight within Wallabag. Annotations are our notes that we can save together with quotes. One entry can have several quotes / annotations. Wallabag is open source software, so you can download it and install it locally or on a remote web server.

If you have not set up the API yet, you need to do that first to run the code below; see the previous post for how to do this.
The beginning of the script is also the same as before, since we first need to provide our credentials and obtain a token.
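One way to provide those credentials without typing them into the script is to read them from environment variables. The sketch below builds the same token-request payload that the full script sends as gettoken; the function name wallabag_token_request and the WALLABAG_* variable names are hypothetical placeholders, so pick whatever suits your setup.

```python
import os


def wallabag_token_request(environ=os.environ):
    """Build the OAuth password-grant payload from environment variables.

    The WALLABAG_* names are hypothetical; a missing one raises KeyError.
    """
    return {
        "username": environ["WALLABAG_USERNAME"],
        "password": environ["WALLABAG_PASSWORD"],
        "client_id": environ["WALLABAG_CLIENT_ID"],
        "client_secret": environ["WALLABAG_CLIENT_SECRET"],
        "grant_type": "password",
    }
```

With the variables exported in your shell, the token request could then read r = requests.post('{}/oauth/v2/token'.format(HOST), wallabag_token_request()), which keeps the password out of the source file.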

Obtaining Entries

After obtaining the token, we move on to actually downloading data. We can obtain entries using the code below:

p = {'archive': 0 , 'starred': 0, 'access_token': access}
r = requests.get('{}/api/entries.txt'.format(HOST), p)

p holds the parameters that limit our output (here the archive and starred flags) together with the access token.
The returned data is a JSON structure with a lot of information, including entries. It does not include all entries at once: entries are split into pages of 30, and each response provides a link to the next page. So we can extract the next-page link and then request entries again.

Each entry has a link, an id, and some other information.
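Following the next-page links until they run out can be sketched as a generator. The page layout is an assumption: entries under _embedded.items (as the full script below reads them) and the next link under _links.next.href, in the HAL style wallabag appears to use; compare with the output of print(data['_links']['next']) on your server. The function name iter_entries and the fetch_page callable are hypothetical, and fetch_page is passed in so the logic can be tried offline.

```python
def iter_entries(fetch_page, first_url):
    """Yield every entry across all pages of a paginated entries response.

    fetch_page(url) must return the decoded JSON of one page.
    Assumed layout: entries in _embedded.items, next page in _links.next.href.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        for entry in page["_embedded"]["items"]:
            yield entry
        # The last page has no "next" link, which ends the loop
        url = page.get("_links", {}).get("next", {}).get("href")


# Offline demo with two hypothetical pages instead of real HTTP calls
pages = {
    "page1": {"_embedded": {"items": [{"id": 1}, {"id": 2}]},
              "_links": {"next": {"href": "page2"}}},
    "page2": {"_embedded": {"items": [{"id": 3}]}, "_links": {}},
}
print([e["id"] for e in iter_entries(pages.get, "page1")])  # [1, 2, 3]
```

Against a real server, fetch_page could be something like lambda url: requests.get(url, {'access_token': access}).json(), starting from '{}/api/entries.txt'.format(HOST).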

Obtaining Annotations / Quotes

To extract the annotations and quotes of an entry, we can use this code:

p = {'access_token': access}
link = '{}/api/annotations/' + str(data['_embedded']['items'][3]['id']) + '.txt'
print (link)
r = requests.get(link.format(HOST), p)
data=json.loads(r.text)

Full Python Source Code

Below is the full example script:

# Extract entries using wallabag API and Python
# Extract quotes and annotations for specific entry
# Save information to files
import requests
import json

# only these 5 variables have to be set
USERNAME = 'xxxxxx'
PASSWORD = 'xxxxxx'
CLIENTID = 'xxxxxxxxxxxx'
SECRET = 'xxxxxxxxxxx'
HOST = 'https://intelligentonlinetools.com/wallabag/web'  # e.g. 'https://wallabag.example.org'


gettoken = {'username': USERNAME, 'password': PASSWORD, 'client_id': CLIENTID, 'client_secret': SECRET, 'grant_type': 'password'}
print (gettoken)

r = requests.post('{}/oauth/v2/token'.format(HOST), gettoken)
print (r.content)


access = r.json().get('access_token')

p = {'archive': 0 , 'starred': 0, 'access_token': access}
r = requests.get('{}/api/entries.txt'.format(HOST), p)

data=json.loads(r.text)
print (type(data))


with open('data1.json', 'w') as f:  # writing JSON object
    json.dump(data, f)


for key, value in data.items():
    print(key, value)

#Below how to access needed information at page level like next link
#and at entry level like id, url for specific 3rd entry (counting from 0)      
print (data['_links']['next']) 
print (data['pages'])
print (data['page']) 
print (data['_embedded']['items'][3]['id'])  
print (data['_embedded']['items'][3]['url'])  
print (data['_embedded']['items'][3]['annotations'])


p = {'access_token': access}

link = '{}/api/annotations/' + str(data['_embedded']['items'][3]['id']) + '.txt'
print (link)
r = requests.get(link.format(HOST), p)
data=json.loads(r.text)
with open('data2.json', 'w') as f:  # writing JSON object
    json.dump(data, f)

#Below how to access first and second annotation / quote
#assuming they exist 
print (data['rows'][0]['quote']) 
print (data['rows'][0]['text']) 
print (data['rows'][1]['quote'])    
print (data['rows'][1]['text'])
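The last lines of the script index rows[0] and rows[1] directly, so they raise an IndexError when an entry has fewer than two annotations. A safer sketch loops over whatever rows are present; it assumes the same layout the script uses (a rows list whose items carry quote and text), and the function name quotes_and_notes is hypothetical.

```python
def quotes_and_notes(annotations):
    """Return (quote, note) pairs for every annotation in the response.

    Assumes the layout used in the script: a "rows" list whose items
    carry "quote" (highlighted text) and "text" (the attached note).
    """
    return [(row.get("quote", ""), row.get("text", ""))
            for row in annotations.get("rows", [])]


# Hypothetical response with one annotation
sample = {"rows": [{"quote": "highlighted sentence", "text": "my note"}]}
for quote, note in quotes_and_notes(sample):
    print(quote, "->", note)
```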

Conclusion

In this post, we learned how to use the Wallabag API to download entries, annotations, and quotes. To do this, we first downloaded entries and their ids. Then we downloaded annotations and quotes for a specific entry id. Along the way, we saw how to navigate the returned JSON in Python to get the information we need.

Feel free to provide feedback or ask related questions.

Python API Example with Wallabag Web Application


Many times we need to post data to a web application from outside of it, for example from a Python script.
An API solves this problem of sending data to an application from the outside, and I am going to show how you can do this for the Wallabag web application. Wallabag is a read-it-later application, where you can save website links and read them later.

So here we will look at how to write a Python API script that sends information to the Wallabag web application.

To do this, we need access to Wallabag. It is an open source project (MIT license), so you can download it and install it as a self-hosted service.

Collecting Information

Once we have installed Wallabag or gained access to it, we collect the information needed for authorization.
Go to the Wallabag application, open the API Client Management tab, and create a client.
Note the client ID and client secret.
See the screenshot below for reference.

Python API Example Script

Now we can open a Python IDE and write the script below. Here https://mysite.com is the base URL, and wallabag is the folder where the application is installed.


import requests


# below 5 variables have to be set

USERNAME = 'xxxxxxxx'
PASSWORD = 'xxxxxxxx'
CLIENTID = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'
SECRET = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
HOST = 'https://mysite.com/wallabag/web'    


gettoken = {'username': USERNAME, 'password': PASSWORD, 'client_id': CLIENTID, 'client_secret': SECRET, 'grant_type': 'password'}
print (gettoken)
r = requests.post('{}/oauth/v2/token'.format(HOST), gettoken)
print (r.content)

access = r.json().get('access_token')


url = 'https://visited_site.com' # URL of the article
# should the article be already read? 0 or 1 for archive
# should the article be added as favorited? 0 or 1 for starred

article = {'url': url, 'archive': 0 , 'starred': 0, 'access_token': access}
r = requests.post('{}/api/entries.json'.format(HOST), article)


"""
output:
{'username': 'xxxxxxxxx', 'password': 'xxxxxxxxxx', 'grant_type': 'password', 'client_id': 'xxxxxxxxxxxxxxx', 'client_secret': 'xxxxxxxxxxx'}
b'{"access_token":"xxxxxxxxxxxxxx","expires_in":3600,"token_type":"bearer","scope":null,"refresh_token":"xxxxxxxxxxxxxx"}'
"""



Troubleshooting

I found it useful to include print(r.content) in case something goes wrong; it shows what the server returned.
Looking at the log, located at /var/logs/prod.log under yoursite.com/wallabag, also helped me: when something goes wrong, the log may contain a clue.
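The script reads the token with r.json().get('access_token'), which quietly returns None when authentication fails, so the first visible symptom is a confusing error on a later request. Building on the print(r.content) tip, here is a sketch of a helper that raises with the server's message instead. The name extract_access_token is hypothetical, and the error and error_description fields come from the OAuth 2.0 specification (RFC 6749); whether your wallabag server returns exactly those fields is an assumption.

```python
import json


def extract_access_token(body):
    """Return the access token from a token response body, or raise with details.

    body is the raw response content (bytes or str), e.g. r.content.
    """
    payload = json.loads(body)
    token = payload.get("access_token")
    if not token:
        # "error" / "error_description" are the standard OAuth 2.0 error
        # fields; assuming wallabag uses them is an assumption
        detail = payload.get("error_description") or payload.get("error") or body
        raise RuntimeError("No access token returned: {}".format(detail))
    return token


ok = b'{"access_token":"abc123","expires_in":3600,"token_type":"bearer"}'
print(extract_access_token(ok))  # abc123
```

In the script this would replace the token line with access = extract_access_token(r.content).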

Conclusion

We looked at a Python API example showing how to integrate a Python script with the Wallabag web application and send data using the Wallabag API and the Python requests library.

References

wallabag api example
Wallabag

Web API to Save to Pocket App and Instapaper App

As we surf the web, we find a lot of information that we might use later. We use different applications (Pocket, Instapaper, Diigo, Evernote or other apps) to save the links or notes that we find.

While many of these applications have a lot of great features, there are still many opportunities to automate processes using the web APIs that many applications now provide.

This allows us to extend application functionality and eliminate some manual steps.

For example, you may have about 20 links that you want to send to a Pocket-like application.
Another example: when you add a link to one application, you may also want to save the link or note to the Pocket app or to Instapaper.
Or maybe you want to automatically (through a script) extract links from some websites and save them to your Pocket app.

In today's post, we will look at a few examples that will get you started. We will see how to use the Pocket API and the Instapaper API with Python.

API for Pocket App

Pocket, previously known as Read It Later, is an application and service for managing a reading list of articles from the Internet. It is available on many different devices, including web browsers. (Wikipedia)
There is a great post [1] that shows how to set up the API for it, with detailed screenshots of how to get all the identification information needed for a successful login.

In summary, you need to obtain a consumer key for your API application online, then obtain a request token ("code") via a script. Next, you open a link that includes the token and authorize the application there. After this, the request token is exchanged for an access token, and you can use the API to send links.
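Pocket's OAuth responses are form-encoded by default: the request step returns a body like code=..., and the step that exchanges the request token for an access token returns access_token=...&username=.... A small helper built on urllib.parse.parse_qs can pull out a field without relying on its exact position in the text, as string slicing does; the name form_value is hypothetical.

```python
from urllib.parse import parse_qs


def form_value(body, key):
    """Return one field from a form-encoded response body such as 'code=abc'."""
    values = parse_qs(body).get(key)
    if not values:
        raise KeyError("{!r} not found in response: {!r}".format(key, body))
    return values[0]


print(form_value("code=abc123", "code"))                                 # abc123
print(form_value("access_token=xyz789&username=alice", "access_token"))  # xyz789
```

In the script, token = form_value(pocket_api.text, "code") would be an alternative to slicing off the first five characters.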

Below is a summary Python script that sends a link to the Pocket app, including the previous steps:

import requests
import webbrowser

# The redirect URI can be any link
redirect_link = "google.com"
# Obtain the consumer key online (see [1])
consumer_key = "xxxxxxxx"

# Connect to the Pocket API to get a request token ("code")
pocket_api = requests.post('https://getpocket.com/v3/oauth/request',
                           data={'consumer_key': consumer_key,
                                 'redirect_uri': redirect_link})

print(pocket_api.status_code)  # 200 means all is OK
print(pocket_api.headers)
print(pocket_api.text)

# The body looks like "code=...", so remove the leading 'code='
token = pocket_api.text[5:]
print(token)
url = ("https://getpocket.com/auth/authorize?request_token=" + token
       + "&redirect_uri=" + redirect_link)

webbrowser.open_new(url)  # opens in the default browser
# Click the Authorize button in the web browser

# Once authorization is done, you can post as below (uncomment the code below).
# The request token must first be exchanged for an access token.
"""
from urllib.parse import parse_qs

auth = requests.post('https://getpocket.com/v3/oauth/authorize',
                     data={'consumer_key': consumer_key, 'code': token})
access_token = parse_qs(auth.text)['access_token'][0]

pocket_add = requests.post('https://getpocket.com/v3/add',
                           data={'url': 'https://getpocket.com/developer/apps/new',
                                 'consumer_key': consumer_key,
                                 'access_token': access_token})
print(pocket_add.status_code)
"""

API for Instapaper

Instapaper is a bookmarking service owned by Pinterest. It allows web content to be saved so it can be “read later” on a different device, such as an e-reader, smartphone, or tablet. (Wikipedia)
Below is a code example that sends a link to Instapaper. It is based on a script that I found on the Internet [2].

import sys
import urllib.error
import urllib.parse
import urllib.request


def error(msg):
    sys.exit(msg)


def main():
    api = 'https://www.instapaper.com/api/add'

    params = urllib.parse.urlencode({
        'username': "actual_user_name",
        'password': "actual_password",
        'url': "https://www.actual_url",
        'title': "actual_title",
        'selection': "description"
    }).encode("utf-8")

    # urlopen raises HTTPError for 4xx/5xx responses,
    # so catch it to read the status code
    try:
        r = urllib.request.urlopen(api, params)
    except urllib.error.HTTPError as e:
        status = e.code
    else:
        status = r.getcode()

    if status == 201:
        print('%s saved as %s' % (r.headers['Content-Location'], r.headers['X-Instapaper-Title']))
    elif status == 400:
        error('Status 400: Bad request or exceeded the rate limit. Probably missing a required parameter, such as url.')
    elif status == 403:
        error('Status 403: Invalid username or password.')
    elif status == 500:
        error('Status 500: The service encountered an error. Please try again later')


if __name__ == '__main__':
    main()
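For the earlier use case of sending about 20 links at once, the same add endpoint can be called in a loop. This is a sketch assuming the endpoint and parameters used in the script above; only url is sent here (title and selection are omitted), the function name save_links is hypothetical, and the opener parameter exists only so the function can be exercised without a network connection. Since a 400 can also mean the rate limit was exceeded, a large batch may need pauses between calls.

```python
import urllib.error
import urllib.parse
import urllib.request

API = "https://www.instapaper.com/api/add"


def save_links(links, username, password, opener=urllib.request.urlopen):
    """Send each URL to the Instapaper add endpoint; return {url: HTTP status}."""
    results = {}
    for url in links:
        params = urllib.parse.urlencode({
            "username": username,
            "password": password,
            "url": url,
        }).encode("utf-8")
        try:
            status = opener(API, params).getcode()
        except urllib.error.HTTPError as e:
            status = e.code  # 400, 403, 500 arrive as exceptions
        results[url] = status
    return results
```

A status of 201 for a URL means it was saved; any other value can be looked up in the messages of the script above.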

References
1. Add Pocket API using Python – Tutorial
2. Instapaper