Learning Resources
urllib and json modules
How much fun would it be if we could write our own program that will get search results from the web? Let us explore that now.
We can do this with a few modules. The first is the urllib package, which we can use to fetch any webpage from the internet. We will use Yahoo! Search to get the search results. Luckily, it can return the results in a format called JSON, which is easy for us to parse with the built-in json module in the standard library.
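To see why JSON is convenient, here is a small sketch of the json module on its own. The JSON text is made-up example data, not a real Yahoo! response. json.loads() turns JSON text into ordinary Python objects, and json.dumps() goes the other way.

```python
import json

# JSON text as it might arrive from a web service (made-up example data).
text = '{"query": "byte of python", "results": 20, "adult_ok": false, "tags": ["python", "tutorial"], "next": null}'

# Parse the JSON string into Python objects:
# objects -> dict, arrays -> list, numbers -> int/float,
# true/false -> True/False, null -> None
data = json.loads(text)

print(data["query"])                   # byte of python
print(data["results"] + 1)             # 21
print(data["adult_ok"], data["next"])  # False None
print(data["tags"][0])                 # python

# json.dumps converts Python objects back into JSON text.
encoded = json.dumps({"ok": True, "items": [1, 2]})
print(encoded)                         # {"ok": true, "items": [1, 2]}
```

Because the parsed result is just dicts and lists, the rest of the program can use normal indexing such as `result['Title']` without any special parsing code.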
- TODO
- This program doesn't work yet, which seems to be due to a bug in Python 3.0 beta 2.
```python
#!/usr/bin/python
# Filename: yahoo_search.py

import sys
if sys.version_info[0] != 3:
    sys.exit('This program needs Python 3.0')

import json
import urllib, urllib.parse, urllib.request, urllib.response

# Get your own APP ID at https://developer.yahoo.com/wsregapp/
YAHOO_APP_ID = 'jl22psvV34HELWhdfUJbfDQzlJ2B57KFS_qs4I8D0Wz5U5_yCI1Awv8.lBSfPhwr'
SEARCH_BASE = 'https://search.yahooapis.com/WebSearchService/V1/webSearch'

class YahooSearchError(Exception):
    pass

# Taken from https://developer.yahoo.com/python/python-json.html
def search(query, results=20, start=1, **kwargs):
    kwargs.update({
        'appid': YAHOO_APP_ID,
        'query': query,
        'results': results,
        'start': start,
        'output': 'json'
    })
    url = SEARCH_BASE + '?' + urllib.parse.urlencode(kwargs)
    result = json.load(urllib.request.urlopen(url))
    if 'Error' in result:
        raise YahooSearchError(result['Error'])
    return result['ResultSet']

query = input('What do you want to search for? ')
for result in search(query)['Result']:
    print("{0} : {1}".format(result['Title'], result['Url']))
```
Output:
- TODO
How It Works:
We can get search results from a website by sending it the text we are searching for in a particular format. We have to specify several options, which are combined in the key1=value1&key2=value2 format; the urllib.parse.urlencode() function does this combining for us.
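Here is a minimal sketch of what urllib.parse.urlencode() produces. The option names mirror the ones built inside search(), but the appid is left out for brevity. Note how each value is escaped so it is safe to put in a URL.

```python
from urllib.parse import urlencode

# Options like the ones search() builds (appid omitted in this sketch).
options = {
    'query': 'byte of python',
    'results': 20,
    'start': 1,
    'output': 'json',
}

# urlencode joins the pairs as key1=value1&key2=value2 and escapes each value;
# spaces become '+', so "byte of python" becomes "byte+of+python".
encoded = urlencode(options)
print(encoded)  # query=byte+of+python&results=20&start=1&output=json

# Characters with special meaning in a URL are percent-encoded.
tricky = urlencode({'query': 'C++ & Python'})
print(tricky)   # query=C%2B%2B+%26+Python
```

Letting urlencode() do the escaping avoids subtle bugs, for example a literal `&` in the search text being mistaken for the start of a new option.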
For example, open this link in your web browser and you will see 20 results, starting from the first result, for the words "byte of python", with the output in JSON format.
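One way to see what such a link asks for is to build it the same way search() does and then split it back apart with urllib.parse.urlsplit() and urllib.parse.parse_qs(). This sketch only builds and inspects the string; it does not contact the server, and the appid is again omitted.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_BASE = 'https://search.yahooapis.com/WebSearchService/V1/webSearch'

# Build the link as search() would (appid omitted in this sketch).
url = SEARCH_BASE + '?' + urlencode({
    'query': 'byte of python',
    'results': 20,
    'start': 1,
    'output': 'json',
})

# Take the part after '?' and decode it back into a dict.
# parse_qs returns a list of values for each key, all as strings.
params = parse_qs(urlsplit(url).query)
print(params)
# {'query': ['byte of python'], 'results': ['20'], 'start': ['1'], 'output': ['json']}
```

The decoded parameters correspond directly to the description above: the words searched for, the number of results, the starting position, and the output format.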
We open a connection to this URL using the urllib.request.urlopen() function and pass the returned file-like object to json.load(), which reads the content and converts it to a Python object in one step. The search() function returns the ResultSet part of that object, and we loop through its Result list and display each title and URL to the end user.
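The parsing and display steps can be tried without a network connection by substituting an in-memory stream for the response. In this sketch, io.BytesIO stands in for the object urlopen() returns, and the result data is invented; only the field names (ResultSet, Result, Title, Url) are taken from the program. Passing a binary stream straight to json.load() assumes Python 3.6 or later, where the json module accepts bytes.

```python
import io
import json

# Stand-in for the file-like object returned by urllib.request.urlopen():
# a binary stream of JSON bytes (made-up data, real field names).
fake_response = io.BytesIO(
    b'{"ResultSet": {"Result": ['
    b'{"Title": "A Byte of Python", "Url": "https://example.com/a"},'
    b'{"Title": "Python Tutorial", "Url": "https://example.com/b"}'
    b']}}'
)

# json.load() calls .read() on the stream and parses the JSON in one step.
result_set = json.load(fake_response)['ResultSet']

# Format each result the same way the program does.
lines = []
for result in result_set['Result']:
    lines.append("{0} : {1}".format(result['Title'], result['Url']))

print("\n".join(lines))
# A Byte of Python : https://example.com/a
# Python Tutorial : https://example.com/b
```

Testing against a fake response like this keeps the parsing logic checkable even when the web service is unavailable.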