urllib — URL handling
This module has several sub-modules that works with URLs, and you can check out the sub-modules in Python's standard documentation.
We will show some examples for the most commonly used submodules urllib.request
and urllib.parse
.
urllib.request
A common usage of this module is fetching data from the internet.
Example 1) Get a webpage's source code
from urllib.request import urlopen
source_code = urlopen("http://sabrinaw.oyosite.com").read().decode("utf-8")
print(source_code)
This will print the source code as a string from the website "sabrinaw.oyosite.com".
Note: If you are using the code above to get the source code from a website, it is not uncommon that the function will return a "forbidden" error instead of the actual source code. The reason is that many websites have restrictions on who can fetch their code. For example, the website may check if you are an actual browser or a spider robot, and it will return an empty string or a "forbidden" error if they think you are a robot.
Example 2) Get CSV data and parse it with the CSV lib
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. Read more on Wikipedia >>.
Suppose we have a CSV file for a Code Conquest Hackathon:
Name, Nickname, Age, Kingdom, Soldier Type, Rank
Reuben,The Hammer,59,Prubadour,Lancer,Sergeant
Everitt,The Whisper,27,Prubadour,Pikeman,Major
Reinhardt,The Mammoth,56,Prubadour,Healer,Private
Mel,The Hawk,53,Truisian,Pikeman,Sergeant
We can upload this file to our OYO website, and then read from the url. For example:
import urllib.request, csv
page = urllib.request.urlopen("http://sabrinaw.oyosite.com/testing/data3.html")
reader = csv.reader(
page.read().decode('utf-8').splitlines(),
skipinitialspace=True
)
for row in reader:
# row now is a python list
print(row)
Example 3) Calling API
Most APIs provide JSON format data. Here we will use an IP-to-Location API to get the location information from an IP address.
from urllib.request import urlopen
import json
ip = input("Input the IP you want to query: ")
url = f"http://ip-api.com/json/{ip}"
location = json.loads(urlopen(url).read())
print("Approx. Location:", location["city"], location["region"])
Run it (suppose we want to query the IP 67.84.146.84):
Input the IP you want to query: 67.84.146.84
Approx. Location: Ronkonkoma NY
Use OYOclass Proxy
Some schools may have firewalls that block domains. For example, if your school blocks access to the "ip-api.com" domain, you won't get data returned like the above. In these cases, you can use our proxy by simply prepending https://proxy.oyoclass.com/
to any URL that you want to fetch data from.
from urllib.request import urlopen
import json
ip = input("Input the IP you want to query: ")
url = f"https://proxy.oyoclass.com/http://ip-api.com/json/{ip}"
location = json.loads(urlopen(url).read())
print("Approx. Location:", location["city"], location["region"])
Here is another example. This is an API you can use to get the bitcoin price in USD: https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd
.
Suppose your school blocks the "coingecho.com" domain, you can just prepend https://proxy.oyoclass.com/
to it:
from urllib.request import urlopen
import json
url = "https://proxy.oyoclass.com/https://api.coingecko.com/api/v3/simple/price?ids=bitcoin&vs_currencies=usd"
data = json.loads(urlopen(url).read())
print(data)
Run it:
{'bitcoin': {'usd': 65274}}
urllib.parse
This submodule can help parse a URL string and split it into different components. Read more >>.
Example
from urllib.parse import urlparse
url = "https://oyoclass.com/settings/account?name=python#anchor"
parsed = urlparse(url)
print(parsed)
print(parsed.scheme)
print(parsed.netloc)
print(parsed.path)
print(parsed.query)
print(parsed.fragment)
Run it:
ParseResult(scheme='https', netloc='oyoclass.com', path='/settings/account', params='', query='name=python', fragment='anchor')
https
oyoclass.com
/settings/account
name=python
anchor