BeautifulSoup4
BeautifulSoup4 is a library that parses HTML and XML files and provides an easy interface for extracting information out of them. Can be very useful when combined with the Requests library for scraping data off of webpages.
Examples
Get Title of Webpage with requests
After installing the requests library, paste the following code into main.py
:
from bs4 import BeautifulSoup
import requests
url = "https://www.wikipedia.org/"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
print(soup.title)
Results:
<title>Wikipedia</title>
Reference
- BeautifulSoup4 at crummy.com