Tutorial

How to Scrape Data from a Website using Python

Robson Kanhalelo
#python
[Illustration: web scraping (source: Edureka)]

Web scraping (also known as web harvesting or web data extraction) is the process of extracting data from websites. In this guide, we'll implement a simple custom scraper in Python.

Getting Started

Create a new file named scraper.py. We will use the Requests library to fetch the HTML and BeautifulSoup to parse it (install both with pip install requests beautifulsoup4).

import requests
from bs4 import BeautifulSoup
import csv

1. Fetching the Data

We’ll target a demo site. First, we send a GET request to the URL:

url = "https://windhoeknamibia.github.io"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
# Parse the content
soup = BeautifulSoup(response.text, 'html.parser')

2. Extracting Information

Get the Website Title:

print(soup.title.string)

Find specific elements (e.g., Place Names):

places = soup.find_all('h3', class_='place-name')
for place in places:
    print(place.text)
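If you want to experiment without hitting a live site, the same find_all call works on any HTML string. The snippet below is hand-written sample markup, not the real page:

```python
from bs4 import BeautifulSoup

# A small hand-written HTML snippet standing in for the real page
sample_html = """
<h3 class='place-name'>Windhoek</h3>
<h3 class='place-name'>Swakopmund</h3>
<h3 class='other'>Not a place</h3>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
# class_ filters on the CSS class; the trailing underscore avoids
# clashing with Python's `class` keyword
names = [h3.text for h3 in soup.find_all('h3', class_='place-name')]
print(names)  # ['Windhoek', 'Swakopmund']
```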

3. Saving to CSV

Extracting data is great, but saving it is better. Here is how you write those names to a CSV file:

with open('places.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Place Name"])
    for place in places:
        writer.writerow([place.text])
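As a quick sanity check, you can exercise the same writing pattern on hard-coded names and read the file back with csv.reader. The file name and the sample data here are just placeholders:

```python
import csv

sample_names = ["Windhoek", "Swakopmund"]  # stand-in for scraped data

with open('places_sample.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Place Name"])
    for name in sample_names:
        writer.writerow([name])

# Read the file back to confirm the rows landed as expected
with open('places_sample.csv', newline='') as file:
    rows = list(csv.reader(file))
print(rows)  # [['Place Name'], ['Windhoek'], ['Swakopmund']]
```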

4. Scraping Images

To get all image source links from the page:

images = soup.find_all('img')
for img in images:
    print(img.get('src'))
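Note that src values are often relative (e.g. images/pic.jpg rather than a full URL). The standard library's urljoin resolves them against the page URL; the paths below are invented for illustration:

```python
from urllib.parse import urljoin

page_url = "https://windhoeknamibia.github.io"
# Example src values as they might appear in the scraped page
srcs = [
    "images/etosha.jpg",              # relative path
    "/assets/logo.png",               # root-relative path
    "https://cdn.example.com/pic.jpg" # already absolute, left unchanged
]

absolute = [urljoin(page_url, src) for src in srcs]
print(absolute)
```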

Ethical Scraping Note

Always check a website's /robots.txt file before scraping, and be respectful of its servers and terms of service.
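Python's standard library can interpret robots.txt rules for you via urllib.robotparser. In practice you would load the live file with set_url() and read(); the rules below are made up so this sketch runs offline:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Normally: rp.set_url("https://example.com/robots.txt"); rp.read()
# Here we feed in hypothetical rules directly:
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch(user_agent, url) -> may this agent crawl that URL?
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False
```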

🎯 Challenge: To Do!

Try modifying the CSV script to save both the Place Name and its corresponding Image URL in two separate columns.

Happy coding!