How to Scrape Rental Websites Using BeautifulSoup and Python

Browsing listings on rental websites manually can be very time-consuming. A better option is to scrape rental websites with Python. This blog shows how to scrape rental websites using BeautifulSoup and Python.

This walkthrough uses BeautifulSoup for web scraping and Pandas for data wrangling, then discusses the insights generated from the data.

Would renting a condo or apartment in Etobicoke, North York, or Mississauga be considerably cheaper than renting one in downtown Toronto?

  • How do suburban rents compare to rents in the city of Toronto?
  • How much can you potentially save by renting a basement unit?
  • Which suburbs have the lowest rents?

Answering these questions by browsing listings manually would take a long time. Scraping the rental websites with Python and analyzing the resulting data is a faster way to get the answers.

Scraping Rental Website Data Using BeautifulSoup and Python

We decided to extract data from TorontoRentals.com with Python and BeautifulSoup. The website has listings for Toronto as well as many suburbs such as Brampton, Scarborough, Mississauga, and Vaughan. It carries several kinds of listings: apartments, houses, condos, and basements.

First, we import the necessary Python libraries.

# Import Python Libraries
# For HTML parsing
from bs4 import BeautifulSoup 
# For website connections
import requests 
# To prevent overwhelming the server between connections
from time import sleep 

# Display the progress bar
from tqdm import tqdm
# For data wrangling
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# For creating plots
import matplotlib.pyplot as plt
import plotly.graph_objects as go

Next, we wrote a function named get_page that returns a soup object for each page (iteration). The function accepts four inputs: city, type, beds, and page. It checks the HTTP response status code to determine whether the request completed successfully. get_page is called from the main loop, once for each value of page_num.

def get_page(city, type, beds, page):
    """Fetch one results page and return its soup object, or None if the request failed."""
    # note: 'type' shadows the Python built-in; the name is kept to match the text

    url = f'https://www.torontorentals.com/{city}/{type}?beds={beds}%20&p={page}'
    # e.g. https://www.torontorentals.com/toronto/condos?beds=1%20&p=2

    result = requests.get(url)

    # soup stays None unless the request succeeds, so the return below is always defined
    soup = None

    # check HTTP response status codes to find if HTTP request has been successfully completed
    if 100 <= result.status_code <= 199:
        print('Informational response')
    elif 200 <= result.status_code <= 299:
        print('Successful response')
        soup = BeautifulSoup(result.content, "lxml")
    elif 300 <= result.status_code <= 399:
        print('Redirect')
    elif 400 <= result.status_code <= 499:
        print('Client error')
    elif 500 <= result.status_code <= 599:
        print('Server error')

    return soup

Our plan is to scrape the following information from every listing: City, Zip, Street, Rent, Dimensions, Bed, and Bath. We assign an empty list to each variable to be scraped, so seven empty lists are created.

The complete script grabs the City, Zip, Street, Rent, Bath, Dimensions, and Bed for every listing using nested for loops and the site's consistent HTML tags. The excerpt below shows the loop for the street.

# one list per scraped field; the other six lists are created the same way
listingStreet = []

for page_num in tqdm(range(1, 250)):
    sleep(2)

    # get soup object of the page
    soup_page = get_page('toronto', 'condos', '1', page_num)

    # skip this page if no soup object was returned
    if soup_page is None:
        continue

    # grab listing street
    for tag in soup_page.find_all('div', class_='listing-brief'):
        for tag2 in tag.find_all('span', class_='replace street'):
            # to check if data point is missing
            if not tag2.get_text(strip=True):
                listingStreet.append("empty")
            else:
                listingStreet.append(tag2.get_text(strip=True))
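
The loop above appends a value only when a matching span is found, so a listing with no span at all would leave the lists misaligned. A minimal sketch of one way to guard against that is shown below. It assumes a made-up HTML fragment: only the `listing-brief` and `replace street` class names come from the loop above, while the helper name `extract_text`, the sample markup, and the use of the built-in `html.parser` are illustrative choices rather than the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment for illustration only. The class names
# 'listing-brief' and 'replace street' are taken from the loop above;
# the content itself is invented.
sample_html = """
<div class="listing-brief"><span class="replace street">12 Example St</span></div>
<div class="listing-brief"><span class="replace street"></span></div>
<div class="listing-brief"></div>
"""

def extract_text(listing, tag_name, class_name):
    """Return the stripped text of the first matching tag, or 'empty'."""
    found = listing.find(tag_name, class_=class_name)
    if found is None:
        # the tag is absent altogether
        return "empty"
    text = found.get_text(strip=True)
    # the tag exists but holds no text
    return text if text else "empty"

soup = BeautifulSoup(sample_html, "html.parser")

# exactly one entry per listing, so every field list keeps the same length
streets = [
    extract_text(listing, "span", "replace street")
    for listing in soup.find_all("div", class_="listing-brief")
]
print(streets)
```

Because one value is appended per listing regardless of what the page contains, the same helper could be reused for the other six fields once their actual tag and class names are known.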

After the script finishes, check the lengths of all seven lists to make sure they are equal. Then build a pandas DataFrame (DF) from the lists and save it to a CSV file.

# create the dataframe
df_Toronto_Condo = pd.DataFrame({'city_main':'Toronto', 'listing_type': 'Condo', 'street': listingStreet, 'city': listingCity, 'zip': listingZip, 'rent': listingRent, 'bed': listingBed,'bath': listingBath, 'dimensions': listingDim})

# saving the dataframe to csv file
df_Toronto_Condo.to_csv('df_Toronto_Condo.csv')
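
The length check mentioned before building the DataFrame matters because pandas raises an error when the column lists differ in length. A small sketch of such a check follows; the helper name `check_equal_lengths` and the sample values are hypothetical, not part of the original script.

```python
def check_equal_lengths(**named_lists):
    """Return a dict of list lengths, raising ValueError if they differ."""
    lengths = {name: len(values) for name, values in named_lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Scraped lists differ in length: {lengths}")
    return lengths

# illustrative values; in the script these would be the seven scraped lists
lengths = check_equal_lengths(
    street=["12 Example St", "34 Sample Ave"],
    rent=["$2,300", "$1,850"],
)
print(lengths)
```

Reporting the individual lengths in the error message makes it easier to see which field fell out of step with the others.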

By re-running the page_num loop with different get_page parameters, we collected data for the various housing types (apartment, house, condo, and basement) for Toronto and the suburban cities. We created a pandas DF for each housing type and saved it to a CSV file.

Data Preparation & Cleaning with Pandas

A major part of any data science project is data collection, cleaning, and preparation. We combined the DFs produced from web scraping into one main DF containing all listings, and then started data wrangling.
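
Combining the per-type DFs into one could look roughly like the sketch below. The two small frames and their values are invented for illustration; in practice each frame would more likely be loaded from its saved CSV file with `pd.read_csv`.

```python
import pandas as pd

# made-up stand-ins for two of the scraped DataFrames
df_condo = pd.DataFrame({
    "city_main": "Toronto",
    "listing_type": "Condo",
    "rent": ["$2,300", "$2,100"],
})
df_basement = pd.DataFrame({
    "city_main": "Brampton",
    "listing_type": "Basement",
    "rent": ["$1,400"],
})

# stack the frames row-wise and renumber the index from zero
df_all = pd.concat([df_condo, df_basement], ignore_index=True)
print(df_all)
```

Passing `ignore_index=True` avoids duplicate index labels, which would otherwise repeat because every source frame starts at zero.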

⭢ Search for missing values in the DF

Hidden HTML wrappers show up as empty listings.

Data for bath, bed, or dimensions is missing in a few listings.

⭢ Handle the missing data

Hidden HTML wrappers that show up as empty listings were dropped from the DF.

If details on bath, bed, or dimensions were missing for a listing, the value was set to zero.
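
The two steps above might be sketched as follows. This assumes that missing values carry the "empty" placeholder used during scraping and that a hidden wrapper can be recognized by an "empty" street. Both are assumptions for illustration, and the sample rows are invented.

```python
import pandas as pd

# made-up sample: the middle row mimics a hidden wrapper with no data at all
df = pd.DataFrame({
    "street": ["12 Example St", "empty", "34 Sample Ave"],
    "bed": ["1", "empty", "empty"],
    "bath": ["1", "empty", "2"],
    "dimensions": ["650", "empty", "empty"],
})

# assumption: a row whose street is 'empty' is a wrapper, not a real listing
df = df[df["street"] != "empty"].copy()

# set missing bed, bath, and dimensions details to zero
detail_cols = ["bed", "bath", "dimensions"]
df[detail_cols] = df[detail_cols].replace("empty", "0")
print(df)
```

Zero is only a stand-in for "unknown" here, so later statistics and plots on these columns should keep that in mind.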

⭢ Find data issues and fix them

For some listings, rent is specified as a range, and bed, bath, and dimensions are given as ranges too. For example, in one listing rent is $1795–2500, bed ranges from 1 to 3, bath ranges from 1 to 2, and dimensions range from 622 to 955 ft². Although such a 'range' listing appears as a single listing on the website, it seems to advertise several units. However, we don't know how many units are available within each listing, or the individual rent, bed, bath, and dimensions of each unit. Making guesses or taking averages doesn't seem right in this situation, so these rows were dropped from the analysis.
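
One way to spot such range values is a regular expression that looks for two numbers joined by a hyphen, an en dash, or the word "to". The sketch below is an assumption about how ranges appear in the raw text; the pattern, the helper name `is_range`, and the sample rows are illustrative only.

```python
import re

# two numbers separated by '-', an en dash, or 'to' (assumed range formats)
RANGE_PATTERN = re.compile(r"\d[\d,.]*\s*(?:-|–|to)\s*\$?\s*\d")

def is_range(value):
    """Return True if the value looks like a numeric range."""
    return bool(RANGE_PATTERN.search(str(value)))

# made-up rows: the first one advertises several units as ranges
rows = [
    {"rent": "$1795-2500", "bed": "1-3"},
    {"rent": "$2,100", "bed": "2"},
]

# keep only rows in which no field is a range
kept = [row for row in rows if not any(is_range(v) for v in row.values())]
print(kept)
```

Dropping these rows before stripping hyphens is important, since removing the hyphen from a range would otherwise fuse its two numbers into one.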

Find and examine large outliers in the DF.

⭢ Complete data transformations

Clean the City feature by removing ‘, ON’ from the entries.

Remove special characters such as $, -, and the comma.

Data type conversions: to enable data analysis, Rent and Dimensions were converted to numeric data types.
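
These transformations could be sketched with pandas string methods as shown below. The column names follow the DataFrame built earlier, while the sample values are invented and the exact cleaning rules of the original analysis are not known.

```python
import pandas as pd

# made-up raw values in the shape the scraper would produce
df = pd.DataFrame({
    "city": ["Toronto, ON", "Brampton, ON"],
    "rent": ["$2,300", "$1,850"],
    "dimensions": ["650", "1,100"],
})

# remove the province suffix from the city name
df["city"] = df["city"].str.replace(", ON", "", regex=False).str.strip()

# strip $, commas, and hyphens, then convert to numbers
# (range rows are assumed to have been dropped already)
for col in ["rent", "dimensions"]:
    cleaned = df[col].str.replace(r"[$,\-]", "", regex=True).str.strip()
    df[col] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
```

With `errors="coerce"`, any value that still fails to parse becomes NaN instead of stopping the script, which makes leftover problem rows easy to find afterwards.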

After completing data preparation and cleaning, we have a clean dataset that we can analyze further to draw useful insights.

Insights Produced from Data

Here is a count of the total number of listings by Type and City.

Images1

A sample of the dataframe showing the listings and the features collected for each.

Images2

Insights from different plots produced with Plotly and Matplotlib

Images3

Condos have the highest number of listings on the website, followed by Apartments.

Renters looking for other kinds of accommodation, such as Basements or Houses, should perhaps search other rental websites.

Toronto has the most listings on the website, followed by Etobicoke and North York.

Suburban cities have fewer listings than Toronto.

Images4

For the majority of cities, Condos are the most common listing type, followed by Apartments.

For Scarborough, there appear to be roughly equal numbers of listings for Apartments, Condos, and Houses.

Brampton appears to have the most Basements and Houses listed.

Images5

The majority of listings have either one or two bedrooms. This trend is observed across the cities.

Listings with one bath are the most common.

Toronto appears to have a sizable share of listings with two baths.

Images6

The mean is affected by outliers in a dataset. The median is the better statistic here because it is robust to outliers.

For Rent: there is a notable difference between the median and mean for Etobicoke, Toronto, Mississauga, and North York.

For Dimensions: there is a notable difference between the median and mean for Richmond Hill, Vaughan, Markham, and Toronto.

Further investigation into why Vaughan has such a large % difference between median and mean showed that two listings for Vaughan have dimensions of 800,899 SQFT, at a rent of ~$2300 for a 2B-2B condo. It looks like these listings have typos in the dimensions.
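
The comparison of mean and median per city can be sketched with a pandas groupby. The numbers below are invented, apart from one value that mirrors the 800,899 SQFT figure quoted above, and the % difference is assumed here to be measured relative to the median, which may differ from the formula used in the original analysis.

```python
import pandas as pd

# made-up dimensions; one Vaughan value mimics the suspected typo
df = pd.DataFrame({
    "city": ["Vaughan"] * 4 + ["Toronto"] * 4,
    "dimensions": [700, 800, 900, 800899, 600, 650, 700, 750],
})

# mean and median per city
stats = df.groupby("city")["dimensions"].agg(["mean", "median"])

# assumed definition: how far the mean sits from the median, in percent
stats["pct_diff"] = (stats["mean"] - stats["median"]) / stats["median"] * 100
print(stats)
```

A single extreme value is enough to pull the Vaughan mean far away from its median, while the Toronto figures, which have no outlier, stay identical.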

Images7

Toronto, Mississauga, and Etobicoke have the highest median Rents among all cities.

It was expected that rent in Toronto would be considerably higher than in other cities. However, the data analysis suggests that Toronto, Etobicoke, and Mississauga have similar median Rents.

Scarborough and Brampton have the lowest median Rents.

While median Rents for Mississauga and Toronto are similar, the median dimensions of listings in Mississauga are ~100 SQFT larger than those in Toronto.

Listings in Brampton have the lowest Rents but the largest dimensions compared with other cities.

Investigation of Relationship Between Dimensions and Rent

Images8
Images

Scatter plot A: with the original DF

No apparent relationship between Dimensions and Rent.

A few large outliers in Dimensions and Rent skew the plot.

Scatter plot B: with the DF after removing rows with outlier Dimensions

Weak positive correlation between Dimensions and Rent.

Large Rent outliers skew the plot.

Scatter plot C: with the DF after dropping rows with outlier Rent

A slightly stronger positive association between Dimensions and Rent.

Note: Missing dimensions data were replaced with zeros. Therefore, in all three plots, many listings appear to have zero dimensions.
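
The effect of removing outliers on the Dimensions and Rent relationship can be illustrated with the sketch below. The article does not say how outliers were identified, so the 1.5 × IQR rule used here is an assumption, as are the helper name `drop_outliers` and the invented sample values.

```python
import pandas as pd

def drop_outliers(frame, column, k=1.5):
    """Keep rows within k * IQR of the quartiles (an assumed outlier rule)."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

# made-up listings: one zero (missing) and one extreme dimensions value
df = pd.DataFrame({
    "dimensions": [0, 500, 600, 700, 800, 900, 800899],
    "rent": [2000, 1700, 1900, 2100, 2300, 2500, 2300],
})

# exclude rows where missing dimensions were filled with zero
known = df[df["dimensions"] > 0]

# correlation before and after removing dimension outliers
corr_before = known["dimensions"].corr(known["rent"])
trimmed = drop_outliers(known, "dimensions")
corr_after = trimmed["dimensions"].corr(trimmed["rent"])
print(round(corr_before, 2), round(corr_after, 2))
```

Filtering out the zero-filled rows first keeps placeholder values from being mistaken for genuinely tiny units when the correlation is computed.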

For more information about scraping rental websites using Python and BeautifulSoup, contact 3i Data Scraping or ask for a free quote!
