How to Extract Wayfair Product Using Python & Beautiful Soup?

3i Data Scraping can help you How to Extract Product Details from Wayfair using Python & Beautiful Soup, get the Wayfair product data in your required format like CVS, JSON, XML, etc.

May 6, 2021

Our achievements in the field of business digital transformation.

install BeautifulSoup

pip3 install beautifulsoup4

We would also require LXML, library’s requests, as well as soupsieve for fetching data, break that down to the XML, as well as utilize CSS selectors. Then install them with:

pip3 install requests soupsieve lxml

When you install it, open the editor as well as type in.

s# -*- coding: utf-8 -*- from bs4 import BeautifulSoup import requests

Now go to the listing page of Wayfair products to inspect data we could get.

That is how it will look:

Now, coming back to our code, let’s get the data through pretending that we are the browser like that.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'
response=requests.get(url,headers=headers)
soup=BeautifulSoup(response.content,'lxml')

Then save it as a scrapeWayfair.py.

In case, you run that.

python3 scrapeWayfair.py

You will get the entire HTML page.

Now, it’s time to utilize CSS selectors for getting the required data. To do it, let’s use Chrome as well as open an inspect tool.

We observe that all individual products data are controlled within a class ‘ProductCard-container.’ We could scrape this using CSS selector ‘.ProductCard-container’ very easily. Therefore, let’s see how the code will look like:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'
response=requests.get(url,headers=headers)
soup=BeautifulSoup(response.content,'lxml')
for item in soup.select('.ProductCard-container'):
   try:
      print('----------------------------------------')
      print(item)
   except Exception as e:
      #raise e
      print('')

It will print the content of all the elements, which hold the product’s data.

Now, we can choose classes within these rows, which have the required data. We observe that a title is within the

                        # -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'
response=requests.get(url,headers=headers)
soup=BeautifulSoup(response.content,'lxml')
for item in soup.select('.ProductCard-container'):
   try:
      print('----------------------------------------')
      #print(item)
      print(item.select('.ProductCard-name')[0].get_text().strip())
      print(item.select('.ProductCard-price--listPrice')[0].get_text().strip())
      print(item.select('.ProductCard-price')[0].get_text().strip())
      print(item.select('.pl-ReviewStars-reviews')[0].get_text().strip())
      print(item.select('.pl-VisuallyHidden')[2].get_text().strip())
      print(item.select('.pl-FluidImage-image')[0]['src'])
   except Exception as e:
      #raise e
      print('')

In case, you run that, it would print all the information.

And that’s it!! We have done that!

If you wish to utilize this in the production as well as wish to scale it to thousand links, you will discover that you would get the IP blocked very easily with Wayfair. With this scenario, utilizing rotating proxy services for rotating IPs is nearly a must. You may utilize the services including Proxies API for routing your calls using the pool of millions of domestic proxies.

In case, you wish to scale crawling speed as well as don’t wish to set the infrastructure, then you can utilize our Wayfair data crawler to easily scrape thousands of URLs with higher speed from the network of different crawlers. For more information, contact us!