
DATA DOUBLE CONFIRM

Web scraping using BeautifulSoup on embedded HTML - Process - Python


In this web scraping attempt, I want to collect countries, sites, and site categories into one table. One challenge I faced was getting each site to match the country it belongs to. The sites can be extracted by parsing the li tags, but to tie them to specific countries I need a way to loop through the countries listed. That's where I noticed that the sites for each country are also embedded under that country's div tag, so looping over the divs first keeps each site attached to the right country.

div.find_allclass['class'].find_all
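
As a rough illustration of that nested lookup, here is a minimal sketch. The markup, the `country` class name, the `h3` heading, and the idea that each li carries its category as a class are all assumptions made up for this example, not the structure of the actual page; the real notebook may differ. It only shows the general pattern of calling `find_all` on the divs, then `find_all` again inside each div, and reading `['class']` from a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div per country, with that country's sites
# nested inside it as li items. The real page's structure may differ.
SAMPLE_HTML = """
<div class="country">
  <h3>France</h3>
  <ul>
    <li class="cultural">Palace of Versailles</li>
    <li class="natural">Gulf of Porto</li>
  </ul>
</div>
<div class="country">
  <h3>Japan</h3>
  <ul>
    <li class="cultural">Himeji-jo</li>
  </ul>
</div>
"""


def scrape_sites(html):
    """Return one row per site, each tagged with its parent country."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Outer loop: one div per country.
    for div in soup.find_all("div", class_="country"):
        country = div.find("h3").get_text(strip=True)
        # Inner loop: only the li items nested under this country's div.
        for li in div.find_all("li"):
            rows.append({
                "country": country,
                "site": li.get_text(strip=True),
                # The class attribute comes back as a list of class names.
                "category": li["class"][0],
            })
    return rows


rows = scrape_sites(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because the inner `find_all` is called on the div rather than on the whole soup, each site is picked up only under its own country, which is what keeps the rows aligned. The resulting list of dictionaries can be passed to `pandas.DataFrame` if a table is needed.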

The entire notebook can be found below or here on GitHub.

If you would like more exercises, check out my previous
