Process and product of various data science tasks, from data collection, data preparation and data visualization to basic statistical analysis and modelling. Datasets for practice available.

November 2, 2019

In this webscraping attempt, I want to get data on countries, sites and categories of sites in one table. One challenge I faced was getting the data for the sites to match the countries they are tied to. The sites can be extracted through parsing th...

October 30, 2019

While I mainly host my datasets on my Github repository, I have also cross-shared some datasets on another platform, as it is integrated with quite a few other tools. It is also more user-friendly for users who might not want to dabble in Github...

September 8, 2019

I will be conducting a workshop on "Webscraping using Selenium, Beautifulsoup and APIs" at PyCon Singapore on 12 Oct! If interested, get your tickets here. On a side note, I'm not paid to conduct the tutorial; I'm volunteering my time to help grow the community :) The...

June 23, 2019

I chanced upon the emoji Python package, which allows printing emojis in Python, and decided to collect some data relating to emojis listed on the emoji cheat sheet. There were associated terms/descriptions (i.e. alternative names) for each emoji and they were scra...

April 24, 2019

Various data items, such as channel name, video title, and number of views, likes, dislikes and comments, can be retrieved using the YouTube Data API v3. This is a free service; however, there are limits on the number of requests we can make. Sho...

April 15, 2019

Today I decided to poke around a little to see if it would be possible to read csv files directly from Github, and the answer is yes. As I have published numerous csv datasets on Github, I thought it would be easier for people to access them without downloading the dat...
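As a rough illustration of the idea (a minimal sketch, not necessarily the approach taken in the full post), a file's Github page URL can be turned into its raw-content URL and parsed with the standard library. The function names and the `example-user/example-repo` path used below are hypothetical, and the download step assumes network access and a public repository.

```python
import csv
import io
from urllib.parse import urlparse
from urllib.request import urlopen


def to_raw_url(blob_url):
    """Convert a github.com 'blob' page URL into its raw.githubusercontent.com form."""
    parts = urlparse(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {blob_url}")
    user, repo, _, *rest = segments  # drop the "blob" segment
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_csv_from_github(blob_url):
    """Download a CSV file from Github and parse it (needs network access)."""
    with urlopen(to_raw_url(blob_url)) as response:
        text = response.read().decode("utf-8")
    return parse_csv_text(text)
```

Calling `read_csv_from_github("https://github.com/example-user/example-repo/blob/master/data/sample.csv")` would fetch and parse the file if such a repository existed. The URL conversion and the parsing are kept as separate functions so each can be checked without a network connection.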

October 9, 2018

In addition to BeautifulSoup, Selenium is a very useful package for webscraping when it involves repeated user interaction with the website (e.g. clicking to select options from a dropdown list and submitting) to generate a desired output/result of interest. Selenium...

June 26, 2018

This post replicates the previous post on R, this time using Python. However, note that the data randomly generated by R and by Python differ. For most of this exercise, we use prepared hypothetical datasets.

Data cleaning is one of...

June 17, 2018

This post replicates the previous post on R, this time using Python.

Sometimes you want to get started on analyzing data with the main objective of practising the basics of a certain language. So the focus is not so much on the analysis itself but getti...

May 11, 2018

This is Part I of a two-part post. Part I talks about scraping data from SGDI while Part II outlines the process of presenting the data using Tableau.  

The code builds on the one covered in a previous post on how to use Beautifulsoup in Py...

December 3, 2019

