Process and product of various data science tasks, from data collection, data preparation and data visualization to basic statistical analysis and modelling. Datasets for practice are available.

November 10, 2019

I was surprised to see my post trending. Since it caught on, I am cross-sharing it here.

While the post was meant to answer a frequent question I got on whether data scientists will be automated away, it was actually more intended to be an outlet for me as there was...

December 15, 2018

This tutorial shows how to get the frequency distribution of words used in a chunk of text. It is a simpler alternative to a more elaborate text mining post that involves auto-removal of stopwords, e.g. "the", "a" and "and".

The script basically breaks the chunk of tex...
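As a rough illustration of the idea (not the script from the post itself), a word frequency count could be sketched in Python as follows. The function name `word_frequencies` and the sample sentence are assumptions made up for this example, and no stopwords are removed.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Lowercase the text, then treat runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, for illustration only.
sample = "The cat sat on the mat. The mat was flat."
freq = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```

Because stopwords are kept, common words such as "the" will tend to dominate the top of the list.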

July 3, 2018

It's always exciting when the data visualization or analysis you did is used to push forward a movement or a cause. Most would agree this is probably one of the greatest satisfactions we derive as data scientists (again, I'm using this title loosely). So, I got to...

June 26, 2018

This post replicates the previous post on R, this time using Python. Note, however, that the data randomly generated by R and by Python differ. For most of this exercise, we use prepared hypothetical datasets.

Data cleaning is one of...

June 16, 2018

Data cleaning is one of the most important tasks in data science, but it is unglamorous, underappreciated and under-discussed. Common data cleaning tasks include, but are not limited to:

  • Merging/appending

  • Checking completeness of data

  • ...
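To make the first two tasks concrete, here is a minimal Python sketch using only the standard library. It is not taken from the post: the records, the field names (`id`, `age`, `city`) and the helper `missing_counts` are all hypothetical.

```python
# Hypothetical records; field names are made up for illustration.
batch_a = [{"id": 1, "age": 34}, {"id": 2, "age": None}]
batch_b = [{"id": 3, "age": 51}]
cities = {1: "Oslo", 3: "Lima"}  # lookup table keyed by id

# Appending: stack rows that share the same fields.
rows = batch_a + batch_b

# Merging: left join each row with the lookup table on "id".
# Rows without a match get None for "city".
merged = [{**row, "city": cities.get(row["id"])} for row in rows]


def missing_counts(records):
    """Count missing (None) values per field across all records."""
    counts = {}
    for record in records:
        for field, value in record.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


# Checking completeness: how many values are missing in each field?
print(missing_counts(merged))
```

In practice a data frame library would usually handle these steps; plain dictionaries are used here only to keep the sketch self-contained.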

January 30, 2018

This is Part II of a four-part post. Part I covers scraping data from a website, while Part II discusses data cleaning/preparation. Part III outlines the process of presenting the data using Tableau and Part IV delves into insigh...

