The process and product of various data science tasks, from data collection, data preparation and data visualization to basic statistical analysis and modelling. Datasets for practice are available.

Selected as Top 100 Data Science Resources for 2018


January 1, 2020

Tweets from LTATrafficNews were scraped using twitterscraper. There are approximately 800 tweets in total, which is the limit on the number of tweets shown on the page. Note that the data collected spans 2019-12-12 10:24:15 GMT to 2019-12-29 03:07:00 GMT.


November 10, 2019

I was surprised to see my post trending. Since it caught on, I am cross-sharing it here.

While the post was meant to answer a frequent question I got on whether data scientists will be automated away, it was actually more intended to be an outlet for me as there was...

November 2, 2019

In this web scraping attempt, I want to get data on countries, sites and categories of sites in one table. One challenge I faced was getting the data for the sites to match the countries they are tied to. The sites can be extracted through parsing th...
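One way to keep each site tied to its country while parsing is to remember the most recent country seen and attach it to every site that follows. The sketch below is only an illustration under assumptions: the markup in `SAMPLE_HTML` (an `<h2>` per country followed by a `<ul>` of sites) is hypothetical, the class and function names are invented for this example, and it uses the standard-library `html.parser` rather than whichever parser the post used. Categories are left out for brevity.

```python
from html.parser import HTMLParser

# Hypothetical markup: each country is an <h2>, followed by a <ul> of its sites.
SAMPLE_HTML = """
<h2>Country A</h2>
<ul><li>Site 1</li><li>Site 2</li></ul>
<h2>Country B</h2>
<ul><li>Site 3</li></ul>
"""


class CountrySiteParser(HTMLParser):
    """Collects (country, site) pairs by remembering the last country heading."""

    def __init__(self):
        super().__init__()
        self.rows = []        # accumulated (country, site) tuples
        self._tag = None      # tag whose text we are currently inside, if any
        self._country = None  # most recent country heading seen

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "li"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "h2":
            # A new country starts; later sites belong to it.
            self._country = text
        elif self._tag == "li":
            self.rows.append((self._country, text))


def parse_country_sites(html):
    """Return a list of (country, site) rows from the assumed markup."""
    parser = CountrySiteParser()
    parser.feed(html)
    return parser.rows


rows = parse_country_sites(SAMPLE_HTML)
for country, site in rows:
    print(country, "|", site)
```

The resulting rows can be written straight to a CSV file or loaded into a table, with one line per site and its country repeated alongside.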

August 11, 2019

As with my other posts, I am using the title "data scientist" loosely because titles are not used consistently across the industry. To me, it is a broad umbrella term that covers any type of work that requires a lot of data analysis or modelling.


June 23, 2019

I chanced upon the emoji Python package, which allows printing emojis in Python, and decided to collect some data relating to the emojis listed on the emoji cheat sheet. There were associated terms/descriptions (i.e. alternative names) for each emoji and they were scra...

May 7, 2019

A typical question is how much data is considered enough. The answer is that it depends. First and foremost, we need to know what comprises the total population. If the population is small, and there are enough resources to obtain whatever information you want on the...

April 15, 2019

Today I decided to poke around a little to see if it would be possible to read CSV files directly from GitHub, and the answer is yes. As I have published numerous CSV datasets on GitHub, I thought it would be easier for people to access them without downloading the dat...
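As a rough sketch of the idea (not necessarily the approach taken in the post), a GitHub file page URL can be rewritten to its raw.githubusercontent.com form and then read like any other CSV source. The helper names `to_raw_url` and `read_csv_rows` and the example repository URL are hypothetical, and only the standard library is used here.

```python
import csv
import io
from urllib.parse import urlparse
from urllib.request import urlopen


def to_raw_url(blob_url):
    """Convert a github.com file page URL to its raw-content URL."""
    parts = urlparse(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub file page URL: " + blob_url)
    user, repo, _blob, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


def read_csv_rows(raw_url):
    """Download a CSV file and return its rows as a list of dicts (needs network access)."""
    with urlopen(raw_url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical repository, used only to show the URL rewrite.
example = to_raw_url("https://github.com/example-user/example-repo/blob/master/data.csv")
print(example)
```

With pandas installed, the raw URL can also be passed directly to `pandas.read_csv`, which accepts URLs as well as local paths.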

December 15, 2018

This is a tutorial on getting the frequency distribution of words used in a chunk of text. It is a simpler alternative to a more elaborate text mining post that involves auto-removal of stopwords (e.g. "the", "a", "and").

The script basically breaks the chunk of tex...
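A minimal sketch of a word frequency count, assuming the text is lowercased and split into words with a regular expression, and with no stopword removal. The function name `word_frequencies` and the sample sentence are made up for this example; the post's own script may differ.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    # Words are runs of letters, optionally with an internal apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


sample = "The cat and the hat. The end."
freq = word_frequencies(sample)

# List words from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

Because stopwords are kept, common words such as "the" will dominate the counts on real text.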

October 9, 2018

In addition to BeautifulSoup, selenium is a very useful package for web scraping when it involves repeated user interaction with the website (e.g. clicking to select options from a dropdown list and submitting) to generate a desired output/result of interest. Selenium...

September 1, 2018

This is Part I of a two-part post. Part I outlines the process of presenting the data using Tableau and Part II delves into insights from the analysis.  

This dashboard was created for a #VizforSocialGood project. It consists of two sections: (i) violent incidents...
