DATA DIVE DAYS

The process and products of various data science tasks, from data collection, data preparation and data visualization to basic statistical analysis and modelling. Datasets for practice are available.

Selected as one of the Top 100 Data Science Resources for 2018/2019 on MastersInDataScience.com

April 6, 2020

I tried out textgenrnn to create a text-generating neural network, using text from Singapore's Budget 2020 - Resilience Budget/ Supplementary Budget Statement as the training set. There are over 10,000 words in the statement/ text file but there is some Chinese text with...

March 26, 2020

By looking at the number of updates published on the Ministry of Health's website, we might be able to get a sense of the severity of the coronavirus situation in Singapore, and also of the amount of effort/ changes in measures introduced by the government over time.

Data i...

February 16, 2020

Here I make use of the package fbprophet to forecast the number of COVID-19 cases in Singapore using Python. Quoting from the official site on Prophet: Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are f...

February 16, 2020

Part I covers the use of Monte Carlo simulations to estimate the likelihood of different numbers of cases of COVID-19/ 2019-nCoV over the next five days.

[From Wiki] Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on re...
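
As a rough illustration of the idea in this teaser, here is a minimal sketch of a Monte Carlo estimate over a five-day horizon. It is not the method from the post itself: the resampling scheme (drawing daily counts at random from past observations), the function name simulate_total_new_cases, the threshold of 15 and the observed figures are all assumptions made up for the example.

```python
import random


def simulate_total_new_cases(daily_new_cases, days=5, n_sims=10_000, seed=42):
    """Return simulated totals of new cases over `days` days.

    Each simulation draws `days` values at random (with replacement) from the
    observed daily counts and sums them. This resampling scheme is an
    assumption for illustration, not the approach used in the post.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = []
    for _ in range(n_sims):
        totals.append(sum(rng.choice(daily_new_cases) for _ in range(days)))
    return totals


# Hypothetical daily new-case counts, for illustration only (not real data).
observed = [2, 0, 3, 5, 1, 4, 2, 3]
totals = simulate_total_new_cases(observed)

# Share of simulations in which at most 15 new cases appear over five days.
p_at_most_15 = sum(t <= 15 for t in totals) / len(totals)
print(f"Estimated P(total new cases <= 15 in 5 days): {p_at_most_15:.2f}")
```

The more simulations are run, the more stable the estimated share becomes; the estimate is only as good as the assumption that the coming days resemble the observed ones.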

February 1, 2020

SingStat (Singapore Department of Statistics) has made quite a lot of data publicly available, but not in the most analyst-friendly format. For those who are looking for more Singapore-related data to analyze, we can make use of the API function to call these data tables in...

January 1, 2020

Tweets from LTATrafficNews were scraped using twitterscraper. There are approx. 800 tweets in total, which is the limit for the number of tweets shown on the page. Note that the data collected fall between 2019-12-12 10:24:15 GMT and 2019-12-29 03:07:00 GMT.

In...

November 10, 2019

I was surprised to see my post trending. Since it caught on, I am cross-sharing it here.

While the post was meant to answer a frequent question I got on whether data scientists will be automated away, it was actually more intended to be an outlet for me as there was...

November 2, 2019

In this web scraping attempt, I want to get data on countries, sites and categories of sites in one table. One challenge I faced was getting the data for the sites to correspond/ match with the countries that are tied to them. The sites can be extracted through parsing th...

August 11, 2019

As with my other posts, I am using the title "data scientist" loosely because titles are not used consistently across the industry. To me, it is a broad umbrella term that covers any type of work that requires one to perform a lot of data analysis or modelling.

There...

June 23, 2019

I chanced upon the emoji Python package, which allows printing emojis in Python, and decided to collect some data relating to the emojis listed on the emoji cheat sheet. There were associated terms/ descriptions (i.e. alternative names) for each emoji and they were scra...

May 7, 2019

A common question is how much data is considered enough. The answer is that it depends. First and foremost, we need to know what comprises the total population. If the population is small, and there are enough resources to obtain whatever information you want on the...

April 15, 2019

Today I decided to poke around a little to see if it would be possible to read csv files directly from GitHub, and the answer is yes. As I have published numerous csv datasets on GitHub, I thought it would be easier for people to access them without downloading the dat...
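
One possible way to do this in Python is sketched below, assuming pandas is installed and the file sits in a public repository. The helper names to_raw_url and read_github_csv and the repository path are hypothetical, and the post itself may use a different tool or language. The sketch converts a github.com "blob" page address into the raw-content address, which is the form a csv reader can load directly.

```python
def to_raw_url(github_url):
    """Convert a github.com 'blob' page URL into its raw-content URL."""
    prefix = "https://github.com/"
    if not github_url.startswith(prefix) or "/blob/" not in github_url:
        raise ValueError("expected a https://github.com/<user>/<repo>/blob/... URL")
    # Drop the site prefix and the first '/blob/' segment of the path.
    path = github_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path


def read_github_csv(github_url):
    """Load a csv from GitHub into a DataFrame (needs pandas and network access)."""
    import pandas as pd  # imported here so to_raw_url works without pandas
    return pd.read_csv(to_raw_url(github_url))


# Hypothetical repository path, for illustration only.
example = "https://github.com/some-user/some-repo/blob/master/data/sample.csv"
print(to_raw_url(example))
```

Calling read_github_csv(example) would then return the data as a DataFrame, provided the file actually exists at that address.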

December 15, 2018

This is a tutorial to get the frequency distribution of words used in a chunk of text and is a simpler alternative to a more elaborate text mining post that involves auto-removal of stopwords e.g. "the", "a", "and", etc.

The script basically breaks the chunk of tex...
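
A minimal sketch of such a word count in Python follows, using only the standard library. It is an illustration rather than the script from the post: the function name word_frequencies, the lowercasing and the rule for what counts as a word are assumptions, and no stopwords are removed.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Split text into lowercase words and count how often each occurs."""
    # Treat runs of letters and apostrophes as words; everything else separates them.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "The cat and the hat. The end."
freq = word_frequencies(sample)

# Show the most frequent words first.
for word, count in freq.most_common(3):
    print(word, count)
```

Because stopwords are kept, common words such as "the" will dominate the top of the list for most texts.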
