DATA DIVE DAYS

Process and product of various data science tasks, from data collection, data preparation and data visualization to basic statistical analysis and modelling. Datasets for practice are available.

Selected as one of the Top 100 Data Science Resources for 2018/2019 on MastersInDataScience.com

July 31, 2020

The following information was scraped for over 5,000 job openings listed on a government portal for virtual career fairs: title, company, date posted, job level, contract type, location, salary, job description, requirements, application closing date, and url...
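A minimal sketch of how scraped listings like these might be saved for analysis, assuming each listing has already been parsed into a Python dict. The field names and the sample record are hypothetical, chosen to mirror the attributes listed above; they are not taken from the portal itself.

```python
import csv
import io

# Hypothetical field names mirroring the attributes listed in the post.
FIELDS = [
    "title", "company", "date_posted", "job_level", "contract_type",
    "location", "salary", "description", "requirements", "closing_date", "url",
]

def listings_to_csv(listings):
    """Write job-listing dicts to CSV text; fields missing from a listing become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for listing in listings:
        writer.writerow(listing)
    return buffer.getvalue()

# Made-up example record, for illustration only.
sample = [{"title": "Data Analyst", "company": "Example Pte Ltd", "location": "Singapore"}]
print(listings_to_csv(sample))
```

Using `restval=""` keeps the column layout fixed even when some listings lack a salary or closing date, which is common on job portals.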

July 8, 2020

With current deep learning algorithms, we can easily create a new (averaged) face from photos of multiple faces. I decided to try this out on candidates from the various parties contesting the General Election in Singapore.

...
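The deep-learning pipeline used in the post is not shown here. As a much simpler stand-in, the sketch below illustrates only the final averaging idea: a pixel-wise mean over face images that are assumed to be already aligned and of identical size. The tiny grayscale "images" are made-up grids of intensities, not real photos.

```python
def average_faces(images):
    """Pixel-wise mean of equally sized grayscale images given as lists of rows.

    Assumes the faces are already aligned (eyes, nose and mouth at the same
    positions); without alignment the result is just a blur.
    """
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same dimensions")
    n = len(images)
    return [
        [round(sum(img[r][c] for img in images) / n) for c in range(width)]
        for r in range(height)
    ]

# Two tiny made-up 2x2 "images" with pixel intensities 0-255.
face_a = [[0, 100], [200, 255]]
face_b = [[50, 100], [100, 255]]
print(average_faces([face_a, face_b]))  # → [[25, 100], [150, 255]]
```

Alignment is what makes the averaged face look like a face; tools that produce convincing results do that step (landmark detection and warping) before any averaging.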

June 13, 2020

As the latest call for proposals was opened to other time zones in Asia Pacific and the Americas, I decided to submit a talk, and I am glad that my submission 'Top 15 Python Tips for Data Cleaning/ Understanding' was accepted for the online EuroPython 2020. More details about my t...

May 16, 2020

It took me a really long time to figure out how to plot these charts using matplotlib and seaborn. If you need major/minor ticks, markers and a data label for the last point, check out this Jupyter notebook:

https://github.com/hxchua/datadoubleconfirm/blob/master...

May 10, 2020

JSON files/formats have a dictionary-like (nested key-value) structure, which can make them less straightforward to use than lists or dataframes. This post gives a high-level overview of what a typical JSON data structure looks like and how we can retrieve the data we want that is s...
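A small sketch of the general idea using only the standard library: load a nested JSON document, index into it step by step, and flatten it into a list of dicts that is easy to turn into a dataframe. The JSON content below is made up for illustration and is not the data used in the post.

```python
import json

# A made-up JSON document with typical nesting: objects (dicts) inside a list inside an object.
raw = """
{
  "source": "example",
  "records": [
    {"id": 1, "name": "Alpha", "location": {"city": "Singapore", "postal": "123456"}},
    {"id": 2, "name": "Beta",  "location": {"city": "Singapore"}}
  ]
}
"""

data = json.loads(raw)  # JSON objects become dicts, JSON arrays become lists

# Index step by step: key -> list position -> key -> key
first_city = data["records"][0]["location"]["city"]

# Flatten into rows (a list of dicts); .get() returns None instead of raising KeyError
# when a nested key is missing, as "postal" is for the second record.
rows = [
    {"id": rec["id"], "name": rec["name"], "postal": rec["location"].get("postal")}
    for rec in data["records"]
]

print(first_city)  # → Singapore
print(rows)
```

A list of flat dicts like `rows` can be passed directly to `pandas.DataFrame(rows)` if a dataframe is the end goal.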

May 3, 2020

It was a joy to support PyData Salamanca! Many thanks to my friend Víctor Vicente Palacios for hosting. We conducted the live stream yesterday and are thankful that many tuned in live. Here's the link to the video for anyone interested who would like to catc...

April 6, 2020

I tried out textgenrnn to create a text-generating neural network, using the text of Singapore's Budget 2020 - Resilience Budget/ Supplementary Budget Statement as the training set. There are over 10,000 words in the statement/text file, but there is some Chinese text with...
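The excerpt is cut off before saying how the Chinese text was handled. Assuming one wanted to detect and strip it before training, a stdlib-only sketch could use a regular expression over the CJK Unified Ideographs block (U+4E00 to U+9FFF). The sample sentence is made up, and this range does not cover CJK punctuation or the rarer extension blocks.

```python
import re

# CJK Unified Ideographs block (U+4E00 to U+9FFF): common Chinese characters only.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]+")

def strip_cjk(text):
    """Remove runs of CJK ideographs and collapse the leftover whitespace."""
    without_cjk = CJK_PATTERN.sub(" ", text)
    return " ".join(without_cjk.split())

def word_count(text):
    """Rough word count: the number of whitespace-separated tokens."""
    return len(text.split())

# Made-up mixed-language sample, for illustration only.
sample = "Resilience Budget 坚韧预算 supports workers and businesses"
print(bool(CJK_PATTERN.search(sample)))  # → True
print(strip_cjk(sample))                 # → Resilience Budget supports workers and businesses
print(word_count(strip_cjk(sample)))     # → 6
```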

March 26, 2020

Looking at the number of updates published on the Ministry of Health's website, we might be able to get a sense of the severity of the coronavirus situation in Singapore, and of the effort and changes in measures introduced by the government over time.

Data i...
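One way to turn a scraped list of updates into a daily time series is to count updates per date and fill in zeros for quiet days, so gaps show up in a plot. A minimal sketch, assuming the updates have already been scraped as (date, headline) pairs; the entries below are made up.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical scraped update log: (publication date, headline). Made-up values.
updates = [
    (date(2020, 3, 20), "Update A"),
    (date(2020, 3, 20), "Update B"),
    (date(2020, 3, 21), "Update C"),
    (date(2020, 3, 23), "Update D"),
]

def updates_per_day(items):
    """Count updates per calendar date, with zero for dates that had no updates."""
    counts = Counter(d for d, _ in items)
    if not counts:
        return {}
    result = {}
    day, end = min(counts), max(counts)
    while day <= end:
        result[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return result

for d, n in updates_per_day(updates).items():
    print(d.isoformat(), n)
```

Filling the missing dates matters because a chart of only the days with updates would hide the quiet periods entirely.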

February 16, 2020

Here I use the fbprophet package to forecast the number of COVID-19 cases in Singapore using Python. Quoting from the official Prophet site: Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are f...
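For reference, the additive model described by Prophet's authors decomposes the series into a trend, seasonal effects, holiday effects and an error term:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the (possibly non-linear) trend, $s(t)$ captures periodic changes such as weekly or yearly seasonality, $h(t)$ represents holiday effects, and $\varepsilon_t$ is the error term for changes the model does not capture.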

February 16, 2020

Part I covers the use of Monte Carlo simulations to estimate the likely range of COVID-19/2019-nCoV case counts over the next five days.

[From Wikipedia] Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on re...
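A minimal Monte Carlo sketch of the idea, under an assumption that is not necessarily the post's model: each future day's growth ratio is drawn at random, with replacement, from the day-over-day ratios observed so far. Repeating this many times gives a distribution of possible case counts five days ahead. The case counts below are made up.

```python
import random
import statistics

def simulate_cases(history, days=5, n_sims=10_000, seed=42):
    """Monte Carlo projection of cumulative case counts.

    Assumption: each simulated day multiplies the count by a growth ratio
    resampled from the observed day-over-day ratios in `history`.
    """
    ratios = [b / a for a, b in zip(history, history[1:]) if a > 0]
    if not ratios:
        raise ValueError("need at least two observations with a non-zero start")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    finals = []
    for _ in range(n_sims):
        cases = history[-1]
        for _ in range(days):
            cases *= rng.choice(ratios)
        finals.append(cases)
    return finals

# Made-up cumulative counts, for illustration only.
history = [10, 12, 15, 18, 24, 28, 33]
finals = simulate_cases(history)
q = statistics.quantiles(finals, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"median ~ {statistics.median(finals):.0f}, 90% interval ~ [{q[0]:.0f}, {q[-1]:.0f}]")
```

Resampling observed ratios keeps the sketch assumption-light, but it also assumes the recent growth pattern continues unchanged, which new measures can easily break.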

February 1, 2020

SingStat (Singapore Department of Statistics) has made a fair amount of data publicly available, though not in the most analyst-friendly format. For those looking for more Singapore-related data to analyze, we can use its API to call these data tables in...

January 1, 2020

Tweets from LTATrafficNews were scraped using twitterscraper. There are approximately 800 tweets in total, which is the limit on the number of tweets shown on the page. Note that the data collected spans 2019-12-12 10:24:15 GMT to 2019-12-29 03:07:00 GMT.

In...
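Since the page limit caps how far back the scrape goes, it helps to check the actual collection window of a batch. A stdlib sketch, assuming timestamps arrive as `YYYY-MM-DD HH:MM:SS` strings in GMT; only the two endpoint values below come from the post, the middle one is made up.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def parse_gmt(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as a timezone-aware GMT/UTC datetime."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)

def collection_window(timestamps):
    """Return the earliest and latest timestamps in a scraped batch."""
    parsed = [parse_gmt(t) for t in timestamps]
    return min(parsed), max(parsed)

# Hypothetical batch; only the endpoints are taken from the post.
sample = ["2019-12-20 08:00:00", "2019-12-12 10:24:15", "2019-12-29 03:07:00"]
start, end = collection_window(sample)
print(start.isoformat(), "to", end.isoformat())
# → 2019-12-12T10:24:15+00:00 to 2019-12-29T03:07:00+00:00
print(end - start)  # → 16 days, 16:42:45
```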
