
DATA DOUBLE CONFIRM

Primary Schools SG - Process - Tableau

This is Part I of a two-part blogpost. Part I talks about the process of developing the visualization below and Part II discusses the insights gleaned from the product.

Capabilities of Tableau:

- Postal codes in Singapore are recognised by Tableau, so latitude/ longitude can be obtained from them. This is used during data preparation.

- We can filter on a quantitative variable by adjusting its range. This filter allows records with null values to show up. We can control whether null values show up via Customize > Show Null Controls and then selecting from the dropdown accordingly.

- Sorting by a particular quantitative variable is made easy with a click on the icon by the side of the axis.

- The range corresponding to the shape sizes can be adjusted.

- Points within a certain distance of a location can be identified using Radial Selection.

- [Edit] We can also add an average line to find out the average number of places taken up till Phase 2B, and identify schools with placement below and above the average. Hovering over the line shows the average value.

- Clicking on a school listed in the bar chart highlights the school on the map (and vice versa). To enable highlight actions across worksheets on the dashboard, click on the Dashboard tab at the top > Actions...

- A legend (or text or image) can be fixed at a particular position or left floating (like the legend on 'Places taken up (by size)').

- A picture and a hyperlink to a particular website can easily be added to the dashboard if we want to.
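
The average-line comparison can also be reproduced outside Tableau. Below is a minimal Python sketch of the idea, not the dashboard's own logic: `split_by_average` is a hypothetical helper, and the school names and counts are made-up placeholders rather than values from the dataset.

```python
from statistics import mean


def split_by_average(places_taken):
    """Split schools into those below and those at or above the average.

    places_taken maps a school name to the number of places taken up
    till Phase 2B. Entries with a missing value (None) are skipped.
    """
    known = {name: n for name, n in places_taken.items() if n is not None}
    average = mean(known.values())
    below = sorted(name for name, n in known.items() if n < average)
    at_or_above = sorted(name for name, n in known.items() if n >= average)
    return average, below, at_or_above


if __name__ == "__main__":
    # Made-up illustrative numbers, not taken from the dataset.
    sample = {"School A": 120, "School B": 180, "School C": 150, "School D": None}
    print(split_by_average(sample))
```

Skipping missing values mirrors the way records without a placement figure (such as train stations) should not pull the average down.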

PS: Capabilities covered in previous posts are not repeated here, e.g. multiple worksheets (i.e. visualizations) can be combined onto one dashboard. So do check out the other posts as well :)

Challenges:

- The data used for the visualization was not available in an analysis-ready format. Data relating to the geographical location of primary schools (i.e. zone and postal code) was scraped from the Ministry of Education website, while data on school characteristics was taken from Wikipedia. Some inconsistencies in the schools listed across both sites were observed, and school status was verified through online search. Data on school placement was taken from a public site called Salary.sg. Data relating to the geographical location of train stations was publicly available in the form of X and Y coordinates, but had to be converted to latitude and longitude to be read by Tableau. This conversion was done via an API and scraped. All web scraping was done using Scrapy in Python. Eventually the data from the various sources was put together in one spreadsheet as follows.

- Selecting the shapes to represent the entities (school/ train) is not trivial, as some shapes do not show up well against each other.

- As the sizes of the points are driven by the number of places taken up till Phase 2B, train stations pose an issue because they have null values for this variable. As a workaround, I looked for a value that is reasonably large but does not let the train stations overpower the schools, and created another variable called 'Places taken up' for this purpose.

- There is currently still no easy way of computing distances between two points within Tableau: the data points have to be joined onto one another using custom SQL first. Instructions here. I ran into some errors implementing that and decided to work with the Radial Selection capability instead, which still gives me an insight into the distance between points.
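
The cross-checking of school lists between the two sites, mentioned in the first challenge, can be sketched in plain Python. This is only an illustration and not the code used for this post: the `normalise` rule, the `compare_sources` helper and the sample names are all assumptions.

```python
def normalise(name):
    """Lower-case a school name and collapse whitespace, so that minor
    formatting differences between sites do not count as mismatches."""
    return " ".join(name.lower().split())


def compare_sources(moe_names, wiki_names):
    """Return the names that appear in only one of the two sources."""
    moe = {normalise(n): n for n in moe_names}
    wiki = {normalise(n): n for n in wiki_names}
    only_moe = sorted(moe[k] for k in moe.keys() - wiki.keys())
    only_wiki = sorted(wiki[k] for k in wiki.keys() - moe.keys())
    return only_moe, only_wiki


if __name__ == "__main__":
    # Placeholder names for illustration only.
    moe_list = ["Alpha  Primary School", "Beta Primary School"]
    wiki_list = ["alpha primary school", "Gamma Primary School"]
    print(compare_sources(moe_list, wiki_list))
```

Names flagged by such a check would still need manual verification, as was done here through online search.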
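
For readers who want actual distances rather than a visual radius, one option outside Tableau is the haversine formula. The sketch below is a swapped-in alternative and not what was done for this dashboard; the function names are hypothetical, the coordinates are made up, and the Earth radius is the usual mean-radius approximation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(centre, points, radius_km):
    """Return the names of points no farther than radius_km from centre.

    centre is a (lat, lon) tuple; points maps a name to a (lat, lon) tuple.
    """
    return sorted(
        name
        for name, (lat, lon) in points.items()
        if haversine_km(centre[0], centre[1], lat, lon) <= radius_km
    )


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    station = (1.30, 103.80)
    schools = {"near": (1.30, 103.81), "far": (1.40, 103.90)}
    print(within_radius(station, schools, radius_km=2.0))
```

The results could be written back into the spreadsheet as an extra column, which avoids the custom SQL join at the cost of precomputing distances for each pair of interest.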

Click here to view/ download the interactive Tableau dashboard.

The dataset used to build the dashboard is available here.
