This is Part IV of a four-part post. Part I covers scraping data from a website (bookdepository.com, in this case), while Part II discusses data cleaning and preparation. Part III outlines the process of presenting the data using Tableau, and Part IV delves into insights from the analysis.
Discussion:
- We can analyse whether rating is correlated with ranking. The relationship is unlikely to be strong, because older books (i.e. those published earlier) can have a lower ranking, as newer books occupy the higher ranks, but still have a good rating. While not definitive, the dashboard suggests the same, as certain highly ranked books have a below-average rating.
This analysis can be done within Tableau, using the Trend Line option under the Analytics tab on the left-hand side.
We can also choose to overlay the average line.
As the p-value is greater than 0.05, we do not reject the null hypothesis that the coefficient is 0, i.e. we are unable to conclude that there is a linear relationship between rating and ranking. The extremely low R-squared also indicates that the rank value explains very little of the variance in rating.
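For readers who want to reproduce this kind of check outside Tableau, here is a minimal sketch of a simple linear regression of rating on rank in plain Python. The function name `fit_line` and the sample numbers are made up for illustration and are not from the scraped dataset. The p-value uses a normal approximation to the t distribution, which is only reasonable for larger samples, so it may differ slightly from what Tableau reports.

```python
import math
from statistics import NormalDist


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y explained by the fitted line
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy

    # Test statistic for H0: slope == 0
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf

    # Assumption: normal approximation to the t distribution (large samples)
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "p_value": p_value,
    }


# Made-up ranks and ratings, for illustration only
ranks = [1, 2, 3, 4, 5, 6, 7, 8]
ratings = [4.1, 3.9, 4.4, 4.0, 4.3, 3.8, 4.2, 4.0]
print(fit_line(ranks, ratings))
```

A p-value above 0.05 together with an R-squared close to 0 would correspond to the conclusion drawn from the Tableau trend line.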
- From the treemap, we can see that the books are spread thinly across many categories. This could be a result of highly specific genre categorization, so a more generic level of hierarchy above the current categories might be more meaningful. Nonetheless, the most common category among the bestsellers is contemporary fiction. Children's books are hot favourites as well.
- To learn more about the profile of books with a high % discount, we can map the % Discount of each book to the size and color gradient of the data points.
As the visualization above still appears cluttered, we can filter the dashboard to look only at discounts above 50%. These books generally have low rankings (i.e. high rank values). There isn't a great difference in rating, as the average drops only from 4.18 to 4.02.
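The same filter-and-compare step can be sketched in plain Python. The rows, the field names (`rank`, `rating`, `discount_pct`) and the helper `compare_by_discount` are assumptions for illustration. They do not reflect the actual column names or values in the dataset.

```python
from statistics import mean

# Made-up rows for illustration only; field names are assumptions
books = [
    {"title": "A", "rank": 5, "rating": 4.4, "discount_pct": 10},
    {"title": "B", "rank": 40, "rating": 4.2, "discount_pct": 25},
    {"title": "C", "rank": 310, "rating": 4.0, "discount_pct": 55},
    {"title": "D", "rank": 450, "rating": 4.1, "discount_pct": 60},
]


def compare_by_discount(rows, threshold=50):
    """Compare mean rating and mean rank overall vs. above a discount threshold."""
    deep = [r for r in rows if r["discount_pct"] > threshold]
    if not deep:
        raise ValueError("no books above the discount threshold")
    return {
        "overall_mean_rating": mean(r["rating"] for r in rows),
        "deep_mean_rating": mean(r["rating"] for r in deep),
        "overall_mean_rank": mean(r["rank"] for r in rows),
        "deep_mean_rank": mean(r["rank"] for r in deep),
    }


summary = compare_by_discount(books, threshold=50)
print(summary)
```

A higher mean rank value in the filtered group alongside a similar mean rating would match the pattern seen on the dashboard.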
- Harry Potter and the Philosopher's Stone has the most Goodreads raters, with over 4.9 million ratings given. This is far higher than the rest of the books; the majority have fewer than 1 million ratings.
- Other basic analyses we could try include exploring the relationship between number of pages and list price, as well as whether book material affects list price or sale price.
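As a starting point for those follow-up analyses, here is a rough sketch that computes the Pearson correlation between page count and list price, and the mean list price per format. The rows and field names (`pages`, `format`, `list_price`) are hypothetical, and treating "book material" as the format (paperback vs. hardback) is an assumption.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_price_by_format(rows):
    """Group rows by format and average the list price within each group."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["format"]].append(row["list_price"])
    return {fmt: mean(vals) for fmt, vals in prices.items()}


# Made-up rows for illustration only; field names are assumptions
sample = [
    {"pages": 320, "format": "Paperback", "list_price": 10.0},
    {"pages": 480, "format": "Hardback", "list_price": 20.0},
    {"pages": 200, "format": "Paperback", "list_price": 8.0},
    {"pages": 640, "format": "Hardback", "list_price": 24.0},
]

r = pearson_r([b["pages"] for b in sample], [b["list_price"] for b in sample])
by_format = mean_price_by_format(sample)
print(r, by_format)
```

A correlation on its own would not separate the effect of page count from that of format, so the two summaries are best read together.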
Alternatively, the interactive dashboard can be found here.
The dataset can be found here.