Untitled

DATA DOUBLE CONFIRM

Toughest part in a data science project


Was surprised to see my post trending. Since it caught on, I would cross-share it over here.

While the post was meant to answer a frequent question I got on whether data scientists will be automated away, it was actually more intended to be an outlet for me as there was a really dirty dataset I was working with. One positive thing is that I got to improve my Python coding in the process. It was so unclean and inconsistent mainly because it was manually pulled together by many parties. The starting point is usually painful for every one involved (data collection/ data cleaning) when there wasn't any business need at the start to collect that data but yet that data was asked for at a later stage to perform some analysis to understand the current landscape. So a takeaway will be to always try to automate data collection if possible, which is why we see many apps out there because all these apps track data at the back, every tap and every transaction done.