Over the years, the number of tools (or software) I have had to install and use has increased steadily, given the different types of tasks I have to perform. Some tools serve a similar purpose, but I ended up with another tool because of the school or work environment setup. I'm not recommending any particular order in which to pick up these tools, as it all depends on what one needs, but here's mine, and I feel pretty comfortable with it (I'm giving my take on the level of difficulty as well):
1. R
Started using since: 2008
Level of difficulty: Low
Used for: Statistical analysis/ data mining
Many functions and algorithms are bundled into what are called packages. Install the relevant packages and the analysis can get going. R manuals describing the packages, with examples, are available online.
Link to installation: https://www.r-project.org/
1.1. RStudio
This is the GUI I used for R.
Link to installation: https://www.rstudio.com/
2. Tableau
Started using since: 2015
Level of difficulty: Low
Used for: Data visualization/ data exploration
The drag-and-drop feature makes the barrier to entry (or knowledge inertia) a lot lower. The main challenge for me is preparing the data in the appropriate format to achieve the desired visualization.
Link to installation: https://www.tableau.com/academic/students
3. MySQL
Started using since: 2015
Level of difficulty: Medium
Used for: Database creation and management
The main challenge for me is developing nested queries or subqueries (basically queries within queries). As a quick win, I often end up creating more views (i.e. data subsets), which, of course, means less efficient code. Also, installation is painful. According to the tutorial below, "Installation could be the hardest part in this exercise."
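To make the subquery-versus-view trade-off concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the SQL here should look much the same in MySQL, but I have not run it there), and the orders table and its values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('ann', 10), ('ann', 30), ('bob', 5), ('cat', 50);
""")

# Approach 1: nested queries (queries within queries).
# Find customers whose total spend is above the average total spend.
subquery_sql = """
    SELECT customer, total
    FROM (SELECT customer, SUM(amount) AS total
          FROM orders GROUP BY customer) AS t
    WHERE total > (SELECT AVG(total)
                   FROM (SELECT SUM(amount) AS total
                         FROM orders GROUP BY customer) AS u)
    ORDER BY customer
"""
subquery_rows = conn.execute(subquery_sql).fetchall()

# Approach 2: give the inner query a name as a view, then query the view.
conn.execute("""
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders GROUP BY customer
""")
view_rows = conn.execute("""
    SELECT customer, total
    FROM customer_totals
    WHERE total > (SELECT AVG(total) FROM customer_totals)
    ORDER BY customer
""").fetchall()

print(subquery_rows)
print(view_rows)
```

Both approaches return the same rows. The view version is easier to read, at the cost of leaving an extra object in the database to maintain.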
Link to installation (instructions):
3.1. MySQL Workbench
This is the GUI I used for MySQL.
Link to installation: https://dev.mysql.com/downloads/workbench/5.2.html
4. Python
Started using since: 2016
Level of difficulty: High
Used for: Statistical analysis/ machine learning/ web-scraping
The main challenge for me is the language structure itself, where there can be so many ".", "[ ]", and "( )" in a single line of code. Performing a simple data transformation in Python is not as simple as it seems. Here is an example of what I mean. Also, installing packages (or modules) is not as simple as in R. You have to use the command line to do it (instructions).
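As a small illustration of how the dots, brackets and parentheses pile up (this is not the example originally linked, just a minimal sketch using plain Python and made-up records):

```python
# Hypothetical records; the field names are made up for illustration.
records = [
    {"name": " alice ", "scores": [70, 85]},
    {"name": "bob", "scores": [90, 60]},
]

# One line: tidy each name and pair it with that person's highest score.
best = {r["name"].strip().title(): max(r["scores"]) for r in records}

# The same transformation, one step at a time.
best_stepwise = {}
for r in records:
    name = r["name"].strip().title()  # remove spaces, capitalise
    top = max(r["scores"])            # highest score in the list
    best_stepwise[name] = top

print(best)
```

Splitting the one-liner into named steps gives the same result and is much easier to read when starting out.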
Link to installation: https://www.anaconda.com/download/
(Spyder, which is included in the download, is the Python development environment that we will use. Hmm, Anaconda, Spyder and Python. Same same but different.)
5. PostgreSQL
Started using since: 2017
Level of difficulty: Low
Used for: Database creation and management
This is similar to MySQL: both are relational database systems queried with SQL, each with its own slight twist, so prior experience with MySQL helps.
Link to installation: https://www.postgresql.org/
5.1. DBeaver
This is the GUI I used for PostgreSQL.
Link to installation: https://dbeaver.jkiss.org/
6. Git
Started using since: 2017
Level of difficulty: High
Used for: Version control/ Collaborative work
It's not easy for me because I do not have experience with the command line. I'm still new to this and have mainly survived on commands like git pull, git add, git commit and git push.
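For reference, the four commands above usually run in this order. The sketch below only builds and prints them; the remote name "origin", the branch name "main", the file name and the commit message are all assumptions for illustration. The run_all helper is hypothetical and would execute them through Python's subprocess module inside a real repository.

```python
import shlex
import subprocess


def daily_git_commands(message, paths=(".",), remote="origin", branch="main"):
    """Return the four everyday git commands as argument lists, in order."""
    return [
        ["git", "pull", remote, branch],   # fetch and merge others' changes
        ["git", "add", *paths],            # stage the files you changed
        ["git", "commit", "-m", message],  # record a snapshot locally
        ["git", "push", remote, branch],   # upload the commit to the remote
    ]


def run_all(commands, cwd=None):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)


# Hypothetical file name and commit message.
commands = daily_git_commands("Update analysis script", paths=["analysis.py"])
for cmd in commands:
    print(shlex.join(cmd))
```

Calling run_all(commands, cwd="path/to/repo") would actually execute them, so it is left out of the sketch on purpose.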
Link to installation: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
6.1. GitHub
Create a GitHub account at https://github.com/.
You can create a repository to upload your work, or build on the work of others in their repositories. For example, this is my GitHub repository containing some datasets I've curated. If you are interested in the technical differences between Git and GitHub, read this article.
7. Jupyter Notebook
Started using since: 2017
Level of difficulty: Low
Used for: Collaborative work/ code and output documentation
This is a web-based application that allows you to run your code and save its output. It supports many languages, but I've only used it for Python so far. Also, as you can see, Python is a prerequisite for installing it, so things are easier when you already have Python installed.
Link to installation: http://jupyter.org/
8. PuTTY
Started using since: 2017
Level of difficulty: Medium
Used for: Remote access to server computers
This is not a data science tool, but it is necessary if we have to access work on remote servers. The command line is involved.
Link to installation: https://www.ssh.com/ssh/putty/download
8.1. MobaXterm
I decided to get this interface because uploading and downloading files is much easier with it.
Link to installation: https://mobaxterm.mobatek.net/
PS: I'm only listing open-source tools, as everybody can have access to them. Tableau is an exception, as it's free for students. Also, Excel is a pretty resourceful tool for data analysis, but in today's environment, knowing how to use Excel is expected of everyone. A GUI (Graphical User Interface) is good to have: it makes starting out less daunting and provides a more comfortable user experience.
PPS: There is a constant debate over R and Python. This is one good read I found and relate to.
PPPS: Yes, as you can see from 8 and 8.1, I'm a Windows user (in case it makes any difference).
Feel free to share your order of learning as well!