Skills to Learn to Become a Data Analyst/ Data Scientist
Last updated: Apr, 2023. This is always a work in progress.
If you want to become a data analyst (and eventually a data scientist) but don’t know where to start, then this post is for you. These skills lean more on the technical side (as opposed to the soft skills, data visualization, and statistical/ mathematical side), and I consider them important to learn. If this post is helpful for you, let me know on LinkedIn!
In this post, we’ll talk about: SQL, Python Fundamentals, Pandas, Datasets for Personal Projects, APIs (with Python wrappers), Terminal Commands, Git/ GitHub, and more!
TL;DR
SQL
- Overview of SQL (4 minutes), the FREE version of SQL on Codecademy helped me, SQL Cheatsheet, and MySQL tutorial on YouTube (3 hours)
Terminal Commands
- For Windows, watch this tutorial (7 minutes) and for Mac watch this tutorial (13 minutes)
Python: The FUNdamentals
- Step 1: Install Python (I use Python 3.6.8). Step 2: Install an IDE like Visual Studio Code or Jupyter Notebook. Step 3: Understand the basics of Python data structures. Step 4: Learn For Loops, Functions, and If… Else, which will help you write cleaner code more efficiently. CS Dojo also has a good Python tutorial series.
Python: Pandas and APIs
- For Pandas, watch this CS Dojo video (22 minutes). There are amazing datasets to use on Kaggle.com!
- For Python API tutorials, I recommend checking out sentdex’s tutorial for the Reddit API in Python or reading this Yelp API tutorial for Python
SQL
SQL (Structured Query Language) is a language designed to store, manage, and query data in relational databases. It’s been around since the 1970s and is still extremely relevant. Technically there are many varieties of SQL (MySQL, PostgreSQL, Oracle, etc.), but they’re similar enough that you really just need to learn one. I recommend learning MySQL.
- Overview of SQL (4 minutes)
- The FREE version of SQL on Codecademy helped me
- SQL Cheatsheet. Probably all the commands you’ll ever need
- I’ve heard great things about this MySQL tutorial on YouTube (3 hours)
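To get a feel for what a query looks like before starting a tutorial, here is a minimal sketch. It uses Python's built-in sqlite3 module, so it runs SQLite rather than MySQL, chosen only because it needs no installation. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, made up for this example.
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# SELECT / GROUP BY / ORDER BY: total spend per customer, largest first.
cur.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
)
totals = cur.fetchall()
print(totals)

conn.close()
```

The SELECT, GROUP BY, and ORDER BY keywords used here are common to most SQL varieties, so this query should carry over to MySQL with little or no change.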
Terminal Commands
The terminal on Windows and macOS is a powerful tool for anyone, and the basics are relatively easy to learn. Why learn the terminal? Well, the “Terminal provides an efficient interface to access the true power of a computer better than any graphical interface.” The more you learn about technology, the more you’ll realize how fun (and important) the terminal is.
- If you’re using Windows, watch this tutorial (7 minutes)
- If you’re on Mac, watch this tutorial (13 minutes)
Python: The FUNdamentals
Python is a programming language that’s used for data analysis, data science, data visualization, web development, cloud solutions, server-side programs, scripting, etc. It’s used for a lot.
Before we get into the “cool” stuff in Python, we first must learn the fundamentals. They will be a part of everything that we code.
- Step 1: Install Python (as of writing, I use Python 3.6.8).
- Step 2: Install an IDE like Visual Studio Code. Many tutorials use Jupyter Notebook, but I personally don’t recommend it. However, if you feel more comfortable with Jupyter Notebook then feel free to use it.
- Step 3: Understand the basics of Python data structures (at a minimum: lists, dictionaries, and strings).
- Step 4: Learn For Loops, Functions, and If… Else. They will help you write cleaner code more efficiently.
- CS Dojo has a good Python tutorial series (He uses Jupyter Notebook).
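To show how Steps 3 and 4 fit together, here is a small sketch that uses a list, a dictionary, a string, a function, a for loop, and if/else in one place. The names, scores, and grade cutoffs are made up for illustration.

```python
# A list holds ordered items; a dictionary maps keys to values.
scores = [72, 88, 95, 61]
student = {"name": "Sam", "major": "Statistics"}


def letter_grade(score):
    """Return a letter grade for a numeric score (cutoffs are arbitrary)."""
    if score >= 90:
        return "A"
    elif score >= 70:
        return "B"
    else:
        return "C"


# A for loop applies the function to every item in the list.
grades = []
for score in scores:
    grades.append(letter_grade(score))

# Strings have methods like upper() and join(), and f-strings format values.
summary = f"{student['name'].upper()} earned: {', '.join(grades)}"
print(summary)
```

Putting the grading logic in a function means the loop stays short, and the cutoffs only need to change in one place.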
Python: Gathering Data using Pandas and APIs with wrappers
Let’s get to some of the “cool” stuff. Before continuing, make sure you’ve learned the skills above. We’re going to touch on two essential ways to gather and manipulate data: Pandas and API wrappers.
Pandas (and NumPy for that matter) are the libraries to learn for data cleaning and analysis. With Pandas we can also read in data from .xlsx files, .csv files, and more!
- For Pandas, I recommend this CS Dojo video (22 minutes). Not interested in the data that he’s working with? Kaggle.com has amazing datasets!
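As a rough idea of what reading, cleaning, and analyzing data with Pandas looks like, here is a minimal sketch. It assumes pandas is installed (for example via `pip install pandas`). The CSV text is made up and stands in for a real file so the example runs on its own.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk.
csv_text = """city,year,sales
Austin,2022,100
Austin,2023,
Boston,2022,80
Boston,2023,120
"""

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# Cleaning: drop rows where the sales value is missing.
clean = df.dropna(subset=["sales"])

# Analysis: total sales per city.
totals = clean.groupby("city")["sales"].sum()
print(totals)
```

With a real dataset, such as one downloaded from Kaggle, you would pass the file's path to `pd.read_csv` instead of the StringIO object.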
APIs (what is an API?) are a different animal, and while they aren’t specific to Python, they are useful for gathering data. There are plenty of fun and funner APIs to play with! Plus, most modern APIs return data as JSON, which looks almost identical to Python dictionaries (which you learned about earlier).
- For Python API tutorials, I recommend checking out sentdex’s tutorial for the Reddit API in Python or reading this Yelp API tutorial for Python.
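To see why JSON feels so familiar once you know dictionaries, here is a small sketch using Python's built-in json module. The JSON text is made up to resemble what an API might send back. It is not a real response from Reddit, Yelp, or any other service, and no network request is made.

```python
import json

# Made-up JSON text, shaped like what a web API might send back.
response_text = (
    '{"name": "Example Cafe", "rating": 4.5, "tags": ["coffee", "wifi"]}'
)

# json.loads turns JSON text into Python objects (here, a dictionary).
business = json.loads(response_text)

# From here on it behaves like any other dictionary.
print(type(business).__name__)
print(business["name"], business["rating"])
print(business["tags"][0])
```

API wrappers typically handle the request and this parsing step for you, so you mostly work with the resulting Python objects.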
Git/ GitHub
Please leave comments, opinions, questions, and errors below.