Skills to Learn to Become a Data Analyst/Data Scientist

Dan Blevins
4 min read · Aug 15, 2020


Image from: https://www.inzata.com/wp-content/uploads/2019/09/shutterstock_669838285.jpg

Last updated: April 2023. This is always a work in progress.

If you want to become a data analyst (and eventually a data scientist) but don’t know where to start, this post is for you. These skills lean toward the technical side (as opposed to soft skills, data visualization, and the statistical/mathematical side), and I consider them important to learn. If this post is helpful, let me know on LinkedIn!

In this post, we’ll talk about: SQL, Python Fundamentals, Pandas, Datasets for Personal Projects, APIs (with Python wrappers), Terminal Commands, Git/GitHub, and more!

TL;DR

SQL

Terminal Commands

Python: The FUNdamentals

Python: Pandas and APIs

SQL

SQL is a language designed to manage and query data stored in relational databases. It’s been around since the 1970s and is still extremely relevant. Technically there are many dialects of SQL (MySQL, PostgreSQL, Oracle, etc.), but they’re similar enough that you really just need to learn one to get started. I recommend learning MySQL.
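To give a feel for what a query looks like, here is a minimal sketch. It uses SQLite through Python's built-in sqlite3 module rather than MySQL, only because it runs without installing a database server; the basic SELECT / GROUP BY / ORDER BY syntax shown here carries over to MySQL. The orders table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; the table and rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)],
)

# A typical analyst question: total spend per customer, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

print(totals)
```

Most day-to-day analyst SQL is some combination of these pieces: pick columns, filter rows, group, and sort.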

Terminal Commands

The terminal on Windows and macOS is a powerful tool for anyone, and the basics are relatively easy to learn. Why learn the terminal? Well, the “Terminal provides an efficient interface to access the true power of a computer better than any graphical interface.” The more you learn about technology, the more you’ll realize how fun (and important) the terminal is.

Python: The FUNdamentals

Python is a programming language that’s used for data analysis, data science, data visualization, web development, cloud solutions, server-side programs, scripting, etc. It’s used for a lot.

Before we get into the “cool” stuff in Python, we first must learn the fundamentals. They will be a part of everything that we code.
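As a rough idea of what "the fundamentals" covers, the sketch below touches variables, lists, dictionaries, a function, and a loop with a condition. The names and values are placeholders chosen for illustration, not a curriculum.

```python
# Variables and basic types
name = "Ada"
years_experience = 3

# A list holds ordered values; a dictionary maps keys to values.
skills = ["SQL", "Python", "Git"]
profile = {"name": name, "years": years_experience, "skills": skills}


def describe(person):
    """Return a one-line summary of a profile dictionary."""
    skill_list = ", ".join(person["skills"])
    return f"{person['name']} ({person['years']} yrs): {skill_list}"


# A loop with a condition, written as a list comprehension:
# keep only the skills whose names are three characters or fewer.
short_skills = [s for s in skills if len(s) <= 3]

summary = describe(profile)
print(summary)
print(short_skills)
```

Nearly every script you write later, including the pandas and API work, is built out of these same pieces.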

Python: Gathering Data using Pandas and APIs with wrappers

Let’s get to some of the “cool” stuff. Before continuing, make sure you’ve learned the skills above. We’re going to touch on two essential ways to gather and manipulate data: Pandas and API wrappers.

Pandas (and NumPy for that matter) are the libraries to learn for data cleaning and analysis. Using pandas, we can also read in data from .xlsx, .csv, and other file formats!
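Here is a small sketch of that workflow: read CSV data, clean a missing value, and summarize. It assumes pandas is installed, and the CSV text is made-up sample data held in a string so the example runs without a file on disk. With a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up sample data; note the missing sales value for Austin in Feb.
csv_text = """city,month,sales
Austin,Jan,120
Austin,Feb,
Boston,Jan,90
Boston,Feb,110
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Cleaning step: treat the missing sales figure as 0.
df["sales"] = df["sales"].fillna(0)

# Analysis step: total sales per city.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Whether a missing value should become 0, be dropped, or be filled some other way depends on the dataset; filling with 0 here is just one simple choice.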

APIs (application programming interfaces) are a different animal, and while they aren’t specific to Python, they are relevant for gathering data. There are plenty of fun APIs to play with! Plus, most modern APIs return data as JSON, which maps closely onto Python dictionaries and lists (which you learned earlier).
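To show how closely JSON lines up with Python dictionaries, the sketch below parses a JSON string with the standard json module. The response text is a hypothetical payload invented for illustration, not the output of any real API; in practice the text would come from an HTTP request or from a wrapper library.

```python
import json

# Hypothetical API response body (invented for this example).
response_text = (
    '{"city": "Austin", "forecast": ['
    '{"day": "Mon", "high_f": 91}, '
    '{"day": "Tue", "high_f": 88}]}'
)

# json.loads turns JSON objects into dicts and JSON arrays into lists.
data = json.loads(response_text)

# From here on it is ordinary dictionary and list work.
highs = {entry["day"]: entry["high_f"] for entry in data["forecast"]}
hottest_day = max(highs, key=highs.get)

print(data["city"], highs, hottest_day)
```

Once the data is a dictionary, the fundamentals from earlier (indexing, loops, comprehensions) are all you need to pull out the fields you care about.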

Git/GitHub

Please leave comments, opinions, questions, and corrections below.
