Step-by-Step: Reading and Querying Data in SQLite using Python

6 min readSep 4, 2021

Regardless of whether you’re an analyst, scientist, engineer, or developer, knowing how to query from SQL databases and tables is a valuable skill. By pairing SQL with Python, we get the extra benefit to automate tasks or go to data visualization rather quickly.

In this article we’ll focus on reading data from SQLite using Python. This should help you with your Python and SQL skills.

Image from: https://media.charlesleifer.com/blog/photos/sqlite-and-python.png

Step 1: Check you’ve already PIP installed the Packages

For this project, we’ll use SQLite3 and Pandas.

$ pip install sqlite3
$ pip install pandas

Step 2: Download Data, Connect to SQLite, and Query the Tables

Let’s download survey data about Mental Health in the Tech Industry, which is on Kaggle.com (Kaggle is a great resource for amazing open-source datasets!). Go to the link and click the button “Download (6MB)”. You may need to sign up before you can download the data. I simply downloaded the data to my Downloads folder and unzipped the file. After unzipping, notice the “.sqlite” extension.

Now that we have the data, let’s connect to the SQLite database and query the tables.

To select all of the data in one table in SQL, we simply run: “select * from database_name.table_name”. The * means all/ everything.

import sqlite3
import pandas as pd# You may need to update the directory path ("../...") below.
db_connection = sqlite3.connect("../Downloads/mental_health.sqlite")# We're selecting and printing all from the Answer table.
answer_table = pd.read_sql("select * from Answer", db_connection)
print(answer_table)# answer_table Output #
                AnswerText  SurveyID  UserID  QuestionID
0                       37      2014       1           1
1                       44      2014       2           1
2                       32      2014       3           1
3                       31      2014       4           1
4                       31      2014       5           1
...                    ...       ...     ...         ...
236893               Other      2016    2689         117
236894             Support      2016    2690         117
236895  Back-end Developer      2016    2691         117
236896     DevOps/SysAdmin      2016    2692         117
236897               Other      2016    2693         117# We're selecting and printing all from the Question table.
question_table = pd.read_sql("select * from Question", db_connection)
print(question_table)# question_table Output #
                                          questiontext  questionid
0                                    What is your age?           1
1                                 What is your gender?           2
2                         What country do you live in?           3
3    If you live in the United States, which state ...           4
4                               Are you self-employed?           5
..                                                 ...         ...
100  Do you think that team members/co-workers woul...         114
101  If yes, what condition(s) have you been diagno...         115
102  If maybe, what condition(s) do you believe you...         116
103  Which of the following best describes your wor...         117
104                              Do you work remotely?         118# We're selecting and printing all from the Survey table.
survey_table = pd.read_sql("select * from Survey", db_connection)
print(survey_table)# survey_table Output #     SurveyID         Description
0      2014  mental health survey for 2014
1      2016  mental health survey for 2016
2      2017  mental health survey for 2017
3      2018  mental health survey for 2018
4      2019  mental health survey for 2019

We can answer some simple questions using our Answer table, such as: counting the number of unique UserIDs/ how many people took the survey.

sql_query = "select count(distinct UserID) as unique_ids from Answer"
distinct_userid_in_answer = pd.read_sql(sql_query, db_connection)print(distinct_userid_in_answer)# distinct_userid_in_answer Output #
unique_ids
0        4218

Let’s discuss some of the sql commands from this query:

count(): Means we’re going to count whatever is inside count().
Inside count() is distinct UserID. distinct means only get one occurrence for each value of (in this case) UserID.
as unique_ids: Changes the column name output to (in this case) unique_ids.

So we now know 4,218 unique users answered the survey. But to answer more questions, we need to join our tables together, so let’s try that!

Step 3: Entity-Relationship Diagrams (ERDs)

Before we get into how to join tables in SQL, let’s first visualize how our tables relate to each other using Entity-Relationship Diagrams (ERDs).

Creating ERDs is a useful skill to have because they make it easier to explain how tables interact with each other.

Now let’s explain what some of these ERD signs and symbols mean.

BoldColumnName (Pk): The Primary key (Pk) for each table. The Pk is helpful because it provides a unique value that can identify a specific row in a table. For example, in our code at the bottom of Step 2, we used UserID Pk to get the number of distinct/unique users.
ColumnName (Fk): The Foreign key (Fk) for a table. The Fk of one table is the Pk of another table. As such, the Fk gives us the ability to link (or join) two or more tables.
One to One Relationship: These are shown to the right of the survey_table and the left side of the question_table. For example, the One to One relationship between the answer_table and the survey_table means that each SurveyID in the answer_table can be tied back to one and only one SurveyID in the survey_table. This makes logical sense because the SurveyID is the primary key in the survey_table.
One to Many Relationship: This can be seen on either side of the answer_table. The One to Many relationship between the question_table and the answer_table means that each questionid in the question_table must be in the answer_table at least once and up to many times. This makes logical sense because the same question will be asked in each survey.

Now that we’re more familiar with ERDs, let’s code the ERD above using SQL!

Step 4: Joining Tables

After looking at the tables again, I think it makes the most sense to join answer_table and question_table (because survey_table does not provide any new data. SurveyID is just the date of the survey). To join those two tables together, we simply run:

sql_query = """
select * 
from Question 
inner join Answer on Question.questionid = Answer.QuestionID
"""question_answer_table = pd.read_sql(sql_query, db_connection)
print(question_answer_table)

Let’s discuss some of the sql commands from this query:

inner join: Combines rows that have matching values (You can learn more about joins in the links below).
on Question.questionid = Answer.QuestionID: This explains the values (questionid and QuestionID) we wish to match.

Now that we can join tables together, let’s practice filtering them.

Step 5: Filtering Queries

Filtering queries helps you pull a subset of data. This is helpful if your data has millions (or billions) of rows and/or you’re only curious about specific columns of data.

To practice filtering our data, let’s answer these two questions:

How many survey takers were 30 years old or younger during each survey year?
How many survey takers work in tech companies as opposed to non-tech?

How many survey takers were 30 years old or younger during each survey year?

sql_query = """
select SurveyID, count(*) as surveyors_less_than_30
from Question 
inner join Answer on Question.questionid = Answer.QuestionID
where questiontext in ('What is your age?')
and AnswerText < 30
group by SurveyID"""less_than_30_each_year = pd.read_sql(sql_query, db_connection)
print(less_than_30_each_year)# less_than_30_each_year Output #    SurveyID  count(*)
0      2014       521
1      2016       450
2      2017       205
3      2018       117
4      2019        99

Let’s discuss some of the sql commands from this query:

where questiontext in (‘What is your age?’) and AnswerText < 30: Where statements are used to filter data. In this case, we wanted only questions that ask surveyors age and if their age was less than 30.
group by SurveyID: Group by will group the same rows based on a column (SurveyID). This way we can aggregate up know how many surveys are from each other.

How many survey takers work in tech companies as opposed to non-tech?

sql_query = """
select AnswerText, count(*) as tech_or_nontech
from Question 
inner join Answer on Question.questionid = Answer.QuestionID
where questiontext like ('Is your employer primarily a tech company/orga%')
group by AnswerText"""tech_or_nontech = pd.read_sql(sql_query, db_connection)
print(tech_or_nontech)

Let’s discuss some of the sql commands from this query:

where questiontext like (‘Is your employer primarily a tech company/orga%’): This where statement filters questiontext for the “Is your employer primarily a tech company/orga…” question. The like operator means “is similar to.” The % wildcard at the end of the string means that the beginning text must match specifically, but the text after company/orga can be anything.

Your Turn to Explore More!

Learn more about SQL using this Cheatsheet!
Query the data to answer more questions related to gender, country of origin, or self-employed workers.
Use the Seaborn data visualization library for analysis.

Thanks for reading! Feel free to comment and let me know if you want a YouTube tutorial.

Want to stay in touch? Connect with me on LinkedIn!

Step-by-Step: Reading and Querying Data in SQLite using Python

Step 1: Check you’ve already PIP installed the Packages

Step 2: Download Data, Connect to SQLite, and Query the Tables

Step 3: Entity-Relationship Diagrams (ERDs)

Step 4: Joining Tables

Step 5: Filtering Queries

Your Turn to Explore More!

Written by Dan Blevins

No responses yet