In this project we will use the cleaned data from the previous mission and work mainly in SQL. We will use SQL to transform the nominations tables into new tables, and create relations across these tables.
First let's take a look at the table's schema and the first ten rows.
import sqlite3
conn = sqlite3.connect("nominations.db")
cursor = conn.cursor()
q1 = '''
PRAGMA table_info(nominations)
'''
schema = cursor.execute(q1).fetchall()
q2 = '''
SELECT * FROM nominations LIMIT 10
'''
first_ten = cursor.execute(q2).fetchall()
first_ten
for item in first_ten:
print(item)
for item in schema:
print(item)
We want to create a new table within the databse named "ceremonies". This table will have the Year and the Host only
q3 = '''
CREATE TABLE ceremonies (
id INTEGER primary key,
Year INTEGER,
Host TEXT
);
'''
years_hosts = [(2010, "Steve Martin"),
(2009, "Hugh Jackman"),
(2008, "Jon Stewart"),
(2007, "Ellen DeGeneres"),
(2006, "Jon Stewart"),
(2005, "Chris Rock"),
(2004, "Billy Crystal"),
(2003, "Steve Martin"),
(2002, "Whoopi Goldberg"),
(2001, "Steve Martin"),
(2000, "Billy Crystal"),
]
insert_query = "INSERT INTO ceremonies (Year, Host) VALUES (?, ?)"
The .executemany() method allows us to perform many insert commands at once, until the list ends
conn.execute(q3)
conn.executemany(insert_query, years_hosts)
Let's check out the table we just created .
q4 = "SELECT * FROM ceremonies LIMIT 10"
q5 = "PRAGMA table_info(ceremonies)"
result = cursor.execute(q4)
result = result.fetchall()
result
result = cursor.execute(q5)
result = result.fetchall()
result
We want to avoid editing the original table. So let's make a new nominations table, but this time we'll include the ceremony_id as a foriegn key.
q6 = "PRAGMA foreign_keys = ON;"
cursor.execute(q6)
q7 = '''
CREATE TABLE nominations_two (
id integer primary key,
category text,
nominee text,
movie text,
character text,
won integer,
ceremony_id integer,
foreign key(ceremony_id) references ceremonies(id)
);
'''
#Query to be inserted into the new table
insert_query2 = "INSERT INTO nominations_two (category, nominee, movie, character, won, ceremony_id) VALUES (?, ?, ?, ?, ?, ?)"
cursor.execute(q7)
We'll need a list of tuples, we can do this by writing a query and setting the results to a list.
q8 = '''
SELECT nominations.category, nominations.nominee, nominations.movie, nominations.character, nominations.won, ceremonies.id
FROM nominations
INNER JOIN ceremonies
ON nominations.year == ceremonies.year
'''
#returns a list of tuples
joined_nominations = cursor.execute(q8).fetchall()
joined_nominations[0:5]
#populating the nominations_two table
conn.executemany(insert_query2, joined_nominations)
Let's check out the new table, "nominations_two" that we just made.
q9 = "SELECT * FROM nominations_two LIMIT 5"
result = cursor.execute(q9)
result = result.fetchall()
result[0:5]
We can change the table name of nominations_two back to nominations, replacing the table.
#We can drop the table "nominations" with the DROP TABLE sql command
q10 = "DROP TABLE nominations"
conn.execute(q10)
q11 = "ALTER TABLE nominations_two RENAME TO nominations"
conn.execute(q11)
We can repeat the process and create three new tables and use have their IDs cross reference each other.
q11 = '''
CREATE TABLE movies(
id integer PRIMARY KEY,
movie text
)'''
q12 = '''
CREATE TABLE actors(
id integer PRIMARY KEY,
actor text
)'''
q13 = '''
CREATE TABLE movies_actors(
id integer PRIMARY KEY,
movie_id integer references movies(id),
actor_id integer references actors(id)
);
'''
conn.execute(q11)
conn.execute(q12)
conn.execute(q13)
insert_into_movies = "INSERT INTO movies (movie) SELECT nominations.movie from nominations"
insert_into_actors = "INSERT INTO actors (actor) SELECT nominations.nominee from nominations"
conn.execute(insert_into_movies)
conn.execute(insert_into_actors)
print(conn.execute("SELECT * FROM actors LIMIT 5").fetchall())
print(conn.execute("SELECT * FROM movies LIMIT 5").fetchall())
q14 = "SELECT movie, nominee FROM nominations"
list_actors_movies = conn.execute(q14).fetchall()
list_actors_movies[0:5]
insert_query = '''
INSERT INTO movies_actors(movie_id, actor_id)
VALUES ((SELECT id FROM movies WHERE movie == ?), (SELECT id from actors where actor == ?))
'''
conn.executemany(insert_query, list_actors_movies)
print(conn.execute("SELECT * FROM movies_actors LIMIT 5").fetchall())
Learning Summary¶
Python/SQL concepts explored: python+sqlite3, pandas, multiple tables, foreign keys, subqueries, populating new tables
Python functions and methods used: .cursor(), .connect(), .execute(), .fetchall(), .executemany()
SQL statements used: PRAGMA, LIMIT, FROM, SELECT, INNER JOIN, DROP, ALTER, VALUES
The files used for this project can be found in my GitHub repository.
Comments
comments powered by Disqus