CS 500, Database Theory, Summer 2016
Homework 2: Relational Algebra and SQL
Due at 5pm on Wednesday, July 20, 2016
NO LATE SUBMISSIONS WILL BE ACCEPTED

Description

This assignment covers relational algebra and SQL. When writing relational algebra statements and SQL queries, keep in mind that your answer should be correct for any valid instance of the given database schema. In other words, you are writing programs that should compute correctly on any valid input, not only on the particular instance you are given as an example.

Details and Grading

This assignment is made up of 3 problems, collectively worth 80 points, or 5% of the overall course grade. If this assignment is submitted late, you will receive no credit. This assignment is to be completed individually. Please consult the course syllabus for a description of our academic honesty policy.

(1) Relational algebra queries can be written by hand, in LaTeX, or using any other editor / typesetting system of your choice. I prefer to receive this part of your assignment as a PDF file.

(2) For the SQL part of the assignment, submit two text files, one per problem, containing your queries. I will execute the queries to test their correctness on several different database instances. I encourage you to use a relational database to test your queries. You should construct this database yourself. Your SQL queries should compile in PostgreSQL.

Submission instructions

Submit your assignment to GitLab on king.cs.drexel.edu. I assume that you have already followed the steps to create your git repository. I will refer to the root directory of your git repository as $GIT_HOME.

Create a directory called cs500-hw2 (case-sensitive, use exactly this name) under $GIT_HOME:

    mkdir $GIT_HOME/cs500-hw2
    cd $GIT_HOME/cs500-hw2

Place the files you wish to submit, e.g., part1.pdf, part2.txt, part3.txt, into this directory.
You can now commit your assignment as follows:

    git add *
    git commit -m "homework 2"
    git push

You may submit multiple times before the deadline; only your last submission committed before the deadline will be graded.
Tennis_Players (name, country, ATP_rank, age, points)

    name        country       ATP_rank   age   points
    Djokovic    Serbia        1          29    15040
    Murray      UK            2          29    10195
    Federer     Switzerland   3          34    5945
    Nadal       Spain         4          30    5290
    Wawrinka    Switzerland   5          31    4720
    Nishikori   Japan         6          26    4290
    Raonic      Serbia        7          25    4285

Years_Ranked_First (name, year)

    name        year
    Djokovic    2015
    Djokovic    2014
    Nadal       2013
    Djokovic    2012
    Djokovic    2011
    Nadal       2010
    Federer     2009
    Nadal       2008
    Federer     2007
    Federer     2006
    Federer     2005
    Federer     2004

Countries (name, GDP, population)

    name          GDP (B)   population (M)
    USA           18,558    325
    China         11,383    1,383
    Japan         4,412     126
    Germany       3,467     80
    UK            2,853     65
    Spain         1,242     46
    Switzerland   651       8
    Serbia        37        9
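One possible way to get a test database quickly is sketched below. It loads the sample instance above into an in-memory SQLite database using Python's standard sqlite3 module. This is only a convenience sketch under stated assumptions: the column types are guesses (the handout gives names only), no keys or constraints are declared, and SQLite is not PostgreSQL, so queries should still be checked in PostgreSQL before submission. The function name build_sample_db is hypothetical.

```python
import sqlite3

# Rows copied from the sample instance in the handout.
TENNIS_PLAYERS = [
    ("Djokovic", "Serbia", 1, 29, 15040),
    ("Murray", "UK", 2, 29, 10195),
    ("Federer", "Switzerland", 3, 34, 5945),
    ("Nadal", "Spain", 4, 30, 5290),
    ("Wawrinka", "Switzerland", 5, 31, 4720),
    ("Nishikori", "Japan", 6, 26, 4290),
    ("Raonic", "Serbia", 7, 25, 4285),
]
YEARS_RANKED_FIRST = [
    ("Djokovic", 2015), ("Djokovic", 2014), ("Nadal", 2013),
    ("Djokovic", 2012), ("Djokovic", 2011), ("Nadal", 2010),
    ("Federer", 2009), ("Nadal", 2008), ("Federer", 2007),
    ("Federer", 2006), ("Federer", 2005), ("Federer", 2004),
]
# GDP in billions, population in millions, as in the handout.
COUNTRIES = [
    ("USA", 18558, 325), ("China", 11383, 1383), ("Japan", 4412, 126),
    ("Germany", 3467, 80), ("UK", 2853, 65), ("Spain", 1242, 46),
    ("Switzerland", 651, 8), ("Serbia", 37, 9),
]


def build_sample_db():
    """Create an in-memory database holding the sample instance."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumptions; the handout lists column names only.
    conn.executescript("""
        CREATE TABLE Tennis_Players (
            name TEXT, country TEXT, ATP_rank INTEGER,
            age INTEGER, points INTEGER);
        CREATE TABLE Years_Ranked_First (name TEXT, year INTEGER);
        CREATE TABLE Countries (name TEXT, GDP INTEGER, population INTEGER);
    """)
    conn.executemany(
        "INSERT INTO Tennis_Players VALUES (?, ?, ?, ?, ?)", TENNIS_PLAYERS)
    conn.executemany(
        "INSERT INTO Years_Ranked_First VALUES (?, ?)", YEARS_RANKED_FIRST)
    conn.executemany(
        "INSERT INTO Countries VALUES (?, ?, ?)", COUNTRIES)
    conn.commit()
    return conn


conn = build_sample_db()
for table in ("Tennis_Players", "Years_Ranked_First", "Countries"):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(table, count)
```

Because queries are graded on several instances, it may be worth editing the row lists (for example, removing all players from one country) and rerunning a query to see whether it still behaves as intended.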
Part 1 (30 points): Relational Algebra

Consider the relation instances given above, with the given schemas. In each question below, write a relational algebra expression that computes the required answer.

(a) List names of home countries of tennis players who were ranked first between 2010 and 2013 (inclusive).

(b) List names and GDPs of countries from which there are no tennis players in our database.

(c) List pairs of tennis players such that (i) the ATP rank of the first is lower (better) than that of the second, and (ii) the GDP of the first player's home country is lower than that of the second player's home country.

(d) List name, age, ATP rank and country's GDP of tennis players from Spain or Serbia.

(e) List name, ATP rank and country of tennis players who were ranked first in 2010 or later but not before 2010.

(f) List names and populations of countries of tennis players who are currently ranked 5 or lower (better), are currently 30 years old or older, and were ranked first in some year since 2004 (including 2004).
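As a reminder of the notation, here is a small illustrative expression that is not one of the questions above: the names of tennis players whose country is Switzerland. It is a sketch using the common symbols for selection and projection; other equivalent notations are acceptable.

```latex
\pi_{\text{name}}\bigl(\sigma_{\text{country} = \text{'Switzerland'}}(\text{Tennis\_Players})\bigr)
```

On the sample instance, this expression evaluates to {Federer, Wawrinka}.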
Part 2 (30 points): SQL

Consider again the relation instances given above (Tennis_Players, Years_Ranked_First, Countries), with the given schemas. In each question below, write a SQL query that computes the required answer.

(a) For each country, compute the number of years in which one of its tennis players was ranked first. Result should have the schema (country, num_years).

(b) List pairs of tennis players (player1, player2) in which player1 both has a lower (better) ATP rank than player2 and comes from a less populous country.

(c) List pairs of players from the same country. List each pair exactly once. That is, you should list either (Djokovic, Raonic, Serbia) or (Raonic, Djokovic, Serbia), but not both. Result should have the schema (player1, player2, country).

(d) For countries with at least 2 tennis players, list country name, GDP and average age of its tennis players. Result should have the schema (country, GDP, avg_age).

(e) List country name, GDP and population of each country. For countries that have tennis players in our database, also list the minimum age of their tennis players. Result should have the schema (country, GDP, population, min_age).

(f) List names of countries that had a top-ranked tennis player both in 2010 or earlier (i.e., between 2004 and 2010, inclusive) and after 2010 (i.e., between 2011 and 2015, inclusive).
Part 3 (20 points): SQL

Foods (food, category, calories)
Dishes (dish, food)

(a) (10 points) Write two equivalent SQL queries that list dishes in which one of the ingredients is a meat and another is a veg. List each dish exactly once. Sort results in alphabetical order. Result should have the schema (dish).

(b) (5 points) Write a SQL query that computes the number of ingredients and the number of calories per dish. Only return dishes that have fewer than 250 total calories. Result should have the schema (dish, num_ingredients, total_calories).

(c) (5 points) Write a SQL query that lists dishes with exactly 3 ingredients, along with the total number of calories per dish. Only return dishes that have at least 200 total calories. Result should have the schema (dish, total_calories).
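No instance is given for the Foods and Dishes schemas, so testing requires making one up. The sketch below shows one way to do that with Python's sqlite3 module. Every row is hypothetical test data invented for illustration (including the calorie values and the 'grain' category); only the table and column names and the category values 'meat' and 'veg' come from the assignment. Column types are assumptions, the function name build_food_db is hypothetical, and final queries should still be run in PostgreSQL.

```python
import sqlite3

# HYPOTHETICAL test rows: not part of the assignment, calories are made up.
FOODS = [
    ("chicken", "meat", 120),
    ("beef", "meat", 250),
    ("carrot", "veg", 40),
    ("spinach", "veg", 20),
    ("rice", "grain", 200),
]
DISHES = [
    ("stir fry", "chicken"), ("stir fry", "carrot"), ("stir fry", "rice"),
    ("salad", "carrot"), ("salad", "spinach"),
    ("steak", "beef"),
]


def build_food_db():
    """Create an in-memory database with a small made-up instance."""
    food_conn = sqlite3.connect(":memory:")
    # Column types are assumptions; the handout lists column names only.
    food_conn.executescript("""
        CREATE TABLE Foods (food TEXT, category TEXT, calories INTEGER);
        CREATE TABLE Dishes (dish TEXT, food TEXT);
    """)
    food_conn.executemany("INSERT INTO Foods VALUES (?, ?, ?)", FOODS)
    food_conn.executemany("INSERT INTO Dishes VALUES (?, ?)", DISHES)
    food_conn.commit()
    return food_conn


food_conn = build_food_db()

# Sanity check on the test data itself: count ingredients in Dishes
# that have no matching row in Foods.
dangling = food_conn.execute("""
    SELECT COUNT(*)
    FROM Dishes d LEFT JOIN Foods f ON d.food = f.food
    WHERE f.food IS NULL
""").fetchone()[0]
print("ingredients missing from Foods:", dangling)
```

A useful test instance would include dishes on both sides of each condition in the questions, such as a dish with meat only, a dish with veg only, and dishes just above and just below the calorie thresholds.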