Structural Query Language (SQL)

Queries

Select

SELECT [DISTINCT] target-list
FROM   relation-list
WHERE  qualification
  • Target-list
    • A list of attributes of relations in relation list
    • as and = both can name fields
    • Can additionally contain arithmetic (ex: age1 = S.age - 5 or 2 * S.age as age2)
  • Relation-list
    • A list of relation names
    • Possibly with a range-variable after each name
    • If multiple are included, the cross product is taken
  • Qualification: Comparisons
    • Formatted attr1 op const or attr1 op attr2
    • Where op is one of
    • Combined using AND, OR, and NOT
    • String comparison can also be done with LIKE
      • _ stands for any one character and
      • % stands for 0 or more arbitrary characters.
  • DISTINCT: optional keyword indicating that the answer should not contain duplicates
  • Can add ORDER BY attribute-list and LIMIT to make things run faster
  • Grouping
    • Add GROUP BY grouping-list HAVING group-qualification
      • grouping list = grouping attributes
      • target list = contains grouping attribute names or terms with aggregate operations
      • group-qualification = boolean conditions on grouping attributes or group aggregates
    • Each answer tuple corresponds to a group, and these attributes must have a single value per group
    • One answer tuple is generated per group
    • Grouping additionally allows you to include the attribute that is was grouped over alongside an aggregation in the select

Nested Queries

  • A WHERE, FROM, and HAVING clauses can contain an SQL query
  • Using an IN within the WHERE connecting to the nested clause is common
  • Semantics
    • Conceptual nested loop evaluation
    • For each top level tuple, check the qualification by computing the subquery
  • Using the result
    • Tuple IN R
    • Val > ANY/EVERY R
    • `Tuple = ANY/EVERY R

Joins

  • Can specify natural, outer, inner
  • Uses bag semantics
  • Theta join: R JOIN S ON <condition>
  • Outer Joins: R [NATURAL] [LEFT/RIGHT/FULL] OUTER JOIN S [ON <condition>]
    • left = pad dangling tuples of R only
    • right = pad dangling tuples of S only
    • full = pad both (default)

Set Operations

  • INTERSECT, UNION, MINUS/EXCEPT
  • Use set semantics by default
    • Because these are done via sorting, so it is cheap to also do deduplication afterwards
    • Can opt back into bag semantics by adding ALL after the operation

Aggregate Operators

COUNT (*)
COUNT ([DISTINCT] A)
SUM ([DISTINCT] A)
AVG ([DISTINCT] A)
MAX (A)
MIN (A)

Evaluation

  • Semantics of an SQL query are defined using a conceptual evaluation strategy
    1. Compute the cross-product of the relationship-list
    2. Discard the resulting tuples if they fail qualifications
    3. Delete attributes that are not in the target-list
    4. If distinct is specified, eliminate duplicate rows
  • Conceptual Evaluation Strategy != Actual Evaluation Strategy
    • The actual database uses a more efficient approach that generates the same answer

Data Modification Language

  • A modification command does not return a result but does change the database in some way

Insert

  • INSERT INTO <relation> VALUES (<list of values>)
  • Can specify attribute order INSERT INTO table(attr1, attr2) VALUES (val1, val2)
  • Can use subqueries to insert data from one table into another
    • INSERT INTO <relation> (<subquery>)

Delete

  • DELETE FROM <relation> WHERE <condition>
    • Where can be a full query condition
  • If where is not specified, it just deletes everything

Updates

UPDATE <relation>  
SET <list of attribute assignments>  
WHERE <condition on tuples>

Optimizations

  • Push selection down the tree
  • Change the order of joints (start a join with the smaller table)