Relational Algebra

  • A procedural language that is evaluated like a mathematical expression
  • Based on mathematical foundations of set theory
  • We don’t care about relationships vs entities here—they are all sets

Core

  • Operations
  • Select:
    • Picks rows that meet the condition
    • is a condition that refers to attributes
  • Projection:
    • Selects certain columns
    • is an ordered list of attributes from the schema
    • Alters the schema and data
    • Needs duplicate removal afterwards
  • Product:
    • Cartesian product of two sets
      • Pair each tuple of A with each tuple of B
    • Schema is the attributes of A and then B, ie
      • Attributes with the same name will be A.name, B.name
  • Theta-Join:
    • Equivalent to
    • Common form of boolean condition: where is =, <, >, etc.
  • Natural-Join:
    • A frequent type of join
    • Connects two relations by equating attributes of the same name (ie key to foreign key)
    • Projects out one copy of the equated attributes (removes the duplicate key)
  • Renaming:
    • Gives a new schema to a relation
    • Alternate notation: R1(A1,…An) := (R2)

Complex Expressions

  • Parenthesis override order of operations (like normal algebra)
  • Three representations
    • Sequences of assignments
    • Expressions with parenthesis
    • Trees
      • Leaves are operands (either variables of relations or constant relations)
      • Inner nodes are operators
  • Precedence
    1. Unary (select, project, rename)
    2. Product, Join
    3. Intersection
    4. Union, Difference

Bag Semantics

  • Bags (aka multiset) is like a set but an element can appear more than once
    • Sets are also bags that happen to be a set
  • Used because deduplication is an expensive operation that requires sorting
  • Most commercial databases use bags semantics unless the attribute is explicitly declared to be unique
    • SQL is implemented on top of bag semantics

Operators

  • Union: Add counts
  • Intersection: The minimum counts of elements that appear in both
  • Difference: The number of times an object appears in A minus the number of times in B (without going negative)
  • Selection: Same
  • Projection: Do not eliminate duplicates
  • Product and Joins: Same
  • Bag laws != Set laws (some but not all hold)
    • Ex: commutative law for union holds for both (because addition is commutative)
    • Ex: Union idempotent law () does not hold because each element would be duplicated

Equivalence Laws

Extended

  • Eliminate duplicates from bags:
  • Sort tuples:
  • Extended projection
    • Arithmetic, duplication of columns
    • can include arbitrary arithmetic
  • Grouping and aggregation:
    • is a list of elements that are either
      • Grouping from the base table
      • Aggregation
    • Ex: sum, average, min, max
  • Outer Join
    • Include “dangling tuples” (tuples that do not join with anything) in the result