aboutsummaryrefslogtreecommitdiff
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e054359",
   "metadata": {},
   "outputs": [],
   "source": [
    "import agh_db_lectures\n",
    "agh_db_lectures.prepare_notebook_for_sql()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccff71a5",
   "metadata": {},
   "source": [
    "# Relational Algebra"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b72b893e",
   "metadata": {},
   "source": [
    "The following **formal** languages exist for expressing queries.\n",
    "\n",
    "- tuple relational calculus\n",
    "- domain relational calculus\n",
    "- **relational algebra**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e7139b8",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We mean queries on relational databases, of course.\n",
    "\n",
    "SQL is based on relational algebra.\n",
    "\n",
    "Sometimes a categorization into declarative (aka nonprocedural) languages (tuple and domain relational calculi) and procedural languges (SQL, relational algebra) is made."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "439b940a",
   "metadata": {},
   "source": [
    "## Relational Algebra Operators"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfaad0ad",
   "metadata": {},
   "source": [
    "- selection — $\\sigma$\n",
    "- projection – $\\Pi$\n",
    "- union — $\\cup$\n",
    "- set difference — $-$\n",
    "- cartesian product — $\\times$\n",
    "- renaming — $\\rho$\n",
    "\n",
    "- set intersection — $\\cap$\n",
    "- (natural) join — $\\bowtie$ (and \"outer\" variants $⟕$, $⟖$, and $⟗$)\n",
    "- assignment — $\\leftarrow$\n",
    "\n",
    "and extended relational algebra operators\n",
    "\n",
    "- generalized projection — $\\Pi$\n",
    "- aggregation – $\\mathcal{G}$ (or $\\gamma$)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08012f1f",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "The operators from intersection onwards are sometimes labeled \"additional\".\n",
    "\n",
    "The operators correspond to operations that we perform in SQL.\n",
    "\n",
    "They **operate on relations**, i.e., the arguments of the operators are relations (sets of tuples) and the result of each operation is a relation (a set of tuples).\n",
    "\n",
    "In its basic form, the algebra considers relations as mathematical sets (without duplicates)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb64cdc4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql postgresql://demo_user:demo_pwd@localhost:25432/agh_it_northwind\n",
    "%config SqlMagic.feedback = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acfd5a4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "agh_db_lectures.nw_diagram.download_open()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2faffdc",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We shall use the Northwind db as example.\n",
    "\n",
    "We are going to preset relational algebra epressions together with their SQL equivalents."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86d4ddff",
   "metadata": {},
   "source": [
    "## Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79c78b7e",
   "metadata": {},
   "source": [
    "$\\sigma_{units\\_in\\_stock=0}(products)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55e68362",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM products WHERE units_in_stock = 0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86f5bdda",
   "metadata": {},
   "source": [
    "Logical operators can be used in the condition of a selection.\n",
    "\n",
    "$\\sigma_{units\\_in\\_stock=0 \\land units\\_on\\_order > 0}(products)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0559ba2",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM products WHERE units_in_stock = 0 AND units_on_order > 0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c78224a2",
   "metadata": {},
   "source": [
    "## Projection"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46286a9b",
   "metadata": {},
   "source": [
    "$\\Pi_{category\\_id, category\\_name}(categories)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0721688",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT category_id, category_name FROM categories"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "530f3e08",
   "metadata": {},
   "source": [
    "The operators can be combined (one can act on the result of another).\n",
    "\n",
    "$\\Pi_{product\\_name}(\\sigma_{units\\_in\\_stock=0}(products))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06f3cc54",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT product_name FROM products WHERE units_in_stock = 0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a05a83a6",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Note that to fully match the formal meaning of relational algebra operators, we'd need to remove duplicates (e.g., with `SELECT DISTINCT`) in all SQL queries.  In our examples we only use it where actually necessary to avoid duplicates when working with our database instance."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58cc2503",
   "metadata": {},
   "source": [
    "## Union"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a45ac8bf",
   "metadata": {},
   "source": [
    "$\\sigma_{units\\_in\\_stock=0}(products) \\cup \\sigma_{discontinued=1}(products)$\n",
    "\n",
    "Input relations must have the same number of attributes and attributes must have compatible domains."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd358205",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM products WHERE units_in_stock = 0 OR discontinued = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0a4eb06",
   "metadata": {},
   "source": [
    "We could've also written the SQL like this.\n",
    "\n",
    "```sql\n",
    "SELECT * FROM products WHERE units_in_stock = 0\n",
    "UNION\n",
    "SELECT * FROM products WHERE discontinued = 1\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68158f63",
   "metadata": {},
   "source": [
    "## Difference"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8806fad4",
   "metadata": {},
   "source": [
    "$\\sigma_{units\\_in\\_stock=0}(products) - \\sigma_{discontinued=1}(products)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61f37d8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM products WHERE units_in_stock = 0 AND NOT discontinued = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45003f18",
   "metadata": {},
   "source": [
    "We could've also written the SQL using the `EXCEPT` keyword."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2effdb9",
   "metadata": {},
   "source": [
    "## Cartesian Product"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2226249",
   "metadata": {},
   "source": [
    "$\\Pi_{first\\_name, last\\_name, category\\_name}(employees \\times categories)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0be27021",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT first_name, last_name, category_name FROM employees, categories"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a66c12b",
   "metadata": {},
   "source": [
    "## Renaming"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffb122f5",
   "metadata": {},
   "source": [
    "$\\rho_{contacts(customer\\_name, customer\\_contact)}(\\Pi_{company\\_name, phone}(customers))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bd0b98d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT company_name AS customer_name, phone AS customer_contact FROM customers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9233b24e",
   "metadata": {},
   "source": [
    "$\\Pi_{a.first\\_name, a.last\\_name, b.first\\_name, b.last\\_name}(\\rho_a(employees) \\times \\rho_b(employees))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4de3c50a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT a.first_name, a.last_name, b.first_name, b.last_name\n",
    "FROM employees a, employees b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78c0997c",
   "metadata": {},
   "source": [
    "## Constant Relations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d70864e0",
   "metadata": {},
   "source": [
    "$\\rho_{city(name, latitude, longitude)}(\\{({\\rm Cracov}, 50.06, 19.93), ({\\rm Corinth}, 37.94, 22.93), ({\\rm Boston}, 42.36, -71.06)\\})$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58d7160e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT *\n",
    "FROM (VALUES ('Cracov', 50.06, 19.93),\n",
    "             ('Corinth', 37.94, 22.93),\n",
    "             ('Boston', 42.36, -71.06)) AS city (name, latitude, longitude)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05818736",
   "metadata": {},
   "source": [
    "## Intersection"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "078f87b1",
   "metadata": {},
   "source": [
    "Could also be expressed using a composition of set difference operations.\n",
    "\n",
    "$\\Pi_{supplier\\_id}(\\sigma_{category\\_id=1}(products)) \\cap \\Pi_{supplier\\_id}(\\sigma_{country={\\rm USA}}(suppliers))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42017250",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT DISTINCT supplier_id FROM products WHERE category_id = 1\n",
    "INTERSECT\n",
    "SELECT supplier_id FROM suppliers WHERE country = 'USA'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b238ba7",
   "metadata": {},
   "source": [
    "## Assignment"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65094d7d",
   "metadata": {},
   "source": [
    "$cat1\\_suppliers \\leftarrow \\Pi_{supplier\\_id}(\\sigma_{category\\_id=1}(products))$\n",
    "\n",
    "$usa\\_suppliers \\leftarrow \\Pi_{supplier\\_id}(\\sigma_{country={\\rm USA}}(suppliers))$\n",
    "\n",
    "$cat1\\_suppliers \\cap usa\\_suppliers$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f3c67dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS cat1_suppliers;\n",
    "CREATE TABLE cat1_suppliers AS\n",
    "SELECT DISTINCT supplier_id FROM products WHERE category_id = 1;\n",
    "\n",
    "DROP TABLE IF EXISTS usa_suppliers;\n",
    "CREATE TABLE usa_suppliers AS\n",
    "SELECT supplier_id FROM suppliers WHERE country = 'USA';\n",
    "\n",
    "SELECT * FROM cat1_suppliers INTERSECT SELECT * FROM usa_suppliers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbe1ff8c",
   "metadata": {},
   "source": [
    "### Insertion, Modification, Deletion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac7ff10f",
   "metadata": {},
   "source": [
    "```sql\n",
    "DELETE FROM products WHERE discontinued = 1 AND units_in_stock = 0\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ad608cf",
   "metadata": {},
   "source": [
    "This can be expressed using the operators we already know."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a8545e",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Any ideas?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2263914e",
   "metadata": {},
   "source": [
    "$products \\leftarrow products - \\sigma_{discontinued=1 \\land units\\_in\\_stock=0}(products)$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a20bd4b",
   "metadata": {},
   "source": [
    "## Natural Join"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a4818aa",
   "metadata": {},
   "source": [
    "Could also be expressed as a composition of cartesian product, selection and projection.\n",
    "\n",
    "$\\Pi_{category\\_id, category\\_name, product\\_name}(categories \\bowtie products)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c0ce2c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT category_id, category_name, product_name\n",
    "FROM categories NATURAL JOIN products"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67d1f770",
   "metadata": {},
   "source": [
    "### Outer Joins"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c13c17c",
   "metadata": {},
   "source": [
    "$\\Pi_{company\\_name, order\\_id, order\\_date}(customers ⟕ orders)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67e3e476",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT company_name, order_id, order_date\n",
    "FROM customers NATURAL LEFT JOIN orders"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "260ef853",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "There are also symbols $⟖$ and $⟗$ that are used for natural left outer join and natural full outer join, respectively. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a81b9de",
   "metadata": {},
   "source": [
    "## Theta Join"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1e17ced",
   "metadata": {},
   "source": [
    "This is a join with arbitrary condition.\n",
    "\n",
    "$\\bowtie_{\\Theta}$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b166dbab",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "The name \"theta join\" name comes from the notation above, where $\\Theta$ (theta) in the subscript represets the condition."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "140b0b36",
   "metadata": {},
   "source": [
    "$\\Pi_{e.first\\_name, e.last\\_name, b.first\\_name, b.last\\_name}(\\\\\\rho_e(employees) \\bowtie_{e.reports\\_to = b.employee\\_id} \\rho_b(employees))$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aeea7146",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT e.first_name, e.last_name, b.first_name, b.last_name\n",
    "FROM employees e JOIN employees b ON e.reports_to = b.employee_id"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9e45942",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Outer join variants can also be used as theta joins."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7921f240",
   "metadata": {},
   "source": [
    "## Generalized Projection"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afc54d91",
   "metadata": {},
   "source": [
    "$\\Pi_{order\\_id, product\\_id, unit\\_price \\cdot quantity}(order\\_details)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eeec87fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT order_id, product_id, unit_price * quantity FROM order_details"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b246fb9d",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "The only difference from normal projection is that operations can be performed on values.  There is not dedicated symbol for generalized projection."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc804c3e",
   "metadata": {},
   "source": [
    "## Aggregation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d777b3f1",
   "metadata": {},
   "source": [
    "$_{id}\\mathcal{G}_{id, sum(value)}(\\rho_{v(id, value)}(\\Pi_{order\\_id, unit\\_price \\cdot quantity}(order\\_details)))$\n",
    "\n",
    "The meaning of lower-index text of the aggregation oprator is as annotated below.\n",
    "\n",
    "$_{\\underset{grouping \\atop attribtues}{id}}\\mathcal{G}_{\\underset{expressions\\ list \\atop (with\\ aggregate\\ functions)}{id, sum(value)}}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e06b5021",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT order_id AS id, SUM(unit_price * quantity)\n",
    "FROM order_details\n",
    "GROUP BY order_id"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}