{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "4e054359",
"metadata": {},
"outputs": [],
"source": [
"import agh_db_lectures\n",
"agh_db_lectures.prepare_notebook_for_sql()"
]
},
{
"cell_type": "markdown",
"id": "ccff71a5",
"metadata": {},
"source": [
"# Relational Algebra"
]
},
{
"cell_type": "markdown",
"id": "b72b893e",
"metadata": {},
"source": [
"The following **formal** languages exist for expressing queries.\n",
"\n",
"- tuple relational calculus\n",
"- domain relational calculus\n",
"- **relational algebra**"
]
},
{
"cell_type": "markdown",
"id": "3e7139b8",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"We mean queries on relational databases, of course.\n",
"\n",
"SQL is based on relational algebra.\n",
"\n",
"Sometimes a categorization into declarative (aka nonprocedural) languages (tuple and domain relational calculi) and procedural languges (SQL, relational algebra) is made."
]
},
{
"cell_type": "markdown",
"id": "439b940a",
"metadata": {},
"source": [
"## Relational Algebra Operators"
]
},
{
"cell_type": "markdown",
"id": "bfaad0ad",
"metadata": {},
"source": [
"- selection — $\\sigma$\n",
"- projection – $\\Pi$\n",
"- union — $\\cup$\n",
"- set difference — $-$\n",
"- cartesian product — $\\times$\n",
"- renaming — $\\rho$\n",
"\n",
"- set intersection — $\\cap$\n",
"- (natural) join — $\\bowtie$ (and \"outer\" variants $⟕$, $⟖$, and $⟗$)\n",
"- assignment — $\\leftarrow$\n",
"\n",
"and extended relational algebra operators\n",
"\n",
"- generalized projection — $\\Pi$\n",
"- aggregation – $\\mathcal{G}$ (or $\\gamma$)"
]
},
{
"cell_type": "markdown",
"id": "08012f1f",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"The operators from intersection onwards are sometimes labeled \"additional\".\n",
"\n",
"The operators correspond to operations that we perform in SQL.\n",
"\n",
"They **operate on relations**, i.e., the arguments of the operators are relations (sets of tuples) and the result of each operation is a relation (a set of tuples).\n",
"\n",
"In its basic form, the algebra considers relations as mathematical sets (without duplicates)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb64cdc4",
"metadata": {},
"outputs": [],
"source": [
"%sql postgresql://demo_user:demo_pwd@localhost:25432/agh_it_northwind\n",
"%config SqlMagic.feedback = False"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acfd5a4d",
"metadata": {},
"outputs": [],
"source": [
"agh_db_lectures.nw_diagram.download_open()"
]
},
{
"cell_type": "markdown",
"id": "c2faffdc",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"We shall use the Northwind db as example.\n",
"\n",
"We are going to preset relational algebra epressions together with their SQL equivalents."
]
},
{
"cell_type": "markdown",
"id": "86d4ddff",
"metadata": {},
"source": [
"## Selection"
]
},
{
"cell_type": "markdown",
"id": "79c78b7e",
"metadata": {},
"source": [
"$\\sigma_{units\\_in\\_stock=0}(products)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55e68362",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM products WHERE units_in_stock = 0"
]
},
{
"cell_type": "markdown",
"id": "86f5bdda",
"metadata": {},
"source": [
"Logical operators can be used in the condition of a selection.\n",
"\n",
"$\\sigma_{units\\_in\\_stock=0 \\land units\\_on\\_order > 0}(products)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0559ba2",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM products WHERE units_in_stock = 0 AND units_on_order > 0"
]
},
{
"cell_type": "markdown",
"id": "c78224a2",
"metadata": {},
"source": [
"## Projection"
]
},
{
"cell_type": "markdown",
"id": "46286a9b",
"metadata": {},
"source": [
"$\\Pi_{category\\_id, category\\_name}(categories)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0721688",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT category_id, category_name FROM categories"
]
},
{
"cell_type": "markdown",
"id": "530f3e08",
"metadata": {},
"source": [
"The operators can be combined (one can act on the result of another).\n",
"\n",
"$\\Pi_{product\\_name}(\\sigma_{units\\_in\\_stock=0}(products))$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06f3cc54",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name FROM products WHERE units_in_stock = 0"
]
},
{
"cell_type": "markdown",
"id": "a05a83a6",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"Note that to fully match the formal meaning of relational algebra operators, we'd need to remove duplicates (e.g., with `SELECT DISTINCT`) in all SQL queries. In our examples we only use it where actually necessary to avoid duplicates when working with our database instance."
]
},
{
"cell_type": "markdown",
"id": "58cc2503",
"metadata": {},
"source": [
"## Union"
]
},
{
"cell_type": "markdown",
"id": "a45ac8bf",
"metadata": {},
"source": [
"$\\sigma_{units\\_in\\_stock=0}(products) \\cup \\sigma_{discontinued=1}(products)$\n",
"\n",
"Input relations must have the same number of attributes and attributes must have compatible domains."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd358205",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM products WHERE units_in_stock = 0 OR discontinued = 1"
]
},
{
"cell_type": "markdown",
"id": "c0a4eb06",
"metadata": {},
"source": [
"We could've also written the SQL like this.\n",
"\n",
"```sql\n",
"SELECT * FROM products WHERE units_in_stock = 0\n",
"UNION\n",
"SELECT * FROM products WHERE discontinued = 1\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "68158f63",
"metadata": {},
"source": [
"## Difference"
]
},
{
"cell_type": "markdown",
"id": "8806fad4",
"metadata": {},
"source": [
"$\\sigma_{units\\_in\\_stock=0}(products) - \\sigma_{discontinued=1}(products)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "61f37d8b",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM products WHERE units_in_stock = 0 AND NOT discontinued = 1"
]
},
{
"cell_type": "markdown",
"id": "45003f18",
"metadata": {},
"source": [
"We could've also written the SQL using the `EXCEPT` keyword."
]
},
{
"cell_type": "markdown",
"id": "c2effdb9",
"metadata": {},
"source": [
"## Cartesian Product"
]
},
{
"cell_type": "markdown",
"id": "a2226249",
"metadata": {},
"source": [
"$\\Pi_{first\\_name, last\\_name, category\\_name}(employees \\times categories)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0be27021",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT first_name, last_name, category_name FROM employees, categories"
]
},
{
"cell_type": "markdown",
"id": "7a66c12b",
"metadata": {},
"source": [
"## Renaming"
]
},
{
"cell_type": "markdown",
"id": "ffb122f5",
"metadata": {},
"source": [
"$\\rho_{contacts(customer\\_name, customer\\_contact)}(\\Pi_{company\\_name, phone}(customers))$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5bd0b98d",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT company_name AS customer_name, phone AS customer_contact FROM customers"
]
},
{
"cell_type": "markdown",
"id": "9233b24e",
"metadata": {},
"source": [
"$\\Pi_{a.first\\_name, a.last\\_name, b.first\\_name, b.last\\_name}(\\rho_a(employees) \\times \\rho_b(employees))$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4de3c50a",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT a.first_name, a.last_name, b.first_name, b.last_name\n",
"FROM employees a, employees b"
]
},
{
"cell_type": "markdown",
"id": "78c0997c",
"metadata": {},
"source": [
"## Constant Relations"
]
},
{
"cell_type": "markdown",
"id": "d70864e0",
"metadata": {},
"source": [
"$\\rho_{city(name, latitude, longitude)}(\\{({\\rm Cracov}, 50.06, 19.93), ({\\rm Corinth}, 37.94, 22.93), ({\\rm Boston}, 42.36, -71.06)\\})$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58d7160e",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT *\n",
"FROM (VALUES ('Cracov', 50.06, 19.93),\n",
" ('Corinth', 37.94, 22.93),\n",
" ('Boston', 42.36, -71.06)) AS city (name, latitude, longitude)"
]
},
{
"cell_type": "markdown",
"id": "05818736",
"metadata": {},
"source": [
"## Intersection"
]
},
{
"cell_type": "markdown",
"id": "078f87b1",
"metadata": {},
"source": [
"Could also be expressed using a composition of set difference operations.\n",
"\n",
"$\\Pi_{supplier\\_id}(\\sigma_{category\\_id=1}(products)) \\cap \\Pi_{supplier\\_id}(\\sigma_{country={\\rm USA}}(suppliers))$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42017250",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT DISTINCT supplier_id FROM products WHERE category_id = 1\n",
"INTERSECT\n",
"SELECT supplier_id FROM suppliers WHERE country = 'USA'"
]
},
{
"cell_type": "markdown",
"id": "1b238ba7",
"metadata": {},
"source": [
"## Assignment"
]
},
{
"cell_type": "markdown",
"id": "65094d7d",
"metadata": {},
"source": [
"$cat1\\_suppliers \\leftarrow \\Pi_{supplier\\_id}(\\sigma_{category\\_id=1}(products))$\n",
"\n",
"$usa\\_suppliers \\leftarrow \\Pi_{supplier\\_id}(\\sigma_{country={\\rm USA}}(suppliers))$\n",
"\n",
"$cat1\\_suppliers \\cap usa\\_suppliers$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f3c67dc",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS cat1_suppliers;\n",
"CREATE TABLE cat1_suppliers AS\n",
"SELECT DISTINCT supplier_id FROM products WHERE category_id = 1;\n",
"\n",
"DROP TABLE IF EXISTS usa_suppliers;\n",
"CREATE TABLE usa_suppliers AS\n",
"SELECT supplier_id FROM suppliers WHERE country = 'USA';\n",
"\n",
"SELECT * FROM cat1_suppliers INTERSECT SELECT * FROM usa_suppliers"
]
},
{
"cell_type": "markdown",
"id": "bbe1ff8c",
"metadata": {},
"source": [
"### Insertion, Modification, Deletion"
]
},
{
"cell_type": "markdown",
"id": "ac7ff10f",
"metadata": {},
"source": [
"```sql\n",
"DELETE FROM products WHERE discontinued = 1 AND units_in_stock = 0\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "0ad608cf",
"metadata": {},
"source": [
"This can be expressed using the operators we already know."
]
},
{
"cell_type": "markdown",
"id": "f6a8545e",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"Any ideas?"
]
},
{
"cell_type": "markdown",
"id": "2263914e",
"metadata": {},
"source": [
"$products \\leftarrow products - \\sigma_{discontinued=1 \\land units\\_in\\_stock=0}(products)$"
]
},
{
"cell_type": "markdown",
"id": "8a20bd4b",
"metadata": {},
"source": [
"## Natural Join"
]
},
{
"cell_type": "markdown",
"id": "6a4818aa",
"metadata": {},
"source": [
"Could also be expressed as a composition of cartesian product, selection and projection.\n",
"\n",
"$\\Pi_{category\\_id, category\\_name, product\\_name}(categories \\bowtie products)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c0ce2c5",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT category_id, category_name, product_name\n",
"FROM categories NATURAL JOIN products"
]
},
{
"cell_type": "markdown",
"id": "67d1f770",
"metadata": {},
"source": [
"### Outer Joins"
]
},
{
"cell_type": "markdown",
"id": "9c13c17c",
"metadata": {},
"source": [
"$\\Pi_{company\\_name, order\\_id, order\\_date}(customers ⟕ orders)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67e3e476",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT company_name, order_id, order_date\n",
"FROM customers NATURAL LEFT JOIN orders"
]
},
{
"cell_type": "markdown",
"id": "260ef853",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"There are also symbols $⟖$ and $⟗$ that are used for natural left outer join and natural full outer join, respectively. "
]
},
{
"cell_type": "markdown",
"id": "7a81b9de",
"metadata": {},
"source": [
"## Theta Join"
]
},
{
"cell_type": "markdown",
"id": "c1e17ced",
"metadata": {},
"source": [
"This is a join with arbitrary condition.\n",
"\n",
"$\\bowtie_{\\Theta}$"
]
},
{
"cell_type": "markdown",
"id": "b166dbab",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"The name \"theta join\" name comes from the notation above, where $\\Theta$ (theta) in the subscript represets the condition."
]
},
{
"cell_type": "markdown",
"id": "140b0b36",
"metadata": {},
"source": [
"$\\Pi_{e.first\\_name, e.last\\_name, b.first\\_name, b.last\\_name}(\\\\\\rho_e(employees) \\bowtie_{e.reports\\_to = b.employee\\_id} \\rho_b(employees))$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aeea7146",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT e.first_name, e.last_name, b.first_name, b.last_name\n",
"FROM employees e JOIN employees b ON e.reports_to = b.employee_id"
]
},
{
"cell_type": "markdown",
"id": "b9e45942",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"Outer join variants can also be used as theta joins."
]
},
{
"cell_type": "markdown",
"id": "7921f240",
"metadata": {},
"source": [
"## Generalized Projection"
]
},
{
"cell_type": "markdown",
"id": "afc54d91",
"metadata": {},
"source": [
"$\\Pi_{order\\_id, product\\_id, unit\\_price \\cdot quantity}(order\\_details)$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eeec87fb",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT order_id, product_id, unit_price * quantity FROM order_details"
]
},
{
"cell_type": "markdown",
"id": "b246fb9d",
"metadata": {},
"source": [
"_notes_\n",
"\n",
"The only difference from normal projection is that operations can be performed on values. There is not dedicated symbol for generalized projection."
]
},
{
"cell_type": "markdown",
"id": "bc804c3e",
"metadata": {},
"source": [
"## Aggregation"
]
},
{
"cell_type": "markdown",
"id": "d777b3f1",
"metadata": {},
"source": [
"$_{id}\\mathcal{G}_{id, sum(value)}(\\rho_{v(id, value)}(\\Pi_{order\\_id, unit\\_price \\cdot quantity}(order\\_details)))$\n",
"\n",
"The meaning of lower-index text of the aggregation oprator is as annotated below.\n",
"\n",
"$_{\\underset{grouping \\atop attribtues}{id}}\\mathcal{G}_{\\underset{expressions\\ list \\atop (with\\ aggregate\\ functions)}{id, sum(value)}}$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e06b5021",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT order_id AS id, SUM(unit_price * quantity)\n",
"FROM order_details\n",
"GROUP BY order_id"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}