{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "4e054359", "metadata": {}, "outputs": [], "source": [ "import agh_db_lectures\n", "agh_db_lectures.prepare_notebook_for_sql()" ] }, { "cell_type": "markdown", "id": "ccff71a5", "metadata": {}, "source": [ "# Relational Algebra" ] }, { "cell_type": "markdown", "id": "b72b893e", "metadata": {}, "source": [ "The following **formal** languages exist for expressing queries.\n", "\n", "- tuple relational calculus\n", "- domain relational calculus\n", "- **relational algebra**" ] }, { "cell_type": "markdown", "id": "3e7139b8", "metadata": {}, "source": [ "_notes_\n", "\n", "We mean queries on relational databases, of course.\n", "\n", "SQL is based on relational algebra.\n", "\n", "Sometimes a categorization into declarative (aka nonprocedural) languages (tuple and domain relational calculi) and procedural languges (SQL, relational algebra) is made." ] }, { "cell_type": "markdown", "id": "439b940a", "metadata": {}, "source": [ "## Relational Algebra Operators" ] }, { "cell_type": "markdown", "id": "bfaad0ad", "metadata": {}, "source": [ "- selection — $\\sigma$\n", "- projection – $\\Pi$\n", "- union — $\\cup$\n", "- set difference — $-$\n", "- cartesian product — $\\times$\n", "- renaming — $\\rho$\n", "\n", "- set intersection — $\\cap$\n", "- (natural) join — $\\bowtie$ (and \"outer\" variants $⟕$, $⟖$, and $⟗$)\n", "- assignment — $\\leftarrow$\n", "\n", "and extended relational algebra operators\n", "\n", "- generalized projection — $\\Pi$\n", "- aggregation – $\\mathcal{G}$ (or $\\gamma$)" ] }, { "cell_type": "markdown", "id": "08012f1f", "metadata": {}, "source": [ "_notes_\n", "\n", "The operators from intersection onwards are sometimes labeled \"additional\".\n", "\n", "The operators correspond to operations that we perform in SQL.\n", "\n", "They **operate on relations**, i.e., the arguments of the operators are relations (sets of tuples) and the result of each operation is a relation (a set of tuples).\n", "\n", "In its basic form, the algebra considers relations as mathematical sets (without duplicates)." ] }, { "cell_type": "code", "execution_count": null, "id": "bb64cdc4", "metadata": {}, "outputs": [], "source": [ "%sql postgresql://demo_user:demo_pwd@localhost:25432/agh_it_northwind\n", "%config SqlMagic.feedback = False" ] }, { "cell_type": "code", "execution_count": null, "id": "acfd5a4d", "metadata": {}, "outputs": [], "source": [ "agh_db_lectures.nw_diagram.download_open()" ] }, { "cell_type": "markdown", "id": "c2faffdc", "metadata": {}, "source": [ "_notes_\n", "\n", "We shall use the Northwind db as example.\n", "\n", "We are going to preset relational algebra epressions together with their SQL equivalents." ] }, { "cell_type": "markdown", "id": "86d4ddff", "metadata": {}, "source": [ "## Selection" ] }, { "cell_type": "markdown", "id": "79c78b7e", "metadata": {}, "source": [ "$\\sigma_{units\\_in\\_stock=0}(products)$" ] }, { "cell_type": "code", "execution_count": null, "id": "55e68362", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT * FROM products WHERE units_in_stock = 0" ] }, { "cell_type": "markdown", "id": "86f5bdda", "metadata": {}, "source": [ "Logical operators can be used in the condition of a selection.\n", "\n", "$\\sigma_{units\\_in\\_stock=0 \\land units\\_on\\_order > 0}(products)$" ] }, { "cell_type": "code", "execution_count": null, "id": "f0559ba2", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT * FROM products WHERE units_in_stock = 0 AND units_on_order > 0" ] }, { "cell_type": "markdown", "id": "c78224a2", "metadata": {}, "source": [ "## Projection" ] }, { "cell_type": "markdown", "id": "46286a9b", "metadata": {}, "source": [ "$\\Pi_{category\\_id, category\\_name}(categories)$" ] }, { "cell_type": "code", "execution_count": null, "id": "f0721688", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT category_id, category_name FROM categories" ] }, { "cell_type": "markdown", "id": "530f3e08", "metadata": {}, "source": [ "The operators can be combined (one can act on the result of another).\n", "\n", "$\\Pi_{product\\_name}(\\sigma_{units\\_in\\_stock=0}(products))$" ] }, { "cell_type": "code", "execution_count": null, "id": "06f3cc54", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT product_name FROM products WHERE units_in_stock = 0" ] }, { "cell_type": "markdown", "id": "a05a83a6", "metadata": {}, "source": [ "_notes_\n", "\n", "Note that to fully match the formal meaning of relational algebra operators, we'd need to remove duplicates (e.g., with `SELECT DISTINCT`) in all SQL queries. In our examples we only use it where actually necessary to avoid duplicates when working with our database instance." ] }, { "cell_type": "markdown", "id": "58cc2503", "metadata": {}, "source": [ "## Union" ] }, { "cell_type": "markdown", "id": "a45ac8bf", "metadata": {}, "source": [ "$\\sigma_{units\\_in\\_stock=0}(products) \\cup \\sigma_{discontinued=1}(products)$\n", "\n", "Input relations must have the same number of attributes and attributes must have compatible domains." ] }, { "cell_type": "code", "execution_count": null, "id": "fd358205", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT * FROM products WHERE units_in_stock = 0 OR discontinued = 1" ] }, { "cell_type": "markdown", "id": "c0a4eb06", "metadata": {}, "source": [ "We could've also written the SQL like this.\n", "\n", "```sql\n", "SELECT * FROM products WHERE units_in_stock = 0\n", "UNION\n", "SELECT * FROM products WHERE discontinued = 1\n", "```" ] }, { "cell_type": "markdown", "id": "68158f63", "metadata": {}, "source": [ "## Difference" ] }, { "cell_type": "markdown", "id": "8806fad4", "metadata": {}, "source": [ "$\\sigma_{units\\_in\\_stock=0}(products) - \\sigma_{discontinued=1}(products)$" ] }, { "cell_type": "code", "execution_count": null, "id": "61f37d8b", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT * FROM products WHERE units_in_stock = 0 AND NOT discontinued = 1" ] }, { "cell_type": "markdown", "id": "45003f18", "metadata": {}, "source": [ "We could've also written the SQL using the `EXCEPT` keyword." ] }, { "cell_type": "markdown", "id": "c2effdb9", "metadata": {}, "source": [ "## Cartesian Product" ] }, { "cell_type": "markdown", "id": "a2226249", "metadata": {}, "source": [ "$\\Pi_{first\\_name, last\\_name, category\\_name}(employees \\times categories)$" ] }, { "cell_type": "code", "execution_count": null, "id": "0be27021", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT first_name, last_name, category_name FROM employees, categories" ] }, { "cell_type": "markdown", "id": "7a66c12b", "metadata": {}, "source": [ "## Renaming" ] }, { "cell_type": "markdown", "id": "ffb122f5", "metadata": {}, "source": [ "$\\rho_{contacts(customer\\_name, customer\\_contact)}(\\Pi_{company\\_name, phone}(customers))$" ] }, { "cell_type": "code", "execution_count": null, "id": "5bd0b98d", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT company_name AS customer_name, phone AS customer_contact FROM customers" ] }, { "cell_type": "markdown", "id": "9233b24e", "metadata": {}, "source": [ "$\\Pi_{a.first\\_name, a.last\\_name, b.first\\_name, b.last\\_name}(\\rho_a(employees) \\times \\rho_b(employees))$" ] }, { "cell_type": "code", "execution_count": null, "id": "4de3c50a", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT a.first_name, a.last_name, b.first_name, b.last_name\n", "FROM employees a, employees b" ] }, { "cell_type": "markdown", "id": "78c0997c", "metadata": {}, "source": [ "## Constant Relations" ] }, { "cell_type": "markdown", "id": "d70864e0", "metadata": {}, "source": [ "$\\rho_{city(name, latitude, longitude)}(\\{({\\rm Cracov}, 50.06, 19.93), ({\\rm Corinth}, 37.94, 22.93), ({\\rm Boston}, 42.36, -71.06)\\})$" ] }, { "cell_type": "code", "execution_count": null, "id": "58d7160e", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT *\n", "FROM (VALUES ('Cracov', 50.06, 19.93),\n", " ('Corinth', 37.94, 22.93),\n", " ('Boston', 42.36, -71.06)) AS city (name, latitude, longitude)" ] }, { "cell_type": "markdown", "id": "05818736", "metadata": {}, "source": [ "## Intersection" ] }, { "cell_type": "markdown", "id": "078f87b1", "metadata": {}, "source": [ "Could also be expressed using a composition of set difference operations.\n", "\n", "$\\Pi_{supplier\\_id}(\\sigma_{category\\_id=1}(products)) \\cap \\Pi_{supplier\\_id}(\\sigma_{country={\\rm USA}}(suppliers))$" ] }, { "cell_type": "code", "execution_count": null, "id": "42017250", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT DISTINCT supplier_id FROM products WHERE category_id = 1\n", "INTERSECT\n", "SELECT supplier_id FROM suppliers WHERE country = 'USA'" ] }, { "cell_type": "markdown", "id": "1b238ba7", "metadata": {}, "source": [ "## Assignment" ] }, { "cell_type": "markdown", "id": "65094d7d", "metadata": {}, "source": [ "$cat1\\_suppliers \\leftarrow \\Pi_{supplier\\_id}(\\sigma_{category\\_id=1}(products))$\n", "\n", "$usa\\_suppliers \\leftarrow \\Pi_{supplier\\_id}(\\sigma_{country={\\rm USA}}(suppliers))$\n", "\n", "$cat1\\_suppliers \\cap usa\\_suppliers$" ] }, { "cell_type": "code", "execution_count": null, "id": "2f3c67dc", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "DROP TABLE IF EXISTS cat1_suppliers;\n", "CREATE TABLE cat1_suppliers AS\n", "SELECT DISTINCT supplier_id FROM products WHERE category_id = 1;\n", "\n", "DROP TABLE IF EXISTS usa_suppliers;\n", "CREATE TABLE usa_suppliers AS\n", "SELECT supplier_id FROM suppliers WHERE country = 'USA';\n", "\n", "SELECT * FROM cat1_suppliers INTERSECT SELECT * FROM usa_suppliers" ] }, { "cell_type": "markdown", "id": "bbe1ff8c", "metadata": {}, "source": [ "### Insertion, Modification, Deletion" ] }, { "cell_type": "markdown", "id": "ac7ff10f", "metadata": {}, "source": [ "```sql\n", "DELETE FROM products WHERE discontinued = 1 AND units_in_stock = 0\n", "```" ] }, { "cell_type": "markdown", "id": "0ad608cf", "metadata": {}, "source": [ "This can be expressed using the operators we already know." ] }, { "cell_type": "markdown", "id": "f6a8545e", "metadata": {}, "source": [ "_notes_\n", "\n", "Any ideas?" ] }, { "cell_type": "markdown", "id": "2263914e", "metadata": {}, "source": [ "$products \\leftarrow products - \\sigma_{discontinued=1 \\land units\\_in\\_stock=0}(products)$" ] }, { "cell_type": "markdown", "id": "8a20bd4b", "metadata": {}, "source": [ "## Natural Join" ] }, { "cell_type": "markdown", "id": "6a4818aa", "metadata": {}, "source": [ "Could also be expressed as a composition of cartesian product, selection and projection.\n", "\n", "$\\Pi_{category\\_id, category\\_name, product\\_name}(categories \\bowtie products)$" ] }, { "cell_type": "code", "execution_count": null, "id": "9c0ce2c5", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT category_id, category_name, product_name\n", "FROM categories NATURAL JOIN products" ] }, { "cell_type": "markdown", "id": "67d1f770", "metadata": {}, "source": [ "### Outer Joins" ] }, { "cell_type": "markdown", "id": "9c13c17c", "metadata": {}, "source": [ "$\\Pi_{company\\_name, order\\_id, order\\_date}(customers ⟕ orders)$" ] }, { "cell_type": "code", "execution_count": null, "id": "67e3e476", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT company_name, order_id, order_date\n", "FROM customers NATURAL LEFT JOIN orders" ] }, { "cell_type": "markdown", "id": "260ef853", "metadata": {}, "source": [ "_notes_\n", "\n", "There are also symbols $⟖$ and $⟗$ that are used for natural left outer join and natural full outer join, respectively. " ] }, { "cell_type": "markdown", "id": "7a81b9de", "metadata": {}, "source": [ "## Theta Join" ] }, { "cell_type": "markdown", "id": "c1e17ced", "metadata": {}, "source": [ "This is a join with arbitrary condition.\n", "\n", "$\\bowtie_{\\Theta}$" ] }, { "cell_type": "markdown", "id": "b166dbab", "metadata": {}, "source": [ "_notes_\n", "\n", "The name \"theta join\" name comes from the notation above, where $\\Theta$ (theta) in the subscript represets the condition." ] }, { "cell_type": "markdown", "id": "140b0b36", "metadata": {}, "source": [ "$\\Pi_{e.first\\_name, e.last\\_name, b.first\\_name, b.last\\_name}(\\\\\\rho_e(employees) \\bowtie_{e.reports\\_to = b.employee\\_id} \\rho_b(employees))$" ] }, { "cell_type": "code", "execution_count": null, "id": "aeea7146", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT e.first_name, e.last_name, b.first_name, b.last_name\n", "FROM employees e JOIN employees b ON e.reports_to = b.employee_id" ] }, { "cell_type": "markdown", "id": "b9e45942", "metadata": {}, "source": [ "_notes_\n", "\n", "Outer join variants can also be used as theta joins." ] }, { "cell_type": "markdown", "id": "7921f240", "metadata": {}, "source": [ "## Generalized Projection" ] }, { "cell_type": "markdown", "id": "afc54d91", "metadata": {}, "source": [ "$\\Pi_{order\\_id, product\\_id, unit\\_price \\cdot quantity}(order\\_details)$" ] }, { "cell_type": "code", "execution_count": null, "id": "eeec87fb", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT order_id, product_id, unit_price * quantity FROM order_details" ] }, { "cell_type": "markdown", "id": "b246fb9d", "metadata": {}, "source": [ "_notes_\n", "\n", "The only difference from normal projection is that operations can be performed on values. There is not dedicated symbol for generalized projection." ] }, { "cell_type": "markdown", "id": "bc804c3e", "metadata": {}, "source": [ "## Aggregation" ] }, { "cell_type": "markdown", "id": "d777b3f1", "metadata": {}, "source": [ "$_{id}\\mathcal{G}_{id, sum(value)}(\\rho_{v(id, value)}(\\Pi_{order\\_id, unit\\_price \\cdot quantity}(order\\_details)))$\n", "\n", "The meaning of lower-index text of the aggregation oprator is as annotated below.\n", "\n", "$_{\\underset{grouping \\atop attribtues}{id}}\\mathcal{G}_{\\underset{expressions\\ list \\atop (with\\ aggregate\\ functions)}{id, sum(value)}}$" ] }, { "cell_type": "code", "execution_count": null, "id": "e06b5021", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT order_id AS id, SUM(unit_price * quantity)\n", "FROM order_details\n", "GROUP BY order_id" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }