{ "cells": [ { "cell_type": "markdown", "id": "87ba5784", "metadata": {}, "source": [ "# Relational and other data models" ] }, { "cell_type": "markdown", "id": "a1a43a92", "metadata": {}, "source": [ "## Various data models\n", "\n", "That overlap anyway…" ] }, { "cell_type": "markdown", "id": "fa9d0851", "metadata": {}, "source": [ "### Relational data model\n", "\n", "- data as **sets** of **tuples**\n", "- typically represented as tables\n", "- schema defines types of columns of a table\n", "- tables connected with foreign keys\n", "- SQL\n", "\n", "_Table \"DB\\_grades\\_2025\\_winter\"_\n", "\n", "| student\\_id | class_type | grade |\n", "|:--------------|:-----------|:------|\n", "| 512345 | lab | 5 |\n", "| 512371 | lab | 5 |\n", "| 512600 | lab | 3 |\n", "| 513293 | lab | 5 |\n", "| 513898 | lab | 4 |\n", "| 513929 | lab | 3 |\n", "| 514168 | lab | 4.5 |\n", "| 512345 | proj | 3 |\n", "| 512371 | proj | 3.5 |\n", "| 512600 | proj | 5 |\n", "| 513293 | proj | 4.5 |\n", "| ... |\n", "\n", "_Table \"DB\\_students\\_2025\\_winter\"_\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "| student\\_id | name | surname |\n", "|:------------|:--------|:----------|\n", "| 512345 | Andrew\t| Dowd |\n", "| 512371 | Dominic\t| Ellison |\n", "| 512600 | Adam\t| Underwood |\n", "| 513293 | Ruth\t| Edmunds |\n", "| 513898 | Rachel\t| Johnston |\n", "| 513929 | Brian\t| Burgess |\n", "| 514168 | Ryan\t| Carr |\n", "| ... |\n" ] }, { "cell_type": "markdown", "id": "0a13c285", "metadata": {}, "source": [ "### From network to graph data model\n", "- network — one of the initially popular data models\n", "- 70s\n", "- \"reborn\" as graph model\n", "- https://dist.neo4j.com/wp-content/uploads/20180730163601/sophisticated-email-fraud-detection-graph-data-model.png\n", "- https://neo4j.com/docs/getting-started/_images/user-ratings.svg\n", "- https://neo4j.com/graphacademy/training-gdm-40/_images/ArrowTool.png\n", "- efficient and more suitable for certain kinds of tasks" ] }, { "cell_type": "markdown", "id": "b697c4de", "metadata": {}, "source": [ "### Hierarchical data model\n", "\n", "- a tree structure; parent with possibly many children\n", "- https://mariadb.com/docs/~gitbook/image?url=https%3A%2F%2F1321769154-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FWCInJQ9cmGjq1lsTG91E%252Fuploads%252Fgit-blob-4a8f2fe76e08e7bd4460a00cfd0eb93433bfafac%252Fhierarchical_model2.png%3Falt%3Dmedia&width=768&dpr=2&quality=100&sign=5b7aa4cb&sv=2\n", "- 60s\n", "- what about many-to-many relations? (e.g., an author might have written multiple books and a book can have more than one author)\n", "- DNS" ] }, { "cell_type": "markdown", "id": "6ee1f327", "metadata": {}, "source": [ "### Key-value data store\n", "\n", "- the NoSQL movement\n", "- https://i.sstatic.net/otGGc.jpg\n", "- https://redis.io/wp-content/uploads/2020/06/key-value-data-stores-2-v2.png\n", "- https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/images/key-value.png\n", "- https://webimages.mongodb.com/_com_assets/cms/m4d81jiycbuj7vms3-image2.png?auto=format%252Ccompress\n", "- also git" ] }, { "cell_type": "markdown", "id": "6aafcdb2", "metadata": {}, "source": [ "### Wide-column store\n", "\n", "- can be viewed as\n", " - a 2-dimensional key-value database, or\n", " - a tabular database with sparse data\n", " - column families\n", " - columns are \"cheap\"\n", "- used with data in size of PBs\n", "- distributed databases, model not agnostic of where data is stored" ] }, { "cell_type": "markdown", "id": "89faf275", "metadata": {}, "source": [ "### Document model\n", "\n", "- data stored as documents\n", "- typically XML or JSON\n", "- query languages suited for querying data from documents\n", "- https://docs.basex.org/main/Graphical_User_Interface\n", "- https://docs.couchdb.org/en/stable/api/database/find.html\n", "- used by, e.g., https://www.npmjs.com\n", "- also the LDAP protocol" ] }, { "cell_type": "markdown", "id": "4e33d461", "metadata": {}, "source": [ "### Object-oriented model\n", "\n", "- OOP concepts applied to databases\n", "- minor significance by itself\n", "- some support in various relational database systems\n", "- object-relational mapping more popular" ] }, { "cell_type": "markdown", "id": "dd7c830b", "metadata": {}, "source": [ "## Relational model\n", "\n", "- Edgar F. Codd, IBM\n", "- 1970\n", "\n", "### Terminology\n", "\n", "| relational model | SQL equivalent |\n", "|:----------------:|:--------------:|\n", "| relation | table |\n", "| tuple | row |\n", "| attribute | column |\n", "\n", "- atomic values\n", "- database schema\n", "- database instance" ] }, { "cell_type": "code", "execution_count": null, "id": "1e218a11", "metadata": {}, "outputs": [], "source": [ "![ -e ../Northwind-database-structure.jpg ] || wget -O ../Northwind-database-structure.jpg https://static.packt-cdn.com/products/9781782170907/graphics/0907EN_02_09.jpg\n", "\n", "#!sh -c 'eom ../Northwind-database-structure.jpg & disown'" ] }, { "cell_type": "markdown", "id": "e6bc9982", "metadata": {}, "source": [ "![](../Northwind-database-structure.jpg)" ] }, { "cell_type": "markdown", "id": "672d3974", "metadata": {}, "source": [ "## SQL — introduction\n", "\n", "- Donald D. Chamberlin and Raymond F. Boyce, IBM\n", "- 70s\n", "- renamed from \"SEQUEL\"\n", "- backronym of \"Structured Query Language\"\n", "- actually used for\n", " - querying data,\n", " - updating data, and\n", " - defining (schema of) data\n", "- relational databases **Ɛ>** SQL\n", "- standards\n", " - ANSI X3.135:1986\n", " - ISO/IEC 9075:1987\n", " - …\n", " - ISO/IEC 9075:2023" ] }, { "cell_type": "markdown", "id": "abce2e45", "metadata": {}, "source": [ "### selected popular SQL-based relational\\* database management systems (**RDBMS**):\n", "\n", "||||\n", "|:-|:-:|:-|\n", "|Oracle Database|1981|proprietary|\n", "|IBM Db2|1983|proprietary|\n", "|MS SQL Server|1989|proprietary|\n", "|PostgreSQL (aka Postgres)|1996|FLOSS|\n", "|MySQL/MariaDB|1995/2009|FLOSS|\n", "\n", "also the SQLite RDBMS database engine (2000, FLOSS)\n", "\n", "\\* among other supported models…" ] }, { "cell_type": "markdown", "id": "77e86678", "metadata": {}, "source": [ "### The \"Northwind\" database" ] }, { "cell_type": "code", "execution_count": null, "id": "b791bf77", "metadata": {}, "outputs": [], "source": [ "%load_ext sql\n", "\n", "![ -e ../northwind.db ] || wget -O ../northwind.db https://github.com/jpwhite3/northwind-SQLite3/raw/4f56e7f5906dfd23b25244c5bfe8fb5da6402efd/dist/northwind.db\n", "\n", "%sql sqlite:///../northwind.db" ] }, { "cell_type": "code", "execution_count": null, "id": "c4fabc82", "metadata": {}, "outputs": [], "source": [ "%config SqlMagic.style = '_DEPRECATED_DEFAULT'\n", "\n", "# https://stackoverflow.com/questions/41046955/formatting-sql-query-inside-an-ipython-jupyter-notebook#answer-54782158\n", "from IPython.display import Javascript\n", "display.display(Javascript('''\n", "require(['notebook/js/codecell'], function(codecell) {\n", " console.log(codecell);\n", " codecell.CodeCell.options_default.highlight_modes['magic_sqlite'] = {'reg':[/^%%sql/]} ;\n", " Jupyter.notebook.events.one('kernel_ready.Kernel', function(){\n", " Jupyter.notebook.get_cells().map(function(cell){\n", " if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;\n", " });\n", "});\n", "'''))" ] }, { "cell_type": "markdown", "id": "53bfb386", "metadata": {}, "source": [ "### Basic data types\n", "\n", "#### CHARACTER\n", "\n", "\"CHAR\" for friends :)\n", "\n", "- fixed width\n", "- 'wkosior\\@agh.edu.pl' → CHAR(22) → 'wkosior\\@agh.edu.pl    '\n", "- 'wkosior\\@student.agh.edu.pl' → CHAR(22) → 'wkosior\\@student.agh.ed'\n", "\n", "#### CHARACTER VARYING\n", "\n", "\"VARCHAR\" for friends ^^\n", "\n", "- 'wkosior\\@agh.edu.pl' → **VAR**CHAR(22) → 'wkosior\\@agh.edu.pl'\n", "- 'wkosior\\@student.agh.edu.pl' → VARCHAR(22) → 'wkosior\\@student.agh.ed'\n", "\n", "#### DECIMAL(p,q)\n", "\n", "- 21345.52652 → DECIMAL(10, 4) → 21345.5265\n", "- DECIMAL(10, 4) → up to 999,999.9999\n", "- money…" ] }, { "cell_type": "code", "execution_count": null, "id": "c482250f", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT CAST(345.353266 AS DECIMAL(12,4))" ] }, { "cell_type": "markdown", "id": "94c4f62a", "metadata": {}, "source": [ "#### Integer types\n", "\n", "- SMALLINT → -32,767 … 32,767\n", "- INT → -2,147,483,648 … 2,147,483,647\n", "- BIGINT → -263 … 263-1\n", " - **non**standard\n", " \n", "#### Date/time types\n", "\n", "- DATE\n", "- TIME\n", "- DATETIME" ] }, { "cell_type": "code", "execution_count": null, "id": "6484d045", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT DATETIME('now','localtime');" ] }, { "cell_type": "code", "execution_count": null, "id": "0e735648", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT DATE('2025-10-07', '+7 days');" ] }, { "cell_type": "markdown", "id": "bdb365a3", "metadata": {}, "source": [ "#### floating-point types\n", "\n", "- FLOAT\n", "\n", "#### BLOB\n", "\n", "- Should we store binary data in separate files or the DB?" ] }, { "cell_type": "code", "execution_count": null, "id": "bb3b9ecf", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT EmployeeID, LastName, FirstName FROM Employees;" ] }, { "cell_type": "code", "execution_count": null, "id": "2692598f", "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT * FROM Customers where Region = 'Western Europe'\n", "--SELECT COUNT(*) AS western_customers FROM Customers WHERE Region = 'Western Europe'" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }