{
"cells": [
{
"cell_type": "markdown",
"id": "87ba5784",
"metadata": {},
"source": [
"# Relational and other data models"
]
},
{
"cell_type": "markdown",
"id": "a1a43a92",
"metadata": {},
"source": [
"## Various data models\n",
"\n",
"That overlap anyway…"
]
},
{
"cell_type": "markdown",
"id": "fa9d0851",
"metadata": {},
"source": [
"### Relational data model\n",
"\n",
"- data as **sets** of **tuples**\n",
"- typically represented as tables\n",
"- schema defines types of columns of a table\n",
"- tables connected with foreign keys\n",
"- SQL\n",
"\n",
"_Table \"DB\\_grades\\_2025\\_winter\"_\n",
"\n",
"| student\\_id | class_type | grade |\n",
"|:--------------|:-----------|:------|\n",
"| 512345 | lab | 5 |\n",
"| 512371 | lab | 5 |\n",
"| 512600 | lab | 3 |\n",
"| 513293 | lab | 5 |\n",
"| 513898 | lab | 4 |\n",
"| 513929 | lab | 3 |\n",
"| 514168 | lab | 4.5 |\n",
"| 512345 | proj | 3 |\n",
"| 512371 | proj | 3.5 |\n",
"| 512600 | proj | 5 |\n",
"| 513293 | proj | 4.5 |\n",
"| ... |\n",
"\n",
"_Table \"DB\\_students\\_2025\\_winter\"_\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"| student\\_id | name | surname |\n",
"|:------------|:--------|:----------|\n",
"| 512345 | Andrew\t| Dowd |\n",
"| 512371 | Dominic\t| Ellison |\n",
"| 512600 | Adam\t| Underwood |\n",
"| 513293 | Ruth\t| Edmunds |\n",
"| 513898 | Rachel\t| Johnston |\n",
"| 513929 | Brian\t| Burgess |\n",
"| 514168 | Ryan\t| Carr |\n",
"| ... |\n"
]
},
{
"cell_type": "markdown",
"id": "0a13c285",
"metadata": {},
"source": [
"### From network to graph data model\n",
"- network — one of the initially popular data models\n",
"- 70s\n",
"- \"reborn\" as graph model\n",
"- https://dist.neo4j.com/wp-content/uploads/20180730163601/sophisticated-email-fraud-detection-graph-data-model.png\n",
"- https://neo4j.com/docs/getting-started/_images/user-ratings.svg\n",
"- https://neo4j.com/graphacademy/training-gdm-40/_images/ArrowTool.png\n",
"- efficient and more suitable for certain kinds of tasks"
]
},
{
"cell_type": "markdown",
"id": "b697c4de",
"metadata": {},
"source": [
"### Hierarchical data model\n",
"\n",
"- a tree structure; parent with possibly many children\n",
"- https://mariadb.com/docs/~gitbook/image?url=https%3A%2F%2F1321769154-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FWCInJQ9cmGjq1lsTG91E%252Fuploads%252Fgit-blob-4a8f2fe76e08e7bd4460a00cfd0eb93433bfafac%252Fhierarchical_model2.png%3Falt%3Dmedia&width=768&dpr=2&quality=100&sign=5b7aa4cb&sv=2\n",
"- 60s\n",
"- what about many-to-many relations? (e.g., an author might have written multiple books and a book can have more than one author)\n",
"- DNS"
]
},
{
"cell_type": "markdown",
"id": "6ee1f327",
"metadata": {},
"source": [
"### Key-value data store\n",
"\n",
"- the NoSQL movement\n",
"- https://i.sstatic.net/otGGc.jpg\n",
"- https://redis.io/wp-content/uploads/2020/06/key-value-data-stores-2-v2.png\n",
"- https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/images/key-value.png\n",
"- https://webimages.mongodb.com/_com_assets/cms/m4d81jiycbuj7vms3-image2.png?auto=format%252Ccompress\n",
"- also git"
]
},
{
"cell_type": "markdown",
"id": "6aafcdb2",
"metadata": {},
"source": [
"### Wide-column store\n",
"\n",
"- can be viewed as\n",
" - a 2-dimensional key-value database, or\n",
" - a tabular database with sparse data\n",
" - column families\n",
" - columns are \"cheap\"\n",
"- used with data in size of PBs\n",
"- distributed databases, model not agnostic of where data is stored"
]
},
{
"cell_type": "markdown",
"id": "89faf275",
"metadata": {},
"source": [
"### Document model\n",
"\n",
"- data stored as documents\n",
"- typically XML or JSON\n",
"- query languages suited for querying data from documents\n",
"- https://docs.basex.org/main/Graphical_User_Interface\n",
"- https://docs.couchdb.org/en/stable/api/database/find.html\n",
"- used by, e.g., https://www.npmjs.com\n",
"- also the LDAP protocol"
]
},
{
"cell_type": "markdown",
"id": "4e33d461",
"metadata": {},
"source": [
"### Object-oriented model\n",
"\n",
"- OOP concepts applied to databases\n",
"- minor significance by itself\n",
"- some support in various relational database systems\n",
"- object-relational mapping more popular"
]
},
{
"cell_type": "markdown",
"id": "dd7c830b",
"metadata": {},
"source": [
"## Relational model\n",
"\n",
"- Edgar F. Codd, IBM\n",
"- 1970\n",
"\n",
"### Terminology\n",
"\n",
"| relational model | SQL equivalent |\n",
"|:----------------:|:--------------:|\n",
"| relation | table |\n",
"| tuple | row |\n",
"| attribute | column |\n",
"\n",
"- atomic values\n",
"- database schema\n",
"- database instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e218a11",
"metadata": {},
"outputs": [],
"source": [
"![ -e ../Northwind-database-structure.jpg ] || wget -O ../Northwind-database-structure.jpg https://static.packt-cdn.com/products/9781782170907/graphics/0907EN_02_09.jpg\n",
"\n",
"#!sh -c 'eom ../Northwind-database-structure.jpg & disown'"
]
},
{
"cell_type": "markdown",
"id": "e6bc9982",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "672d3974",
"metadata": {},
"source": [
"## SQL — introduction\n",
"\n",
"- Donald D. Chamberlin and Raymond F. Boyce, IBM\n",
"- 70s\n",
"- renamed from \"SEQUEL\"\n",
"- backronym of \"Structured Query Language\"\n",
"- actually used for\n",
" - querying data,\n",
" - updating data, and\n",
" - defining (schema of) data\n",
"- relational databases **Ɛ>** SQL\n",
"- standards\n",
" - ANSI X3.135:1986\n",
" - ISO/IEC 9075:1987\n",
" - …\n",
" - ISO/IEC 9075:2023"
]
},
{
"cell_type": "markdown",
"id": "abce2e45",
"metadata": {},
"source": [
"### selected popular SQL-based relational\\* database management systems (**RDBMS**):\n",
"\n",
"||||\n",
"|:-|:-:|:-|\n",
"|Oracle Database|1981|proprietary|\n",
"|IBM Db2|1983|proprietary|\n",
"|MS SQL Server|1989|proprietary|\n",
"|PostgreSQL (aka Postgres)|1996|FLOSS|\n",
"|MySQL/MariaDB|1995/2009|FLOSS|\n",
"\n",
"also the SQLite RDBMS database engine (2000, FLOSS)\n",
"\n",
"\\* among other supported models…"
]
},
{
"cell_type": "markdown",
"id": "77e86678",
"metadata": {},
"source": [
"### The \"Northwind\" database"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b791bf77",
"metadata": {},
"outputs": [],
"source": [
"%load_ext sql\n",
"\n",
"![ -e ../northwind.db ] || wget -O ../northwind.db https://github.com/jpwhite3/northwind-SQLite3/raw/4f56e7f5906dfd23b25244c5bfe8fb5da6402efd/dist/northwind.db\n",
"\n",
"%sql sqlite:///../northwind.db"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4fabc82",
"metadata": {},
"outputs": [],
"source": [
"%config SqlMagic.style = '_DEPRECATED_DEFAULT'\n",
"\n",
"# https://stackoverflow.com/questions/41046955/formatting-sql-query-inside-an-ipython-jupyter-notebook#answer-54782158\n",
"from IPython.display import Javascript\n",
"display.display(Javascript('''\n",
"require(['notebook/js/codecell'], function(codecell) {\n",
" console.log(codecell);\n",
" codecell.CodeCell.options_default.highlight_modes['magic_sqlite'] = {'reg':[/^%%sql/]} ;\n",
" Jupyter.notebook.events.one('kernel_ready.Kernel', function(){\n",
" Jupyter.notebook.get_cells().map(function(cell){\n",
" if (cell.cell_type == 'code'){ cell.auto_highlight(); } }) ;\n",
" });\n",
"});\n",
"'''))"
]
},
{
"cell_type": "markdown",
"id": "53bfb386",
"metadata": {},
"source": [
"### Basic data types\n",
"\n",
"#### CHARACTER\n",
"\n",
"\"CHAR\" for friends :)\n",
"\n",
"- fixed width\n",
"- 'wkosior\\@agh.edu.pl' → CHAR(22) → 'wkosior\\@agh.edu.pl '\n",
"- 'wkosior\\@student.agh.edu.pl' → CHAR(22) → 'wkosior\\@student.agh.ed'\n",
"\n",
"#### CHARACTER VARYING\n",
"\n",
"\"VARCHAR\" for friends ^^\n",
"\n",
"- 'wkosior\\@agh.edu.pl' → **VAR**CHAR(22) → 'wkosior\\@agh.edu.pl'\n",
"- 'wkosior\\@student.agh.edu.pl' → VARCHAR(22) → 'wkosior\\@student.agh.ed'\n",
"\n",
"#### DECIMAL(p,q)\n",
"\n",
"- 21345.52652 → DECIMAL(10, 4) → 21345.5265\n",
"- DECIMAL(10, 4) → up to 999,999.9999\n",
"- money…"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c482250f",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT CAST(345.353266 AS DECIMAL(12,4))"
]
},
{
"cell_type": "markdown",
"id": "94c4f62a",
"metadata": {},
"source": [
"#### Integer types\n",
"\n",
"- SMALLINT → -32,767 … 32,767\n",
"- INT → -2,147,483,648 … 2,147,483,647\n",
"- BIGINT → -263 … 263-1\n",
" - **non**standard\n",
" \n",
"#### Date/time types\n",
"\n",
"- DATE\n",
"- TIME\n",
"- DATETIME"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6484d045",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT DATETIME('now','localtime');"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e735648",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT DATE('2025-10-07', '+7 days');"
]
},
{
"cell_type": "markdown",
"id": "bdb365a3",
"metadata": {},
"source": [
"#### floating-point types\n",
"\n",
"- FLOAT\n",
"\n",
"#### BLOB\n",
"\n",
"- Should we store binary data in separate files or the DB?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bb3b9ecf",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT EmployeeID, LastName, FirstName FROM Employees;"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2692598f",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM Customers where Region = 'Western Europe'\n",
"--SELECT COUNT(*) AS western_customers FROM Customers WHERE Region = 'Western Europe'"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}