aboutsummaryrefslogtreecommitdiff
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb0ec281",
   "metadata": {},
   "outputs": [],
   "source": [
    "import agh_db_lectures\n",
    "agh_db_lectures.prepare_notebook_for_sql()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87ba5784",
   "metadata": {},
   "source": [
    "# Relational and other data models"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1a43a92",
   "metadata": {},
   "source": [
    "## Various data models\n",
    "\n",
    "That overlap anyway…"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa9d0851",
   "metadata": {},
   "source": [
    "### Relational data model\n",
    "\n",
    "- data as **sets** of **tuples**\n",
    "- typically represented as tables\n",
    "- schema defines types of columns of a table\n",
    "- tables connected with foreign keys\n",
    "- SQL\n",
    "\n",
    "_Table `DB_grades_2025_winter`_\n",
    "\n",
    "| student\\_id   | class_type | grade |\n",
    "|:--------------|:-----------|:------|\n",
    "| 512345        | lab        | 5     |\n",
    "| 512371        | lab        | 5     |\n",
    "| 512600        | lab        | 3     |\n",
    "| 513293        | lab        | 5     |\n",
    "| 513898        | lab        | 4     |\n",
    "| 513929        | lab        | 3     |\n",
    "| 514168        | lab        | 4.5   |\n",
    "| 512345        | proj       | 3     |\n",
    "| 512371        | proj       | 3.5   |\n",
    "| 512600        | proj       | 5     |\n",
    "| 513293        | proj       | 4.5   |\n",
    "|                  ...               |\n",
    "\n",
    "_Table `DB_students_2025_winter`_\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "| student\\_id | name    | surname   |\n",
    "|:------------|:--------|:----------|\n",
    "| 512345      | Andrew\t| Dowd      |\n",
    "| 512371      | Dominic\t| Ellison   |\n",
    "| 512600      | Adam\t| Underwood |\n",
    "| 513293      | Ruth\t| Edmunds   |\n",
    "| 513898      | Rachel\t| Johnston  |\n",
    "| 513929      | Brian\t| Burgess   |\n",
    "| 514168      | Ryan\t| Carr      |\n",
    "| ...                               |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a13c285",
   "metadata": {},
   "source": [
    "### From network to graph data model\n",
    "- network — one of the initially popular data models\n",
    "- 70s\n",
    "- \"reborn\" as graph model\n",
    "- https://dist.neo4j.com/wp-content/uploads/20180730163601/sophisticated-email-fraud-detection-graph-data-model.png\n",
    "- https://neo4j.com/docs/getting-started/_images/user-ratings.svg\n",
    "- https://neo4j.com/graphacademy/training-gdm-40/_images/ArrowTool.png\n",
    "- efficient and more suitable for certain kinds of tasks"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b697c4de",
   "metadata": {},
   "source": [
    "### Hierarchical data model\n",
    "\n",
    "- a tree structure; parent with possibly many children\n",
    "- https://mariadb.com/docs/~gitbook/image?url=https%3A%2F%2F1321769154-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FWCInJQ9cmGjq1lsTG91E%252Fuploads%252Fgit-blob-4a8f2fe76e08e7bd4460a00cfd0eb93433bfafac%252Fhierarchical_model2.png%3Falt%3Dmedia&width=768&dpr=2&quality=100&sign=5b7aa4cb&sv=2\n",
    "- 60s\n",
    "- what about many-to-many relations? (e.g., an author might have written multiple books and a book can have more than one author)\n",
    "- DNS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ee1f327",
   "metadata": {},
   "source": [
    "### Key-value data store\n",
    "\n",
    "- the NoSQL movement\n",
    "- https://i.sstatic.net/otGGc.jpg\n",
    "- https://redis.io/wp-content/uploads/2020/06/key-value-data-stores-2-v2.png\n",
    "- https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/images/key-value.png\n",
    "- https://webimages.mongodb.com/_com_assets/cms/m4d81jiycbuj7vms3-image2.png?auto=format%252Ccompress\n",
    "- also git"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6aafcdb2",
   "metadata": {},
   "source": [
    "### Wide-column store\n",
    "\n",
    "- can be viewed as\n",
    "  - a 2-dimensional key-value database, or\n",
    "  - a tabular database with sparse data\n",
    "    - column families\n",
    "    - columns are \"cheap\"\n",
    "- used with data in size of PBs\n",
    "- distributed databases, model not agnostic of where data is stored"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89faf275",
   "metadata": {},
   "source": [
    "### Document model\n",
    "\n",
    "- data stored as documents\n",
    "- typically XML or JSON\n",
    "- query languages suited for querying data from documents\n",
    "- https://docs.basex.org/main/Graphical_User_Interface\n",
    "- https://docs.couchdb.org/en/stable/api/database/find.html\n",
    "- used by, e.g., https://www.npmjs.com\n",
    "- also the LDAP protocol"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e33d461",
   "metadata": {},
   "source": [
    "### Object-oriented model\n",
    "\n",
    "- OOP concepts applied to databases\n",
    "- minor significance by itself\n",
    "- some support in various relational database systems\n",
    "- object-relational mapping more popular"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd7c830b",
   "metadata": {},
   "source": [
    "## Relational model\n",
    "\n",
    "- Edgar F. Codd, IBM\n",
    "- 1970\n",
    "\n",
    "### Terminology\n",
    "\n",
    "| relational model | SQL equivalent |\n",
    "|:----------------:|:--------------:|\n",
    "| relation         | table          |\n",
    "| tuple            | row            |\n",
    "| attribute        | column         |\n",
    "\n",
    "- atomic values\n",
    "- database schema\n",
    "- database instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e218a11",
   "metadata": {},
   "outputs": [],
   "source": [
    "agh_db_lectures.nw_diagram.download_display()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "672d3974",
   "metadata": {},
   "source": [
    "## SQL — introduction\n",
    "\n",
    "- Donald D. Chamberlin and Raymond F. Boyce, IBM\n",
    "- 70s\n",
    "- renamed from \"SEQUEL\"\n",
    "- backronym of \"Structured Query Language\"\n",
    "- actually used for\n",
    "  - querying data,\n",
    "  - updating data, and\n",
    "  - defining (schema of) data\n",
    "- relational databases **Ɛ>** SQL\n",
    "- standards\n",
    "  - ANSI X3.135:1986\n",
    "  - ISO/IEC 9075:1987\n",
    "  - …\n",
    "  - ISO/IEC 9075:2023"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abce2e45",
   "metadata": {},
   "source": [
    "### selected popular SQL-based relational\\* database management systems (**RDBMS**):\n",
    "\n",
    "||||\n",
    "|:-|:-:|:-|\n",
    "|Oracle Database|1981|proprietary|\n",
    "|IBM Db2|1983|proprietary|\n",
    "|MS SQL Server|1989|proprietary|\n",
    "|PostgreSQL (aka Postgres)|1996|FLOSS|\n",
    "|MySQL/MariaDB|1995/2009|FLOSS|\n",
    "\n",
    "also the SQLite <strike>RDBMS</strike> database engine (2000, FLOSS)\n",
    "\n",
    "<sup>\\* among other supported models…</sup>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53bfb386",
   "metadata": {},
   "source": [
    "### Basic data types\n",
    "\n",
    "#### CHARACTER\n",
    "\n",
    "\"CHAR\" for friends :)\n",
    "\n",
    "- fixed width\n",
    "- 'wkosior\\@agh.edu.pl' → CHAR(22) → 'wkosior\\@agh.edu.pl&nbsp;&nbsp;&nbsp;&nbsp;'\n",
    "- 'wkosior\\@student.agh.edu.pl' → CHAR(22) → 'wkosior\\@student.agh.ed'\n",
    "\n",
    "#### CHARACTER VARYING\n",
    "\n",
    "\"VARCHAR\" for friends ^^\n",
    "\n",
    "- 'wkosior\\@agh.edu.pl' → **VAR**CHAR(22) → 'wkosior\\@agh.edu.pl'\n",
    "- 'wkosior\\@student.agh.edu.pl' → VARCHAR(22) → 'wkosior\\@student.agh.ed'\n",
    "\n",
    "#### DECIMAL(p,q)\n",
    "\n",
    "- 21345.52652 → DECIMAL(10, 4) → 21345.5265\n",
    "- DECIMAL(10, 4) → up to 999,999.9999\n",
    "- money…"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94c4f62a",
   "metadata": {},
   "source": [
    "#### Integer types\n",
    "\n",
    "- SMALLINT → -32,767 … 32,767\n",
    "- INT → -2,147,483,648 … 2,147,483,647\n",
    "- BIGINT → -2<sup>63</sup> … 2<sup>63</sup>-1\n",
    "  - **non**standard\n",
    "  \n",
    "#### Date/time types\n",
    "\n",
    "- DATE\n",
    "- TIME\n",
    "- DATETIME"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdb365a3",
   "metadata": {},
   "source": [
    "#### floating-point types\n",
    "\n",
    "- FLOAT\n",
    "\n",
    "#### BLOB\n",
    "\n",
    "- Should we store binary data in separate files or the DB?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}