aboutsummaryrefslogtreecommitdiff
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a21adafc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import agh_db_lectures\n",
    "agh_db_lectures.prepare_notebook_for_sql()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "511c9ea5",
   "metadata": {},
   "source": [
    "# Simple SQL queries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6391087",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- SQL consists of\n",
    "  - a Data Definition Language,\n",
    "  - a Data Manipulation Language, and\n",
    "  - a Data Query Language, which we begin covering here.\n",
    "- E.g., \"How to obtain the desired information from the DB\"."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93389e26",
   "metadata": {},
   "source": [
    "## The Northwind database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40b6824b",
   "metadata": {},
   "outputs": [],
   "source": [
    "agh_db_lectures.nw_diagram.download_display()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73dc5561",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open the diagram in an image viewer for more convenience.\n",
    "agh_db_lectures.nw_diagram.download_open()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67d63ccb",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- For now, we shall assume the existence of a DB populated with data.\n",
    "- Northwind: a learning (_toy_) database of a warehouse.\n",
    "- Made by Microsoft but available under a free license (you can use, modify and share it further)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a146ef1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Connect to the (locally-hosted) database.\n",
    "%sql postgresql://demo_user:demo_pwd@localhost:25432/agh_it_northwind\n",
    "# %sql mysql:///agh_it_northwind?unix_socket=/var/run/mysql/mysql.sock"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "052b68af",
   "metadata": {},
   "outputs": [],
   "source": [
    "agh_db_lectures.download_restore_nw_postgres_dump()\n",
    "# agh_db_lectures.download_restore_nw_mariadb_dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "514cc38e",
   "metadata": {},
   "source": [
    "## The \"SELECT\" clause"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0655d75f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "-- One of the simplest possible queries in SQL.\n",
    "SELECT 1;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a1aafb8",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- The result of an SQL query is always a table.\n",
    "- It can be minimal or even empty, though.\n",
    "- Comments start with double minus sign (\"--\").\n",
    "- The amount of white-space doesn't matter…\n",
    "  - …except where at least a single white-space character is needed to separate words.\n",
    "- A query can be terminated with a semicolon (\";\").\n",
    "  - It is optional in many cases but necessary in some cases.\n",
    "  - Examples: a script with many SQL commands, an SQL REPL.\n",
    "- More:\n",
    "  - `SELECT 'This is a string.'`, and\n",
    "  - `SELECT 43.5`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dceb1ebc",
   "metadata": {},
   "source": [
    "### The \"FROM\" clause"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d376dcf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT -- FILL ME\n",
    "FROM products"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b56e951a",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Theodore, please pick 3 attributes from the `products` table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da4ddfd9",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM PRODUCTS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51c33a98",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- The asterisk (\"*\") allows us to select all columns of a table.\n",
    "- Case also does not matter in table names.\n",
    "- Again, SQL is free-form, we can apply various code formatting conventions, there is no \"one to rule them all\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e47b2338",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT product_name, unit_price * units_in_stock\n",
    "FROM PRODUCTS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87e20cce",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Arithmetic operations (and string ops, etc.) can be used with table column names.\n",
    "- The syntax `unit_price * units_in_stock AS value_in_stock` can be used to rename the column.\n",
    "- The `AS` can usually be omitted.\n",
    "- Theodore, what does double quote character («\"») do in your favorite programming language?\n",
    "- A column name that contains white-space or is the same as language keyword can be wrapped in doube quotes, `unit_price * units_in_stock AS \"SELECT\" -- <- nonsensical but valid`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73f81ca8",
   "metadata": {},
   "source": [
    "### The \"WHERE\"  clause"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2f14419",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT product_name, quantity_per_unit, unit_price\n",
    "FROM products\n",
    "WHERE unit_price < 10"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b518da46",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- `WHERE` limits result rows.\n",
    "- Comparison operators `<`, `>`, `<=`, `>=`, `=`, and `<>`.\n",
    "- Logical operators `AND`, and `OR`.\n",
    "- Column used in `WHERE` need not appear in "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca6c0d70",
   "metadata": {},
   "source": [
    "#### The \"LIKE\" operator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fccef75e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT product_name, quantity_per_unit, unit_price\n",
    "FROM products\n",
    "WHERE quantity_per_unit LIKE '%bottle%'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b85484c7",
   "metadata": {},
   "source": [
    "#### Regular expression test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "617ed062",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT product_name, quantity_per_unit, unit_price\n",
    "FROM products\n",
    "WHERE quantity_per_unit ~ 'box(es)?$'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ffa401a",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Using `$` we excluded the rows that can be viewed with `quantity_per_unit ~ 'box(es)? '`.\n",
    "- Additionally, strings can also be compared\n",
    "  - for equality with `<>` and `=`, and\n",
    "  - for lexicographic order with `<` & friends."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54ab8145",
   "metadata": {},
   "source": [
    "### Set operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aebe714f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT supplier_id\n",
    "FROM products\n",
    "WHERE quantity_per_unit LIKE '%can%'\n",
    "\n",
    "-- SELECT supplier_id\n",
    "-- FROM suppliers\n",
    "-- WHERE country = 'Germany'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd9957ed",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Suppliers that sell canned products.\n",
    "- Suppliers from Germany.\n",
    "- The operations\n",
    "  - `UNION`,\n",
    "  - `INTERSECT`,\n",
    "  - `EXCEPT`.\n",
    "- Columns with types need to match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08b57b1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT supplier_id from suppliers where country = 'Germany'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b338004",
   "metadata": {},
   "source": [
    "## The \"DISTINCT\" keyword"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9192b00c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "\n",
    "SELECT country\n",
    "FROM suppliers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1e59c68",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Change `SELECT` → `SELECT DISTINCT`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d656206",
   "metadata": {},
   "source": [
    "### Different value types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cd1ec6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT 'This is a string.'\n",
    "-- select 43.5\n",
    "-- SeLeCt CuRrEnT_dAtE\n",
    "-- SELECT NOW()\n",
    "-- SELECT TRUE"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d96722c",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- We can experiments with different datatypes like\n",
    "  - floating-point numbers,\n",
    "  - `SELECT DATE '2025-10-07'` — dates,\n",
    "  - `SELECT TIME '12:00'` — times,\n",
    "    - also with time zone (`SELECT TIME WITH TIME ZONE '12:00+02:00'`, this is a different type in standard SQL),\n",
    "  - `SELECT TIMESTAMP '2025-10-14 12:06:49.599321'` — timestamps,\n",
    "     - also with time zone, analogously to times,\n",
    "  - `SELECT INTERVAL '1-2 3 4:05:06'` – time intervals, and\n",
    "  - `FALSE` — booleans.\n",
    "- Even authors of DBMSes say the date/time types are not well designed.\n",
    "- Character case does not matter in SQL keywords."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aeccfa26",
   "metadata": {},
   "source": [
    "### Relevant operators and builtin functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d64e30c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT 'This is ' || 'a concatenated ' || 'string.'\n",
    "-- SELECT ROUND(4.5)\n",
    "-- SELECT NOW() - INTERVAL '7 DAYS'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9e61b9e",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- What does the pipe character (\"|\") do in your favorite programming language?\n",
    "- SQL features some _builtin functions_ for certain tasks.\n",
    "  - `SELECT SQRT(4*4 + 3*3)`\n",
    "  - `SELECT POW(POW(5, 2) - POW(4, 2), 0.5)`\n",
    "- Rounding:\n",
    "  - `ROUND()`,\n",
    "  - `FLOOR()`, and\n",
    "  - `CEIING()` (or just `CEIL()`.\n",
    "- Timestamp to date, etc.:\n",
    "  - `SELECT DATE (NOW() - INTERVAL '7 DAYS')`\n",
    "- `NOT`\n",
    "- `SELECT DATE('now') BETWEEN '2025-10-01' AND '2025-11-01'`\n",
    "  - Also `NOT BETWEEN`.\n",
    "  - Inclusive."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7402c992",
   "metadata": {},
   "source": [
    "### The \"IN\" operator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5659301b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT company_name, country\n",
    "FROM customers\n",
    "WHERE country = 'Poland' OR country = 'Germany' OR country = 'France'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df5a1027",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Change the condition to `country IN ('Poland', 'Germany', 'France')`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be5d5fde",
   "metadata": {},
   "source": [
    "### The \"CASE\" form"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19678724",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT order_id,\n",
    "       shipped_date,\n",
    "       shipped_date - order_date AS time_to_shipped\n",
    "--       CASE shipped_date - order_date\n",
    "--           WHEN 1 THEN '1 day'\n",
    "--           WHEN 7 THEN '1 week'\n",
    "--           ELSE shipped_date - order_date || ' days'\n",
    "--       END AS time_to_shipped\n",
    "FROM orders"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e73d1ac",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- There are 2 types of `CASE` expression that can be used for conditional logic.\n",
    "- First is simple `CASE` that matches a value against a list of other possible values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "928767ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT order_id,\n",
    "       shipped_date,\n",
    "       CASE\n",
    "           WHEN shipped_date - order_date = 1 THEN\n",
    "               '1 day'\n",
    "           WHEN shipped_date - order_date < 7 THEN\n",
    "               (shipped_date - order_date) || ' days'\n",
    "           WHEN shipped_date - order_date < 14 THEN\n",
    "               '1 week'\n",
    "           WHEN shipped_date - order_date < 28 THEN\n",
    "               (shipped_date - order_date) / 7 || ' weeks'\n",
    "           WHEN shipped_date - order_date < 56 THEN\n",
    "               '1 month'\n",
    "           ELSE\n",
    "               (shipped_date - order_date) / 28 || ' months'\n",
    "       END AS time_to_shipped\n",
    "FROM orders"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "328f0710",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "- Second is \"searched\" `CASE` that searches a list of expressions for a truish one.\n",
    "- It can be also thought of as a simple `CASE` that matches a value of `TRUE` against a list of other possible values.\n",
    "  - Try adding `TRUE` after `CASE` in this example :)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}