{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d578f915",
   "metadata": {},
   "outputs": [],
   "source": [
    "import agh_db_lectures\n",
    "agh_db_lectures.prepare_notebook_for_sql()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d88e77f8",
   "metadata": {},
   "source": [
    "# Data definition in SQL"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ff3e265",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Data is stored in rows.  Rows are organized into tables.  Tables into **schemas**.  And schemas into **catalogs** (often called databases)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46c00d78",
   "metadata": {},
   "source": [
    "## Referring to tables in different schemas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbc145bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql postgresql://demo_user:demo_pwd@localhost:25432/agh_it_northwind"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e28ddb06",
   "metadata": {},
   "outputs": [],
   "source": [
    "agh_db_lectures.download_restore_nw_postgres_dump()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e5b6a0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT category_id, category_name FROM categories"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "818082c0",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Our Northwind database tables are in the catalog `agh_it_northwind` under a schema with default name `public`.\n",
    "\n",
    "We can refer to tables under our current schema using just their names.  But we can also explicitly name the schema and even the catalog with `FROM public.categories` or `FROM agh_it_northwind.public.categories`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61b8469f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM pg_catalog.pg_tables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00ce511a",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Under most relational DMBSes using SQL there also exist special table that store information about the database itself.  For example, under Postgres the table `pg_catalog.pg_tables` has the metadata of all tables in the catalog.\n",
    "\n",
    "Although syntax permits this, note that Postgres does not support accessing tables from other catalogs than the one to which the client is connected.  Try `SELECT * FROM postgres.pg_catalog.pg_tables`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ce18966",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SET search_path = pg_catalog"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3886eb5",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We can instruct Postgres to look for tables in certain schemas and not in the others."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2a0476c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT category_id, category_name FROM categories"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "493f8ff7",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "If table is not in a schema from the `search_path`, then it has to be referenced with its schema name.  Add prefix `public.` in this query to make it work.\n",
    "\n",
    "Interestingly, it seems impossible to prevent Postgres from searching the `pg_catalog` schema…"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e2ec39c",
   "metadata": {},
   "source": [
    "## Catalog creation and removal under Postgres"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9701775d",
   "metadata": {},
   "outputs": [],
   "source": [
    "!psql --port 25432 --command='CREATE DATABASE ddl_examples' postgres"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8ed592a",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Note that operations (syntax of commands and their detailed semantic) on catalogs are DBMS-specific.\n",
    "\n",
    "Due to limitations of ipython-sql, we shall use a command line program to run a command that creates a new catalog.\n",
    "\n",
    "An attempt to re-create an existing catalog unsurprisingly results in an error.\n",
    "\n",
    "Once we create our catalog, we can remove it with `DROP DATABASE ddl_examples` command.\n",
    "\n",
    "Now, let us re-create the database with `WITH OWNER demo_user` appended to the command.  This will allow our demo account to modify the schema (i.e., create, alter and drop tables).\n",
    "\n",
    "Note that the command line program was connecting to Postgres over socket as a privileged user who is allowed to operate on catalogs.  Users and privileges are going to be covered in more detail in a later topic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "058f229c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql postgresql://demo_user:demo_pwd@localhost:25432/ddl_examples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f81c04c",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We can connect to the newly-created catalog."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "398f045e",
   "metadata": {},
   "source": [
    "## Creation and removal of schemas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7274f41a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "CREATE SCHEMA some_schema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b510763a",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Before we can use a schema, we need to explicitly create it.\n",
    "\n",
    "`CREATE SCHEMA IF NOT EXISTS` can be used to avoid an error if the schema is already there.\n",
    "\n",
    "Note that Postgres does not allow analogous `CREATE DATABASE IF NOT EXISTS` :(\n",
    "\n",
    "Analogously to schema creation, we can use the following two statements to remove the schema from the catalog.\n",
    "\n",
    "```sql\n",
    "DROP SCHEMA some_schema;\n",
    "```\n",
    "\n",
    "```sql\n",
    "DROP SCHEMA IF EXISTS some_schema;\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c4e2cea",
   "metadata": {},
   "source": [
    "## Creation and removal of schemas and tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6626477b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "CREATE TABLE ids (id INT)\n",
    "\n",
    "-- INSERT INTO ids(id) VALUES(1)\n",
    "-- SELECT * FROM ids;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b02dfaa9",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "This supler-simple statement creates table `ids` under the default schema (`public`).  The table has a single column named `id` of type `INTEGER`.\n",
    "\n",
    "Let's try using the new table.\n",
    "\n",
    "Note that you can execute the same `INSERT` command successfully multiple times.  The uniqueness of the `id` attribute is not mandated.\n",
    "\n",
    "This table does not represent a relation :(\n",
    "\n",
    "Most tables we work with are going to have a **primary key** — a subset of columns whose values, together, uniquely identify each row.  In our case, no primary key was specified.\n",
    "\n",
    "We can drop this table with the following statement.\n",
    "\n",
    "```sql\n",
    "DROP TABLE ids;\n",
    "```\n",
    "\n",
    "Note that `DROP TABLE IF EXISTS` exists and works as you'd expect.\n",
    "\n",
    "Note that we can explicitly name the schema of the table in `CREATE` and `DROP` statements.\n",
    "\n",
    "Let's repeat the `CREATE TABLE` command with the following code inside parentheses.\n",
    "\n",
    "```sql\n",
    "    id_part_1 SMALLINT,\n",
    "    id_part_2 SMALLINT,\n",
    "    owner     VARCHAR(50),\n",
    "    PRIMARY KEY (id_part_1, id_part_2)\n",
    "```\n",
    "\n",
    "The `CREATE TABLE` syntax allows us to add a list of constraints after a list of column definitions.  We are using a 2-column primary key in our example to highlight that there is **no** requirement for keys to be single-column.\n",
    "\n",
    "Note that `SMALLINT` is a **signed** integer type that is typically at least 16 bits wide.\n",
    "\n",
    "Identifier consisting of 2 numbers is used in some cases, for example with USB devices.  We can now try inserting data.\n",
    "\n",
    "```sql\n",
    "INSERT INTO ids(id_part_1, id_part_2, owner)\n",
    "VALUES(1, 1, 'Theodore''s Corporation')\n",
    "```\n",
    "\n",
    "```sql\n",
    "VALUES(2, 1, 'Datapol Sp. z o.o.')\n",
    "```\n",
    "\n",
    "```sql\n",
    "VALUES(2, 1, 'Basepol Sp. z o.o.')\n",
    "```\n",
    "\n",
    "We see that a single column that is part of the primary key **can** have repeated values.  But the combination of all primary key column values (the tuple of `(id_part_1, id_part_2)`) cannot have repeated values.\n",
    "\n",
    "Not that the `PRIMARY KEY` constraint implies that the key columns do not accept `NULL`.\n",
    "\n",
    "```sql\n",
    "VALUES(2, NULL, 'NULLpol Sp. z o.o.')\n",
    "```\n",
    "\n",
    "The following works nonetheless, as `owner` is not part of the primary key.\n",
    "\n",
    "```sql\n",
    "VALUES(2, 2, NULL)\n",
    "```\n",
    "\n",
    "Note that the `PRIMARY KEY` constraint not only restricts what we can store in the table.  It also makes the DBMS create a data structure called **index** that allows efficient querying of the table by key value.  There can only be one such constraint for the entire table.\n",
    "\n",
    "It is often desirable to disallow `NULL` values in certain non-key columns as well (we are going to talk more about this and other good database design practices in the future).  The `NOT NULL` constraint can be used for this.  This constraint is specified together with the column definition.\n",
    "\n",
    "```sql\n",
    "    owner     VARCHAR(50) NOT NULL,\n",
    "```\n",
    "\n",
    "The last `INSERT` commands that we tried and the following one now both fail.\n",
    "\n",
    "```sql\n",
    "INSERT INTO ids(id_part_1, id_part_2)\n",
    "VALUES(2, 2)\n",
    "```\n",
    "\n",
    "We can allow the above command to succeed if we define a default value for the column.\n",
    "\n",
    "```sql\n",
    "    owner     VARCHAR(50) NOT NULL DEFAULT 'Ids Consortium Inc.',\n",
    "```\n",
    "\n",
    "We can mandate that certain non-key group of columns is unique.  Let's say that the Consortium is a company that sells its ids to other companies (however funny it might be, there is a real world precedence of it — the USB Implementers Forum Inc. 😉).  Assume there is a requirement to have a separate invoice for each id sold.  We can add the following columns and constraint.\n",
    "\n",
    "```sql\n",
    "    invoice_month  VARCHAR(7),\n",
    "    invoice_number BIGINT,\n",
    "```\n",
    "\n",
    "```sql\n",
    "    UNIQUE (invoice_month, invoice_number)\n",
    "```\n",
    "\n",
    "Now, the following sequence of insertions fails.\n",
    "\n",
    "```sql\n",
    "INSERT INTO ids(id_part_1, id_part_2, owner, invoice_month, invoice_number)\n",
    "VALUES(1, 1, 'Theodore''s Corporation', '2025-10', 1)\n",
    "```\n",
    "\n",
    "```sql\n",
    "VALUES(1, 2, 'Theodore''s Corporation', '2025-10', 1)\n",
    "```\n",
    "\n",
    "Note that the following shall still work as `NULL` values are never considered equal to anything.\n",
    "\n",
    "```sql\n",
    "VALUES(1, 2, 'Theodore''s Corporation', '2025-10', NULL)\n",
    "```\n",
    "\n",
    "```sql\n",
    "VALUES(1, 3, 'Theodore''s Corporation', '2025-10', NULL)\n",
    "```\n",
    "\n",
    "We can also give arbitrary conditions for values of a single row with the `CHECK` constraint.\n",
    "\n",
    "```sql\n",
    "    CHECK (id_part_2 < 256),\n",
    "    CHECK (id_part_1 < 256)\n",
    "```\n",
    "\n",
    "Note that although the SQL standard allows using subqueries inside `CHECK` expressions, most DBMSes don't support it.  That means, e.g., `CHECK (id_part_1 < (SELECT 256))` shall fail.\n",
    "\n",
    "Note that some constraints (`UNIQUE`, `CHECK`, etc.) can be used multiple times with a single table.\n",
    "\n",
    "This now fails.\n",
    "\n",
    "```sql\n",
    "VALUES(321, 3, 'Datapol Sp. z o.o.', '2025-10', NULL)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1aa083d0",
   "metadata": {},
   "source": [
    "### Foreign keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb98fc9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS invoices;\n",
    "CREATE TABLE invoices (\n",
    "    invoice_month  VARCHAR(7),\n",
    "    invoice_number BIGINT,\n",
    "    document       BYTEA,\n",
    "    PRIMARY KEY (invoice_month, invoice_number),\n",
    "    CHECK (invoice_month ~ '[0-9][0-9][0-9][0-9]-[0-9][0-9]')\n",
    ");\n",
    "INSERT INTO invoices(invoice_month, invoice_number, document)\n",
    "VALUES('2025-10', 1, BYTEA '%PDF-1.6\\015%BinaryContentsOfAPdf...');\n",
    "INSERT INTO invoices(invoice_month, invoice_number, document)\n",
    "VALUES('2025-10', 2, BYTEA '%PDF-1.6\\015%ContentsAnotherPdf...')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fd9c0b4",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "SQL tables sometimes share a common attribute (or set of attributes) that we typically make use of in `JOIN` operations.  The `invoices` relation has an `(invoice_month, invoice_number)` attribute pair that is also present in `ids`.\n",
    "\n",
    "Note that we could possibly store contents of files (like PDF documents) in an SQL database.  **There are mixed opinions on this practice**, but it is good to be aware that it is possible.\n",
    "\n",
    "Postgres uses a `BYTEA` type for this rather than the similar `BLOB` type from the SQL standard.  In this example we populate the `BYTEA` column with some dummy byte sequences.  Detailed description of the format of binary literals is available in the official DBMS documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b36aed8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT invoice_month,\n",
    "       invoice_number,\n",
    "       ENCODE(document, 'hex') as document_in_hex\n",
    "FROM invoices"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "785a863b",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Jupyter with ipython-sql is incapable of displaying binary data so we fetch it in hex for our preview."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a63ec589",
   "metadata": {},
   "source": [
    "At this point we can let our DBMS know about the connection between the `ids` and `invoices` tables, by using a `FOREIGN KEY` constraint.  It takes the following form.\n",
    "\n",
    "```sql\n",
    "FOREIGN KEY (t1col_A, t1col_B) REFERENCES tab2 (t2col_X, t2col_Y)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c97fb003",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We are once again using a 2-column key in our example to highlight that there is **no** requirement for keys to be single-column.  Rather intutively, a foreign key (just as a primay key) could also comprise 3 or more columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12c73e53",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS ids;\n",
    "CREATE TABLE ids (\n",
    "    id_part_1      SMALLINT,\n",
    "    id_part_2      SMALLINT,\n",
    "    invoice_month  VARCHAR(7),\n",
    "    invoice_number BIGINT,\n",
    "    owner          VARCHAR(50) NOT NULL DEFAULT 'Ids Consortium Inc.',\n",
    "    PRIMARY KEY(id_part_1, id_part_2),\n",
    "    UNIQUE (invoice_month, invoice_number),\n",
    "    CHECK (id_part_2 < 256),\n",
    "    CHECK (id_part_1 < 256),\n",
    "    FOREIGN KEY             (invoice_month, invoice_number)\n",
    "        REFERENCES invoices (invoice_month, invoice_number)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92421f4e",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We declare that the columns `invoice_month`, and `invoice_number` in table `ids` correspond to the same-named columns in table `invoices`.  \n",
    "\n",
    "The practical consequence of this is that each row in `ids` is now required to have a matching row in `invoices`.  Try to violate this requirement with code below.\n",
    "\n",
    "```sql\n",
    "INSERT INTO ids(id_part_1, id_part_2, owner, invoice_month, invoice_number)\n",
    "VALUES(1, 1, 'Theodore''s Corporation', '2025-10', 987)\n",
    "```\n",
    "\n",
    "On the other hand, if we use an `(invoice_month, invoice_number)` pair that has a match in `invoices` (e.g., `('2025-10', 1)`), the `INSERT` command shall succeed.\n",
    "\n",
    "Note that the DMBS prevents not only constraint-violating insertions, but also updates and deletions.  Try deleting the row in `invoices` that is referenced by the newly added row in `ids`.\n",
    "\n",
    "```sql\n",
    "DELETE FROM invoices\n",
    "```\n",
    "\n",
    "Note that the default behavior of disallowing row deletion can be overriden.  The following will cause the \"orphaned\" rows in `ids` to be automatically deleted.\n",
    "\n",
    "```sql\n",
    "FOREIGN KEY             (invoice_month, invoice_number)\n",
    "        REFERENCES invoices (invoice_month, invoice_number)\n",
    "        ON DELETE CASCADE\n",
    "```\n",
    "\n",
    "We can also specify `ON DELETE SET DEFAULT` to have the foreign key columns of the orphaned rows filled with, well, their default values.  There also exist\n",
    "\n",
    "- `ON DELETE NO ACTION`, the behavior we had initially,\n",
    "- `ON DELETE RESTRICT` that behaves mostly like `NO ACTION`, but — under DBMSes that support it — has slightly different semantic with respect to transactions, which will be coverent in a later topic, and\n",
    "- `ON DELETE SET NULL` that fills the columns with `NULL`s.\n",
    "\n",
    "Yes, it is possible for foreign key columns to store `NULL` values (as long as these columns are not declared `NOT NULL`).  There are also ways to specify whether or not some of the foreign key columns of a row can be `NULL`.\n",
    "\n",
    "Analogously, for updates to the referenced table rows, we can specify `ON UPDATE SET DEFAULT`, etc.\n",
    "\n",
    "Since the `PRIMARY KEY` constraint creates a dependence of one table on another, a DBMS shall, by default, prevent dropping of a table that is referenced by another one.  This can be overriden with the `CASCADE` keyword.  Therefore, `DROP TABLE invoices` shall fail, and `DROP TABLE invoices CASCADE` shall succeed, causing the `ids` table to be dropped as well.\n",
    "\n",
    "The `FOREIGN KEY` clause requires the referenced columns hold unique tuples.  As long as we have either\n",
    "\n",
    "- `PRIMARY KEY (invoice_month, invoice_number)`, or\n",
    "- `UNIQUE (invoice_month, invoice_number)`\n",
    "\n",
    "constraint in the `invoices` table, a `FOREIGN KEY` can be declared.  Otherwise, the attempt to declare it shall fail."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9601efd9",
   "metadata": {},
   "source": [
    "### Shorthands for constraints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da98cb11",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS users CASCADE;\n",
    "CREATE TABLE users (\n",
    "    login         VARCHAR(50),\n",
    "    password_hash VARCHAR(50) NOT NULL,\n",
    "    PRIMARY KEY (login)\n",
    ");\n",
    "\n",
    "DROP TABLE IF EXISTS posts;\n",
    "CREATE TABLE posts (\n",
    "    id       INT,\n",
    "    title    VARCHAR(200),\n",
    "    author   VARCHAR(50),\n",
    "    contents TEXT,\n",
    "    PRIMARY KEY (id),\n",
    "    UNIQUE (title),\n",
    "    FOREIGN KEY (author) REFERENCES users (login)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4c32344",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "If the list of referenced columns of the other table is identical to that table's primary key, Postgres (but not every DBMS) allows that list to be omitted in the `FOREIGN KEY` clause.  Remove `(login)` from `REFERENCES users (login)`.\n",
    "\n",
    "A single-column `PRIMARY KEY` constraint can be declared in one line with column definition, as below.\n",
    "\n",
    "```sql\n",
    "    id       INT PRIMARY KEY,\n",
    "```\n",
    "\n",
    "Likewise in case of a single-column `UNIQUE` constraint.\n",
    "\n",
    "```sql\n",
    "    title    VARCHAR(200) UNIQUE,\n",
    "```\n",
    "\n",
    "And, finally, a shorthand for a single-column `FOREIGN KEY` constraint.\n",
    "\n",
    "```sql\n",
    "    author   VARCHAR(50) REFERENCES users (login),\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "414a7cd9",
   "metadata": {},
   "source": [
    "## Altering existing tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "741aa2cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "INSERT INTO users(login, password_hash)\n",
    "VALUES ('theodore', 'ea56b986135de17142f91e5523ce9d19');\n",
    "INSERT INTO posts(id, title, author, contents)\n",
    "VALUES (1,\n",
    "        'Towards Full Test Coverage of Database Migration Code',\n",
    "        'theodore',\n",
    "        'Programs commonly alter the database schemas created by their…')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0800bcc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "ALTER TABLE posts ADD COLUMN when_published DATE;\n",
    "\n",
    "SELECT * FROM posts;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d85a68aa",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We can dynamically add columns to tables with SQL.  If we already have data in the table, the DBMS is going to fill the added column with `NULL` values.\n",
    "\n",
    "We can also remove an existing column.\n",
    "\n",
    "```sql\n",
    "ALTER TABLE posts DROP COLUMN when_published\n",
    "```\n",
    "\n",
    "A default value or the `NOT NULL` constraint can also be specified for a column being added.\n",
    "\n",
    "```sql\n",
    "ALTER TABLE posts ADD COLUMN when_published TIMESTAMP\n",
    "    DEFAULT '1970-01-01T00:00'\n",
    "    NOT NULL;\n",
    "```\n",
    "\n",
    "A default value can also be removed and set for a column that already exists.\n",
    "\n",
    "```sql\n",
    "ALTER TABLE posts ALTER COLUMN when_published DROP DEFAULT\n",
    "```\n",
    "\n",
    "```sql\n",
    "ALTER TABLE posts ALTER COLUMN when_published SET DEFAULT NOW()\n",
    "```\n",
    "\n",
    "Analogously, `SET`/`DROP NOT NULL` can also be used with `ALTER COLUMN`.\n",
    "\n",
    "Note that (under Postgres, at least) the `NOW()` function would be executed again each time a row is added with the default value.  If we add multiple posts, their publication times shall differ.\n",
    "\n",
    "```sql\n",
    "INSERT INTO posts(id, title, author, contents)\n",
    "VALUES (2,\n",
    "        'Fasibility of correlating untagged VCS sources with releases',\n",
    "        'theodore',\n",
    "        'The story of XZ backdoor shows that building software from…');\n",
    "INSERT INTO posts(id, title, author, contents)\n",
    "```\n",
    "\n",
    "```sql\n",
    "VALUES (3,\n",
    "        'Making vector screenshots of websites',\n",
    "        'theodore',\n",
    "        'As many know, a good document is one where every image…');\n",
    "\n",
    "SELECT * FROM posts\n",
    "```\n",
    "\n",
    "There also exist\n",
    "\n",
    "- `ADD COLUMN IF NOT EXISTS`, and\n",
    "- `DROP COLUMN IF EXISTS`\n",
    "\n",
    "variants of the commands."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d1bc77b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "ALTER TABLE posts RENAME COLUMN when_published TO when_posted"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7feda21f",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "A column or table can be renamed.\n",
    "\n",
    "```sql\n",
    "ALTER TABLE posts RENAME TO articles;\n",
    "\n",
    "SELECT * FROM articles\n",
    "```\n",
    "\n",
    "```sql\n",
    "ALTER TABLE articles RENAME TO posts\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23ba8b9e",
   "metadata": {},
   "source": [
    "### Altering constraints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87840b0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT * FROM pg_constraint WHERE conname LIKE 'posts%'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9d155f5",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We can see that Postgres assigned names to our constraints and stores these constraints' data in one of its tables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "543a60eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "INSERT INTO posts(id, title, author, contents)\n",
    "VALUES (4,\n",
    "        'Towards Full Test Coverage of Database Migration Code',\n",
    "        'theodore',\n",
    "        'Programs commonly alter the database schemas created by their…')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42d7147b",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Fails due to `UNIQUE` constraint on `title`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a24f1af4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "ALTER TABLE posts DROP CONSTRAINT posts_title_key"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cec71ba",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "The name of a constraint can be used to `DROP` it.  We can see that repeating titles can now exist in the table (the `INSERT` above now succeeds).  Also, the respective row in the `pg_constraints` table is now gone.\n",
    "\n",
    "```sql\n",
    "SELECT conname, contype\n",
    "FROM pg_constraint\n",
    "WHERE conname LIKE '%post%'\n",
    "```\n",
    "\n",
    "We can also amend a table with new constraints.  The sample code below tries to re-add the constraint that has just been removed.\n",
    "\n",
    "```sql\n",
    "ALTER TABLE posts ADD UNIQUE(title)\n",
    "```\n",
    "\n",
    "Creation of a new constraint on an existing table shall fail if the data already present in the table does not meet that constraint's requirements.  This is what happened here.\n",
    "\n",
    "```sql\n",
    "SELECT * FROM posts\n",
    "```\n",
    "\n",
    "We see that we could make titles unique again by removing the post with `id` of 1.\n",
    "\n",
    "```sql\n",
    "DELETE FROM posts WHERE id = 1\n",
    "```\n",
    "\n",
    "After this, the constraint creation command runs successfully.  We can see in `pg_constraints` that Postgres has once again given a name to our `UNIQUE` constraint.  However, we can also choose a name by ourselves."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a03c0628",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "ALTER TABLE posts ADD\n",
    "    CONSTRAINT x\n",
    "    CHECK(title ~ '^[^a-z]')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba5be553",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Let's mandate that titles don't start with a lowercase latter.\n",
    "\n",
    "We are going to use a regular expression in a `CHECK` constraint.  It could be read as \"Match this expression at the beginning of a tested string (`^`) and require that the first character is **not** from the set (`[^`…`]`) of lowercase letters from «a» to «z» (`a-z`).\"\n",
    "\n",
    "The `CONSTRAINT some_name` can optionally appear before the actual constraint specification.  Theodore, please choose a name for this constraint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de9b0723",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "UPDATE posts\n",
    "SET title = LOWER(SUBSTR(title, 1, 1)) || SUBSTR(title, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e88466e3",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "The created constraint can be seen under its new name in `pg_constraints`.  Also, the code above now fails.\n",
    "\n",
    "The\n",
    "\n",
    "- `FOREIGN KEY`, and\n",
    "- `PRIMARY KEY`\n",
    "\n",
    "clauses can be used analogously with `ALTER TABLE`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85fc5792",
   "metadata": {},
   "source": [
    "### Automatic identifier sequences"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d67452d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS posts;\n",
    "CREATE TABLE posts (\n",
    "    id       INT PRIMARY KEY,\n",
    "    title    VARCHAR(200) UNIQUE,\n",
    "    author   VARCHAR(50) REFERENCES users (login),\n",
    "    contents TEXT\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eafcb70a",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Postgres allows us to change `INT` to `SERIAL`.  It causes the DBMS to use the next natural number from a sequence as the default column value upon insertion.  This is highly useful for primary keys in the form of numeric ids (although not restricted to primary keys).\n",
    "\n",
    "Additionally, `BIGINT` has a corresponding 8-byte `BIGSERIAL`, and there is also a 2-byte `SMALLSERIAL`.  Keep in mind that Postgres' \\*`SERIAL` types are **unsigned**, unlike `INT` & friends."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7122cb46",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Towards Full Test Coverage of Database Migration Code',\n",
    "        'theodore',\n",
    "        'Programs commonly alter the database schemas created by their…');\n",
    "\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Fasibility of correlating untagged VCS sources with releases',\n",
    "        'theodore',\n",
    "        'The story of XZ backdoor shows that building software from…');\n",
    "\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Making vector screenshots of websites',\n",
    "        'theodore',\n",
    "        'As many know, a good document is one where every image…')\n",
    "\n",
    "SELECT * FROM posts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13928ced",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Subsequent natural numbers get used for the `id` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ca150a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DELETE FROM posts WHERE title LIKE 'Making%';\n",
    "\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Making vector screenshots of websites',\n",
    "        'theodore',\n",
    "        'As many know, a good document is one where every image…')\n",
    "RETURNING id"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5d5752d",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Let's remove and re-add a row to show that the `SERIAL` type does not reuse freed-up numbers.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b169ed25",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS posts;\n",
    "CREATE TABLE posts (\n",
    "    id       INT PRIMARY KEY,\n",
    "    title    VARCHAR(200) UNIQUE,\n",
    "    author   VARCHAR(50) REFERENCES users (login),\n",
    "    contents TEXT\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f79ccfb3",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "With \\*`SERIAL` types, Postgres uses a so-called sequence mechanism under the hood.  Sequences can also be created (and dropped) manually.\n",
    "\n",
    "```sql\n",
    "CREATE SEQUENCE posts_id_seq AS INT;\n",
    "```\n",
    "\n",
    "A sequence can be used for column's `DEFAULT` value.\n",
    "\n",
    "```sql\n",
    "    id       INT PRIMARY KEY DEFAULT NEXTVAL('posts_id_seq'),\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84ca5af0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%sql\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Towards Full Test Coverage of Database Migration Code',\n",
    "        'theodore',\n",
    "        'Programs commonly alter the database schemas created by their…');\n",
    "\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Fasibility of correlating untagged VCS sources with releases',\n",
    "        'theodore',\n",
    "        'The story of XZ backdoor shows that building software from…');\n",
    "\n",
    "INSERT INTO posts(title, author, contents)\n",
    "VALUES ('Making vector screenshots of websites',\n",
    "        'theodore',\n",
    "        'As many know, a good document is one where every image…');\n",
    "\n",
    "SELECT * FROM posts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31c4cfe2",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "We can use the same `INSERT`s to verify that the behavior is similar as with `SERIAL`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef2f9d18",
   "metadata": {},
   "source": [
    "## More about Data Definition Language in Postgres\n",
    "\n",
    "- [documentation of `CREATE TABLE`](https://www.postgresql.org/docs/current/sql-createtable.html)\n",
    "- [documentation of `ALTER TABLE`](https://www.postgresql.org/docs/current/sql-altertable.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c10c6577",
   "metadata": {},
   "source": [
    "## The \"CHAR\" trap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9364ba65",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "DROP TABLE IF EXISTS names;\n",
    "CREATE TABLE names (\n",
    "    some_name CHAR(20) PRIMARY KEY\n",
    ");\n",
    "INSERT INTO names(some_name) VALUES ('Teodor')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc8219db",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "SELECT '*' || some_name || '*' FROM names"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c9a301d",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Seems like the value in the table is just a 6-character string.\n",
    "\n",
    "Now, add `WHERE some_name LIKE '_eodor'` or even `WHERE some_name = 'Teodor'`.\n",
    "\n",
    "Nothing is returned?  Try appending 14 spaces to the string in the `WHERE` clause.\n",
    "\n",
    "Conclusion: it is safer to always create columns with `VARCHAR` type, as it doesn't pad calues with spaces.  `CHAR` gives little to none performance boost and only makes sense for attributes that are guaranteed to always have the same string width (some alphanumerical ids issued by some organization or government  maybe?)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90ec1db2",
   "metadata": {},
   "source": [
    "## Data definition in MariaDB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c6f9cda",
   "metadata": {},
   "outputs": [],
   "source": [
    "!printf \"CREATE DATABASE IF NOT EXISTS ddl_examples\" | mariadb"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d196a08",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "Unlike Postgres, MariaDB supports `IF NOT EXISTS` with `CRETE DATABASE`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9dbe1a3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%sql mysql:///ddl_examples?unix_socket=/var/run/mysql/mysql.sock"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09c82cd8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "-- todo: test for differences"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6909cd6b",
   "metadata": {},
   "source": [
    "_notes_\n",
    "\n",
    "TODO!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}