In [None]:
import agh_db_lectures
agh_db_lectures.prepare_notebook_for_sql()

# Data definition in SQL

_notes_

Data is stored in rows. Rows are organized into tables. Tables into **schemas**. And schemas into **catalogs** (often called databases).

## Referring to tables in different schemas

In [None]:
%sql postgresql://demo_user:demo_pwd@localhost:25432/agh_it_northwind

In [None]:
agh_db_lectures.download_restore_nw_postgres_dump()

In [None]:
%%sql
SELECT category_id, category_name FROM categories

_notes_

Our Northwind database tables are in the catalog `agh_it_northwind` under a schema with default name `public`.

We can refer to tables under our current schema using just their names. But we can also explicitly name the schema and even the catalog with `FROM public.categories` or `FROM agh_it_northwind.public.categories`.

In [None]:
%%sql
SELECT * FROM pg_catalog.pg_tables

_notes_

Under most relational DBMSes using SQL there also exist special tables that store information about the database itself. For example, under Postgres the table `pg_catalog.pg_tables` has the metadata of all tables in the catalog.

Although syntax permits this, note that Postgres does not support accessing tables from other catalogs than the one to which the client is connected. Try `SELECT * FROM postgres.pg_catalog.pg_tables`.

In [None]:
%%sql
SET search_path = pg_catalog

_notes_

We can instruct Postgres to look for tables in certain schemas and not in the others.

In [None]:
%%sql
SELECT category_id, category_name FROM categories

_notes_

If a table is not in a schema from the `search_path`, then it has to be referenced with its schema name. Add the prefix `public.` in this query to make it work.

Interestingly, it seems impossible to prevent Postgres from searching the `pg_catalog` schema…

## Catalog creation and removal under Postgres

In [None]:
!psql --port 25432 --command='CREATE DATABASE ddl_examples' postgres

_notes_

Note that operations on catalogs (the syntax of commands and their detailed semantics) are DBMS-specific.

Due to limitations of ipython-sql, we shall use a command line program to run a command that creates a new catalog.

An attempt to re-create an existing catalog unsurprisingly results in an error.

Once we create our catalog, we can remove it with the `DROP DATABASE ddl_examples` command.

Now, let us re-create the database with `WITH OWNER demo_user` appended to the command. This will allow our demo account to modify the schema (i.e., create, alter and drop tables).

Note that the command line program was connecting to Postgres over socket as a privileged user who is allowed to operate on catalogs. Users and privileges are going to be covered in more detail in a later topic.

In [None]:
%sql postgresql://demo_user:demo_pwd@localhost:25432/ddl_examples

_notes_

We can connect to the newly-created catalog.

## Creation and removal of schemas

In [None]:
%%sql
CREATE SCHEMA some_schema

_notes_

Before we can use a schema, we need to explicitly create it.

`CREATE SCHEMA IF NOT EXISTS` can be used to avoid an error if the schema is already there.

Note that Postgres does not allow analogous `CREATE DATABASE IF NOT EXISTS` :(

Analogously to schema creation, we can use the following two statements to remove the schema from the catalog.

```sql
DROP SCHEMA some_schema;
```

```sql
DROP SCHEMA IF EXISTS some_schema;
```

## Creation and removal of tables

In [None]:
%%sql
CREATE TABLE ids (id INT)

-- INSERT INTO ids(id) VALUES(1)
-- SELECT * FROM ids;

_notes_

This super-simple statement creates table `ids` under the default schema (`public`). The table has a single column named `id` of type `INTEGER`.

Let's try using the new table.

Note that you can execute the same `INSERT` command successfully multiple times. The uniqueness of the `id` attribute is not mandated.

This table does not represent a relation :(

Most tables we work with are going to have a **primary key** — a subset of columns whose values, together, uniquely identify each row. In our case, no primary key was specified.

We can drop this table with the following statement.

```sql
DROP TABLE ids;
```

Note that `DROP TABLE IF EXISTS` exists and works as you'd expect.

Note that we can explicitly name the schema of the table in `CREATE` and `DROP` statements.

Let's repeat the `CREATE TABLE` command with the following code inside parentheses.

```sql
 id_part_1 SMALLINT,
 id_part_2 SMALLINT,
 owner VARCHAR(50),
 PRIMARY KEY (id_part_1, id_part_2)
```

The `CREATE TABLE` syntax allows us to add a list of constraints after a list of column definitions. We are using a 2-column primary key in our example to highlight that there is **no** requirement for keys to be single-column.

Note that `SMALLINT` is a **signed** integer type that is typically at least 16 bits wide.

An identifier consisting of 2 numbers is used in some cases, for example with USB devices. We can now try inserting data.

```sql
INSERT INTO ids(id_part_1, id_part_2, owner)
VALUES(1, 1, 'Theodore''s Corporation')
```

```sql
VALUES(2, 1, 'Datapol Sp. z o.o.')
```

```sql
VALUES(2, 1, 'Basepol Sp. z o.o.')
```

We see that a single column that is part of the primary key **can** have repeated values. But the combination of all primary key column values (the tuple of `(id_part_1, id_part_2)`) cannot have repeated values.
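The composite-key behavior can also be checked outside the notebook. The following is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for Postgres (an assumption made only so the example is self-contained); the helper `try_insert` is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ids (
        id_part_1 SMALLINT,
        id_part_2 SMALLINT,
        owner VARCHAR(50),
        PRIMARY KEY (id_part_1, id_part_2)
    )
""")

def try_insert(part_1, part_2, owner):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ids(id_part_1, id_part_2, owner) VALUES (?, ?, ?)",
                (part_1, part_2, owner),
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, 1, "Theodore's Corporation"),  # new key (1, 1)
    try_insert(2, 1, "Datapol Sp. z o.o."),      # id_part_2 repeats, key (2, 1) is new
    try_insert(2, 1, "Basepol Sp. z o.o."),      # key (2, 1) already exists
]
print(results)
```

Only the third insertion is rejected, because only it repeats the whole `(id_part_1, id_part_2)` pair.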

Note that the `PRIMARY KEY` constraint implies that the key columns do not accept `NULL`.

```sql
VALUES(2, NULL, 'NULLpol Sp. z o.o.')
```

The following works nonetheless, as `owner` is not part of the primary key.

```sql
VALUES(2, 2, NULL)
```

Note that the `PRIMARY KEY` constraint not only restricts what we can store in the table. It also makes the DBMS create a data structure called **index** that allows efficient querying of the table by key value. There can only be one such constraint for the entire table.

It is often desirable to disallow `NULL` values in certain non-key columns as well (we are going to talk more about this and other good database design practices in the future). The `NOT NULL` constraint can be used for this. This constraint is specified together with the column definition.

```sql
 owner VARCHAR(50) NOT NULL,
```

The last `INSERT` command that we tried and the following one now both fail.

```sql
INSERT INTO ids(id_part_1, id_part_2)
VALUES(2, 2)
```

We can allow the above command to succeed if we define a default value for the column.

```sql
 owner VARCHAR(50) NOT NULL DEFAULT 'Ids Consortium Inc.',
```
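The interplay of `NOT NULL` and `DEFAULT` can be sketched as follows. This assumes Python's built-in `sqlite3` module as a stand-in for Postgres; the variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ids (
        id_part_1 SMALLINT,
        id_part_2 SMALLINT,
        owner VARCHAR(50) NOT NULL DEFAULT 'Ids Consortium Inc.',
        PRIMARY KEY (id_part_1, id_part_2)
    )
""")

# The owner column is omitted, so its DEFAULT value is stored.
conn.execute("INSERT INTO ids(id_part_1, id_part_2) VALUES (2, 2)")
default_owner = conn.execute(
    "SELECT owner FROM ids WHERE id_part_1 = 2 AND id_part_2 = 2"
).fetchone()[0]

# An explicit NULL is still rejected; the default is not substituted for it.
try:
    conn.execute(
        "INSERT INTO ids(id_part_1, id_part_2, owner) VALUES (2, 3, NULL)"
    )
    explicit_null_accepted = True
except sqlite3.IntegrityError:
    explicit_null_accepted = False

print(default_owner, explicit_null_accepted)
```

The default only applies when the column is left out of the `INSERT`; passing `NULL` explicitly still violates `NOT NULL`.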

We can mandate that a certain non-key group of columns is unique. Let's say that the Consortium is a company that sells its ids to other companies (however funny it might be, there is a real-world precedent for it: the USB Implementers Forum Inc. 😉). Assume there is a requirement to have a separate invoice for each id sold. We can add the following columns and constraint.

```sql
 invoice_month VARCHAR(7),
 invoice_number BIGINT,
```

```sql
 UNIQUE (invoice_month, invoice_number)
```

Now, the second insertion in the following sequence fails.

```sql
INSERT INTO ids(id_part_1, id_part_2, owner, invoice_month, invoice_number)
VALUES(1, 1, 'Theodore''s Corporation', '2025-10', 1)
```

```sql
VALUES(1, 2, 'Theodore''s Corporation', '2025-10', 1)
```

Note that the following shall still work as `NULL` values are never considered equal to anything.

```sql
VALUES(1, 2, 'Theodore''s Corporation', '2025-10', NULL)
```

```sql
VALUES(1, 3, 'Theodore''s Corporation', '2025-10', NULL)
```
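How `UNIQUE` treats repeated pairs and `NULL`s can be sketched as follows. The sketch assumes Python's built-in `sqlite3` module as a stand-in for Postgres; the helper `try_insert` is hypothetical, and the rows use distinct primary keys so that only the `UNIQUE` constraint is exercised.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ids (
        id_part_1 SMALLINT,
        id_part_2 SMALLINT,
        invoice_month VARCHAR(7),
        invoice_number BIGINT,
        PRIMARY KEY (id_part_1, id_part_2),
        UNIQUE (invoice_month, invoice_number)
    )
""")

def try_insert(row):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO ids VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, 1, "2025-10", 1)),     # first use of ('2025-10', 1)
    try_insert((1, 2, "2025-10", 1)),     # same invoice pair again: rejected
    try_insert((1, 3, "2025-10", None)),  # NULL invoice number
    try_insert((1, 4, "2025-10", None)),  # NULL again: not treated as a duplicate
]
print(results)
```

Both rows with a `NULL` invoice number are accepted, since the two `NULL`s do not compare as equal.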

We can also give arbitrary conditions for values of a single row with the `CHECK` constraint.

```sql
 CHECK (id_part_2 < 256),
 CHECK (id_part_1 < 256)
```

Note that although the SQL standard allows using subqueries inside `CHECK` expressions, most DBMSes don't support it. That means, e.g., `CHECK (id_part_1 < (SELECT 256))` shall fail.

Note that some constraints (`UNIQUE`, `CHECK`, etc.) can be used multiple times with a single table.

This now fails.

```sql
VALUES(321, 3, 'Datapol Sp. z o.o.', '2025-10', NULL)
```
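A minimal sketch of the two `CHECK` constraints in action, assuming Python's built-in `sqlite3` module as a stand-in for Postgres (the helper `try_insert` is hypothetical):

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ids (
        id_part_1 SMALLINT,
        id_part_2 SMALLINT,
        PRIMARY KEY (id_part_1, id_part_2),
        CHECK (id_part_2 < 256),
        CHECK (id_part_1 < 256)
    )
""")

def try_insert(part_1, part_2):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO ids VALUES (?, ?)", (part_1, part_2))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_in_range = try_insert(255, 3)  # both parts below 256
accepted_too_big = try_insert(321, 3)   # id_part_1 violates its CHECK
accepted_negative = try_insert(-1, 3)   # the CHECK only bounds values from above
print(accepted_in_range, accepted_too_big, accepted_negative)
```

Because the columns are signed, `id_part_1 < 256` alone does not exclude negative numbers; a lower bound would need its own condition.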

### Foreign keys

In [None]:
%%sql
DROP TABLE IF EXISTS invoices;
CREATE TABLE invoices (
 invoice_month VARCHAR(7),
 invoice_number BIGINT,
 document BYTEA,
 PRIMARY KEY (invoice_month, invoice_number),
 CHECK (invoice_month ~ '[0-9][0-9][0-9][0-9]-[0-9][0-9]')
);
INSERT INTO invoices(invoice_month, invoice_number, document)
VALUES('2025-10', 1, BYTEA '%PDF-1.6\015%BinaryContentsOfAPdf...');
INSERT INTO invoices(invoice_month, invoice_number, document)
VALUES('2025-10', 2, BYTEA '%PDF-1.6\015%ContentsAnotherPdf...')

_notes_

SQL tables sometimes share a common attribute (or set of attributes) that we typically make use of in `JOIN` operations. The `invoices` relation has an `(invoice_month, invoice_number)` attribute pair that is also present in `ids`.

Note that we could possibly store contents of files (like PDF documents) in an SQL database. **There are mixed opinions on this practice**, but it is good to be aware that it is possible.

Postgres uses a `BYTEA` type for this rather than the similar `BLOB` type from the SQL standard. In this example we populate the `BYTEA` column with some dummy byte sequences. Detailed description of the format of binary literals is available in the official DBMS documentation.

In [None]:
%%sql
SELECT invoice_month,
 invoice_number,
 ENCODE(document, 'hex') as document_in_hex
FROM invoices

_notes_

Jupyter with ipython-sql is incapable of displaying binary data so we fetch it in hex for our preview.

At this point we can let our DBMS know about the connection between the `ids` and `invoices` tables, by using a `FOREIGN KEY` constraint. It takes the following form.

```sql
FOREIGN KEY (t1col_A, t1col_B) REFERENCES tab2 (t2col_X, t2col_Y)
```

_notes_

We are once again using a 2-column key in our example to highlight that there is **no** requirement for keys to be single-column. Rather intuitively, a foreign key (just as a primary key) could also comprise 3 or more columns.

In [None]:
%%sql
DROP TABLE IF EXISTS ids;
CREATE TABLE ids (
 id_part_1 SMALLINT,
 id_part_2 SMALLINT,
 invoice_month VARCHAR(7),
 invoice_number BIGINT,
 owner VARCHAR(50) NOT NULL DEFAULT 'Ids Consortium Inc.',
 PRIMARY KEY(id_part_1, id_part_2),
 UNIQUE (invoice_month, invoice_number),
 CHECK (id_part_2 < 256),
 CHECK (id_part_1 < 256),
 FOREIGN KEY (invoice_month, invoice_number)
 REFERENCES invoices (invoice_month, invoice_number)
)

_notes_

We declare that the columns `invoice_month` and `invoice_number` in table `ids` correspond to the same-named columns in table `invoices`.

The practical consequence of this is that each row in `ids` is now required to have a matching row in `invoices`. Try to violate this requirement with code below.

```sql
INSERT INTO ids(id_part_1, id_part_2, owner, invoice_month, invoice_number)
VALUES(1, 1, 'Theodore''s Corporation', '2025-10', 987)
```

On the other hand, if we use an `(invoice_month, invoice_number)` pair that has a match in `invoices` (e.g., `('2025-10', 1)`), the `INSERT` command shall succeed.

Note that the DBMS prevents not only constraint-violating insertions, but also updates and deletions. Try deleting the row in `invoices` that is referenced by the newly added row in `ids`.

```sql
DELETE FROM invoices
```

Note that the default behavior of disallowing row deletion can be overridden. The following will cause the "orphaned" rows in `ids` to be automatically deleted.

```sql
FOREIGN KEY (invoice_month, invoice_number)
 REFERENCES invoices (invoice_month, invoice_number)
 ON DELETE CASCADE
```
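The rejection of unmatched rows and the effect of `ON DELETE CASCADE` can be sketched together. This assumes Python's built-in `sqlite3` module as a stand-in for Postgres; note that SQLite, unlike Postgres, enforces foreign keys only after `PRAGMA foreign_keys = ON`. The variable names are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are not enforced unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE invoices (
        invoice_month VARCHAR(7),
        invoice_number BIGINT,
        PRIMARY KEY (invoice_month, invoice_number)
    );
    CREATE TABLE ids (
        id_part_1 SMALLINT,
        id_part_2 SMALLINT,
        invoice_month VARCHAR(7),
        invoice_number BIGINT,
        PRIMARY KEY (id_part_1, id_part_2),
        FOREIGN KEY (invoice_month, invoice_number)
            REFERENCES invoices (invoice_month, invoice_number)
            ON DELETE CASCADE
    );
    INSERT INTO invoices VALUES ('2025-10', 1);
""")

# No invoice ('2025-10', 987) exists, so this row has nothing to reference.
try:
    conn.execute("INSERT INTO ids VALUES (1, 1, '2025-10', 987)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# This row references the existing invoice ('2025-10', 1).
conn.execute("INSERT INTO ids VALUES (1, 2, '2025-10', 1)")
rows_before = conn.execute("SELECT COUNT(*) FROM ids").fetchone()[0]

# Deleting the invoice cascades to the referencing row in ids.
conn.execute("DELETE FROM invoices")
rows_after = conn.execute("SELECT COUNT(*) FROM ids").fetchone()[0]
print(orphan_accepted, rows_before, rows_after)
```

Without `ON DELETE CASCADE`, the `DELETE` would be rejected instead of removing the row from `ids`.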

We can also specify `ON DELETE SET DEFAULT` to have the foreign key columns of the orphaned rows filled with, well, their default values. There also exist

- `ON DELETE NO ACTION`, the behavior we had initially,
- `ON DELETE RESTRICT` that behaves mostly like `NO ACTION`, but (under DBMSes that support it) has slightly different semantics with respect to transactions, which will be covered in a later topic, and
- `ON DELETE SET NULL` that fills the columns with `NULL`s.

Yes, it is possible for foreign key columns to store `NULL` values (as long as these columns are not declared `NOT NULL`). There are also ways to specify whether or not some of the foreign key columns of a row can be `NULL`.

Analogously, for updates to the referenced table rows, we can specify `ON UPDATE SET DEFAULT`, etc.

Since the `FOREIGN KEY` constraint creates a dependence of one table on another, a DBMS shall, by default, prevent dropping of a table that is referenced by another one. This can be overridden with the `CASCADE` keyword. Therefore, `DROP TABLE invoices` shall fail, and `DROP TABLE invoices CASCADE` shall succeed. Under Postgres, the cascade removes the foreign key constraint from `ids`, not the `ids` table itself.

The `FOREIGN KEY` clause requires the referenced columns hold unique tuples. As long as we have either

- `PRIMARY KEY (invoice_month, invoice_number)`, or
- `UNIQUE (invoice_month, invoice_number)`

constraint in the `invoices` table, a `FOREIGN KEY` can be declared. Otherwise, the attempt to declare it shall fail.

### Shorthands for constraints

In [None]:
%%sql
DROP TABLE IF EXISTS users CASCADE;
CREATE TABLE users (
 login VARCHAR(50),
 password_hash VARCHAR(50) NOT NULL,
 PRIMARY KEY (login)
);

DROP TABLE IF EXISTS posts;
CREATE TABLE posts (
 id INT,
 title VARCHAR(200),
 author VARCHAR(50),
 contents TEXT,
 PRIMARY KEY (id),
 UNIQUE (title),
 FOREIGN KEY (author) REFERENCES users (login)
)

_notes_

If the list of referenced columns of the other table is identical to that table's primary key, Postgres (but not every DBMS) allows that list to be omitted in the `FOREIGN KEY` clause. Remove `(login)` from `REFERENCES users (login)`.

A single-column `PRIMARY KEY` constraint can be declared in one line with column definition, as below.

```sql
 id INT PRIMARY KEY,
```

Likewise in case of a single-column `UNIQUE` constraint.

```sql
 title VARCHAR(200) UNIQUE,
```

And, finally, a shorthand for a single-column `FOREIGN KEY` constraint.

```sql
 author VARCHAR(50) REFERENCES users (login),
```

## Altering existing tables

In [None]:
%%sql
INSERT INTO users(login, password_hash)
VALUES ('theodore', 'ea56b986135de17142f91e5523ce9d19');
INSERT INTO posts(id, title, author, contents)
VALUES (1,
 'Towards Full Test Coverage of Database Migration Code',
 'theodore',
 'Programs commonly alter the database schemas created by their…')

In [None]:
%%sql
ALTER TABLE posts ADD COLUMN when_published DATE;

SELECT * FROM posts;

_notes_

We can dynamically add columns to tables with SQL. If we already have data in the table, the DBMS is going to fill the added column with `NULL` values.

We can also remove an existing column.

```sql
ALTER TABLE posts DROP COLUMN when_published
```

A default value or the `NOT NULL` constraint can also be specified for a column being added.

```sql
ALTER TABLE posts ADD COLUMN when_published TIMESTAMP
 DEFAULT '1970-01-01T00:00'
 NOT NULL;
```
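What existing rows receive in a newly added column can be sketched as follows, assuming Python's built-in `sqlite3` module as a stand-in for Postgres. The second column name, `when_edited`, and the post title are hypothetical and exist only so that both variants can be shown on one table.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INT PRIMARY KEY, title VARCHAR(200))")
conn.execute("INSERT INTO posts(id, title) VALUES (1, 'First post')")

# Column added without a default: the existing row gets NULL.
conn.execute("ALTER TABLE posts ADD COLUMN when_published DATE")

# Column added with NOT NULL and a default: the existing row gets the default.
conn.execute(
    "ALTER TABLE posts ADD COLUMN when_edited TIMESTAMP "
    "DEFAULT '1970-01-01T00:00' NOT NULL"
)

when_published, when_edited = conn.execute(
    "SELECT when_published, when_edited FROM posts WHERE id = 1"
).fetchone()
print(when_published, when_edited)
```

A `NOT NULL` column can only be added to a non-empty table when a default is available to fill the existing rows.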

A default value can also be removed or set for a column that already exists.

```sql
ALTER TABLE posts ALTER COLUMN when_published DROP DEFAULT
```

```sql
ALTER TABLE posts ALTER COLUMN when_published SET DEFAULT NOW()
```

Analogously, `SET`/`DROP NOT NULL` can also be used with `ALTER COLUMN`.

Note that (under Postgres, at least) the `NOW()` function would be executed again each time a row is added with the default value. If we add multiple posts, their publication times shall differ.

```sql
INSERT INTO posts(id, title, author, contents)
VALUES (2,
 'Feasibility of correlating untagged VCS sources with releases',
 'theodore',
 'The story of XZ backdoor shows that building software from…');
```

```sql
INSERT INTO posts(id, title, author, contents)
VALUES (3,
 'Making vector screenshots of websites',
 'theodore',
 'As many know, a good document is one where every image…');

SELECT * FROM posts
```

There also exist

- `ADD COLUMN IF NOT EXISTS`, and
- `DROP COLUMN IF EXISTS`

variants of the commands.

In [None]:
%%sql
ALTER TABLE posts RENAME COLUMN when_published TO when_posted

_notes_

A column or table can be renamed.

```sql
ALTER TABLE posts RENAME TO articles;

SELECT * FROM articles
```

```sql
ALTER TABLE articles RENAME TO posts
```

### Altering constraints

In [None]:
%%sql
SELECT * FROM pg_constraint WHERE conname LIKE 'posts%'

_notes_

We can see that Postgres assigned names to our constraints and stores these constraints' data in one of its tables.

In [None]:
%%sql
INSERT INTO posts(id, title, author, contents)
VALUES (4,
 'Towards Full Test Coverage of Database Migration Code',
 'theodore',
 'Programs commonly alter the database schemas created by their…')

_notes_

Fails due to `UNIQUE` constraint on `title`.

In [None]:
%%sql
ALTER TABLE posts DROP CONSTRAINT posts_title_key

_notes_

The name of a constraint can be used to `DROP` it. We can see that repeating titles can now exist in the table (the `INSERT` above now succeeds). Also, the respective row in the `pg_constraint` table is now gone.

```sql
SELECT conname, contype
FROM pg_constraint
WHERE conname LIKE '%post%'
```

We can also amend a table with new constraints. The sample code below tries to re-add the constraint that has just been removed.

```sql
ALTER TABLE posts ADD UNIQUE(title)
```

Creation of a new constraint on an existing table shall fail if the data already present in the table does not meet that constraint's requirements. This is what happened here.
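The same effect can be reproduced in a rough analogue. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for Postgres. SQLite has no `ALTER TABLE … ADD UNIQUE`, so a unique index is created instead, which is an assumption of this sketch and not the Postgres mechanism. The shortened titles and the helper `try_add_unique` are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INT PRIMARY KEY, title VARCHAR(200))")
conn.executemany(
    "INSERT INTO posts(id, title) VALUES (?, ?)",
    [(1, "Towards Full Test Coverage"), (4, "Towards Full Test Coverage")],
)

def try_add_unique():
    """Try to enforce unique titles; return True if the index was created."""
    try:
        # SQLite analogue of adding a UNIQUE constraint to an existing table.
        conn.execute("CREATE UNIQUE INDEX posts_title_key ON posts (title)")
        return True
    except sqlite3.IntegrityError:
        return False

added_with_duplicates = try_add_unique()  # rows 1 and 4 share a title
conn.execute("DELETE FROM posts WHERE id = 1")
added_after_cleanup = try_add_unique()    # titles are unique again
print(added_with_duplicates, added_after_cleanup)
```

The existing data is validated at the moment the uniqueness rule is added, so the duplicates have to be cleaned up first.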

```sql
SELECT * FROM posts
```

We see that we could make titles unique again by removing the post with `id` of 1.

```sql
DELETE FROM posts WHERE id = 1
```

After this, the constraint creation command runs successfully. We can see in `pg_constraint` that Postgres has once again given a name to our `UNIQUE` constraint. However, we can also choose a name by ourselves.

In [None]:
%%sql
ALTER TABLE posts ADD
 CONSTRAINT x
 CHECK(title ~ '^[^a-z]')

_notes_

Let's mandate that titles don't start with a lowercase letter.

We are going to use a regular expression in a `CHECK` constraint. It could be read as "Match this expression at the beginning of a tested string (`^`) and require that the first character is **not** from the set (`[^`…`]`) of lowercase letters from «a» to «z» (`a-z`)."
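The pattern itself can be tried out with Python's `re` module. This is a sketch under the assumption that, for a pattern this simple, Python's regular expressions behave like the ones used by the Postgres `~` operator; the helper `title_allowed` and the sample titles are hypothetical.

```python
import re

# The same pattern as in the CHECK constraint: the first character
# of the string must not be a lowercase letter from a to z.
TITLE_PATTERN = re.compile(r"^[^a-z]")

def title_allowed(title):
    """Return True if the title would satisfy the pattern."""
    return TITLE_PATTERN.search(title) is not None

checks = {
    "Towards Full Test Coverage": title_allowed("Towards Full Test Coverage"),
    "towards full test coverage": title_allowed("towards full test coverage"),
    "42 reasons to test": title_allowed("42 reasons to test"),
    "": title_allowed(""),  # no first character at all, so nothing can match
}
print(checks)
```

Note that the pattern accepts any non-lowercase first character, including digits and punctuation, and not just uppercase letters.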

The `CONSTRAINT some_name` can optionally appear before the actual constraint specification. Theodore, please choose a name for this constraint.

In [None]:
%%sql
UPDATE posts
SET title = LOWER(SUBSTR(title, 1, 1)) || SUBSTR(title, 2)

_notes_

The created constraint can be seen under its new name in `pg_constraint`. Also, the code above now fails.

The

- `FOREIGN KEY`, and
- `PRIMARY KEY`

clauses can be used analogously with `ALTER TABLE`.

### Automatic identifier sequences

In [None]:
%%sql
DROP TABLE IF EXISTS posts;
CREATE TABLE posts (
 id INT PRIMARY KEY,
 title VARCHAR(200) UNIQUE,
 author VARCHAR(50) REFERENCES users (login),
 contents TEXT
)

_notes_

Postgres allows us to change `INT` to `SERIAL`. It causes the DBMS to use the next natural number from a sequence as the default column value upon insertion. This is highly useful for primary keys in the form of numeric ids (although not restricted to primary keys).

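Additionally, `BIGINT` has a corresponding 8-byte `BIGSERIAL`, and there is also a 2-byte `SMALLSERIAL`. Keep in mind that Postgres' \*`SERIAL` types are not separate unsigned types. The column remains an ordinary signed `INT` (or `BIGINT`, `SMALLINT`) and only the automatically generated values start at 1, so the usable range is not larger than that of `INT` & friends.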

In [None]:
%%sql
INSERT INTO posts(title, author, contents)
VALUES ('Towards Full Test Coverage of Database Migration Code',
 'theodore',
 'Programs commonly alter the database schemas created by their…');

INSERT INTO posts(title, author, contents)
VALUES ('Feasibility of correlating untagged VCS sources with releases',
 'theodore',
 'The story of XZ backdoor shows that building software from…');

INSERT INTO posts(title, author, contents)
VALUES ('Making vector screenshots of websites',
 'theodore',
 'As many know, a good document is one where every image…');

SELECT * FROM posts

_notes_

Subsequent natural numbers get used for the `id` column.

In [None]:
%%sql
DELETE FROM posts WHERE title LIKE 'Making%';

INSERT INTO posts(title, author, contents)
VALUES ('Making vector screenshots of websites',
 'theodore',
 'As many know, a good document is one where every image…')
RETURNING id

_notes_

Let's remove and re-add a row to show that the `SERIAL` type does not reuse freed-up numbers. 
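Why freed-up numbers are not reused can be illustrated with a toy model. The sketch below is plain Python, not Postgres code: it assumes that the id generator is simply a counter that only moves forward and knows nothing about the table contents. The names `posts_id_seq`, `posts` and `insert_post` are hypothetical.

```python
import itertools

# Toy stand-in for an id generator that starts at 1 and only moves forward.
posts_id_seq = itertools.count(1)
# Toy stand-in for the table: maps id to title.
posts = {}

def insert_post(title):
    """Take the next number from the counter and store the row under it."""
    new_id = next(posts_id_seq)
    posts[new_id] = title
    return new_id

first = insert_post("Towards Full Test Coverage of Database Migration Code")
second = insert_post("Feasibility of correlating untagged VCS sources with releases")
third = insert_post("Making vector screenshots of websites")

# Deleting a row does not rewind the counter.
del posts[third]
readded = insert_post("Making vector screenshots of websites")
print(first, second, third, readded)
```

The re-added row gets the number 4 because the counter has already moved past 3, regardless of whether 3 is still in use.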

In [None]:
%%sql
DROP TABLE IF EXISTS posts;
CREATE TABLE posts (
 id INT PRIMARY KEY,
 title VARCHAR(200) UNIQUE,
 author VARCHAR(50) REFERENCES users (login),
 contents TEXT
)

_notes_

With \*`SERIAL` types, Postgres uses a so-called sequence mechanism under the hood. Sequences can also be created (and dropped) manually.

```sql
CREATE SEQUENCE posts_id_seq AS INT;
```

A sequence can be used for a column's `DEFAULT` value.

```sql
 id INT PRIMARY KEY DEFAULT NEXTVAL('posts_id_seq'),
```

In [None]:
%%sql
INSERT INTO posts(title, author, contents)
VALUES ('Towards Full Test Coverage of Database Migration Code',
 'theodore',
 'Programs commonly alter the database schemas created by their…');

INSERT INTO posts(title, author, contents)
VALUES ('Feasibility of correlating untagged VCS sources with releases',
 'theodore',
 'The story of XZ backdoor shows that building software from…');

INSERT INTO posts(title, author, contents)
VALUES ('Making vector screenshots of websites',
 'theodore',
 'As many know, a good document is one where every image…');

SELECT * FROM posts

_notes_

We can use the same `INSERT`s to verify that the behavior is similar to that with `SERIAL`.

## More about Data Definition Language in Postgres

- [documentation of `CREATE TABLE`](https://www.postgresql.org/docs/current/sql-createtable.html)
- [documentation of `ALTER TABLE`](https://www.postgresql.org/docs/current/sql-altertable.html)

## The "CHAR" trap

In [None]:
%%sql
DROP TABLE IF EXISTS names;
CREATE TABLE names (
 some_name CHAR(20) PRIMARY KEY
);
INSERT INTO names(some_name) VALUES ('Teodor')

In [None]:
%%sql
SELECT '*' || some_name || '*' FROM names

_notes_

Seems like the value in the table is just a 6-character string.

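Now, add `WHERE some_name LIKE '_eodor'`.

Nothing is returned? Try appending 14 spaces to the pattern in the `WHERE` clause. Pattern matching treats the padding spaces as significant. A plain comparison such as `WHERE some_name = 'Teodor'` still matches under Postgres, because trailing spaces are disregarded when two `CHAR` values are compared.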

Conclusion: it is safer to always create columns with the `VARCHAR` type, as it doesn't pad values with spaces. `CHAR` gives little to no performance boost and only makes sense for attributes that are guaranteed to always have the same string width (some alphanumerical ids issued by some organization or government maybe?).
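The padding effect on pattern matching can be modeled in plain Python. This is only a sketch: it assumes that a `CHAR(20)` value behaves like a string padded with spaces to 20 characters, and that `LIKE` must match the whole value, with `_` standing for any single character. The variable names are hypothetical.

```python
import re

# Model of the CHAR(20) value: 'Teodor' padded with spaces to 20 characters.
stored = "Teodor".ljust(20)

# LIKE '_eodor' modeled as a whole-string match; '.' plays the role of '_'.
short_pattern = re.compile(r".eodor")
# The same pattern followed by the 14 padding spaces.
padded_pattern = re.compile(r".eodor {14}")

matches_short = short_pattern.fullmatch(stored) is not None
matches_padded = padded_pattern.fullmatch(stored) is not None
print(len(stored), matches_short, matches_padded)
```

The 6-character pattern cannot cover the 20-character padded value, which is why the 14 extra spaces are needed.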

## Data definition in MariaDB

In [None]:
!printf "CREATE DATABASE IF NOT EXISTS ddl_examples" | mariadb

_notes_

Unlike Postgres, MariaDB supports `IF NOT EXISTS` with `CREATE DATABASE`.

In [None]:
%sql mysql:///ddl_examples?unix_socket=/var/run/mysql/mysql.sock

In [None]:
%%sql
-- todo: test for differences

_notes_

TODO!