{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "fa6bdd5a", "metadata": {}, "outputs": [], "source": [ "import agh_db_lectures" ] }, { "cell_type": "markdown", "id": "893f4758", "metadata": {}, "source": [ "# Relational Database Design" ] }, { "cell_type": "markdown", "id": "d7034051", "metadata": {}, "source": [ "## Key Types" ] }, { "cell_type": "code", "execution_count": null, "id": "78001f45", "metadata": {}, "outputs": [], "source": [ "agh_db_lectures.nw_diagram.download_open()" ] }, { "cell_type": "markdown", "id": "b84d4552", "metadata": {}, "source": [ "- superkey\n", "- candidate key\n", "- primary key\n", "- foreign key" ] }, { "cell_type": "markdown", "id": "ba402167", "metadata": {}, "source": [ "_notes_\n", "\n", "**Superkey** is a set of attributes whose values uniquely identify each tuple in a relation. The `supplier_id` attributes itself forms a superkey of the `suppliers` relation. But every superset of `{supplier_id}` (e.g., `{supplier_id, city}`) is also a superkey.\n", "\n", "For now, we can use our knowledge of the world and the business to tell whether something is a superkey.\n", "\n", "**Candidate key** is a minimal superkey, e.g., one that cannot have any attribute removed or it will no longer be a superkey.\n", "\n", "**Primary key** is one of the candidate keys selected by DB schema designer to use for identifying tuples in a relation.\n", "\n", "**Foreign key** is a set of attributes in a relation that are used to reference the primary key of another relation. The foreign key attributes must be of the same types as the target primary key attributes.\n", "\n", "For each tuple in a relation with a foreign key there has to be a tuple in the referened relation that has the same values of primary key attributes (unless foreign key contains `NULL`s).\n", "\n", "Look for sample keys of all kinds in the Northwind db schema." ] }, { "cell_type": "markdown", "id": "c4234ac0", "metadata": {}, "source": [ "## Entity Relationship Model" ] }, { "cell_type": "markdown", "id": "9e668e60", "metadata": {}, "source": [ "With examples featuring a model of a (git) source code hosting platform, like [Codeberg](https://codeberg.org/) or GitLab." ] }, { "cell_type": "markdown", "id": "9aa04046", "metadata": {}, "source": [ "_notes_\n", "\n", "So far we've been working with databases using moderately low-level abstractions. Tables and rows are sometimes called a logical model of the database. They are specific to the relational data model and somewhat specific to the DBMS in use (for example, a Postgres DB schema might use the unsigned type `SERIAL` for a table column, whereas other DMBS might lack support for such data type).\n", "\n", "When designing a database schema, we might start with a conceptual model, like the **entity-relationship model**, instead. Such a model can make it easier to comprehend a large design and it also might (or might not, depending who you work with) represent business data relationships in a way closer to how people on nontechnical positions think of them.\n", "\n", "There are various diagram notations for the ER model. The notation of Peter Chen is the most popular and will be (more or less loosely) followed." ] }, { "cell_type": "markdown", "id": "bca2777c", "metadata": {}, "source": [ "An **entity set**.\n", "\n", "![](er-model-entity-set-example.svg)" ] }, { "cell_type": "markdown", "id": "e40c088c", "metadata": {}, "source": [ "_notes_\n", "\n", "We shall define **entity** as an element of the world that we want to model. A person, product, article, purchase, etc. can be an entity. We represent an **entity set** (\"type\" in some literature) with a rectangle.\n", "\n", "An entity can have **attributes**. E.g., an `ACCOUNT` entity (being, e.g., a git forge account) can have an `EMAIL` attribute associated with it. Attributes of an entity set (the \"attribute types\" in some more formal literature) are represented with ellipses.\n", "\n", "An attribute (type) can have a domain — the set of allowed values of that attribute. For example, a set of valid email addresses. A special value `null` can also belong to some attribute's domain. Even though domains are often mentioned when discussing the ER model, they are not represented on the diagrams.\n", "\n", "A subset of entity set attributes should be selected that uniquely identifies each entity. It is called the primary key. Attibutes that are part of the primary key (key attributes) are represented with underlined attribute name.\n", "\n", "The ER model permits modeling attributes that are **derived** from other attributes. For example, the age of an account can be computed from the registration date. It is a derived attribute. We represent it with an ellipse with a dashed outline.\n", "\n", "The ER model also permits modeling attributes that can have multiple values. For example, a git forge account can have multiple SSH keys configured in order to authenticate from multiple devices. The number of SSH keys would vary between the `ACCOUNT` entities. We can represent SSH keys as a **multivalued** attribute, with double outline on the diagram.\n", "\n", "An attribute can also be **composite**, meaning it is further broken into parts. E.g., the `LOGIN_DATA` of an `ACCOUNT` entity can be a composite attribute consisting of `PASSWORD_HASH` and `SALT`. We could also make the parts themselves composite. Perhaps it would make sense to represent `SALT` as consiting of algorithms type, iterations count and random data? Nonetheless, an authentication API used by an application is likely to accept the entire `LOGIN_DATA` of an `ACCOUNT` as a single string-valued argument.\n", "\n", "Entities can form relationships. For example, en `ACCOUNT` entity could be in a membership relationship with a `GROUP` entity. Consider groups like [guix](https://codeberg.org/guix) on Codeberg. Since we work with the notion of entity sets and multiple entities can be in multiplem membership relationships, we shall talk about **relationship sets**. A relationship set is going to be represented with a diamond." ] }, { "cell_type": "markdown", "id": "9e5b5957", "metadata": {}, "source": [ "A **many-to-many** mapping.\n", "\n", "![](many-to-many.svg)" ] }, { "cell_type": "markdown", "id": "36cf5a59", "metadata": {}, "source": [ "_notes_\n", "\n", "A single `ACCOUNT` entity can be in a membership relationship with zero, one, or multiple `GROUP` entities. Also, multiple `ACCOUNT` entities can be in a relationship with a single `GROUP` entity. There can be many entities on each side of the relationship set, so we say it has a **many-to-many** cardinality. There can also exist relationship sets with **one-to-one**, **one-to-many**, and **many-to-one** cardinalities. The **cardinality** can be represented on the diagram with annotations like `1`, or `0..n`. Some other common notations are just `n` or an asterisk (`*`) for the \"many\" side of a relationship set.\n", "\n", "If we wanted each group to have a single owner (aka admin), the relationship set of group ownership could be modeled with many-to-one mapping cardinality. If, for some reason, we were to require that each user owns at most one group, that could modeled with a one-to-one mapping." ] }, { "cell_type": "markdown", "id": "91d3ba17", "metadata": {}, "source": [ "_notes_\n", "\n", "Entity set participates **totally** in a relationship set if every entity **has to** be in some relationship from the set. Otherwise, it is participates **partially**.\n", "\n", "Notice that groups on a git forge could own code repositories (Codeberg works like this). We can include an entity set `REPOSITORY` in our model. Note that even though it has an attribute `REPO_NAME`, it is insufficient to uniquely identify all repositories on the forge. Multiple groups can have repositories with the same name (like there are multiple repositories named \"guix\" on Codeberg).\n", "\n", "If an entity set does not have enough attributes to form a key, we say that it is **weak**. A weak entity set has to participate in a relationship set with another entity set (or multiple other entity sets…) that is not weak (i.e., is **strong**). We use the notions of an **identifying relationship**, and an **identifying entity set**. In the weak entity set we designate a subset of attributes called the **discriminator**. Together with the primary key of the identifying entity set, the discriminator forms the primary key of the weak entity set. The weak entity and its identifying relationship set are going to be represented with double outlines on the diagrams. A weak entity could of course additionally participate in other relationship sets that are not used to identify it (and are thus **not** represented with a double outline).\n", "\n", "An entity can be in a relationship with another entity of the same type. As in case of the `FOLLOWS` relationship set (which is a **recursive** relationship set). To distinguish between the left-side and right-side participation, we can **name the roles** of entities on each side (e.g., `follower` and `followed`).\n", "\n", "Finally, a relationship set can have attributes of its own. We can add a `MEMBERSHIP_DATE` attribute to the relationship set is we want to record when someone has joined given group." ] }, { "cell_type": "markdown", "id": "16ba4d5e", "metadata": {}, "source": [ "![](er-model-example.svg)" ] }, { "cell_type": "markdown", "id": "0d854025", "metadata": {}, "source": [ "_notes_\n", "\n", "A ternary (or higher degree) relationship can occur between 3 (or more) entities. Here we used it to record who added each account to group. It is a minor detail that 2 sides of this ternary relationship set happen to invole the same entity set — an n-ary relationship set can just as well involve distinct entity sets.\n", "\n", "It is impossible to model it correctly by just binary relationship sets between the entity sets that we already have. However, it would be possible to model it correctly with a weak entity set identified jointly by three other entity sets." ] }, { "cell_type": "markdown", "id": "0090871b", "metadata": {}, "source": [ "## Enhanced Entity Relationship Model" ] }, { "cell_type": "markdown", "id": "3b5e50a0", "metadata": {}, "source": [ "EER model builds upon Peter Chen's ER model and adds specialization/generalization (described here) as well as aggregation (not covered).\n", "\n", "![](eer-model-example.svg)" ] }, { "cell_type": "markdown", "id": "f5b98e37", "metadata": {}, "source": [ "_notes_\n", "\n", "A typical git forge would allow both groups **and** accounts to own repositories. As such, the `ACCOUNT` and `GROUP` entity types could be thought of as **specializations** of another entity type that we could call `NAMESPACE_HOLDER` (since we typically refer to repositories as `https://some-forge.org/namespace_name/repo_name`).\n", "\n", "We could also say that our `NAMESPACE_HOLDER` entity type is a **generalization** of the other two types. Whether we talk about specialization or generalization depends on whether we think top-down or bottom-up.\n", "\n", "Specialization is similar to inheritance in object-oriented programming languages. A entity type that is a specialization of another type **inherits** its attributes, possibly also including some additional attributes of its own.\n", "\n", "Specialization can be represented with line going from an entity set to a circle and other lines going from the circle to the specializations. The lines that go to specializations are crossed with arcs, as shown in the diagram. Multi-level specialization/generalization hierarchies can be used.\n", "\n", "A specialization is **disjoint** if each entity is a member of at most one specialized type. It is **overlapping** if an entity can be a member of multiple specialized types. We denote it with letter \"d\" or \"o\" inside the circle.\n", "\n", "A specialization is **total** if each higher-level entity has to be a member of some specialized type. It is **partial** if we allow higher-level entities not to be members of any specialized type. We denote it with letter \"t\" or \"p\" along the line going from higher-level entity set to the circle.\n", "\n", "Aggregation and **categorization** (aka multiple inheritance) are also used in EER model but are not going to be covered here." ] }, { "cell_type": "markdown", "id": "273226ba", "metadata": {}, "source": [ "## Notes on (E)ER modeling" ] }, { "cell_type": "markdown", "id": "f17dfe29", "metadata": {}, "source": [ "_notes_\n", "\n", "Theodore, are there any reasons not to represent the `SSH_KEY` as another entity set rather than a multivalued attribute?\n", "\n", "There isn't always a rule whether to use a new entity type or a multivalued attribute (or a ternary relationship set). \"Good taste\" is needed.\n", "\n", "There can be multiple valid choices for a primary key. Also, one might choose to introduce an additional number (identifier) attribute to some entity set and use it in the primary key.\n", "\n", "**Do not** include primary key attributes of some entity as attributes of some other entity in order to make a \"foreign key\". There are no foreign keys in the (E)ER model, use relationship sets instead :)\n", "\n", "A workflow can be as follows.\n", "\n", "- entity sets identification\n", "- relationship sets identification\n", "- deciding on cardinality of relationship sets' mappings\n", "- identification of entity set attributes\n", "- primary key selection\n", "- identification of weak entity sets and their identifying relationship sets and identifying entity sets\n", "- identification of weak entity sets' discriminators\n", "- abstractions\n", " - specialization/generalization,\n", " - aggregation, and\n", " - categorization\n", "- determining whether the abstractions are\n", " - total/partial, and\n", " - disjoin/overlapping" ] }, { "cell_type": "markdown", "id": "fa25821b", "metadata": {}, "source": [ "## Mapping the Conceptual (E)ER Model to Relational Model\n", "\n", "We shall give some examples but omit formal algorithm descriptions." ] }, { "cell_type": "markdown", "id": "b6e246ff", "metadata": {}, "source": [ "#### Entity Sets and Attributes\n", "\n", "![](er-model-entity-w-attributes.svg)\n", "\n", "-----------------------------\n", "\n", "![](er-model-entity-w-attributes-mapped.svg)" ] }, { "cell_type": "markdown", "id": "8170f6e1", "metadata": {}, "source": [ "_notes_\n", "\n", "Entity set gets mapped to a relation. Ordinary attributes become relation attributes.\n", "\n", "Multivalued attributes get their own relations.\n", "\n", "Derived attributes are absent in relation schema.\n", "\n", "Note how we mark primary keys and foreign key constraints in schema diagram. Be aware that different notations are in use. See the WP database schema for another common way to mark cardinalities." ] }, { "cell_type": "markdown", "id": "7cfb437c", "metadata": {}, "source": [ "#### Entity Relationships and Weak Entity Types\n", "\n", "![](er-model-example.svg)\n", "\n", "-----------------------------\n", "\n", "![](er-model-mapped.svg)" ] }, { "cell_type": "markdown", "id": "dfc17db9", "metadata": {}, "source": [ "#### Generalization / Specialization\n", "\n", "![](eer-model-example.svg)\n", "\n", "-----------------------------\n", "\n", "![](eer-model-mapped.svg)" ] }, { "cell_type": "markdown", "id": "d3cf4dec", "metadata": {}, "source": [ "_notes_\n", "\n", "Alternative approach is to have just one relation for all specialized types with the attributes of all specializations present there and nullable." ] }, { "cell_type": "markdown", "id": "aba0baab", "metadata": {}, "source": [ "## Schema Normalization" ] }, { "cell_type": "markdown", "id": "8ddb4d0e", "metadata": {}, "source": [ "Let's formalize:\n", "\n", "#### Relation Schema\n", "\n", "$some\\_relation({attribute_1, attribute_2, attribute_3})$\n", "\n", "#### Relation Instance\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
attribute_1attribute_2attribute_3
1112412.43
134543.19
1564817.03
" ] }, { "cell_type": "markdown", "id": "01deea02", "metadata": {}, "source": [ "_notes_\n", "\n", "Schema is a design of relation (or entire database) structure, including integrity constraints.\n", "\n", "Instance is a data snapshot of a relation at given point in time." ] }, { "cell_type": "markdown", "id": "c00c5277", "metadata": {}, "source": [ "### The Use-Case of Normalization" ] }, { "cell_type": "markdown", "id": "276f2d2a", "metadata": {}, "source": [ "Assume another relation schema was designed to store information about issues and comments under them.\n", "\n", "$R = \\{issue\\_number, repo\\_name, owner\\_name, issue\\_text, comment\\_id, comment\\_text\\}$\n", "\n", "$issue\\_comment(R)$" ] }, { "cell_type": "markdown", "id": "365e5a93", "metadata": {}, "source": [ "_notes_\n", "\n", "Such relation schema could be, for example, the result of ad-hoc design. We deliberately make the example small by, e.g., omitting authorship information." ] }, { "cell_type": "markdown", "id": "b232eb0c", "metadata": {}, "source": [ "Consider an instance of this relation with some data.\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
issue_numberrepo_nameowner_nameissue_textcomment_idcomment_text
1112db-lecture-materialswkosiorI get `bash: git: command not found` when trying to download this :(9814You need to install the git command. E.g., with `apt install git`.
1112db-lecture-materialswkosiorI get `bash: git: command not found` when trying to download this :(9819Works like a charm. Thanks!
1345guixguix`librecad` package lacks icons and .desktop files from the upstream project at https://github.com/LibreCAD/LibreCAD.10239Fixed in 58b522a.
" ] }, { "cell_type": "markdown", "id": "15d3af8d", "metadata": {}, "source": [ "_notes_\n", "\n", "What are the problem with this relation schema?\n", "\n", "- duplicate storage of `issue_text`\n", "- need to be careful when updating it\n", "- need to be careful when deleting an issue\n", "- challenge of recording an issue that has no comments yet (`NULL` in `comment_*` columns?)\n", "\n", "It is often said that a design suffers from **anomalies** of data modification. Data redundancy is bad, mkay?" ] }, { "cell_type": "markdown", "id": "0701ae4f", "metadata": {}, "source": [ "#### Decomposed Relation\n", "\n", "$issue(issue\\_number, repo\\_name, owner\\_name, issue\\_text)$\n", "\n", "$comment(comment\\_id, issue\\_number, comment\\_text)$\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
issue_numberrepo_nameowner_nameissue_text
1112db-lecture-materialswkosiorI get `bash: git: command not found` when trying to download this :(
1345guixguix`librecad` package lacks icons and .desktop files from the upstream project at https://github.com/LibreCAD/LibreCAD.
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
comment_idissue_numbercomment_text
98141112You need to install the git command. E.g., with `apt install git`.
98191112Works like a charm. Thanks!
102391345Fixed in 58b522a.
" ] }, { "cell_type": "markdown", "id": "25075bfa", "metadata": {}, "source": [ "_notes_\n", "\n", "We can **decompose** relation schemas with data redundancy into smaller relation schemas that do not suffer from this problem.\n", "\n", "The resulting tables here would be joinable on the `issue_number` attribute. No information would be lost." ] }, { "cell_type": "markdown", "id": "6c69b175", "metadata": {}, "source": [ "#### Lossy Decomposition\n", "\n", "$issue(issue\\_number, repo\\_name, owner\\_name, issue\\_text)$\n", "\n", "$comment(comment\\_id, repo\\_name, owner\\_name, comment\\_text)$\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
issue_numberrepo_nameowner_nameissue_text
1112db-lecture-materialswkosiorI get `bash: git: command not found` when trying to download this :(
1345guixguix`librecad` package lacks icons and .desktop files from the upstream project at https://github.com/LibreCAD/LibreCAD.
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
comment_idrepo_nameowner_namecomment_text
9814db-lecture-materialswkosiorYou need to install the git command. E.g., with `apt install git`.
9819db-lecture-materialswkosiorWorks like a charm. Thanks!
10239guixguixFixed in 58b522a.
" ] }, { "cell_type": "markdown", "id": "8a17fd70", "metadata": {}, "source": [ "_notes_\n", "\n", "Such decomposed schema is broken — it doesn't keep the information about the precise issue that a comment has been made under. There could possibly be multiple issues opened for the same repo." ] }, { "cell_type": "markdown", "id": "df8d739e", "metadata": {}, "source": [ "Decomposition is **lossy** if it loses information." ] }, { "cell_type": "markdown", "id": "a9bf86f1", "metadata": {}, "source": [ "_notes_\n", "\n", "Decomposition is **lossless** if and only if a natural join of decomposed smaller relations gives back the original relation." ] }, { "cell_type": "markdown", "id": "1e56af3b", "metadata": {}, "source": [ "## 1NF\n", "\n", "First normal form." ] }, { "cell_type": "markdown", "id": "abe553fe", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_nameprivilegesrepo_namerepo_owner_name
branch_nameaccess_type
theodoremagisterpulldb-lecture-materialswkosior
join-diagram-wippush
theodore-experimentsforcepush
pancratiusmagisterpulldb-lecture-materialswkosior
pancratiustrunkpullguixtim448352
eugeniatrunkpullguixtim448352
" ] }, { "cell_type": "markdown", "id": "5b0ab2ed", "metadata": {}, "source": [ "_notes_\n", "\n", "A relation is in 1NF if it has no multivalued nor composite attributes. Note that many RDBMS engines do not even have dedicated support for such attributes. Others do. For example, Postgres features an array type, and SQL standard since 2003 specifies an XML type.\n", "\n", "Depending on the semantic of data, a single string-valued attribute can be effetively composite (or multivalued). One example is the `user_pass` attribute in WP database that holds both the password hash and the salt in a single string.\n", "\n", "The above is not in 1NF." ] }, { "cell_type": "markdown", "id": "31aad002", "metadata": {}, "source": [ "### Decomposition to 1NF" ] }, { "cell_type": "markdown", "id": "b7c7aa74", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_namebranch_nameaccess_typerepo_namerepo_owner_name
theodoremagisterpulldb-lecture-materialswkosior
theodorejoin-diagram-wippushdb-lecture-materialswkosior
theodoretheodore-experimentsforcepushdb-lecture-materialswkosior
pancratiusmagisterpulldb-lecture-materialswkosior
pancratiustrunkpullguixtim448352
eugeniatrunkpullguixtim448352
" ] }, { "cell_type": "markdown", "id": "2f1f6b08", "metadata": {}, "source": [ "## Functional Dependencies" ] }, { "cell_type": "markdown", "id": "ba007b76", "metadata": {}, "source": [ "$a, b, c, d, e$ — relation attributes\n", "\n", "$\\{a, b\\} \\rightarrow \\{c, d\\}$" ] }, { "cell_type": "markdown", "id": "c560c52e", "metadata": {}, "source": [ "_notes_\n", "\n", "Functional dependencies between relation attributes can be used to approach the issue of normalization in a formal way.\n", "\n", "We will use arrow notation.\n", "\n", "Functional dependency means that values of LHS attributes uniquely determine the values of RHS attributes." ] }, { "cell_type": "markdown", "id": "5a61548c", "metadata": {}, "source": [ "Assume the following relation.\n", "\n", "$r(a, b, c, d, e)$\n", "\n", "$t_1, t_2$ — tuples of a relation instance\n", "\n", "The functional dependency $\\{a, b\\} \\rightarrow \\{c, d\\}$ means that:\n", "\n", "$(t_1[a], t_1[b]) = (t_2[a], t_2[b]) \\implies (t_1[c], t_1[d]) = (t_2[c], t_2[d])$" ] }, { "cell_type": "markdown", "id": "d721d7eb", "metadata": {}, "source": [ "_notes_\n", "\n", "I.e., if we know the values of LHS attributes, we could tell what the RHS attribute values are.\n", "\n", "We can say that a relation instance **satisfies** a functional dependency. If all legal instances of a relation do so, then the functional dependency **holds** on schema." ] }, { "cell_type": "markdown", "id": "49ce1901", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abcde
11135.6724
11135.6724
21430.4311
225415.530
225415.539
" ] }, { "cell_type": "markdown", "id": "5d066613", "metadata": {}, "source": [ "_notes_\n", "\n", "Relation instance presented above satisfies the functional dependency we are considering.\n", "\n", "Change the value of `c` in the last row to 77. The new instance does not satisfy the functional dependency." ] }, { "cell_type": "markdown", "id": "d9f7fd06", "metadata": {}, "source": [ "#### Examples From the Git Forge Schema\n", "\n", "$name \\rightarrow email$\n", "\n", "$\\{added, group\\_name\\} \\rightarrow adder$" ] }, { "cell_type": "markdown", "id": "c70462ea", "metadata": {}, "source": [ "_notes_\n", "\n", "LHS and RHS of a functional dependency are **sets** of attributes but we can abuse the notation and drop the braces when we have just one attribute." ] }, { "cell_type": "markdown", "id": "57392202", "metadata": {}, "source": [ "#### Trivial Functional Dependencies\n", "\n", "$\\{added, group\\_name\\} \\rightarrow group\\_name$" ] }, { "cell_type": "markdown", "id": "bd4417ef", "metadata": {}, "source": [ "#### Armstrong's Axioms" ] }, { "cell_type": "markdown", "id": "6c3da0ab", "metadata": {}, "source": [ "$\\alpha, \\beta, \\gamma$ — attribute sets\n", "\n", "$\\alpha\\gamma$ — simplified notation for $\\alpha{\\cup}\\gamma$\n", "\n", "The axioms are:\n", "\n", "$\\beta{\\subseteq}\\alpha \\implies \\alpha \\rightarrow \\beta$\n", "\n", "$\\alpha \\rightarrow \\beta \\implies \\alpha\\gamma \\rightarrow \\beta\\gamma$\n", "\n", "$\\alpha \\rightarrow \\beta \\land \\beta \\rightarrow \\gamma \\implies \\alpha \\rightarrow \\gamma$" ] }, { "cell_type": "markdown", "id": "d9fc19a9", "metadata": {}, "source": [ "_notes_\n", "\n", "Functional dependencies can logically imply other functional dependencies. A set $F^+$ of all functional dependencies implied by set $F$ is called **closure**. Armstrong's axioms allow finding the closure of every set of functional dependencies.\n", "\n", "Other tautologies can be proved from these axioms." ] }, { "cell_type": "markdown", "id": "17435f19", "metadata": {}, "source": [ "#### Implied Functional Dependencies Example\n", "\n", "We have the following functional dependencies.\n", "\n", "$name \\rightarrow email$\n", "\n", "$email \\rightarrow \\{name, registration\\_date, email, password\\_hash, salt\\}$\n", "\n", "We can then prove the following as well.\n", "\n", "$\\{password\\_hash, salt\\} \\rightarrow \\{password\\_hash, salt\\}$\n", "\n", "$name \\rightarrow \\{registration\\_date, email\\}$" ] }, { "cell_type": "markdown", "id": "e010e5f8", "metadata": {}, "source": [ "_notes_\n", "\n", "$\\{password\\_hash, salt\\} \\rightarrow \\{password\\_hash, salt\\}$ is a trivial one." ] }, { "cell_type": "markdown", "id": "8c4e78ab", "metadata": {}, "source": [ "## BCNF\n", "\n", "Boyce-Codd Normal Form (also called 3.5 Normal Form)." ] }, { "cell_type": "markdown", "id": "56a05fdc", "metadata": {}, "source": [ "Let\n", "\n", "$R$ — set of relation attributes\n", "\n", "$F$ — set of functional dependencies with attributes in $R$\n", "\n", "$F^+$ — closure of $F$\n", "\n", "A relation schema is in BCNF if and only if it is in 1NF, and\n", "\n", "$\\forall (\\alpha \\rightarrow \\beta){\\in}F^+: \\beta{\\subseteq}\\alpha \\lor \\alpha\\ is\\ a\\ superkey$" ] }, { "cell_type": "markdown", "id": "7875974e", "metadata": {}, "source": [ "### BCNF Example" ] }, { "cell_type": "markdown", "id": "403002aa", "metadata": {}, "source": [ "Consider again the following attributes and relation schema.\n", "\n", "$R = \\{issue\\_number, repo\\_name, owner\\_name, issue\\_text, comment\\_id, comment\\_text\\}$\n", "\n", "$issues\\_comments(R)$\n", "\n", "We have these (and more) functional dependencies.\n", "\n", "$comment\\_id \\rightarrow comment\\_text$\n", "\n", "$comment\\_id \\rightarrow issue\\_number$\n", "\n", "$issue\\_number \\rightarrow \\{repo\\_name, owner\\_name, issue\\_text\\}$" ] }, { "cell_type": "markdown", "id": "b15e1c7c", "metadata": {}, "source": [ "_notes_\n", "\n", "We won't be writing down all the other functional dependencies that result from these. " ] }, { "cell_type": "markdown", "id": "5d42cef9", "metadata": {}, "source": [ "The first two functional dependencies are OK wrt the BCNF requirements." ] }, { "cell_type": "markdown", "id": "ddbb3d60", "metadata": {}, "source": [ "_notes_\n", "\n", "`comment_id` is the primary key (and **the only** candidate key)." ] }, { "cell_type": "markdown", "id": "62600027", "metadata": {}, "source": [ "The last functional dependency\n", "\n", "- is not trivial, and\n", "- has LHS that is not a superkey.\n", "\n", "Therefore, the dependency $issue\\_number \\rightarrow \\{repo\\_name, owner\\_name, issue\\_text\\}$ violates BCNF." ] }, { "cell_type": "markdown", "id": "bc9e10cb", "metadata": {}, "source": [ "_notes_\n", "\n", "`issue_number` does not uniquely identify each tuple in the relation." ] }, { "cell_type": "markdown", "id": "42293b57", "metadata": {}, "source": [ "### Decomposition to BCNF Schemas" ] }, { "cell_type": "markdown", "id": "c3a6e1c4", "metadata": {}, "source": [ "If we have a nontrivial functional dependency $\\alpha \\rightarrow \\beta$ that violates BCNF, then we can decompose the relation schema into two separate schemas:\n", "\n", "$\\alpha{\\cup}\\beta$\n", "\n", "$R{-}(\\beta{-}\\alpha)$\n", "\n", "In our case:\n", "\n", "$R = \\{issue\\_number, repo\\_name, owner\\_name, issue\\_text, comment\\_id, comment\\_text\\}$\n", "\n", "$\\alpha = issue\\_number$\n", "\n", "$\\beta = \\{repo\\_name, owner\\_name, issue\\_text\\}$\n", "\n", "We get the following decomposed relation schemas.\n", "\n", "$issue(\\alpha{\\cup}\\beta) = issue(issue\\_number, repo\\_name, owner\\_name, issue\\_text)$\n", "\n", "$comment(R{-}(\\beta{-}\\alpha)) = comment(comment\\_id, issue\\_number, comment\\_text)$" ] }, { "cell_type": "markdown", "id": "2e7203ae", "metadata": {}, "source": [ "_notes_\n", "\n", "Here this would give us the same decomposition we used earlier.\n", "\n", "Note that merely checking if a database schema is in BCNF is an NP-complete problem. We don't discuss the particular algorithms as part of this course." ] }, { "cell_type": "markdown", "id": "e8127d26", "metadata": {}, "source": [ "### Dependency Preservation" ] }, { "cell_type": "markdown", "id": "cf4d2695", "metadata": {}, "source": [ "Consider this schema, dependencies, and sample instance.\n", "\n", "$repo\\_role\\_user(repo\\_name, owner\\_name, role\\_id, user)$\n", "\n", "$\\{repo\\_name, owner\\_name, user\\} \\rightarrow role\\_id$\n", "\n", "$role\\_id \\rightarrow \\{repo\\_name, owner\\_name\\}$\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
repo_nameowner_namerole_iduser
db-lecture-materialswkosior342theodore
db-lecture-materialswkosior377pancratius
guixtim448352392pancratius
guixtim448352392eugenia
" ] }, { "cell_type": "markdown", "id": "baefb785", "metadata": {}, "source": [ "_notes_\n", "\n", "Assume a scenario where each repository can have multiple roles defined and users can have at most one role for each repository. Role is repo-specific and defines privileges." ] }, { "cell_type": "markdown", "id": "ea61a132", "metadata": {}, "source": [ "We have the following candidate keys.\n", "\n", "$k_1 = \\{repo\\_name, owner\\_name, user\\}$\n", "\n", "$k_2 = \\{role\\_id, user\\}$" ] }, { "cell_type": "markdown", "id": "452a3c2e", "metadata": {}, "source": [ "_notes_\n", "\n", "Functional dependency $\\{repo, user\\} \\rightarrow role\\_id$ has a candidate key on its LHS. It does not violate BCNF.\n", "\n", "Functional dependency $role\\_id \\rightarrow \\{repo\\_name, owner\\_name\\}$ is not trivial and its LHS is not a superkey. It does violate the BCNF." ] }, { "cell_type": "markdown", "id": "836735af", "metadata": {}, "source": [ "We decompose according to our rule specified earlier.\n", "\n", "$R = \\{repo\\_name, owner\\_name, role\\_id, user\\}$\n", "\n", "$\\alpha = role\\_id$\n", "\n", "$\\beta = \\{repo\\_name, owner\\_name\\}$\n", "\n", "The following are new relation schemas and their sample instances.\n", "\n", "$role\\_repo(\\alpha{\\cup}\\beta) = role\\_repo(role\\_id, repo\\_name, owner\\_name)$\n", "\n", "$user\\_role(R{-}(\\beta{-}\\alpha)) = user\\_role(user, role\\_id)$\n", "\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
role_idrepo_nameowner_name
342db-lecture-materialswkosior
377db-lecture-materialswkosior
392guixtim448352
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userrole_id
theodore342
pancratius377
pancratius392
eugenia392
\n", "
\n", "\n", "Recall our functional dependency $\\{repo\\_name, owner\\_name, user\\} \\rightarrow role\\_id$." ] }, { "cell_type": "markdown", "id": "98f0debb", "metadata": {}, "source": [ "_notes_\n", "\n", "This dependency involves attributes that no longer appear together in any single relation schema.\n", "\n", "We could no longer tell whether this dependency is satisfied by looking at a single relation instance.\n", "\n", "We say this decomposition **is not dependency preserving**. This is an undesirable situation.\n", "\n", "Not all relation schemas have a dependency preserving decomposition to a set of BCNF shemas." ] }, { "cell_type": "markdown", "id": "337dfbfc", "metadata": {}, "source": [ "## 3NF\n", "\n", "Third Normal Form." ] }, { "cell_type": "markdown", "id": "15e40158", "metadata": {}, "source": [ "Recall that a relation schema is in BCNF if and only if it is in 1NF, and\n", "\n", "$\\forall (\\alpha \\rightarrow \\beta){\\in}F^+: \\beta{\\subseteq}\\alpha \\lor \\alpha\\ is\\ a\\ superkey$\n", "\n", "Let's relax the condition relating to dependencies. Replace it with the following.\n", "\n", "$\\forall (\\alpha \\rightarrow \\beta){\\in}F^+: \\beta{\\subseteq}\\alpha \\lor \\alpha\\ is\\ a\\ superkey \\lor \\forall A{\\in}(\\beta-\\alpha) \\exists k{\\in}K: A{\\in}k\\$\n", "\n", "where\n", "\n", "$K$ — set of relation's all candidate keys" ] }, { "cell_type": "markdown", "id": "c0852599", "metadata": {}, "source": [ "_notes_\n", "\n", "This is one of the possible, equivalent definitions of 3NF (not the canonical one).\n", "\n", "3NF permits nonkey attributes on the LHS of a nontrivial functional dependency provided that the relevant attributes in the RHS belong to some candidate keys. These attribute are not required to belong to the same candidate key, though." ] }, { "cell_type": "markdown", "id": "036a3ccf", "metadata": {}, "source": [ "### 3NF in the User Roles Example" ] }, { "cell_type": "markdown", "id": "69bad4c2", "metadata": {}, "source": [ "Recall our earlier schema, dependencies and candidate keys.\n", "\n", "$repo\\_role\\_user(repo\\_name, owner\\_name, role\\_id, user)$\n", "\n", "$\\{repo\\_name, owner\\_name, user\\} \\rightarrow role\\_id$\n", "\n", "$role\\_id \\rightarrow \\{repo\\_name, owner\\_name\\}$\n", "\n", "$k_1 = \\{repo\\_name, owner\\_name, user\\}$\n", "\n", "$k_2 = \\{role\\_id, user\\}$\n", "\n", "We see that $repo\\_name{\\in}k_1$ and $owner\\_name{\\in}k_1$." ] }, { "cell_type": "markdown", "id": "1c842f46", "metadata": {}, "source": [ "_notes_\n", "\n", "The second functional dependency violates BCNF on $user\\_repo\\_role$ and leads to a decomposition that is not dependency preserving.\n", "\n", "If we instead strive for 3NF, we shall see that this dependency does not violate it. Dependency's RHS attributes all belong to a candidate key. 3NF does not require further decomposition of this schema.\n", "\n", "3NF is weaker than BCNF and allows some degree of redundancy while preserving functional dependencies." ] }, { "cell_type": "markdown", "id": "c6e2c0ba", "metadata": {}, "source": [ "### Multivalued dependencies" ] }, { "cell_type": "markdown", "id": "2461d28c", "metadata": {}, "source": [ "Assume users can watch (aka subscribe to) others' repositories. Consider this relation schema and sample instance.\n", "\n", "$account(name, email, ssh\\_key, watched\\_repo\\_name, watched\\_repo\\_owner)$\n", "\n", "$name \\rightarrow email$\n", "\n", "$email \\rightarrow name$\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameemailwatched_repo_namewatched_repo_ownerssh_key
wkosiorwkosior@agh.edu.plguixguixssh-ed25519 AAAAC3…dyItC
wkosiorwkosior@agh.edu.plguixguixssh-ed25519 AAAAC3…ujdSO
wk-botwk-bot@gnu.orgapache-configswkosiorssh-ed25519 AAAAC3…ujdSO
theodore543210@student.agh.edu.pldb-lecture-materialswkosiorssh-ed25519 AAAAC3…T7EjR
theodore543210@student.agh.edu.pldb-lecture-materialswkosiorssh-rsa AAAAB3…PucE=
theodore543210@student.agh.edu.plguixtim448352ssh-ed25519 AAAAC3…T7EjR
theodore543210@student.agh.edu.plguixtim448352ssh-rsa AAAAB3…PucE=
pancratius500000@student.agh.edu.plapache-configswkosiorssh-dss AAAAB3…0yxM=
" ] }, { "cell_type": "markdown", "id": "37db72bb", "metadata": {}, "source": [ "_notes_\n", "\n", "A user can have multiple SSH keys and can be watching multiple repositories. It's obviously wrong to store all these in a single relation, but let's hold on to this example just for a moment.\n", "\n", "Please do not mind the omission of login data in the example and truncated display of SSH keys — this is to avoid making the table even wider. Please also treat $ssh\\_key$ as if it were an atomic attribute, even though we see a way to decompose it.\n", "\n", "There's clearly no dependency between the SSH keys and watched repositories. For each user we shall therefore repeat user's every SSH key with user's every watched repository, in separate rows / tuples. As when computing a cartesian product.\n", "\n", "Temporarily remove the 2 middle rows of Theodore. Note that now the relation instance indicates some connection between Theodore's particular key and watched repository. This is not what we want, so we need to get back to the cartesian-product-like instance.\n", "\n", "Note that there are no nontrivial functional dependencies involving $ssh\\_key$, $watched\\_repo\\_name$, and $watched\\_repo\\_owner$.\n", "\n", "The relation schema, although bad, is in BCNF. We have not yet learned any formal rule we could use to decompose it." ] }, { "cell_type": "markdown", "id": "9495194a", "metadata": {}, "source": [ "We have the following **multivalued dependencies**.\n", "\n", "$name \\twoheadrightarrow ssh\\_key$\n", "\n", "$name \\twoheadrightarrow \\{watched\\_repo\\_name, watched\\_repo\\_owner\\}$" ] }, { "cell_type": "markdown", "id": "bb5b6793", "metadata": {}, "source": [ "_notes_\n", "\n", "The first dependency means that the value of $name$ is always associated with the same set of values of $ssh\\_key$, regardless of the values of other attribtues.\n", "\n", "The second dependency means that the value of $name$ is always associated with the same set of value tuples of $\\{watched\\_repo\\_name, watched\\_repo\\_owner\\}$, regardless of the values of other attributes.\n", "\n", "Our instance satisfies both dependencies. If we remove the middle 2 of Theodore's rows, it shall satisfy neither.\n", "\n", "The mathematical notation for multivalued dependency constraint can be found in literature." ] }, { "cell_type": "markdown", "id": "137a3d4f", "metadata": {}, "source": [ "More could be derived, for example the following.\n", "\n", "$email \\twoheadrightarrow \\{watched\\_repo\\_name, watched\\_repo\\_owner\\}$\n", "\n", "$\\{ssh\\_key, watched\\_repo\\_name\\} \\twoheadrightarrow ssh\\_key$ — a trivial one" ] }, { "cell_type": "markdown", "id": "035dbf6a", "metadata": {}, "source": [ "### 4NF\n", "\n", "Fourth Normal Form." ] }, { "cell_type": "markdown", "id": "88a61c3d", "metadata": {}, "source": [ "Let\n", "\n", "$R$ — set of relation attributes\n", "\n", "$M^+$ — set of all functional **and** multivalued dependencies with attributes in $R$\n", "\n", "A relation schema is in 4NF if and only if it is in BCNF, and\n", "\n", "$\\forall (\\alpha \\twoheadrightarrow \\beta){\\in}M^+: \\beta{\\subseteq}\\alpha \\lor \\alpha{\\cup}\\beta = R \\lor \\alpha\\ is\\ a\\ superkey$" ] }, { "cell_type": "markdown", "id": "5ea649af", "metadata": {}, "source": [ "_notes_\n", "\n", "Note that every superkey of our sample relation schema must contain all three: $ssh\\_key$, $watched\\_repo\\_name$, and $watched\\_repo\\_owner$. Sample candidate key is $\\{name, ssh\\_key, watched\\_repo\\_name, watched\\_repo\\_owner\\}$.\n", "\n", "The dependency $name \\twoheadrightarrow ssh\\_key$ is not trivial and its LHS is not a superkey.\n", "\n", "Similarly, the dependency $name \\twoheadrightarrow \\{watched\\_repo\\_name, watched\\_repo\\_owner\\}$ is not trivial and its LHS is not a superkey.\n", "\n", "The relation schema is in BCNF but it is not in 4NF." ] }, { "cell_type": "markdown", "id": "55ee9e3f", "metadata": {}, "source": [ "### Decomposition to 4NF Schemas" ] }, { "cell_type": "markdown", "id": "0769a7de", "metadata": {}, "source": [ "If we have a nontrivial multivalued dependency $\\alpha \\twoheadrightarrow \\beta$ that violates 4NF, then we can decompose the relation schema into two separate schemas:\n", "\n", "$\\alpha{\\cup}\\beta$\n", "\n", "$R{-}(\\beta{-}\\alpha)$" ] }, { "cell_type": "markdown", "id": "268228f4", "metadata": {}, "source": [ "_notes_\n", "\n", "This is analogous to our rule of decomposition to BCNF schemas." ] }, { "cell_type": "markdown", "id": "64a283c9", "metadata": {}, "source": [ "In our case:\n", "\n", "$R = \\{name, email, ssh\\_key, watched\\_repo\\_name, watched\\_repo\\_owner\\}$\n", "\n", "$\\alpha = name$\n", "\n", "$\\beta = ssh\\_key$\n", "\n", "We get the following decomposed relation schemas.\n", "\n", "$account\\_ssh\\_key(\\alpha{\\cup}\\beta) = account\\_ssh\\_key(name, ssh\\_key)$\n", "\n", "$account'(R{-}(\\beta{-}\\alpha)) = account'(name, email, watched\\_repo\\_name, watched\\_repo\\_owner)$" ] }, { "cell_type": "markdown", "id": "f13ae7bb", "metadata": {}, "source": [ "The $account'$ schema is still not in 4NF, so we can decompose it furter and get the following 4NF schemas and sample relation instances.\n", "\n", "$account\\_ssh\\_key(name, ssh\\_key)$\n", "\n", "$account\\_watch(name, watched\\_repo\\_name, watched\\_repo\\_owner)$\n", "\n", "$account''(name, email)$\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameemail
wkosiorwkosior@agh.edu.pl
wk-botwk-bot@gnu.org
theodore543210@student.agh.edu.pl
pancratius500000@student.agh.edu.pl
\n", "\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namessh_key
wkosiorssh-ed25519 AAAAC3…dyItC
wkosiorssh-ed25519 AAAAC3…ujdSO
wk-botssh-ed25519 AAAAC3…ujdSO
theodoressh-ed25519 AAAAC3…T7EjR
theodoressh-rsa AAAAB3…PucE=
pancratiusssh-dss AAAAB3…0yxM=
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namewatched_repo_namewatched_repo_owner
wkosiorguixguix
wk-botapache-configswkosior
theodoredb-lecture-materialswkosior
theodoreguixtim448352
pancratiusapache-configswkosior
\n", "
" ] }, { "cell_type": "markdown", "id": "986cca34", "metadata": {}, "source": [ "### Another 4NF Example" ] }, { "cell_type": "markdown", "id": "270ecc35", "metadata": {}, "source": [ "Consider this schema, dependencies, and sample instance.\n", "\n", "$repo\\_role\\_user(repo\\_name, owner\\_name, role\\_id, branch, privilege, user)$\n", "\n", "$\\{repo\\_name, owner\\_name, user\\} \\rightarrow role\\_id$\n", "\n", "$role\\_id \\rightarrow \\{repo\\_name, owner\\_name\\}$\n", "\n", "$role\\_id \\twoheadrightarrow \\{branch, privilege\\}$\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
repo_nameowner_namerole_idbranchprivilegeuser
db-lecture-materialswkosior342magisterpulltheodore
db-lecture-materialswkosior342join-diagram-wippushtheodore
db-lecture-materialswkosior342theodore-experimentsforcepushtheodore
db-lecture-materialswkosior377magisterpullpancratius
guixtim448352392trunkpullpancratius
guixtim448352392trunkpulleugenia
" ] }, { "cell_type": "markdown", "id": "75f2cc0a", "metadata": {}, "source": [ "We first decompose to the following BCNF schemas.\n", "\n", "$user\\_role(user, role\\_id, branch, privilege)$\n", "\n", "$role\\_repo(role\\_id, repo\\_name, owner\\_name)$\n", "\n", "Then we further decompose to 4NF schemas.\n", "\n", "$role\\_repo(role\\_id, repo\\_name, owner\\_name)$\n", "\n", "$role(role\\_id, branch, privilege)$\n", "\n", "$user\\_role'(user, role\\_id)$\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
role_idbranchprivilege
342magisterpull
342join-diagram-wippush
342theodore-experimentsforcepush
377magisterpull
392trunkpull
\n", "\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
role_idrepo_nameowner_name
342db-lecture-materialswkosior
377db-lecture-materialswkosior
392guixtim448352
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
userrole_id
theodore342
pancratius377
pancratius392
eugenia392
\n", "
" ] }, { "cell_type": "markdown", "id": "32bf37df", "metadata": {}, "source": [ "_notes_\n", "\n", "These are relation instances that correspond to the initial instance." ] }, { "cell_type": "markdown", "id": "fd282530", "metadata": {}, "source": [ "### Notes on Database Design" ] }, { "cell_type": "markdown", "id": "104e6a5a", "metadata": {}, "source": [ "_notes_\n", "\n", "Other normal forms exist that are not covered here.\n", "\n", "One can design a database by\n", "\n", "- listing all needed relation attributes and relevant dependencies and normalizing the design starting with a single big relation,\n", "- creating an (E)ER model and mapping it to relational model, or\n", "- designing the desired relations ad-hoc and (possibly) later testing if they are in 3NF / 4NF.\n", "\n", "We have to choose between avoiding data redundancy (BCNF, 4NF) and ensuring dependency preservation (3NF).\n", "\n", "There is no simple, straightforward way to enforce functional dependencies with SQL constraints. BCNF (4NF) is therefore often preferred to 3NF.\n", "\n", "We are likely to obtain a 4NF database schema if we come from an (E)ER model.\n", "\n", "Although redundancy is often undesirable, some forms of redundancy can be consciously included in the design for practical reasons. The same is the case with multivalued and composite attributes." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "/gnu/store/q35bxk1i5blx0rcgxphhrq9xqmgha9bz-python-3.11.11/bin/python3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }