Learning to Love Foreign Keys

This was adapted from a piece i wrote for work, for co-workers

Putting the Relations in a Relational Database

A database is relational collection of data points. Data points that are conceptually linked are combined into a set that we call a record (or sometimes a set, or sometimes a tuple, if you are talking to the theory types). A record might be a single object such as a person (with data points such as name, address, and date of birth) or a charge on a ledger (with data points such as description, cost, and who to bill). These sets of data points we call records are combined into a bigger set we call a table. These tables are comined into an even bigger set we call a database. It’s sets all the way down. Without all this relational linking, you may as well just use a stupid spreadsheet or something.

So we have all these records in all these tables, how do we connect data from one table to another? You have your table of charges, but how do we know to whom each charge belongs? You can have a column in your charge table that says “yeah this charge belongs to derek” but how do you know which Derek? What if Derek changes their name? Before we solve this mystery together, let’s take a step back to talk about keys.

Keys

A primary key is a data point that is used to uniquely identify a record on a table. There are a few different ways to create a primary key. Firstly, there is a natural key. A natural key is a data point that is already used in the real world. A common example of this is a social security number; this is a unique number that identifies an individual; in your database your person record might have a field for an SSN, and this might seem like a natural fit for a primary key. There is also the surrogate key. This is a key that doesn’t have any actual meaning, outside of representing that row in the table. There are also composite keys, keys that are composed of multiple data points, for example, last name + SSN.

There’s a lot of debate out there as to which is best to use, natural or surrogate keys; all you need to know is that people who suggest that natural keys are best, are inherently wrong. Natural keys are created by humans, and humans are inevitably falliable. Names are not unique, and can change. Social Security Numbers are occasionally unintentionally non-unique, and only 4.48% of humans have one. Surrogate IDs, created automatically for the sole purpose of identifying a record, is the correct thing to use in your database.

So now with a brief exploration of keys out of the way, let’s get back to the matter of tying tables together. In our imaginary charge table, we need to know to which person the charge belongs. We could just have a column named person_id and plan on putting the appropriate id in there, but what if we put in the wrong number, or what if the person gets accidentally deleted, and now we have a charge record tied to nobody? Ideally, we would enforce some constraints to prevent these things from happening.

That’s where the foreign key comes in. We tell the database management system that this column in this table, its data points are going to explicitly refer to these corresponding data points in this other table, tying the two records together. In our example of the charge and person tables, with such a constraint in place, the charge table’s person_id column can only contain an person_id that exists in the person table’s person_id column, or, if allowed, a NULL value indicating that it is unknown to which person the charge belongs.

Putting It to Use

Let’s look at our example tables below, first the charge table and then the person table:

charge_iddescriptioncostperson_id
1fancy hat$321
2electric toothbrush$123
3saxophone$1794
person_idfirst_namelast_name
1SylviaCarter
2TraceySchroeder
3RumikaEchi
4EdwardJordan

If we were doing some basic data delving, we would start out getting a list of the items we’ve sold, with a SELECT charge_id, description, cost, person_id FROM charge;, and see our first result set. If we wanted to know who purchased that nice saxophone, well, we have their person_id, and thanks to the constraint in place, we can be certain that barring any data entry errors on the behalf of our staff, we can find out who purchased it, with a simple SELECT first_name, last_name FROM person WHERE person_id = 4;, and we know that it was Edward Jordan who must be the aspiring jazz musician.

Of course, if you’ve already learned about the wonderful world of JOINs, you would know that we could find this information out with a single query, by entering SELECT charge_id, description, cost, first_name, last_name FROM charge JOIN person ON charge.person_id = person.person_id;.

Another benefit from this foreign key relationship is that someone would not be able to delete Sylvia’s person record accidentally, thus leaving the fancy hat order behind as an orphan; the database management system would prevent that with an error. You would have to first UPDATE or DELETE the record in the charge table, to remove Sylvia’s record in the person table.

How Do I Find Out What References My Column?

One last note; oftentimes when trying to remove a record in a table in a larger database, there can be a great number of things that reference that record, that need to be changed or removed prior to removing that record. The database management system maintains a live database of every database, table, and column it uses, called information_schema. In particular, the KEY_COLUMN_USAGE table in this database keeps track of every foreign key, and can be queried to discover what columns in other tables might contain foreign keys to your column in question. The TABLE_SCHEMA column will tell you which database a foreign key relationship is in, TABLE_NAME and COLUMN_NAME will tell you which table and column is doing the referring, and REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, and REFERENCED_COLUMN_NAME will tell you to which database’s table’s column is being referred. So, in my orders database above, if i wanted to know what all refers to my person table so that i can delete a record in that table, i couls query the information_schema database with SELECT TABLE_NAME, COLUMN_NAME FROM KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_NAME = 'person' AND REFERENCED_COLUMN_NAME = 'person_id' AND TABLE_SCHEMA = 'orders';. Very handy indeed!

I hope you enjoyed this exploration of foreign keys, and as always, feel free to ask me if you have any questions.

jwiltshire Written by:

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *