Thursday, July 2, 2009

What’s in a name?

I’ve worked on database projects for over 20 years. I go back to FoxBASE and dBase. I’ve also been through a number of naming conventions in that time. I’ve developed a personal preference but can work with other preferences because I can usually figure them out fairly quickly.

Here’s two rules I use as part of my database designs:

  1. Use a surrogate or artificial key as the primary key to avoid the need for cascading deletes or updates.
  2. Use the natural name for each element so that elements are easily identifiable when in discussions with the user.

I can’t tell you how much time I have saved over the years with rule 2. Users have no idea of your database design (and they shouldn’t!) but they know exactly which piece of information they want. It’s a whole lot easier if they can write “client’s birth date” in a report request and in the Clients table there is a field called BirthDate.

My problem is lately, I’ve noticed that a lot of people seem to be using a convention that drives me nuts - the use of ElementNameID as the primary key. This is the standard in a lot of the Microsoft examples.

What's the big deal? In one of our systems we have a table named Clients that holds…wait for it..a client. That client has an ID that is assigned from a legacy mainframe system and of course it’s called ClientID. This element has become a part of the users culture and is used to identify clients and it’s natural name is Client ID in user discussions.

“So just make it your primary key and you’re fine.” Not so fast. Remember, this is coming from a legacy system, in this case a 25 year old system. In those days disk space was expensive and you squeezed everything you could into a field. Specifically, the first 6 positions are the family group and the last two differentiate between individuals.

My problem here is that since the value has internal meaning, someone, somewhere will want to change this value. For this reason, I always use artificial or surrogate keys (see rule 1). No worries about cascading updates or orphaned rows.

Ok, so now my clients will have two values named ClientID. One the actual key and the other a legacy key. I could use an underscore for one, i.e. Client_ID. I could simply rename the legacy value ClientIDLegacy. Either one means that everyone who looks at my data will have to figure out what I did.

Worse, the natural name and the database name are different, violating rule 2. I could use Client_ID for the key instead but that means I need to change all of my other tables to match or live with the inconsistency (meaning remember to use the underscore on this one table).

My solution? It’s very simple, ClientKey. The element we are describing is a key and not an ID. To me, ID (or identifier) is something to be consumed by humans. Keys are for systems. I really can’t remember something being called “key” in a non-IT environment.

It may seem trivial, but I can think of half a dozen cases of the top of my head where you’ll run into this. It may not get you on this system but how about the next one? Now you have two systems with two different conventions. Now you need to remember what you did in one and not in the other. After 20 years, I’m limiting what I need to remember only to those things I can’t control <g>!

No comments: