Sunday, February 19, 2012

Design question (CELKO?)

Hi all,
I have a design question:
I have a bunch of users (name, address, zip, etc.). They are assigned a
card with a specific id.
The only thing unique is this card id, or probably the combination of
all other user fields.
So it's seductive to use the card id as the primary key.
This card allows access to certain places and all access is logged.
The problem is, a user might lose a card and will be issued a new one
with a different id. Perhaps it may even be desirable to assign
mutliple cards with different ids to one person, but let's put that
scenario aside for a moment.
Issuing a new card to a user could mean updating the card id with a new
one, e.g.:
UPDATE Users SET CardId=124 WHERE CardId=123
But same would have to be done with all 'logs' that refer to the lost
card id.
And some kind of logging of this change needs to occur to maintain some
'history'.
Consistency could be handled with triggers or something..
Doesn't seem like a good solution.
It seems obvious that I should seperate user from card id, where card
id is no longer primary key, rather just a field within the users
table, or if multiple cards can be assigned to the same user, a
seperate table for cards mapped to user records.
So then how do I uniquely identify a user? It seems easy enough to
simply use an IDENTITY column for this, however, I do agree with
'CELKO's' (sp?) principle that best design is one where data in the
database actually represents something in the real world. In particular
when backing up, reinserting and other things could goof up
(re-generate) these id's.
And there's my dilemma. How do I uniquely identify each user record
consisting of just name, address, zip, city and perhaps phone number?
There are no other id's that represent the user, such as social
security number or whatever.
Actually, the card is going to be assigned to a household (home
address), so the combination of house number, street and zip is
probably unique but using this as a primary key would make all queries
quite cumbersome, compared to some id.
Is this a case where an identity (aka auto_increment) is inevitable or
even highly recommended?
Another alternative I thought of was to use the card id as primary key,
but have the same user record appear in the users table multiple times,
but with a different card id, where each card id might actally be
'blocked' or not. This blocked cards can then be deleted once the
records in other tables that refer to that id are also deleted, such as
logs. Sounds ugly as this introduces some redundancy of user records,
but might actually be acceptable I think, because logs are deleted
every now and then and lost cards will be rare. Not sure if I should
design the database with the 'rule' or 'exception' in mind.
Yes, I'm a rookie.. and I don't mind being chewed out for it, as long
as the lecture includes something useful I can work with and learn from
:)
Lisa> It seems obvious that I should seperate user from card id, where card
> id is no longer primary key, rather just a field within the users
> table, or if multiple cards can be assigned to the same user, a
> seperate table for cards mapped to user records.
This seems to me to be the best approach. Not only will this allow you to
have multiple cards per user, this approach will allow you to easily keep
history related to lost cards along with referential integrity. The primary
key of such a UserCards table would be a composite consisting of the Users
and Cards table PK values.
I'll let Joe chew you out for even considering surrogate keys. Personally I
have no problem introducing a surrogate key (identity or guide) when no
suitable one exists as long as one takes reasonable precautions to avoid
duplicate data. However, one shouldn't introduce an artificial key without
due diligence. Far too many surrogate keys are a result of lazy developers.
Hope this helps.
Dan Guzman
SQL Server MVP
"Lisa" <reageer@.yahoo.com> wrote in message
news:1133345102.826916.134290@.z14g2000cwz.googlegroups.com...
> Hi all,
> I have a design question:
> I have a bunch of users (name, address, zip, etc.). They are assigned a
> card with a specific id.
> The only thing unique is this card id, or probably the combination of
> all other user fields.
> So it's seductive to use the card id as the primary key.
> This card allows access to certain places and all access is logged.
> The problem is, a user might lose a card and will be issued a new one
> with a different id. Perhaps it may even be desirable to assign
> mutliple cards with different ids to one person, but let's put that
> scenario aside for a moment.
> Issuing a new card to a user could mean updating the card id with a new
> one, e.g.:
> UPDATE Users SET CardId=124 WHERE CardId=123
> But same would have to be done with all 'logs' that refer to the lost
> card id.
> And some kind of logging of this change needs to occur to maintain some
> 'history'.
> Consistency could be handled with triggers or something..
> Doesn't seem like a good solution.
> It seems obvious that I should seperate user from card id, where card
> id is no longer primary key, rather just a field within the users
> table, or if multiple cards can be assigned to the same user, a
> seperate table for cards mapped to user records.
> So then how do I uniquely identify a user? It seems easy enough to
> simply use an IDENTITY column for this, however, I do agree with
> 'CELKO's' (sp?) principle that best design is one where data in the
> database actually represents something in the real world. In particular
> when backing up, reinserting and other things could goof up
> (re-generate) these id's.
> And there's my dilemma. How do I uniquely identify each user record
> consisting of just name, address, zip, city and perhaps phone number?
> There are no other id's that represent the user, such as social
> security number or whatever.
> Actually, the card is going to be assigned to a household (home
> address), so the combination of house number, street and zip is
> probably unique but using this as a primary key would make all queries
> quite cumbersome, compared to some id.
> Is this a case where an identity (aka auto_increment) is inevitable or
> even highly recommended?
> Another alternative I thought of was to use the card id as primary key,
> but have the same user record appear in the users table multiple times,
> but with a different card id, where each card id might actally be
> 'blocked' or not. This blocked cards can then be deleted once the
> records in other tables that refer to that id are also deleted, such as
> logs. Sounds ugly as this introduces some redundancy of user records,
> but might actually be acceptable I think, because logs are deleted
> every now and then and lost cards will be rare. Not sure if I should
> design the database with the 'rule' or 'exception' in mind.
> Yes, I'm a rookie.. and I don't mind being chewed out for it, as long
> as the lecture includes something useful I can work with and learn from
> :)
> Lisa
>|||So, if I just have Name, address, zip, city and no more, should I use an
IDENTITY column for primary key? If you had a create a table of people of
whom you only know the above information, what would you use for primary
key?
And what if the natural primary key is a long string? While queries might be
fast due to indexing, typing out this long string in each query, rather than
using numbers for id, seems quite cumbersome. So what would you do in this
situation?
Lisa
"Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
news:ehSzG6Z9FHA.1332@.tk2msftngp13.phx.gbl...
> This seems to me to be the best approach. Not only will this allow you to
> have multiple cards per user, this approach will allow you to easily keep
> history related to lost cards along with referential integrity. The
> primary key of such a UserCards table would be a composite consisting of
> the Users and Cards table PK values.
> I'll let Joe chew you out for even considering surrogate keys. Personally
> I have no problem introducing a surrogate key (identity or guide) when no
> suitable one exists as long as one takes reasonable precautions to avoid
> duplicate data. However, one shouldn't introduce an artificial key
> without due diligence. Far too many surrogate keys are a result of lazy
> developers.
> --
> Hope this helps.
> Dan Guzman
> SQL Server MVP
> "Lisa" <reageer@.yahoo.com> wrote in message
> news:1133345102.826916.134290@.z14g2000cwz.googlegroups.com...
>|||On 30 Nov 2005 02:05:02 -0800, Lisa wrote:

>Hi all,
>I have a design question:
(snip)
Hi Lisa,
Here's my opinion.
First, badges and employees are different entities. I don't like
combining distinct entities in one table. There might be a one-to-one
relationship now, but that might change in the future (your own posts
already hints at this possibility).
Second, all tables should always have a "business key" (I prefer this
term over "natural key" for reasons beyond the scope of this reply). In
your case, the combination of house number, street and zip would be the
"business key", as indicated by this:

>Actually, the card is going to be assigned to a household (home
>address), so the combination of house number, street and zip is
>probably unique but using this as a primary key would make all queries
>quite cumbersome, compared to some id.
Third, to make queries and foreign keys easier, you can add a surrogate
key to the table. This surrogate key should be generated by the DB (eg
using IDENTITY). It's values should be hidden from the end user. For the
end user, the business key identifies individual households. The
surrogate key is only used "under the hood".
Fourth - if you add a surrogate key, you'll still have to make sure that
the uniqueness constraint imposed by the business key won't be violated.
That means that you'll have to add a UNIQUE constraint for the business
keyif you make the surrogate key the PRIMARY KEY. (You can reverse this
as well - make the business key the PRIMARY KEY, and put a UNIQUE
constraint on the surrogate key).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hi Lisa,
You might benefit from reading the articles at these links. They provide
some perspective and rationalle on the topic which you might find useful:
http://www.dbpd.com/vault/9805xtra.htm
http://www.bcarter.com/intsurr1.htm
Types" target="_blank">http://www.ssw.com.au/SSW/Standards...ey
Types
And just for fun... here's something on SOX
http://www.phptr.com/articles/print...p?p=415980&rl=1
Enjoy!
-Jeff
"Lisa" <reageer@.yahoo.com> wrote in message
news:1133345102.826916.134290@.z14g2000cwz.googlegroups.com...
> Hi all,
> I have a design question:
> I have a bunch of users (name, address, zip, etc.). They are assigned a
> card with a specific id.
> The only thing unique is this card id, or probably the combination of
> all other user fields.
> So it's seductive to use the card id as the primary key.
> This card allows access to certain places and all access is logged.
> The problem is, a user might lose a card and will be issued a new one
> with a different id. Perhaps it may even be desirable to assign
> mutliple cards with different ids to one person, but let's put that
> scenario aside for a moment.
> Issuing a new card to a user could mean updating the card id with a new
> one, e.g.:
> UPDATE Users SET CardId=124 WHERE CardId=123
> But same would have to be done with all 'logs' that refer to the lost
> card id.
> And some kind of logging of this change needs to occur to maintain some
> 'history'.
> Consistency could be handled with triggers or something..
> Doesn't seem like a good solution.
> It seems obvious that I should seperate user from card id, where card
> id is no longer primary key, rather just a field within the users
> table, or if multiple cards can be assigned to the same user, a
> seperate table for cards mapped to user records.
> So then how do I uniquely identify a user? It seems easy enough to
> simply use an IDENTITY column for this, however, I do agree with
> 'CELKO's' (sp?) principle that best design is one where data in the
> database actually represents something in the real world. In particular
> when backing up, reinserting and other things could goof up
> (re-generate) these id's.
> And there's my dilemma. How do I uniquely identify each user record
> consisting of just name, address, zip, city and perhaps phone number?
> There are no other id's that represent the user, such as social
> security number or whatever.
> Actually, the card is going to be assigned to a household (home
> address), so the combination of house number, street and zip is
> probably unique but using this as a primary key would make all queries
> quite cumbersome, compared to some id.
> Is this a case where an identity (aka auto_increment) is inevitable or
> even highly recommended?
> Another alternative I thought of was to use the card id as primary key,
> but have the same user record appear in the users table multiple times,
> but with a different card id, where each card id might actally be
> 'blocked' or not. This blocked cards can then be deleted once the
> records in other tables that refer to that id are also deleted, such as
> logs. Sounds ugly as this introduces some redundancy of user records,
> but might actually be acceptable I think, because logs are deleted
> every now and then and lost cards will be rare. Not sure if I should
> design the database with the 'rule' or 'exception' in mind.
> Yes, I'm a rookie.. and I don't mind being chewed out for it, as long
> as the lecture includes something useful I can work with and learn from
> :)
> Lisa
>|||> So, if I just have Name, address, zip, city and no more, should I use an
> IDENTITY column for primary key? If you had a create a table of people of
> whom you only know the above information, what would you use for primary
> key?
I would use a surrogate key as the primary key and also create a unique
constraint on those 4 columns. I'd probably use an identity column as the
surrogate key and, like Hugo said, not expose it to end users.

> And what if the natural primary key is a long string? While queries might
> be fast due to indexing, typing out this long string in each query, rather
> than using numbers for id, seems quite cumbersome. So what would you do in
> this situation?
Hopefully, end users aren't writing SQL queries or entering the entire
primary key the hard way. Usually, a front-end application does the heavy
lifting by allowing the user to search based on easily-entered data like
postal code, partial strings and pick lists.
Hope this helps.
Dan Guzman
SQL Server MVP
"Lisa Pearlson" <no@.spam.plz> wrote in message
news:OhqTEWf9FHA.1844@.TK2MSFTNGP11.phx.gbl...
> So, if I just have Name, address, zip, city and no more, should I use an
> IDENTITY column for primary key? If you had a create a table of people of
> whom you only know the above information, what would you use for primary
> key?
> And what if the natural primary key is a long string? While queries might
> be fast due to indexing, typing out this long string in each query, rather
> than using numbers for id, seems quite cumbersome. So what would you do in
> this situation?
> Lisa
> "Dan Guzman" <guzmanda@.nospam-online.sbcglobal.net> wrote in message
> news:ehSzG6Z9FHA.1332@.tk2msftngp13.phx.gbl...
>|||You need to tie a card to a person, and have the ability re-issue or
cancel cards.
CREATE TABLE UsersCards
(user_id CHAR(35) NOT NULL
REFERENCES Users(user_id),
card_id CHAR(10) NOT NULL
REFERENCES Cards(cards_id),
issue_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
cancel_date DATETIME, -- null is current
CHECK (issue_date < cancel_date),
PRIMARY KEY (user_id, card_id),
. );
You will need a card_id with a check digit and a person_is with an
external verification.
as the lecture includes something useful I can work with and learn from
<<
I am sorry not to be more abusive tonight, but I have two trips next
w and two consulting jobs this w. I promise to do better.
Then use a hashing algorithm, a Census Bureau household number, or
other household ids from commercial sources.|||Celko,
I thought about using hashing.. I'm sure there are many out there, but I was
considering MD5.
However, I'm guessing the likelyhood that the hash of 2 different records
(with different data) in a database of a few million, being equal, is too
big. There's no guarantee in place that it won't.
I also notice that you use underscores in your field names.. pretty,
readable, recommended or just a matter of taste?
I've seen many times that databases are named "nameDB" and tables "TBLname"
and fields "FLDname".
Is that stupid, or useful?
Lisa
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1133484712.397103.187830@.g43g2000cwa.googlegroups.com...
> You need to tie a card to a person, and have the ability re-issue or
> cancel cards.
> CREATE TABLE UsersCards
> (user_id CHAR(35) NOT NULL
> REFERENCES Users(user_id),
> card_id CHAR(10) NOT NULL
> REFERENCES Cards(cards_id),
> issue_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
> cancel_date DATETIME, -- null is current
> CHECK (issue_date < cancel_date),
> PRIMARY KEY (user_id, card_id),
> .. );
> You will need a card_id with a check digit and a person_is with an
> external verification.
>
> as the lecture includes something useful I can work with and learn from
> <<
> I am sorry not to be more abusive tonight, but I have two trips next
> w and two consulting jobs this w. I promise to do better.
>
> Then use a hashing algorithm, a Census Bureau household number, or
> other household ids from commercial sources.
>

No comments:

Post a Comment