Friday, February 24, 2012

Design with the best space allocation.

We are getting ready to setup the gov's death file on a sql server. I would
like thoughts on the table design to optimize space. This is the data spec's
https://dmf.ntis.gov/recordlayout.pdf
The master file doesn't have the first column. So here is where I was headed
for the main table. The column's with asterisk are possibly empty. The
file updates will be done based off SSN column.
CREATE TABLE [dbo].[tblDMFile] (
[SSN] [char] (9) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[LastName] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
*[NameSuffix] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[FirstName] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
*[MiddleName] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
*[StatusCode] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[DeathDate] [smalldatetime] NULL ,
[BirthDate] [smalldatetime] NULL ,
*[ResidentCode] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
*[ZipLastResident] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
,
*[ZipLumpSumPaymt] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblDMFile] WITH NOCHECK ADD
CONSTRAINT [PK_tblDMFile] PRIMARY KEY CLUSTERED
(
[SSN]
) ON [PRIMARY]
GO
CREATE INDEX [IX_tblDMFile] ON [dbo].[tblDMFile]([LastName], [FirstName])
ON [PRIMARY]
GO
TIA
CDMy understanding (could be wrong) is that SSNs of deceased persons are
reused, (or can be resused) so, if that is in fact true, then your use of SS
N
as single column Primary Key might be an issue. If true, you could add
DeathDate as well, (make it non null) I would assume since this is Death
File, every record will have to have a Death Date?
CONSTRAINT [PK_tblDMFile] PRIMARY KEY CLUSTERED
([DeathDate],[SSN]) ON [PRIMARY]
That will handle the issue...
Also, suggest you put DeathDate first in the Index, and make this the
Clustered Index (on DeatHDate, SSN), since this will ensure that new Inserts
will generally be added to new Pages on disk, not randomly distributed
throughout the physical storage... Putting SSN as first column in PK index,
means that records would be physically ordered on SSN, and new inserts will
be distributed among all the physical pages on disk, causing many page split
s
and rapid table and index fragmentation.
"CD" wrote:

> We are getting ready to setup the gov's death file on a sql server. I wou
ld
> like thoughts on the table design to optimize space. This is the data spec
's
> https://dmf.ntis.gov/recordlayout.pdf
> The master file doesn't have the first column. So here is where I was head
ed
> for the main table. The column's with asterisk are possibly empty. The
> file updates will be done based off SSN column.
> CREATE TABLE [dbo].[tblDMFile] (
> [SSN] [char] (9) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
> [LastName] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> *[NameSuffix] [varchar] (4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [FirstName] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> *[MiddleName] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> *[StatusCode] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> [DeathDate] [smalldatetime] NULL ,
> [BirthDate] [smalldatetime] NULL ,
> *[ResidentCode] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
> *[ZipLastResident] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
> ,
> *[ZipLumpSumPaymt] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
> ) ON [PRIMARY]
> GO
> ALTER TABLE [dbo].[tblDMFile] WITH NOCHECK ADD
> CONSTRAINT [PK_tblDMFile] PRIMARY KEY CLUSTERED
> (
> [SSN]
> ) ON [PRIMARY]
> GO
> CREATE INDEX [IX_tblDMFile] ON [dbo].[tblDMFile]([LastName], [FirstName])
> ON [PRIMARY]
> GO
> TIA
> CD
>
>|||Thanks for the reply. That is a good point about SSN resuage(possibly)
1) To clarify the table schema is good for the best Least space usage
2) ALTER TABLE [dbo].[tblDMFile] WITH NOCHECK ADD
CONSTRAINT [PK_tblDMFile] PRIMARY KEY CLUSTERED
([DeathDate], [SSN] ) ON [PRIMARY]
3) CREATE INDEX [IX_tblDMFile] ON [dbo].[tblDMFile]([DeathDate],
[LastName], [FirstName])
ON [PRIMARY]
I am guessing most of the searches will be off the SSN then probably least
likely LastName...
Thanks again.
"CBretana" <cbretana@.areteIndNOSPAM.com> wrote in message
news:A19A88F8-A183-4116-98F2-24BF3666352E@.microsoft.com...
> My understanding (could be wrong) is that SSNs of deceased persons are
> reused, (or can be resused) so, if that is in fact true, then your use of
> SSN
> as single column Primary Key might be an issue. If true, you could add
> DeathDate as well, (make it non null) I would assume since this is Death
> File, every record will have to have a Death Date?
> CONSTRAINT [PK_tblDMFile] PRIMARY KEY CLUSTERED
> ([DeathDate],[SSN]) ON [PRIMARY]
> That will handle the issue...
> Also, suggest you put DeathDate first in the Index, and make this the
> Clustered Index (on DeatHDate, SSN), since this will ensure that new
> Inserts
> will generally be added to new Pages on disk, not randomly distributed
> throughout the physical storage... Putting SSN as first column in PK
> index,
> means that records would be physically ordered on SSN, and new inserts
> will
> be distributed among all the physical pages on disk, causing many page
> splits
> and rapid table and index fragmentation.
>
> "CD" wrote:
>|||CD,
The only other suggestion I can make, related to minimizing disk storage,
which will only save you 8bytes per record, is based on my assumption that
you are not storing time data in either Birthdate or death date... Therefore
,
by using SmallDateTime, which uses 2bytes for date, and 2 bytes for time, yo
u
are wasting the 2 time bytes. One way avoid this is to use your own
propietary "Date" data type, based on SmallInt, where the value of the
smallInt
(which is from -2^15 (-32,768) through 2^15 - 1 (32,767) represents the
Date as number of days since some arbitrary Date. If you Cast a
smalldateTime to a SmallInt, you will get the same value based on 1 Jan 1900
,
If you want to use the same values as the 2 bytes in a SmallDateTime Value,
you must remember that the 2 date bytes in a smalldatetime represent integer
s
from 0 to 65535, (UNSIGNED 2-byte Integer), not from -32,768 to 32,767.
So, if you want to take this approach, first you need to decide what range
of dates you want to be able to represent... And to do this in 2 bytes, you
need to limit it to 65536 dates.. If you want to use the same range of dates
as SmallDateTime
(1 Jan 1900 - 6 Jun 2079)
Then the conversion from a SmalldateTime value (say it's in Variable @.SDT)
to the smallint you need to store in your table will be
Cast (Cast (@.SDT as Integer) - 32768 As SmallInt)
1) Cast SmallDateTime as Integer (it's in range 0 - 65535)
2) Subtract 32,768 (now it's in range - 32,768 to 32,7687)
3) Cast it as SmallInt (This not absoutely necessay as Insert into SmallInt
Column will auto convert an integer)
And the conversion from the smallint in the table (say Colname is
"DeathDt"), to the SmalldateTime value will be
Cast(DeathDt + 32768 As SmallDateTime)
Add 32,768 (to make it positive (in range from 0 - 65535) and then cast it
to smalldateTime
By the way, you can add Calculated columns to the table definition, based
exactly on that latter formula [ Cast(DeathDt + 32768 As SmallDateTime) ]
that will not increase on disk data storage, yet output the exact
SmallDateTime value exactly as you would if you stored it in the the table a
s
a SmallDateTime...
"CD" wrote:

> Thanks for the reply. That is a good point about SSN resuage(possibly)
> 1) To clarify the table schema is good for the best Least space usage
> 2) ALTER TABLE [dbo].[tblDMFile] WITH NOCHECK ADD
> CONSTRAINT [PK_tblDMFile] PRIMARY KEY CLUSTERED
> ([DeathDate], [SSN] ) ON [PRIMARY]
> 3) CREATE INDEX [IX_tblDMFile] ON [dbo].[tblDMFile]([DeathDate],
> [LastName], [FirstName])
> ON [PRIMARY]
> I am guessing most of the searches will be off the SSN then probably least
> likely LastName...
> Thanks again.
>
> "CBretana" <cbretana@.areteIndNOSPAM.com> wrote in message
> news:A19A88F8-A183-4116-98F2-24BF3666352E@.microsoft.com...
>
>|||Never actually researched this one, Thanx Michael... Another urban legend
bites the dust...
"Michael C#" wrote:

> According to the Social Security Administration (www.ssa.gov), SSN's are N
OT
> re-assigned after someone dies. However, some people are not assigned SSN
's
> (Pennsylvania Amish spring immediately to mind), so SSN might be NULL for
> some folks, which could cause issues. I'm facing a similar issue with
> trying to identify people uniquely in a database right now myself.
> --Quote From SSA.GOV Website:
> Question:
> Are Social Security Numbers re-assigned after a person dies?
> Answer:
> No. We do not reassign a Social Security number (SSN) after the
> number holder's death. Even though we have issued over 415 million SSNs so
> far, and we assign about 5 and one-half million new numbers a year, the
> current numbering system will provide us with enough new numbers for sever
al
> generations into the future with no changes in the numbering system.
>
> --End Quote
> "CBretana" <cbretana@.areteIndNOSPAM.com> wrote in message
> news:A19A88F8-A183-4116-98F2-24BF3666352E@.microsoft.com...
>
>|||Sorry, not 8bytes per record, only 2 byes/record, total only 4 bytes saved..
.
"CBretana" wrote:
> CD,
> The only other suggestion I can make, related to minimizing disk storage,
> which will only save you 8bytes per record, is based on my assumption that
> you are not storing time data in either Birthdate or death date... Therefo
re,
> by using SmallDateTime, which uses 2bytes for date, and 2 bytes for time,
you
> are wasting the 2 time bytes. One way avoid this is to use your own
> propietary "Date" data type, based on SmallInt, where the value of the
> smallInt
> (which is from -2^15 (-32,768) through 2^15 - 1 (32,767) represents the
> Date as number of days since some arbitrary Date. If you Cast a
> smalldateTime to a SmallInt, you will get the same value based on 1 Jan 19
00,
> If you want to use the same values as the 2 bytes in a SmallDateTime Value
,
> you must remember that the 2 date bytes in a smalldatetime represent integ
ers
> from 0 to 65535, (UNSIGNED 2-byte Integer), not from -32,768 to 32,767.
> So, if you want to take this approach, first you need to decide what range
> of dates you want to be able to represent... And to do this in 2 bytes, yo
u
> need to limit it to 65536 dates.. If you want to use the same range of dat
es
> as SmallDateTime
> (1 Jan 1900 - 6 Jun 2079)
> Then the conversion from a SmalldateTime value (say it's in Variable @.SDT)
> to the smallint you need to store in your table will be
> Cast (Cast (@.SDT as Integer) - 32768 As SmallInt)
> 1) Cast SmallDateTime as Integer (it's in range 0 - 65535)
> 2) Subtract 32,768 (now it's in range - 32,768 to 32,7687)
> 3) Cast it as SmallInt (This not absoutely necessay as Insert into SmallIn
t
> Column will auto convert an integer)
>
> And the conversion from the smallint in the table (say Colname is
> "DeathDt"), to the SmalldateTime value will be
> Cast(DeathDt + 32768 As SmallDateTime)
> Add 32,768 (to make it positive (in range from 0 - 65535) and then cast i
t
> to smalldateTime
> By the way, you can add Calculated columns to the table definition, based
> exactly on that latter formula [ Cast(DeathDt + 32768 As SmallDateTime) ]
> that will not increase on disk data storage, yet output the exact
> SmallDateTime value exactly as you would if you stored it in the the table
as
> a SmallDateTime...
>
> "CD" wrote:
>|||Thanks for the advice,
The date is currently a problem in its current format (MM,DD,CC,YY)
01301980. I have the data loaded (dates are in varchar) and plan to run a
update to move the year to front the change the column type.
I have not test but hope this will work.
set Deathdate = Right(Deathdate, 4) + Left(Deathdate, 4)
"CBretana" <cbretana@.areteIndNOSPAM.com> wrote in message
news:17A3FFDD-9F3F-4266-9B28-9F57F729BC07@.microsoft.com...
> Sorry, not 8bytes per record, only 2 byes/record, total only 4 bytes
> saved...
> "CBretana" wrote:
>|||Yes that should work...
but cast or convert the result to smalldateTime as well...
Cast(Right(Deathdate, 4) + Left(Deathdate, 4) As SmallDateTime)
"CD" wrote:

> Thanks for the advice,
> The date is currently a problem in its current format (MM,DD,CC,YY)
> 01301980. I have the data loaded (dates are in varchar) and plan to run a
> update to move the year to front the change the column type.
> I have not test but hope this will work.
> set Deathdate = Right(Deathdate, 4) + Left(Deathdate, 4)
> "CBretana" <cbretana@.areteIndNOSPAM.com> wrote in message
> news:17A3FFDD-9F3F-4266-9B28-9F57F729BC07@.microsoft.com...
>
>|||On Wed, 9 Mar 2005 12:14:46 -0600, CD wrote:
(snip)
> The column's with asterisk are possibly empty.
(snip)
Hi CD,
Then why are the columns without asterisk not declared as NOT NULL?
* Prevents data corruption
* Since you want the best space allocation: would save you one byte per
row (up to 8 nullable columns take a one-byte bitmap to hold the
NULL/NOT NULL control bits; 9-16 nullable columns add a second byte to
the NULL/NOT NULL bitmap, etc.)
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment