Friday, February 24, 2012

Design question: static data & custom data & merging the two

OVERVIEW:
We have "normal" application data, like Company, and we have custom/override data that can override the "normal" data. We need the ability to switch back and forth between normal data and custom data and also merge the two (with custom data overriding any normal data fields).

BASIC QUESTION:
Is it best to set a flag in each table saying whether the row is custom or override OR
is it best to have a Normal table and a Custom/Override table?

MORE DETAIL IF NEEDED..........
Company Data
CompanyName - CompanyCity - Company Address
(Normal/Default Data) Microsoft - Seattle - 1 Microsoft Way
(Custom/Override Data) Microsoft - Atlanta - 1 Peachtree Way

So, the first row would show up in the application, unless the second row existed and at that point, the second row would be merged with the first row. Also would need the ability to switch back to the Normal row if the Custom row is deleted. Hope this makes sense.This seems like an unusual requirement, however, there a options available to implement it. Could you elaborate what 'merge' really does in this case? Aldo, do you have an idea of how common the 'custom' data is going to be? Is it a small percentage of rows in the table? Is it predictable at all? Is there only one 'custom' version of a particular row or you can have multiple 'custom' rows per every 'normal' row?|||"what 'merge' really does in this case"

We have a lot of applications running on customer's PC's (around 40,000), and each of the applications has the same base set of data, for example they all start out with a listing of Products. However, a couple thousand of the apps has the ability to go in and modify some of the data (custom data) for particular users. So the merge would override the base data with any custom data that has been created.

how common?

Well, it's only going to be available on a small set of applications, but it will be maybe 10% of data will be custom.

Is it predictable?
Yes, in the sense that we do know which fields are overridable/custom.

There is only on custom version of a particular row.

Hope this makes sense, let me know if I need to clarify.|||

craig11 wrote:

BASIC QUESTION:

Is it best to set a flag in each table saying whether the row is custom or override OR

is it best to have a Normal table and a Custom/Override table?

With respect to querying such data:

If none of the 'mergeable' columns are nullable than it would either be self-left-joins if the 'custom' rows are in the same table, or left-joins if you have separate tables for the 'custom' data. Plus you would also need to coalesce the columns from the 'normal' rows and the 'custom' rows using the corresponding function. So, the separate tables or the same table - it would basically be the same T-SQL and the bottom line is that cost-wise It would not be that much different assuming that you also make sure that use have useful indices in both cases. Using your example:

SELECT
COALESCE (Custom.[Company Name], Normal.[Company Name] as [Company Name],
COALESCE (Custom.[Company City], Normal.[Company City] as [Company City],
COALESCE (Custom.[Company Address], Normal.[Company Address]) as [Company Address]
FROM [Company Data] Normal LEFT JOIN [Company Data] Custom ON
Normal.[Company Id] = Custom.[Company Id]
AND Normal.[Flag] = 'Normal' AND Custom.[Flag] = 'Custom'

Or

SELECT
COALESCE (Custom.[Company Name], Normal.[Company Name] as [Company Name],
COALESCE (Custom.[Company City], Normal.[Company City] as [Company City],
COALESCE (Custom.[Company Address], Normal.[Company Address]) as [Company Address]
FROM [Company Data - Normal] Normal LEFT JOIN [Company Data - Custom] Custom ON
Normal.[Company Id] = Custom.[Company Id]

If some of the columns are nullable, then the things become a tiny bit more complicated, but still cost-wise it would not really have any effect.

With respect to updating such data:

If the ‘normal’ data for you deployed application is read-only most of the time then, you are likely better off using a separate table for the normal data residing on a dedicated read-only filegroup. Assuming that you would need to periodically deliver the changes in ‘normal’ data to the deployed applications, it could also be easily done by switching the filegroup to read-write mode for the duration of the sync procedures.

If the read-mostly model is not adequate, then, again there won‘t be much f a difference, except you will probably need a little more space for the flag column if you go with the same table approach.

You could also somewhat combine the two by partitioning your table and use two separate partitions instead of the two tables.

However, depending on actual data distributions and the update and concurrency patters, the variations in the index fragmentation could introduce additional considerations.

No comments:

Post a Comment