Sunday, February 19, 2012

Design Question

I am struggling to decide which the following two design is better choice in terms of performance:

Here is exisiting system has table call SalesDetail, which has typical sales amount, sales cost, etc measures, it also includes field like locationCode (varchar), companyCode(varchar). There is identity key field call TransID, which acts like GUID, all other tables in the system has this TransID.

1) One design is there are "lookup" table in the system such as location (locationCode, locationname), company (companyCode, companyName), so I can have my factSalesDetail with all exsiting field, I can join locationCode to the location "Lookup" table, creating one-many relationship, essentially I use these "lookup" table as Dimension such as dimLocation, etc. This gives us wider FactTable since it is included all "locationCode", etc.

2) Or I can use GUID "TransID" of SalesDetail Fact, to join prebuild CompanyTable in datamart, which has all Location, Companyname, code there. so this gives me shorter factTable (only including true measurement), but deeper and bigger DimensionTable with more records than "lookup" dimensionTable"

The factTable is quite large (26 million records), which design is better in terms of performance?

If the transaction's relationship to location is through company, then my advice would be to model it that way unless it would be confusing to end-users.

Regarding performance, look into partitioning. HOLAP may be another interesting option for you, but I would try to stick with MOLAP if you can get away with it. Watch the processing steps to grab the SQL and make sure your relational database is tuned to help those queries run as quickly as possible. Also, look into replace any GUID fields used for joins with integer values as this will reduce space which could give you a performance boost (by reducing the number of data pages needing to be read).

Good luck,
Bryan

|||

Thanks Bryan for your advice. So basically you think Design Mode 2) could be good route to go, which use GUID like TrasID join prebuilt Company table includes all information regarding the company. I actually test both choices, Design 2 has smaller size and prcessing little bit faster, currently it is about 20-22 minutes.

I also would like to ask your advice on processing, the whole database has 15 cubes but salesDetail accounts for half of the size and processing time. What do you think the better processing strategy? Currently I setup Full Process for whole AS DB using Cube Processing Task at SSIS, it takes about 36-42 minutes. There is also Process Default which seems doing nothing. Should I process the biggest cube in this case salesTransaction separately?

I try to do ProcessData to this Cube, and I check "Prcess Affected Object" but it still gives me error:"Errors in the OLAP storage engine: The attribute key cannot be found". since new record has been added in the dimension table. Do I miss something, I though if you choose "Prcess Affected Object", OLAP engine should be smart enough to figure it out.

|||

If there is time to pull it off, I go with full processing. It's just cleaner and easier to execute. But again, you have to have the time to do it.

Regarding "Process Affected Objects", I have used this in the context of processing a dimension and having associated partitions being automatically reprocessed. It may be that it only works that one direction. (Kinda makes sense actually. If I processed a partition but instructed SSAS to first process any associated dimensions, I could invalidate a bunch of other partitions associated with those dimensions.)

So, one question I have is do you have 15 cubes or 15 measure groups in one cube?

Thanks,
Bryan

|||

Hi Bryan, thanks again for your advice. for now I think I can pull it off by doing full process since it only takes around 40 minutes. I have 15 cubes, not measure groups. in terms of size, I have 5 cubes have facttable records about 24 million, 15 million, 9.4 million, 9.2 million and 3 million, all others are about half million records.

One thing I cannot do is partition since we are using standard edition of SQL 2005.

No comments:

Post a Comment