I need your help designing a complex AS solution.
Here's the story -
My application is a hosted BI solution meant to serve several customers on a single machine.
Most of my customers' data is not sharable / there is no reason to share it (for instance, customer1's products have no intersection with customer2's products).
There are a few small dimensions that are common to everyone - for instance, the geographic and time dimensions.
I was thinking of two options to approach such a problem, and I’d like to hear your opinion on them:
1. One DB for several (~10) customers (meaning VERY big dimensions, >15M members).
Using a lot of WHERE [Customers].[CustomerX] and NON EMPTY clauses in every query to programmatically separate the data.
Slicing partitions per customer and month.
One connection string.
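To make option 1 concrete, a query under that design might look like the following MDX sketch. The cube, dimension, and measure names here are placeholders I am assuming, not names from your actual schema:

```mdx
-- Hypothetical names: [SharedCube], [Sales Amount], [Products], [Customers]
-- Every query must carry the customer slicer to separate the data.
SELECT
  NON EMPTY { [Measures].[Sales Amount] } ON COLUMNS,
  NON EMPTY { [Products].[Product].Members } ON ROWS
FROM [SharedCube]
WHERE ( [Customers].[Customer].&[CustomerX] )
```

The WHERE slicer plus NON EMPTY is what keeps one customer's rows out of another's result set - which is exactly the part that is easy to get wrong if any query omits it.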
2. A separate DB for each customer (meaning more complex maintenance).
The Products dimension (for example) is always specific to a single customer.
Using a different connection string for each customer, with all dimensions physically separated into different files.
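Under option 2, each customer resolves to its own catalog, so the only per-customer switch is the connection string. A sketch, with placeholder server and database names:

```
Customer1: Provider=MSOLAP;Data Source=ssas-server;Initial Catalog=BI_Customer1;
Customer2: Provider=MSOLAP;Data Source=ssas-server;Initial Catalog=BI_Customer2;
```

The application picks the string from the logged-in tenant, and no query-level filtering is needed for isolation.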
More details to take into consideration:
The application consists of 10 dimensions, 3 of them VERY big: ~3M members per customer.
As for partitions - a few tens of millions of fact rows per customer per month.
It's difficult to say what is best; this is not really a typical application.
I would tend to lean away from option 1. Using a single connection string and trying to filter manually would be error-prone, and if users were able to execute any sort of ad-hoc queries they might be able to circumvent your "security". I would instead use multiple connections and set up some sort of role-based security.
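For the role-based approach, SSAS dimension security restricts a role to an allowed set of members, defined as an MDX set expression on the role. A minimal sketch, assuming a [Customers] dimension with a [Customer] attribute (placeholder names):

```mdx
-- Allowed member set entered on the role's Dimension Data definition
-- for the hypothetical role "CustomerX Readers":
{ [Customers].[Customer].&[CustomerX] }
```

The server then enforces the restriction for every query that role issues, rather than relying on each query to include the right WHERE clause.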
If all or most of the dimensions were shared, I would say put the data in one cube (with plenty of partitions) and use security roles to restrict access. If nothing significant can be shared, I would think that multiple databases would be better, but this is not a situation I have had to deal with myself.
I have the following issues with a single database solution:
What happens if you need to scale out to multiple SSAS servers? With Multiple DBs it would be relatively easy to shift some of them to another server.
What about backup/restore? SSAS 2005 has issues where backups take exponentially longer as the database gets larger.
What happens when corrupt/incorrect data gets loaded for one customer and you need to re-process? Do you want to take down all customer access or just that to the affected customer?
If the bulk of the data is not shared, you are not saving much memory by consolidating multiple customers into one database.
Formulas might need to be calculated across the entire cube subspace, regardless of the customer, before a NON EMPTY can be applied, significantly reducing performance.
If you have products from multiple customers in one cube you could get clashes where different customers have different products with the same name.