Showing posts with label thousands.

Wednesday, March 7, 2012

Designing reusable column transformations?

We receive thousands of files every week from various clients, and we clean the columns using the same techniques over and over so the data is consistent. The problem is that I don't see a way to reuse complex column transformations in different packages. I would hate to have to go and change every package if we change the rules for cleaning a column.

So #1: Can you create some kind of script or .NET function that cleans a column and reuse it in multiple packages (or even in the same package)?

#2: Is it possible to call functions from the Derived Column expression builder?

Thanks!

1) Maybe. You can always put the transformation rules in a Script Component and copy and paste it as many times as needed, if that sounds reasonable to you.

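2) No, you cannot. Derived Column expressions are limited to the built-in functions and operators of the SSIS expression language.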
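The copy-and-paste approach in 1) is exactly what the question worries about: every pasted copy has to be edited when a rule changes. The alternative is to define each rule once in shared code that every package calls. SSIS script and custom components are written in .NET, so the Python sketch below only illustrates that design, not any SSIS API; `clean_name`, `clean_phone`, `RULES`, and `clean` are hypothetical names.

```python
import re

# Hypothetical cleaning rules, each defined exactly once.
# Callers import these instead of keeping their own copies,
# so changing a rule here changes it for every caller.

def clean_name(value: str) -> str:
    """Trim, collapse internal whitespace, and upper-case a name."""
    return re.sub(r"\s+", " ", value.strip()).upper()

def clean_phone(value: str) -> str:
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", value)

# Registry so a caller can select a rule by name, e.g. from configuration.
RULES = {
    "name": clean_name,
    "phone": clean_phone,
}

def clean(rule: str, value: str) -> str:
    """Apply the named rule; raises KeyError for an unknown rule name."""
    return RULES[rule](value)

print(clean("name", "  acme   corp "))   # ACME CORP
print(clean("phone", "(555) 123-4567"))  # 5551234567
```

Because `RULES` maps names to functions, a caller could pick the rule from configuration rather than hard-coding it.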


Craig, I have had to do the same thing. The best solution is not a Script Component, which the person above suggested.

Take the hit and build a custom component. Once it is built, adding a new column is as simple as adding that column to the transform.
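The benefit described above, where a new column only needs a new entry in the component's configuration, can be sketched as a transform that holds a column-to-rule mapping. This is a language-neutral illustration in Python, not the SSIS pipeline component API; `CleaningTransform`, `add_column`, and `process` are hypothetical names, and the built-in `str.upper` and `str.strip` stand in for real cleaning rules.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]
Rule = Callable[[str], str]

class CleaningTransform:
    """Applies one shared rule per configured column to each row.

    The rules are written once; each pipeline only supplies the
    column-to-rule mapping, so handling a new column means adding
    one entry rather than writing new transformation code.
    """

    def __init__(self, column_rules: Dict[str, Rule]) -> None:
        self.column_rules = dict(column_rules)

    def add_column(self, column: str, rule: Rule) -> None:
        """Register (or replace) the rule for one column."""
        self.column_rules[column] = rule

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        """Yield cleaned copies of the rows; columns without a rule pass through."""
        for row in rows:
            cleaned = dict(row)  # leave the input row untouched
            for column, rule in self.column_rules.items():
                if column in cleaned:
                    cleaned[column] = rule(cleaned[column])
            yield cleaned

transform = CleaningTransform({"customer": str.upper})
transform.add_column("code", str.strip)  # a new column is one line of configuration
print(list(transform.process([{"customer": "acme", "code": " a1 ", "note": "x"}])))
# [{'customer': 'ACME', 'code': 'a1', 'note': 'x'}]
```

Returning copies rather than mutating the input keeps the transform free of side effects, which makes it easier to reuse across pipelines.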

You will want to build a custom pipeline component. Both Wrox and O'Reilly have some decent chapters describing how to do this, and there is definitely information available from Microsoft.

I have 350 packages that all convert times from European time to US time in the data flow. By writing one component, once, I was then able to delegate the rest of the work, since the hard part was encapsulated in that component.
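The post does not say exactly what "European time to US time" means. Assuming it is a format change from day-first, 24-hour timestamps to month-first, 12-hour ones, the encapsulated rule might look like the following Python sketch. The format strings and `european_to_us` are assumptions, not taken from the post; if the conversion is between time zones instead, the rule would need zone-aware logic rather than a reformat.

```python
from datetime import datetime

# Assumed formats; adjust to whatever the source files actually use.
EUROPEAN_FORMAT = "%d/%m/%Y %H:%M"   # e.g. 31/12/2011 18:30 (day first, 24-hour)
US_FORMAT = "%m/%d/%Y %I:%M %p"      # e.g. 12/31/2011 06:30 PM (month first, 12-hour)

def european_to_us(value: str) -> str:
    """Convert one day-first, 24-hour timestamp string to US style.

    Raises ValueError if the input does not match EUROPEAN_FORMAT,
    so bad rows surface instead of being silently reinterpreted.
    """
    parsed = datetime.strptime(value.strip(), EUROPEAN_FORMAT)
    return parsed.strftime(US_FORMAT)

print(european_to_us("31/12/2011 18:30"))  # 12/31/2011 06:30 PM
print(european_to_us("07/03/2012 09:05"))  # 03/07/2012 09:05 AM
```

Failing loudly on unparseable input matters here: a value like 07/03/2012 is valid in both formats, so a rule that guessed would corrupt dates silently.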

Good luck.

Tuesday, February 14, 2012

Design and Performance Question

Hi,

I have a basic DB design question.

First, my application has hundreds of customers, all of whom will be adding thousands of records each day.

My question is, because of the eventual size of some of these tables, should I have a separate DB for each customer or should I design one DB to hold all customers?

I am worried about some of the massive joins that will eventually take place. Also, what if I have to restore just one customer's data?

So what should I do, one massive DB, or one DB per customer?

Thanks for your input!

SQL Server is quite capable of handling tables with millions of rows of data. Properly indexed and maintained, performance should not be a big problem. Managing hundreds of databases, however, could become quite unwieldy.
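To illustrate "properly indexed" for a single shared database: tag every row with its customer and index on that column first, so a per-customer query reads only that customer's rows. The sketch uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the `orders` table and `ix_orders_customer` index are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; the design point is the same:
# one shared table, every row tagged with its customer, and an index
# that leads with customer_id so per-customer queries stay selective.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL,
           created_at  TEXT    NOT NULL,
           amount      REAL    NOT NULL
       )"""
)
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id, created_at)")

# Three customers, ten orders each.
rows = [(c, f"2012-02-{d:02d}", 10.0 * d) for c in range(1, 4) for d in range(1, 11)]
conn.executemany(
    "INSERT INTO orders (customer_id, created_at, amount) VALUES (?, ?, ?)", rows
)

query = "SELECT created_at, amount FROM orders WHERE customer_id = ? ORDER BY created_at"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (2,)).fetchall()
# The last column of each plan row is a human-readable description of the step.
plan_text = " ".join(str(step[-1]) for step in plan)
print(plan_text)

orders_for_customer_2 = conn.execute(query, (2,)).fetchall()
print(len(orders_for_customer_2))  # 10
```

The printed plan should name `ix_orders_customer`, showing that the per-customer query is served by the index rather than a full table scan.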

The only reason I'd separate customers into their own databases would be for security reasons, in the event that each customer directly accessed their data through some web-based application. In that case it's more secure to keep each customer in its own database, preventing inadvertent access to another customer's data.
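The isolation argument can be sketched as a router that hands each customer a connection to its own database, so even a query with no customer filter cannot return another customer's rows. In-memory `sqlite3` databases stand in for separate SQL Server databases, and `PerCustomerDatabases` is a hypothetical helper. It also shows the management cost raised in the answer: every customer adds another database to create, track, and close.

```python
import sqlite3
from typing import Dict

class PerCustomerDatabases:
    """Hypothetical router giving each customer its own database.

    A connection obtained for one customer has no access to any other
    customer's tables, so a forgotten WHERE clause cannot leak data
    across customers.
    """

    def __init__(self) -> None:
        self._connections: Dict[str, sqlite3.Connection] = {}

    def connect(self, customer: str) -> sqlite3.Connection:
        """Return this customer's connection, creating its database on first use."""
        if customer not in self._connections:
            conn = sqlite3.connect(":memory:")  # each call opens a separate database
            conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
            self._connections[customer] = conn
        return self._connections[customer]

    def close_all(self) -> None:
        """Close every customer database (one of many per-database chores)."""
        for conn in self._connections.values():
            conn.close()
        self._connections.clear()

dbs = PerCustomerDatabases()
dbs.connect("acme").execute("INSERT INTO records (payload) VALUES ('acme data')")
dbs.connect("globex").execute("INSERT INTO records (payload) VALUES ('globex data')")

# Even an unfiltered query sees only the calling customer's rows.
print(dbs.connect("acme").execute("SELECT payload FROM records").fetchall())
# [('acme data',)]
```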