
Sunday, March 25, 2012

determine a role

hi,

since I am kind of new to SQL, I would prefer to get advice from you pros: I created an application which accesses a database on a SQL Server instance. The application will be used by a few different users, each on a different computer. The application calls stored procedures, updates/inserts records in tables on the SQL Server, and deletes rows. What would be the best role to define for the users' activity? How do I limit their activity ONLY to the specified actions?

For the tasks you mentioned, typically there will be permissions at the granularity you want (for example EXECUTE on a stored procedure, INSERT and UPDATE on a table, etc.), but I would strongly suggest referring to BOL to learn more about this topic. A good starting point can be

“Security Considerations for Databases and Database Applications” (http://msdn2.microsoft.com/en-us/library/ms187648(SQL.90).aspx).
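As a rough illustration (the role, table, and procedure names below are hypothetical, not from your application), a custom database role that grants only the needed permissions could look like this:

-- Create a role that carries only the permissions the application needs
CREATE ROLE app_users;

-- Allow execution of the specific stored procedures the application calls
GRANT EXECUTE ON dbo.usp_ProcessOrder TO app_users;

-- Allow only the DML the application performs on this table
GRANT INSERT, UPDATE, DELETE ON dbo.Orders TO app_users;

-- Add each user's database principal to the role
EXEC sp_addrolemember 'app_users', 'AppUser1';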

I hope this helps, but let us know if you have any further questions.

-Raul Garcia

SDE/T

SQL Server Engine

Friday, February 24, 2012

Design Solution Required

We are facing design issues. Could you please advise us how to proceed?

Problem description: the web app will pass a complex dynamic SQL query to
the backend, and it should return the result set as fast as it can.
Issue 1: the SQL query will have a lot of JOINs and WHERE clauses.
Issue 2: each table contains millions of records.

Requirement: the turnaround time of the SQL query should be kept to a
minimum.

Could you please advise us which technology we should use, so that
users get the result set in a few seconds.

We are a Microsoft Partner. We use only Microsoft technology for our
product development.

Your help is much appreciated.

With Regards
Sathya R

> Could you please advise us which technology we should use, so that users get the result set in a few seconds.
Pay particular attention to index and query tuning. Make sure you have
indexes that the optimizer can use to generate the most efficient plan.
Prioritize tuning so that the most often executed and expensive queries are
addressed first. Also consider indexed views, which are especially
appropriate for aggregated data. Keep in mind that too many indexes can
hurt performance if you do a lot of inserts/updates so you'll need to
perform cost-benefit analysis.
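For example, an indexed view that pre-aggregates data might look like the sketch below (table and column names are hypothetical, and the summed column is assumed non-nullable, as indexed views require):

-- Hypothetical indexed view that pre-aggregates order amounts per customer
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT CustomerID,
       COUNT_BIG(*)     AS OrderCount,
       SUM(OrderAmount) AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerID;
GO

-- The unique clustered index is what materializes (and maintains) the view
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerID);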

I suggest you get a good book that covers query and index tuning in depth.
I recommend Inside Microsoft SQL Server 2005: T-SQL Querying, ISBN
9780735623132.

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Sathya" <sathyamca@.gmail.comwrote in message
news:1164801200.132687.7070@.j72g2000cwa.googlegrou ps.com...

Quote:

Originally Posted by

We are facing design issues, Could you please advice us how to proceed?
>
Problem description: Web App will pass a complex dynamic SQL query to
backend and it should return result set as fast as it can
Issue 1: SQL query will have lot of JOINS and WHERE clause
Issue 2: Each Table contain millions of records
>
Requirement: Turn around time of the SQL query should be as far as
possible minimum.
>
>
Could you please advice us which technology we should use, such that
users get the resultset in few seconds.
>
We are Microsoft Partner. We use only Microsoft technology for our
product development.
>
>
Your Help is much appreciated
>
With Regards
S a t h y a R
>

|||Sathya wrote:

> <snip>

Use sp_executesql to execute your dynamic SQL (not EXEC). Even better,
try to use a prepared statement. In your queries, make sure to use the
indexes, avoid calling functions and do not sort in SQL unless it is
absolutely necessary (sort on client side instead).
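For instance (the table, column, and parameter names below are only illustrative), a parameterized call lets the server cache and reuse the plan:

-- Parameterized dynamic SQL: the plan can be cached and reused
DECLARE @sql nvarchar(4000);
SET @sql = N'SELECT OrderID, OrderDate, TotalAmount
             FROM dbo.Orders
             WHERE CustomerID = @CustomerID AND OrderDate >= @FromDate';

EXEC sp_executesql
     @sql,
     N'@CustomerID int, @FromDate datetime',
     @CustomerID = 42,
     @FromDate = '20061101';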

You could also save typical queries and run them through the Database
Tuning Advisor, which will suggest how to index your tables. This
wizard is available with SQL Server 2005 in the Management Studio, but
it can help to tune SQL Server 2000 databases as well.

If you can afford it, use SQL 2005 Enterprise Edition, which will allow
you to partition your tables. Partitions can greatly improve speed.
Again save a typical query and run it through the Database Tuning
Advisor, which can suggest how to create optimal partitions.
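A bare-bones sketch of range partitioning (the function, scheme, table, and boundary values here are all made up for illustration):

-- Hypothetical yearly range partitioning for a large table
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('20050101', '20060101', '20070101');

CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.OrdersPartitioned
(
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL,
    CONSTRAINT PK_OrdersPartitioned PRIMARY KEY (OrderDate, OrderID)
) ON psOrderYear (OrderDate);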

The Database Tuning Advisor is just awesome, but of course if your queries are
completely random and different it won't be of much help, since it needs
a specific workload to make suggestions.

Regards,
lucm

Design Question!

Hi,
I need some advice on the best way to do something; here's the explanation.
I have a request from the powers above me to design a solution that will do
the following:
from telecom CDRs (Call Data Records), record the destination
telephone number, the date, and whether it was evening, daytime or weekend into a
different table. If that number already exists, update the date to the
current date and update the counters for "evening, daytime or weekend".
The average size of the input file is around 2 million records plus. My question
is that I have 2 possible solutions, but I am unsure of the right one to use:
1. use a function within a cursor
2. use a trigger
My thoughts are that idea 1 will be very resource hungry, but I am unsure about
the use of a trigger, as the data could be either updated or inserted. Can
somebody please point me in the right direction?
I hope this makes sense.
Many thanks
Simon Whale

Why do you need to use a trigger or a cursor? You should be able to use
BCP, DTS (or SSIS if 2005) or BULK INSERT to bulk load the file into a
staging table. From there you can use set-based processing to update the
other tables. You probably want to do the updates in smaller batches of, say,
10K at a time to minimize blocking etc.
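For illustration only (the file name, table names, and columns below are made up), the staging approach might look like this - note this is SQL 2000/2005 era, so separate UPDATE and INSERT statements rather than MERGE:

-- 1. Bulk load the raw CDR file into a staging table
BULK INSERT dbo.StagingCDR
FROM 'D:\feeds\cdr_today.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- 2. Set-based update of numbers that already exist
UPDATE d
SET    d.LastCallDate = s.LastCallDate,
       d.EveningCalls = d.EveningCalls + s.EveningCalls,
       d.DaytimeCalls = d.DaytimeCalls + s.DaytimeCalls,
       d.WeekendCalls = d.WeekendCalls + s.WeekendCalls
FROM   dbo.DestinationSummary AS d
JOIN  (SELECT DestinationNumber,
              MAX(CallDate) AS LastCallDate,
              SUM(CASE WHEN Period = 'E' THEN 1 ELSE 0 END) AS EveningCalls,
              SUM(CASE WHEN Period = 'D' THEN 1 ELSE 0 END) AS DaytimeCalls,
              SUM(CASE WHEN Period = 'W' THEN 1 ELSE 0 END) AS WeekendCalls
       FROM   dbo.StagingCDR
       GROUP BY DestinationNumber) AS s
  ON   s.DestinationNumber = d.DestinationNumber;

-- 3. Set-based insert of numbers seen for the first time
INSERT dbo.DestinationSummary
       (DestinationNumber, LastCallDate, EveningCalls, DaytimeCalls, WeekendCalls)
SELECT DestinationNumber, MAX(CallDate),
       SUM(CASE WHEN Period = 'E' THEN 1 ELSE 0 END),
       SUM(CASE WHEN Period = 'D' THEN 1 ELSE 0 END),
       SUM(CASE WHEN Period = 'W' THEN 1 ELSE 0 END)
FROM   dbo.StagingCDR AS s
WHERE  NOT EXISTS (SELECT * FROM dbo.DestinationSummary AS d
                   WHERE d.DestinationNumber = s.DestinationNumber)
GROUP BY DestinationNumber;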
Andrew J. Kelly SQL MVP
"Simon Whale" <s.whale@.nospam.dsl.pipex.com> wrote in message
news:usZf%23HvcGHA.1260@.TK2MSFTNGP05.phx.gbl...
> <snip>

Sunday, February 19, 2012

Design Question - SQL 2000 to 2005 App Upgrade (CLR?)

I'm upgrading an app I wrote from SQL 2000 to 2005 and am looking for some
high-level design advice, just to determine how much should be T-SQL vs.
CLR, and the best approach to take for performance. For background, I'm a
strong C# developer, but with only average SQL skills, and I'm just getting
into SQL 2005.
My Current (SQL 2000) Setup:
--
My C# app regularly collects performance information, then inserts it into a
table. On insert in that table I have a trigger to kick off a couple stored
procs that compare the new data to other tables where thresholds are set.
As a simple example, it essentially does a "SELECT * FROM Thresholds WHERE
@.NewlyInsertedValue > ThresholdValue" to determine if the current value is
in violation of any thresholds. (There are a couple more columns that are
considered that makes the query a little more intensive, but this is the
general idea)
If thresholds are violated, it inserts a record into an 'alerts' table and
generates an email (by another stored proc). If no thresholds are violated,
it runs a stored proc to clear any alerts that might have been created
previously since we're not in violation anymore. Since this happens EVERY
time a new value is collected, I really want to optimize this as much as
possible.
My Desired (SQL 2005) Setup:
--
I'm going to replace certain obvious functions like the "send email"
function from old SQL objects to .NET CLR objects. However, I want to
further optimize the system that compares incoming data with thresholds and
generates alerts as necessary. It seems very wasteful to do selects every
time a new value is inserted, and as the tables grow even with proper
indexing it is going to get expensive. Is there a way I can take advantage
of CLR to maybe keep these thresholds in memory to compare them against?
Any other ideas about how to accomplish this most efficiently?
Craig
PS - I can post DDL if you really want it, I was just trying to get high
level design ideas from someone that's done a lot of SQL CLR work already.

For general data manipulation - especially inserts, updates, deletes and
normal selects - T-SQL will always be the best choice. The CLR is useful for
heavy calculations/aggregations or when you simply can't do something with
T-SQL. Of course, anytime you need to visit the filesystem or need the
equivalent of an extended stored procedure, the CLR is prime for that as
well. As for the mail component, you can use the new Database Mail, which
works pretty well and is a lot less troublesome than the MAPI version in
2000. But in general I suggest you use stored procedures for this instead
of triggers. You have a fair amount of processing to do and you should keep
triggers as trim as possible. I don't see any reason this logic cannot be
put into a standard T-SQL stored procedure. But as for checking the values,
why not do that in the front end or middle tier before you even attempt the
insert? That kind of information is pretty much static and can be cached
very easily in the middle tier or front end. If these inserts are many, you
can save a lot of processing etc. on the back end by validating the data
before you send it in.
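A minimal sketch of moving that logic into a standard procedure (all object and column names here are invented for illustration, not from your schema):

-- Insert the new value and evaluate thresholds in one procedure call
CREATE PROCEDURE dbo.usp_InsertPerfValue
    @CounterID int,
    @Value     decimal(18, 2)
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.PerfValues (CounterID, Value, CollectedAt)
    VALUES (@CounterID, @Value, GETDATE());

    IF EXISTS (SELECT *
               FROM dbo.Thresholds
               WHERE CounterID = @CounterID
                 AND @Value > ThresholdValue)
    BEGIN
        -- Violation: record an alert (notification could be sent here too)
        INSERT dbo.Alerts (CounterID, Value, RaisedAt)
        VALUES (@CounterID, @Value, GETDATE());
    END
    ELSE
    BEGIN
        -- No violation: clear any alerts previously raised for this counter
        DELETE dbo.Alerts WHERE CounterID = @CounterID;
    END
END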
Andrew J. Kelly SQL MVP
"Craig S" <cscheets@.dontspam.kc.rr.com> wrote in message
news:OIG$CfXDGHA.1544@.TK2MSFTNGP10.phx.gbl...
> <snip>
|||First, you're right - everything I'm doing can be done in T-SQL (It is today
in SQL 2000). I just wanted to see if CLR would offer any optimization.
Secondly, I think you're right about caching the data and performing the
checks in the middle tier for violations (I'm using web services). From
what I understand I can actually subscribe to changes to a table/view in
2005 so that I could cache that info and just rebuild it any time the
thresholds table was updated. The thresholds data is pretty static, so this
could definitely work.
I guess this is basically what I was trying to do, but for some reason
thought it needed to live in the db. Logic is always best left in the
middle/front tier anyway I guess.
So, good call Andrew. Anyone else have anything to add/recommend?
Craig
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:O9eKD9XDGHA.3444@.TK2MSFTNGP10.phx.gbl...
> <snip>
|||
Craig S (cscheets@.dontspam.kc.rr.com) writes:
> My C# app regularly collects performance information, then inserts it
> into a table. On insert in that table I have a trigger to kick off a
> couple stored procs that compare the new data to other tables where
> thresholds are set. As a simple example, it essentially does a "SELECT *
> FROM Thresholds WHERE @.NewlyInsertedValue > ThresholdValue" to determine
> if the current value is in violation of any thresholds. (There are a
> couple more columns that are considered that makes the query a little
> more intensive, but this is the general idea)
As long as you have full control over the data, using a stored procedure
to insert is better. Triggers are more difficult to write, and there
is one ugly thing about them: they can disappear without notice. If
a stored procedure is dropped inadvertently, your application will
cry out, but if the trigger is missing, you will only get incorrect
results.

> It seems very wasteful to do selects every time a new value is inserted,
> and as the tables grow even with proper indexing it is going to get
> expensive. Is there a way I can take advantage of CLR to maybe keep
> these thresholds in memory to compare them against?
Maybe there is, but it would be a bad idea. Don't worry, that data
is in memory already. SQL Server takes care of that, and keeps as
much data in cache as possible.

> I guess this is basically what I was trying to do, but for some reason
> thought it needed to live in the db. Logic is always best left in the
> middle/front tier anyway I guess.
Personally, I'm a strong believer in having the business logic where
the database is.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Design Question

Hi

I am trying to implement an inventory control system and would like
some advice on the best design for it.

The system will have two main tables, Product and Stock, which will look
as follows:

Product
---
ProductId
PartNumber
Description

Stock
--
StockId
ProductId
SerialNumber
RecievedDate
OrderNo
ShipmentNo

In the Stock table, the ReceivedDate signifies when the product was
received, the OrderNo signifies whether the item has been sold, and
the ShipmentNo represents whether the item has been shipped.

I want to produce an SQL query which basically looks like this

StockList
---
PartNumber
Description
QtyInStock (StockItems not sold or shipped)
QtySold (StockItems Sold)
QtyShipped (StockItems Sold and shipped)

I can't seem to work out what the query would look like for this.
Has anyone got any tips, or alternative ideas/designs?

"Boogieboy" <iamthecow@.hotmail.com> wrote in message news:67ae4f66.0409220519.6d99eb62@.posting.google.com...
> Hi
> I am trying to implement a invtentory control system and would like
> some advice on the best design for it.
<snip>
> I want to produce an SQL query which basically looks like this
> StockList
> ---
> PartNumber
> Description
> QtyInStock (StockItems not sold or shipped)
> QtySold (StockItems Sold)
> QtyShipped (StockItems Sold and shipped)
> I cant seem to work out what the query would look like for this.
> Has anyone got anytips, or alternative ideas/designs

Tip: Summing 1s and 0s is essentially the same as counting...

How about this?

Select
PartNumber,
Description,
SUM( CASE
WHEN OrderNo is NULL and ShipmentNo is NULL
then 1 else 0 END) as QtyInStock,
SUM( CASE
WHEN OrderNo is NOT NULL and ShipmentNo is NULL
then 1 else 0 END) as QtySold,
SUM( CASE
WHEN OrderNo is NOT NULL and ShipmentNo is NOT NULL
then 1 else 0 end) as QtyShipped
FROM
Product
JOIN Stock
ON Product.ProductID = Stock.ProductID
GROUP BY
PartNumber,
Description
ORDER BY
PartNumber ;

--
Paul Horan[TeamSybase] www.teamsybase.com
Sr. Architect
VCI Springfield, Mass
www.vcisolutions.com
|||>> Has anyone got any tips, or alternative ideas/designs <<

inventory, orders and shipments are all logically different things, I
would put them in separate tables.

Tuesday, February 14, 2012

Design Help – Access SQL Database behind a firewall from ASP .NET pages.

Hi,

I am new to internet development and would like some advice on the technology used to access a SQL database that sits on a network behind a firewall.

** ASP .NET Page ** -> ** Web Server ** -> ** FIREWALL ** -> ** SQL **

So to give an example; from an ASP .NET page on the internet, I would like to populate a DataGrid with the contents of a single table from a SQL database. The SQL database is sitting on our company network behind a firewall.

Could someone please explain / point me in the right direction in how the ASP .NET page / Web Server can securely access the SQL database.

Thanking you in advance
Scott

The firewall needs to allow SQL access between the web server and the SQL server. This is usually port 1433. Only requests initiated by the web server's IP should be allowed to get to the DB server.

You could also put together some web services on the SQL server to send back datasets through the firewall (assuming it's running IIS and the .NET Framework).

Then you would just need to open port 80 between the web server and the SQL server. The web services we run on our DB server require an encrypted key as part of all method calls to ensure that only we can run them.

No matter which way you do it, you're opening a hole in your firewall for communication to occur between the two boxes.
|||Mbanavige – thanks for your reply. Can I confirm my understanding?

Option 1.
The firewall opens port 1433 with an additional rule to only allow the IP address of the web server to communicate on port 1433. As such, this allows web server code to create a connection string to SQL? How would I fully qualify the connection string server property - <SQL Server NetBIOS Name><domain>?

Option 2.
Have a second web server (internal) running on the same box as the SQL server. The internal web server will host web services, which can be called from the external web server. This requires the opening of port 80. Further security is added with the use of an encrypted key required for all method calls. How is the encrypted key implemented?

Is there an industry preferred solution?

Is option 2 more secure against SQL attacks since SQL access is further controlled through the use of web services?

Thanking you again in advance for your replies.
Scott

Design advice?

I have a construction estimation system, and I want to develop a project management system. I will be using the same database because there are shared tables. My question is this: critical data tables are considered to be tables with dollar values, and these tables should not be shared across the whole company. I do, however, need information from these tables, such as the product and quantity of the product for a given project. When an estimate becomes a project, it is assigned a project number. At this point I thought of copying the required data from the estimate side to the project side. This would result in duplicate data in a sense, but the tables will be referenced from two standalone front-end applications. Should I copy the data from one table to another, or create new "views" of the estimate tables for the project management portion?

What would be the best solution to this problem? I find in some circumstances, a new table is required because additional data will be saved on the "Project Management" side, but not in all cases.

Mike B
|||I'd make a new table and transfer the contents of the estimates table into it once the estimate matures into a project. At least one reason behind it: you can have multiple estimates, and all but the final one will stay estimates. I'd also designate a grouping field to identify which estimates may relate to which project (I am talking about estimates that never became a project, but are related to the one that did).
|||I'd also designate a grouping field to identify which estimates may relate to which project (I am talking about estimates that never became a project, but are related to the one that did)

I have already done that, it is a "Project Reference ID". Thanks for the input. It is pretty much already what I was thinking, but.....I am very under confident in my decisions (inexperience I guess).

Mike B

Design advice please...

Without a more thorough understanding of your database schema and the
business model you're trying to implement, I can only offer suggestions.
First I think using NULL to handle these values can make sense. However in
this case, I expect that I would create a Member table that had a MemberID
and MemberType columns. I would also create StudentDetails, AffiliateDetails
and RegularDetails tables and cross-reference these on the MemberID PK/FK.
This way you could hold common columns in the Member table and type-specific
information in the FK tables.
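A rough sketch of that layout (the column choices below are purely illustrative, not a definitive schema):

CREATE TABLE Member (
    MemberID   int IDENTITY PRIMARY KEY,
    MemberType char(1)      NOT NULL CHECK (MemberType IN ('R', 'S', 'A')),
    UserName   varchar(50)  NOT NULL UNIQUE,
    Password   varchar(50)  NOT NULL,
    Email      varchar(100) NOT NULL
);

CREATE TABLE StudentDetails (
    MemberID       int PRIMARY KEY REFERENCES Member (MemberID),
    Institution    varchar(100) NOT NULL,
    DegreeType     varchar(50)  NOT NULL,
    FacultySponsor varchar(100) NOT NULL
);

CREATE TABLE RegularDetails (
    MemberID       int PRIMARY KEY REFERENCES Member (MemberID),
    DoctoralDegree varchar(50) NOT NULL,
    DegreeDate     datetime    NOT NULL
);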
hth
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant
Microsoft MVP
www.betav.com
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
"Matt" <awpilotnospam@.aol.com> wrote in message
news:jz6yc.66258$3x.20120@.attbi_s54...
> I'm developing a database for someone that will involve user accounts.
The
> accounts will hold certain data. There are different types of users: e.g.
> regular members, affiliate members, and student members. The problem is
> that not all the accounts will hold the same type of data. The difference
> between a regular member and a student member is that the regular member
has
> a doctoral degree, a date at which that degree was received, and some
other
> information; the student member has info about their current institution,
> degree type, and a faculty sponsor name.
> I think it would be unwise to put all these fields into one table and then
> leave null whichever fields are unneeded.
> Would it be good practice to have a users table with user name, password,
> email, and member type (regular, student, etc.) and then have separate
> tables for each type of member? For example, separate tables for regular
> members, student members, affiliate members, etc.
> Thanks.
> Matt
>
> <snip>
Thanks!
Matt

Design advice needed

I have several dimensions that contain useful information to be used in calculations. For example, if I had a product dimension that included Cost, but my fact table was at a transaction level, I would still like to compute "Average Cost" at various levels within the cube. Cost per transaction would be the total cost of all products (or a subset of selected products) within a specific time frame, which would allow me to total the # of transactions (from the fact table) divided by the total costs of all products within the same selection (by totaling the cost value contained at the product dimension level).

This seems like a very common scenario, but seems to elude me how to model and create the calculations. I can't, for example, store the product cost in the fact table because it is at a lower level of granularity and would overstate the cost. The obvious place to have the cost value would be within the product dimension, but in what form? Member property? I have tried this approach, but get inconsistent or bad results when I try to filter along other dimensions, or multiselect many products.

Example:

Code Snippet

CREATE MEMBER CURRENTCUBE.[MEASURES].[Product Cost] AS Val(Product.CurrentMember.Properties("Product Cost"))

This "seems" to get me partways, by displaying the cost along the product dimension, but doesn't behave like a regular measure in that it doesn't automatically aggregate along other dimensions.

Am I missing something obvious here?

Thanks

-Kory

Dear Friend,

Be careful in the design of your project. I need to know your project requirements, but, for example, I agree that the product is a dimension, but without the cost. Will the cost of the product be the same for all time? Will it never be changed?

And what happens if you change the product cost this month? The report for this month is OK, but the reports for past months? Will they be based on the cost that you updated today? So, you should consider this cost as a fact, related to the Product dimension with attributes such as Name and Description.

In your example, one thing you could consider is creating named calculations inside the data source view instead of a calculated member inside the cube.

I hope I helped you!!

regards!!

|||

Thanks for the reply. My example was simplistic, but if the product cost changed, and we wanted to keep history of changes, the dimension table would be a type II slowly-changing dimension. So a new record with a new surrogate key would be inserted in the dimension table with the new product cost.

By putting the product cost in the fact table, even as a named calculation in the DSV, you would be duplicating the cost across many fact records, and any calculation using this fact amount would be overstated.

An alternative would be to create a fact table at the product level of granularity, but then I can see ending up with many, many fact tables out of what appear to be dimensions.

I'm still not convinced there is a good way to handle this- I just haven't found it yet.

-Kory

|||

Korys,

Check my post on my blog and tell me what you think.

http://pedrocgd.blogspot.com/2007/07/ssas-slowly-changing-values.html

Are you more convinced?

regards!

|||

Hello KoryS. I would recommend you to have the product cost as a measure in the fact table.

This means that you would have to solve this problem in the ETL-process with Integration Services.

It is a lot more complicated to solve this problem with MDX in a cube by using a product cost on each product in the dimension table.

If your leaf level in the fact table is the transaction level that should not be a hard ETL-problem to solve.

HTH

Thomas Ivarsson

|||

I agree with you, Thomas.

I did not understand the problem very well, because I'm very busy, but I'm with you!

regards to both!!!

Design advice for an education data warehouse

I'm not sure whether this ought to be in the Architecture or SQL Analysis Services forum. I am after some initial advice about a suitable structure for a data warehouse which will be used to build OLAP cubes in AS2005.

I work in a county education department. Each year we create/update a set of reports and procedures which provide schools with online access to many of the statistics for their school, compared to district-wide figures and to national targets, etc.

After this year's, which is my second at creating these reports/data, I can't help but think that the whole thing is screaming out for a data warehouse and cube data to report on. I may be wrong on that, as I'm still only doing the learning regarding AS2005 and haven't touched a previous version of it. I'm currently working through this book: Microsoft SQL Server 2005 Analysis Services Step by Step - ISBN 0735621993. But I'm willing to bet that this should be the way forward. Not only for the actual schools viewing the data, but also for all those staff who work at the county level and have to inspect/report on these schools and the county's performance. The only thing is that most examples are set within companies which have sales as a primary measure, and I'm not sure education grades, targets, etc. fall into the same scheme. So I can't use an example structure to help get me started.

It isn't something that I've got management backing for yet, as it's still pretty much just my own and a colleague's thoughts at the moment. Also, until I know more about it myself, I wouldn't feel comfortable taking my case to management. I feel I will be better off creating a simple version to demonstrate and show the benefit and power of such a system to those who will give the backing to the project.

The main central application, which is used in the county, contains most of the information needed. And several other systems also contain additional data which would be useful. All based around a Unique_Pupil_ID.

Current transactional database

The main transaction database I would be taking data from has a main student table, then it has a student_results table. The student table also has many lookup-type tables related to ethnic origins, deprivation codes relating to income/student address, etc. The kind of things that would be useful when doing analysis on students. As well as, at a more broad level, being able to compare schools to each other at various subjects/overall/etc.

So the student table contains the information about the student, but not results data.

The results table data is similar to below, but obviously with more to it than just these fields:

Student_ID
Subject_ID
Grade
GradePointsEquivalent
ExamYear

Then there is obviously a lookup for the subject name.

There are also things like average/points/score per student/year, which aren't a subject score but more of a calculated field. So where ought that data to be located, table-wise?

There would be a lot more to a total DW solution for this environment, but I wanted to start with the basics so I have something to demonstrate.

Can anyone offer me any advice regarding this? And a possible design structure for the warehouse, with regards to which ought to be fact or dimension tables?

Or do you think I'm barking up completely the wrong tree thinking that this would be a good solution?

Thanks for taking the time to read this less than short post.

Kind regards,
David

Hello. I have searched on the subject "education data warehouse" and found some suppliers and solutions in this area.

Your design is a good start. A student table, with facts about the student, and a fact table with each student's results, by each test or whatever. You will also need a time dimension and a test dimension (if these tests are general). Perhaps a school and a teacher dimension can enhance the analytic value of your solution. Students change schools and teachers, and teachers change schools.

In the fact table you will normally only put base measures that can be used for calculations. You do the calculations in a cube.

Your problem has a lot in common with the analysis you do when you try to see customer profiles and their buying patterns, so sales examples can be of some help.

HTH

Thomas Ivarsson

|||Thanks for your reply Thomas.

This was my initial thought on the design (although there are a fair few columns that I will not need in the final version, which are included in that diagram at the moment). See link below.

>>Structure<<

What do you think? Anything that jumps out at you regarding it?|||

Hello. The structure is fine for a data warehouse. If you build a data mart on top of this, by using views or new tables, I recommend reducing the number of joins in your snowflake.

Have a look here for more design tips (www.kimballgroup.com).

Regards

Thomas Ivarsson

|||I have made them into more of a star schema than a snowflake, and that has certainly helped.

However, I have a new issue. My understanding of OLAP & AS2005 is still growing, so it's possible I've got the wrong idea here, but...

The student grades are not always numeric, in many of the subjects they are graded A-G etc. They all have points equivalent, but it would be good to look at the grades too.

Common requests about the data are how many students achieved 5 A-C grades or 5 A-G etc. And other bits of analysis that are done around grade.

So some of the data needs to be analysed as stated here and some by numeric points. Can anyone help me as to how I could structure this part of it or achieve such a result?

I'm at a bit of a loss as to how.

Thanks,
David|||

You can use named members/sets for A-C and A-G type grades. For numeric data, you can either create a band dimension or use discretization methods in SSAS. You can search for these techniques in Books Online.

One other point in your structure, for languages, you can create one physical dimension and use 'role playing dimensions'.

|||

Hello again. Build a separate dimension table for the grades (A-G), make a foreign key to it in the fact table, and add a measure called NumberOfGrades to the fact table.

In this way you should be able to use SUM as the aggregation method. This measure can then tell you how many (A-G) grades exist in the fact table for a student, teacher, school, and so on.
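As a relational sketch of that idea (the names and data types below are just placeholders):

-- Grade dimension plus a fact-table measure that can simply be summed
CREATE TABLE DimGrade (
    GradeKey    int     PRIMARY KEY,
    GradeLetter char(1) NOT NULL,   -- 'A' through 'G'
    GradePoints int     NOT NULL    -- numeric points equivalent
);

CREATE TABLE FactStudentResult (
    StudentKey     int NOT NULL,
    SubjectKey     int NOT NULL,
    ExamYearKey    int NOT NULL,
    GradeKey       int NOT NULL REFERENCES DimGrade (GradeKey),
    GradePoints    int NOT NULL,            -- base measure for point averages
    NumberOfGrades int NOT NULL DEFAULT 1   -- summed in the cube to count grades
);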

HTH

Thomas Ivarsson

Design Advice

I'm trying to design a database that handles Clients, Cases, Individual and Group Sessions. My problem is that a client can have individual sessions and belong to more than one group at the same time, so I have a many-to-many relationship to deal with. Also I'm trying to design it so that I can have a form that when a group is selected from a drop down it shows all clients assigned to that group and will let me enter new session data for them.

Just looking for some advice on how to handle the relationships.
Maybe someone could show me how they see the relationships working.

My take is that the session is linked to the case not the client, I could be thinking incorrectly?

Thank you,

tblClient
tblClientCase
tblCaseSessionLog
tblClientCaseGroupLink
tblGroups
|||Create lookup tables to handle those relationships.
|||What do you mean? Can you explain?

Thank you,
|||When a case can have only one client, you can simply add the ClientID to the case.

if a client can be in multiple Groups you need

a master table of Clients
a master table of Groups

an additional table to link them that contains 2 columns (ClientID, GroupID)

this extra table allows you to add additional rows for each group a client belongs to.
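A minimal sketch of that linking table, following the poster's tbl naming convention (the key column names are assumed):

CREATE TABLE tblClientGroupLink (
    ClientID int NOT NULL REFERENCES tblClient (ClientID),
    GroupID  int NOT NULL REFERENCES tblGroups (GroupID),
    PRIMARY KEY (ClientID, GroupID)  -- one row per client/group membership
);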

You'd need to better define what the SessionLog is for before I can comment on that last question.

Design Advice

Hello - this is my first post (just found the site today). I'm looking for advice regarding the following design issue(s):

I have a VB batch job that runs a couple of times throughout the day - each time it runs, it inserts approximately 100k rows to a table (call it Table A). I need to be able to view the inserted data by batch run and in aggregate. Obviously querying the table has now become too expensive to produce aggregate reports.

I'd like to create an aggregate table (call it Table B) so that as rows are inserted into Table A, Table B is updated to reflect running totals.

Currently, the batch job inserts records into Table A by using the ADO batchupdate method. If I were to stick with this, then I could create a trigger to update Table B as records are inserted into Table A.

Question 1: Is it possible to create a trigger that will update if the record exists or insert if it doesn't? What kind of overhead does that represent?

Question 2: If instead of using the updatebatch method, I call a stored procedure to insert each of the records into Table A and also insert/update Table B, will I suffer a performance hit?

Sorry for the long post, just wanted to make sure I didn't leave anything out. TIA for any help you can offer!

In answer to your questions:

1) You can create a trigger on Table A that updates/inserts into TableB.
UPDATE b
SET b.value = b.value + i.Increment
FROM TableB b
JOIN inserted i ON b.[Key] = i.[Key]

INSERT TableB (col1, col2, col3, ...)
SELECT col1, col2, col3, ...
FROM inserted i
WHERE NOT EXISTS
(
    SELECT *
    FROM TableB b
    WHERE b.[Key] = i.[Key]
)
This is just an idea on how you can do it. You could simplify the INSERT part if only one record is in the INSERTED table. You may need to think about what should happen to the values in TableB if TableA is updated?

2) Stored procedures are always the way to go to increase performance; however, when it comes to bulk operations versus single operations with a stored procedure, I like bulk. If you could combine the two, that would be great.

Design Advice

HI there my first post so be gentle...

I am redeveloping a web store which is an ASP based site and am looking to make life easier for myself and other staff at the company I work for as well as our customers. I have some problems with my existing data, it is becoming quite a task to manage and this boils down to receiving product information from Vendors in varying formats and in some cases partial data from two different vendors to make one catalogue.

When the site was originally developed it was designed for one "master" product table. We currently still use this and use catalogue numbers and barcodes as our keys to pulling the various supporting information out of our Vendor databases this works OK but we now have many duplicates but fixing the table is not an option as too much of the internal systems rely on this master table for looking up product information.

I have a plan which I am currently in the middle of conceiving, and hence I'm asking for advice here: to re-code our website to browse and search directly on the Vendor databases. This way our customers can view the absolute best, up-to-date catalogues, and if we need to we can perform a complete reload of the data, most importantly without affecting the other databases or our internal systems. To keep track of what the customers order, I intend to add the important product information - such as price, supplier etc. - to an orders table. Our staff can then use this information to process the orders regardless of the status of my product data that is web facing. Now, strictly speaking, I don't think this design is correct, as I will be duplicating quite a bit of data that already exists in my Vendor databases into my orders table, and certain problems will arise that I can foresee already, like having to write specific code for the browsing and searching of each Vendor database, but I think the overall benefits outweigh the current setup. We use SQL Server 2000 and have several million rows of existing data that will need importing into the new structure as well. What do you think of doing this, i.e. keeping the product data and order data totally separate and developing the website around this, so essentially the "add to shopping cart" button on the website is what does the adding of data between the two?

Phew, long post... any advice appreciated!
|||Are these vendor databases you are referring to - are they part of your system, or are they the actual remote databases of your vendors? And how does the "master" relate to these?
|||They make up the product range that we sell from the website and are supplied by the vendors as databases (normally DBF or CSV format - but they will usually supply a schema). I will take these databases and provide the facility to browse, search and order the products on the website.

As the rest of our system is designed to access a single table for the products, my code on the website takes the information from the Vendor databases and as the customers browse our site & add products to their cart some background code adds records into the master product table so that we can keep track of what they have in their cart.

This solution is not particularly suitable as it creates a lot of duplicate entries (if two different customers order the same product for example or our staff may delete a record for whatever reason and add a new record.)

What I am thinking is to eliminate the master table, and separate my data into two totally different entities - product data and order data. I doubt this is good design, but it's the easiest way I can see of simplifying the system.
|||So are the customers browsing the "databases" that the vendors supply to you - you don't import this data into a central products table?

Why not import the vendors' databases into one (you can use DTS to help you transform the data into a standard format if possible) rather than waiting on a customer to pick - this would be easier to manage and faster? Next, use the "true master" as a query only - use the unique key from this and use it to populate the order details table for the customer.
|||This would be the ideal, however the vendor databases are very different in product information and change frequently (daily/weekly changes, additions and deletes). The vendor data also contains "related" product information that isn't directly related to orders and that would logically not work in a single product table. My "keys" to pull out the related data I need would be barcodes and/or catalogue numbers, but these are a) not always unique or b) do not match up (so we lose valuable product information that would generate sales).

My Vendor databases could potentially number 5-10 and range from single-table 50 MB files to 10 GB relational structures. Getting it all to work off a single product table with no dupes, and allowing staff to make changes as well as the daily/weekly updates, just spins my head.
|||To be honest (and I have very minimal information for this recommendation) I would re-examine the entire process and redesign the structure. The problem you will continue to have is "patching" the process to make everything work based on a design that may have worked in the beginning but is beginning to show its flaws. There are several techniques to handle the disjointed data sources that you have while still having a centralized repository of data (usually keys that just point to the actual tables/databases from which you need to retrieve data). Basically, creating an intermediate table to abstract the complexity of the data sources beneath it, while keeping it manageable and maintainable.

There are times that you are much better off punting the existing design - this may have an upfront cost that seems too expensive, but realize that in a very short time the maintainability will pay for itself - any customization on both the database and software development side will be minimal. Plus, adding additional vendors to this design would be seamless.
|||

RE: [I would re-examine the entire process and redesign my structure.]

S1 I agree, and support the sentiment.

S2 Create a sound logical design that addresses the business needs. This is probably the best advice you can take. Since your situation appears large and complex, you may want to implement a sound end result in two or more intermediate stages that may more easily be budgeted, implemented, and adjusted for 'unforseen' issues over a period of few quarters or longer (rather than use a 'Big Bang' cutover strategy).|||Thanks for the replies definitely very useful comments. What I am considering is to totally seperate my product data from my order data and use code on the website to transfer the neccessary data into my order table so the staff can process the customer orders.

vendor data --> sql server --> website --> sql server --> order tables --> order processing

whereas the current setup is along the lines of

product tables <--> sql server <--> website
^
|
order tables

my data changes so frequently that this is the only simple way I can see of approaching the problem.

If I go along the route of having keys in a centralised database pointing to the source of the information, I think this would be too much effort to manage and to ensure it is kept in order.

Thanks for your help its great to be able to throw some ideas around as I do not currently have a second technical person to discuss this in detail with!

Design Advice

Hello,
We are storing largish (about 1 million rows) datasets which are imported
from CSV files.
We do no manipulation of the statistical content, so records are stored as
follows:
Year Gender Age Value
1998 Male 18 100
1998 Male 19 150
1998 Female 18 45
1998 Female 19 60
So there were 100 males aged 18 in this particular survey.
The gender and age are being stored as varchar fields, and the value as an
integer field.
The problem is that some surveys have invalid data - such as n.a. (not
available), n.p. (not published), and n.y.a. (not yet available). So we may
need to store:
1997 Female 19 n.a
I need to reflect this invalid data to the user when he is completing
dynamic cross-table reports of the data. E.g. the user can filter to just 1998,
or to all years - if all years were selected, I need to return 1997 Female 19
as n.a.
Can anybody offer some advice to me for which is the best way to store this?
I considered using NULLs, but I had two issues: ideally I'd like the
aggregation routine to give me a NULL if a NULL is aggregated, however it
treats it as a 0; and there would be no easy way for me to differentiate
between the different types of invalid data.
Sorry about the long post.
Any advice would be greatly appreciated.
Mark
NULL is the obvious way to record the unknown amounts. The additional
"status" code for invalid data could go in another column. For example:
CREATE TABLE Surveys (
    yr INTEGER NOT NULL CHECK (yr BETWEEN 1990 AND 2100),
    gender CHAR(1) NOT NULL CHECK (gender IN ('M','F')),
    age INTEGER NOT NULL CHECK (age BETWEEN 0 AND 150),
    pop_count INTEGER NULL,
    status CHAR(3) NOT NULL CHECK (status IN ('NA','NP','NYA','OK')),
    CHECK ((pop_count IS NOT NULL AND status = 'OK')
        OR (pop_count IS NULL AND status <> 'OK')),
    PRIMARY KEY (yr, gender, age)
)
(I used "Pop_count" just because "Value" isn't a very informative name for a
column. Value also a reserved word.)
I'm not sure what you mean when you say that the "aggregation routine"
treats NULLs as zero. The SUM aggregate in SQL will ignore NULLs and return
a total without them. If all values in an aggregation are NULL then the
result is NULL. If you want to return a NULL sum if *any* value in the
aggregation is NULL then you can use CASE like this:
SELECT CASE WHEN COUNT(pop_count)=COUNT(*) THEN SUM(pop_count) END
FROM Surveys
David Portas
SQL Server MVP
I use NULL as well for numeric fields; for character fields I use either
NULL or NA, whichever the users prefer. About the only way you can aggregate
null values is to count them...
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"mallen" <mallen@.discussions.microsoft.com> wrote in message
news:2D445832-520C-4AD1-AC0F-9176B63DF483@.microsoft.com...
> <snip>
