Tuesday, February 14, 2012

Design advice for an education data warehouse

I’m not sure whether this ought to be in the Architecture or

SQL Analysis Services forum.I am after

some initial advice about a suitable structure for a data warehouse which will

be used to build OLAP cubes AS2005.

I work in a county education department.Each year we create/update a set of reports

and procedures which provide schools with online access to many of the

statistics for their school compared to district wide and to national targets,

etc.

After this year’s, which is my second at creating these

report/data, I can’t help but think that the whole thing is screaming out for a

Data Warehouse and Cube data to report on.

I may be wrong on that, as I’m still only doing the learning regarding

AS2005 and haven’t touched a previous version of it.I’m currently working through this book:

Microsoft SQL Server 2005 Analysis Services Step by Step - ISBN

0735621993.But I’m willing to bet that

this should be the way forward.Not only

for the actual schools viewing the data, but also for all those staff who work

at the county level and have to inspect/report on these schools and the

county’s performance.The only thing is

that most examples are stated within companies which have sales as a primary

measure, however I’m not sure education grades, targets, etc fall into the same

scheme.So I can’t use an example

structure to help get me started.

It isn’t something that I’ve got management backing for yet,

as it’s still pretty much just my own and a colleagues’ thoughts at the

moment.Also until I know more about it

myself then I wouldn’t feel comfortable taking my case to management.I feel I will be better to create a simple

version to demonstrate and show the benefit and power of such a system, to

those who will give the backing to the project.

The main central application, which is used in the county,

contains most of the information needed.

And several other systems also contain additional data which would be

useful.All based around a

Unique_Pupil_ID.

Current transactional database

The main transaction database I would be taking data from

has a main student table, then it has a student_results table.The student table also has many lookup type

tables related to ethnic origins, deprivation codes relating to income/student

address, etc.The kind of things that

would be useful when doing analysis on students.As well as at a more broad level, being able

to compare schools to each other at various subjects/overall/etc.

So the student table contains the information about the

student, but not results data.

The results table data is similar to below, but obviously

with more to it than just these fields:


Student_ID

Subject_ID

Grade

GradePointsEquivalent

ExamYear


Then there is obviously a lookup for the subject name.

There are also things like average/points/score per student/year

which aren’t a subject score, but more of a calculated field.So where ought that data to be located table-wise?

There would be a lot more to a total DW solution for this

environment, but I wanted to start with the basics so I have somethign to demonstrate.

Can anyone offer my any advice regarding this? and a possible design structure for the warehouse with regards to which ought to be fact or dimension tables.

Or do you think I'm barking up completely the wrong tree thinking that this would be a good solution?

Thanks for taking the time to read this less than short post.

Kind regards,
David

Hello. I have searched on the subject "education data warehouse" and found some suppliers and solutions in this area.

You design is a good start. A student table, with facts about the student, and a fact table with each students results, by each test or whatever. You will also need a tiem dimension, a test dimension(if these tests are general). Perhaps a school and a teacher dimension can enhance the analytic value of your solution. Students change schools and teachers and teachers change schools.

In the fact table you will normally only put base measures that can be used for calculations. You do the calculations in a cube.

Your problem have a lot in common with analysis when you try to see customer profiles and their buying patters, so you can have some help with sales examples.

HTH

Thomas Ivarsson

|||Thanks for your reply Thomas.

This was my initial thought on the design (although there are a fair few columns that I will not need in the final version, which are included in that diagram at the moment). See link below.

>>Structure<<

What do you think? Anything that jumps out at you regarding it?|||

Hello. The structure is fine for a data warehouse. If you build a datamart on top of this, by using views or new tables I recommend to reduce the number of joins in your snowflake.

Have a look here for more design tips(www.kimballgroup.com)

Regards

Thomas Ivarsson

|||I have made them into more of a star schema that a Snowflake and that has certainly helped.

However, I have a new issue. My understanding of OLAP & AS2005 is still growing, so it's possible I've got the wrong idea here, but...

The student grades are not always numeric, in many of the subjects they are graded A-G etc. They all have points equivalent, but it would be good to look at the grades too.

Common requests about the data are how many students achieved 5 A-C grades or 5 A-G etc. And other bits of analysis that are done around grade.

So some of the data needs to be analysed as stated here and some by numeric points. Can anyone help me as to how I could structure this part of it or achieve such a result?

I'm at a bit of a loss as to how.

Thanks,
David|||

You can used named members/sets for A-C and A-G type grades. For numeric data, you can either create a band dimension or use discretization methods in SSAS. You can search for these techniques in Books online.

One other point in your structure, for languages, you can create one physical dimension and use 'role playing dimensions'.

|||

Hello again. Build a separate dimension table for the grades(A-G), make a foreign key to the fact table and add a measure, called NumberOfGrades to the fact tables.

In this way you should be able to use sum as the aggregation method. This measure can then tell you how many (A-G) grades a student, teacher, school (and so on) that exist in the fact table.

HTH

Thomas Ivarsson

No comments:

Post a Comment