Friday, February 24, 2012

Design question regarding comma delimited data

Hi,

I'm trying to figure out if it's good design practice to have several pieces of data in a one column. I explain by example. Let's say you build a movie site. Each movie can belong to several categories. A movie can be Action, Adventure, Fantasy and Drama all at once. Assume a database table with all the movies and another table with all the categories. Now how would I associate one movie with several categories? Would it be OK if I add a Category field in the movie table and then add several categories in that, delimited by commas? Sort of like below:

movie_title | movie_rating | category_name
------------------------------
Pirates of the Carribean | PG-13 | Action,Adventure,Fantasy
Evil Dead | Unrated | Horror

and so on ...

I can then query the database with a LIKE query if I want to select all movies of a certain category. Personally, I don't like this approach to much, but I can't think of another way to achieve this. Well, there is one other, but I like that one even less. I could create another table that links each movie to a category, but his way each movie with several categories would get a new row. Using the table above, Pirates would get three rows in that table. One with Action, one with Adventure and one with Fantasy. Get my drift?


It all seems counter-intuitive. Thoughts?

Thanks :o)

You should go with the second approach (separate table where you store MovieId, CategoryId) unless you have a good reason to not do it so.

Think about the fact that search by category is not the only possible query you will need. How would you do implement the following using your first approach:Select the number of movies in each category? Using a separate table you can just group by category and do a count().

Keep in mind that normalization is almost always a good thing to do. If you don't know what that means you really have to take a book on database design and read about it.

|||

Create a child table that joins the two tables together

Your Movies Table
MovieID int identity PK
Title varchar
Rating varchar

Your Categories Table
CategoryID int identity PK
Name varchar

Your MovieCategories table
MovieID int
CategoryID int

So if you enter a movie in the Movies table
Title: Pirates of the Carribean
Rating: PG-13

It would get assigned an auto-incrementing ID (in this case 1)

You then populate your Category Table with things like
Action (gets assigned 1)
Adventure (gets assigned 2)
Drama (gets assigned 3)
Comedy (gets assigned 4)
Fantasy (gets assigned 5)

Then in your child table you assign the MovieID for Pirates with the CategoryID's that it belongs to
MovieID CategoryID
1 1
1 2
1 5

Then when you query you would join the necessary tables together to get what you are after.

|||

>>Now how would I associate one movie with several categories?

The method you described isvery typical ( a third table to link the other two).

Table: MovieCategoriesMovieID CatID1 11 32 23 13 23 3
This is the utilimate in flexibility but requires more work from you.
A quick and dirty way (not always bad) would be a boolean column for each category.
This assumes that categories are fixed and you will not be adding new categories

|||

The best will be to create table

Movie with MovieID and Movietittle

Ratings with RatingID and RatingName

Category CategoryID and categoryname

and two tables to link your information

MovieRatings with mrID, mrMovieID, mrRatingID

MOvieCategory with mcID, mcMovieID ,mcCategoryID

With this structure your queries will be much faster when with like statement and also it will save you same space in database

You can have multiple categories connected to movie in movie category table and also different ratings if you need.

Thanks

|||

Thanks for all the replies.

See, I used the, for lack of a better word, "third table" approach before. It's the only way I could think of doing what I wanted it to do. I just never liked that idea of having to create another table. But apparently this is pretty common, so I don't feel so bad about it anymore :o)

It's basically going to look like this:

Movies

movie_id movie_title movie_rating
-------------
1 Bad Movie PG
2 Cool Movie R
3 So So Movie PG-13

Categories

category_id category_name
----------
1 Action
2 Adventure
3 Comedy
4 Drama

Movies/Categories

movie_id category_id
--------
1 1
1 2
2 3
2 4
3 3

I guess that makes good sense, but it always seemed redundant.

Thanks :o)

No comments:

Post a Comment