A Simple Database Design Decision Can Save (or Cost) Thousands of Dollars
By Jason Sherrill
Posted on Jul 30, 2007
We're fortunate to have two very experienced database engineers on staff at InetSolution, and we bill a pretty hefty rate for their services. Today a frustrating experience reminded me why the fees people pay for database design are worth for more than the price they usually pay.
WARNING: Unless you think programming is fun, you will not find the rest of this post interesting.
Today I spent nearly two and a half hours writing a query that should've taken about 15 minutes. The problem stemmed from a design decision (thankfully not one that anyone at InetSolution made) during the early planning stages of the software project.
I was creating a report for a web-based application using a database table that someone else created. I was writing a simple query that should've looked like this:
SELECT name,filedate from DETAIL_HISTORY Where (bai_rt = 'XYX123') and (filedate >= '7/30/2007') and (filedate <= '7/30/2007') order by filedate
Instead, the query ended up looking like this:
SELECT name, CAST(RIGHT(filedate,2)+LEFT(filedate,4) AS DATETIME) from DETAIL_HISTORY Where (bai_rt = 'XYX123') and (CAST(RIGHT(filedate2)+LEFT(filedate,4) AS DATETIME) >= '7/30/2007') and (CAST(RIGHT(filedate,2)+LEFT(filedate,4) AS DATETIME) <= '7/30/2007') order by filedate
The problem is that the database column that contains the date value (FILEDATE) is stored as a varchar rather than a datetime type. In addition, the data is stored with a two-digit year instead of a four-digit year. Here is an example of the data:
Sample data from the database
The problem is that while humans recognize the '032707' value as a date, to SQL Server those characters are just a series of numbers with no particular meaning. So if I were to try to compare '032707' to '3/27/2007', SQL Server would return a result indicating that they're not equal. In human terms, this would be the equivalent of adding apple + 4 -- it just doesn't compute. (Actually, in some instances it would compute, and you'd end up with apple4.)
The SQL Server query language has functions for converting data from one type to another. Two of these functions are CAST and CONVERT. If I attempt to CAST this value to a datetime format using this query:
SELECT name, CAST(filedate AS DATETIME) FROM DETAIL_HISTORY
I would get this result:
The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value
If I attempt to use the CONVERT function with this query:
SELECT name, CONVERT(DATETIME, filedate, 101) from DETAIL_HISTORY
I would get the same result as above.
The problem lies with the format of the data, MMDDYY. If I were to convert each of the dates to YYYYMMDD, then both functions would work and return a result of 3/27/2007.
Since changing the source data is not an option with this application, I have to perform this data manipulation an run time. Using this code:
CAST(RIGHT(filedate,2)+LEFT(filedate,4) AS DATETIME)
I am first finding the last two characters of the filedate column value (the YY portion of the date) and then appending the first four characters of the value (the MMDD) to return a YYMMDD format. Then I use CAST to convert that value into a valid datetime format that I can use to perform date comparisons in the query.
These are the main problems that the improper data type selection causes:
- It took roughly five times longer to write the query, thus costing my client five times as much development fees
- The application will not perform as fast as it could otherwise because the application must perform data type conversions at run time
- Future development costs will also be higher each time that developers must create new modules that query this data
Since the database is the heart of all database-driven software applications and websites, it is worth paying a professional to design, or at least review, your database structure before you begin any coding.