The Law of Three
If the system properly handles dates, characters and money, it has good quality.
That is the law.
I have been talking about it every now and then for quite some time. Years pass by, but this law seems to hold true in the world of professional software development where I work. Most of the real world systems handle characters and dates in some fashion, and getting these two right is a major challenge even if money is not involved. The law sets a sufficient condition; I rarely see a software system of non-trivial size which gets all these right.
This post examines these three matters through examples to shed some light into why they’re actually quite difficult. The intention is not to deal with these subjects in a thorough manner — that would fill a book I’m not qualified to write.
Date and time are something familiar to us, but the devil is in the details. The Long, Painful History of Time gives a good account of the intricacies of date and time.
Understanding the historical context and time zones is only half the battle. Let’s consider Java and a standard JDBC interface to a relational database, say PostgreSQL.
Originally Java had two Date classes:
java.sql.Date. Despite Java core developers being smart, it
turned out that poor
java.util.Date had issues. A decade later the Java time API was rewritten in JSR 310. And then JodaTime came to rescue the suffering Java community. Now the confused developer has deprecated (and broken) legacy classes lying around, some newer ones to choose from and a lot of headache.
Okay, but surely the situation is better in SQL databases? We’ll define our table like this:
CREATE TABLE dadas ( le_time TIMESTAMP NOT NULL ) ;
Oops, we just failed. According to PostgreSQL documentation this leaves out timezone, which effectively means the system locale.
The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. (Releases prior to 7.3 treated it as timestamp with time zone.)
new Date(2012,12,12) > Sat Jan 12 2013 00:00:00 GMT+0200 (EET) new Date(2012,0,1) > Sun Jan 01 2012 00:00:00 GMT+0200 (EET) new Date(2012,-1,1) > Thu Dec 01 2011 00:00:00 GMT+0200 (EET)
Good luck getting all these layers to work perfectly in all situations.
Unless you are a veteran with a lot of scars, this old article about encoding from Joel Spolsky should give you something to think about.
As with dates, it doesn’t end here. With Java your source code and strings at least have some defined encodings and characters sets. Python developers are not so fortunate, see this: Unicode in Python.
Again, remember to check your relational database for surprises. One particularly interesting one is the order of result set we get from this innocent SQL statement:
select * from winners order by name;
As the article on Alphabetical Order in Wikipedia informs us, the order of the letters in the alphabet depends on the country. V may come before W, or can it? And the order has been changed in some countries quite recently. This may result in ordersings that are a bit surprising as some characters can even have equal standing in the order. Do you actually know what ordering your database server is using now? Did you specify it? This article about ordering in Oracle SQL database has plenty of details, but how many software developers read such documents? To make matters worse, with Microsoft SQL Server you need to read about collation in SQL Server even if you know Oracle inside out. The vendors have their own conventions.
Characters also need to be escaped, encoded and re-enconded multiple times during their travel through servers and switches and lines of code. We do it with URL encoding, HTML encoding and many other forms of encoding. It is anything but easy to have everything working perfectly in a complex modern software system.
Show me the money
Money may actually be the easiest of the three. Perhaps the two most difficult things to remember are:
- Be careful with rounding and precision. 1/3 is an infinite number, even with some sort of “big decimal”.
- If you convert from one currency to another all sort of issues arise. Where did you get your currency rate from?
My advice is to create an abstraction or a wrapper for handling money. Instead of having a
float amount (gasp, the horror) or
BigDecimal amount it would be better to have
Money amount with Money handling the roundings somehow.
Eventually you will have to round money to some precision. Then all sorts of funny issues may come up, like the sum of invoice rows (rounded) may not actually equal the total monetary amount stated in the invoice.
Does it matter?
It doesn’t, until it matters. It depends on the context if some mistake with encoding of characters presents a marginal non-issue, a serious security flaw or an embarrasing UI glitch. Similarly, improper handling of date and time may not be pose any risk from the business perspective.
However, it does matter that programmers are aware of these things and intricacies which may be significant. How could they otherwise judge if something is important or not? Consider also that in most cases, the correct program takes no more time to write, provided that you know what you are doing. Researching and learning takes time and usually occurs after some problem has caused real measurable consequences, like a flow of bug reports from the end users. Locked accounts. Missing payments. That’s how I learnt what little I know about these things. I wish you the best of luck with your learning experiences.