Keeping Time

by Joe Celko

SQL is the first programming language to have explicit temporal datatypes. I have had the theory that if Cobol had been designed with a TIMESTAMP datatype, we would have avoided all that Y2K trouble. At least now, more people are aware of the ISO 8601 time and date display standards. Who knows? Maybe people will start to use them.

The temporal support in each SQL product can be classified as either a "Unix-style" or "Cobol-style" internal representation.

In the Unix-style representation, each point in time is shown as a very large integer number that represents the number of clock ticks from a base date. This is how the Unix operating system handles its temporal data. The use of clock ticks makes calculations very easy — it becomes simple integer math. However, it is hard to convert the clock ticks into a year-month-day-hour-minute-second format.

In the Cobol-style representation, the database has a separate internal field for the year, month, day, hour, minute, and seconds. This is great for displaying the information, but not for calculations.

One of the debates in the SQL Standards Committee was how to handle intervals of time. The reason that time is tricky is that it is continuous. The defining mathematical property of a continuum is that any part of it can be further sub-divided forever. Give me any line segment and I can cut it into smaller segments endlessly. But we run into the problem that the defining property of a point is that it cannot be further subdivided. So how can there be points in a continuum?

When you give a year, say 2000, you are really giving me an interval of 365 days. Give me a date, say 2000-01-01, you are not giving me a point; you are identifying an interval of 24 hours. Give me the date and time 2000-01-01 00:00:00 and you are giving me an interval of 60 seconds. It never stops!!

The decision in SQL was to view time as a series of open ended intervals. That is, the segment includes the starting point in time, but never gets to the end point of the interval. This has some nice properties. It prevents you from counting the end of one event and the start of another event as identical moments in time. An open interval minus an open interval gives open intervals as a result and all points are accounted for.

But intervals are hard to work with conceptually. Let me give you an actual example that was posted in a newsgroup. We have a table that catches information about the user activity on a system. It is a very simple "log file" that shows when someone starts and ends a session with the system. We do not even care who the user was, since I am assuming that user_activity_id is a unique number that identifies a session, without identifying individual users. The table looks like this:

CREATE TABLE User_Activity
(user_activity_id INTEGER NOT NULL PRIMARY KEY,
 login TIMESTAMP NOT NULL,
 logout TIMESTAMP, -- null means session is still active
 CHECK (login < logout),
 ...);

Using a NULL in the logout column to mean that the session is still active adds a little complexity to the problem. I decided to use the current timestamp at the time the query is executed as the logout time.

I would like to be able to report the number of user sessions logged on during each hour of the day. So, if someone began a session at 03:12 Hrs and ended it at 06:45 Hrs, I would like them to be counted as being logged on the system for 03:00 Hrs, 04:00 Hrs, 05:00 Hrs and 06:00 Hrs. This report should work all the hours in several years of data.

One solution proposed in the newsgroup involved using CASE expressions to classify each time extracted from the TIMESTAMP values as to what hourly interval it belongs. The logic got worse from there.

Here is one solution: first, create an auxiliary table like this:

CREATE TABLE HourlyReport
(period_nbr INTEGER NOT NULL PRIMARY KEY,
 start_timestamp TIMESTAMP NOT NULL,
 end_timestamp TIMESTAMP NOT NULL,
 CHECK(start_time < end_time));

INSERT INTO HourlyReport
VALUES (1, '1999-01-01 00:00:00.00000',
'1999-01-01 00:59:59.99999');

INSERT INTO HourlyReport
VALUES (2, '1999-01-01 01:00:00.00000',
'1999-01-01 01:59:59.99999');
etc.

Before you reject this auxiliary table, notice that it is easy to generate and will be (24 hours per day * 365.25 days per year * 10 years) = 87660 rows in size if you want to handle an entire decade of data.

The query to find the periods in which each activity falls is simply:

  SELECT DISTINCT A1.user_activity_id, period_nbr
   FROM User_Activity AS A1,
        HourlyReports AS H1
  WHERE H1.start_timestamp BETWEEN A1.login
                  AND COALESCE A1.logout,CURRENT_TIMESTAMP)
     OR H1.end_timestamp BETWEEN A1.login&
                 AND COALESCE A1.logout, CURRENT_TIMESTAMP);

Notice the DISTINCT! Without it, you would count both the start and end times of each period. Now, to answer the original question, tally by periods:

  SELECT A1.period_nbr, A1.start_timestamp,
        COUNT (DISTINCT A1.user_activity_id)
          AS total_sessions
   FROM User_Activity AS A1,
        HourlyReports AS H1
  WHERE H1.start_timestamp BETWEEN A1.login
                 AND COALESCE A1.logout, CURRENT_TIMESTAMP)
     OR H1.end_timestamp BETWEEN A1.login
                 AND COALESCE A1.logout, CURRENT_TIMESTAMP)
  GROUP BY A1.period_nbr, A1.start_timestamp;

It might help if you drew a diagram with a time line, then put in a session as a line segment which crosses the borders between the time periods.

session  X------------------X           

     -|-----|-----|-----|-----|-----|--
period 2 3 4 5 6

Instead of trying to put the session into the periods, this query puts the starts and stops of the periods into the session interval. A period can have a start time, a stop time or both inside the session; this case is why you need to remove the duplicate period numbers.

Joe Celko was a member of the ANSI X3H2 Database Standards Committee and helped write the SQL-92 standards. He is the author of over 450 magazine columns and four books, the best known of which is SQL for Smarties (Morgan-Kaufmann Publishers, 1999). He is the Vice President of RDBMS at North Face Learning in Salt Lake City.

Contributors : Joe Celko
Last modified 2005-04-20 10:17 AM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

Keeping Time