How much storage space and I/O resources can be saved by shrinking the length of date and datetimes in SAS while maintaining sufficient precision? In my experience, saving I/O generally makes SAS run faster (such as in a DATA step) proportional to the I/O savings, so, for example, a 50% reduction in file size makes SAS run about 50% faster.

SAS stores dates and datetimes as numerics, exactly like integers and decimals: the only difference is the format applied. (In other words, SAS doesn’t have a unique data type for dates and datetimes.) Date are stored as days since the epoch January 1, 1960, and datetimes are stored as seconds since the same epoch.

Using the LENGTH statement, SAS can store numerics using 3 to 8 bytes, where 8 is the default. When a value exceeds the maximum integer size for the numeric storage, SAS will continue to store the value, but it loses precision.

### Date

First, let’s see what storing a date with lengths 3 and 4 will allow, assuming no loss of precision is tolerable with dates:

data _null_; length date3 3; length date4 4; format date3 date4 yymmdd10.; date3 = 8192; /* the maximum integer size using 3 bytes of storage */ date4 = 2097152; /* the maximum integer size using 4 bytes of storage */ put date3= date4=; run;

The SAS log indicates 3 bytes is enough for date 1982-06-06, and 4 bytes is enough for the date 7701-10-23. **In other words, you can cut half the storage and processing time, while practically storing any date without loss of precision.**

### Datetime

Now to answer a similar question for datetimes, but here we assume some loss of precision is tolerable.

/* Returns an integer in the range min...max */ %macro rand_between(min, max); &min+ceil(rand('uniform')*(&max - &min )) %mend; /* Create random datetimes within the years 1980 and 2050 */ data datetime_accuracy; length dt3 3 dt4 4 dt5 5 dt8 8; format d yymmdd10.; do i = 1 to 100000; day_of_year = %rand_between(365); year = %rand_between(1980, 2050); d = mdy(1, 1, year) + day_of_year; h = %rand_between(0, 23); m = %rand_between(0, 59); s = %rand_between(0, 59); dt8 = dhms(d, h, m, s); dt5 = dt8; dt4 = dt8; dt3 = dt8; output; end; run; /* Measure errors */ data datetime_accuracy; set datetime_accuracy; dt3_err = dt8 - dt3; dt4_err = dt8 - dt4; dt5_err = dt8 - dt5; run; /* Summarize errors */ proc means data=datetime_accuracy min max mean; var dt3_err dt4_err dt5_err; run;

This reveals storing a datetime with a length of 3 bytes has a maximum error of 524,282 seconds, which is 146 days: this is much less efficient than a date.

Storing a datetime with length 4 has a maximum error of 2047 seconds, or 0.57 hours. This is tolerable in many applications, and reduces storage size by half.

Storing a datetime with length 5 has a maximum error of 7 seconds.

### Notes

In some cases shrinking one variable (column) does not affect the size of the data set, but shrinking several columns always helps. It seems SAS allocates row size in chunks.

In some cases wise use of the LENGTH statement makes SAS data set compression anti-effective: it may actually increase the size of a data set. When you see a warning like “Compressing data set WORK.FOO increased size by 33.33 percent” in the log, turn off compression.

*This code was tested with Base SAS 9.3 on Windows 7.*

How did you determine the largest integer you could store in n bytes in SAS? I found this table in the SAS help files, under ‘LENGTH statement, saving storage space’:

Variable Length and Largest Exact Integer Length in Bytes

Significant Digits Retained Largest Integer Represented Exactly

2 2 256

3 4 65,536

4 7 16,777,216

5 9 4,294,967,296

6 12 1,099,511,627,776

7 14 281,474,946,710,656

8 16 72,057,594,037,927,936

This table is for z/OS (Mainframes), and I used a different table for Windows, which I think is the same as Unix.

So the conclusions in this article are not valid for Mainframes, but the same methods could be used to reach the proper conclusions.

Re previous comment – although that table lists the largest values stored exactly in n bytes, this does not imply that all values below this point are also stored without loss of precision. The limit in the article is correct – for n bytes, it is 2^((n-1)*8-3). I know SAS sets aside 1 byte for the sign and exponent, but I’m not sure what happens to the other 3 bits.