Popup notification from R on Windows

After R is done running a long data crunching process, you may need to notify the operator to check the R console and provide the next commands. Without installing any more software or creating any batch files or VBS scripts, here is a simple way to create the popup notice in Windows:

system('CMD /C "ECHO The R process has finished running && PAUSE"', 
   invisible=FALSE, wait=FALSE)

R will launch a command prompt window with the message (which can be customized, of course), and the notice stays open until manually closed.

This trick was adapted for R from Dave Webb on stackoverflow.com.

Tested with R 2.15.2 and R 3.0.0 on Windows 7.

Posted in Uncategorized | Tagged , , | Leave a comment

Date and datetime: accuracy vs storage and I/O costs (SAS)

How much storage space and I/O resources can be saved by shrinking the length of date and datetimes in SAS while maintaining sufficient precision? In my experience, saving I/O generally makes SAS run faster (such as in a DATA step) proportional to the I/O savings, so, for example, a 50% reduction in file size makes SAS run about 50% faster.

SAS stores dates and datetimes as numerics, exactly like integers and decimals: the only difference is the format applied. (In other words, SAS doesn’t have a unique data type for dates and datetimes.) Date are stored as days since the epoch January 1, 1960, and datetimes are stored as seconds since the same epoch.

Using the LENGTH statement, SAS can store numerics using 3 to 8 bytes, where 8 is the default. When a value exceeds the maximum integer size for the numeric storage, SAS will continue to store the value, but it loses precision.

Date

First, let’s see what storing a date with lengths 3 and 4 will allow, assuming no loss of precision is tolerable with dates:

data _null_;
	length date3 3;
	length date4 4;
	format date3 date4 yymmdd10.;
	date3 = 8192; /* the maximum integer size using 3 bytes of storage */
	date4 = 2097152; /* the maximum integer size using 4 bytes of storage */
	put date3= date4=;
run;

The SAS log indicates 3 bytes is enough for date 1982-06-06, and 4 bytes is enough for the date 7701-10-23. In other words, you can cut half the storage and processing time, while practically storing any date without loss of precision.

Datetime

Now to answer a similar question for datetimes, but here we assume some loss of precision is tolerable.

/* Returns an integer in the range min...max */
%macro rand_between(min, max);
	&min+ceil(rand('uniform')*(&max - &min ))
%mend;

/* Create random datetimes within the years 1980 and 2050 */
data datetime_accuracy;
	length dt3 3 dt4 4 dt5 5 dt8 8;
	format d yymmdd10.;
	do i = 1 to 100000;
		day_of_year = %rand_between(365);
		year = %rand_between(1980, 2050);
		d = mdy(1, 1, year) + day_of_year;
		h = %rand_between(0, 23);
		m = %rand_between(0, 59);
		s = %rand_between(0, 59);
		dt8 = dhms(d, h, m, s);
		dt5 = dt8;
		dt4 = dt8;
		dt3 = dt8;
		output;
	end;
run;

/* Measure errors */
data datetime_accuracy;
	set datetime_accuracy;
	dt3_err = dt8 - dt3;
	dt4_err = dt8 - dt4;
	dt5_err = dt8 - dt5;
run;

/* Summarize errors */
proc means data=datetime_accuracy min max mean;
	var dt3_err dt4_err dt5_err;
run;

This reveals storing a datetime with a length of 3 bytes has a maximum error of 524,282 seconds, which is 146 days: this is much less efficient than a date.

Storing a datetime with length 4 has a maximum error of 2047 seconds, or 0.57 hours. This is tolerable in many applications, and reduces storage size by half.

Storing a datetime with length 5 has a maximum error of 7 seconds.

Notes

In some cases shrinking one variable (column) does not affect the size of the data set, but shrinking several columns always helps. It seems SAS allocates row size in chunks.

In some cases wise use of the LENGTH statement makes SAS data set compression anti-effective: it may actually increase the size of a data set. When you see a warning like “Compressing data set WORK.FOO increased size by 33.33 percent” in the log, turn off compression.

This code was tested with Base SAS 9.3 on Windows 7.

Posted in SAS | Tagged , , | 3 Comments

Data about each month (SAS)

Do you remember this old old nursery rhyme?

30 days hath September,
April, June and November,
All the rest have 31,
Excepting February alone
(And that has 28 days clear,
With 29 in each leap year).

Ne neither, and I don’t need to because I use SAS code to generate information about each month, including the number of days in each month, using SAS’s powerful intnx and intck functions to do the hard work. It’s also a good example of using a DATA step to generate a data set without an input data set (i.e., no SET or MERGE statements).

data months;
	format first_day last_day yymmdd10.;
	do year = 2013 to 2020;
		do month = 1 to 12;
			first_day = mdy(month, 1, year);
			last_day = intnx('month', first_day, 0, 'e');
			days  = intck('day', first_day, last_day) + 1;
			weekdays  = intck('weekday', first_day, last_day) + 1;
			weeks  = intck('week', first_day, last_day);
			hours  = intck('dthour', dhms(first_day, 0, 0,0), dhms(last_day, 23, 59, 59)) + 1;
			work_hours = weekdays * 8;
			output;
		end; /* end month */
	end; /* end year */
	label
		first_day = 'First day'
		last_day = 'Last day'
		days = 'Days'
		weekdays = 'Week days'
		weeks = 'Weeks'
		hours = 'Hours'
		work_hours = 'Work hours'
		;
	drop month year;
run;

Here are the results:

First day Last day Days Week days Weeks Hours Work hours
1/1/2013 1/31/2013 31 23 4 744 184
2/1/2013 2/28/2013 28 20 4 672 160
3/1/2013 3/31/2013 31 21 5 744 168
4/1/2013 4/30/2013 30 22 4 720 176
5/1/2013 5/31/2013 31 23 4 744 184
6/1/2013 6/30/2013 30 21 5 720 168
7/1/2013 7/31/2013 31 23 4 744 184
8/1/2013 8/31/2013 31 22 4 744 176
9/1/2013 9/30/2013 30 22 4 720 176
10/1/2013 10/31/2013 31 23 4 744 184
11/1/2013 11/30/2013 30 21 4 720 168
12/1/2013 12/31/2013 31 23 4 744 184
1/1/2014 1/31/2014 31 23 4 744 184
2/1/2014 2/28/2014 28 21 4 672 168
3/1/2014 3/31/2014 31 22 5 744 176
4/1/2014 4/30/2014 30 22 4 720 176
5/1/2014 5/31/2014 31 22 4 744 176
6/1/2014 6/30/2014 30 22 4 720 176
7/1/2014 7/31/2014 31 23 4 744 184
8/1/2014 8/31/2014 31 21 5 744 168
9/1/2014 9/30/2014 30 22 4 720 176
10/1/2014 10/31/2014 31 23 4 744 184
11/1/2014 11/30/2014 30 21 5 720 168
12/1/2014 12/31/2014 31 23 4 744 184
1/1/2015 1/31/2015 31 22 4 744 176
2/1/2015 2/28/2015 28 21 3 672 168
3/1/2015 3/31/2015 31 23 4 744 184
4/1/2015 4/30/2015 30 22 4 720 176
5/1/2015 5/31/2015 31 21 5 744 168
6/1/2015 6/30/2015 30 22 4 720 176
7/1/2015 7/31/2015 31 23 4 744 184
8/1/2015 8/31/2015 31 22 5 744 176
9/1/2015 9/30/2015 30 22 4 720 176
10/1/2015 10/31/2015 31 22 4 744 176
11/1/2015 11/30/2015 30 22 4 720 176
12/1/2015 12/31/2015 31 23 4 744 184
1/1/2016 1/31/2016 31 21 5 744 168
2/1/2016 2/29/2016 29 21 4 696 168
3/1/2016 3/31/2016 31 23 4 744 184
4/1/2016 4/30/2016 30 21 4 720 168
5/1/2016 5/31/2016 31 23 4 744 184
6/1/2016 6/30/2016 30 22 4 720 176
7/1/2016 7/31/2016 31 21 5 744 168
8/1/2016 8/31/2016 31 23 4 744 184
9/1/2016 9/30/2016 30 22 4 720 176
10/1/2016 10/31/2016 31 22 5 744 176
11/1/2016 11/30/2016 30 22 4 720 176
12/1/2016 12/31/2016 31 22 4 744 176
1/1/2017 1/31/2017 31 23 4 744 184
2/1/2017 2/28/2017 28 20 4 672 160
3/1/2017 3/31/2017 31 23 4 744 184
4/1/2017 4/30/2017 30 21 5 720 168
5/1/2017 5/31/2017 31 23 4 744 184
6/1/2017 6/30/2017 30 22 4 720 176
7/1/2017 7/31/2017 31 22 5 744 176
8/1/2017 8/31/2017 31 23 4 744 184
9/1/2017 9/30/2017 30 21 4 720 168
10/1/2017 10/31/2017 31 23 4 744 184
11/1/2017 11/30/2017 30 22 4 720 176
12/1/2017 12/31/2017 31 21 5 744 168
1/1/2018 1/31/2018 31 23 4 744 184
2/1/2018 2/28/2018 28 20 4 672 160
3/1/2018 3/31/2018 31 22 4 744 176
4/1/2018 4/30/2018 30 22 4 720 176
5/1/2018 5/31/2018 31 23 4 744 184
6/1/2018 6/30/2018 30 21 4 720 168
7/1/2018 7/31/2018 31 23 4 744 184
8/1/2018 8/31/2018 31 23 4 744 184
9/1/2018 9/30/2018 30 21 5 720 168
10/1/2018 10/31/2018 31 23 4 744 184
11/1/2018 11/30/2018 30 22 4 720 176
12/1/2018 12/31/2018 31 22 5 744 176
1/1/2019 1/31/2019 31 23 4 744 184
2/1/2019 2/28/2019 28 20 4 672 160
3/1/2019 3/31/2019 31 21 5 744 168
4/1/2019 4/30/2019 30 22 4 720 176
5/1/2019 5/31/2019 31 23 4 744 184
6/1/2019 6/30/2019 30 21 5 720 168
7/1/2019 7/31/2019 31 23 4 744 184
8/1/2019 8/31/2019 31 22 4 744 176
9/1/2019 9/30/2019 30 22 4 720 176
10/1/2019 10/31/2019 31 23 4 744 184
11/1/2019 11/30/2019 30 21 4 720 168
12/1/2019 12/31/2019 31 23 4 744 184
1/1/2020 1/31/2020 31 23 4 744 184
2/1/2020 2/29/2020 29 21 4 696 168
3/1/2020 3/31/2020 31 23 4 744 184
4/1/2020 4/30/2020 30 22 4 720 176
5/1/2020 5/31/2020 31 21 5 744 168
6/1/2020 6/30/2020 30 22 4 720 176
7/1/2020 7/31/2020 31 23 4 744 184
8/1/2020 8/31/2020 31 22 5 744 176
9/1/2020 9/30/2020 30 22 4 720 176
10/1/2020 10/31/2020 31 22 4 744 176
11/1/2020 11/30/2020 30 22 4 720 176
12/1/2020 12/31/2020 31 23 4 744 184

Tested with SAS 9.3 on Windows 7.

Posted in Data set, SAS | Tagged , , | Leave a comment

Outlook: Trying to Connect

This is how I solved a problem of Microsoft Outlook 2010 “Trying to Connect” for several days.

  1. I verified I could connect to the Exchange Server from another machine using Outlook Web App.
  2. I verified network connectivity to the Microsoft Exchange server.
  3. I killed the OUTLOOK.EXE process and restarted Outlook.
  4. I stopped the Windows Search service and restarted Outlook. (So far, nothing helped.)
  5. I killed Microsoft Communicator 2007 R2 and restarted Outlook.
  6. </ol

    The last step, killing Communicator, helped and then Outlook connected and updated all folders.

    The reason I tried to stop Communicator is using Process Explorer I found Communicator was locking files in C:\Users\username\appdata\local\microsoft\outlook, which I was trying to rename in case there was a corrupt file there.

    Rebooting should have resolved the issue, but I couldn’t reboot because I needed another application to keep running. Searching on Google shows other people had other causes for the same “Trying to Connect” problem.

    System: Microsoft Windows 7, Microsoft Office 2010.

Posted in Bugs, Uncategorized | Tagged , , | Leave a comment

Dummy coding in SAS

Here is a macro to generate binary features (also called dummy coding) from a nominal variable (also called a categorical variable, such as eye color). The automation saves time and avoids mistakes when there are many possible values to a category or the values can change.

/* A macro for dumming coding */
%macro nominal_to_binary(
		sm_dataset=/* data set */, 
		sm_var= /* categorical variable */, 
		sm_prefix= /* prefix for dummy variables */);

/* Find the unique levels of the categorical variable */
proc sort data=&sm_dataset(keep=&sm_var) out=&sm_dataset._unique nodupkey;
	by &sm_var;
run;

data _null_;
	set &sm_dataset._unique end=end;
	/* Use CALL EXECUTE to dynamically create a macro that executes */
	/* after this DATA step finishes. The metaprogrammed macro */
	/* modifies the original data set. */
	if _N_ eq 1 then do;
		call execute("data &sm_dataset;");
		call execute("set &sm_dataset;");
		end;
	call execute(cat("length &sm_prefix", &sm_var," 3;")); /* use minimum storage */
	call execute(cats("&sm_prefix", &sm_var," = &sm_var = '", &sm_var,"';"));
	if end then call execute('run;');
run;

proc sql;
	/* Clean up */
	drop table &sm_dataset._unique;
quit;
%mend;

/* Generate a sample data set */
data nominal;
	infile datalines dsd;
	input hair$ eye$;
datalines;
brown,brown
brown,blue
red,blue
red,brown
;

/* Example invocation */
%nominal_to_binary(sm_dataset=nominal, sm_var=hair, sm_prefix=hair_);
%nominal_to_binary(sm_dataset=nominal, sm_var=eye, sm_prefix=eye_);

Here is the final data set:

binary features from categorical variable

This generates k binary variables for a categorical variable with k levels, so it is not suitable for linear regression unless you remove one of the binary feature. Really, in SAS you would pass the categorical variable directly to the regression procedure (REG, LOGISTIC, GLM) using a CLASS statement. I use this macro for machine learning, and I keep all k levels.

The excellent UCLA stats web site has an alternate dummy coding method using arrays.

This macro was tested on SAS 9.3 on Windows 7 64-bit.

Posted in SAS | Tagged , , , | Leave a comment

Using SQLObject with Microsoft SQL Server and Windows Authentication

Here is how to use Windows Authentication to connect to Microsoft SQL Server with SQLObject.

from sqlobject import connectionForURI

# Replace hostname and db with your values.  Notice SSPI in the URI.
connection_string= 'mssql://hostname/db?sspi=1' 

# connect
connection = connectionForURI(connection_string) 

# simple query
x=connection.queryAll('select 1 as foo') 

# show the results of query (first row)
x[0] 

The SQLObject documentation is frustratingly silent on the issue, so I had to discover the solution by reading the source. If the SSPI parameter is omitted, then SQLObject builds a connection string using SQL Server authentication. With the parameter, you get a SQL connection string like this:

Provider=SQLOLEDB;Data Source=hostname;Initial Catalog=db;Integrated Security=SSPI;Persist Security Info=False

By the way, if you are using Microsoft SQL Server Expression Edition, try adding ncli=1.

Also if you are getting an InterfaceError, make sure to install pywin32.

This was tested with Python 2.7.3, SQLObject 1.3.2, adodbapi 2.4.2.2, pywin32 build 218.

Posted in Uncategorized | Tagged , , , | Leave a comment

Easy sortable HTML tables in SAS ODS

Today Charlie Huang posted a SAS HTML table sorting demo, but without automation. Here is a fully automated implementation in SAS. The developer does not need to manually edit the HTML, and the user simply clicks any column and it sorts instantly client-side in the web browser. The feature is automatically applied to all SAS tables in the ODS HTML output.

I use a macro to avoid hard-coding URLs in all programs that sort tables. We keep ours macro in a macro library, so it’s easy to update to newer versions of jQuery and the tablesorter plugin.

/* The URLs are shortened because of */
/*   "Problem Note 37764: HEADTEXT values are limited to 256 characters" 
/*   <http://support.sas.com/kb/37/764.html> */
/* http://goo.gl/Pg0GB = https://ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js   */
/* http://goo.gl/ruKEb = http://cdn.jsdelivr.net/tablesorter/2.0.5b/jquery.tablesorter.min.js */

%macro ods_html_sort_table;
<script src='http://goo.gl/Pg0GB'></script>
<script src='http://goo.gl/ruKEb'></script>
<script>$(document).ready(function(){$('.table').tablesorter({widgets: ['zebra']});});</script>
%mend;

ods html 
 file="c:\temp\jquery_sort_test.html"
 /* Notice this style lets you add more HEADTEXT (maybe to the 256 characters) */
 headtext="%ods_html_sort_table";

/* A simple table for demonstration */
proc print data=sashelp.class;
run;

ods html text='Click on any column header to sort the table.';

/* PROC FREQ works */
proc freq data=sashelp.class;
	table sex * age/nopercent nocol norow;
run;
 
/* PROC TABULATE does not sort well */
proc tabulate data=sashelp.class;
    class sex;
    var height;
    table sex, height*mean;
run;

run;


ods html close;

This was tested with SAS 9.3 TS Level 1M1 on Windows 7 64-bit with Firefox 18 and Internet Explorer 8.

A further improvement would be to find a CDN to host a tablesorter theme and integrate it, so the user sees the arrows making the sorting feature easy to discover.

After writing this, I searched the web and found others had accomplished something similar. For example in 2006 Covington and Oden published “Sorting ODS HTML Tables by Column Headers Using Javascript”, but the jQuery method is much more compact and elegant.

Incidentally, I was recently experimenting with a similar marriage of jQuery and SAS: I manually edited SAS HTML output to get the Accordion jQuery UI plugin working to create a sort of interactive table of contents for long SAS HTML reports, but I haven’t yet automated it.

Posted in SAS | Tagged , , | Leave a comment