# Practical MD5 in SAS

This guide introduces MD5 and hash functions in general, lists common uses for hash functions, gives advice on how to best use MD5 in SAS, and covers common issues.

# 100% stacked bar chart in SAS’s SGPLOT

A 100% stacked bar chart is useful for comparing the relative frequencies of an m x n table where frequencies in m are very different. While this is easy to do in Excel, SAS requires an extra step, which you could call a hack or a trick. First, let’s create an example data set. Say…

# Comparing continuous distributions with R

In R we’ll generate similar continuous distributions for two groups and give a brief overview of statistical tests and visualizations to compare the groups. Though the fake data are normally distributed, we use methods for various kinds of continuous distributions. I put this together while working with data from an odd distribution involving money where…

# Checking return codes for errors in SAS

You should check for error return codes in any SAS programs that run unattended in batch jobs, so you can handle the exception properly. For example, if the data are invalid, you don’t want to generate reports or insert bad data into a database. Also, it can save time to abort as soon as the…

# Using neural network for regression

Artificial neural networks are commonly thought to be used just for classification because of their relationship to logistic regression: neural networks typically use a logistic activation function and output values from 0 to 1, like logistic regression. However, the ability of neural networks to model complex, non-linear hypotheses is desirable for many real-world problems, including…

# Confidence interval diagram in R

This code shows how to easily plot a beautiful confidence interval diagram in R. First, let’s input the raw data. We’ll be making two confidence intervals for two samples of 10. In case you’re curious, the data represent samples from a survey of how many minutes it takes to drive from home to school at…

# Paired sample t-test in R

Let’s walk through using R and Student’s t-test to compare paired sample data. The book Statistics: The Exploration & Analysis of Data (6th edition, p. 505) presents the longitudinal study “Bone mass is recovered from lactation to postweaning in adolescent mothers with low calcium intakes”. The total-body bone mineral content (TBBMC) of young mothers was measured…

# Basic line chart with ggplot2

ggplot2 is an R package that draws plots that are easier on the eyes than those from R’s built-in plotting functions, though its grammar differs from what is commonly used in R. This code demonstrates how to prepare a data frame of basic math functions (logarithm, sine, etc.) and plot it with ggplot2. The…

# Delete non-exact duplicates in SAS

When deleting non-complete duplicates in SAS, in each duplicate set you may want to keep a particular record identified by a rule: it may be the oldest, newest, first, or last observation in each set. You need an identifier to be unique, but you can’t randomly choose which observation to keep. To be precise, you…

# Install Status.Net testing from Git on Fedora

Here is a quick guide to downloading the latest Status.Net code from Git and installing it on Fedora 14. This gets you the testing branch, which contains all the fun new features but is also unsupported and may contain bugs. The instructions will work similarly on other types of Linux such as Red…