In R we’ll generate similar continuous distributions for two groups and give a brief overview of statistical tests and visualizations to compare the groups. Though the fake data are normally distributed, we use methods for various kinds of continuous distributions. I put this together while working with data from an odd distribution involving money where I am focusing on the binning method.
> # load the ggplot graphing package
> require(ggplot2)
Loading required package: ggplot2
>
> # generate fake data with group B having a higher outcome
> # the function rnorm() generates normally distributed random numbers
> set.seed(22136)
> df <- rbind(
+ data.frame(group='A', outcome=rnorm(n=200, mean=100, sd=20)),
+ data.frame(group='B', outcome=rnorm(n=200, mean=105, sd=20)))
>
> # inspect the fake data
> summary(df)
group outcome
A:200 Min. : 46.55
B:200 1st Qu.: 88.90
Median :101.88
Mean :102.01
3rd Qu.:116.65
Max. :160.17
>
> # generate five number summary and mean for each group
> by(df$outcome, df$group, summary)
df$group: A
Min. 1st Qu. Median Mean 3rd Qu. Max.
59.68 86.62 97.12 97.75 109.80 145.30
------------------------------------------------------------
df$group: B
Min. 1st Qu. Median Mean 3rd Qu. Max.
46.55 93.54 107.10 106.30 122.40 160.20
>
> # perform t-test for the difference of means
> t.test(outcome ~ group, data=df)
Welch Two Sample t-test
data: outcome by group
t = -4.4333, df = 387.358, p-value = 1.21e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.304221 -4.743673
sample estimates:
mean in group A mean in group B
97.74968 106.27363
>
> # perform two-sample Wilcoxon test
> wilcox.test(outcome ~ group, data=df)
Wilcoxon rank sum test with continuity correction
data: outcome by group
W = 14770, p-value = 6.09e-06
alternative hypothesis: true location shift is not equal to 0
>
> # draw a box plot
> ggplot(df, aes(group, outcome)) + geom_boxplot()

> # draw a histogram > ggplot(df, aes(outcome)) + geom_histogram(binwidth=10) + + facet_wrap(~group, nrow=2, ncol=1)

> # draw a density plot colored by group > ggplot(df, aes(outcome,color=group)) + geom_density()

> # draw a frequency plot colored by group > ggplot(df, aes(outcome,color=group)) + geom_freqpoly(binwidth=10)

> # bin/discretize (cut continuous variable into equally sized groups)
> (q<-quantile(df$outcome , seq(0, 1, .2)))
0% 20% 40% 60% 80% 100%
46.55005 86.24026 96.58131 107.14224 120.55994 160.16613
> df$outcome_bin <- cut(df$outcome, breaks=q, include.lowest=T)
> summary(df$outcome_bin)
[46.6,86.2] (86.2,96.6] (96.6,107] (107,121] (121,160]
80 80 80 80 80
>
> # tabulate the binned data
> (tab<-with(df, table(outcome_bin, group))) # counts
group
outcome_bin A B
[46.6,86.2] 47 33
(86.2,96.6] 50 30
(96.6,107] 43 37
(107,121] 36 44
(121,160] 24 56
> print(prop.table(tab,2)) # proportions
group
outcome_bin A B
[46.6,86.2] 0.235 0.165
(86.2,96.6] 0.250 0.150
(96.6,107] 0.215 0.185
(107,121] 0.180 0.220
(121,160] 0.120 0.280
>
> # perform chi square test on the binned groups
> print(chisq.test(tab))
Pearson's Chi-squared test
data: tab
X-squared = 21.5, df = 4, p-value = 0.000252
>
> # plot the binned group as a barchart with percentages of the relative frequencies
> # thanks to joran at http://stackoverflow.com/a/10888277/603799
>
> require(reshape) # for melt()
> df1 <- melt(ddply(df,.(group),function(x){prop.table(table(x$outcome_bin))}),id.vars = 1)
>
> require(scales) # for percent
> ggplot(df1, aes(x = variable,y = value)) +
+ facet_wrap(~group, nrow=2, ncol=1) +
+ scale_y_continuous(labels=percent) +
+ geom_bar(stat = "identity")

Tested with R 2.15.0 and ggplot2 0.9.1.