I. Installing R II. Learning R III. Extending R

Installing R

The Comprehensive R Archive Network (CRAN) site provides Windows, OS X, and Linux binary installers for the reference implementation of R. Ubuntu users can also follow these instructions to install R using the apt package management system.

An optimized, parallel version of R is available from Revolution Analytics. UT students can obtain a free academic subscription by registering with their "@utexas.edu" email address. Warning: The Revolution R distribution may lag the CRAN distribution by one or more revisions, which can cause problems when installing certain extension packages.

RStudio is a free visual interface and integrated development environment (IDE) for R. StatET is a plug-in extension for developing R scripts using the Eclipse IDE platform.

Rattle provides a graphical interface to various data mining packages in R. You can install this package by starting R and running the following commands (the ">" indicates the R prompt and should not be typed):

> install.packages("rattle", dependencies = c("Suggests"))
> library("rattle")
> rattle()
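
Re-running install.packages() downloads a package again even when it is already present. The sketch below shows one way to install a package only when it is missing. The helper name ensure_package is hypothetical (it is not part of base R or rattle); requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper (not part of base R or rattle): install a package
# only when it cannot be found in the current library paths.
ensure_package <- function(pkg, ...) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Extra arguments (e.g. dependencies, repos) go to install.packages()
    install.packages(pkg, ...)
  }
  # Report, invisibly, whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download
ok <- ensure_package("stats")
print(ok)
```

Under the same assumption, ensure_package("rattle", dependencies = c("Suggests")) would install Rattle only on the first call and skip the download afterwards.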

Learning R

Update (13 Feb. 2013): A demo R script for basic classifiers has been added.

Update (29 Jan. 2013): The R script used for the in-class RStudio demo is now available.

Multiple resources for learning R are available on the web. A suggested list (from this CMU data analysis course) includes:

Even if you have some experience coding, you should read this page providing minimal advice on programming.

There are also several books on R and statistical computing, including:


Extending R

Many (if not all) of the algorithms covered in this introductory course have already been implemented in various add-on packages that are available from CRAN. Related packages have been gathered into various task views, including one for machine learning and one for graphics.
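
The packages in a task view can also be installed from within R. The following is a minimal sketch, assuming the CRAN package ctv is available and that "MachineLearning" is the identifier of the machine learning task view. The wrapper name install_task_view is hypothetical, and the call is left commented out because it downloads many packages.

```r
# Hypothetical wrapper around the ctv package (assumed to be on CRAN).
# With core_only = TRUE, only the packages marked as "core" in the
# task view are installed, which keeps the download small.
install_task_view <- function(view = "MachineLearning", core_only = TRUE) {
  if (!requireNamespace("ctv", quietly = TRUE)) {
    install.packages("ctv")
  }
  ctv::install.views(view, coreOnly = core_only)
}

# Uncomment to run; requires network access and may take a while.
# install_task_view("MachineLearning")
```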

As a general rule, you should try to find an R package that implements an algorithm before implementing it yourself. Example code for using R packages to solve machine learning/data mining problems can be found in various places, including the "Data Mining Algorithms in R" Wikibook.
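
As an illustration of using an existing package instead of writing an algorithm from scratch, the sketch below fits a decision tree with rpart. It assumes that rpart is installed (it is one of the "recommended" packages normally bundled with R) and uses the built-in iris data set. The accuracy is measured on the training data, so it is an optimistic estimate.

```r
# rpart is a "recommended" package normally bundled with R
library(rpart)

# Fit a classification tree that predicts Species from the other columns
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes for the training data and compute the training accuracy
pred <- predict(fit, newdata = iris, type = "class")
train_acc <- mean(pred == iris$Species)
print(train_acc)

# Confusion matrix: rows are true species, columns are predicted species
print(table(actual = iris$Species, predicted = pred))
```

A held-out test set or cross-validation would give a more realistic accuracy estimate than the training accuracy printed here.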

Bioconductor is another large R package repository. Most of its packages are designed for bioinformatics analyses, but several might be useful in more general contexts.