I. Installing R | II. Learning R | III. Extending R |
The Comprehensive R Archive Network (CRAN) site provides Windows, OS X, and Linux binary installers for the reference implementation of R. Ubuntu users can also follow these instructions to install R using the apt package management system.
An optimized, parallel version of R is available from Revolution Analytics. UT students can obtain a free academic subscription by registering with their "@utexas.edu" email address. Warning: The Revolution R distribution may be one or more revisions behind the CRAN distribution, which may cause problems when installing certain extension packages.
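Since a version mismatch can cause package installation to fail, it may help to check which version of R is running before installing extension packages. The following is a minimal sketch using only base R functions; the threshold "2.15.0" is an arbitrary value chosen for illustration, not a requirement taken from any particular package.

```r
# Report the version of the running R interpreter.
print(R.version.string)

# Compare the running version against a minimum version.
# "2.15.0" is a hypothetical threshold used only for illustration.
required <- "2.15.0"
is_new_enough <- getRversion() >= required
print(is_new_enough)
```

If the comparison yields FALSE, updating R (or using the CRAN distribution) before installing the package is one possible remedy.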
RStudio is a free visual interface and integrated development environment (IDE) for R. StatET is a plug-in extension for developing R scripts using the Eclipse IDE platform.
Rattle provides a visual interface to various data mining packages in R. You can install this package by starting R and running the following commands (the ">" indicates the R prompt):
> install.packages("rattle", dep=c("Suggests"))
> library('rattle')
> rattle()
Update (13 Feb. 2013): A demo R script for basic classifiers has been added.
Update (29 Jan. 2013): The R script used for the in-class RStudio demo is now available.
Multiple resources for learning R are available on the web. A suggested list (from this CMU data analysis course) includes:
Even if you have some experience coding, you should read this page providing minimal advice on programming.
There are also several books on R and statistical computing, including:
Many (if not all) of the algorithms covered in this introductory course have already been implemented in various add-on packages that are available from CRAN. Related packages have been gathered into various task views, including one for machine learning and one for graphics.
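One way to install the packages collected in a task view is the "ctv" package from CRAN. This is an assumption on our part, as the text above does not name an installation method. The sketch below wraps the steps in a helper function, install_task_view, which is a hypothetical name introduced here; the call itself is left commented out because it downloads a large number of packages.

```r
# Assumption: the 'ctv' package is used to install CRAN task views.
# install_task_view is a hypothetical helper, not part of any package.
install_task_view <- function(view = "MachineLearning", core_only = FALSE) {
  # Install 'ctv' first if it is not already available.
  if (!requireNamespace("ctv", quietly = TRUE)) {
    install.packages("ctv")
  }
  # Install the packages listed in the given task view;
  # coreOnly = TRUE restricts this to the view's core packages.
  ctv::install.views(view, coreOnly = core_only)
}

# Uncomment to run; requires network access and may take a long time.
# install_task_view("MachineLearning", core_only = TRUE)
```

Restricting the installation to core packages is likely the quicker option when you only want to explore a task view.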
As a general rule, you should try to find an R package that implements an algorithm before implementing it yourself. Example code for using R packages to solve machine learning/data mining problems can be found in various places, including the "Data Mining Algorithms in R" Wikibook.
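As an illustration of relying on an existing package rather than writing an algorithm from scratch, the following sketch fits a classification tree with the "rpart" package, which ships with standard R distributions as a recommended package. The choice of rpart and of the built-in iris data set is ours and is meant only as an example.

```r
# rpart is a recommended package bundled with standard R distributions.
library(rpart)

# Fit a classification tree that predicts the species from the four
# measurements in the built-in iris data set.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict class labels for the training data and compute the accuracy.
pred <- predict(fit, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

Note that accuracy measured on the training data is optimistic; a held-out test set or cross-validation gives a more honest estimate.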
Bioconductor is another large R package repository. Most of these packages are specifically for bioinformatics analyses, but several might be useful in more general contexts.
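Bioconductor packages are installed through the project's own installer rather than directly with install.packages(). The sketch below assumes the "BiocManager" package, which is the installer for current Bioconductor releases; older releases used a different mechanism, so check the Bioconductor site for the version you are running. The helper name install_bioc and the example package "limma" are used here only for illustration.

```r
# Assumption: BiocManager is the installer for the Bioconductor release in use.
# install_bioc is a hypothetical helper, not part of any package.
install_bioc <- function(pkgs) {
  # Install BiocManager from CRAN first if it is not already available.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Install the requested Bioconductor packages.
  BiocManager::install(pkgs)
}

# Uncomment to run; requires network access.
# install_bioc("limma")
```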