R is a free software environment for statistical computing and graphics

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be easily extended via packages. There are about eight packages supplied with the R distribution, and many more are available through the CRAN family of Internet sites, covering a very wide range of modern statistics.

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.

  • r-project
  • c
  • c++
  • fortran
  • javascript



R Project for Statistical Computing: 2017 Projects

  • Zhehui Chen
    A stochastic variational inference framework for probabilistic modeling toolbox in R
    Stochastic variational inference is a powerful tool for analyzing probabilistic models, especially for large-scale problems. In this project, our goal...
  • Jason Ge
    Active Set Based Second-order Algorithm for Sparse Learning
    For sparse learning problems, such as sparse generalized linear models and sparse undirected graphical model estimation, the current R packages still...
  • Chindhanai Uthaisaad
    The project plan represents a very significant step forward for the factorAnalytics package by adding advanced methods to the fundamental factor...
  • Faizan Khan
    Animated Interactive Plots (animint)
    The animint package in R enables animated data visualization, which is a useful tool for obtaining an intuitive understanding of patterns in multivariate...
  • Luis Antonio Damiano
    Bayesian Hierarchical Hidden Markov Models applied to financial time series.
    The goal of this project is to replicate research in Hierarchical Hidden Markov Models (HHMM) applied to financial data. This model is a...
  • Lindsay Rutter
    bigPint: Big multivariate data plotted interactively
    Parallel coordinate plots, scatterplot matrices, and replicate line plots are useful visual tools to understand the relationship between variables in...
  • Ashwin Agrawal
    Biodiversity Data Cleaning
    Data cleaning is a process used to identify inaccurate, incomplete, or unreasonable data and then improve its quality through correction of...
  • Shubham-Chaturvedi
    Constrained Hierarchical Agglomerative Clustering
    Constrained HAC is useful in various application fields such as ecology and bioinformatics. This project aims to build an efficient constrained HAC...
    Control Systems Toolbox
    This project proposes to develop a control-systems package for R. For many years, R has been used extensively for several data-related tasks. With...
  • Alexandre Almeida
    Distributional Assessments with Q-Q Plots
    Quantile-quantile plots (Q-Q plots) are a powerful way of visually diagnosing distributional assumptions of random variables. Q-Q plots have been...
  • Leah South
    Efficient SMC Algorithms in Rcpp
    Sequential Monte Carlo (SMC) methods are powerful alternatives to standard Markov chain Monte Carlo (MCMC) for sampling from the posterior of complex...
  • Matthew Piekenbrock
    Estimating the Empirical Cluster Tree
    The aim of this project is to provide a standalone, scalable, and extensible R package that unifies existing methodologies for estimating the...
  • Robin Kohze
    FireData: Connecting R to Firebase
    R is one of the strongest players in data science. The aim is to connect its strength in data analysis with the actual data. By making it easier to...
  • Xia Zhang
    Graphical Models for Mixed Multi Modal Data
    In this project, we propose a new package to make graphical models for mixed multi-modal data readily available to a wide audience. The proposed...
  • cdries
    Improved functionality for higher order comoment estimation in PerformanceAnalytics
    In this project I aim to improve estimation of the higher order comoment matrices currently implemented in the R packages PerformanceAnalytics and...
  • lwei
    Integrated Oversampling for Time Series Classification
    A significant number of learning problems involve the accurate classification of rare events or outliers from time series data. For example, the...
  • Thiloshon Nagarajah
    Integrating biodiversity data curation functionality
    The importance of data in biodiversity research has been repeatedly stressed in recent times, and various organizations have come together and...
  • Jialin Ma
    Interactive Genome Browser in R
    The project intends to provide an interactive and user-friendly way to visualize track-based genomic data by wrapping the flexible TnT JavaScript...
  • Balázs Dukai
    Interactive trajectory tool for rpostgisLT
    The goal of the project is to build an interactive trajectory analysis extension for the R package rpostgisLT, which was developed during the GSoC...
  • Leopoldo Catania
    Markov Switching GARCH models (MSGARCH) in R
    Modeling the volatility of financial markets is central in risk management. A seminal contribution in this field was the development of the GARCH...
  • Vandit Jain
    Markovchain package
    This project aims to extend the current functionality and capabilities of the R package ‘markovchain’ in order to provide statisticians a more...
  • Natalia da Silva
    metawRite: Meta-analysis update package, LSR (living systematic review)
    Living systematic reviews have been proposed as a new approach to deal with the main problem of traditional systematic reviews. Systematic reviews...
  • Wazeer Zulfikar
    Native R API for TensorFlow
    R does not have a high-level modeling language for designing neural networks. TensorFlow, an open-source Python library, is a great tool for this...
  • Coin Lewis-Beck
    NIMBLE Ecology Package
    The goal of this project is to build a new R package providing high-level user interfaces to many kinds of ecological models and implementing the...
  • Lorenz Walthert
    Noninvasive source code formatting
    A coherent coding style greatly simplifies collaborative work. This is most easily enforced by an automatic code formatter, but existing solutions to...
  • mb706
    Operator Based Machine Learning Pipeline Construction
    The package mlr is a comprehensive machine learning toolkit for R, providing a standardized interface to over sixty machine learning R packages, in...
  • Qingyue Xu
    Parser and Crawler for Biodiversity checklists
    Compiling taxonomic checklists from varied sources of data is a common task that biodiversity informaticians encounter. Data for checklists usually...
  • Pushpak Sarkar
    Portfolio Construction and Risk Management with Unequal Returns Histories
    The goal of this project is to implement the three methods in an “Unequal Histories” package that: (1) facilitates use of the methods in portfolio...
  • Xin Chen
    Risk and Performance Measure Standard Errors for Serially Correlated Returns
    This project is focused on developing a Risk/Performance Standard Errors (RPSE) package that implements a new methodology based on statistical...
  • Samuel Borms
    Sentometrics: An integrated framework for text based multivariate time series modeling and forecasting
    This project leads to the creation of the Sentometrics package that is designed to do time series analysis based on textual sentiment. Time series...
  • Binxiang Ni
    Sparse matrix automatic conversion in RcppArmadillo
    This project aims to complete the integration between the R Matrix package and the Armadillo library. During this project, I plan to do the following: ...
  • Rover Van
    Speed optimizations for iregnet
    The iregnet package is the first R package to support four types of censoring and elastic net (L1 + L2) regularization. Though it is already useful...
  • Earo Wang
    Tidy data structures and visual methods to support exploration of big temporal-context data
    This new package aims to fit into the tidyverse and grammar of graphics suite to support and facilitate temporal-context data analysis and...