R is a free software environment for statistical computing and graphics

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.



  • r-project
  • c
  • c++
  • fortran
  • javascript



R project for statistical computing 2019 Projects

  • Gregory S Brownson
    A New Package for Empirical Asset Pricing Research, or EAPR
    A major effort in empirical asset pricing research is the initial stage of gathering the data, cleaning and filtering it, and then formatting it in a...
  • Shannon Sequeira
    Add subsampling to aster models
    We add support for sampling arrows to aster models using the theory of curved exponential families.
  • Shawn Feng
    Add Support for Extra Optimization Solvers to PortfolioAnalytics
    PortfolioAnalytics is a popular R package designed to provide optimized solutions and visualizations for portfolio allocation problems with complex...
  • Ziheng Zhou
    Adding plotting engine into PerformanceAnalytics package
    Adding plotting engine into PerformanceAnalytics package.
  • Daniel Xia
    An R package for two new skew-t distributions
    We will develop an R package for two families of skew-t distributions that have different tail behavior for left and right tails, namely the family...
  • Oliver Ford
    cpVis: Interactive visualization for change point exploration and labeling
    A changepoint is typically defined as a point in time where the distribution of a data-stream changes in a distinct manner, for example, typically...
  • Ben Ubah
    Data-Driven Exploration of the R Community
    This project proposes to build an infrastructure that helps the R community explore R user groups, R-Ladies groups and past R-GSoC projects using a...
  • Rahul Chauhan
    Enhancing Visualizations for Biodiversity Data
    We plan to incorporate into bdvis two state-of-the-art elements: interactive plotting and dashboards. We plan to develop and test an interface that...
  • Fahrozi Fahrozi
    Exploring Election and Census Highly Informative Data Nationally for Indonesia (Eechidna R package)
    In Indonesia, elections are highly anticipated because they provide an opportunity for all people to influence the direction of their country. The...
  • Vito Lestingi
    Financial Transactions Analytics in blotter
    The Transaction Cost Analysis (TCA) of an investment program is a critical framework to pursue its best execution, as cost minimization is a...
  • Sayani Gupta
    gravitas: Exploring probability distributions for bivariate temporal granularities
    gravitas aims to provide methods to operate on time in an automated way, to deconstruct it in many different ways. Deconstructions of time that...
  • Marlon E. Cobos
    Grinnellian ecological niches and ellipsoids in R
    Distributional ecology is a growing field of science dedicated to characterizing species distributions based on their ecological niches. Based on early...
  • Povilas Gibas
    Implementing biodiversity data checks for the bdchecks package
    Background bdchecks is an infrastructure for performing, filtering and managing various biodiversity data checks using R. Data checks are a key to...
  • Onno Kleen
    Improving the R package highfrequency
    The R package highfrequency is the go-to package for intraday financial analysis in R. In the project, I will enhance its functionalities, rework...
  • Aditya Samantaray
    Iregnet is the first R package to support general interval output data (no censoring as well as left, right and interval censored data) and elastic...
  • Luofeng Liao
    MoMA - Modern Multivariate Analysis in R
    Multivariate Analysis techniques are indispensable in the era of Big Data. However, a unified and user-friendly framework has been lacking to date....
  • Akshaj Verma
    Neural Network Package Validation 1
    The purpose of this project is to verify the convergence of the training algorithms provided in 69 Neural Network R packages available on CRAN to...
  • Salsabila Mahdi
    Neural Network Package Validation 2
    The purpose of this project is to verify the convergence of the training algorithms provided in 69 Neural Network R packages available on CRAN to...
  • Anuraag Srivastava
    Optimal partitioning algorithm for changepoint detection
    There are several applications where we need to work with ordered data (e.g. Time-series). This includes financial data, climate data, radio signals,...
  • Yawei Ge
    Parallel Coordinate Plots in ggplot2
    We plan to create a package for parallel coordinate plots using ggplot2 based on the existing methods. We want it to make use of larger flexibility...
  • Yujia Xie
    PRIMAL: An R Package for Linear Programming-based Sparse Learning Methods in High Dimensions
    Linear Programming (LP) based sparse learning methods, such as the Dantzig selector (for linear regression), sparse quantile regression, sparse...
  • wenyu yang
    Project Proposal Translator from ggplot2 to Vega Lite
    Project Abstract About me Mentors Information Coding Plan and Methods Commitments Timeline
  • Juan Cruz Rodriguez
    R Code Optimizer
    R is slow compared to other popular languages. “The R interpreter is not fast and execution of large amounts of R code can be unacceptably slow”....
  • Panagiotis Repouskos
    Sampling Methods for Convex Optimization
    Extend VolEsti (a C++ library with an R interface) by implementing randomized algorithms for convex optimization. First, there is a need to implement...
  • AndrewC1998
    Second Order Structure in the Changepoint Package
    Detecting changes in statistical properties of a time series is important in a large number of fields. A large amount of research has taken place...
  • Andres Algaba
    The transformation of textual data into time series variables, and their subsequent use in an econometric analysis is an important and emerging...
  • Qincheng Lu
    sgdnet: efficient regularized GLMs for big data
    There is not yet any way in R to fully leverage the power of stochastic gradient algorithms for fitting generalized linear models (GLM). The...
  • Apostolos Chalkis
    State-of-the-art geometric random walks in R
    Sampling algorithms and volume computation of convex polytopes are very useful in many scientific fields and applications. The package volesti is a...
  • Ye
    Tree-regularized convolutional Neural Network (tCNN) for microbiome-based prediction
    One important characteristic of microbiome data is that the number of microbiome features is far larger than the sample size (p>>n), resulting in a...
  • Avinash Barnwal-1
    xgboost loss functions
    The project requires implementing 2 new objective loss functions: one for survival loss and another for binomial loss. Survival loss includes...