The *fastverse* is a suite of complementary high-performance packages for statistical computing and data manipulation in R. Developed independently by various people, *fastverse* packages jointly contribute to the objectives of:

- Speeding up R through heavy use of compiled code (C, C++, Fortran)
- Enabling more complex statistical and data manipulation operations in R
- Reducing the number of dependencies required for advanced computing in R

The `fastverse` package integrates, and provides utilities for easy installation, loading and management of these packages. It is an extensible framework that allows users to (permanently) add or remove packages to create a ‘verse’ of packages suiting their general needs. Separate ‘verses’ can also be created.

*fastverse* packages are jointly attached with `library(fastverse)`, and several functions starting with `fastverse_` help manage dependencies, detect namespace conflicts, add/remove packages from the *fastverse* and update packages.
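A minimal sketch of this workflow, assuming the `fastverse` package and its core packages are installed. The two helpers used here, `fastverse_packages()` and `fastverse_conflicts()`, belong to the `fastverse_` family mentioned above; what they print depends on your installation and on any customization you have made.

```r
# Attach all packages of the fastverse in one call
library(fastverse)

# Character vector naming the packages that currently make up the fastverse
pkgs <- fastverse_packages()
print(pkgs)

# Report functions whose names clash between fastverse packages
# and other attached packages
fastverse_conflicts()
```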

The *fastverse* consists of 6 core packages (7 dependencies in total) which provide broad C/C++ based statistical and data manipulation functionality and have carefully managed APIs. These packages are installed and attached along with the `fastverse` package.

- **data.table**: Enhanced data frame class with concise data manipulation framework offering powerful aggregation, extremely flexible split-apply-combine computing, reshaping, joins, rolling statistics, set operations on tables, fast csv read/write, and various utilities such as transposition of data.
- **collapse**: Fast grouped & weighted statistical computations, time series and panel data transformations, list-processing, data manipulation functions, summary statistics and various utilities such as support for variable labels. Class-agnostic framework designed to work with vectors, matrices, data frames, lists and related classes including *xts*, *data.table*, *tibble*, *pdata.frame*, *sf*.
- **matrixStats**: Efficient row- and column-wise (weighted) statistics on matrices and vectors, including computations on subsets of rows and columns.
- **kit**: Fast vectorized and nested switches, some parallel (row-wise) statistics, and some utilities such as efficient partial sorting and unique values.
- **magrittr**: Efficient pipe operators for enhanced programming and code unnesting.
- **fst**: A compressed data file format that is very fast to read and write. Full random access in both rows and columns allows reading subsets from a ‘.fst’ file.

*Additional dependency*: Package *Rcpp* is imported by *collapse* and *fst*.

Currently, there are 2 different versions of the *fastverse* on CRAN and GitHub. The GitHub version is recommended if you want to have *matrixStats* consistently preserve attributes of your matrices: it modifies functions in the *matrixStats* namespace making them preserve attributes consistently (and by default) whenever the *fastverse* is attached. This version was rejected by CRAN because it requires a call to `unlockBinding`. The CRAN version takes *matrixStats* as it is, which means most functions do not preserve attributes such as dimension names in computations.

```
# Install the CRAN version
install.packages("fastverse")
# Install the GitHub version (Requires Rtools)
remotes::install_github("SebKrantz/fastverse")
```

*Note* that the GitHub version is not a development version; development takes place in the ‘development’ branch. *matrixStats* is slowly evolving towards greater consistency, but it might take more than half a year until dimension names are handled consistently by default, due to the large number of reverse dependencies. Until then, CRAN and GitHub versions of the *fastverse* are released together.

In addition, users have the option (via the `fastverse_extend()` function) to freely attach extension packages offering more specific functionality. The *fastverse* can be extended by any R package, either just for the current session or permanently.
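The two modes of extension can be sketched as follows. This is an illustration under stated assumptions: *xts* and *roll* are arbitrary example packages that must already be installed, and the `permanent` argument and `fastverse_reset()` are used as I understand them from the package documentation. The permanent variant is commented out because it changes what `library(fastverse)` attaches in future sessions.

```r
library(fastverse)

# Extend the fastverse for the current session only
# (xts and roll are example choices; any installed package can be used)
fastverse_extend(xts, roll)

# Extend permanently: these packages would then also be attached
# by library(fastverse) in every future session
# fastverse_extend(xts, roll, permanent = TRUE)

# Undo permanent customizations and return to the core packages
# fastverse_reset()
```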

In addition to a global customization, separate *fastverse*’s can be created for projects by adding a `.fastverse` config file in the project directory and listing packages there. Only these packages will then be loaded and managed with `library(fastverse)` in the project.
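As a rough sketch, such a config file could be created from R as shown below. The project directory is a hypothetical stand-in (a temporary folder), the selection of packages is arbitrary, and the comma-separated layout of the file is an assumption about the expected format; consult the package documentation before relying on it.

```r
# Hypothetical project directory; a temporary folder stands in for a real project root
project_dir <- file.path(tempdir(), "my_project")
dir.create(project_dir, showWarnings = FALSE)

# Packages this project's 'verse' should consist of (an arbitrary selection)
project_pkgs <- c("data.table", "collapse", "kit", "magrittr")

# Write the names to the '.fastverse' config file, separated by commas (assumed format)
config_file <- file.path(project_dir, ".fastverse")
writeLines(paste(project_pkgs, collapse = ", "), config_file)

# Inspect the file contents
readLines(config_file)
```

With the working directory set to the project root, `library(fastverse)` would then be expected to pick up this file instead of the global configuration.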

High-performing packages for different data manipulation and statistical computing topics are suggested below. Each topic has a 2-character topic-id, which can be used to quickly attach all available packages with `fastverse_extend(topics = c(..id's..))`, and to install missing packages by adding argument `install = TRUE`. The majority of these packages provide compiled code and have few dependencies. The total (recursive) dependency count is indicated for each package.
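A hedged sketch of a topic-based extension follows. The ids `"ts"` (time series) and `"dt"` (dates and times) are assumptions on my part, since the topic-ids are not spelled out in this text; check the package documentation for the actual values.

```r
library(fastverse)

# Attach all installed packages suggested under two topics.
# ASSUMPTION: "ts" (time series) and "dt" (dates and times) are valid topic-ids.
fastverse_extend(topics = c("ts", "dt"))

# The same call, additionally installing any suggested packages that are missing:
# fastverse_extend(topics = c("ts", "dt"), install = TRUE)
```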

- **xts** and **zoo**: Fast and reliable matrix-based time series classes providing fully identified ordered observations and various utilities for plotting and computations (1 dependency).
- **roll**: Very fast rolling and expanding window functions for vectors and matrices (3 dependencies).

*Notes*: *xts*/*zoo* objects are preserved by *roll* functions and by *collapse*’s time series and data transformation functions<sup>1</sup>. As *xts*/*zoo* objects are matrices, all *matrixStats* functions apply to them as well. *xts* objects can also easily be converted to and from *data.table*.

- **lubridate**: Facilitates ‘POSIX-’ and ‘Date’ based computations (2 dependencies).
- **anytime**: Anything to ‘POSIXct’ or ‘Date’ converter (2 dependencies).
- **fasttime**: Fast parsing of strings to ‘POSIXct’ (0 dependencies).
- **nanotime**: Provides a coherent set of temporal types and functions with nanosecond precision, based on the ‘integer64’ class (7 dependencies).
- **clock**: Comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) (6 dependencies).
- **timechange**: Efficient manipulation of date-times accounting for time zones and daylight saving times (1 dependency).

*Notes*: Date and time variables are preserved in many *data.table* and *collapse* operations. *data.table* additionally offers an efficient integer based date class ‘IDate’ with some supporting functionality. *xts* and *zoo* also provide various functions to transform dates, and *zoo* provides classes ‘yearmon’ and ‘yearqtr’ for convenient computation with monthly and quarterly data. Package *mondate* also provides a class ‘mondate’ for monthly data.

- **stringi**: Main R package for fast, correct, consistent, and convenient string/text manipulation (backend to *stringr* and *snakecase*) (0 dependencies).
- **stringr**: Simple, consistent wrappers for common string operations, based on *stringi* (3 dependencies).
- **snakecase**: Convert strings into any case, based on *stringi* and *stringr* (4 dependencies).
- **stringfish**: Fast computation of common (base R) string operations using the ALTREP system (2 dependencies).
- **stringdist**: Fast computation of string distance metrics, matrices, and fuzzy matching (0 dependencies).

- **Rfast** and **Rfast2**: Heterogeneous sets of fast functions for statistics, estimation and data manipulation operating on vectors and matrices. Missing values and object attributes are not (consistently) supported (4-5 dependencies).
- **parallelDist**: Multi-threaded distance matrix computation (3 dependencies).
- **coop**: Fast implementations of the covariance, correlation, and cosine similarity (0 dependencies).
- **rsparse**: Implements many algorithms for statistical learning on sparse matrices - matrix factorizations, matrix completion, elastic net regressions, factorization machines (8 dependencies). See also package **MatrixExtra**.
- **rrapply**: The `rrapply()` function extends base `rapply()` by including a condition or predicate function for the application of functions and diverse options to prune or aggregate the result (0 dependencies).

*Notes*: *Rfast* has a number of functions with the same names as functions in *matrixStats*. These are simpler but typically faster and support multi-threading. Some highly efficient statistical functions can also be found scattered across various other packages; notable examples are *Hmisc* (60 dependencies) and *DescTools* (17 dependencies). *fastDummies* (16 dependencies) implements creation of dummy (binary) variables.

- **sf**: Leading framework for geospatial computing and manipulation in R, offering a simple and flexible spatial data frame and supporting functionality (13 dependencies).
- **stars**: Spatiotemporal data (raster and vector) in the form of dense arrays, with space and time being array dimensions (17 dependencies).
- **terra**: Methods for spatial data analysis with raster and vector data. Processing of very large (out of memory) files is supported (4 dependencies).

*Notes*: *collapse* can be used for efficient manipulation and computations on *sf* data frames. *sf* also offers tight integration with *dplyr*.

- **dygraphs**: Interface to ‘Dygraphs’ interactive time series charting library (11 dependencies).
- **lattice**: Trellis graphics for R (0 dependencies).
- **grid**: The grid graphics package (0 dependencies).
- **ggplot2**: Create elegant data visualizations using the Grammar of Graphics (30 dependencies).
- **scales**: Scale functions for visualizations (10 dependencies).

*Notes*: *latticeExtra* provides extra graphical utilities based on *lattice*. *gridExtra* provides miscellaneous functions for *grid* graphics (and consequently for *ggplot2*, which is based on *grid*). *gridtext* provides improved text rendering support for *grid* graphics. Many packages offer *ggplot2* extensions (typically starting with ‘gg’) such as *ggExtra*, *ggalt*, *ggforce*, *ggmap*, *ggtext*, *ggthemes*, *ggrepel*, *ggridges*, *ggfortify*, *ggstatsplot*, *ggeffects*, *ggsignif*, *GGally*, *ggcorrplot*, *ggdendro*, etc.

- **tidytable**: A tidy interface to *data.table* that is *rlang* compatible. Quite comprehensive implementation of *dplyr*, *tidyr* and *purrr* functions. *tidyverse* function names are appended with a `.`, e.g. `mutate.()`. Package uses a class *tidytable* that inherits from *data.table*. The `dt()` function makes *data.table* syntax pipeable (14 total dependencies).
- **tidyfast**: Fast tidying of data. Covers *tidyr* functionality, `dt_` prefix, preserves *data.table* object. Some unnecessary deep copies (2 dependencies).
- **tidyfst**: Tidy verbs for fast data manipulation. Covers *dplyr* and some *tidyr* functionality. Functions have `_dt` suffix and preserve *data.table* object. A cheatsheet is provided (7 dependencies).
- **tidyft**: Tidy verbs for fast data operations by reference. Best for big data manipulation on out of memory data using facilities provided by *fst* (7 dependencies).
- **maditr**: Fast data aggregation, modification, and filtering with pipes and *data.table*. Minimal implementation with functions `let()` and `take()` for most common data manipulation tasks. Also provides Excel-like lookup functions (2 dependencies).

*Notes*: One could also mention RStudio’s *dtplyr* and the *table.express* package here, but these packages import *dplyr* and thus have around 20 dependencies.

- **qs** provides a lightning-fast and complete replacement for the `saveRDS` and `readRDS` functions in R. It supports general R objects with attributes and references - at similar speeds to *fst* - but does not provide on-disk random access to data subsets like *fst* (4 dependencies).
- **arrow** provides both a low-level interface to the Apache Arrow C++ library (a multi-language toolbox for accelerated data interchange and in-memory processing) and some higher-level, R-flavored tools for working with it - including fast reading / writing delimited files and sharing data between R and Python (12 dependencies).

*Notes*: Package *vroom* offers fast reading and writing of delimited files, but with 24 dependencies is not really a *fastverse* candidate.

Feel free to notify me of any other packages you think should be included here. Such packages should be well designed, top-performing, low-dependency, and, with few exceptions, provide their own compiled code. Please note that the *fastverse* focuses on general purpose statistical computing and data manipulation, thus I won’t include fast packages to estimate specific kinds of models here (of which R also has a great many).

<sup>1</sup> *collapse* functions can also handle irregular time series, but this requires passing an integer time variable to the `t` argument which has consecutive integer steps for regular parts of the time series and non-consecutive integers for the irregular parts.