3 Accelerate R Packages with Fortran
The chapter focuses on wrapping the routines developed in the previous chapter in the R package.
3.1 Introduction
A package in R is a bundle of R code, data, and documentation designed to perform a specific task or a set of tasks (Wickham & Bryan, 2023). Packages are the fundamental units of reproducible R code. Packaging has many benefits:
- It produces packages that are easily downloaded and used.
- It forces a tidy code and work process.
- It enables the re-usage of code from and for other projects.
- Files rapidly multiply when using foreign languages such as C, C++, or Fortran; thus, packaging makes it easier to maintain and manage dependencies and ports.
Many tools are available to facilitate developing R packages, such as devtools
, usethis
, and testthat
. Additionally, RStudio operates as an invaluable companion. To start the development of an R package within RStudio, navigate to File > New Project > New Directory > R Package, and then proceed to create the project. Alternatively, you can use the devtools::create("/path/to/package/location/")
or usethis::create_package("/path/to/package/location/")
functions.
By employing usethis::use_c()
, you can easily incorporate the necessary infrastructure to utilise compiled code. It’s vital to ensure that your package is properly licensed. Various license templates are available through usethis
, including the Creative Commons (e.g., CC BY 4.0), which can be implemented via usethis::use_ccby_license()
.
Figure 3.1 shows a skeleton of a typical R package called foo
. R code is placed in the R/
directory, while the compiled code resides in the src/
directory. R provides a standardised method of documenting packages. The DESCRIPTION
is generated with the package template and contains essential documentation, including the package’s title, what the package does, the author, and dependencies. Additionally, the SystemRequirements
is important in case the package depends on libraries or tools external to R, e.g., FFTW or GNU make.
The R documentation files .Rd
are stored within the man/
folder. When employing devtools
for package development, the .Rd
files are automatically modified when the specially formatted “roxgen comments” above R source codes are modified (Wickham & Bryan, 2023). Thus, it is important to run devtools::document()
whenever a function is completed. The NAMESPACE
file is a fundamental component of a package. It delivers a context for looking up the value of an object associated with a name (Wickham & Bryan, 2023). Typically, it is modified automatically by roygen2
and contains routines such as export
to make the package functions visible, import
to import all objects from another package’s namespace or importForm
to import selected objects, and useDynLib()
for packages with compiled code to register routines from DLL.
The second edition of the R Packages book by Hadley Wickham and Jennifer Bryan, particularly the chapter “The Whole Game”, offers essential insights and presents a comprehensive overview of creating an R package.
3.2 Developing AquaFortR
In this section, we will wrap the routines developed in Chapter 2 within an R package. For simplicity, we will focus on implementing 2D cross-correlation here. In the package ecosystem, it is advantageous to avoid the .Fortran
interface. The interface is mainly tailored for Fortran 77, predates any support for ‘modern’ Fortran, and carries a significant overhead. Therefore, we will use .Call
the modern interface to C/C++.
The development consists mainly of three stages:
- Writing the Fortran subroutine
- Crafting C functions to interact between R and Fortran
- And finally, creating an R function to call the C function.
3.2.1 Fortran Subroutine
The best approach to organising Fortran code with an R package is to use modules, storing each module in individual files. The intrinsic module iso_c_binding
ensures that precisely the same variable type and kind is used in C and Fortran.
Listing 3.1 shows the structure of the Fortran file AquaFortRmodulef.f90
. Fortran files can multiply rapidly, and thus, managing module dependencies can be a tedious task. Numbering the files and writing the routines sequentially, such as AquaFortRmodulesf001.f90
to AquaFortRmodulesf00n.f90
, can serve as a practical workaround.
Listing 3.2 shows the subroutine xcorr2d_f
. The differences in comparison to the script version in Listing 2.2 are:
- The kind of the variable is defined according to
iso_c_binding
. Integers are of thec_int
kind, while double precision variables are defined as real with thec_double
kind. - Typically, variables are passed to Fortran (C) by reference (value). The
value
attribute allows passing by value to Fortran. - Using
bind()
facilitates the interoperability of Fortran procedures with C. The binding label is the name by which the C processer recognises the Fortran procedure. TheF77_
prefixes in C add a trailing underscore.
3.2.2 C Functions
Now, we write C functions that communicate between Fortran and R. The R API has many entry points for the C code. To use the public stable API, the header files in Listing 3.3 should be included in the AquaFortRmodulec.c
. Additionally, stdlib.h
is included for NULL
. It is recommended to use R_NO_REMAP
so all API functions have the prefix R_
or Rf_
.
F77_NAME()
is used to declare the Fortran subroutine in C. Note that variables with value attribute should not be pointers, as shown in Listing 3.4.
In Listing 3.5, the c_xcorr2d_f
function communicates between R and C, and invoke the declared Fortran subroutine in C. Everything that moves between R and C should be SEXP
. Additional variables required by the Fortran subroutine, such as matrix dimensions, are created within the C code. The F77_CALL()
macro is to call the Fortran routine, declared by F77_NAME()
, from C. The PROTECT
and UNPROTECT
macros are required to save the result, i.e., ret
, from being destroyed during R’s garbage collection. Variables of type double
need to be altered to real
before being passed to Fortran.
Registration of native routines for compiled code is achieved in two stages: first, creating an array describing individual routines using R_CallMethodDef
; then, actually registering the routines with R using R_registerRoutines
, as shown in Listing 3.6. Registration’s benefits include a faster way to find the address of the entry point and a run-time check concerning the number and type of arguments. The R_useDynamicSymbols
routine instructs the .Call
function to find only registered routines, even when no package is provided, preventing unnecessary delay due to searching.
For further information, please refer to Writing R Extension: 5.4 Registering native routines.
All the C code in Section 3.2.2 is stored within the AquaFortRmodulec.c
file.
3.2.3 R Function
After finishing the Fortran subroutine and C functions, we can write the R function, as shown in Listing 3.7. The first few lines contain the roxgen2 documentation for the function. We use the .Call
function to invoke the C function c_xcorr2d_f
that will invoke a Fortran subroutine xcorr2d_f
.
It is crucial to add the following line to the package-level documentation (i.e.AquaFortR-package.R
file) to load the DLL and define them in the package’s namespace.
Code
#' @useDynLib AquaFortR, .registration=TRUE
This is the end of the general workflow for developing an R package with Fortran code using the .Call
interface. In the src/
directory, Fortran modules reside in .f90 files, while C routines are found in .c files. Some packages store their C code in an init.c file. All R code should be kept in the R/
directory.
3.2.4 System Dependencies
Occasionally, you develop a package that needs system-dependent configuration or libraries before installation, such as FFTW. An executable is then executed by R CMD INSTALL
before any other action is performed to ensure the successful installation of the package. R allows /configure
scripts to check for these dependencies. The /configure
script can be written manually or using Autoconf.
Aaron Jacobs wrote about “An Autoconf Primer for R Package Authors.” It is a vital tutorial on developing a /configure
script for an R package. Furthermore, detailed information concerning configuration and cleanup is available in Writing R extensions, 1.2 Configure-and-cleanup.
In case the configure script requires auxiliary files, it is advised that they should be shipped with the R package in the tools directory. Numerous macros are available at the GNU Autoconf Archive.