How to make an R Package

Created: 2019-11-05 ; Updated: 2019-11-05

Package Authoring

R is great for interactive data exploration, but professional software developers know that there comes a time when “playing around” takes more time than “doing it right”. In the long run, you’re almost always better off investing in good software engineering techniques right from the beginning.

In R, that means writing packages. Roughly the equivalent of a Python package, an R package is a self-contained bundle that includes everything needed to turn code into a product: documentation, tests, and versioning. The place to start is the R Packages (2nd Edition) site. I won’t repeat the information there; I’ll just summarize some of the issues I ran into.

You can use the RStudio built-in “New Project…” and select Package to get started. There is nothing complicated here: the dialog walks you through making a minimal package.

Key things to remember:

All your R code goes into the R/ directory.
Follow the instructions for object documentation. This is simple: just insert some #' comments directly above each function you want to document (add an @export tag to the ones you want to expose). Manual pages will be created automatically when you run devtools::document().
As soon as possible, you should also write some test cases using the testthat package.
Keep any data in the data/ directory.

You will need to run devtools::document() each time you update documentation. It doesn’t update automatically upon building.
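As a minimal sketch of what those #' comments look like, here is a documented function. The function normalize_counts and its argument are hypothetical, made up for illustration; only the roxygen tags (@param, @return, @export) are standard.

```r
#' Normalize counts to proportions
#'
#' Hypothetical example function, used only to show the comment layout.
#'
#' @param counts Numeric vector of non-negative counts.
#' @return Numeric vector of the same length whose elements sum to 1.
#' @export
normalize_counts <- function(counts) {
  counts / sum(counts)
}
```

After running devtools::document(), this should produce a manual page at man/normalize_counts.Rd and an export entry in NAMESPACE.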

(Unit) Testing

Hadley’s testthat documentation, while delightfully concise, appears to be a bit out of date and was somewhat tricky for me to follow, especially since my package needs some external data files for testing. Here is what worked for me. Start by creating a test file:

usethis::use_test("myNewFunction")

where myNewFunction is some function defined in the R/ directory.

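This creates a test file, tests/testthat/test-myNewFunction.R, and opens it for editing. The first time you run it, it also sets up the test infrastructure, including a file with the exact name tests/testthat.R, which will contain the following lines: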

library(testthat)
library(mynewpackage)
test_check("mynewpackage")

That was pretty much it. Run the tests by typing:

devtools::test()
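For reference, here is a rough sketch of what a test file such as tests/testthat/test-myNewFunction.R might contain. The body of myNewFunction is an assumption invented for this example; in a real package the function would live in R/ and be loaded by devtools::test() rather than defined in the test file.

```r
library(testthat)

# Hypothetical function under test. It is defined here only so that the
# sketch runs on its own; in a package it belongs in the R/ directory.
myNewFunction <- function(x) {
  x / sum(x)
}

test_that("myNewFunction returns proportions that sum to 1", {
  result <- myNewFunction(c(2, 3, 5))
  expect_equal(sum(result), 1)
  expect_length(result, 3)
})

test_that("myNewFunction preserves the input order", {
  expect_equal(myNewFunction(c(1, 3)), c(0.25, 0.75))
})
```

Each test_that() block groups a few related expectations under a description, which is what shows up in the report when something fails.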

One cool option is to run this from the RStudio console:

testthat::auto_test("./R", "./tests/testthat")

Now you can go editing code or tests as much as you like, and it will automatically re-run the affected tests. Makes it super-easy to deprecate functions or rename variables. Once you have enough test coverage, you can feel confident that everything just works.

I put my data files into a separate directory within the package: inst/extdata/.

The trick for making test cases was to remember that the contents of inst/ are copied to the top level of the package when it is installed, so the files end up in extdata/ rather than inst/extdata/. So, to reference the pathnames of each file, you’ll need:

DATA_DIR <- system.file("extdata", package = "actino") # "../../inst/extdata"
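One thing to watch for: system.file() returns an empty string when it can’t find what you asked for, which can lead to confusing errors further down. A small helper along these lines makes the failure obvious. The helper name extdata_path and the file name sample.csv are hypothetical; mustWork is a standard system.file() argument.

```r
# Build the full path to a file shipped under inst/extdata.
# mustWork = TRUE makes system.file() raise an error instead of
# silently returning "" when the file or package is missing.
extdata_path <- function(filename, package = "actino") {
  system.file("extdata", filename, package = package, mustWork = TRUE)
}

# Hypothetical usage inside a test (sample.csv is a made-up file name):
# sample <- read.csv(extdata_path("sample.csv"))
```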

Saving Data

Once the package is working and I’m able to create new data from within it, I may want to save some of my results so they can be loaded quickly, possibly by somebody who doesn’t want to recompute them. Here’s the command that will save my R variable activity_raw to the data/ directory:

usethis::use_data(activity_raw, overwrite = TRUE)
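For intuition: as far as I understand it, use_data() is a convenience wrapper around base R’s save() that writes data/activity_raw.rda. Below is a rough sketch of that round trip using only base R, with a temporary file standing in for the data/ directory and a made-up data frame standing in for activity_raw (the column names are assumptions, not the real data).

```r
# Made-up stand-in for the real activity_raw data frame.
activity_raw <- data.frame(day = 1:3, steps = c(4000, 8500, 12000))

# Save the object to an .rda file (a temp file here, not data/).
rda_path <- file.path(tempdir(), "activity_raw.rda")
save(activity_raw, file = rda_path)

# Load it back into a separate environment to check the round trip.
restored <- new.env()
load(rda_path, envir = restored)
round_trip_ok <- identical(restored$activity_raw, activity_raw)
```

The object comes back under the same name it was saved with, which is why the documentation entry for the dataset uses the string "activity_raw".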

Remember to include a data description file called R/data.R that contains something like this:

#' Activity file
#'
#' @format a dataframe of 5 variables
"activity_raw"

Global Variables

It’s ugly, but this is one way to get around an annoying check note: R CMD check will flag references to data frame column names with “no visible binding for global variable”. To suppress those notes (and do any other convenient pre-loading), create one last file, by convention called zzz.R:

# zzz.R
# top level initializations

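.onLoad <- function(libname = find.package("actino"), pkgname = "actino") {
  # Register column names used in data frame expressions so that
  # R CMD check does not report them as undefined global variables.
  # globalVariables() was added in R 2.15.1, hence the version guard.
  if (getRversion() >= "2.15.1") {
    utils::globalVariables(c("ssr", "tax_name", "count", "count_norm", "tax_rank"))
  }

  invisible()
}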
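A commonly seen alternative, which I haven’t compared carefully against the .onLoad version, is to call globalVariables() at the top level of a source file instead of inside a hook. A minimal sketch, assuming a hypothetical file R/globals.R and only two of the column names:

```r
# R/globals.R (hypothetical file name)
# Top-level call: register names that R CMD check should not flag.
# The version guard is dropped here on the assumption that the package
# only targets R >= 2.15.1, where globalVariables() is available.
registered <- utils::globalVariables(c("ssr", "tax_name"))
```

The function returns the full set of names registered so far, which is handy for checking that a name actually made it onto the list.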

The sample code for my first package is on GitHub.