Information Safety

Improving technology through lessons from safety.

Working with R

Around the time of SIRAcon 2020, I decided to start using R. I needed a data analysis tool that would allow me to conduct traditional statistical analysis, and I wanted one that would be valuable to learn and would support exploratory analysis as well. Originally I considered SPSS (free to students) and RStudio. The tradeoffs between the two were pretty clear: SPSS is very easy to use, but expensive, proprietary, and old. RStudio and R have a tougher learning curve, but are free and open source, under active development, and have a large online community. After reading a thread on the SIRA mailing list, I was leaning towards R, and I re-watched Elliot Murphy’s 2019 SIRAcon presentation on using notebooks, which led me to consider both R Markdown and Python Jupyter Notebooks. I did more searching and reading, and finally settled on R Notebooks for a few reasons: they are more disciplined (no strange side effects from running code out of order), they have fewer environment problems, they have the support of the RStudio company, they produce better visualizations, and R is simply the more data-sciency language.

The SIRA community was quite supportive of this idea when I asked for suggestions on getting started in the BOF session. People recommended Teacup Giraffes and Tidy Tuesday for learning R, and on my own I found RStudio’s recommendations. Of course, being a sysadmin at heart, I set out to figure out how best to install R and RStudio, and how to manage the notebooks in git.

Installation on macOS was easy enough: just brew install r and brew cask install rstudio. GitHub published a tutorial in 2018 on getting RStudio integrated with GitHub, and I started working through that. I quickly discovered that while the tutorial was helpful, it wasn’t quite the setup I wanted; it published R Markdown through GitHub Pages, but wouldn’t directly support the automatically generated HTML of R Notebooks. Side note: the consensus was to use html_notebook as a working document, and html_document to publish. After more searching, I was able to get Notebooks working on GitHub using the method described in rstudio/rmarkdown #1020: checking the .nb.html file into git and using GitHub Pages, so that you can view the rendered HTML instead of just the HTML code.
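
As a rough sketch of that working-versus-publishing split, the same source file can be rendered to either format from the R console. The file name analysis.Rmd is made up for illustration, and this assumes the rmarkdown package is already installed:

```r
# analysis.Rmd is a hypothetical notebook file in the current directory.

# Render the working notebook; html_notebook output is written as analysis.nb.html.
rmarkdown::render("analysis.Rmd", output_format = "html_notebook")

# Render the version meant for publishing; html_document output is written as analysis.html.
rmarkdown::render("analysis.Rmd", output_format = "html_document")
```

In day-to-day use RStudio regenerates the .nb.html file whenever the notebook is saved, so the explicit call is mostly useful outside the IDE.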

Working through this, I noted that RStudio is quite good at automatically downloading and installing packages as needed; it triggered installation of rmarkdown and supporting packages when I created a new R Notebook, and of readr when I imported data from a CSV file. That got me thinking: what about package management? While it seems that R doesn’t have the level of challenge posed by Python or Ruby, managing packages on a per-project basis is a best practice I learned from using Bundler to manage the code of this site (the only gem I install outside a project is bundler). So I went looking for the R equivalent…

I first found Packrat and then its replacement, renv (Packrat is maintained, but all new development has shifted to renv). Setting it up is as simple as install.packages("renv") and renv::init(), and RStudio has published:
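
For reference, here is a minimal sketch of my own of what that per-project workflow might look like from the R console. The snapshot and restore steps go beyond the two commands mentioned above and are an assumption about a typical setup, not something I have verified against this project:

```r
# Run once from the project's root directory.
install.packages("renv")

# Create a project-local library and a renv.lock lockfile.
renv::init()

# After installing or upgrading packages, record the exact versions in renv.lock.
renv::snapshot()

# On a fresh clone or another machine, reinstall the versions recorded in renv.lock.
renv::restore()
```

The lockfile plays roughly the role Gemfile.lock does with Bundler, so checking renv.lock into git alongside the notebooks seems like the natural fit.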

This left one final question: how exactly to install R? Homebrew itself offers two methods: installing the official binaries with brew cask install r, or just brew install r. Poking around further, I found that the cask method was sub-optimal, as it installs in /usr/local, which causes issues with brew doctor. Interestingly, I also found that Homebrew’s R doesn’t include all R features, but the same author, Luis Puerto, offered a solution to install all the things. I haven’t tried it yet, but I may go with homebrew-r-srf as suggested by Luis (or a fork of it).
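
One way to see which optional features a given R build actually has is to ask R itself. This is a small sketch using base R’s capabilities() function; the variable names are just for illustration, and the result will differ from one install to the next:

```r
# Ask the running R build which optional features it was compiled with.
# The result is a named logical vector; some entries can be NA when unchecked.
caps <- capabilities()
print(caps)

# Names of the features this build reports as unavailable.
absent <- names(caps)[!is.na(caps) & !caps]
print(absent)
```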

What’s next? At some point I plan to integrate GitHub Actions for testing, and to create a CI/CD pipeline of sorts for Pages. And, of course, actually using R for data analysis…

Update: I tested homebrew-r-srf, and am going with Homebrew’s R. There was some weirdness with the install/uninstall (/usr/local/lib/R was left over), I don’t know if I’ll need the optional features, and Homebrew’s R now uses OpenBLAS. If I find I actually need any of the missing capabilities, I’ll likely write my own formula.
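
To check which BLAS and LAPACK libraries an R install is linked against, something like the following should work on a reasonably recent R. It is only a sketch, and the paths it prints depend entirely on the local install:

```r
# Path of the BLAS library the running R session is using.
blas <- extSoftVersion()[["BLAS"]]
print(blas)

# Path of the LAPACK library in use.
lapack <- La_library()
print(lapack)
```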