Learning Objectives

  • Describe what an R add-on package is
  • Install a package from CRAN
  • Install a package from Bioconductor
  • Learn how to manage package conflicts
  • Learn about other ways to practice R

Add-on packages

CRAN

In the last two lessons, we used an add-on package called dplyr. R is open source, so anyone can write their own functions for particular tasks. To make those functions available to other people, you can develop them into a package with extra documentation, such as a help page for each function and a guide showing how the functions fit together in a typical analysis. If your package meets certain requirements, you can submit it to the Comprehensive R Archive Network (CRAN), which will host it and make it available to anyone in the world. CRAN currently has over 15,000 packages! To get any one of them, use R’s built-in install.packages function:

install.packages("swirl") ## install a package to help learn R

You might get asked to choose a CRAN mirror. This is simply asking which site to download the package from. The choice doesn’t matter much; I’d recommend the RStudio mirror or one that is geographically close. You might also notice that more than one package gets downloaded: install.packages() checks whether the package depends on additional packages and downloads them as well.
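To avoid the mirror prompt, the download site can be set once per session. The sketch below assumes the cloud.r-project.org mirror, which is one reasonable choice rather than a requirement. It then reads the dependency fields of a package that ships with R, to show where install.packages() finds out what else it needs to download.

```r
# Set the CRAN mirror for this session so install.packages() does not prompt.
# The URL below is an assumption; any CRAN mirror would work.
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")[["CRAN"]]

# Every package declares the packages it needs in its DESCRIPTION file.
# Inspect those fields for stats, which ships with R, so nothing is downloaded.
deps <- utils::packageDescription("stats", fields = c("Depends", "Imports"))
deps
```

Any install.packages() call made later in the same session would use the mirror chosen here.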

Bioconductor

Besides CRAN, there are other repositories for R packages. Bioconductor provides packages for analyzing genomic data. It currently has over 1,800 software packages and 900 annotation packages. You cannot install a Bioconductor package using install.packages with its default options. Instead, Bioconductor submitted its own small package to CRAN, with a function that downloads packages from either Bioconductor or CRAN.

install.packages("AnnotationDbi") ## this should fail because AnnotationDbi is not on CRAN

install.packages("BiocManager") ## Bioconductor's CRAN package
BiocManager::install("AnnotationDbi")
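In a script that gets re-run, it can be convenient to install only what is missing. The following is a minimal sketch under that assumption; the helper names missing_packages and install_bioc_if_missing are hypothetical and not part of BiocManager.

```r
# Return the subset of `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing through BiocManager, which handles both
# Bioconductor and CRAN packages. Nothing is downloaded if all are present.
install_bioc_if_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) == 0) {
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo)
  invisible(todo)
}

# stats and utils ship with R, so this call installs nothing.
install_bioc_if_missing(c("stats", "utils"))
```

Replacing the package names with c("AnnotationDbi", "swirl") would install each one only if it is not already in your library.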

Package conflicts

What was that :: in the line of code above? It is a way to call the function install specifically from the BiocManager package. You can imagine that with over 15,000 CRAN packages plus 1,800 Bioconductor software packages, the same simple function names get used often. The :: is also a way to use a function without loading its package first. Remember from the dplyr lesson that, in order to use its functions, we first had to load the dplyr package with the library function. Every time you open a new R session, you have to load packages from your library before using their functions, unless you use the :: designation. The :: also comes in handy when you have two packages loaded that share a function name. Try this:

library(dplyr)
library(AnnotationDbi)

If you already had the dplyr package loaded this session, then the first line may not produce any output in the console. But the second line should produce a lot of output, much of it similar to this:

The following object is masked from ‘package:dplyr’:

    select

“Masked” means that since AnnotationDbi was loaded second, using the function select will typically result in using the one from AnnotationDbi instead of dplyr. Ask for help on select in the following ways:

?select   #R gives both options
?dplyr::select   #takes you to dplyr's help page
?AnnotationDbi::select   #takes you to AnnotationDbi's help page
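Masking and the :: operator can also be explored using only packages that ship with R. In this sketch, defining our own sd in the workspace stands in for a second package that reuses a function name. That is an illustrative assumption, not how AnnotationDbi or dplyr are built.

```r
# tools ships with R but is not attached by default, so :: reaches its
# functions without calling library(tools) first.
title <- tools::toTitleCase("package conflicts")

# Defining our own sd() in the workspace masks stats::sd(), much as a
# package loaded later masks one loaded earlier.
sd <- function(x) "my own sd"

mine <- sd(c(1, 2, 3))             # the workspace version is found first
original <- stats::sd(c(1, 2, 3))  # :: still reaches the stats version

mine
original

rm(sd)  # remove our version so that sd() means stats::sd() again
```

The same pattern applies to the example above: dplyr::select and AnnotationDbi::select always name one specific function, whichever package was loaded last.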

If the order of loading packages had been reversed, then the select function from dplyr would mask the one from AnnotationDbi. This kind of package conflict becomes an issue when you need more and more packages for an analysis. Package maintainers can mitigate conflicts by writing their functions in a way that tells R to check the class of the object given to the function and then select the correct function.
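One mechanism R offers for choosing a function by class is S3 method dispatch. The sketch below uses a hypothetical generic called pick to show the idea. It does not reproduce how dplyr or AnnotationDbi actually implement select.

```r
# The generic does no work itself; UseMethod() looks at class(x)
# and calls the matching method.
pick <- function(x, ...) UseMethod("pick")

# Method for data frames: keep the named columns.
pick.data.frame <- function(x, cols, ...) x[, cols, drop = FALSE]

# Method for plain lists: keep the named elements.
pick.list <- function(x, keys, ...) x[keys]

# Fallback for any other class.
pick.default <- function(x, ...) {
  stop("pick() does not know how to handle class ", class(x)[1])
}

df <- data.frame(id = 1:3, score = c(90, 85, 77), group = c("a", "b", "a"))
lookup <- list(alpha = 1, beta = 2, gamma = 3)

pick(df, c("id", "score"))  # dispatches to pick.data.frame
pick(lookup, "beta")        # dispatches to pick.list
```

Because the choice depends on the class of the first argument, the same function name can serve different kinds of objects.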

More R practice

We have covered many of the basics of R, but because R is a language, you will not become fluent unless you get lots of practice. In addition to this workshop, there are many other ways to learn R. One excellent way is the swirl package that we installed above. swirl uses self-paced, interactive lessons directly in R to cover many more topics than we have time to cover in this workshop. The only thing you have to remember in order to use swirl is to load the package with the library function:

library(swirl)

Just read the output and do what it says. It will wait for you to type in a response and hit enter. It then covers some basics of how it works. There are several different courses, and I suggest starting with 1: R Programming: The basics of programming in R. Once the course installs, choose 1: R Programming, then from the lessons start with 1: Basic Building Blocks. Each lesson is interactive, giving you some information or asking a question and then waiting for your response. It checks whether your response is correct and, if not, asks you to try again. These are great to work through on your own, at your own pace.

Another excellent way to get more R practice is to run through Software Carpentry’s Programming with R lessons. Their focus is more on general programming: writing functions, for loops, if/else statements, writing packages and writing dynamic reports.


Data Carpentry, 2017. License. Contributing.