Welcome & Disclaimer

This site contains the materials for the Coding tools for Biochemistry & Molecular Biology (Herramientas de Programación para Bioquímica y Biología Molecular) course of fall 2022 in the Bachelor’s Degree in Biochemistry @UAM. These materials are the basis for a GitHub Pages website that can be accessed here. Detailed academic information about the course contents, dates and assessment can be found only at the UAM Moodle site.

All this material is open access and is shared under a CC BY-NC license.

RStudio Projects

Working with relative paths can get a little confusing, especially if you run large projects with different folders and subfolders for data and results. A good way to avoid confusion is to create an R project in RStudio.

As you can see in the screenshot above, you can easily start a new project in RStudio using the File menu. Then, just select a name and save it on your computer (it will have the extension .Rproj).

Now here’s the cool part: when you open your project, the project folder becomes your working directory, that is, the starting point for all relative paths. You can check it with getwd(). Also, in the top-right corner of RStudio you can see the project name and the containing folder.
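As a minimal sketch of how this works in practice, the snippet below builds a relative path and shows the absolute path it resolves to. The data folder and the results.csv file are hypothetical names used only for illustration; they are not part of the course materials.

```r
# Relative paths are resolved against the working directory,
# which is the project folder when the project is open in RStudio.
project_dir <- getwd()

# Hypothetical layout: <project folder>/data/results.csv
data_file <- file.path("data", "results.csv")

# The same file expressed as an absolute path
absolute_file <- file.path(project_dir, data_file)

print(project_dir)
print(data_file)
print(absolute_file)
```

Because only the relative path is written in the script, the same code keeps working when the whole project folder is moved or shared with a collaborator.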

Working with projects comes with other extra features. For instance, every time you open the project, the .RData and .Rhistory files in the project folder are loaded, and previously edited source documents and RStudio settings (e.g. active tabs, splitter positions, etc.) are restored. This makes a difference if you manage several projects at the same time.

There are more reasons to do this, which become more obvious as you progress as a coder and start working on collaborative projects. You can also create your project from a version control repository (Git or Subversion). Version control helps software teams manage changes to source code over time. Version control software keeps track of every modification to the code in a special kind of database. If a mistake is made, developers can turn back the clock and compare earlier versions of the code to help fix the mistake while minimizing disruption to all team members. Version control systems have been around for a long time but continue to grow in popularity in data science workflows.

R Markdown and Notebooks

Markdown and R Markdown

Markdown is an easy-to-write language for formatting text. It was created in 2004 as a new markup language and since then it has been widely used for blogging, instant messaging, online forums and software documentation, among other uses. For instance, websites like GitHub or Reddit use variants of Markdown to facilitate discussions. The typical README file that you see in GitHub repositories is written in Markdown. Several programming languages have also adopted Markdown as a standard for documentation pages or report generation.

R Markdown is an R package that implements Markdown for R and, more commonly, RStudio. It combines the core syntax of Markdown with embedded code chunks that are run so their output can be included in the final document. The key point about R Markdown documents is that they are fully reproducible (they can be automatically regenerated whenever the underlying R code or data changes). Currently, R Markdown documents can include chunks in different languages (see the screenshot below), so you can combine bash or Python scripts with the R data analysis in the same document.
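To make the idea of chunks concrete, here is a small sketch that writes a toy R Markdown file from R, with one R chunk and one bash chunk. The file name chunks_example.Rmd and the chunk contents are invented for illustration. The three-backtick fence is built with strrep() only to keep this example easy to read; in a real document you type the backticks directly.

```r
# Chunk delimiter: three backticks
fence <- strrep("`", 3)

rmd_lines <- c(
  "Some narrative text written in **markdown**.",
  "",
  paste0(fence, "{r}"),        # opens an R chunk
  "x <- c(1.2, 3.4, 5.6)",
  "mean(x)",
  fence,                       # closes the chunk
  "",
  paste0(fence, "{bash}"),     # opens a chunk run by the bash engine
  "echo 'chunks can use other engines too'",
  fence
)

# Write the toy document to a temporary folder and display it
rmd_file <- file.path(tempdir(), "chunks_example.Rmd")
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

The language name inside the curly braces tells knitr which engine should run the chunk.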

They can also be written in Visual mode (above), which looks like a word processor, and you can switch to Source mode at any time.

I have found two main advantages in using R Markdown documents for my data analysis. I first started using them to generate formatted reports that include all the information about a project in a paper-like format, from the background to the data analysis and the conclusions. Indeed, those “reports” can be as fancy as you wish, from HTML or LaTeX-formatted PDF to MS Word or even presentations. A second benefit is that it makes your own work much easier, especially when you work on several different projects or need to revisit an analysis after some time. Additionally, you can use R Markdown documents for R-based web applications built with Shiny.

Moreover, there are a number of specialized sites to publish your R Markdown documents, such as RPubs, bookdown.org or RStudio Connect.

Create your first R Markdown document

As with any other language, learning the syntax is much easier by working through some examples. We can convert one of our exercises into an R Markdown HTML document in a few steps:

  1. Start a new R Markdown document. Initially, it will ask you for a title, author and date. You can also specify here the type of output document. This information will be included in the header of your .Rmd document, a configuration section written in YAML format that can be edited at any time. This section can also include information about the use of a TOC (Table of Contents), section numbering, HTML themes…

  2. Once you have your new document, it will include some default instructions about how to write text and create code chunks. You can switch between Source and Visual mode to see the difference and then replace the instructions with some text, like the exercise wording.

  3. Add your exercise script into one chunk and try it. You can also split it into more than one chunk and intersperse some explanations.

  4. Add a final chunk with sessionInfo() to facilitate reproducibility.

  5. Knit it! You can try knitting into an HTML, PDF or Word document.

Note that knitting into PDF may require the installation of a LaTeX distribution or R package. There are several alternatives, like tinytex, MacTeX or MiKTeX, among others (see references 9 and 10).
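The steps above can also be sketched from the R console. The example below writes a small .Rmd file with a YAML header, one exercise chunk and a final sessionInfo() chunk, and then knits it with rmarkdown::render(). The title, author, file name and exercise are placeholders invented for this illustration, and the knitting step is skipped when the rmarkdown package or Pandoc is not available.

```r
fence <- strrep("`", 3)  # three backticks, the chunk delimiter

rmd_lines <- c(
  "---",
  'title: "Exercise 1"',       # placeholder title
  'author: "Your Name"',       # placeholder author
  "output:",
  "  html_document:",
  "    toc: true",             # table of contents
  "    number_sections: true", # section numbering
  "---",
  "",
  "# Exercise",
  "",
  "Compute the mean of three measurements.",
  "",
  paste0(fence, "{r}"),
  "measurements <- c(0.42, 0.38, 0.45)",
  "mean(measurements)",
  fence,
  "",
  "# Session info",
  "",
  paste0(fence, "{r}"),
  "sessionInfo()",
  fence
)

rmd_file <- file.path(tempdir(), "exercise1.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit only when both rmarkdown and Pandoc can be found
can_knit <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_knit) {
  html_file <- rmarkdown::render(rmd_file,
                                 output_format = "html_document",
                                 quiet = TRUE)
  print(html_file)  # path of the knitted HTML file
}
```

Changing output_format to "pdf_document" or "word_document" would request the other formats mentioned in step 5; the PDF option additionally needs a working LaTeX installation.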

R Notebooks

Technically, R Markdown is a file format, whereas an R Notebook is a way to work with R Markdown files. R Notebooks do not have their own file format; they all use .Rmd. All R Notebooks can be ‘knitted’ to R Markdown outputs, and all R Markdown documents can be used as a Notebook. A new R Notebook document opens directly, without the wizard that appears for a regular .Rmd document, and its YAML header declares the output as output: html_notebook. Along with the notebook file, an additional HTML file with the extension .nb.html is generated. The notebook also has a Preview option. Preview does not run any code: it only renders the output that is already shown in the code editor. If you edit the code, the new output will not appear until you execute the chunk again.

Writing an R Notebook document is no different from writing an R Markdown document. The text and code chunk syntax does not differ from what you learned in the R Markdown tutorial. The primary difference is the interactivity of an R Notebook: when you execute a chunk in an R Markdown document, all of its code is sent to the console at once, whereas in an R Notebook only one line at a time is sent. This allows execution to stop if a line raises an error.

Quarto

In 2022, RStudio launched Quarto, a new Markdown flavor based on Pandoc (a free-software document converter) that expands the possibilities of R Markdown with new format options and richer templates. See some examples here.

Quarto is particularly designed to generate complex documents, like a whole website or a book. These require .qmd files and a companion YAML document that includes the site/book structure, metadata and configuration. You can check the Quarto Reference to learn how to configure your yml file. As an example, have a look at the _quarto.yml file for this site on Structural Bioinformatics.

Besides the references below, I suggest you check these examples and this video about the use of Quarto to create amazing websites, books and interactive sites.

Session Info

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] formatR_1.12 knitr_1.41  
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.30   R6_2.5.1        jsonlite_1.8.3  magrittr_2.0.3 
##  [5] evaluate_0.18   stringi_1.7.8   cachem_1.0.6    rlang_1.0.6    
##  [9] cli_3.4.1       rstudioapi_0.14 jquerylib_0.1.4 bslib_0.4.1    
## [13] rmarkdown_2.18  tools_4.2.2     stringr_1.4.1   xfun_0.35      
## [17] yaml_2.3.6      fastmap_1.1.0   compiler_4.2.2  htmltools_0.5.3
## [21] sass_0.4.4