- Born in Colombia
- Ecologist
- Research assistant and consultant: UniAndes, WWF, Inter American Development Bank
- MSc Ecosystems Governance
- PhD Sustainability Science
- Research interests:
  - Regime shifts
  - Complex systems
  - Networks
  - Collective action
Juan C. Rocha
Stockholm Resilience Centre, Stockholm University
Slides: juanrocha.se/presentations/R_Intro
Recommended browser: Chrome (for nicer fonts)
Session | Content | When |
---|---|---|
1 | How R language works: Data types, functions, and other basics | April |
2 | Data cleaning, wrangling, and visualization | May |
3 | Tools for reproducible research and workflows | June |
4 | Useful packages for SES research: clustering and ordination | September |
5 | Tools for geographic analysis and mapping | October |
6 | Tools for time series analysis | November |
7 | Tools for network analysis | December |
8 | Tools for text analysis/text mining | January |
9 | Web-based interactive data tools | February |
10 | Multilingual programming | March |
You will get a soft intro to each topic, but each topic is worth a course on its own
There are >350 programming languages (Valverde 2016)
R is an object-oriented programming language. Learning R takes time and practice, as with any other language. The more you use it, the better you become at speaking it. Do you have R already installed? Do you have RStudio? If not please follow:
You can use R to calculate simple and complex operations:
[1] 9
You can store the results of any operation by assigning it to an object. The assignment operator `<-` does it for you
[1] 501.6667
Sometimes people use `=` instead of `<-` for assignments
[1] 5
Note that you can add comments to your code with `#`
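The code behind the outputs above is not shown on the slide, so the following is only a minimal sketch with made-up values (the names `total`, `half` and `also_total` are hypothetical). It illustrates arithmetic, both assignment operators, and comments:

```r
# R as a calculator: a result that is not assigned is printed
2 + 3 * 4          # multiplication is evaluated before addition

# Store results in objects with the assignment operator
total <- 2 + 3 * 4
half <- total / 2  # objects can be reused in later operations

# `=` also assigns at the top level, although `<-` is the usual convention
also_total = 2 + 3 * 4

# Typing the name of an object (or calling print) shows its value
print(half)
```

Everything after `#` on a line is ignored by R, so comments can sit on their own line or after code.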
Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1 | Vector | List |
2 | Matrix | Data Frame |
3 | Array | NA |
One can think of a vector as a line of objects of the same kind
[1] "numeric"
`vector`: 1d object
b <- c("Celinda", "Juan", "Miriam") # character vector
c <- c(TRUE, TRUE, FALSE) # logical vector
# Check
class(b)
[1] "character"
class(c)
[1] "logical"
You can access the value of any element of your vectors by referring to their position using []
[1] 3
[1] 3 5 8
[1] "Celinda" "Juan"
Unlike many other languages, R indexing starts at 1
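The indexing calls that produced the outputs above are not visible on the slide. A sketch that is consistent with them, assuming the numeric vector is `c(1, 3, 5, 8)` as printed elsewhere in these slides:

```r
a <- c(1, 3, 5, 8)                   # numeric vector (values assumed from printed output)
b <- c("Celinda", "Juan", "Miriam")  # character vector from the slide

class(a)   # type of the vector
a[2]       # second element (indexing starts at 1)
a[2:4]     # elements 2 to 4
b[1:2]     # first two names
a[-1]      # a negative index drops that position instead of selecting it
```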
`matrix`: 2d objects
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[1] 3 4
[1] 12
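One way to build a matrix like the one printed above is sketched here. The call is an assumption, since the original code is not shown; `byrow = TRUE` is needed to fill the values row by row:

```r
# Fill a 3 x 4 matrix with the numbers 1 to 12, row by row
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
m

dim(m)     # number of rows and columns
m[3, 4]    # element in row 3, column 4
m[2, ]     # an empty index selects everything: the whole second row
m[, 1]     # the whole first column
```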
`array`: multi-dim objects
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
, , 3
[,1] [,2]
[1,] 9 11
[2,] 10 12
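An array with the same shape and values as the printout above can be sketched as follows. The object name `arr` is hypothetical; arrays are filled column by column within each slice:

```r
# Three 2 x 2 slices holding the numbers 1 to 12
arr <- array(1:12, dim = c(2, 2, 3))

dim(arr)        # size of each dimension
arr[, , 2]      # the second slice, itself a 2 x 2 matrix
arr[2, 1, 3]    # row 2, column 1 of the third slice
```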
`list`: mixed objects
A mix of objects; they don't need to be the same class or length
[1] 1 3 5 8
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
To access the elements of a list you should use `[[]]` by position, or `$` by name
`list`: mixed objects
[1] "list"
[1] "matrix" "array"
[1] 4
If lists are mixed objects, can you put a list within a list? – Yes!
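A minimal sketch of building and accessing a list, including a list inside a list. The element names (`a`, `m`, `numbers`, `inner`) are assumptions for illustration; the slides do not show which names were used:

```r
a <- c(1, 3, 5, 8)
m <- matrix(1:12, nrow = 3, byrow = TRUE)

l <- list(a = a, m = m)   # a vector and a matrix in one object
l[[1]]                    # access by position
l$m                       # access by name
class(l)
class(l$m)                # "matrix" "array" in R >= 4.0
length(l$a)

# A list can contain another list
nested <- list(numbers = a, inner = l)
nested$inner$a[2]         # chain the accessors to reach inner elements
```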
Data frames are one of the most used objects in R. They can hold different types of data, one type per column
books student pass
1 3 Celinda TRUE
2 5 Juan TRUE
3 8 Miriam FALSE
'data.frame': 3 obs. of 3 variables:
$ books : num 3 5 8
$ student: chr "Celinda" "Juan" "Miriam"
$ pass : logi TRUE TRUE FALSE
`summary()`
`str()` and `summary()` are very useful functions to understand what your data looks like.
`head()` gives you the first 6 rows of your dataset and `tail(n = 3)` the last 3.
books student pass
Min. :3.000 Length:3 Mode :logical
1st Qu.:4.000 Class :character FALSE:1
Median :5.000 Mode :character TRUE :2
Mean :5.333
3rd Qu.:6.500
Max. :8.000
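The data frame printed above can be rebuilt from the values shown. The sketch below assumes it is called `df` (the slides do not show the name) and then applies the inspection functions mentioned:

```r
df <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep names as character in older R versions too
)

str(df)          # structure: one line per column with its type
summary(df)      # summary statistics per column
head(df, n = 2)  # first rows (6 by default)
tail(df, n = 1)  # last rows
```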
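The data frame printed above can be rebuilt from the values shown. The sketch below assumes it is called `df` (the slides do not show the name) and then applies the inspection functions mentioned:

```r
df <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep names as character in older R versions too
)

str(df)          # structure: one line per column with its type
summary(df)      # summary statistics per column
head(df, n = 2)  # first rows (6 by default)
tail(df, n = 1)  # last rows
```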
`tibble`s are data frames with nicer printing. They behave like a `data.frame` but print only the first rows, together with the type of each column.
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# … with 1,694 more rows
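The printout above comes from the `gapminder` data. As a sketch that does not depend on that dataset, a regular data frame can be converted with `tibble::as_tibble()`, assuming the `tibble` package is installed (the guard skips the conversion otherwise):

```r
df <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Only run the tibble part if the package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(tb)                 # shows dimensions and column types
  print(is.data.frame(tb))  # a tibble is still a data frame
}
```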
You can index by position
[1] TRUE
books student pass
1 3 Celinda TRUE
2 5 Juan TRUE
3 8 Miriam FALSE
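The indexing calls behind these outputs are not shown, so this is only a sketch of common ways to index the same data frame, assuming it is called `df`:

```r
df <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

df[1, 3]          # row 1, column 3
df[2, ]           # the whole second row
df[, "student"]   # a column by name
df$books[3]       # third value of the books column
df[df$pass, ]     # a logical vector keeps the rows where it is TRUE
```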
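The indexing calls behind these outputs are not shown, so this is only a sketch of common ways to index the same data frame, assuming it is called `df`:

```r
df <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  pass    = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

df[1, 3]          # row 1, column 3
df[2, ]           # the whole second row
df[, "student"]   # a column by name
df$books[3]       # third value of the books column
df[df$pass, ]     # a logical vector keeps the rows where it is TRUE
```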
Remember
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[1] 1 3 5 8
We can do some element-wise comparison with binary operators
[1] TRUE TRUE FALSE TRUE TRUE
[1] 0.9917544 1.3410942
[1] 0.2
[1] 0.2
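The comparisons that produced the outputs above are not recoverable from the slide. The sketch below uses its own values to show how binary operators work element by element:

```r
a <- c(1, 3, 5, 8)

a > 2             # one TRUE/FALSE per element
a == 3            # equality test (note the double =)
a != 3            # not equal
a >= 3 & a < 8    # element-wise AND of two comparisons
a %in% c(3, 8)    # is each element contained in the set?
mean(a > 4)       # TRUE counts as 1, so this is the proportion above 4
```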
The assignment operator `<-` allows you to update or create new objects
[1] 5
[1] "list"
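A short sketch of updating existing objects and creating new ones through assignment. The column `keen` and the element `note` are hypothetical names added for illustration:

```r
a <- c(1, 3, 5, 8)
a[1] <- 5            # replace the first element
a <- c(a, 13)        # overwrite `a` with a longer version of itself

df <- data.frame(books = c(3, 5, 8), stringsAsFactors = FALSE)
df$keen <- df$books > 4    # assigning to a new name creates a column

l <- list(a = a)
l$note <- "added later"    # the same works for list elements
class(l)
```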
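A short sketch of updating existing objects and creating new ones through assignment. The column `keen` and the element `note` are hypothetical names added for illustration:

```r
a <- c(1, 3, 5, 8)
a[1] <- 5            # replace the first element
a <- c(a, 13)        # overwrite `a` with a longer version of itself

df <- data.frame(books = c(3, 5, 8), stringsAsFactors = FALSE)
df$keen <- df$books > 4    # assigning to a new name creates a column

l <- list(a = a)
l$note <- "added later"    # the same works for list elements
class(l)
```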
Where the magic starts…
Discuss in pairs for 3 minutes (03:00)
Everything that happens in R is a function call - John Chambers
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
{
if (is.object(data) || !is.atomic(data))
data <- as.vector(data)
.Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow),
missing(ncol)))
}
<bytecode: 0x7ff0030aa320>
<environment: namespace:base>
Inside `()` are the arguments; the values shown above are their defaults, e.g. `byrow = FALSE`
`<environment: namespace:base>` means that `matrix` is a function of the package `base`
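The arguments, defaults and home package of a function can also be inspected from code. A minimal sketch using `matrix` (the object names `m_default` and `m_byrow` are hypothetical):

```r
matrix                                 # a function name without () prints its source
args(matrix)                           # arguments with their default values
formals(matrix)$byrow                  # the default for a single argument
environmentName(environment(matrix))   # the package the function lives in

# Defaults apply unless you override them
m_default <- matrix(1:6, nrow = 2)                 # filled column by column
m_byrow   <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
m_default
m_byrow
```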
We will see more about functions and how to write them in the future. For now, it’s good to know that functions are procedures or routines written by other R users and programmers to make your life simpler (most of the time!). R comes with a number of basic functions by default, such as `mean()` or `matrix()`. Functions that work together for certain purposes (e.g. network analysis, linear regression) come in packages that you need to download and install once, and then load every time you want to use them.
'network' 1.17.1 (2021-06-12), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
Network attributes:
vertices = 25
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 60
missing edges= 0
non-missing edges= 60
Vertex attribute names:
vertex.names
No edge attributes
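The usual package workflow can be sketched as below, using the `network` package from the output above. Installation is commented out because it needs an internet connection and is only done once; the `has_network` guard is an assumption added so the sketch also runs where the package is missing:

```r
# install.packages("network")   # download and install: once per computer

# Check whether the package is installed before loading it
has_network <- requireNamespace("network", quietly = TRUE)

if (has_network) {
  library(network)                    # load: once per R session
  print(packageVersion("network"))    # which version is installed?
}

# Functions can also be called without library() using package::function
stats::median(c(3, 5, 8))
```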
Currently, the CRAN package repository features 19021 available packages.
One of the most powerful tools in R is its graphics capabilities. Visualizations help you explore and better understand your data, and they inspire new questions and approaches to analysis. Some useful packages:
- `base::plot`
- `grid`
- `lattice`
- `ggplot2`
Example figures: `base`, `ggplot`, circos plots, networks, `ggplot` or heatmaps
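As a small taste of base graphics, the sketch below uses `cars`, a dataset that ships with R, so no extra packages are assumed. It is not one of the figures from the slides:

```r
# Scatter plot with base graphics
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Base R scatter plot")

# Histogram of one variable
hist(cars$dist,
     xlab = "Stopping distance (ft)",
     main = "Base R histogram")
```

In RStudio the figures appear in the Plots pane; when run as a script, R writes them to a file instead.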
Learning R is an iterative process of doing something, getting stuck, asking for help, solving your problem, and carrying on.
You can also type `?mean` or `??mean`
reprex
stairs | escalators, lift | elevators, trucks | lorries, pants | trousers
Be patient: diversity requires an open mind, but it is good because it provides redundancy.
`gapminder` data
`gdpPercap` in history?
`sort()` and `order()`
`plot()`
`lifeExp`, where and when does it happen?
`max()`
`plot()`, `with()`
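The hinted functions can be tried first on a small data frame before moving to `gapminder`. This is only a sketch of how `sort()`, `order()`, `max()` and `with()` behave on the toy data from earlier slides, not a solution to the exercise:

```r
df <- data.frame(
  books   = c(3, 5, 8),
  student = c("Celinda", "Juan", "Miriam"),
  stringsAsFactors = FALSE
)

sort(df$books, decreasing = TRUE)          # the values themselves, sorted
order(df$books, decreasing = TRUE)         # the positions that would sort them
df[order(df$books, decreasing = TRUE), ]   # use order() to rearrange whole rows

max(df$books)                              # the largest value
with(df, student[which.max(books)])        # with() lets you use column names directly
```

`sort()` returns values, while `order()` returns row positions, which is why `order()` is the one to use when sorting a data frame.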