Hints for working with R for statistics

Here I collect a few answers I have given when asked for hints on the assignments:

First, when you create a new table that you want to use later, you need to save the output of your operations in a new data object:

newtable <- oldtable %>% some_operation() %>% another_operation()
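As a minimal sketch of the difference, assuming the dplyr package is installed: the example below uses R's built-in mtcars data set, and the name heavy_cars is only an illustrative choice, not something from the assignments.

```r
library(dplyr)  # provides %>%, filter() and select()

# Without an assignment the result is only printed and then lost
mtcars %>% filter(wt > 3.5) %>% select(mpg, wt) %>% head()

# With an assignment the result is stored and can be reused later
heavy_cars <- mtcars %>% filter(wt > 3.5) %>% select(mpg, wt)

names(heavy_cars)  # the two selected columns
nrow(heavy_cars)   # number of rows that passed the filter
```

The original table mtcars is left unchanged in both cases; only the second version gives you an object to continue working with.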

Second, to create new variables in a data table, use

mutate(newvariable1 = some_function(oldvar1), newvariable2 = other_function(oldvar2))

to add them to the existing table, or use

transmute(newvariable1 = some_function(oldvar1), newvariable2 = other_function(oldvar2))

to replace all variables of the existing table with the new ones you define within the transmute command.
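The two commands can be compared side by side. This is a small sketch, assuming dplyr is installed; the table scores and its columns are made up for illustration.

```r
library(dplyr)

# Hypothetical example table
scores <- data.frame(name = c("a", "b", "c"), points = c(10, 20, 30))

# mutate() keeps name and points and appends the new column
added <- scores %>% mutate(share = points / sum(points))
names(added)     # "name" "points" "share"

# transmute() keeps only the columns defined in the call
replaced <- scores %>% transmute(share = points / sum(points))
names(replaced)  # "share"
```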

Third, let’s say x is a numeric vector defined with x <- c(4,1,2). Then factor(x) returns a data object of class factor with three levels, one for each unique entry in the vector x. The levels are sorted in increasing order, and the name of each level is the original number converted to a character string:

x <- c(4,1,2)
x
## [1] 4 1 2
y <- factor(x)
y
## [1] 4 1 2
## Levels: 1 2 4
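A few base R functions make the structure of the factor visible. The sketch below uses only the x and y from above; the point about as.numeric() is an extra illustration of why it matters that the level names are characters.

```r
x <- c(4, 1, 2)
y <- factor(x)

class(y)     # "factor"
levels(y)    # "1" "2" "4" (character strings, sorted)
nlevels(y)   # 3

# as.numeric() returns the internal level codes, not the original numbers
as.numeric(y)                 # 3 1 2
# Converting to character first recovers the original values
as.numeric(as.character(y))   # 4 1 2
```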

If you want to change the names of the levels, you can define them in the factor command with the option “labels”. The labels are assigned to the levels in order, so here the sorted levels 1, 2, 4 receive the names "eins", "zwei", "!drei":

y <- factor(x, labels = c("eins", "zwei", "!drei"))
y
## [1] !drei eins  zwei 
## Levels: eins zwei !drei
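If relying on the default sort order feels fragile, one option is to pass the levels explicitly with the “levels” argument of factor(), so that the pairing of level and label is written out in the code. A minimal sketch; the English labels are chosen only for illustration.

```r
x <- c(4, 1, 2)

# Explicit pairing: 1 -> "one", 2 -> "two", 4 -> "four"
y2 <- factor(x, levels = c(1, 2, 4), labels = c("one", "two", "four"))

as.character(y2)  # "four" "one" "two"
levels(y2)        # "one" "two" "four"
```

The order given in “levels” also determines the order of the levels in the result, which is useful for tables and plots.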