R includes a number of datasets that are convenient to use for examples. You can get a description of what’s available by typing
> data()
To access any of these datasets, you then type data(dataset) where dataset is the name of the dataset you wish to access. For example,
> data(trees)
Typing
> trees[1:5,]
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
gives us the first 5 rows of these data, and we can now see that the columns represent measurements of girth, height and volume of trees respectively.
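A few other standard functions give a similar quick overview of a dataset. The following is only a minimal sketch; the names first_rows, dims and col_names are chosen here purely for illustration.

```r
# Load the built-in trees dataset (part of R's 'datasets' package)
data(trees)

# head() returns the first n rows; here the same rows as trees[1:5, ]
first_rows <- head(trees, n = 5)

# dim() gives the number of rows and columns
dims <- dim(trees)

# names() gives the column names
col_names <- names(trees)

print(first_rows)
print(dims)
print(col_names)
```

Typing ?trees should also bring up the help page describing how the measurements were taken.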
Now, if we want to work on the columns of these data, we can use the subscripting technique explained above: for example, trees[,2] gives all of the heights. This is a bit tedious, however, and it would be easier if we could refer to the heights by name. We can achieve this by attaching the trees dataset:
> attach(trees)
Effectively, this makes the contents of trees act like a directory: when we type the name of an object, R will also look inside trees to find it. Since Height is the name of one of the columns of trees, R now recognises this object when we type the name. Hence, for example,
> mean(Height)
[1] 76
and
> mean(trees[,2])
[1] 76
are equivalent, though the first of these expressions makes it easier to see exactly what calculation is being performed. In fact, trees is an object called a data frame, essentially a matrix with named columns (though a data frame, unlike a matrix, may also include non-numerical variables, such as character names). Because of this, there is another equivalent syntax to extract, for example, the vector of heights:
> trees$Height
which can also be used without first attaching the dataset.
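To make the comparison concrete, the sketch below extracts the heights in several ways that do not rely on attach(). The variable names (h_index, h_dollar, h_name, m_with) are illustrative only, and with() is a base R function not discussed above; it is shown here as one possible alternative, not as the recommended approach of these notes.

```r
# Load the built-in trees dataset
data(trees)

# Three ways to extract the heights without attaching the dataset
h_index  <- trees[, 2]          # by column position
h_dollar <- trees$Height        # by column name, using $
h_name   <- trees[["Height"]]   # by column name, using [[ ]]

# with() evaluates an expression using the columns of trees,
# without adding trees to the search path
m_with <- with(trees, mean(Height))

print(identical(h_index, h_dollar))
print(identical(h_dollar, h_name))
print(m_with)
```

If a dataset has been attached, detach(trees) removes it from the search path again once it is no longer needed.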