Exercise 1: R Basics

Let’s get this party started! In this simple exercise, we’ll get everyone on the same page with RStudio. As already discussed, RStudio is an integrated development environment (IDE) for R, a programming language and environment for statistical computing. What RStudio adds are interface enhancements: you can write and run code, browse your files, view plots, and so on, all right inside RStudio. In addition, it’s free!

To begin, let’s get accustomed to the RStudio environment. Open up RStudio and you should see an array of “tiled” windows (panes) representing different tools or features. One of these is called the console. Click next to the chevron (>) prompt to place your cursor inside the console, then type the following:

getwd()

## [1] "/Users/rickdale/Downloads/data"

This is a function, your first R function! It simply asks R to tell you what the “working directory” (wd) is: it “gets the working directory” (getwd). The working directory is a key concept in R. It is the folder R looks in when you open a file by name, and where it saves files unless you tell it otherwise, so it is usually where your data files and your programs are stored.
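getwd() hands back the path as an ordinary piece of text (a character string), so you can store it and reuse it. A minimal sketch; the variable name start_dir is just an illustration, and keystroke_times.csv is the data file we load later in this exercise:

```r
# getwd() returns the working directory as a character string
start_dir = getwd()
print(start_dir)

# basename() keeps only the last folder name in the path
basename(start_dir)

# File names without a folder are looked up inside the working directory,
# so this is TRUE only when the data file sits in the current folder
file.exists("keystroke_times.csv")
```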

Let’s change the working directory to a different folder, one that contains lots of data. First, you’ll need to download this big ZIP archive of data files I’ve prepared for us. Once you’ve downloaded it, extract the “data” folder and “navigate” to it. In other words, you will move your RStudio working directory from the one you see above to this new one, which might be in your Downloads folder or anywhere else you wish to do your work this week.

For example, after I’ve unarchived (extracted) the new “data” folder, this is what it looks like for me:

setwd("/Users/rickdale/Downloads/data")

Notice that instead of getwd() it is now setwd() (“set working directory”). This moves me into the new data folder so RStudio can easily access all the data files inside it. Alright!

Notice that this function, setwd, is a bit more complicated than getwd. It does something different for us, and in order to change the working directory, it needs information. This information is referred to as an argument: a value that you pass to setwd inside its parentheses. In this case, we give it the folder (“location”) on our computer in which we wish to be working.
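One handy detail: setwd() quietly returns the folder you just left, so you can always go back. A minimal sketch; it moves to tempdir(), R’s temporary folder for the session, only so the example runs on any computer, and the names start_dir and old_dir are illustrative:

```r
# Remember where we started
start_dir = getwd()

# setwd() takes one argument: the folder to move to.
# It invisibly returns the folder we were in before the move.
old_dir = setwd(tempdir())

getwd()   # now the temporary folder

# Use the saved path as the argument to move back
setwd(old_dir)
getwd()   # back where we started
```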

Now if we want to see what is in that data folder, it’s easy. Check this out:

dir()

##  [1] "AAPL.dat"                                                                
##  [2] "allWords1900on.csv.zip"                                                  
##  [3] "arrest_data_2005-2014.zip"                                               
##  [4] "child.txt"                                                               
##  [5] "eyemovs_person_1.txt"                                                    
##  [6] "eyemovs_person_2.txt"                                                    
##  [7] "gretzky.Rdata"                                                           
##  [8] "high-satisfaction-dyad.txt"                                              
##  [9] "keystroke_times.csv"                                                     
## [10] "limb_movements.txt"                                                      
## [11] "low-satisfaction-dyad.txt"                                               
## [12] "mariposa_air_quality.txt"                                                
## [13] "merced_air_quality.txt"                                                  
## [14] "parent.txt"                                                              
## [15] "table_8_offenses_known_to_law_enforcement_by_california_by_city_2012.xls"
## [16] "typing_subjects.Rdata"

Like getwd(), dir() is a simple function. It just lists the contents of the folder we just navigated to.
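dir() can take arguments too. A minimal sketch, assuming you are in the data folder: the pattern argument keeps only the file names that match a regular expression, so the second call should list just the .txt files (child.txt, parent.txt, and so on), and full.names = TRUE puts the folder path (here ./) in front of each name:

```r
# dir() (also available as list.files()) lists the working directory
dir()

# pattern keeps only names matching a regular expression:
# "\\.txt$" means "ends in .txt"
dir(pattern = "\\.txt$")

# full.names = TRUE adds the folder path to each name
dir(pattern = "\\.txt$", full.names = TRUE)
```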

(Note: Here’s a trick if you aren’t sure what the path to your folder looks like on your computer. One of the RStudio panes has a tab called “Files”. Click on this tab and browse to the folder you want. Once you’re there, click “More” and then “Set As Working Directory”, and RStudio will show you, in the console, the line of code your computer needs.)

Okay, now that you’re in the data folder, let’s load up one of the sample data files you just downloaded. Try this:

keystrokes = read.table('keystroke_times.csv',header=TRUE)

Notice that nothing seems to happen when you read the table. Reading a table just stores its contents in a variable; we’ll take a look at it in a minute. In R, a table of data lives inside a variable that we can refer to by name. Here I have called it keystrokes. If RStudio has any problems loading the data (for example, if the file is in an unfamiliar format or there is a mistake somewhere in it), it will report an error in the console. Hopefully everything went smoothly for you. So, let’s start taking a peek at this data file.

(Note: Sometimes I’ll include elements in the code without explaining them, just for simplicity. Don’t hesitate to ask me if something isn’t explained; I’m happy to go around and answer any questions! For example, the header=TRUE in the line above tells R that the first line of the file is a header holding the variable (column) name, not data.)
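To see what header=TRUE changes, and what read.table() actually stores, here is a minimal sketch that reads a few lines of text instead of the real file (read.table() accepts a text argument for exactly this). The numbers are the first keystroke times from this data set; the column name time is a hypothetical stand-in, since the real file’s header may differ:

```r
# A tiny stand-in for keystroke_times.csv: a header line, then numbers
demo_text = "time
240.025
143.995
183.925"

# header = TRUE: the first line becomes the column name
with_header = read.table(text = demo_text, header = TRUE)
with_header          # typing a variable's name prints its contents
class(with_header)   # "data.frame", R's usual table type
names(with_header)   # "time"
nrow(with_header)    # 3 rows of data

# header = FALSE: the name is read as if it were data, so R
# invents the column name V1 and the column is no longer numeric
no_header = read.table(text = demo_text, header = FALSE)
names(no_header)     # "V1"
nrow(no_header)      # 4 rows, "time" included
```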

This data file is pretty neat… It’s just a single, simple sequence of numbers, but these numbers reflect keypress speeds: the timing, in milliseconds, of each keypress of someone typing a description (in fact, it’s someone typing out the plot of their favorite movie!). As you’ll see, there are lots of keypresses in the task. The data don’t tell us which key was pressed. They only show us a series of responses, over time, of how fast each keypress was. You might ask: how do I tell how many keypresses there are? There’s a function for that, of course.

dim(keystrokes)

## [1] 270   1

This gives us the dimensions of the keystrokes table: rows, then columns. So there are 270 keypresses in one column.
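If you only want one of the two numbers, nrow() and ncol() give them directly. A minimal sketch on a stand-in table with the same shape as keystrokes; its values are placeholders (1 to 270), not real keystroke times:

```r
# Stand-in table: 270 rows, 1 column, like keystrokes
demo = data.frame(time = seq_len(270))

dim(demo)     # 270 1  (rows, then columns)
nrow(demo)    # 270
ncol(demo)    # 1

# dim() returns a plain vector, so [1] picks out the row count
dim(demo)[1]  # 270
```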

Notice that keystrokes is a “table”: it has rows and columns. In fact, it has just one column. To take a look at the first five entries of this column, try this out:

keystrokes[1:5,1]

## [1] 240.025 143.995 183.925 103.460 232.600

This says to R: “Give me the first five rows, 1:5, of keystrokes, and the first column.” R reads the square brackets as a request for a subset of the table, in the order [rows,columns]. Here we ask for a series of rows using the colon, and for just one column using a single number: [1:5,1]. Putting it together we get: keystrokes[1:5,1].
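The row slot accepts more than a colon range. A minimal sketch on a stand-in table holding the five keystroke times printed above (the column name time is hypothetical):

```r
# Stand-in data: the first five keystroke times
ks = data.frame(time = c(240.025, 143.995, 183.925, 103.460, 232.600))

ks[1:5, 1]       # rows 1 to 5, column 1
ks[2, 1]         # a single cell, row 2 and column 1: 143.995
ks[c(1, 3), 1]   # only rows 1 and 3: 240.025 183.925
```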

Notice that it has hundreds of keystrokes! Now we are in the realm of cognitive science… these simple data reflect the speed at which someone’s mind is producing language – the click and clack of their keyboard is now a series of simple numbers, in milliseconds. Kewl.

In the next exercise we are going to start plotting these human data. Alright.

Simple exercise 1

Look through the keystrokes and identify the largest one. How slow was the slowest keystroke? In other words, what is the highest keystroke time, in milliseconds?
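A hint for getting started, sketched on only the five values printed earlier rather than the whole file: R’s max() returns the largest value in a vector, and which.max() returns its position. The same calls work on a full column of a table.

```r
# The first five keystroke times, as printed earlier
first_five = c(240.025, 143.995, 183.925, 103.460, 232.600)

max(first_five)                      # largest value: 240.025
which.max(first_five)                # its position: 1
sort(first_five, decreasing = TRUE)  # all values, largest first
```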

Simple exercise 2

Take the lines of code you’ve just put together for your own RStudio setup and computer, and turn them into a script. A script is a .R file that contains a series of lines of code, one after another. In RStudio, as discussed in the intro, the biggest window or tile is often the “code editor” window, where you can write and save scripts so that you don’t have to keep typing the same lines over and over again in the console. Rick showed you how to set this up in the intro, but just to remind you: you can save a .R file (name it anything you want, X.R, etc.) and put it in your working directory. You can then open that file anytime you want to revisit your data. The script should look something like this, but modified for your computer:

setwd("/Users/rickdale/Downloads/data")
keystrokes = read.table('keystroke_times.csv',header=TRUE)
keystrokes[1:5,1]

## [1] 240.025 143.995 183.925 103.460 232.600
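Once the script is saved, you don’t have to run it line by line: source() runs a whole .R file, top to bottom (RStudio’s “Source” button in the code editor does the same). A minimal, self-contained sketch; the file name keystrokes_setup.R and its two lines are hypothetical, and the file is written to R’s temporary folder only so the example runs anywhere. With your own script in the working directory, the call would simply be source("X.R").

```r
# Write a tiny two-line script to a temporary file (hypothetical name)
script_path = file.path(tempdir(), "keystrokes_setup.R")
writeLines(c("first_five = c(240.025, 143.995, 183.925, 103.460, 232.600)",
             "print(first_five)"),
           script_path)

# source() runs every line of the script, in order
source(script_path)

# Variables the script created are now available in the console
first_five
```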