Blog 1: Data Representation

Introduction

I spent this past week slowly exploring RStudio to see what it can do. Specifically, I looked through several resources such as TutorialsPoint in order to find examples of these actions in action (ha). I also used the incredible built-in function help(): put help() around the name of any function or operation that is confusing, and R provides a full description of what it does and how to use it, with examples!
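As a small sketch of what that looks like in practice (hist and boxplot are just the functions I happened to pick here; any function name should work the same way):

```r
# Open the documentation page for hist(): description, arguments, examples
help("hist")

# The question mark is a shorthand for the same thing
?boxplot
```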

I found very quickly that I learned coding best through examples and customizing someone else's work. A broader picture of the mechanics (how instructions are embedded within each other and how to define variables) became clear through looking at patterns. I anticipate this will be my method of instruction with the students: short examples that they can build upon for their needs.

Through my noodling, it became clear that the quickest thing a person can learn in R is how to plot and analyze data sets. Line and scatter plots, box plots, histograms, pie charts, and bar charts are all very easily accomplished, and their instructions (this is what I'm calling what's inside the brackets that you need to fill in; R's documentation calls them arguments) are very similar to Excel, which I grew up with and am pretty good at. Here are some examples:

Scatter Plots

plot(x = 0:10, y = c(2, 2.2, 2.5, 2.8, 3.1, 4, 4.9, 6.2, 7.1, 8.3, 9.5), # the x and y coordinates as vectors [not real data]
    xlab = "Time (years since 2000)", # the label for the x-axis
    ylab = "Population (millions)", # the label for the y-axis
    xlim = c(0,10), # min and max values for the x-axis
    ylim = c(1,11), # min and max values for the y-axis
    main = "Population of Mississauga from 2000 to 2010") # title

And the same plot without the comments, with the data stored in variables first:

x = (0:10)
y = c(2, 2.2, 2.5, 2.8, 3.1, 4, 4.9, 6.2, 7.1, 8.3, 9.5)
plot(x,y,
    xlab = "Time (years since 2000)",
    ylab = "Population (millions)",
    xlim = c(0,10), ylim = c(1,11),
    main = "Population of Mississauga from 2000 to 2010 (not real)")


Using this example, students can replace the data coordinates (x,y) with whatever fits their work, as well as the title and axis labels. I have shown two ways of building a vector: "0:10" and "c(2, 2.2, 2.5, 2.8, 3.1, 4, 4.9, 6.2, 7.1, 8.3, 9.5)". The first creates the sequence of whole numbers from 0 to 10 to be the x-values; the second lists the exact y-values that correspond to their x-value counterparts. Without the students knowing it, I could also stack the two vectors with rbind() into a 2x11 matrix "v" holding all the data points (array(c(x,y)) on its own would give one long run of 22 numbers instead):

> v = rbind(x, y)
> dim(v)
[1]  2 11

Line Plots

Once I learned scatter plots, line plots became as simple as adding type = "o" between the commas. Why "o"? Because type = "p" means show only the points, type = "l" means show only the lines, and type = "o" means show both points and lines.

type = "o",
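To see the three options next to each other, here is a rough sketch that reuses the made-up population numbers from the scatter plot and draws them once per type. The par(mfrow = ...) call and the loop variable "kind" are my own additions for the side-by-side layout, not something the plots need.

```r
x = 0:10
y = c(2, 2.2, 2.5, 2.8, 3.1, 4, 4.9, 6.2, 7.1, 8.3, 9.5)  # not real data

# Split the plotting area into 1 row and 3 columns, remembering the old layout
old_par = par(mfrow = c(1, 3))

# Draw the same data as points only, lines only, and both
for (kind in c("p", "l", "o")) {
    plot(x, y,
        type = kind,
        xlab = "Time (years since 2000)",
        ylab = "Population (millions)",
        main = paste0('type = "', kind, '"'))
}

# Put the layout back to one plot per page
par(old_par)
```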

Box Plots

Again, once I got the gist of it, creating different types of plots became easy. For example box plots, the scourge of grade 9s everywhere. TutorialsPoint goes into much more detail, showing how to colour and adjust the shapes of the boxplots to be more complex, but showing a simple boxplot with the Q1, Q2, Q3, max, and min values becomes very simple. R automatically solves for the three quartiles, including the median (Q2), and shows where the max and min values are (values very far from the box are drawn as separate outlier points).

I think this will help students a lot during the instruction process, where they can see what boxplots look like before they have to draw their own. I would use this tool in the same way I would use Desmos Graphing Calculator in my classes (for anything that has to do with graphs): as a way for students to check their work.

x = c(1,1,2,3,4,4,6,7,7,8,9,10,3,4,14,16,17,17,18,19,12)
boxplot(x,
    xlab = "Student Marks out of 50", # with horizontal = TRUE the marks run along the x-axis
    ylab = "Grade 9 Class",
    main = "Student grades for unit test 2",
    horizontal = TRUE)
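For students checking a hand-drawn box plot, the numbers behind the picture can be printed directly. This is a small sketch using the same marks; fivenum(), median(), and mean() are standard R functions, and the values in the comments are what I worked out by hand for this data set.

```r
x = c(1,1,2,3,4,4,6,7,7,8,9,10,3,4,14,16,17,17,18,19,12)

# Minimum, Q1, median (Q2), Q3, maximum
fivenum(x)   # 1 4 7 14 19

# The line inside the box is the median, not the mean
median(x)    # 7
mean(x)      # about 8.67
```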

Unfortunately, I don't know how to adjust the y-axis scale.
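One possible answer, offered as a sketch rather than the final word: boxplot() appears to accept the same ylim argument that plot() does, and it controls the scale of the marks axis. Confusingly, it is still called ylim when horizontal = TRUE, even though the marks then run left to right. The 0 to 50 range is my assumption, based on the test being out of 50.

```r
x = c(1,1,2,3,4,4,6,7,7,8,9,10,3,4,14,16,17,17,18,19,12)

# ylim sets the range of the marks axis, even for a horizontal box plot
b = boxplot(x,
    xlab = "Student Marks out of 50",
    ylab = "Grade 9 Class",
    main = "Student grades for unit test 2",
    horizontal = TRUE,
    ylim = c(0, 50))
```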

Histograms

I just thought I would try a simple line without any reference and it worked perfectly!

hist(x)

Again, this function allows you to do several additional things:
  1. Choose where the breaks between the bars go using "breaks = ", which accepts a vector of break points built with c(), a single number suggesting how many bins to use, or the name of a rule for picking the number of bins: "sturges" (the default), "fd" (Freedman-Diaconis), or "scott".
  2. Frequency type by using "freq = ", where TRUE plots the count in each bin and FALSE plots probability density (so the bars have a total area of 1).
  3. "right = " for right-closed (TRUE) or left-closed (FALSE) bins
  4. Axis labels (as shown above)
  5. Axis max and min values (as shown above)
  6. Colour (col) and borders (border)
  7. And data labels on top of each bar (labels = TRUE)

hist(x,
    breaks="scott",
    freq=TRUE,
    right=TRUE,
    xlab="Bins",
    ylab="Frequency")
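To make the "breaks" and "right" options less abstract, here is a rough sketch with the same marks. The bin edges (0, 5, 10, 15, 20) are ones I made up for illustration, and plot = FALSE asks hist() to hand back the counts instead of drawing the chart.

```r
x = c(1,1,2,3,4,4,6,7,7,8,9,10,3,4,14,16,17,17,18,19,12)
edges = seq(0, 20, by = 5)   # hypothetical bin edges: 0, 5, 10, 15, 20

# Same data and edges, only the closed side of each bin changes
right_closed = hist(x, breaks = edges, right = TRUE, plot = FALSE)
left_closed = hist(x, breaks = edges, right = FALSE, plot = FALSE)

right_closed$counts   # 8 6 2 5  (the mark of 10 lands in the 5-to-10 bin)
left_closed$counts    # 8 5 3 5  (the mark of 10 lands in the 10-to-15 bin)

# Densities times bin widths add up to a total area of 1
sum(right_closed$density * diff(edges))
```

The only mark that moves is the 10, because it sits exactly on a bin edge.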

Some things I learned:

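-> "c" combines values into a single vector. If the values are a mix of words and numbers, it converts all of them to characters so they can sit in one vector together. Officially it stands for "combine", but I call this function "c" for "clever".
-> For assigning a value to a variable, "=" works the same way as "<-", which is used on several websites. Inside a function's brackets, "=" names an argument instead (like xlab = "Bins").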
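A quick sketch to check both points (the variable names are just ones I picked for the example):

```r
# c() with only numbers keeps them as numbers
nums = c(1, 2, 3)
class(nums)    # "numeric"

# Mixing in a word turns every value into a character
mixed = c(1, "two", 3)
class(mixed)   # "character"
mixed          # "1" "two" "3"

# Both assignment styles store the same value
a = 5
b <- 5
identical(a, b)   # TRUE
```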











