Homework 1

Problem 2

(a) Time in terms of AM or PM.

Binary, qualitative, nominal (most people consider binary attributes to be nominal)

(b) Brightness as measured by a light meter.

Continuous, quantitative, ratio

(c) Brightness as measured by people’s judgments.

Discrete, qualitative, ordinal (assuming we make them choose from a discrete set of ratings)

(d) Angles as measured in degrees between 0° and 360°.

Continuous, quantitative, ratio

(e) Bronze, Silver, and Gold medals as awarded at the Olympics.

Discrete, qualitative, ordinal

(f) Height above sea level.

Continuous, quantitative, interval/ratio (depends on whether sea level is regarded as an arbitrary origin)

(g) Number of patients in a hospital.

Discrete, quantitative, ratio

(h) ISBN numbers for books. (Look up the format on the Web.)

Discrete, qualitative, nominal (but ISBN numbers do have some order information so it could be ordinal if you use that information)

(i) Ability to pass light in terms of the following values: opaque, translucent, transparent.

Discrete, qualitative, ordinal

(j) Military rank.

Discrete, qualitative, ordinal

(k) Distance from the center of campus.

Continuous, quantitative, interval/ratio (depends on whether the center of campus is regarded as an arbitrary origin)

(l) Density of a substance in grams per cubic centimeter.

Continuous, quantitative, ratio

(m) Coat check number. (When you attend an event, you can often give your coat to someone who, in turn, gives you a number that you can use to claim your coat when you leave.)

Discrete, qualitative, nominal (or ordinal if you are using the order information)
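The nominal/ordinal distinction used in the answers above can be made concrete in R with unordered and ordered factors. The following is a minimal sketch in base R; the medal and coat check values are made up for illustration and are not part of the assignment.

```r
# Hypothetical values, for illustration only
medals <- c("Silver", "Gold", "Bronze", "Gold")

# Ordinal: an ordered factor supports order comparisons and max()
medal_ord <- factor(medals, levels = c("Bronze", "Silver", "Gold"), ordered = TRUE)
medal_ord[1] < medal_ord[2]   # is Silver ranked below Gold?
max(medal_ord)

# Nominal: an unordered factor supports only equality tests and counting
coat_check <- factor(c("17", "4", "23"))
coat_check[1] == coat_check[2]
table(coat_check)
is.ordered(coat_check)
```

Treating coat check numbers as ordinal would amount to declaring them with ordered = TRUE, which only makes sense if the order in which numbers were handed out is actually used.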

Problem 3

(a) After reading the data, I used the R commands is.factor and is.numeric to determine that both columns are numeric, and therefore quantitative.

# First setwd() to the folder containing myfirstdata.csv
data <- read.csv("myfirstdata.csv", header = FALSE)

# Look at the first few rows
head(data)
##   V1 V2
## 1  0  0
## 2  0  3
## 3  0  1
## 4  1  2
## 5  0  0
## 6  1  2

# Ask R if the columns are factors or numeric
c(is.factor(data[, 1]), is.numeric(data[, 1]))
## [1] FALSE  TRUE
c(is.factor(data[, 2]), is.numeric(data[, 2]))
## [1] FALSE  TRUE
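The same check can be run on every column in one call. The sketch below rebuilds the six rows printed by head(data) by hand so that it runs without the CSV file; the data frames first_rows and mixed are illustrative names, and mixed is a made-up example of a column that is not numeric.

```r
# The six rows shown by head(data), typed in by hand
first_rows <- data.frame(V1 = c(0, 0, 0, 1, 0, 1), V2 = c(0, 3, 1, 2, 0, 2))

# Check all columns at once instead of indexing each one
sapply(first_rows, is.numeric)
sapply(first_rows, class)

# Made-up contrast: a column containing text is read as a factor
# when stringsAsFactors = TRUE, so is.numeric would be FALSE for it
mixed <- read.csv(text = "V1,V2\n0,a\n1,b", stringsAsFactors = TRUE)
sapply(mixed, is.factor)
```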

(b) The plot for column 1 shows the row numbers on the x axis and the column 1 values on the y axis. A point is drawn for each row.

plot(data[, 1])

[Figure: column 1 values plotted against row number]

plot(data[, 2])

[Figure: column 2 values plotted against row number]

The plot for column 2 has the same interpretation.
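To see why a single vector ends up on the y axis with row numbers on the x axis, one can inspect xy.coords(), the helper that plot() relies on to turn its arguments into coordinates. This is a small sketch using the column 2 values from the six rows printed by head(data); the names v2 and coords are illustrative.

```r
# Column 2 values from the six rows shown by head(data)
v2 <- c(0, 3, 1, 2, 0, 2)

# With a single numeric vector, the index becomes x and the values become y
coords <- xy.coords(v2)
coords$x   # row numbers 1 to 6
coords$y   # the values themselves

# Hence plot(v2) draws the same points as plot(seq_along(v2), v2)
```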

(c) The R code and output appear below.

# Mean, maximum, sample variance, and first quartile of column 1
c1 <- data[, 1]
c(mean(c1), max(c1), var(c1), quantile(c1, 0.25))
##                         25% 
##  1.593 27.000  4.527  0.000 
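Only the last value in the output is labeled because quantile() returns a named value while mean(), max(), and var() do not. A possible way to label all four is sketched below; the vector x is made up and merely stands in for column 1, and the names in summary_stats are illustrative.

```r
# Hypothetical values standing in for column 1
x <- c(0, 0, 0, 1, 0, 1, 4, 2)

# Name every entry explicitly; unname() drops the "25%" label from quantile()
# Note that var() is the sample variance (denominator n - 1)
summary_stats <- c(mean = mean(x), max = max(x), var = var(x),
                   q25 = unname(quantile(x, 0.25)))
summary_stats
```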

Problem 4

Problem 5

Read in the data

ca <- read.csv("CA_house_prices.csv")
oh <- read.csv("OH_house_prices.csv")

(a)

hist(ca[, 1], breaks = seq(0, 3500, by = 500), col = "orange", xlab = "CA House Prices (in Thousands)", 
    ylab = "Frequency", main = "Stats202 Solution's California House Prices Frequency Histogram")

[Figure: frequency histogram of California house prices]

(b)

# Bin both data sets with the same breaks, without drawing the histograms
ca_hist <- hist(ca[, 1], breaks = seq(0, 3500, by = 500), plot = FALSE)
oh_hist <- hist(oh[, 1], breaks = seq(0, 3500, by = 500), plot = FALSE)
ca_counts <- ca_hist$counts
oh_counts <- oh_hist$counts
mids <- ca_hist$mids
# Relative frequency = bin count / total number of observations
ca_rel <- ca_counts/sum(ca_counts)
oh_rel <- oh_counts/sum(oh_counts)
plot(mids, ca_rel, pch = 19, ylim = c(0, 1), xlab = "House Prices (in Thousands)", 
    ylab = "Relative Frequency", main = "Stats202 Solution's Relative Frequency Polygons for House Price")
lines(mids, ca_rel)
points(mids, oh_rel, col = "blue", pch = 19)
lines(mids, oh_rel, col = "blue", lty = 2)
legend(2000, 0.75, c("California", "Ohio"), col = c("black", "blue"), 
    lty = c(1, 2), pch = 19)

[Figure: relative frequency polygons for California and Ohio house prices]
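The relative frequencies plotted in part (b) are bin counts divided by the number of observations, which is what makes two data sets of different sizes comparable on one axis. The sketch below shows the calculation on a handful of made-up prices (not the assignment data), using the same breaks as above.

```r
# Hypothetical prices in thousands, for illustration only
prices <- c(120, 340, 560, 610, 980, 1450, 2100)

# Same bins as in part (b); plot = FALSE returns the counts without drawing
h <- hist(prices, breaks = seq(0, 3500, by = 500), plot = FALSE)
h$counts

# Relative frequency = bin count / total count
rel_freq <- h$counts / sum(h$counts)
rel_freq
sum(rel_freq)   # relative frequencies add up to 1
```

Dividing by sum(h$counts) rather than a typed-in sample size keeps the calculation correct if the data file changes.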

(c)

CAecdf <- ecdf(ca[, 1])
OHecdf <- ecdf(oh[, 1])
plot(CAecdf, pch = 1, xlab = "House Prices (in Thousands)", ylab = "Cumulative Proportion", 
    main = "Stats202 Solution's ECDF for House Prices")
lines(OHecdf, col = "blue", pch = 3)
legend(2000, 0.6, legend = c("California", "Ohio"), pch = c(1, 3), 
    col = c("black", "blue"), lwd = 1)

[Figure: empirical CDFs of California and Ohio house prices]
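The object returned by ecdf() is itself a function: evaluated at a price, it gives the proportion of observations at or below that price. A minimal sketch on made-up prices (not the assignment data) follows; the names prices and price_ecdf are illustrative.

```r
# Hypothetical prices in thousands, for illustration only
prices <- c(120, 340, 560, 610, 980, 1450, 2100)

price_ecdf <- ecdf(prices)

# Proportion of prices at or below 600 (three of the seven values)
price_ecdf(600)
mean(prices <= 600)   # same quantity computed directly

# The ECDF is 0 below the smallest value and 1 at the largest
price_ecdf(c(0, 2100))
```

Because the values are proportions between 0 and 1, the curves for two data sets of different sizes can be read on the same vertical scale.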