Homework 1

Problem 2

(a) Time in terms of AM or PM.

Binary, qualitative, nominal (most people consider binary attributes to be nominal)

(b) Brightness as measured by a light meter.

Continuous, quantitative, ratio

(c) Brightness as measured by people’s judgments.

Discrete, qualitative, ordinal (assuming we make them choose from a discrete set of ratings)

(d) Angles as measured in degrees between 0° and 360°.

Continuous, quantitative, ratio

(e) Bronze, Silver, and Gold medals as awarded at the Olympics.

Discrete, qualitative, ordinal

(f) Height above sea level.

Continuous, quantitative, interval/ratio (depends on whether sea level is regarded as an arbitrary origin)

(g) Number of patients in a hospital.

Discrete, quantitative, ratio

(h) ISBN numbers for books. (Look up the format on the Web.)

Discrete, qualitative, nominal (but ISBN numbers do have some order information so it could be ordinal if you use that information)

(i) Ability to pass light in terms of the following values: opaque, translucent, transparent.

Discrete, qualitative, ordinal

(j) Military rank.

Discrete, qualitative, ordinal

(k) Distance from the center of campus.

Continuous, quantitative, interval/ratio (depends on whether the center of campus is regarded as an arbitrary origin)

(l) Density of a substance in grams per cubic centimeter.

Continuous, quantitative, ratio

(m) Coat check number. (When you attend an event, you can often give your coat to someone who, in turn, gives you a number that you can use to claim your coat when you leave.)

Discrete, qualitative, nominal (or ordinal if you are using the order information)
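
The attribute types above map loosely onto R's own data types. The following is a minimal sketch and not part of the original solution; the values and the variable names (medal, coat_check, patients) are made up for illustration. It shows that order comparisons are defined for an ordered factor (ordinal), that an unordered factor (nominal) carries labels only, and that ratios make sense for counts.

```r
# Hypothetical values for illustration; not taken from the homework.
medal <- factor(c("Silver", "Gold", "Bronze"),
                levels = c("Bronze", "Silver", "Gold"),
                ordered = TRUE)            # ordinal: categories with an order
coat_check <- factor(c(17, 4, 23))         # nominal: numbers used as labels
patients <- c(120L, 98L, 143L)             # discrete ratio: integer counts

is.ordered(medal)          # TRUE
is.ordered(coat_check)     # FALSE
medal[1] < medal[2]        # TRUE: Silver ranks below Gold
max(medal)                 # Gold; max() is defined for ordered factors
patients[1] / patients[2]  # a ratio of counts is meaningful
```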

Problem 3

(a) After reading in the data, I used the R functions is.factor and is.numeric to check each column. Both columns are numeric, so I treat both as quantitative.

# First call setwd() on the folder containing myfirstdata.csv
# The file has no header row, so let R name the columns V1 and V2
data <- read.csv("myfirstdata.csv", header = FALSE)

# Look at the first few rows
head(data)

##   V1 V2
## 1  0  0
## 2  0  3
## 3  0  1
## 4  1  2
## 5  0  0
## 6  1  2


# Ask R if the columns are factors or numeric
c(is.factor(data[, 1]), is.numeric(data[, 1]))

## [1] FALSE  TRUE

c(is.factor(data[, 2]), is.numeric(data[, 2]))

## [1] FALSE  TRUE
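
The same check can be run on every column at once. This is a sketch that rebuilds only the six rows printed by head(data) above, so the data frame name first_rows is an assumption for illustration and does not stand for the full data set. Note that is.numeric reports how R stores a column, so a numeric result still calls for a look at the values before the attribute is labeled quantitative.

```r
# Only the six rows shown by head(data); the full file is not reproduced here.
first_rows <- data.frame(V1 = c(0, 0, 0, 1, 0, 1),
                         V2 = c(0, 3, 1, 2, 0, 2))

# Apply the type checks to all columns in one call
sapply(first_rows, is.numeric)
sapply(first_rows, is.factor)

# str() prints the storage type and the first few values of each column
str(first_rows)
```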

(b) The plot for column 1 shows the row numbers on the x axis and the column 1 values on the y axis. A point is drawn for each row.

plot(data[, 1])


plot(data[, 2])


The plot for column 2 has the same interpretation.

c1 <- data[, 1]
c(mean(c1), max(c1), var(c1), quantile(c1, 0.25))

##                         25% 
##  1.593 27.000  4.527  0.000
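
In the output above, only the last value carries a label (25%) because quantile() returns a named value while mean(), max(), and var() do not. A small sketch of one way to label every entry follows. It uses only the six column 1 values printed by head(data), so the numbers differ from the full-column results, and the names in the vector are chosen for illustration.

```r
# First six values of column 1, as printed by head(data) above
c1_head <- c(0, 0, 0, 1, 0, 1)

# Name each statistic explicitly; unname() drops the "25%" label
# that quantile() attaches, so the name "q25" is used instead.
stats <- c(mean = mean(c1_head),
           max  = max(c1_head),
           var  = var(c1_head),
           q25  = unname(quantile(c1_head, 0.25)))
round(stats, 3)
```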

Problem 4

Advantages: Color makes it much easier to distinguish visual elements from one another. For example, three clusters of two-dimensional points are more readily distinguished if the markers representing the points have different colors, rather than only different shapes. Also, figures with color are more interesting to look at.
Disadvantages: Some people are color-blind and may not be able to interpret a color figure properly. Grayscale figures can show more detail in some cases. Color can also be hard to use well: for example, a poor color scheme can be garish or can focus attention on unimportant elements.
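
One way to keep the benefit of color while limiting the color-blindness problem is to encode groups with both color and marker shape, using a palette designed with color-vision deficiency in mind. The sketch below is an illustration only: the three clusters are simulated, the variable names are hypothetical, and palette.colors() with the Okabe-Ito palette assumes R 4.0.0 or later.

```r
set.seed(1)                                  # reproducible simulated data
cluster <- rep(1:3, each = 20)               # hypothetical cluster labels
x <- rnorm(60, mean = c(0, 3, 6)[cluster])
y <- rnorm(60, mean = c(0, 3, 0)[cluster])

cols   <- palette.colors(3, palette = "Okabe-Ito")  # requires R >= 4.0.0
shapes <- c(16, 17, 15)                      # circle, triangle, square

# Color and shape both encode the cluster, so the groups remain
# distinguishable in grayscale or for color-blind readers.
plot(x, y, col = cols[cluster], pch = shapes[cluster],
     xlab = "x", ylab = "y", main = "Simulated clusters")
legend("topleft", legend = paste("Cluster", 1:3), col = cols, pch = shapes)
```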

Problem 5

Read in the data

ca <- read.csv("CA_house_prices.csv")
oh <- read.csv("OH_house_prices.csv")

(a)

hist(ca[, 1], breaks = seq(0, 3500, by = 500), col = "orange", xlab = "CA House Prices (in Thousands)", 
    ylab = "Frequency", main = "Stats202 Solution's California House Prices Frequency Histogram")


(b)

# Bin both data sets on the same breaks without drawing the histograms
ca_hist <- hist(ca[, 1], breaks = seq(0, 3500, by = 500), plot = FALSE)
oh_hist <- hist(oh[, 1], breaks = seq(0, 3500, by = 500), plot = FALSE)
# Relative frequency = bin count / number of observations in that data set
ca_rel <- ca_hist$counts / sum(ca_hist$counts)
oh_rel <- oh_hist$counts / sum(oh_hist$counts)
mids <- ca_hist$mids
# Plot California first, then overlay Ohio at the same bin midpoints
plot(mids, ca_rel, pch = 19, ylim = c(0, 1), xlab = "House Prices (in Thousands)", 
    ylab = "Relative Frequency", main = "Stats202 Solution's Relative Frequency Polygons for House Price")
lines(mids, ca_rel)
points(mids, oh_rel, col = "blue", pch = 19)
lines(mids, oh_rel, col = "blue", lty = 2)
legend(2000, 0.75, c("California", "Ohio"), col = c("black", "blue"), 
    lty = c(1, 2), pch = 19)


(c)

CAecdf <- ecdf(ca[, 1])
OHecdf <- ecdf(oh[, 1])
# ecdf() returns proportions between 0 and 1, so label the y axis accordingly
plot(CAecdf, pch = 1, xlab = "House Prices (in Thousands)", ylab = "Cumulative Proportion", 
    main = "Stats202 Solution's ECDF for House Prices")
lines(OHecdf, col = "blue", pch = 3)
# Both curves are drawn with the default line width, so the legend uses lwd = 1
legend(2000, 0.6, legend = c("California", "Ohio"), pch = c(1, 3), 
    col = c("black", "blue"), lwd = 1)

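
An object returned by ecdf() is itself a function, so it can be evaluated at any price to read off the cumulative proportion that the plot displays. The prices below are hypothetical and stand in for the house price files, which are not reproduced here.

```r
# Hypothetical house prices in thousands; not the homework data
prices <- c(150, 220, 310, 480, 950)
price_ecdf <- ecdf(prices)

price_ecdf(310)    # 0.6: three of the five prices are <= 310
price_ecdf(100)    # 0: no price is this low
price_ecdf(1000)   # 1: every price is <= 1000
```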