Binary, qualitative, nominal (most people consider binary attributes to be nominal)
Continuous, quantitative, ratio
Discrete, qualitative, ordinal (assuming we make them choose from a discrete set of ratings)
Continuous, quantitative, ratio
Discrete, qualitative, ordinal
Continuous, quantitative, interval/ratio (depends on whether sea level is regarded as an arbitrary origin)
Discrete, quantitative, ratio
Discrete, qualitative, nominal (ISBNs do carry some order information, however, so the attribute could be treated as ordinal if that information is used)
Discrete, qualitative, ordinal
Discrete, qualitative, ordinal
Continuous, quantitative, interval/ratio (depends on whether the reference point is regarded as an arbitrary origin)
Continuous, quantitative, ratio
Discrete, qualitative, nominal (or ordinal if you are using the order information)
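In R, nominal attributes map naturally onto unordered factors and ordinal attributes onto ordered factors. The following minimal sketch uses hypothetical values (`medals` and `ids` are illustrative names, not data from this assignment) to show that order comparisons only make sense for the ordered kind:

```r
# Hypothetical ordinal attribute: medal types with an explicit level order
medals <- factor(c("Gold", "Bronze", "Silver", "Gold"),
                 levels = c("Bronze", "Silver", "Gold"),
                 ordered = TRUE)
medals[1] > medals[2]   # TRUE (Gold ranks above Bronze)

# Hypothetical nominal attribute: identifier codes with no meaningful order
ids <- factor(c("A17", "B02", "A17"))
is.ordered(ids)         # FALSE
```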
(a) After reading in the data, I used the R functions is.factor() and is.numeric() to check each column. Neither column is a factor and both are numeric, so both columns are quantitative.
# First setwd() to the folder containing myfirstdata.csv
data <- read.csv("myfirstdata.csv", header = FALSE)  # no header row, so R names the columns V1, V2
# Look at the first few rows
head(data)
## V1 V2
## 1 0 0
## 2 0 3
## 3 0 1
## 4 1 2
## 5 0 0
## 6 1 2
# Ask R if the columns are factors or numeric
c(is.factor(data[, 1]), is.numeric(data[, 1]))
## [1] FALSE TRUE
c(is.factor(data[, 2]), is.numeric(data[, 2]))
## [1] FALSE TRUE
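Checking columns one index at a time works for two columns. For a wider data frame, `sapply()` can apply the same checks to every column at once. In this sketch, `df` is a hypothetical stand-in rebuilt from the six rows that `head(data)` printed, not the full file:

```r
# Stand-in for the first six rows of myfirstdata.csv (copied from the head() output)
df <- data.frame(V1 = c(0, 0, 0, 1, 0, 1),
                 V2 = c(0, 3, 1, 2, 0, 2))
# Each call returns a named logical vector with one entry per column
sapply(df, is.factor)    # V1 FALSE, V2 FALSE
sapply(df, is.numeric)   # V1 TRUE,  V2 TRUE
# str() reports the storage type of every column in one call
str(df)
```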
(b) The plot for column 1 shows the row numbers on the x axis and the column 1 values on the y axis. A point is drawn for each row.
plot(data[, 1])
plot(data[, 2])
The plot for column 2 has the same interpretation.
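When `plot()` receives a single vector, it plots the values against their index, which is why the x axis shows row numbers. In other words, `plot(x)` draws the same picture as `plot(seq_along(x), x)`. The following sketch inspects this with `grDevices::xy.coords()`, the helper that `plot.default()` uses to build its coordinates. The vector `x` holds the six column 1 values shown by `head(data)` and is for illustration only:

```r
# First six values of column 1, copied from the head() output (illustration only)
x <- c(0, 0, 0, 1, 0, 1)
# With y omitted, xy.coords() uses the index 1..n as x and the data as y
xy <- grDevices::xy.coords(x)
xy$x      # 1 2 3 4 5 6
xy$y      # 0 0 0 1 0 1
xy$xlab   # "Index"
```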
(c) The R code and output appear below.
c1 <- data[, 1]
c(mean(c1), max(c1), var(c1), quantile(c1, 0.25))
##                         25%
##  1.593 27.000  4.527  0.000
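Only the quantile carries a name, so the printed vector labels just one of its four values (in order: mean, maximum, variance, 25th percentile). The following sketch names every statistic instead. It runs on the six column 1 values shown by `head(data)`, not on the full column, so its numbers differ from the output above:

```r
# First six values of column 1 (illustration only; not the full column)
x <- c(0, 0, 0, 1, 0, 1)
stats <- c(mean = mean(x),
           max  = max(x),
           var  = var(x),                      # sample variance: divides by n - 1
           q25  = unname(quantile(x, 0.25)))  # drop the "25%" name so the label stays "q25"
round(stats, 3)   # mean 0.333, max 1.000, var 0.267, q25 0.000
```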
Read in the data
ca <- read.csv("CA_house_prices.csv")
oh <- read.csv("OH_house_prices.csv")
(a)
hist(ca[, 1], breaks = seq(0, 3500, by = 500), col = "orange", xlab = "CA House Prices (in Thousands)",
ylab = "Frequency", main = "Stats202 Solution's California House Prices Frequency Histogram")
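With explicit `breaks`, `hist()` uses right-closed bins by default (`right = TRUE`) and includes the lowest break in the first bin (`include.lowest = TRUE`). The following sketch reproduces its counts with `cut()` and `table()`, using a few hypothetical prices (in thousands) rather than the CA or OH files:

```r
# Hypothetical prices in thousands (not from the CA or OH files)
x <- c(100, 500, 501, 1200, 3400)
brks <- seq(0, 3500, by = 500)
# Bin counts from hist() without drawing the plot
h <- hist(x, breaks = brks, plot = FALSE)
# Same counts by hand: bins [0,500], (500,1000], ..., (3000,3500]
manual <- as.vector(table(cut(x, breaks = brks, right = TRUE, include.lowest = TRUE)))
h$counts   # 2 1 1 0 0 0 1
manual     # 2 1 1 0 0 0 1
```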
(b)
# Compute bin counts without drawing; hist() stops with an error if any price lies outside [0, 3500]
ca_hist <- hist(ca[, 1], breaks = seq(0, 3500, by = 500), plot = FALSE)
oh_hist <- hist(oh[, 1], breaks = seq(0, 3500, by = 500), plot = FALSE)
# Relative frequency = bin count / number of houses in that state
ca_rel <- ca_hist$counts / sum(ca_hist$counts)
oh_rel <- oh_hist$counts / sum(oh_hist$counts)
mids <- ca_hist$mids  # bin midpoints, shared by both states because the breaks are identical
plot(mids, ca_rel, pch = 19, ylim = c(0, 1), xlab = "House Prices (in Thousands)",
ylab = "Relative Frequency", main = "Stats202 Solution's Relative Frequency Polygons for House Price")
lines(mids, ca_rel)
points(mids, oh_rel, col = "blue", pch = 19)
lines(mids, oh_rel, col = "blue", lty = 2)
legend(2000, 0.75, c("California", "Ohio"), col = c("black", "blue"),
lty = c(1, 2), pch = 19)
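A relative frequency is a bin count divided by the number of observations. `hist()` also reports `density = count / (n * bin width)`, so density times bin width gives the same proportions. The following sketch checks this on hypothetical prices (in thousands):

```r
x <- c(100, 500, 501, 1200, 3400)   # hypothetical prices, in thousands
h <- hist(x, breaks = seq(0, 3500, by = 500), plot = FALSE)
rel_freq <- h$counts / sum(h$counts)              # proportion of houses in each bin
rel_from_density <- h$density * diff(h$breaks)    # density times bin width
all.equal(rel_freq, rel_from_density)   # TRUE
sum(rel_freq)                           # 1
```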
(c)
CAecdf <- ecdf(ca[, 1])
OHecdf <- ecdf(oh[, 1])
plot(CAecdf, pch = 1, xlab = "House Prices (in Thousands)", ylab = "Cumulative Proportion",
main = "Stats202 Solution's ECDF for House Prices")
lines(OHecdf, col = "blue", pch = 3)  # adds the Ohio ECDF to the existing plot
legend(2000, 0.6, legend = c("California", "Ohio"), pch = c(1, 3),
col = c("black", "blue"), lty = 1)
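`ecdf()` returns a step function. Evaluated at a point t, it gives the proportion of observations less than or equal to t, which is why the y axis runs from 0 to 1. The following sketch uses hypothetical prices (in thousands) and compares the result with a direct count:

```r
x <- c(100, 500, 501, 1200, 3400)   # hypothetical prices, in thousands
Fn <- ecdf(x)
pts <- c(0, 500, 1200, 3400)
Fn(pts)                                  # 0.0 0.4 0.8 1.0
# Same values as the proportion of x at or below each point
sapply(pts, function(t) mean(x <= t))    # 0.0 0.4 0.8 1.0
```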