data <- read.csv("BART.csv")
mod1 <- lm(pricepersqft ~ commute, data = data)  # linear fit
data$commute_sq <- data$commute^2  # squared term for the quadratic fit
mod2 <- lm(pricepersqft ~ commute + commute_sq, data = data)
Estately, a real estate website, calculated the average price per square foot of houses within a mile of each BART station, and also listed a few commute times to downtown SF (Embarcadero). I figured that meant I needed to run some regressions.
plot(pricepersqft ~ commute, xlab = "Commute in minutes", ylab = "$/sq.ft. price of house", data = data)
text(pricepersqft ~ commute, labels = Station, cex = 0.5, pos = 3, data = data)  # label each point with its station
abline(mod1)  # overlay the linear fit
library(stargazer)
stargazer(mod1, mod2, se = list(NULL, NULL), type = "html", out = "stargazerout.html", title = "Home Prices and BART Commutes", align = TRUE, column.labels = c("Linear", "Quadratic"))
Dependent variable: pricepersqft

| | Linear (1) | Quadratic (2) |
| --- | --- | --- |
| commute | -12.775*** | -28.689*** |
| | (1.946) | (6.549) |
| commute_sq | | 0.317** |
| | | (0.125) |
| Constant | 900.251*** | 1,043.321*** |
| | (54.379) | (76.253) |
| Observations | 44 | 44 |
| R² | 0.507 | 0.573 |
| Adjusted R² | 0.495 | 0.552 |
| Residual Std. Error | 172.113 (df = 42) | 162.000 (df = 41) |
| F Statistic | 43.109*** (df = 1; 42) | 27.533*** (df = 2; 41) |

Note: *p<0.1; **p<0.05; ***p<0.01
That means every additional minute on BART is associated with a house that is $12.78 cheaper per square foot. All the usual caveats apply: this holds only on average, it’s not causal, and so on. There’s also a statistically significant quadratic term, but the difference between the two fits in the relevant region isn’t enormous. If you can’t read a regression table, what we’re looking at is:
\[\text{price} = 900 - 12.78 \times \text{commute}\]
\[\text{price} = 1043 - 28.69 \times \text{commute} + 0.317 \times \text{commute}^2\]
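To put rough numbers on "isn't enormous", here is a minimal sketch that plugs a few commute times into both fitted equations. The coefficients are copied from the regression table; the function names and the commute times (10 to 40 minutes) are made up for illustration and are not taken from the data.

```r
# Coefficients copied from the regression table (columns 1 and 2).
linear_fit <- function(commute) 900.251 - 12.775 * commute
quadratic_fit <- function(commute) 1043.321 - 28.689 * commute + 0.317 * commute^2

# Illustrative commute times in minutes (an assumption, not stations from the data).
commute <- c(10, 20, 30, 40)

comparison <- data.frame(
  commute = commute,
  linear = linear_fit(commute),
  quadratic = quadratic_fit(commute)
)
comparison$gap <- comparison$quadratic - comparison$linear
print(comparison)

# Commute time at which the fitted parabola bottoms out: b / (2 * c).
turning_point <- 28.689 / (2 * 0.317)
print(turning_point)
```

At these four commute times the two fits differ by less than $50 per square foot, and the quadratic curve only flattens out at a commute of roughly 45 minutes.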
Still, I think that’s sort of neat. What’s neater is R Markdown, R projects, and how seamlessly version control is built into RStudio. That is smooth. The code for this is on GitHub.