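# Load the station-level data, then fit a linear and a quadratic model
data <- read.csv("BART.csv")
mod1 <- lm(pricepersqft ~ commute, data = data)
data$commute_sq <- data$commute^2  # squared term for the quadratic fit
mod2 <- lm(pricepersqft ~ commute + commute_sq, data = data)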

The real estate website Estately calculated the average price per square foot of houses within a mile of each BART station, and also listed a few commute times to downtown SF (Embarcadero). I figured that meant I needed to run some regressions.

plot(pricepersqft~commute, xlab="Commute in minutes", ylab="$/sq.ft. price of house", data=data)
text(pricepersqft~commute, labels=Station, cex=0.5, pos=3, data=data)
abline(mod1)

library(stargazer)
stargazer(mod1, mod2, se=list(NULL, NULL), type="html", out="stargazerout.html", title="Home Prices and BART Commutes", align=TRUE, column.labels=c("Linear","Quadratic"))
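Home Prices and BART Commutes
======================================================================
Dependent variable: pricepersqft
----------------------------------------------------------------------
                      Linear                    Quadratic
                      (1)                       (2)
----------------------------------------------------------------------
commute               -12.775***                -28.689***
                      (1.946)                   (6.549)
commute_sq                                      0.317**
                                                (0.125)
Constant              900.251***                1,043.321***
                      (54.379)                  (76.253)
----------------------------------------------------------------------
Observations          44                        44
R2                    0.507                     0.573
Adjusted R2           0.495                     0.552
Residual Std. Error   172.113 (df = 42)         162.000 (df = 41)
F Statistic           43.109*** (df = 1; 42)    27.533*** (df = 2; 41)
======================================================================
Note: *p<0.1; **p<0.05; ***p<0.01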

That means each additional minute on BART is associated with a house that is $12.78 cheaper per square foot. All the usual caveats apply: this holds only on average, it’s not causal, and so on. There’s also a statistically significant quadratic relationship, but the difference between the two fits in the relevant region isn’t enormous. If you can’t read a regression table, what we’re looking at is:

\[\text{price} = 900 - 12.78 \times \text{commute}\]

\[\text{price} = 1043 - 28.69 \times \text{commute} + 0.317 \times \text{commute}^2\]
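To see how far apart the two fits actually are, here is a rough sketch that plugs the coefficients from the regression table into each equation. The function names and the 10 to 50 minute grid are my own illustrative assumptions, not something taken from the data, so treat the output as a back-of-the-envelope comparison rather than a result.

```r
# Coefficients copied from the regression table (columns 1 and 2).
# Function names are illustrative, not part of the original analysis.
price_linear <- function(commute) 900.251 - 12.775 * commute
price_quadratic <- function(commute) 1043.321 - 28.689 * commute + 0.317 * commute^2

# Assumed grid of commute times in minutes; the real range in the data may differ.
commute <- seq(10, 50, by = 10)

comparison <- data.frame(
  commute = commute,
  linear = price_linear(commute),
  quadratic = price_quadratic(commute)
)
# Positive gap: the quadratic fit predicts a higher $/sq.ft. than the linear fit.
comparison$gap <- comparison$quadratic - comparison$linear
print(round(comparison, 1))

# Slope of the quadratic fit at a given commute (derivative of the equation).
marginal_effect <- function(commute) -28.689 + 2 * 0.317 * commute

# Commute time at which the quadratic fit stops falling.
turning_point <- 28.689 / (2 * 0.317)
print(round(turning_point, 1))
```

With these rounded coefficients the quadratic curve bottoms out at roughly 45 minutes. Any upturn past that point is more likely an artifact of the functional form than a real premium for very long commutes.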

Still, I think that’s sort of neat. What’s neater is R Markdown, R projects, and how seamlessly version control is built into RStudio. That is smooth. The code for this is on GitHub.