TLDR: When performing a simple linear regression, if you have any concern about outliers or heterosedasticity, consider the
Theil-Sen estimator
.
A simple linear regression estimator that is not commonly used or taught in the social sciences is the Theil-Sen estimator. This is a shame given that this estimator is very intuitive, once you know what a slope means. Three steps:
- Plot a line between all the points in your data
- Calculate the slope for each line
- The median slope is your regression slope
Calculating the slope this way happens to be quite robust. And when the errors are normally distributed and you have no outliers, the slope is very similar to OLS.1
There are several methods to obtain the intercept. It is reasonable to know what your software is doing if you care for the intercept in your regression. Theil-Sen regression is available in two R packages I know of: WRS
2 and mblm.
mblm
includes a modification to Theil’s original method that has a higher breakdown point (more robust).3 This modification is the default method.
WRS
contains two functions for Theil-Sen regression: Theil’s original method in the tsreg
function, and a modification for small samples when there are tied values in the outcome in the tshdreg
function.
Re my comment at the top regarding Theil-Sen for simple linear regression when there are concerns about outliers and heteroskedasticity, see Dietz4 and Wilcox5 below.
I conducted a toy simulation to see how Theil-Sen competes with OLS under heteroskedasticity; It is the more efficient estimator.
-
Wilcox, R. R. (1998). A note on the Theil-Sen regression estimator when the regressor is random and the error term is heteroscedastic. Biometrical Journal, 40(3), 261–268. doi: 10.1002/(SICI)1521-4036(199807)40:3<261::AID-BIMJ261>3.0.CO;2-V ↩︎
-
Wilcox Robust Statistics - Rand Wilcox’s collection of robust methods. It is not available on CRAN, as CRAN requires proper documentation for all functions. This is a good set of installation instructions - https://web.archive.org/web/20170712140359/http://www.nicebread.de/installation-of-wrs-package-wilcox-robust-statistics/. ↩︎
-
Siegel, A. F. (1982). Robust regression using repeated medians. Biometrika, 69(1), 242–244. https://doi.org/10.1093/biomet/69.1.242 ↩︎
-
Dietz, E. J. (1987). A comparison of robust estimators in simple linear regression. Communications in Statistics - Simulation and Computation, 16(4), 1209–1227. https://doi.org/10.1080/03610918708812645 ↩︎
-
Wilcox, R. R. (1998). A note on the Theil-Sen regression estimator when the regressor is random and the error term is heteroscedastic. Biometrical Journal, 40(3), 261–268. ↩︎
Comments powered by Talkyard.