Quantile Regression for Extraordinarily Large Data
Dr. Shih-Kang Chao
Friday, November 11, 2016
EMS Building E495
With the emergence of new data collection and storage technologies, extremely large data sets have become increasingly common. Their sheer size makes it difficult to estimate the underlying distributional structure, which may be highly heterogeneous and asymmetric. A common approach to easing the computational burden is to split the complete data set into smaller subsamples, perform the computation on each subsample, and combine the results across subsamples at the end. While this is easy to implement, a detailed analysis of the theoretical and practical properties of the resulting approach is necessary to guarantee the statistical accuracy of conclusions based on such a divide-and-conquer strategy. In this paper we provide explicit conditions that guarantee the statistical validity of divide-and-conquer procedures in the setting of quantile regression, and we illustrate via theory and simulations that inference can fail if those conditions are violated. We apply our method to conduct statistical inference for the cumulative distribution function of extraordinarily large data.
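The divide-and-conquer strategy outlined in the abstract can be sketched in a few lines. The following is a minimal illustration (not the speaker's implementation, and all function names here are hypothetical): each subsample's quantile regression is solved as a linear program via scipy.optimize.linprog, and the per-subsample coefficient estimates are simply averaged.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau=0.5):
    """Quantile regression at level tau as a linear program:
        min  tau * 1'u_pos + (1 - tau) * 1'u_neg
        s.t. X b + u_pos - u_neg = y,  u_pos, u_neg >= 0,  b free.
    Returns the coefficient vector b (length p)."""
    n, p = X.shape
    # Decision variables: [b (p, free), u_pos (n, >=0), u_neg (n, >=0)]
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

def dc_quantile_regression(X, y, tau=0.5, n_subsamples=5):
    """Divide and conquer: fit each subsample separately,
    then average the coefficient estimates across subsamples."""
    betas = [quantile_regression(Xs, ys, tau)
             for Xs, ys in zip(np.array_split(X, n_subsamples),
                               np.array_split(y, n_subsamples))]
    return np.mean(betas, axis=0)

# Toy data: median regression (tau = 0.5) with symmetric noise,
# so the conditional median is 1 + 2x.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
beta = dc_quantile_regression(X, y, tau=0.5, n_subsamples=5)
```

The averaged estimate should be close to the full-sample one here, but, as the abstract stresses, this agreement is not automatic: validity depends on conditions such as the number of subsamples not growing too fast relative to the subsample size.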
Light Refreshments served at 1:30 pm in EMS E424A.