Application of Regression Analysis to Big Data
Indrė Baltušninkaitė
Vilniaus Gedimino technikos universitetas
Nomeda Bratčikovienė
Vilniaus Gedimino technikos universitetas
Published 2018-12-20
https://doi.org/10.15388/LJS.2018.5

Keywords

big data
regression analysis
leveraging
LASSO
LARS
RMSLE

How to Cite

Baltušninkaitė, I. and Bratčikovienė, N. (2018) “Application of Regression Analysis to Big Data”, Lithuanian Journal of Statistics, 57(1), pp. 56–69. doi:10.15388/LJS.2018.5.

Abstract

[full article and abstract in Lithuanian; abstract in English]

This article investigates the opportunities and challenges of regression analysis for big data. First, the main characteristics of big data are identified and explained, followed by the challenges that arise in big data analytics. In response to these challenges, several regression methods suited to big data are proposed. These methods reduce the computational burden and select the variables that best describe the response variable, achieving sufficient statistical accuracy while reducing the cost and time of computation. One of the main aims of the article is to apply these methods to a real data set. Regression models are built for both simulated and real data, and their parameters are estimated using divided regression and leverage-based regression. LASSO and LARS regression are used to select the best subset of variables. Finally, model diagnostics, accuracy estimation, and a comparison of the results are carried out.
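As a rough illustration of the estimation techniques named in the abstract, the following Python sketch shows one common way divided regression and leverage-based subsampling can be set up on simulated data, together with the RMSLE accuracy measure listed in the keywords. It is not the authors' implementation: the function names, the simple averaging of block estimates, the subsample size, and the simulated data are all assumptions made for this example, and variable selection with LASSO or LARS is not covered.

```python
import numpy as np

def ols(X, y, weights=None):
    """Least-squares coefficients; optional row weights give weighted least squares."""
    if weights is not None:
        w = np.sqrt(weights)
        X, y = X * w[:, None], y * w
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def divided_regression(X, y, n_blocks):
    """Fit OLS on each block of rows and take a simple average of the estimates
    (the simple average is an assumption of this sketch)."""
    blocks = np.array_split(np.arange(X.shape[0]), n_blocks)
    return np.mean([ols(X[idx], y[idx]) for idx in blocks], axis=0)

def leverage_regression(X, y, subsample_size, rng):
    """Sample rows with probability proportional to leverage, then fit
    weighted OLS with inverse-probability weights."""
    Q, _ = np.linalg.qr(X)                  # thin QR: leverage h_ii = ||Q[i, :]||^2
    leverage = np.sum(Q**2, axis=1)
    prob = leverage / leverage.sum()
    idx = rng.choice(X.shape[0], size=subsample_size, replace=True, p=prob)
    return ols(X[idx], y[idx], weights=1.0 / prob[idx])

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; assumes non-negative values."""
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# Simulated data (hypothetical): intercept plus four standard-normal predictors.
rng = np.random.default_rng(0)
n = 20_000
beta_true = np.array([10.0, 1.0, -0.5, 0.25, 0.5])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 4))])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_div = divided_regression(X, y, n_blocks=10)
beta_lev = leverage_regression(X, y, subsample_size=2_000, rng=rng)
score = rmsle(y, X @ beta_div)
```

Both estimators avoid a single least-squares fit on all rows at once: the divided approach only ever fits one block at a time, while the leverage approach fits a small weighted subsample. Note that this sketch still computes exact leverage scores from a QR decomposition of the full matrix, which is itself costly on very large data; approximating the scores would be a separate step.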
