Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments
Abstract
The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison to commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set that includes 7407 records of apartment transactions referring to real estate sales from 2008-2013 in the city of Ljubljana, the capital of Slovenia, was used in order to test and compare the predictive performances of both models. Apparent challenges faced during modelling included (1) the non-linear nature of the prediction assignment task; (2) input data being based on transactions occurring over a period of great price changes in Ljubljana whereby a 28% decline was noted in six consecutive testing years; and (3) the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS) ready dataset, including the structural and age characteristics of the apartments as well as environmental and neighbourhood information were considered in the modelling procedure. All performance measures (R² values, sales ratios, mean average percentage error (MAPE), coefficient of dispersion (COD)) revealed significantly better results for predictions obtained by the random forest method, which confirms the prospective of this machine learning technique on apartment price prediction.
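The abstract names its performance measures without defining them. As a minimal sketch, the ratio-based measures could be computed as below. It assumes that MAPE is meant in its usual sense of mean absolute percentage error, that a sales ratio is the predicted price divided by the observed sale price, and that COD follows the common ratio-study definition (average absolute deviation of the ratios from their median, as a percentage of the median). The function names and the toy prices are illustrative only and are not taken from the paper.

```python
from statistics import median


def sales_ratios(predicted, actual):
    """Ratio of each predicted price to its observed sale price."""
    return [p / a for p, a in zip(predicted, actual)]


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


def cod(predicted, actual):
    """Coefficient of dispersion, in percent.

    Average absolute deviation of the sales ratios from their median,
    expressed relative to that median.
    """
    ratios = sales_ratios(predicted, actual)
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)


if __name__ == "__main__":
    # Hypothetical prices, used only to exercise the functions.
    actual = [100.0, 200.0, 400.0]
    predicted = [110.0, 180.0, 400.0]
    print(round(mape(predicted, actual), 2))
    print(round(cod(predicted, actual), 2))
```

For both measures lower values indicate better predictions; on this toy data the median ratio is 1.0 and both come out at roughly 6.67.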
Keywords:
random forest / OLS / hedonic price model / PCA / Ljubljana
Source:
ISPRS International Journal of Geo-Information, 2018, 7, 5
Publisher:
- MDPI AG
Funding / projects:
- Slovenian-Serbian bilateral research project 451-03-3095/2014-09/34
DOI: 10.3390/ijgi7050168
ISSN: 2220-9964
WoS: 000435194700008
Scopus: 2-s2.0-85047141908
Institution/group
GraFar
TY  - JOUR
AU  - Ceh, Marjan
AU  - Kilibarda, Milan
AU  - Lisec, Anka
AU  - Bajat, Branislav
PY  - 2018
UR  - https://grafar.grf.bg.ac.rs/handle/123456789/959
PB  - MDPI AG
T2  - ISPRS International Journal of Geo-Information
T1  - Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments
IS  - 5
VL  - 7
DO  - 10.3390/ijgi7050168
ER  -
Ceh, M., Kilibarda, M., Lisec, A., & Bajat, B. (2018). Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments. ISPRS International Journal of Geo-Information, 7(5). MDPI AG. https://doi.org/10.3390/ijgi7050168
Ceh M, Kilibarda M, Lisec A, Bajat B. Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments. ISPRS International Journal of Geo-Information. 2018;7(5). doi:10.3390/ijgi7050168
Ceh, Marjan, Kilibarda, Milan, Lisec, Anka, Bajat, Branislav, "Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments," ISPRS International Journal of Geo-Information 7, no. 5 (2018), https://doi.org/10.3390/ijgi7050168.