Heuvelink, Gerard B. M.


ORCID: 0000-0003-0959-9358
Name variants:
  • Heuvelink, Gerard B. M. (4)

Author's Bibliography

Random Forest Spatial Interpolation

Sekulić, Aleksandar; Kilibarda, Milan; Heuvelink, Gerard B. M.; Nikolić, Mladen; Bajat, Branislav

(MDPI, 2020)

TY  - JOUR
AU  - Sekulić, Aleksandar
AU  - Kilibarda, Milan
AU  - Heuvelink, Gerard B. M.
AU  - Nikolić, Mladen
AU  - Bajat, Branislav
PY  - 2020
UR  - https://www.mdpi.com/2072-4292/12/10/1687
UR  - https://grafar.grf.bg.ac.rs/handle/123456789/1973
AB  - For many decades, kriging and deterministic interpolation techniques, such as inverse distance weighting and nearest neighbour interpolation, have been the most popular spatial interpolation techniques. Kriging with external drift and regression kriging have become basic techniques that benefit both from spatial autocorrelation and covariate information. More recently, machine learning techniques, such as random forest and gradient boosting, have become increasingly popular and are now often used for spatial interpolation. Some attempts have been made to explicitly take the spatial component into account in machine learning, but so far, none of these approaches have taken the natural route of incorporating the nearest observations and their distances to the prediction location as covariates. In this research, we explored the value of including observations at the nearest locations and their distances from the prediction location by introducing Random Forest Spatial Interpolation (RFSI). We compared RFSI with deterministic interpolation methods, ordinary kriging, regression kriging, Random Forest and Random Forest for spatial prediction (RFsp) in three case studies. The first case study made use of synthetic data, i.e., simulations from normally distributed stationary random fields with a known semivariogram, for which ordinary kriging is known to be optimal. The second and third case studies evaluated the performance of the various interpolation methods using daily precipitation data for the 2016–2018 period in Catalonia, Spain, and mean daily temperature for the year 2008 in Croatia. Results of the synthetic case study showed that RFSI outperformed most simple deterministic interpolation techniques and had similar performance as inverse distance weighting and RFsp. As expected, kriging was the most accurate technique in the synthetic case study. 
In the precipitation and temperature case studies, RFSI mostly outperformed regression kriging, inverse distance weighting, random forest, and RFsp. Moreover, RFSI was substantially faster than RFsp, particularly when the training dataset was large and high-resolution prediction maps were made.
PB  - MDPI
T2  - Remote Sensing
T1  - Random Forest Spatial Interpolation
IS  - 10
SP  - 1687
VL  - 12
DO  - 10.3390/rs12101687
ER  - 
@article{Sekulic2020RFSI,
author = "Sekulić, Aleksandar and Kilibarda, Milan and Heuvelink, Gerard B. M. and Nikolić, Mladen and Bajat, Branislav",
year = "2020",
abstract = "For many decades, kriging and deterministic interpolation techniques, such as inverse distance weighting and nearest neighbour interpolation, have been the most popular spatial interpolation techniques. Kriging with external drift and regression kriging have become basic techniques that benefit both from spatial autocorrelation and covariate information. More recently, machine learning techniques, such as random forest and gradient boosting, have become increasingly popular and are now often used for spatial interpolation. Some attempts have been made to explicitly take the spatial component into account in machine learning, but so far, none of these approaches have taken the natural route of incorporating the nearest observations and their distances to the prediction location as covariates. In this research, we explored the value of including observations at the nearest locations and their distances from the prediction location by introducing Random Forest Spatial Interpolation (RFSI). We compared RFSI with deterministic interpolation methods, ordinary kriging, regression kriging, Random Forest and Random Forest for spatial prediction (RFsp) in three case studies. The first case study made use of synthetic data, i.e., simulations from normally distributed stationary random fields with a known semivariogram, for which ordinary kriging is known to be optimal. The second and third case studies evaluated the performance of the various interpolation methods using daily precipitation data for the 2016–2018 period in Catalonia, Spain, and mean daily temperature for the year 2008 in Croatia. Results of the synthetic case study showed that RFSI outperformed most simple deterministic interpolation techniques and had similar performance as inverse distance weighting and RFsp. As expected, kriging was the most accurate technique in the synthetic case study. 
In the precipitation and temperature case studies, RFSI mostly outperformed regression kriging, inverse distance weighting, random forest, and RFsp. Moreover, RFSI was substantially faster than RFsp, particularly when the training dataset was large and high-resolution prediction maps were made.",
publisher = "MDPI",
journal = "Remote Sensing",
title = "Random Forest Spatial Interpolation",
number = "10",
pages = "1687",
volume = "12",
doi = "10.3390/rs12101687"
}
Sekulić, A., Kilibarda, M., Heuvelink, G. B. M., Nikolić, M., & Bajat, B. (2020). Random Forest Spatial Interpolation. Remote Sensing, 12(10), 1687.
https://doi.org/10.3390/rs12101687
Sekulić A, Kilibarda M, Heuvelink GBM, Nikolić M, Bajat B. Random Forest Spatial Interpolation. Remote Sensing. 2020;12(10):1687.
doi:10.3390/rs12101687
Sekulić, Aleksandar, Milan Kilibarda, Gerard B. M. Heuvelink, Mladen Nikolić, and Branislav Bajat. "Random Forest Spatial Interpolation." Remote Sensing 12, no. 10 (2020): 1687.
https://doi.org/10.3390/rs12101687.
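
The RFSI abstract describes using the nearest observations and their distances to the prediction location as covariates. The sketch below only builds that covariate vector; it is a minimal illustration under stated assumptions, not the authors' implementation. It assumes planar coordinates with Euclidean distance, and the names `rfsi_features` and `rfsi_training_rows` are hypothetical. Fitting the random forest on these rows (plus any environmental covariates) is left out.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rfsi_features(target: Point, obs_xy: Sequence[Point], obs_val: Sequence[float],
                  n: int, exclude_index: int = -1) -> List[float]:
    """Hypothetical helper: [value_1, dist_1, ..., value_n, dist_n] for the n nearest observations.

    exclude_index skips one observation (the training point itself), so a
    training row never receives its own observed value as a covariate.
    """
    candidates = []
    for i, (xy, value) in enumerate(zip(obs_xy, obs_val)):
        if i == exclude_index:
            continue
        candidates.append((math.dist(target, xy), value))  # assumed: Euclidean distance
    if len(candidates) < n:
        raise ValueError("fewer observations than requested neighbours")
    candidates.sort(key=lambda dv: dv[0])  # nearest first
    features: List[float] = []
    for dist, value in candidates[:n]:
        features.extend([value, dist])
    return features


def rfsi_training_rows(obs_xy: Sequence[Point], obs_val: Sequence[float], n: int) -> List[List[float]]:
    """One feature row per observation, each built without that observation itself."""
    return [rfsi_features(xy, obs_xy, obs_val, n, exclude_index=i) for i, xy in enumerate(obs_xy)]
```

At prediction time the same function is called with `exclude_index` left at its default, so a grid cell sees all observations; during training each station is hidden from its own row, which mirrors the situation at an unsampled location.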

Sparse regression interaction models for spatial prediction of soil properties in 3D

Pejović, Milutin; Nikolić, Mladen; Heuvelink, Gerard B. M.; Hengl, Tomislav; Kilibarda, Milan; Bajat, Branislav

(Elsevier Ltd, 2018)

TY  - JOUR
AU  - Pejović, Milutin
AU  - Nikolić, Mladen
AU  - Heuvelink, Gerard B. M.
AU  - Hengl, Tomislav
AU  - Kilibarda, Milan
AU  - Bajat, Branislav
PY  - 2018
UR  - https://grafar.grf.bg.ac.rs/handle/123456789/943
AB  - An approach for using lasso (Least Absolute Shrinkage and Selection Operator) regression in creating sparse 3D models of soil properties for spatial prediction at multiple depths is presented. Modeling soil properties in 3D benefits from interactions of spatial predictors with soil depth and its polynomial expansion, which yields a large number of model variables (and corresponding model parameters). Lasso is able to perform variable selection, hence reducing the number of model parameters and making the model more easily interpretable. This also prevents overfitting, which makes the model more accurate. The presented approach was tested using four variable selection approaches - none, stepwise, lasso and hierarchical lasso, on four kinds of models - standard linear model, linear model with polynomial expansion of depth, linear model with interactions of covariates with depth and linear model with interactions of covariates with depth and its polynomial expansion. This framework was used to predict Soil Organic Carbon (SOC) in three contrasting study areas: Bor (Serbia), Edgeroi (Australia) and the Netherlands. Results show that lasso yields substantial improvements in accuracy over standard and stepwise regression - up to 50 % of total variance. It yields models which contain up to five times less nonzero parameters than the full models and that are usually more sparse than models obtained by stepwise regression, up to three times. Extension of the standard linear model by including interactions typically improves the accuracy of models produced by lasso, but is detrimental to standard and stepwise regression. Regarding computation time, it was demonstrated that lasso is several orders of magnitude more efficient than stepwise regression for models with tens or hundreds of variables (including interactions). Proper model evaluation is emphasized. 
Considering the fact that lasso requires meta-parameter tuning, standard cross-validation does not suffice for adequate model evaluation, hence a nested cross-validation was employed. The presented approach is implemented as publicly available sparsereg3D R package.
PB  - Elsevier Ltd
T2  - Computers & Geosciences
T1  - Sparse regression interaction models for spatial prediction of soil properties in 3D
EP  - 13
SP  - 1
VL  - 118
DO  - 10.1016/j.cageo.2018.05.008
ER  - 
@article{Pejovic2018sparse,
author = "Pejović, Milutin and Nikolić, Mladen and Heuvelink, Gerard B. M. and Hengl, Tomislav and Kilibarda, Milan and Bajat, Branislav",
year = "2018",
abstract = "An approach for using lasso (Least Absolute Shrinkage and Selection Operator) regression in creating sparse 3D models of soil properties for spatial prediction at multiple depths is presented. Modeling soil properties in 3D benefits from interactions of spatial predictors with soil depth and its polynomial expansion, which yields a large number of model variables (and corresponding model parameters). Lasso is able to perform variable selection, hence reducing the number of model parameters and making the model more easily interpretable. This also prevents overfitting, which makes the model more accurate. The presented approach was tested using four variable selection approaches - none, stepwise, lasso and hierarchical lasso, on four kinds of models - standard linear model, linear model with polynomial expansion of depth, linear model with interactions of covariates with depth and linear model with interactions of covariates with depth and its polynomial expansion. This framework was used to predict Soil Organic Carbon (SOC) in three contrasting study areas: Bor (Serbia), Edgeroi (Australia) and the Netherlands. Results show that lasso yields substantial improvements in accuracy over standard and stepwise regression - up to 50 % of total variance. It yields models which contain up to five times less nonzero parameters than the full models and that are usually more sparse than models obtained by stepwise regression, up to three times. Extension of the standard linear model by including interactions typically improves the accuracy of models produced by lasso, but is detrimental to standard and stepwise regression. Regarding computation time, it was demonstrated that lasso is several orders of magnitude more efficient than stepwise regression for models with tens or hundreds of variables (including interactions). Proper model evaluation is emphasized. 
Considering the fact that lasso requires meta-parameter tuning, standard cross-validation does not suffice for adequate model evaluation, hence a nested cross-validation was employed. The presented approach is implemented as publicly available sparsereg3D R package.",
publisher = "Elsevier Ltd",
journal = "Computers \& Geosciences",
title = "Sparse regression interaction models for spatial prediction of soil properties in 3D",
pages = "1--13",
volume = "118",
doi = "10.1016/j.cageo.2018.05.008"
}
Pejović, M., Nikolić, M., Heuvelink, G. B. M., Hengl, T., Kilibarda, M., & Bajat, B. (2018). Sparse regression interaction models for spatial prediction of soil properties in 3D. Computers & Geosciences, 118, 1-13.
https://doi.org/10.1016/j.cageo.2018.05.008
Pejović M, Nikolić M, Heuvelink GBM, Hengl T, Kilibarda M, Bajat B. Sparse regression interaction models for spatial prediction of soil properties in 3D. Computers & Geosciences. 2018;118:1-13.
doi:10.1016/j.cageo.2018.05.008
Pejović, Milutin, Mladen Nikolić, Gerard B. M. Heuvelink, Tomislav Hengl, Milan Kilibarda, and Branislav Bajat. "Sparse regression interaction models for spatial prediction of soil properties in 3D." Computers & Geosciences 118 (2018): 1-13.
https://doi.org/10.1016/j.cageo.2018.05.008.
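
The abstract attributes the large number of model variables to interactions of covariates with depth and its polynomial expansion, which lasso then prunes. A minimal sketch of both pieces follows, under stated assumptions: `expand_depth_interactions` (a hypothetical name, not the sparsereg3D API) builds one named feature row from p covariates and a depth polynomial of degree k, giving p + k + p·k variables, and `soft_threshold` is the standard lasso soft-thresholding operator that coordinate-descent solvers apply per coefficient, which is what sets coefficients exactly to zero.

```python
import math
from typing import Dict


def expand_depth_interactions(covariates: Dict[str, float], depth: float, degree: int) -> Dict[str, float]:
    """Hypothetical helper: covariates, depth powers, and covariate x depth-power interactions."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    row = dict(covariates)  # main effects of the spatial covariates
    for k in range(1, degree + 1):
        dk = depth ** k
        row[f"depth^{k}"] = dk  # polynomial expansion of depth
        for name, value in covariates.items():
            row[f"{name}:depth^{k}"] = value * dk  # interaction term
    return row


def soft_threshold(z: float, lam: float) -> float:
    """Lasso soft-thresholding: shrink z toward zero by lam, clipping to exactly zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)
```

With two covariates and a quadratic depth term the row already holds eight variables; hierarchical lasso, also named in the abstract, adds constraints on which interactions may enter and is not shown here.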

SoilGrids250m: Global gridded soil information based on machine learning

Hengl, Tomislav; de Jesus, Jorge Mendes; Heuvelink, Gerard B. M.; Gonzalez, Maria Ruiperez; Kilibarda, Milan; Blagotić, Aleksandar; Shangguan, Wei; Wright, Marvin N.; Geng, Xiaoyuan; Bauer-Marschallinger, Bernhard; Guevara, Mario Antonio; Vargas, Rodrigo; MacMillan, Robert A.; Batjes, Niels H.; Leenaars, Johan G. B.; Ribeiro, Eloi; Wheeler, Ichsani; Mantel, Stephan; Kempen, Bas

(Public Library of Science, 2017)

TY  - JOUR
AU  - Hengl, Tomislav
AU  - de Jesus, Jorge Mendes
AU  - Heuvelink, Gerard B. M.
AU  - Gonzalez, Maria Ruiperez
AU  - Kilibarda, Milan
AU  - Blagotić, Aleksandar
AU  - Shangguan, Wei
AU  - Wright, Marvin N.
AU  - Geng, Xiaoyuan
AU  - Bauer-Marschallinger, Bernhard
AU  - Guevara, Mario Antonio
AU  - Vargas, Rodrigo
AU  - MacMillan, Robert A.
AU  - Batjes, Niels H.
AU  - Leenaars, Johan G. B.
AU  - Ribeiro, Eloi
AU  - Wheeler, Ichsani
AU  - Mantel, Stephan
AU  - Kempen, Bas
PY  - 2017
UR  - https://grafar.grf.bg.ac.rs/handle/123456789/885
AB  - This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods (random forest and gradient boosting and/or multinomial logistic regression) as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation with an overall average of 61%. Improvements in the relative accuracy considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) to considerable investments in preparing finer resolution covariate layers and (3) to insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables.
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Data Base License.
PB  - Public Library of Science
T2  - PLOS One
T1  - SoilGrids250m: Global gridded soil information based on machine learning
IS  - 2
VL  - 12
DO  - 10.1371/journal.pone.0169748
ER  - 
@article{Hengl2017SoilGrids,
author = "Hengl, Tomislav and de Jesus, Jorge Mendes and Heuvelink, Gerard B. M. and Gonzalez, Maria Ruiperez and Kilibarda, Milan and Blagotić, Aleksandar and Shangguan, Wei and Wright, Marvin N. and Geng, Xiaoyuan and Bauer-Marschallinger, Bernhard and Guevara, Mario Antonio and Vargas, Rodrigo and MacMillan, Robert A. and Batjes, Niels H. and Leenaars, Johan G. B. and Ribeiro, Eloi and Wheeler, Ichsani and Mantel, Stephan and Kempen, Bas",
year = "2017",
abstract = "This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods (random forest and gradient boosting and/or multinomial logistic regression) as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation with an overall average of 61%. Improvements in the relative accuracy considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) to considerable investments in preparing finer resolution covariate layers and (3) to insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables.
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Data Base License.",
publisher = "Public Library of Science",
journal = "PLOS One",
title = "SoilGrids250m: Global gridded soil information based on machine learning",
number = "2",
volume = "12",
doi = "10.1371/journal.pone.0169748"
}
Hengl, T., de Jesus, J. M., Heuvelink, G. B. M., Gonzalez, M. R., Kilibarda, M., Blagotić, A., Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara, M. A., Vargas, R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G. B., Ribeiro, E., Wheeler, I., Mantel, S., & Kempen, B. (2017). SoilGrids250m: Global gridded soil information based on machine learning. PLOS One, 12(2).
https://doi.org/10.1371/journal.pone.0169748
Hengl T, de Jesus JM, Heuvelink GBM, Gonzalez MR, Kilibarda M, Blagotić A, Shangguan W, Wright MN, Geng X, Bauer-Marschallinger B, Guevara MA, Vargas R, MacMillan RA, Batjes NH, Leenaars JGB, Ribeiro E, Wheeler I, Mantel S, Kempen B. SoilGrids250m: Global gridded soil information based on machine learning. PLOS One. 2017;12(2).
doi:10.1371/journal.pone.0169748
Hengl, Tomislav, Jorge Mendes de Jesus, Gerard B. M. Heuvelink, Maria Ruiperez Gonzalez, Milan Kilibarda, Aleksandar Blagotić, Wei Shangguan, Marvin N. Wright, Xiaoyuan Geng, Bernhard Bauer-Marschallinger, Mario Antonio Guevara, Rodrigo Vargas, Robert A. MacMillan, Niels H. Batjes, Johan G. B. Leenaars, Eloi Ribeiro, Ichsani Wheeler, Stephan Mantel, and Bas Kempen. "SoilGrids250m: Global gridded soil information based on machine learning." PLOS One 12, no. 2 (2017).
https://doi.org/10.1371/journal.pone.0169748.
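
The SoilGrids abstract reports accuracy from 10-fold cross-validation as the share of variation explained. A minimal sketch of that evaluation loop's two ingredients follows, assuming the "amount of variation explained" is computed as 1 - SSE/SST on held-out predictions; the function names are hypothetical and the model-fitting step is left to whatever learner is being assessed.

```python
import random
from typing import List, Sequence


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of (nearly) equal size."""
    if n < k:
        raise ValueError("need at least k samples for k folds")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded, so folds are reproducible
    return [idx[f::k] for f in range(k)]


def variance_explained(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SSE/SST computed on held-out (cross-validation) predictions."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

Each fold is held out once while the model is trained on the other nine; the pooled out-of-fold predictions are then scored once with `variance_explained`. A value of 0.61 would correspond to the 61% overall average quoted above.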

Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution

Kilibarda, Milan; Hengl, Tomislav; Heuvelink, Gerard B. M.; Graeler, Benedikt; Pebesma, Edzer; Tadić-Percec, Melita; Bajat, Branislav

(Wiley-Blackwell, 2014)

TY  - JOUR
AU  - Kilibarda, Milan
AU  - Hengl, Tomislav
AU  - Heuvelink, Gerard B. M.
AU  - Graeler, Benedikt
AU  - Pebesma, Edzer
AU  - Tadić-Percec, Melita
AU  - Bajat, Branislav
PY  - 2014
UR  - https://grafar.grf.bg.ac.rs/handle/123456789/639
AB  - Combined Global Surface Summary of Day and European Climate Assessment and Dataset daily meteorological data sets (around 9000 stations) were used to build spatio-temporal geostatistical models and predict daily air temperature at ground resolution of 1 km for the global land mass. Predictions in space and time were made for the mean, maximum, and minimum temperatures using spatio-temporal regression-kriging with a time series of Moderate Resolution Imaging Spectroradiometer (MODIS) 8 day images, topographic layers (digital elevation model and topographic wetness index), and a geometric temperature trend as covariates. The accuracy of predicting daily temperatures was assessed using leave-one-out cross validation. To account for geographical point clustering of station data and get a more representative cross-validation accuracy, predicted values were aggregated to blocks of land of size 500 x 500 km. Results show that the average accuracy for predicting mean, maximum, and minimum daily temperatures is root-mean-square error (RMSE) = 2 degrees C for areas densely covered with stations and between 2 degrees C and 4 degrees C for areas with lower station density. The lowest prediction accuracy was observed at high altitudes (>1000 m) and in Antarctica with an RMSE around 6 degrees C. The model and predictions were built for the year 2011 only, but the same methodology could be extended for the whole range of the MODIS land surface temperature images (2001 to today), i.e., to produce global archives of daily temperatures (a next-generation repository) and to feed various global environmental models. Key Points: (1) global spatio-temporal regression-kriging daily temperature interpolation; (2) fitting of global spatio-temporal models for the mean, maximum, and minimum temperatures; (3) time series of MODIS 8 day images as explanatory variables in the regression part.
PB  - Wiley-Blackwell
T2  - Journal of Geophysical Research-Atmospheres
T1  - Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution
EP  - 2313
IS  - 5
SP  - 2294
VL  - 119
DO  - 10.1002/2013JD020803
ER  - 
@article{Kilibarda2014spatiotemporal,
author = "Kilibarda, Milan and Hengl, Tomislav and Heuvelink, Gerard B. M. and Graeler, Benedikt and Pebesma, Edzer and Tadić-Percec, Melita and Bajat, Branislav",
year = "2014",
abstract = "Combined Global Surface Summary of Day and European Climate Assessment and Dataset daily meteorological data sets (around 9000 stations) were used to build spatio-temporal geostatistical models and predict daily air temperature at ground resolution of 1 km for the global land mass. Predictions in space and time were made for the mean, maximum, and minimum temperatures using spatio-temporal regression-kriging with a time series of Moderate Resolution Imaging Spectroradiometer (MODIS) 8 day images, topographic layers (digital elevation model and topographic wetness index), and a geometric temperature trend as covariates. The accuracy of predicting daily temperatures was assessed using leave-one-out cross validation. To account for geographical point clustering of station data and get a more representative cross-validation accuracy, predicted values were aggregated to blocks of land of size 500 x 500 km. Results show that the average accuracy for predicting mean, maximum, and minimum daily temperatures is root-mean-square error (RMSE) = 2 degrees C for areas densely covered with stations and between 2 degrees C and 4 degrees C for areas with lower station density. The lowest prediction accuracy was observed at high altitudes (>1000 m) and in Antarctica with an RMSE around 6 degrees C. The model and predictions were built for the year 2011 only, but the same methodology could be extended for the whole range of the MODIS land surface temperature images (2001 to today), i.e., to produce global archives of daily temperatures (a next-generation repository) and to feed various global environmental models. Key Points: (1) global spatio-temporal regression-kriging daily temperature interpolation; (2) fitting of global spatio-temporal models for the mean, maximum, and minimum temperatures; (3) time series of MODIS 8 day images as explanatory variables in the regression part.",
publisher = "Wiley-Blackwell",
journal = "Journal of Geophysical Research-Atmospheres",
title = "Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution",
pages = "2294--2313",
number = "5",
volume = "119",
doi = "10.1002/2013JD020803"
}
Kilibarda, M., Hengl, T., Heuvelink, G. B. M., Graeler, B., Pebesma, E., Tadić-Percec, M., & Bajat, B. (2014). Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution. Journal of Geophysical Research-Atmospheres, 119(5), 2294-2313.
https://doi.org/10.1002/2013JD020803
Kilibarda M, Hengl T, Heuvelink GBM, Graeler B, Pebesma E, Tadić-Percec M, Bajat B. Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution. Journal of Geophysical Research-Atmospheres. 2014;119(5):2294-2313.
doi:10.1002/2013JD020803
Kilibarda, Milan, Tomislav Hengl, Gerard B. M. Heuvelink, Benedikt Graeler, Edzer Pebesma, Melita Tadić-Percec, and Branislav Bajat. "Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution." Journal of Geophysical Research-Atmospheres 119, no. 5 (2014): 2294-2313.
https://doi.org/10.1002/2013JD020803.
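
The abstract states that leave-one-out predictions were aggregated to 500 x 500 km blocks so that clusters of stations do not dominate the accuracy figure, but it does not spell out the aggregation rule. One plausible reading, sketched here purely as an assumption, is to compute RMSE per block and then average the block values, so each block counts once rather than once per station. Coordinates are assumed to be already projected to a planar system in kilometres, and all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def block_rmse(xy_km: Sequence[Tuple[float, float]], observed: Sequence[float],
               predicted: Sequence[float], block_km: float = 500.0) -> Dict[Tuple[int, int], float]:
    """RMSE of leave-one-out predictions within each square block of side block_km."""
    squared: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for (x, y), o, p in zip(xy_km, observed, predicted):
        key = (math.floor(x / block_km), math.floor(y / block_km))  # block index
        squared[key].append((o - p) ** 2)
    return {key: math.sqrt(sum(v) / len(v)) for key, v in squared.items()}


def block_averaged_rmse(xy_km, observed, predicted, block_km: float = 500.0) -> float:
    """Mean of per-block RMSEs: every block carries equal weight."""
    per_block = block_rmse(xy_km, observed, predicted, block_km)
    return sum(per_block.values()) / len(per_block)


def pooled_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Ordinary RMSE over all stations, for comparison."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

For example, three clustered stations each off by 2 degrees in one block and a single exact station in a neighbouring block give a pooled RMSE of about 1.73 but a block-averaged RMSE of 1.0, showing how the cluster's weight is reduced.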