SoilGrids250m: Global gridded soil information based on machine learning

2017
Authors
Hengl, Tomislav
de Jesus, Jorge Mendes
Heuvelink, Gerard B. M.
Gonzalez, Maria Ruiperez
Kilibarda, Milan
Blagotić, Aleksandar
Shangguan, Wei
Wright, Marvin N.
Geng, Xiaoyuan
Bauer-Marschallinger, Bernhard
Guevara, Mario Antonio
Vargas, Rodrigo
MacMillan, Robert A.
Batjes, Niels H.
Leenaars, Johan G. B.
Ribeiro, Eloi
Wheeler, Ichsani
Mantel, Stephan
Kempen, Bas
Article (Published version)
Abstract
This paper describes the technical development and accuracy assessment of the most recent and improved version of the SoilGrids system at 250 m resolution (June 2016 update). SoilGrids provides global predictions for standard numeric soil properties (organic carbon, bulk density, Cation Exchange Capacity (CEC), pH, soil texture fractions and coarse fragments) at seven standard depths (0, 5, 15, 30, 60, 100 and 200 cm), in addition to predictions of depth to bedrock and distribution of soil classes based on the World Reference Base (WRB) and USDA classification systems (ca. 280 raster layers in total). Predictions were based on ca. 150,000 soil profiles used for training and a stack of 158 remote sensing-based soil covariates (primarily derived from MODIS land products, SRTM DEM derivatives, climatic images and global landform and lithology maps), which were used to fit an ensemble of machine learning methods (random forest and gradient boosting and/or multinomial logistic regression), as implemented in the R packages ranger, xgboost, nnet and caret. The results of 10-fold cross-validation show that the ensemble models explain between 56% (coarse fragments) and 83% (pH) of variation, with an overall average of 61%. Improvements in the relative accuracy, considering the amount of variation explained, in comparison to the previous version of SoilGrids at 1 km spatial resolution, range from 60 to 230%. Improvements can be attributed to: (1) the use of machine learning instead of linear regression, (2) considerable investments in preparing finer resolution covariate layers and (3) the insertion of additional soil profiles. Further development of SoilGrids could include refinement of methods to incorporate input uncertainties and derivation of posterior probability distributions (per pixel), and further automation of spatial modeling so that soil maps can be generated for potentially hundreds of soil variables.
Another area of future research is the development of methods for multiscale merging of SoilGrids predictions with local and/or national gridded soil products (e.g. up to 50 m spatial resolution) so that increasingly more accurate, complete and consistent global soil information can be produced. SoilGrids are available under the Open Data Base License.
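The accuracy figures in the abstract come from 10-fold cross-validation of ensemble models. As a rough illustration only, the Python sketch below runs 10-fold cross-validation on a simple averaged ensemble and reports the amount of variation explained as 1 - SSE/SST. Everything in it is an assumption made for illustration: the synthetic one-covariate data, the two toy learners (a least-squares line and a nearest-neighbour lookup standing in for random forest and gradient boosting), and all function names. It is not the authors' R workflow with ranger, xgboost, nnet and caret.

```python
import random
from statistics import mean


def variation_explained(observed, predicted):
    """Amount of variation explained: 1 - SSE/SST."""
    m = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - m) ** 2 for o in observed)
    return 1.0 - sse / sst


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Toy learner 1 (hypothetical stand-in): least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return lambda q: my + slope * (q - mx)


def fit_nearest(xs, ys):
    """Toy learner 2 (hypothetical stand-in): value of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda q: min(pairs, key=lambda p: abs(p[0] - q))[1]


def fit_ensemble(xs, ys):
    """Ensemble prediction = unweighted average of the member predictions."""
    members = [fit_line(xs, ys), fit_nearest(xs, ys)]
    return lambda q: mean(m(q) for m in members)


def cross_validate(x, y, fit, k=10, seed=0):
    """Return out-of-fold predictions: each point is predicted by a model
    that was fitted without that point's fold."""
    preds = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = model(x[i])
    return preds


# Synthetic data (assumption): one covariate, linear signal plus Gaussian noise.
rng = random.Random(42)
x = [i / 10 for i in range(100)]
y = [2.0 * v + rng.gauss(0.0, 0.5) for v in x]

oof = cross_validate(x, y, fit_ensemble, k=10)
r2 = variation_explained(y, oof)
print(f"variation explained (10-fold CV): {r2:.2f}")
```

Scoring only out-of-fold predictions is what keeps the reported figure an estimate of accuracy at unsampled locations rather than a measure of fit to the training profiles.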
Source:
PLOS One, 2017, 12, 2
Publisher:
- Public Library of Science
Funding / projects:
- Dutch government
- GILAB DOO
DOI: 10.1371/journal.pone.0169748
ISSN: 1932-6203
PubMed: 28207752
WoS: 000394424500005
Scopus: 2-s2.0-85012965320
Institution/Community
GraFar
URL: https://grafar.grf.bg.ac.rs/handle/123456789/885
Hengl, T., de Jesus, J. M., Heuvelink, G. B. M., Gonzalez, M. R., Kilibarda, M., Blagotić, A., Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara, M. A., Vargas, R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G. B., Ribeiro, E., Wheeler, I., Mantel, S., & Kempen, B. (2017). SoilGrids250m: Global gridded soil information based on machine learning. PLOS One, 12(2). Public Library of Science. https://doi.org/10.1371/journal.pone.0169748