Abstract

A variant of the k-nearest neighbor (k-NN) nonparametric algorithm was previously applied successfully to estimate soil water retention. In this study, we tested the sensitivity of that k-NN variant to different data and algorithm options, such as (i) estimations made for soils with differing distributions of properties; (ii) the use of different sample weighting methods; (iii) the number of ensemble members developed; (iv) data density in the reference data set; (v) the presence of outliers in the reference data set; (vi) unequal weighting of input attributes; and (vii) the addition of locally specific data to the reference data set. We used a hierarchical set of input attributes and data set sizes to develop ensembles of predictions using multiple randomized subset selections. The k-NN technique performed comparably to neural network models developed on the same data. Using more than 50 ensemble members did not improve the results further. The k-NN technique showed little sensitivity to the choice of sample weighting method and to suboptimal weighting of input attributes. Differences in data density in parts of the reference data set did not substantially affect estimation errors. Estimations for locally specific data improved substantially when some local samples were included in the reference data set, while estimations for other samples remained almost unaffected. The k-NN technique shows a large degree of stability and insensitivity to different settings and options, can readily incorporate new data without the need to redevelop equations, and is an effective alternative to other techniques for estimating soil water retention.
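
The ensemble idea summarized above (k-NN estimates averaged over multiple randomized subsets of a reference data set) can be illustrated with a minimal sketch. This is not the authors' implementation: the inverse-distance sample weighting, the Euclidean distance, the subset fraction, and the names `knn_estimate` and `ensemble_estimate` are all assumptions chosen for illustration, and the toy reference values are made up.

```python
import math
import random


def knn_estimate(query, reference, k=3):
    """Distance-weighted k-NN estimate for a single query sample.

    reference is a list of (inputs, target) pairs, where inputs is a
    tuple of floats. Inverse-distance weighting is an assumption here;
    the abstract does not say which weighting schemes were compared.
    """
    # Keep the k reference samples closest to the query (Euclidean distance).
    nearest = sorted(reference, key=lambda s: math.dist(query, s[0]))[:k]
    # Closer neighbors get larger weights; the small constant avoids
    # division by zero when a reference sample coincides with the query.
    weights = [1.0 / (math.dist(query, x) + 1e-9) for x, _ in nearest]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


def ensemble_estimate(query, reference, n_members=50, subset_frac=0.8,
                      k=3, seed=0):
    """Average k-NN estimates over randomly drawn reference subsets."""
    rng = random.Random(seed)
    # Subset size is clamped so it is at least k and at most the full set.
    size = min(len(reference), max(k, int(subset_frac * len(reference))))
    members = [
        knn_estimate(query, rng.sample(reference, size), k)
        for _ in range(n_members)
    ]
    return sum(members) / len(members)


# Hypothetical toy data: (sand fraction, clay fraction) -> water content.
toy_reference = [
    ((0.70, 0.10), 0.12),
    ((0.60, 0.15), 0.16),
    ((0.40, 0.25), 0.24),
    ((0.30, 0.35), 0.30),
    ((0.20, 0.45), 0.35),
    ((0.10, 0.55), 0.40),
]
toy_estimate = ensemble_estimate((0.35, 0.30), toy_reference)
```

Because every ensemble member is a weighted average of reference targets, the estimate always falls within the range of targets in the reference set, and adding new reference samples requires no refitting, only appending them to the list.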
