Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery

Cánovas García, Fulgencio; Alonso Sarria, Francisco; Gomariz Castillo, Francisco; Oñate Valdivieso, Fernando

Please use this identifier to cite or link to this item: https://doi.org/10.1016/j.cageo.2017.02.012

Full metadata record

DC Field	Value	Language
dc.contributor.author	Cánovas García, Fulgencio	-
dc.contributor.author	Alonso Sarria, Francisco	-
dc.contributor.author	Gomariz Castillo, Francisco	-
dc.contributor.author	Oñate Valdivieso, Fernando	-
dc.date.accessioned	2024-01-30T08:55:33Z	-
dc.date.available	2024-01-30T08:55:33Z	-
dc.date.issued	2017	-
dc.identifier.citation	Computers & Geosciences, 103. 2017	es
dc.identifier.uri	http://hdl.handle.net/10201/138091	-
dc.description	©2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/	es
dc.description.abstract	Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimate of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that this estimate is unbiased and may be used instead of validation based on an external data set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, within a training patch, pixels or objects are not statistically independent of each other; however, bootstrapping splits them into in-bag and out-of-bag sets as if they were independent. We believe that putting whole patches, rather than individual pixels or objects, in one set or the other would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm that splits training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external data set, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel-based and object-based); in all three cases, the modification we propose produces a less biased accuracy estimate.	es
dc.format	application/pdf	es
dc.format.extent	28	es
dc.language	eng	es
dc.publisher	Pergamon-Elsevier Science Ltd	es
dc.relation	No funding external to the University	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Classification	es
dc.subject	Random Forest	es
dc.subject	Object-based image analysis	es
dc.subject	Bagging	es
dc.subject	Statistical independence	es
dc.subject.other	CDU::9 - Geography and history	es
dc.title	Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery	es
dc.type	info:eu-repo/semantics/article	es
dc.relation.publisherversion	https://www.sciencedirect.com/science/article/pii/S0098300416303909	es
dc.identifier.doi	https://doi.org/10.1016/j.cageo.2017.02.012	-
dc.contributor.department	Departamento de Geografía	-
Appears in collections:	Articles
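The core idea of the abstract, bootstrapping whole training patches rather than individual pixels or objects so that no patch ends up with pixels in both the in-bag and out-of-bag sets, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pixel_bootstrap`, `patch_bootstrap`, `patches_split`) and the toy data of 10 patches with 20 pixels each are hypothetical, and drawing patches with replacement is an assumption about how the patch-level split is made. The sketch only builds the in-bag/out-of-bag index sets for one tree; it does not train a classifier.

```python
import random
from collections import defaultdict


def pixel_bootstrap(n_samples, rng):
    """Standard bagging: draw n_samples pixel indices with replacement."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    # Pixels never drawn form the out-of-bag (OOB) set.
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def patch_bootstrap(patch_ids, rng):
    """Hypothetical patch-level bagging: resample whole patches with replacement."""
    members = defaultdict(list)  # patch id -> indices of its pixels
    for idx, pid in enumerate(patch_ids):
        members[pid].append(idx)
    patches = sorted(members)
    drawn = [rng.choice(patches) for _ in range(len(patches))]
    # Every pixel of each drawn patch goes in-bag (once per draw).
    in_bag = [idx for pid in drawn for idx in members[pid]]
    drawn_set = set(drawn)
    # Only patches that were never drawn contribute OOB pixels.
    oob = [idx for pid in patches if pid not in drawn_set
           for idx in members[pid]]
    return in_bag, oob


def patches_split(patch_ids, in_bag, oob):
    """Patch ids that have pixels on both sides of the in-bag/OOB split."""
    in_patches = {patch_ids[i] for i in in_bag}
    oob_patches = {patch_ids[i] for i in oob}
    return in_patches & oob_patches


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data (hypothetical): 10 training patches of 20 pixels each.
    patch_ids = [p for p in range(10) for _ in range(20)]
    ib, oob = pixel_bootstrap(len(patch_ids), rng)
    print("pixel bootstrap, split patches:", len(patches_split(patch_ids, ib, oob)))
    ib, oob = patch_bootstrap(patch_ids, rng)
    print("patch bootstrap, split patches:", len(patches_split(patch_ids, ib, oob)))
```

With pixel-level bagging nearly every patch is split, so OOB pixels are validated against a tree trained on their own (correlated) neighbours, which is the source of the optimistic OOB accuracy described in the abstract. With patch-level bagging the number of split patches is zero by construction. Note that when patches differ in size, the in-bag sample size varies from tree to tree under this scheme.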

Files in this item:

File	Description	Size	Format
2017_CanovasGarcia_etal_AcceptedVersion.pdf	Accepted version	8.17 MB	Adobe PDF

This item is licensed under a Creative Commons license.