Browsing by Subject "Random forest"
- Publication (Open Access): Analysis of the hyperparameter optimisation of four machine learning satellite imagery classification methods (Springer, 2024-04-05). Alonso Sarría, Francisco; Valdivieso Ros, Carmen; Gomariz Castillo, Francisco. Geografía.
  The classification of land use and land cover (LULC) from remotely sensed imagery in semi-arid Mediterranean areas is a challenging task due to the fragmentation of the landscape and the diversity of spatial patterns. Recently, the use of deep learning (DL) for image analysis has increased compared to commonly used machine learning (ML) methods. This paper compares the performance of four algorithms, Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Convolutional Network (CNN), using multi-source data and applying an exhaustive optimisation of the hyperparameters. The usual approach when optimising a LULC classification model is to keep the best model in terms of accuracy without analysing the rest of the results. In this study, we have analysed those results and found noteworthy patterns in a space defined by the mean and standard deviation of the validation accuracy estimated in a 10-fold cross validation (CV). The point distributions in this space do not appear to be completely random, but show clusters of points that help identify hyperparameter values that tend to increase the mean accuracy and decrease its standard deviation. RF is not the most accurate model, but it is the least sensitive to changes in hyperparameters. Neural networks tend to increase commission and omission errors for the less represented classes because their optimisation leads the model to learn the most frequent classes better. On the other hand, RF and MLP prediction layers are the most accurate from a general qualitative point of view.
- Publication (Open Access): Calibration and spatial modelling of daily ET0 in semiarid areas using Hargreaves equation (Springer Heidelberg, 2018-09-01). Gomariz Castillo, Francisco; Alonso Sarria, Francisco; Cabezas Calvo-Rubio, Francisco. Geografía.
  Evapotranspiration is difficult to measure and, when measured, its spatial variability is not usually taken into account. The recommended method to estimate evapotranspiration, Penman-Monteith FAO, requires variables not available in most weather stations. Simplified but less accurate methods, such as the Hargreaves equation, are normally used. Several approaches have been proposed to improve the accuracy of the Hargreaves equation. In this work, 14 calibrations of the Hargreaves equation are compared. Three goodness-of-fit statistics were used to select the optimal calibration in terms of simplicity and accuracy. The best option was an annual linear regression. Its parameters were interpolated using regression-kriging combining Random Forest and Ordinary Kriging. Twelve easy-to-obtain ancillary variables were used as predictors. The same approach was used to interpolate Hargreaves and Penman-Monteith FAO ET0 on a daily basis; the Hargreaves ET0 layers and the parameter layers were used to obtain calibrated ET0 estimations. To compare the spatial patterns of the three estimations, the daily layers were integrated into annual layers. The results of the proposed calibration are much more similar to the Penman-Monteith FAO results than those obtained with the Hargreaves equation. The research was conducted in south-east Spain with 79 weather stations with data from 01/01/2003 to 31/12/2014.
- Publication (Open Access): Effect of different atmospheric correction algorithms on Sentinel-2 imagery classification accuracy in a semiarid Mediterranean area (MDPI, 2021-05-01). Valdivieso Ros, Carmen; Alonso Sarria, Francisco; Gomariz Castillo, Francisco. Geografía.
  Multi-temporal imagery classification using spectral information and indices with random forest improves accuracy in land use and cover classification in semiarid Mediterranean areas, where the high fragmentation of the landscape caused by multiple factors complicates the task. Since the data come from different dates, atmospheric correction is needed to retrieve surface reflectivity values. The Sen2Cor, MAJA and ACOLITE algorithms have performed well in these areas in different comparative studies, and DOS is a basic method that is widely used. The aim of this study was to test the feasibility of applying these algorithms to the data set to improve classification accuracy and the performance in properly labelling different classes. Additionally, we tried to improve accuracy and separability by mixing predictors from different algorithms. The results showed that, using a single algorithm, the general accuracy and kappa index from ACOLITE were the highest, 0.80 and 0.76, but the separability between problematic classes was slightly improved by using MAJA. No combination of the different algorithms tested increased the classification accuracy values, although such combinations may help with separability between some pairs of classes.
- Publication (Open Access): Effect of the Synergetic Use of Sentinel-1, Sentinel-2, LiDAR and Derived Data in Land Cover Classification of a Semiarid Mediterranean Area Using Machine Learning Algorithms (Multidisciplinary Digital Publishing Institute, 2023-01-05). Valdivieso Ros, Carmen; Alonso Sarria, Francisco; Gomariz Castillo, Francisco. Geografía.
  Land cover classification in semiarid areas is a difficult task that has been tackled using different strategies, such as the use of normalized indices, texture metrics, and the combination of images from different dates or different sensors. In this paper we present the results of an experiment using three sensors (Sentinel-1 SAR, Sentinel-2 MSI and LiDAR), four dates and different normalized indices and texture metrics to classify a semiarid area. Three machine learning algorithms were used: Random Forest, Support Vector Machines and Multilayer Perceptron; Maximum Likelihood was used as a baseline classifier. The synergetic use of all these sources resulted in a significant increase in accuracy, with Random Forest reaching the highest accuracy. However, the large number of features (126) makes feature selection advisable. After using Variance Inflation Factor and Random Forest feature importance, the number of features was reduced to 62. The final overall accuracy obtained was 0.91 ± 0.005 (alpha = 0.05) and the kappa index was 0.898 ± 0.006 (alpha = 0.05). Most of the observed confusions are easily explained and do not represent a significant difference in agronomic terms.
- Publication (Open Access): Isolation forests to evaluate class separability and the representativeness of training and validation areas in land cover classification (MDPI, 2019-12-13). Alonso-Sarria, Francisco; Valdivieso Ros, Carmen; Gomariz Castillo, Francisco. Geografía.
  Supervised land cover classification from remote sensing imagery is based on gathering a set of training areas to characterise each of the classes and to train a predictive model that is then used to predict land cover in the rest of the image. This procedure relies mainly on the assumptions of statistical separability of the classes and the representativeness of the training areas. This paper uses isolation forests, a type of random tree ensemble, to analyse both assumptions and to easily correct a lack of representativeness by digitising new training areas where needed, in order to improve the classification of a Landsat-8 set of images with Random Forest. The results show that the improved set of training areas after the isolation forest analysis is more representative of the whole image and increases classification accuracy. In addition, the distribution of isolation values can be useful to estimate class separability. A class separability parameter that summarises such distributions is proposed. This parameter is more strongly correlated with omission and commission errors than other separability measures such as the Jeffries–Matusita distance.
- Publication (Open Access): Pinna nobilis in the Mar Menor coastal lagoon: a story of colonization and uncertainty (Inter-Research, 2020-10-15). Gimenez-Casalduero, Francisca; Gomariz Castillo, Francisco; Alonso Sarria, Francisco; Cortés, Emilio; Izquierdo-Muñoz, Andrés; Ramos Esplá, Alfonso A. Geografía.
  Populations of the Mediterranean fan mussel Pinna nobilis have progressively decreased over the last decades as a result of anthropogenic activities. The rate of decline has strongly increased since 2016, when a mass mortality event triggered by the parasite Haplosporidium pinnae occurred, and evidence exists that Mycobacterium species may also have played a major role in the event. Indeed, the epidemic has spread throughout the Mediterranean, although coastal lagoons seem to offer a degree of ‘resistance’ against the parasite. In the early 1980s, P. nobilis appeared in the Mar Menor lagoon and rapidly became an important component of the benthos. However, colonization of the lagoon by the fan mussel was cut short in 2016 when a massive mortality event occurred, possibly as a consequence of the environmental collapse that occurred in the lagoon, parallel to the mortality that the species suffered in the Mediterranean that same year. In this study, we estimated the spatial distribution of P. nobilis in the Mar Menor for 3 periods: 2003-2004, 2013 and 2016. The first 2 periods use published data, and the last period uses data collected in a new campaign. The probability of occurrence for the 3 periods was estimated using random forest and random forest regression-kriging models. The main environmental variables that determined the dispersion and colonization of the bivalve in the lagoon before 2016 are also identified.
- Publication (Open Access): Price and spatial distribution of office rental in Madrid: a decision tree analysis (Pontificia Universidad Católica del Perú, 2021). Camacho, Maximo; Ramallo, Salvador; Ruiz, Manuel. Métodos Cuantitativos para la Economía y la Empresa.
  In this paper, we assess the drivers of office rental prices in the municipality of Madrid with a sample of 4,721 offices in March 2020. The estimation was performed using the decision tree approach, which was built with a random forest algorithm. This technique allows us to capture the strong nonlinear component in the relation between price and its drivers, mainly geospatial location. Through a stratified analysis, we find that the willingness to pay high rent in the center of Madrid is a feature of particular relevance to medium-sized offices. For different reasons, we also find some office clusters located far from the city center with high rent for both large and small offices.
- Publication (Open Access): Subsidies for investing in energy efficiency measures: Applying a random forest model for unbalanced samples (Elsevier Ltd., 2024-04-01). Álvarez Díez, Susana; Baixauli Soler, Juan Samuel; Lozano Reina, Gabriel; Rodríguez Linares Rey, Diego. Organización de Empresas y Finanzas; Métodos Cuantitativos para la Economía y la Empresa.
  Investing in energy efficiency measures is a major challenge for SMEs, both for environmental and economic reasons. However, certain barriers often make it difficult to invest in such measures. Although public financial support helps to overcome economic barriers, public bodies face the challenge of identifying which SMEs display the greatest potential to invest in energy efficiency measures. By applying a random forest technique and by using sampling balancing techniques, this paper identifies the profile of industrial SMEs that might be potential beneficiaries of public aid, thereby helping public institutions to target their calls and direct their efforts towards this group of SMEs. Specifically, liquidity and indebtedness are found to be the most useful predictors for SMEs in the industrial sector. The results are robust and reveal that applying a random forest approach for unbalanced samples offers greater predictive capacity and statistical power than applying traditional estimation techniques. By identifying potentially benefiting firms, this work helps to boost the effectiveness of public subsidies and to improve the channeling of public funds, which ultimately favors investment in energy efficiency.