Multivariate feature ranking with high-dimensional data for classification tasks

Jiménez Barrionuevo, Fernando; Sanchez Carpena, G.; Palma Méndez, José Tomás; Miralles Pechuan, L.; Botia Blaya, J. A.


Files

Multivaria..s.pdf (1.24 MB)

Date

2022-06-08

Authors

Jiménez Barrionuevo, Fernando; Sanchez Carpena, G.; Palma Méndez, José Tomás; Miralles Pechuan, L.; Botia Blaya, J. A.

Department

Ingeniería de la Información y las Comunicaciones

DOI

10.1109/ACCESS.2022.3180773

Type

info:eu-repo/semantics/article

Description

© 2022. This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). This document is the Accepted Manuscript version of a Published Work that appeared in final form in IEEE Access. To access the final edited and published work, see DOI 10.1109/ACCESS.2022.3180773.

Abstract

In many machine learning classification problems, datasets are usually of high dimensionality and therefore require efficient and effective methods for identifying the relative importance of their attributes, eliminating the redundant and irrelevant ones. Due to the huge size of the search space of the possible solutions, the attribute subset evaluation feature selection methods are not very suitable, so in these scenarios feature ranking methods are used. Most of the feature ranking methods described in the literature are univariate methods, which do not detect interactions between factors. In this paper, we propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency, which have been applied for cancer gene expression and genotype-tissue expression classification tasks using public datasets. We statistically proved that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance, as well as other feature selection methods for attribute subset evaluation based on correlation and consistency with the multi-objective evolutionary search strategy, and with the embedded feature selection methods C4.5 and LASSO. The proposed methods have been implemented on the WEKA platform for public use, making all the results reported in this paper repeatable and replicable.

Subject

Artificial intelligence, Feature Selection, Machine learning, rankers

Citation

IEEE Access
https://ieeeaccess.ieee.org/about-ieee-access/learn-more-about-ieee-access/

URI

http://hdl.handle.net/10201/121146

Collections

Articles

This item is subject to a Creative Commons license. http://creativecommons.org/licenses/by-nc-nd/4.0/
