DigitalUM :: Browsing by Subject "Natural language processing"

Now showing 1 - 14 of 14

Open Access
A study on LIWC categories for opinion mining in Spanish reviews
(SAGE Publications, 2014-08-26) Salas Zárate, María del Pilar; López López, Estanislao; Valencia García, Rafael; Aussenac Gilles, Natalie; Almela, Ángela; Alor Hernández, Giner; Filología Inglesa
With the exponential growth of social media, that is, blogs and social networks, organizations and individual persons are increasingly using the number of reviews of these media for decision-making about a product or service. Opinion mining detects whether the emotions of an opinion expressed by a user on Web platforms in natural language are positive or negative. This paper presents extensive experiments to study the effectiveness of the classification of Spanish opinions in five categories: highly positive, highly negative, positive, negative and neutral, using the combination of the psychological and linguistic features of LIWC (Linguistic Inquiry and Word Count). LIWC is a text analysis software that enables the extraction of different psychological and linguistic features from natural language text. For this study, two corpora have been used, one about movies and one about technological products. Furthermore, we conducted a comparative assessment of the performance of various classification techniques, J48, SMO and BayesNet, using precision, recall and F-measure metrics. The findings revealed that the positive and negative categories provide better results than the other categories. Finally, experiments on both corpora indicated that SMO produces better results than BayesNet and J48 algorithms, obtaining an F-measure of 90.4 and 87.2% in each domain.
Open Access
Compilation and evaluation of the Spanish SatiCorpus 2021 for satire identification using linguistic features and transformers
(Springer, 2021-12-17) García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Open Access
Detección automática de errores lingüísticos en textos clínicos: análisis de patrones de error en varias especialidades médicas
(Tremédica, 2021) López Hernández, Jésica; Almela, Ángela; Lengua Española, Lingüistica General; Filología Inglesa; Facultades de la UMU::Facultad de Letras
The aim of this work is to provide the first quantitative analysis of the types of errors found in a corpus of clinical reports in Spanish. The clinical reports analyzed belong to the specialties of emergency medicine, intensive care (ICU), psychiatry and general surgery. The errors were studied according to criteria such as edit distance, error type and the presence of multiple errors within a word. To this end, an error identification and classification tool was developed, statistical techniques were applied, and the results were compared with previous work on error patterns. The results indicate that the most frequent error type is the omission of the accent mark and that most errors occur at edit distance 1, between pairs of phonetically similar characters and pairs of characters adjacent on the keyboard.
Open Access
Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings
(Elsevier, 2020-08-22) García Díaz, José Antonio; Cánovas-García, Mar; Colomo-Palacios, Ricardo; Valencia García, Rafael; Informática y Sistemas; Facultad de Informática
Online social networks allow powerless people to gain enormous amounts of control over particular people's lives and profit from the anonymity or social distance that the Internet provides in order to harass other people. One of the most frequently targeted groups comprise women, as misogyny is, unfortunately, a reality in our society. However, although great efforts have recently been made to identify misogyny, it is still difficult to distinguish as it can sometimes be very subtle and deep, signifying that the use of statistical approaches is not sufficient. Moreover, as Spanish is spoken worldwide, context and cultural differences can complicate this identification. Our contribution to the detection of misogyny in Spanish is two-fold. On the one hand, we apply Sentiment Analysis and Social Computing technologies for detecting misogynous messages in Twitter. On the other, we have compiled the Spanish MisoCorpus-2020, a balanced corpus regarding misogyny in Spanish, and classified it into three subsets concerning (1) violence towards relevant women, (2) messages harassing women in Spanish from Spain and Spanish from Latin America, and (3) general traits related to misogyny. Our proposal combines a classification based on average word embeddings and linguistic features in order to understand which linguistic phenomena principally contribute to the identification of misogyny. We have evaluated our proposal with three machine-learning classifiers, achieving the best accuracy of 85.175%. Finally, the proposed approach is also validated with existing corpora for misogyny and aggressiveness detection such as AMI and HatEval, obtaining good results.
Open Access
Diseño y metodología de un etiquetador semántico-ontológico multilingüe: ESMAS-ES+.
(Universidad de Murcia, Servicio de Publicaciones., 2025) Domínguez Vázquez, María José; Sin departamento asociado
The central aim of the automatic tagger ESMAS-ES+ is the semantic-ontological annotation of texts in Spanish, French, German and Galician. Along with studying the feasibility of a new method of analysis, developing the tagger requires exploring new avenues for the intelligent processing of information and knowledge and, consequently, for the deep understanding of meaning. This publication presents the methodological principles behind its design, as well as an overview of techniques and strategies applicable to the generation of sustainable linguistic, multilingual and technological knowledge, which in turn will contribute to the design of tools that can be extended to different languages. The evolution of ESMAS-ES+ may have an impact on some areas of natural language processing, especially those linked to the understanding and disambiguation of meaning. In this way, it may help improve the readability and comprehension of linguistic data by machines.
Open Access
Evaluating feature combination strategies for hate-speech detection in Spanish using linguistic features and transformers
(Springer, 2023) García Díaz, José Antonio; Jiménez Zafra, Salud María; García Cumbreras, Miguel Ángel; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
The rise of social networks has allowed misogynistic, xenophobic, and homophobic people to spread their hate-speech to intimidate individuals or groups because of their gender, ethnicity or sexual orientation. The consequences of hate-speech are devastating, causing severe depression and even leading people to commit suicide. Hate-speech identification is challenging as the large amount of daily publications makes it impossible to review every comment by hand. Moreover, hate-speech is also spread by hoaxes that require language and context understanding. With the aim of reducing the number of comments that should be reviewed by experts, or even for the development of autonomous systems, the automatic identification of hate-speech has gained academic relevance. However, the reliability of automatic approaches is still limited, specifically in languages other than English, in which some of the state-of-the-art techniques have not been analyzed in detail. In this work, we examine which features are most effective in identifying hate-speech in Spanish and how these features can be combined to develop more accurate systems. In addition, we characterize the language present in each type of hate-speech by means of explainable linguistic features and compare our results with state-of-the-art approaches. Our research indicates that combining linguistic features and transformers by means of knowledge integration outperforms current solutions regarding hate-speech identification in Spanish.
Open Access
Evaluation of transformer models for financial targeted sentiment analysis in Spanish
(PeerJ, 2023-05-09) Pan, Ronghao; García Díaz, José Antonio; García Sánchez, Francisco; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Open Access
Fine grain emotion analysis in Spanish using linguistic features and transformers
(PeerJ, 2024-04-30) Salmerón Ríos, Alejandro; García Díaz, José Antonio; Pan, Ronghao; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
Mental health issues are a global concern, with a particular focus on the rise of depression. Depression affects millions of people worldwide and is a leading cause of suicide, particularly among young people. Recent surveys indicate an increase in cases of depression during the COVID-19 pandemic, which affected approximately 5.4% of the population in Spain in 2020. Social media platforms such as X (formerly Twitter) have become important hubs for health information as more people turn to these platforms to share their struggles and seek emotional support. Researchers have discovered a link between emotions and mental illnesses such as depression. This correlation provides a valuable opportunity for automated analysis of social media data to detect changes in mental health status that might otherwise go unnoticed, thus preventing more serious health consequences. Therefore, this research explores the field of emotion analysis in Spanish towards mental disorders. There are two contributions in this area. On the one hand, the compilation, translation, evaluation and correction of a novel dataset composed of a mixture of other existing datasets in the bibliography. This dataset compares a total of 16 emotions, with an emphasis on negative emotions. On the other hand, the in-depth evaluation of this novel dataset with several state-of-the-art transformers based on encoder-only and encoder-decoder architectures. The analysis comprises monolingual, multilingual and distilled models as well as feature integration techniques. The best results are obtained with the encoder-only MarIA model, with a macro-average F1 score of 60.4771%.
Open Access
Hope speech detection in Spanish. The LGBT case
(Springer, 2023-03-17) García‑Baena, Daniel; García‑Cumbreras, Miguel Ángel; Jiménez‑Zafra, Salud María; García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultad de Informática
In recent years, systems have been developed to monitor online content and remove abusive, offensive or hateful content. Comments in online social media have been analyzed to find and stop the spread of negativity using methods such as hate speech detection, identification of offensive language or detection of abusive language. We define hope speech as the type of speech that is able to relax a hostile environment and that helps, gives suggestions and inspires for good to a number of people when they are in times of illness, stress, loneliness or depression. Detecting it automatically, in order to give greater diffusion to positive comments, can have a very significant effect when it comes to fighting against sexual or racial discrimination or when we intend to foster less bellicose environments. In this article we perform a complete study on hope speech in Spanish, analyzing existing solutions and available resources. In addition, we have generated a quality resource, a new Twitter dataset on LGBT community, and we have conducted some experiments that can serve as a baseline for further research.
Open Access
Methodology for measuring individual affective polarization using sentiment analysis in social networks
(2024-07-22) Martínez España, Raquel; Fernández-Pedauye, Julio; Giner-Pérez de Lucia, José; Rojo Martínez, José Miguel; Bakdid-Albane, Kaoutar; García Escribano, Juan José; Sociología; Facultad de Trabajo Social
Affective polarization has important consequences for societies and institutions. At the institutional level, it hinders agreement among political actors, which damages the stability of the system. At the social level, it increases tensions and conflicts between people, damaging coexistence. Until now, affective polarization has been studied essentially through surveys, which are generally very costly if large and representative samples are to be obtained and in which the answers of the interviewees may not be totally sincere. In this article, we apply sentiment analysis techniques to measure affective polarization without resorting to surveys, simply by monitoring the non-self-reported behavior of individuals in social networks. To do that, a novel methodology and a new indicator of affective polarization have been proposed using data from social networks. The proposed methodology and new indicator have been applied to the real case study of the regional elections in Spain, specifically to the autonomous Region of Murcia. The application of the methodology has been satisfactory, as well as that of the new indicator of affective polarization, providing a cost-effective way of calculating polarization. The results show that all political groups are polarized to a greater or lesser extent. Furthermore, the results indicate that the winning ideology in the elections, i.e., the right, was the one whose supporters behaved differently from the supporters of other ideologies.
Open Access
Psychographic traits identification based on political ideology: an author analysis study on Spanish politicians’ tweets posted in 2020
(Elsevier, 2022-05) García Díaz, José Antonio; Colomo Palacios, Ricardo; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
In general, people are usually more reluctant to follow advice and directions from politicians who do not share their ideology. In extreme cases, people can be heavily biased in favour of a political party at the same time that they are in sharp disagreement with others, which may lead to irrational decision making and can put people’s lives at risk by ignoring certain recommendations from the authorities. Therefore, considering political ideology as a psychographic trait can improve political micro-targeting by helping public authorities and local governments to adopt better communication policies during crises. In this work, we explore the reliability of determining psychographic traits concerning political ideology. Our contribution is twofold. On the one hand, we release the PoliCorpus-2020, a dataset composed of Spanish politicians’ tweets posted in 2020. On the other hand, we conduct two authorship analysis tasks with the aforementioned dataset: an author profiling task to extract demographic and psychographic traits, and an authorship attribution task to determine the author of an anonymous text in the political domain. Both experiments are evaluated with several neural network architectures grounded on explainable linguistic features, statistical features, and state-of-the-art transformers. In addition, we test whether the neural network models can be transferred to detect the political ideology of citizens. Our results indicate that the linguistic features are good indicators for identifying fine-grained political affiliation, that they boost the performance of neural network models when combined with embedding-based features, and that they preserve relevant information when the models are tested with ordinary citizens. Besides, we found that lexical and morphosyntactic features are more effective in author profiling, whereas stylometric features are more effective in authorship attribution.
Open Access
Seeing through deception: a computational approach to deceit detection in written communication
(Association for Computational Linguistics, 2012) Almela, Ángela; Valencia García, Rafael; Cantos Gómez, Pascual; Filología Inglesa; Informática y Sistemas
The present paper addresses the question of the nature of deception language. Specifically, the main aim of this piece of research is the exploration of deceit in Spanish written communication. We have designed an automatic classifier based on Support Vector Machines (SVM) for the identification of deception in an ad hoc opinion corpus. In order to test the effectiveness of the LIWC2001 categories in Spanish, we have drawn a comparison with a Bag-of-Words (BoW) model. The results indicate that the classification of the texts is more successful by means of our initial set of variables than with the latter system. These findings are potentially applicable to areas such as forensic linguistics and opinion mining, where extensive research on languages other than English is needed.
Open Access
Spanish MEACorpus 2023: a multimodal speech–text corpus for emotion analysis in Spanish from natural environments
(Elsevier, 2024-08) Pan, Ronghao; García Díaz, José Antonio; Rodríguez García, Miguel Ángel; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
In human–computer interaction, emotion recognition provides a deeper understanding of the user’s emotions, enabling empathetic and effective responses based on the user’s emotional state. While deep learning models have improved emotion recognition solutions, it is still an active area of research. One important limitation is that most emotion recognition systems use only text as input, ignoring features such as voice intonation. Another limitation is the limited number of datasets available for multimodal emotion recognition. In addition, most published datasets contain emotions that are simulated by professionals and produce limited results in real-world scenarios. In other languages, such as Spanish, hardly any datasets are available. Therefore, our contributions to emotion recognition are as follows. First, we compile and annotate a new corpus for multimodal emotion recognition in Spanish (Spanish MEACorpus 2023), which contains 13.16 h of speech divided into 5129 segments labeled by considering Ekman’s six basic emotions. The dataset is extracted from YouTube videos in natural environments. Second, we explore several deep learning models for emotion recognition using text- and audio-based features. Third, we evaluate different multimodal techniques to build a multimodal recognition system that improves the results of unimodal models, achieving a Macro F1-score of 87.745% using a late fusion approach with a concatenation strategy.
Open Access
Spanish MTLHateCorpus 2023: multi-task learning for hate speech detection to identify speech type, target, target group and intensity
(Elsevier, 2025-08) Pan, Ronghao; García Díaz, José Antonio; Valencia García, Rafael; Informática y Sistemas; Facultades de la UMU::Facultad de Informática
The rise of digital communication has exacerbated the challenge of tackling harmful speech online, particularly hate speech, which dehumanises individuals or groups on the basis of traits such as race, gender or ethnicity. This study highlights the urgent need for fine-grained detection methods that take into account several subtasks of hate speech detection, including its intensity, determining the groups to which hate speech is directed, and whether the target is an individual or a group. Furthermore, there is a gap in comprehensive Spanish language corpora that cover these subtasks of hate speech detection. Therefore, we created a novel corpus entitled Spanish MTLHateCorpus 2023 to facilitate the analysis of hate speech in these subtasks and evaluated the effectiveness of the multi-task learning strategy evaluating mBART and T5, comparing its results with other Large Language Models using Zero-Shot Learning as a lower bound and an ensemble based on the mode of several Fine-Tuning as an upper bound. The results achieved by the Multi-Task Learning strategy demonstrated its potential to increase model versatility, allowing a single model to effectively tackle multiple tasks while achieving competitive results, particularly in target group recognition. However, the ensemble learning slightly outperforms the Multi-Task Learning strategy.
