Dataset title: Dataset used for the study entitled: Goal Patterns and Situational Dynamics in Professional Futsal: A Multivariate Analysis with a Predictive Approach

Author information:
Name: DIEGO HERNÁN VILLAREJO GARCÍA
Institution: Universidad de Murcia
Email: dvillarejo@um.es
ORCID: https://orcid.org/0000-0001-6149-9253
Date of data collection (single date or date range): [2024-04-21]
Date of deposit: [2025-07-07]
Language: Spanish/English

METHODOLOGICAL INFORMATION

Procedure: The data were recorded, observed, coded, and analyzed following match analysis methodology (Sarmento et al., 2015). The goals were recorded by the Royal Spanish Football Federation and uploaded to its official channel, publicly available at http://www.youtube.com/c/federaciónespañolafutbol. Only goals that met quality requirements and showed uninterrupted sequences, without commercials or production cuts, were selected. The goals were observed on a 32-inch television screen.

The data were collected by the technical staff (n=3) of a top-tier professional futsal team, all with over 20 years of experience in coaching and performance analysis. The three members of the technical staff analyzed the goals jointly, and any discrepancies in judgment were resolved by consensus. To ensure the reliability of the observations and data recording, the researchers conducted an intra-observer reliability test, which consisted of re-analyzing the goals from one matchday 10 days after the initial analysis. Intra-observer reliability was above 0.98 for all variables under study.

Data collection was carried out using the LongoMatch 8.1 software (2016). The researchers transferred the data to a spreadsheet and subsequently imported them into SPSS version 29.0 for statistical analysis.
Since the study is based on official performance data publicly accessible from elite competitions, individual informed consent from players or teams was not required. Data access and processing were conducted for academic research purposes and adhered to regulations for the use of publicly available sports data. The Ethics Committee of the University of Murcia deemed the study exempt from formal review given the public nature of the data.

Statistical Analysis: All statistical analyses were performed using SPSS version 29.0 (IBM Corporation, 2022). To analyze the relationship between the independent variable (type of goal scored) and the four dependent variables (Match Outcome, Score, Ranking, and Stage), a descriptive analysis was conducted, calculating frequencies and percentages. The chi-square test and Cramér's V coefficient were then used to evaluate the bivariate association between variables. For all statistical tests, a significance level of p < 0.05 was applied. Values of Cramér's V were interpreted to assess the strength of association and were classified as weak (< 0.1), moderate (0.1 to < 0.3), or strong (≥ 0.3).

A correspondence analysis was then performed to graphically explore the relationships between the categories of the independent variable and the categories of the dependent variables. As a first step, the same frequency table used in the preceding analysis was employed, with the data normalized so that each cell became a proportion relative to the grand total of the table, $p_{ij} = f_{ij}/N$, where $f_{ij}$ is the frequency in the cell and $N$ is the total number of observations. Next, the total inertia was calculated as a measure of the deviation between the observed and expected frequencies under the hypothesis of independence, using the following formula:

$$\text{Total inertia} = \sum_{i}\sum_{j} \frac{(p_{ij} - e_{ij})^{2}}{e_{ij}}$$

where $e_{ij} = f_{i\cdot}\, f_{\cdot j}$ is the expected relative frequency under the assumption of independence, with $f_{i\cdot}$ and $f_{\cdot j}$ denoting the row and column totals expressed as proportions of $N$.
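The bivariate statistics and the total inertia described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS procedure used for the study: the 3 x 3 table of counts is hypothetical (it is not taken from Data_general.xlsx), and the function names `chi_square_summary` and `classify_v` are introduced here for illustration. It relies on the standard identity that total inertia equals the chi-square statistic divided by N.

```python
import math

def chi_square_summary(table):
    """Return (chi-square, Cramér's V, total inertia) for a table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    cramers_v = math.sqrt(chi2 / (n * k))
    inertia = chi2 / n  # total inertia of the correspondence analysis
    return chi2, cramers_v, inertia

def classify_v(v):
    """Apply the thresholds stated in this document to Cramér's V."""
    if v < 0.1:
        return "weak"
    if v < 0.3:
        return "moderate"
    return "strong"

# Hypothetical counts: rows = goal types, columns = match outcome categories.
table = [
    [10, 20, 30],
    [20, 20, 20],
    [30, 20, 10],
]

chi2, v, inertia = chi_square_summary(table)
print(f"chi-square = {chi2:.3f}, Cramér's V = {v:.3f} ({classify_v(v)}), "
      f"total inertia = {inertia:.3f}")
```

The p-value is omitted to keep the sketch free of external dependencies; it would be obtained from the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom.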
Subsequently, the total inertia was decomposed using a Principal Component Analysis (PCA) with the aim of identifying the underlying dimensions that explain variability in the data. The PCA decomposed the profile matrix, adjusted according to the relative contributions of the rows and columns, into a set of orthogonal axes (dimensions) that maximize the dispersion of profiles in multidimensional space. Each axis is associated with an eigenvalue, whose share of the total inertia gives the proportion explained by that dimension. This made it possible to calculate the coordinates of each category based on its relative contribution to inertia, which were used for graphical representation. The first two axes were selected for plotting, as they accounted for the largest percentage of the total variability, and were drawn in a two-dimensional graph in which each point represented a category. To assess the quality of the model, the percentage of inertia explained was verified, considering a value greater than 50% as a good representation of the data.

Finally, four multinomial logistic regression models were used to analyze the effect of the independent variable on each of the dependent variables. The dependent variable in each model was 𝑌, coded 0, 1, or 2. For the variable “match outcome,” value 0 corresponded to the loser category, value 1 to draw, and value 2 to winner. For the variable “score,” value 0 corresponded to the highly unbalanced category, value 1 to unbalanced, and value 2 to balanced. For the variable “ranking,” value 0 corresponded to teams ranked between 9th and 12th, value 1 to teams ranked between 5th and 8th, and value 2 to teams ranked between 1st and 4th. For the variable “stage,” value 0 corresponded to the first phase, value 1 to the second phase, and value 2 to playoffs. For all four dependent variables, the reference category was the third category (value 2).
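The role of the reference category in these models can be illustrated with a short Python sketch. It is not the SPSS procedure used for the study: the coefficients are invented for illustration, the single predictor is a hypothetical dummy for one goal type, and the helper names `linear_predictor` and `category_probabilities` are introduced here. It assumes that the third category of “match outcome” (winner) is the reference, so each non-reference category has its own log-odds against it.

```python
import math

def linear_predictor(intercept, coefs, x):
    """Log-odds of one category against the reference category."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def category_probabilities(logits_vs_reference):
    """Turn log-odds against the reference into probabilities.

    Returns the probabilities of the non-reference categories, in the
    order given, followed by the probability of the reference category.
    """
    exp_terms = [math.exp(v) for v in logits_vs_reference]
    denom = 1.0 + sum(exp_terms)  # the reference contributes exp(0) = 1
    return [e / denom for e in exp_terms] + [1.0 / denom]

# Hypothetical coefficients (intercept, [slope]) for the two non-reference
# categories; "winner" is the assumed reference category.
coefficients = {
    "loser": (0.0, [math.log(2.0)]),
    "draw": (0.0, [0.0]),
}

x = [1.0]  # hypothetical dummy: the goal belongs to the goal type of interest
logits = [linear_predictor(b0, b, x) for b0, b in coefficients.values()]
probs = category_probabilities(logits)

# An odds ratio is the exponential of a coefficient.
odds_ratio = math.exp(coefficients["loser"][1][0])

for name, p in zip(list(coefficients) + ["winner (reference)"], probs):
    print(f"P({name}) = {p:.3f}")
print(f"odds ratio, loser vs winner = {odds_ratio:.3f}")
```

Because the reference category has a fixed log-odds of zero, the three probabilities always sum to one, and each coefficient is read as a change in log-odds relative to the reference.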
The multinomial logistic regression model with three categories can be expressed as follows:

$$\ln\left(\frac{P(Y=j)}{P(Y=3)}\right) = \beta_{j,0} + \beta_{j,1}X_{1} + \beta_{j,2}X_{2} + \dots + \beta_{j,p}X_{p}, \qquad j = 1, 2$$

where:
P(Y=j) is the probability that the observation belongs to category j;
P(Y=3) is the probability of the reference category;
X1, X2, …, Xp are the independent variables;
βj,0 is the intercept for category j;
βj,1, βj,2, …, βj,p are the coefficients of the variables X1, X2, …, Xp for category j;
j indexes the non-reference categories (j = 1, 2), the third category (j = 3) being the reference. Categories are indexed here from 1 to 3, corresponding to the coded values 0, 1, and 2.

This nonlinear regression model estimates the regression coefficients, each of which represents the estimated change in the log-odds for a one-unit change in the respective explanatory variable, assuming that all other explanatory variables remain constant (Tabachnick et al., 2013). Odds ratios (OR) and their respective 95% confidence intervals (CI) were also calculated.

FILES

File name(s):
Statiscal_Analysis_Chi_correspondence.spv
Statistical_Analysis_Logistic_Regression.spv
Data_general.xlsx
File formats: .spv / .xlsx

KEYWORDS: Correspondence Analysis, Futsal, Goal Types, Multinomial Logistic Regression, Situational Variables

FUNDING INFORMATION AND GRANT IDENTIFIERS

No funding external to the University

RELATED PUBLICATIONS

Related publication:
Related dataset:

LICENSES AND PRIVACY

Licenses: Creative Commons
Privacy: