Publications

2023
Cintra, RF, Valk M, Filho DM.  2023.  A model-free-based control chart for batch process using U-statistics. Journal of Process Control.
Bello, DZ, Valk M, Cybis GB.  2023.  Towards U-statistics clustering inference for multiple groups. Journal of Statistical Computation and Simulation. :1-19.
2022
Fraga, AZ, Hauschild L, Campos PHRF, Valk M, Bello DZ, Kipper M, Andretta I.  2022.  Genetic selection modulates feeding behavior of group-housed pigs exposed to daily cyclic high ambient temperatures. PLOS ONE.
2021
Valk, M, Cybis GB.  2021.  U-statistical inference for hierarchical clustering. Journal of Computational and Graphical Statistics. 30(1).

Clustering methods are valuable tools for the identification of patterns in high dimensional data with applications in many scientific fields. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. We develop a U-statistics based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These non-parametric methods rely on very few assumptions about the data, and thus can be applied to a wide range of datasets for which the Euclidean distance captures relevant features. Our main result is the development of a hierarchical significance clustering method. In order to do so, we first introduce an extension of a relevant U-statistic and develop its asymptotic theory. Additionally, as a preliminary step, we propose a binary non-nested significance clustering method and show its optimality in terms of expected values. Our approach is tested through multiple simulations and found to have more statistical power than competing alternatives in all scenarios considered. The methods are further showcased in three applications ranging from genetics to image recognition problems.

2018
Cybis, GB, Valk M, Lopes SRC.  2018.  Clustering and classification problems in genetics through U-statistics. Journal of Statistical Computation and Simulation. 88(10):1882-1902.

Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical problems. Here we consider a versatile U-statistics-based approach for non-parametric clustering that allows for an unconventional way of solving these problems. In this paper we propose a statistical test to assess group homogeneity, taking into account multiple testing issues, and a clustering algorithm based on dissimilarities within and between groups that greatly speeds up the homogeneity test. We also propose a test to verify classification significance of a sample in one of two groups. We present Monte Carlo simulations that evaluate size and power of the proposed tests under different scenarios. Finally, the methodology is applied to three different genetic data sets: global human genetic diversity, breast tumour gene expression and Dengue virus serotypes. These applications showcase this statistical framework's ability to answer diverse biological questions in the high dimension low sample size scenario while adapting to the specificities of the different data types.