Publications

2024
Hilgemberg, JO, Andretta I, Mariani AB, Neimaier A, Valk M, Bittarello F, Hilgemberg R, Lehnen CR.  2024.  Decision trees as a tool for selecting sows in commercial herds. Scientia Agricola. 81.
2023
de Oliveira, MJK, Valk M, Melo ADB, Marçal DA, Silva CA, da Valini GAC, Arnaut PR, Gonçalves JPR, Andretta I, Hauschild L.  2023.  Feeding Behavior of Finishing Pigs under Diurnal Cyclic Heat Stress. Animals. 13(5):908.
Cintra, RF, Valk M, Filho DM.  2023.  A model-free-based control chart for batch process using U-statistics. Journal of Process Control.
Bello, DZ, Valk M, Cybis GB.  2023.  Towards U-statistics clustering inference for multiple groups. Journal of Statistical Computation and Simulation. 1-19.
2022
de Oliveira, BN, Valk M, Filho DM.  2022.  Fault detection and diagnosis of batch process dynamics using ARMA-based control charts. Journal of Process Control. 111:46-58.

A wide range of approaches for batch process monitoring can be found in the literature. This kind of process generates a very peculiar data structure, in which successive measurements of many process variables are available for each batch run. Traditional approaches do not take into account the time series nature of the data. The main reason is that time series inference theory is based on variability in the time domain, not on replications of time series, as found in batch process data. This demands some adaptations of the theory in order to accommodate the model coefficient estimates, jointly considering the batch-to-batch variability (batch domain) and the serial correlation within each batch (time domain). To address this issue, this paper proposes a new approach grounded in a group of control charts based on the classical ARMA model for monitoring and diagnosing batch process dynamics. The model coefficients are estimated by ordinary least squares for each historical batch time series, and modified Hotelling and Student's t distributions are derived and used to accommodate those estimates. A group of control charts based on those distributions is proposed for monitoring new batches. Additionally, these charts help with fault diagnosis by identifying the source of disturbances. Through simulated and real data, we show that this approach seems to work well for both purposes.
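The coefficient-based monitoring idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it fits a pure AR(p) model (no MA terms) to each batch by ordinary least squares, then uses a plain Hotelling-type T² and per-coefficient standardized scores built from the sample mean and covariance of the historical estimates. The modified Hotelling and Student's t distributions derived in the paper, and hence the control limits, are not reproduced. All function names and the simulated AR(2) batches are hypothetical.

```python
import numpy as np


def simulate_ar(phi, n, rng, burn=100):
    """Simulate a zero-mean AR(p) series with standard normal innovations."""
    p = len(phi)
    x = np.zeros(n + burn)
    e = rng.standard_normal(n + burn)
    for t in range(p, n + burn):
        x[t] = sum(phi[k] * x[t - k - 1] for k in range(p)) + e[t]
    return x[burn:]  # drop the burn-in so start-up effects fade


def fit_ar_ols(x, p):
    """OLS estimate of the AR(p) coefficients of one batch (series mean-centred)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Column k holds lag k+1, i.e. x[t-k-1] for t = p, ..., n-1.
    X = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef


def reference_from_history(batches, p):
    """Mean vector and covariance of the coefficient estimates over historical batches."""
    B = np.array([fit_ar_ols(b, p) for b in batches])
    return B.mean(axis=0), np.atleast_2d(np.cov(B, rowvar=False))


def t2_statistic(coef, mean, cov):
    """Hotelling-type T^2 distance of a new batch's coefficients (joint chart)."""
    d = coef - mean
    return float(d @ np.linalg.solve(cov, d))


def coefficient_scores(coef, mean, cov):
    """Per-coefficient standardized deviations (t-type charts, used for diagnosis)."""
    return (coef - mean) / np.sqrt(np.diag(cov))


rng = np.random.default_rng(1)
# Hypothetical in-control AR(2) dynamics for 50 historical batches of length 500.
history = [simulate_ar([0.5, -0.3], 500, rng) for _ in range(50)]
mean, cov = reference_from_history(history, p=2)

in_control = fit_ar_ols(simulate_ar([0.5, -0.3], 500, rng), p=2)
disturbed = fit_ar_ols(simulate_ar([0.8, -0.3], 500, rng), p=2)  # hypothetical fault
print(t2_statistic(in_control, mean, cov), t2_statistic(disturbed, mean, cov))
print(coefficient_scores(disturbed, mean, cov))
```

The per-coefficient scores are what make diagnosis possible in this sketch: the disturbed batch is expected to show a large score only on the first coefficient, pointing at the term that changed.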

Fraga, AZ, Hauschild L, Campos PHRF, Valk M, Bello DZ, Kipper M, Andretta I.  2022.  Genetic selection modulates feeding behavior of group-housed pigs exposed to daily cyclic high ambient temperatures. PLOS ONE. Online.
2021
Gomes, BCK, Andretta I, Valk M, Pomar C, Hauschild L, Fraga AZ, Kipper M, Trevizan L, Remus A.  2021.  Prandial Correlations and Structure of the Ingestive Behavior of Pigs in Precision Feeding Programs. Animals. 11(10).

The feeding behavior of growing-finishing pigs was analyzed to study prandial correlations and the probability of starting a new feeding event. The data were collected in real time from 157,632 visits to automatic feeders by a group of 70 growing-finishing pigs (from 30.4 to 115.5 kg body weight, BW). Data were recorded over 84 days, during which the pigs were kept in conventional (by phase and by group) or precision (with daily and individual adjustments) feeding programs. A criterion to delimit each meal was then defined based on the probability of an animal starting a new feeding event within the next minute after its last visit. Prandial correlations between meal size and the interval before the meal (pre-prandial) or the interval after the meal (post-prandial) were established using Pearson correlation analysis. Post-prandial correlations (which can be interpreted as hunger-regulating mechanisms) were slightly stronger than pre-prandial correlations (which can be interpreted as satiety-regulating mechanisms). Both correlations decreased as the animals' age increased but were little influenced by the feeding programs. The information generated in this study allows a better understanding of pigs' feeding behavior regulation mechanisms and could be used in the future to improve precision feeding programs.
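The meal-delimitation and prandial-correlation steps can be illustrated with a short sketch. It is not the study's analysis: the visit records, the function names and the fixed 1-minute `criterion` are hypothetical. The study derives its meal criterion from the probability of a new feeding event within the next minute; `prob_new_event_within` only shows one plausible empirical estimate of that probability from a list of inter-visit intervals.

```python
import numpy as np


def prob_new_event_within(gaps, elapsed, window=1.0):
    """Empirical P(next visit starts within `window` min | `elapsed` min without a visit)."""
    gaps = np.asarray(gaps, dtype=float)
    at_risk = gaps > elapsed  # intervals still "open" after `elapsed` minutes
    if not at_risk.any():
        return float("nan")
    return float(np.mean(gaps[at_risk] <= elapsed + window))


def group_meals(visits, criterion):
    """Merge visits (start, end, intake) separated by <= criterion minutes into meals."""
    meals = []
    start0, end0, size0 = visits[0]
    for start, end, intake in visits[1:]:
        if start - end0 <= criterion:  # same meal: extend it
            end0 = max(end0, end)
            size0 += intake
        else:  # gap too long: close the current meal, open a new one
            meals.append((start0, end0, size0))
            start0, end0, size0 = start, end, intake
    meals.append((start0, end0, size0))
    return meals


def prandial_correlations(meals):
    """Pearson correlations of meal size with the interval before / after the meal."""
    sizes = np.array([m[2] for m in meals], dtype=float)
    gaps = np.array([meals[i + 1][0] - meals[i][1] for i in range(len(meals) - 1)])
    pre = np.corrcoef(gaps, sizes[1:])[0, 1]    # interval before meal i vs size of meal i
    post = np.corrcoef(sizes[:-1], gaps)[0, 1]  # size of meal i vs interval after meal i
    return float(pre), float(post)


# Hypothetical visits of one pig: (start min, end min, intake g).
visits = [(0, 2, 50), (2.5, 4, 30), (10, 12, 40), (12.2, 13, 10), (30, 31, 20)]
gaps = [b[0] - a[1] for a, b in zip(visits, visits[1:])]
print(prob_new_event_within(gaps, elapsed=0.0))
meals = group_meals(visits, criterion=1.0)
print(meals)
print(prandial_correlations(meals))
```

With real data, each correlation would be computed per animal (and per period) over many meals; three meals are used here only to keep the example readable.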

Valk, M, Cybis GB.  2021.  U-statistical inference for hierarchical clustering. Journal of Computational and Graphical Statistics. 30(1).

Clustering methods are valuable tools for the identification of patterns in high-dimensional data, with applications in many scientific fields. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. We develop a U-statistics-based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These non-parametric methods rely on very few assumptions about the data and thus can be applied to a wide range of datasets for which the Euclidean distance captures relevant features. Our main result is the development of a hierarchical significance clustering method. To do so, we first introduce an extension of a relevant U-statistic and develop its asymptotic theory. Additionally, as a preliminary step, we propose a binary non-nested significance clustering method and show its optimality in terms of expected values. Our approach is tested through multiple simulations and found to have more statistical power than competing alternatives in all scenarios considered. The methods are further showcased in three applications ranging from genetics to image recognition problems.
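To make the between-minus-within U-statistic idea concrete, here is a small sketch of a binary (non-nested) split search with Euclidean distances. It is a simplified stand-in, not the paper's method: the statistic is unstandardized, no asymptotic variance or significance level is computed, and the exhaustive search is only feasible for a handful of observations. All names and the simulated HDLSS-style data (8 observations in 50 dimensions) are hypothetical.

```python
from itertools import combinations

import numpy as np


def distance_matrix(X):
    """Euclidean distances between the rows (observations) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_within(D, idx):
    """Average distance over all pairs inside one group (needs >= 2 members)."""
    return np.mean([D[i, j] for i, j in combinations(idx, 2)])


def separation(D, g1, g2):
    """Unstandardized statistic: 2 * mean between-group distance minus both within means."""
    between = D[np.ix_(g1, g2)].mean()
    return 2 * between - mean_within(D, g1) - mean_within(D, g2)


def best_binary_split(X, min_size=2):
    """Exhaustive search for the split maximizing `separation` (small n only)."""
    n = len(X)
    D = distance_matrix(X)
    best_value, best_group = -np.inf, None
    rest = range(1, n)
    # Observation 0 always goes in the first group, so each split is visited once.
    for size in range(min_size - 1, n - min_size):
        for others in combinations(rest, size):
            g1 = [0, *others]
            g2 = [i for i in range(n) if i not in g1]
            value = separation(D, g1, g2)
            if value > best_value:
                best_value, best_group = value, g1
    return best_group, best_value


rng = np.random.default_rng(0)
# Hypothetical data: two groups of 4 observations, means 0 and 5, in 50 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (4, 50)), rng.normal(5.0, 1.0, (4, 50))])
group, value = best_binary_split(X)
print(group, value)
```

The number of candidate splits grows as 2^(n-1), which is one reason a practical method needs theory (and a smarter search) rather than brute force.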

2020
Marcondes, DF, Valk M.  2020.  Dynamic VAR Model-Based Control Charts for Batch Process Monitoring. European Journal of Operational Research (EJOR). 285(1):296-305.

In the field of Statistical Process Control (SPC), there are several approaches to monitoring batch processes. Such processes present a three-way data structure (batches × variables × time instants), so that a multivariate time series is available for each batch. Traditional approaches do not take into account the time series nature of the data. They handle this kind of data by applying multivariate techniques to a reduced two-way data structure, in an attempt to capture the dynamics of the variables. Recent developments in SPC have proposed the use of the Vector Autoregressive (VAR) time series model on the original three-way structure. However, they are restricted to control approaches focused on VAR residuals. This paper proposes a new approach to batch processes using the VAR model, focusing on its coefficients instead of its residuals. Through a simulated batch process, we illustrate the better performance of our approach over residual-based control charts in both offline and online contexts.
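The coefficient-based (rather than residual-based) view can be sketched for a VAR(1) model. Under stated assumptions, the example below estimates the VAR(1) coefficient matrix A of each batch by least squares and monitors vec(A) with a plain Hotelling-type T². It is not the paper's chart: control limits are omitted, and the matrices `A_ok` and `A_fault`, the batch length and the number of historical batches are hypothetical.

```python
import numpy as np


def simulate_var1(A, n, rng, burn=100):
    """Simulate x_t = A x_{t-1} + e_t with standard normal innovations."""
    k = A.shape[0]
    x = np.zeros((n + burn, k))
    e = rng.standard_normal((n + burn, k))
    for t in range(1, n + burn):
        x[t] = A @ x[t - 1] + e[t]
    return x[burn:]


def fit_var1_ols(batch):
    """Least-squares estimate of the VAR(1) coefficient matrix of one batch (T x k)."""
    x = batch - batch.mean(axis=0)
    # lstsq solves x[:-1] @ B ~ x[1:], and B is the transpose of A.
    B, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
    return B.T


def reference_from_history(batches):
    """Mean and covariance of vec(A) over historical (in-control) batches."""
    V = np.array([fit_var1_ols(b).ravel() for b in batches])
    return V.mean(axis=0), np.cov(V, rowvar=False)


def t2_coefficients(batch, mean, cov):
    """Hotelling-type T^2 on the coefficients of a new batch (not on its residuals)."""
    d = fit_var1_ols(batch).ravel() - mean
    return float(d @ np.linalg.solve(cov, d))


rng = np.random.default_rng(7)
A_ok = np.array([[0.5, 0.1], [0.0, 0.4]])      # hypothetical in-control dynamics
A_fault = np.array([[0.5, 0.1], [0.3, 0.4]])   # hypothetical change in cross-dynamics
history = [simulate_var1(A_ok, 300, rng) for _ in range(40)]
mean, cov = reference_from_history(history)
t2_ok = t2_coefficients(simulate_var1(A_ok, 300, rng), mean, cov)
t2_fault = t2_coefficients(simulate_var1(A_fault, 300, rng), mean, cov)
print(t2_ok, t2_fault)
```

A fault that alters how one variable feeds into another changes an entry of A directly, which is why a chart on the coefficients can react to it even when the one-step residuals look similar.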

2019
Pumi, G, Valk M, Bisognin C, Bayer FM, Prass TS.  2019.  Beta autoregressive fractionally integrated moving average models. Journal of Statistical Planning and Inference. 200:196-212.

In this work, we introduce the class of beta autoregressive fractionally integrated moving average models for continuous random variables taking values in the open unit interval (0,1). The proposed model accommodates a set of regressors and a long-range dependent time series structure. We derive the partial likelihood estimator for the parameters of the proposed model and obtain the associated score vector and Fisher information matrix. We also prove the consistency and asymptotic normality of the estimator under mild conditions. Hypothesis testing, diagnostic tools and forecasting are also proposed. A Monte Carlo simulation study is conducted to evaluate the finite-sample performance of the partial likelihood estimators and to study some of the proposed tests. An empirical application is also presented and discussed.
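Two building blocks of a beta ARFIMA-type model can be written down compactly. The first is the set of weights of the fractional difference operator (1 - B)^d, which produce the long-range dependence. The second is the beta log-density in the mean/precision parameterization, whose sum over time gives a partial log-likelihood once conditional means are available. This is only a sketch: the full conditional-mean recursion with regressors and ARMA terms is not implemented, the logit link is an assumed (common) choice, and all function names are hypothetical.

```python
import math


def frac_diff_weights(d, m):
    """First m + 1 coefficients pi_k of the expansion (1 - B)^d = sum_k pi_k B^k."""
    w = [1.0]
    for k in range(1, m + 1):
        w.append(w[-1] * (k - 1 - d) / k)  # pi_k = pi_{k-1} (k - 1 - d) / k
    return w


def logit(mu):
    """Assumed link function g(mu) = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi): mean mu, precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def partial_loglik(y, mu, phi):
    """Sum of conditional log-densities, given conditional means mu_t in (0, 1)."""
    return sum(beta_logpdf(yt, mt, phi) for yt, mt in zip(y, mu))


print(frac_diff_weights(0.4, 5))   # hyperbolically decaying weights (long memory)
print(beta_logpdf(0.3, 0.5, 2.0))  # Beta(1, 1) is uniform, so the log-density is 0
```

For d = 1 the weights reduce to the ordinary first difference (1, -1, 0, 0, ...), while for 0 < d < 0.5 they decay slowly instead of vanishing after a few lags.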

2018
Cybis, GB, Valk M, Lopes SRC.  2018.  Clustering and classification problems in genetics through U-statistics. Journal of Statistical Computation and Simulation. 88(10):1882-1902.

Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical problems. Here we consider a versatile U-statistics-based approach for non-parametric clustering that allows for an unconventional way of solving these problems. In this paper, we propose a statistical test to assess group homogeneity that takes multiple testing issues into account, and a clustering algorithm based on dissimilarities within and between groups that greatly speeds up the homogeneity test. We also propose a test to assess the significance of classifying a sample into one of two groups. We present Monte Carlo simulations that evaluate the size and power of the proposed tests under different scenarios. Finally, the methodology is applied to three different genetic data sets: global human genetic diversity, breast tumour gene expression and Dengue virus serotypes. These applications showcase this statistical framework's ability to answer diverse biological questions in the high dimension low sample size scenario while adapting to the specificities of the different data types.
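A two-group homogeneity test on categorical sequences can be sketched with the same between-minus-within idea and a Hamming distance. Note the swapped-in technique: the p-value here comes from a permutation test, not from the U-statistic asymptotic theory or the multiple-testing correction used in the paper. The toy DNA-like sequences, the function names and `n_perm` are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def separation(D, g1, g2):
    """2 * mean between-group distance minus the two mean within-group distances."""
    def mean_within(g):
        pairs = list(combinations(g, 2))
        return sum(D[i][j] for i, j in pairs) / len(pairs)

    between = sum(D[i][j] for i in g1 for j in g2) / (len(g1) * len(g2))
    return 2 * between - mean_within(g1) - mean_within(g2)


def permutation_test(seqs, n1, n_perm=999, seed=0):
    """Test H0: the first n1 sequences and the rest come from one population."""
    n = len(seqs)
    D = [[hamming(a, b) for b in seqs] for a in seqs]
    idx = list(range(n))
    observed = separation(D, idx[:n1], idx[n1:])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # random relabelling under H0
        if separation(D, idx[:n1], idx[n1:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical groups: A-rich sequences with one G, C-rich sequences with one T.
group_a = ["A" * i + "G" + "A" * (9 - i) for i in range(5)]
group_b = ["C" * i + "T" + "C" * (9 - i) for i in range(5)]
stat, p_value = permutation_test(group_a + group_b, n1=5)
print(stat, p_value)
```

Because only distances enter the statistic, swapping the Hamming distance for another dissimilarity suited to a given data type (allele counts, expression profiles) leaves the rest of the sketch unchanged.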

2013
Valk, M, Mesquita DR.  2013.  Clustering Correlated Time Series via Quasi U-Statistics. 29th European Meeting of Statisticians, Budapest, 18 July.

Discrimination and classification of time series have become almost indispensable given the large amount of information available nowadays. The problem of time series discrimination and classification is discussed in [1]. In that work, the authors propose a novel clustering algorithm based on a class of quasi U-statistics and subgroup decomposition tests. The decomposition may be applied to any concave time-series distance. The resulting test statistic is proved to be asymptotically normal for either i.i.d. or non-identically distributed groups of time series under mild conditions. In practice, however, many time series are correlated among themselves. The globalization of financial markets is an example: when one market is affected by an exogenous factor, a chain reaction can affect many others, so the independence condition fails.

We are interested in analyzing how correlation among groups of time series can affect classification and clustering methods, especially the one proposed in [1]. Empirical results show that the proposed method is robust to the presence of correlation among time series. Establishing the convergence of the test statistic for dependent time series is one of the goals of this work.

References

[1] Valk, M., Pinheiro, A. 2012. Time-series clustering via quasi U-statistics. J. Time Ser. Anal., 33(4), 608-619.
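The dependence scenario described in the abstract above (a common exogenous shock propagating across series) can be simulated with a minimal sketch: AR(1) series whose innovations share a common component, so each pair of innovations has correlation `rho`. The generator, its parameters and the two-group design are hypothetical illustrations, not the simulation design used in the work; passing the same `common` shock to both groups also makes the groups correlated with each other.

```python
import numpy as np


def correlated_ar1_panel(n_series, n_obs, phi, rho, rng, common=None):
    """AR(1) series whose innovations share a common shock.

    Innovation of series i at time t: sqrt(rho) * c_t + sqrt(1 - rho) * e_{i,t},
    so every innovation has unit variance and each pair has correlation rho.
    """
    if common is None:
        common = rng.standard_normal(n_obs)      # shock hitting every series
    own = rng.standard_normal((n_series, n_obs))  # series-specific noise
    eps = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * own
    x = np.zeros((n_series, n_obs))
    for t in range(1, n_obs):
        x[:, t] = phi * x[:, t - 1] + eps[:, t]
    return x


rng = np.random.default_rng(3)
shock = rng.standard_normal(2000)  # hypothetical exogenous factor shared by both groups
group1 = correlated_ar1_panel(5, 2000, phi=0.5, rho=0.6, rng=rng, common=shock)
group2 = correlated_ar1_panel(5, 2000, phi=0.9, rho=0.6, rng=rng, common=shock)
print(np.corrcoef(group1)[0, 1])               # within-group correlation, near rho
print(np.corrcoef(group1[0], group2[0])[0, 1])  # between-group correlation
```

Setting `rho=0` recovers independent series, which gives a baseline against which the behaviour of a clustering test under dependence can be compared.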

2012
Valk, M, Pinheiro AS.  2012.  Time-series clustering via quasi U-statistics. Journal of Time Series Analysis. 33(4):608-619.

The problem of time‐series discrimination and classification is discussed. We propose a novel clustering algorithm based on a class of quasi U‐statistics and subgroup decomposition tests. The decomposition may be applied to any concave time‐series distance. The resulting test statistics are proven to be asymptotically normal for either i.i.d. or non‐identically distributed groups of time‐series under mild conditions. We illustrate its empirical performance in a simulation study and a real data analysis. The simulation setup includes stationary vs. stationary and stationary vs. non‐stationary cases. The performance of the proposed method compares favourably with that of some of the most common clustering measures available.
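The subgroup decomposition can be illustrated with simple arithmetic: the mean of all n(n-1)/2 pairwise distances is a weighted average of the two within-group means and the between-group mean, with weights n1(n1-1)/(n(n-1)), n2(n2-1)/(n(n-1)) and 2·n1·n2/(n(n-1)). The sketch below checks this identity numerically. The distance used (Euclidean distance between log periodograms) is a hypothetical choice, not the concave distance class of the paper, and the simulated stationary and non-stationary series are illustrative only.

```python
import numpy as np


def log_periodogram(x):
    """Log periodogram at the non-zero Fourier frequencies of a mean-centred series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    pgram = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    return np.log(pgram[1:] + 1e-12)  # drop frequency 0; guard against log(0)


def distance_matrix(series):
    """Hypothetical distance: Euclidean distance between log periodograms."""
    F = np.array([log_periodogram(s) for s in series])
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def decompose(D, labels):
    """Split the mean pairwise distance into within-group and between-group parts."""
    labels = np.asarray(labels)
    g1, g2 = np.flatnonzero(labels == 0), np.flatnonzero(labels == 1)
    n, n1, n2 = len(labels), len(g1), len(g2)
    total = D[np.triu_indices(n, k=1)].mean()
    w1 = D[np.ix_(g1, g1)][np.triu_indices(n1, k=1)].mean()
    w2 = D[np.ix_(g2, g2)][np.triu_indices(n2, k=1)].mean()
    between = D[np.ix_(g1, g2)].mean()
    # Weights are the shares of all n(n-1)/2 pairs falling in each block.
    weights = (n1 * (n1 - 1) / (n * (n - 1)),
               n2 * (n2 - 1) / (n * (n - 1)),
               2 * n1 * n2 / (n * (n - 1)))
    return total, (w1, w2, between), weights


rng = np.random.default_rng(0)
white = [rng.standard_normal(256) for _ in range(6)]             # stationary
walks = [np.cumsum(rng.standard_normal(256)) for _ in range(6)]  # non-stationary
labels = [0] * 6 + [1] * 6
total, parts, weights = decompose(distance_matrix(white + walks), labels)
print(total, sum(w * p for w, p in zip(weights, parts)))  # equal up to rounding
```

A between-group mean that clearly exceeds the within-group means is the kind of signal a decomposition-based test formalizes; the paper supplies the asymptotic distribution needed to judge it.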