Highly motivated individual studying statistical data science within the Confirm Smart Manufacturing Centre and the Mathematics Applications Consortium for Science and Industry (MACSI) at the University of Limerick. My research focuses on developing statistical methodology for optimizing manufacturing processes. This includes novel variable selection methods, which can be used to identify the key drivers of a process. I have had a leading role in several interdisciplinary projects, where I have strengthened my communication, teamwork and problem-solving skills while working closely with industry experts.
Modern variable selection procedures use penalization to perform model selection and estimation simultaneously. A popular method is the LASSO (least absolute shrinkage and selection operator), which requires selecting the value of a tuning parameter. This parameter is typically tuned by minimizing the cross-validation error or the Bayesian information criterion (BIC), but this can be computationally intensive because it involves fitting an array of different models and selecting the best one. In contrast with this standard approach, we have developed a procedure based on a “smooth information criterion” (SIC) in which the tuning parameter is selected automatically in one step. We also extend this model selection procedure to the distributional regression framework, which is more flexible than classical regression modelling. Distributional regression, also known as multiparameter regression (MPR), introduces flexibility by allowing covariates to act on multiple distributional parameters simultaneously, e.g., the mean and the variance. These models are useful in the context of normal linear regression when the process under study exhibits heteroscedastic behaviour. Reformulating the distributional regression estimation problem in terms of penalized likelihood enables us to take advantage of the close relationship between model selection criteria and penalization. Using the SIC is computationally advantageous, as it removes the need to choose multiple tuning parameters.
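As a rough illustration of the idea, the sketch below evaluates a BIC-type criterion in which the count of non-zero coefficients is replaced by a smooth surrogate. The surrogate b^2 / (b^2 + eps^2), the function names `smooth_l0` and `smooth_bic`, and the value of `eps` are assumptions made for this example and are not taken from the text above. Because the penalty weight is fixed at log(n), there is no grid of tuning parameters to search.

```python
import math


def smooth_l0(coefs, eps=1e-3):
    """Smooth surrogate for the number of non-zero coefficients.

    Each term b^2 / (b^2 + eps^2) equals 0 at b = 0 and approaches 1 as |b|
    grows relative to eps (assumed form, for illustration only).
    """
    return sum(b * b / (b * b + eps * eps) for b in coefs)


def smooth_bic(neg2_loglik, coefs, n, eps=1e-3):
    """BIC-type criterion: -2 log-likelihood plus log(n) times the smooth count."""
    return neg2_loglik + math.log(n) * smooth_l0(coefs, eps)


# Hypothetical coefficients: two clearly non-zero, one exactly zero.
coefs = [2.0, 0.0, -1.5]
print(smooth_l0(coefs))                 # close to 2, the number of non-zero entries
print(smooth_bic(100.0, coefs, n=50))   # 100.0 is a made-up -2 log-likelihood value
```

Since the surrogate is differentiable in the coefficients, a criterion of this kind could in principle be passed to a standard gradient-based optimizer, which is what makes a one-step fit possible.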
Datasets with extreme observations and/or heavy-tailed error distributions are commonly encountered, and their analysis should take these features into account. Small deviations from an assumed model, such as the presence of outliers, can cause classical regression procedures to break down, potentially leading to unreliable inferences. Other distributional deviations, such as heteroscedasticity, can be handled by going beyond the mean and modelling the scale parameter in terms of covariates. We propose a method that accounts for heavy tails and heteroscedasticity through the use of a generalized normal distribution (GND). The GND contains a kurtosis-characterizing shape parameter that moves the model smoothly between the normal distribution and the heavier-tailed Laplace distribution, thus covering both classical and robust regression. A key component of statistical inference is determining the set of covariates that influence the response variable. While correctly accounting for kurtosis and heteroscedasticity is crucial to this endeavour, a procedure for variable selection is still required. For this purpose, we use a novel penalized estimation procedure that avoids the typical computationally demanding grid search for tuning parameters. This is particularly valuable in the distributional regression setting where both the location and scale parameters depend on covariates, since the standard approach would require multiple tuning parameters (one for each distributional parameter). We achieve this by using a “smooth information criterion” that can be optimized directly, with the tuning parameters fixed at log(n) in the BIC case.
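To make the role of the shape parameter concrete, the following sketch implements one common textbook parametrization of the generalized normal density. The parametrization, the function names, and the argument names `scale` and `shape` are assumptions for illustration; the parametrization used in the work described above may differ. Under this form, `shape=2` recovers a normal density with standard deviation `scale / sqrt(2)`, and `shape=1` recovers a Laplace density with scale `scale`.

```python
import math


def gnd_pdf(x, mu=0.0, scale=1.0, shape=2.0):
    """Generalized normal density (assumed parametrization):
    shape / (2 * scale * Gamma(1 / shape)) * exp(-(|x - mu| / scale) ** shape)
    """
    z = abs(x - mu) / scale
    norm_const = shape / (2.0 * scale * math.gamma(1.0 / shape))
    return norm_const * math.exp(-(z ** shape))


def normal_pdf(x, mu, sigma):
    """Standard normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, mu, b):
    """Laplace density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


# Compare the density far from the centre for the two limiting shapes.
for shape in (2.0, 1.5, 1.0):
    print(shape, gnd_pdf(3.0, mu=0.0, scale=1.0, shape=shape))
```

With the scale held fixed, the density at a point far from the centre increases as the shape parameter decreases from 2 towards 1, which is the heavier-tailed behaviour that gives the model its robustness to outliers.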
Process visualizations of data from manufacturing execution systems (MESs) can generate valuable insights for improved decision-making. Industry 4.0 is driving a digital transformation in which advanced analytics and visualizations are critical. Exploiting MESs with data-driven strategies can have a major impact on business outcomes. The advantages of employing process visualizations are demonstrated through an application to real-world data. Visualizations, such as dashboards, enable the user to examine the performance of a production line at a high level. Furthermore, the addition of interactivity allows the user to customize the data they want to observe. Evidence of process variability between shifts and days of the week can be investigated with the goal of optimizing production.
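A comparison of variability between shifts or days could start from a simple grouped summary such as the sketch below, which a dashboard might then display. The records, the field layout (day, shift, cycle time in seconds), and the function name `summarize_by` are all made up for illustration and do not come from the real-world data mentioned above.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up records for illustration: (day of week, shift, cycle time in seconds).
records = [
    ("Mon", "day", 41.2), ("Mon", "night", 44.8),
    ("Tue", "day", 40.9), ("Tue", "night", 47.1),
    ("Wed", "day", 41.5), ("Wed", "night", 43.0),
]


def summarize_by(records, key_index):
    """Group cycle times by the chosen field; return {group: (mean, standard deviation)}."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items()}


by_shift = summarize_by(records, key_index=1)  # index 1 is the shift field
for shift, (avg, sd) in by_shift.items():
    print(f"{shift}: mean={avg:.2f}s, sd={sd:.2f}s")
```

Passing `key_index=0` instead groups the same records by day of the week.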