The VII AMMCS International Conference
Waterloo, Ontario, Canada | August 17-21, 2026
AMMCS 2026 Plenary Talk
Recent advances in clustering microbiome data
Sanjeena Dang (Carleton University)
The human microbiome plays a crucial role in health and disease. Advances in next-generation sequencing technologies have made it possible to quantify microbiome composition with high resolution. Clustering microbiome data can uncover meaningful patterns across samples, offering insights into biological variability and disease mechanisms. However, this task presents several challenges. Microbiome data are typically high-dimensional, over-dispersed, and compositional; that is, they reflect relative rather than absolute abundances. Compositional data are restricted to a simplex, which complicates standard statistical analysis. Various mixture model-based approaches have been proposed. These models assume that the population is a finite mixture of subpopulations (or clusters), each characterized by a probability distribution. My research team has focused on developing and refining such models to improve clustering accuracy and interpretability. We will present recent contributions from our group, emphasizing the strengths and limitations of different component-specific distributions used within mixture models. For example, Dirichlet-multinomial (DM) mixture models are computationally efficient but often fail to capture the complex correlation structures present in microbiome data. In contrast, models based on the logistic normal multinomial (LNM) distribution provide greater flexibility in capturing correlations but are computationally intensive. To overcome these limitations, we have developed a computationally efficient framework for parameter estimation in LNM-based models using variational Gaussian approximations. This approach significantly reduces the computational burden and enables application to large-scale datasets. Other recent and ongoing developments that use extensions of the LNM distribution, as well as other distributions, to cluster microbiome data will also be discussed.
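As a rough illustration of the LNM distribution named in the abstract, the sketch below simulates count data from a single LNM component: a latent Gaussian vector is mapped onto the simplex by the inverse additive log-ratio transform, and counts are then drawn from a multinomial with those proportions. This is not the speaker's code or estimation method; the function names `inverse_alr` and `sample_lnm`, the choice of the last taxon as the reference category, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def inverse_alr(y):
    """Map a K-vector of log-ratios to proportions on the (K+1)-part simplex.

    Assumption: the last category is the reference, with log-ratio fixed at 0.
    """
    expanded = np.append(y, 0.0)
    expanded -= expanded.max()  # subtract the max for numerical stability
    weights = np.exp(expanded)
    return weights / weights.sum()


def sample_lnm(rng, mu, sigma, depth, n_samples):
    """Draw count vectors from one logistic normal multinomial component."""
    # Latent Gaussian log-ratios; sigma encodes correlation between taxa.
    latent = rng.multivariate_normal(mu, sigma, size=n_samples)
    # Each sample gets its own composition, then multinomial counts.
    return np.array([rng.multinomial(depth, inverse_alr(y)) for y in latent])


# Hypothetical parameters: 4 taxa (3 log-ratios), first two positively correlated.
rng = np.random.default_rng(0)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
counts = sample_lnm(rng, mu, sigma, depth=1000, n_samples=500)
```

Each row of `counts` sums to the fixed sequencing depth, which is the compositional constraint described above. The covariance matrix `sigma` is where the LNM allows an explicit correlation structure between taxa, in contrast to the DM distribution.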
Dr. Sanjeena Dang (Subedi) is an Associate Professor and Canada Research Chair in Data Science and Analytics at the School of Mathematics and Statistics at Carleton University. She has developed an innovative interdisciplinary research program that has coalesced around developing highly efficient and scalable statistical models for clustering various types of biological datasets. Dr. Dang has also taken active leadership roles in the statistics and bioinformatics communities. She is currently the President of the Classification Society, Treasurer for the International Federation of Classification Societies, and Associate Editor for several statistics and bioinformatics journals. Her notable past roles include President of the Business and Industrial section of the Statistical Society of Canada (SSC), member of the Board of Directors of the Classification Society, guest editor for the Fields Institute Communications Series on Data Science and Optimization, and chair of the organizing committee for the Canadian Statistical Sciences Institute (CANSSI)'s Distinguished Lecture Series in Statistical Sciences.