Scientific Committee & Organizing Committee
Olga Klopp (ESSEC, CREST)
Mohamed Ndaoud (ESSEC)
Christophe Pouet (École Centrale de Marseille)
Alexander Rakhlin (MIT)

IMPORTANT WARNING: Scam / Phishing / SMiShing! Note that ill-intentioned people may try to contact some participants by email or phone to obtain money and personal details by pretending to be part of the staff of our conference center (CIRM). CIRM and the organizers will NEVER contact you by phone on this issue and will NEVER ask you to pay for accommodation, board, or any registration fee in advance. Any payment due will be taken on site at CIRM during your stay.
We plan to dedicate the 2023–2025 series of conferences to challenges and emerging topics in mathematical statistics driven by recent advances in artificial intelligence. Tremendous progress has been made in building powerful machine learning algorithms such as random forests, gradient boosting, and neural networks. These models are exceptionally complex and difficult to interpret, yet they offer enormous opportunities in many areas of application, ranging from science and public policy to business. Such sophisticated algorithms are often called “black boxes” because they are very hard to analyze. The widespread use of these predictive algorithms raises pressing questions of replicability, reliability, robustness, and privacy protection. The proposed series of conferences is dedicated to new statistical methods built around these black-box algorithms, methods that leverage their power while guaranteeing their replicability and reliability.
As with the first conference of the cycle, the objective of this final conference is to bring theoretical computer scientists and mathematical statisticians to Luminy to exchange ideas on the careful quantification of decision-making algorithms and on causal inference. These topics are of central importance in modern data science, and we strongly believe that bringing the two communities together will advance the field and bring us closer to solving key challenges such as causal inference in machine learning, selective inference, post hoc analysis, and bridging decision-making and estimation.
TUTORIALS
Unsupervised learning is a central challenge in artificial intelligence, lying at the intersection of statistics and machine learning. The goal is to uncover patterns in unlabelled data by designing learning algorithms that are both computationally efficient—that is, run in polynomial time—and statistically effective, meaning they minimize a relevant error criterion.
Over the past decade, significant progress has been made in understanding statistical–computational trade-offs: for certain canonical “vanilla” problems, it is now widely believed that no algorithm can achieve both statistical optimality and computational efficiency. However, somewhat surprisingly, many extensions of these widely accepted conjectures to slightly modified models have recently been proven false. These variations introduce additional structure that can be exploited to bypass the presumed limitations.
In these talks, I will begin by presenting a vanilla problem for which a statistical–computational trade-off is strongly conjectured. I will then discuss a specific class of more complex unsupervised learning problems—namely, ranking problems—in which extensions of the standard conjectures have been refuted, and I will aim to explain the underlying reasons why.
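To give a concrete feel for the class of ranking problems mentioned above, the following Python sketch is a purely illustrative simulation, not material from the tutorial: a hidden ordering of n items is observed through a noisy but consistent pairwise-comparison matrix, and a simple polynomial-time “counting wins” estimator recovers most of the ordering. The model, the noise level, and the choice of estimator are assumptions made only for this illustration.

import numpy as np

# Illustrative setup (assumed, not from the tutorial): n items with a hidden
# ranking pi. We observe a comparison matrix Y in which Y[i, j] is a noisy
# indicator of "item i is ranked above item j".
rng = np.random.default_rng(0)
n, noise = 50, 0.3                                  # number of items, flip probability

pi = rng.permutation(n)                             # hidden rank of each item
truth = (pi[:, None] < pi[None, :]).astype(float)   # 1 if i is ranked above j

# Flip each comparison independently with probability `noise`, keeping the
# matrix consistent: the (j, i) entry is flipped together with (i, j).
flip = np.triu(rng.random((n, n)) < noise, k=1)
Y = np.where(flip | flip.T, 1.0 - truth, truth)
np.fill_diagonal(Y, 0.0)

# Polynomial-time "counting" estimator: rank items by their number of wins.
wins = Y.sum(axis=1)
order = np.argsort(-wins)                           # most wins first
pi_hat = np.empty(n, dtype=int)
pi_hat[order] = np.arange(n)

# Fraction of pairs ordered the same way by the true and estimated rankings.
agree = np.mean((pi[:, None] < pi[None, :]) == (pi_hat[:, None] < pi_hat[None, :]))
print(f"fraction of pairs ordered as in the hidden ranking: {agree:.2f}")

The counting rule is used here only because it is the simplest polynomial-time baseline; the trade-offs discussed in the tutorial concern sharper regimes and structured variants of such problems.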
High-dimensional, high-order (tensor) data refers to data organized in the form of large-scale arrays spanning three or more dimensions, and it is becoming increasingly prevalent across various fields, including biology, medicine, psychology, education, and machine learning.
In biomedical research, for example, longitudinal microbiome studies collect samples across individuals and time points to quantify microbial abundance across hundreds or thousands of taxa. In neuroscience, imaging techniques such as MRI, fMRI, and EEG generate inherently multi-dimensional data capturing spatial and temporal neural activity, naturally forming tensor structures.
These high-dimensional, high-order datasets present unique statistical and computational challenges. Classical matrix-based methods often fail to extend naturally to tensors, and naive approaches like vectorization or matricization can discard important structural information, resulting in suboptimal analyses. Moreover, fundamental operations—such as computing singular values, eigenvalues, or norms—become NP-hard in the tensor setting, highlighting the need for new algorithmic and theoretical frameworks.
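As a concrete illustration of matricization and of the multilinear structure that naive vectorization discards, here is a minimal numpy sketch. It is an illustrative example, not part of the tutorial, and the helper names unfold and hosvd_approx are made up for this snippet: it unfolds a 3-way tensor along each mode and builds a truncated higher-order SVD (HOSVD) style low-rank approximation.

import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization: move that mode to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_approx(T, ranks):
    """Truncated higher-order SVD: project each mode onto its top singular vectors."""
    # Leading left singular vectors of each mode-wise unfolding.
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    # Core tensor: contract each mode with the corresponding U_m^T.
    G = T
    for m, Um in enumerate(U):
        G = np.moveaxis(np.tensordot(Um.T, G, axes=(1, m)), 0, m)
    # Reconstruct the low-multilinear-rank approximation.
    A = G
    for m, Um in enumerate(U):
        A = np.moveaxis(np.tensordot(Um, A, axes=(1, m)), 0, m)
    return A

# Small synthetic example (sizes and noise level are arbitrary): a noisy tensor
# with multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 2, 2))
factors = [rng.standard_normal((d, 2)) for d in (30, 40, 50)]
T = np.einsum('abc,ia,jb,kc->ijk', core, *factors) \
    + 0.1 * rng.standard_normal((30, 40, 50))

T_hat = hosvd_approx(T, ranks=(2, 2, 2))
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print(f"relative error of the rank-(2,2,2) HOSVD approximation: {rel_err:.3f}")

Unlike flattening the tensor into one long vector, the three unfoldings retain which indices belong to which mode, which is exactly the structure the mode-wise projections exploit.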
From a methodological perspective, essential tasks like dimension reduction, regression, classification, and clustering require significant adaptation. These challenges give rise to fundamental statistical-computational trade-offs not present in lower-order settings.
This tutorial will provide a comprehensive overview of recent advances in tensor-based statistical methods, theoretical foundations, and applications.