MULTIYEAR PROGRAM
CONFERENCE

Meeting in Mathematical Statistics 
Rencontres de Statistique Mathématique

New challenges in high-dimensional statistics

16 – 20 December 2024

Scientific Committee & Organizing Committee

Olga Klopp (ESSEC, CREST)
Mohamed Ndaoud (ESSEC)
Christophe Pouet (École Centrale de Marseille)
Alexander Rakhlin (MIT)

We plan to dedicate the 2023–2025 series of conferences to challenges and emerging topics in mathematical statistics driven by recent advances in artificial intelligence. Tremendous progress has been made in building powerful machine learning algorithms such as random forests, gradient boosting, and neural networks. These models are exceptionally complex and difficult to interpret, but they offer enormous opportunities in many areas of application, ranging from science and public policy to business. Such sophisticated algorithms are often called “black boxes” because they are very hard to analyze. Their widespread use raises extremely important questions of replicability, reliability, robustness, and privacy protection. The proposed series of conferences is dedicated to new statistical methods built around these black-box algorithms that leverage their power while guaranteeing their replicability and reliability.

The second conference of the program will highlight recent theoretical advances in inference for high-dimensional statistical models, drawing on the interplay of techniques from mathematical statistics, machine learning, and theoretical computer science.
The importance of high-dimensional statistics stems from the increasing dimensionality and complexity of the models needed to process and understand modern data. Meaningful inference about such models is possible under the assumption of a suitable low-dimensional underlying structure, or of low-dimensional approximations whose error can be reasonably controlled. Examples of such structures include sparse high-dimensional regression, low-rank matrix models, dictionary learning, network models, latent variable models, and topic models.
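As a concrete illustration of the first of these structures, the sketch below shows sparse high-dimensional regression with more features than observations, where the Lasso exploits sparsity to recover the coefficient vector. It is a minimal example with illustrative dimensions and tuning, not code from the conference.

```python
# Minimal sketch of sparse high-dimensional regression (illustrative values).
# With n = 100 observations and p = 500 features, least squares is ill-posed,
# but under the sparsity assumption (only s = 5 active coefficients) the
# Lasso can still estimate the coefficient vector with controlled error.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                       # n << p, sparsity level s
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                              # low-dimensional structure: s nonzero entries
y = X @ beta + 0.5 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)
print("recovered support:", support)        # ideally indices 0..s-1
print("estimation error:", np.linalg.norm(lasso.coef_ - beta))
```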

TUTORIALS

Johannes Schmidt-Hieber (University of Twente): Statistical theory for biologically inspired learning

« Compared to artificial neural networks (ANNs), the brain seems to learn faster, generalize better to new situations, and consume much less energy. ANNs are motivated by the functioning of the brain but differ in several crucial aspects. While ANNs are deterministic, biological neural networks (BNNs) are stochastic. Moreover, it is biologically implausible that the learning of the brain is based on gradient descent. In recent years, statistical theory for artificial neural networks has been developed. The idea now is to extend this to biological neural networks, as the future of AI is likely to draw even more inspiration from biology. In this lecture series we will survey the challenges and present some first statistical risk bounds for different biologically inspired learning rules. »
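The abstract contrasts deterministic gradient descent with stochastic, biologically motivated alternatives. As a hedged illustration of that contrast (not the specific learning rules analyzed in the lectures), the sketch below compares a gradient-descent update with a classical zeroth-order weight-perturbation rule, which needs only loss evaluations and injects randomness into every step; all dimensions and step sizes are illustrative.

```python
# Illustrative sketch: gradient descent vs. a zeroth-order "weight
# perturbation" rule, a classical example of a stochastic, gradient-free
# update often cited as biologically plausible. Not the lectures' own rules.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10)
y = X @ w_true                                 # noiseless linear model

def loss(w):
    return np.mean((X @ w - y) ** 2)

# Gradient descent: deterministic, uses the exact gradient.
w_gd = np.zeros(10)
for _ in range(200):
    grad = 2 * X.T @ (X @ w_gd - y) / len(y)
    w_gd -= 0.05 * grad

# Weight perturbation: stochastic, uses only two loss evaluations per step;
# in expectation, (delta / sigma) * xi points along the gradient.
w_wp = np.zeros(10)
sigma, lr = 0.1, 0.02
for _ in range(2000):
    xi = rng.standard_normal(10)               # random perturbation direction
    delta = loss(w_wp + sigma * xi) - loss(w_wp)
    w_wp -= lr * (delta / sigma) * xi          # move against harmful perturbations

print("GD loss:", loss(w_gd), " WP loss:", loss(w_wp))
```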

Rina Foygel Barber (University of Chicago): An introduction to conformal prediction and distribution-free inference

« This two-part tutorial will introduce the framework of conformal prediction, and will provide an overview of both theoretical foundations and practical methodologies in this field. In the first part of the tutorial, we will cover methods including holdout set methods, full conformal prediction, cross-validation based methods, calibration procedures, and more, with emphasis on how these methods can be adapted to a range of settings to achieve robust uncertainty quantification without compromising on accuracy. In the second part, we will cover some recent extensions that allow the methodology to be applied in broader settings, such as weighted conformal prediction, localized methods, online conformal prediction, and outlier detection. »
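To make the holdout-set method from the first part concrete, here is a minimal sketch of split conformal prediction with absolute-residual scores; the regression model and the 90% coverage target are illustrative choices, not the tutorial's own material.

```python
# Minimal sketch of split (holdout) conformal prediction with
# absolute-residual scores. Model and coverage level are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 1000
X = rng.uniform(-2, 2, size=(n, 1))
y = X[:, 0] ** 2 + rng.standard_normal(n)         # any black-box regression task

# 1. Split the data: fit on one half, calibrate on the other.
model = LinearRegression().fit(X[:500], y[:500])  # deliberately misspecified
X_cal, y_cal = X[500:], y[500:]

# 2. Nonconformity scores on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# 3. Conformal quantile with the finite-sample correction ceil((n+1)(1-alpha))/n.
alpha, n_cal = 0.1, len(scores)
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, method="higher")

# 4. Interval [f(x) - q, f(x) + q]: under exchangeability it covers a new
#    response with probability >= 1 - alpha, even though the model is wrong.
pred = model.predict(np.array([[1.5]]))[0]
print(f"90% prediction interval: [{pred - q:.2f}, {pred + q:.2f}]")
```

The guarantee is distribution-free in the sense emphasized in the abstract: validity comes from exchangeability of the calibration scores, not from correctness of the fitted model.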

SPEAKERS

Pierre Alquier (ESSEC Business School Asia-Pacific):  Optimistic Estimation of Convergence in Markov Chains with the Average Mixing Time
Yannick Baraud (Université du Luxembourg):  Robust Bayesian Statistics 
Rina Barber (University of Chicago):  An introduction to conformal prediction and distribution-free inference – Part I / Part II
Claire Boyer (Sorbonne Université):  Attention layers provably solve single-location regression
Johannes Brutsche (University of Freiburg):  The level of self-organized criticality in oscillating Brownian motion: stable limiting distribution theory for the MLE
Florentina Bunea (Cornell University):  Learning large softmax mixtures with warm start EM
Alexandra Carpentier (University of Potsdam):  A simple algorithm for noisy, convex and zeroth order optimisation
Ismaël Castillo (Sorbonne Université):  Trade-offs for multiple testing and classification of sparse vectors 
Ilias Diakonikolas (University of Wisconsin-Madison):  SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications 
Bertrand Even (Université Paris-Saclay):  Computation-information gap in high dimension clustering
Chao Gao (University of Chicago):  Are adaptive robust confidence intervals possible? 
Athanasios Georgiadis (Trinity College Dublin):  Density estimation on metric spaces
Mikolaj Kasprzak (ESSEC Business School Paris Cergy):  A Fourier Representation of Kernel Stein Discrepancy with Application to Goodness-of-Fit Tests for Measures on Infinite Dimensional Hilbert Spaces 
Alexey Naumov (HSE University):  Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Stochastic Approximation
Maxim Panov (Mohamed bin Zayed University of Artificial Intelligence):  Conformal inference?  
Marianna Pensky (University of Central Florida):  Davis-Kahan Theorem in the 2-to-infinity norm and its application to perfect clustering
Mark Podolskij (University of Luxembourg):  Recent advances in high dimensional estimation of diffusion models
Bradley Rava (University of Sydney Business School):  Ask for more than Bayes optimal: A theory of indecisions for classification
Vincent Rivoirard (Université Paris Dauphine-PSL):  PCA for point processes
Angelika Rohde (University of Freiburg): Nonparametric maximum likelihood estimation in binary regression models under weak feature impact
Johannes Schmidt-Hieber (University of Twente):  Statistical theory for biologically inspired learning – Part I / Part II
Subhabrata Sen (Harvard University):  Causal effect estimation under interference using mean field methods
Zong Shang (ENSAE):  A Geometrical Analysis of Kernel Ridge Regression and its Applications
Vladimir Spokoiny (WIAS):  Inference for nonlinear inverse problems
Bernhard Stankewitz (University of Potsdam):  Contraction rates for conjugate gradient and Lanczos approximate posteriors in Gaussian process regression
Pragya Sur (Harvard University):  Generalization error of min-norm interpolators in transfer learning 
Marten Wegkamp (Cornell University):  Linear Discriminant Regularized Regression
Yuhao Wang (Tsinghua University):  Minimax estimation of functionals in sparse vector model with correlated observations
Sven Wang (Humboldt University of Berlin):  M-estimation and statistical learning of neural operators
Yihong Wu (Yale University):  Minimax estimation of functionals in sparse vector model with correlated observations

SPONSORS