Meeting in Mathematical Statistics
Rencontres de Statistique Mathématique

Robustness and computational efficiency of algorithms in statistical learning

14 – 18 December 2020

Scientific Committee & Organizing Committee
Comité scientifique & Comité d’organisation

Cristina Butucea  (CREST, ENSAE, Université Paris-Est Marne-la-Vallée)
Stanislav Minsker (University of Southern California)
Christophe Pouet  (Ecole Centrale de Marseille)
Vladimir Spokoiny (Humboldt University of Berlin)

Noisy and corrupted measurements, commonly encountered as a result of unsupervised, automated data collection process, require new algorithms that produce reliable outcomes in this challenging framework. Discussing the influence of outliers on statistical procedures, P. Huber  observed that « …the naturally occurring deviations from the idealized model are large enough to render meaningless the traditional asymptotic optimality theory. » It is known that heuristic outlier removal procedures are bias-producing and lack rigorous justification (or require strong assumptions), as it is sometimes impossible to determine if an extreme observation appears due to an error, or is a feature of the data-generating mechanism. This motivates the study of robust estimators in the context of statistical learning theory.

Our goal is to encourage collaboration and knowledge sharing between theoretical computer scientists and mathematical statisticians by bringing them together at Luminy. Both communities possess unique visions, skills and expertise, we hope that merging these strength will advance the field and bring us closer to solving some of the key challenges.

Sara van de Geer (ETH Zürich)  Adaptive rates for trend FILltering using dual certiFICates

Sara van de Geer (ETH Zürich)  Adaptive rates for trend filltering using dual certificates

File Size: 142 kb
File Type: pdf

Download File

Chao Gao (University of Chicago) Statistical Optimality and Algorithms for Top-K and Total Ranking

Chao Gao (University of Chicago) Statistical Optimality and Algorithms for Top-K and Total Ranking

Abstract: In this short course, we will discuss the problem of ranking with partially observed pairwise comparisons in the setting of Bradley-Terry-Luce (BTL) model. There are two fundamental problems: 1) top-K ranking, which is to select the set of K players with top performances; 2) total ranking, which is to rank the entire set of players. Both ranking problems find important applications in web search, recommender systems and sports competition.

In the first presentation, we will consider the top-K ranking problem. The statistical properties of two popular algorithms, MLE and rank centrality (spectral ranking) will be precisely characterized. In terms of both partial and exact recovery, the MLE achieves optimality with matching lower bounds. The spectral method is shown to be generally sub-optimal, though it has the same order of sample complexity as the MLE. Our theory also reveals the essentially only situation when the spectral method is optimal. This turns out to be the most favorable choice of skill parameter given the separation of the two groups.

The second presentation will be focused on total ranking. The problem is to find a permutation vector to rank the entire set of players. We will show that the minimax rate of the problem with respect to the Kendall’s tau loss exhibits a transition between an exponential rate and a polynomial rate depending on the signal to noise ratio of the problem. The optimal algorithm consists of two stages. In the first stage, games with very high or low scores are used to partition the entire set of players into different leagues. In the second stage, games that are very close are used to rank the players within each league. We will give intuition and some analysis to show why the algorithm works optimally.


Felix Abramovich (Tel Aviv University)    High-dimensional classification by sparse logistic regression
Pierre Bellec (Rutgers University)   Out-of-sample error estimate for robust M-estimators with convex penalty
Clément Berenfeld (Université Paris-Dauphine)    Density estimation on manifolds
Thomas Berrett (University of Warwick)   Locally private non-asymptotic testing of distributions is faster using interactive mechanisms
Natalia Bochkina (University of Edinburgh)   Bernstein-von Mises theorem for the scale hyperparameter in inverse problems with a Gaussian prior
Alexandra Carpentier (OvGU Magdeburg)  Several structured thresholding bandit problem
Julien Chhor (CREST-ENSAE)  Goodness-of-fit testing for multinomials and densities: sharp local minimax rates
Fabienne Comte (Université Paris Descartes)   Nonparametric estimation for i.i.d. Gaussian continuous time moving average models
Alexis Derumigny (University of Twente)  On lower bounds for the bias-variance trade-off
Motonobu Kanagawa (EURECOM)   On the connections and equivalences between Gaussian processes and kernel methods in nonparametric regression
Avetik Karagulyan (ENSAE/CREST)   Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets
Clément Marteau (Université Lyon 1)    SuperMix: Sparse regularization for mixtures
Mohamed Simo Ndaoud (University of Southern California)    Robust and efficient mean estimation: approach based on the properties of self-normalized sums
Tuan-Binh Nguyen (Universite Paris-Sud (Paris-Saclay)  Aggregation of Multiple Knockoffs
Marianna Pensky (University of Central Florida)   Statistical Inference in Popularity Adjusted Stochastic Block Model
Vianney Perchet (ENSAE & Criteo AI Lab)   Robustness of Community Detection to Random Geometric Perturbations
Philippe Rigollet (MIT)    Minimax Coreset Density Estimation
Vincent Rivoirard (Université Paris-Dauphine)    Nonparametric Bayesian inference for Hawkes processes
Angelika Rohde (A. Ludwigs University Freiburg)  Interactive versus non-interactive locally, differentially private estimation: Two elbows for the quadratic functional
Richard Samworth (University of Cambridge)    Adaptive transfer learning
Johannes Schmidt-Hieber (University of Twente)   The Kolmogorov-Arnold theorem revisited
Suzanne Sigalla (CREST-ENSAE)   Improved clustering algorithms for the Bipartite Stochastic Block Model
Zoltan Szabo (Ecole Polytechnique)   Kernel Machines with Hard Shape Constraints
Nicolas Verzelen (INRAE Montpellier)  Optimal Change-Point Detection and Localization
Nikita Zhivotovsky (Google Research)  Robust k-means clustering for distributions with two bounded moments