**WORKSHOP**

*Mathematical foundations of machine learning*

*Fondements mathématiques de l’apprentissage automatique*

#### 16 July, 2024

#### Coordinators

This is the second workshop on the Mathematical Foundations of Machine Learning (MFML). It will also focus on recent interactions between mathematics and machine learning. Research topics of this second edition include the theory of deep learning, harmonic analysis, Koopman operators, compression schemes, sparsity, and domain adaptation.

For more information about the first MFML workshop, please refer to https://www.i2m.univ-amu.fr/seminaires_signal_apprentissage/Conf/Mai2023.

#### TALKS

Jérémie Chalopin (CNRS, LIS, AMU) *Unlabeled sample compression schemes for maximum classes*

Caroline Chaux (CNRS, IPAL) *Unrolled networks in signal and image processing*

Yuka Hashimoto (NTT / RIKEN) *Koopman-based generalization bound for neural networks*

Ryuichiro Hataya (RIKEN) *Automatic Domain Adaptation by Transformers in In-Context Learning*

Mimoun Mohamed (LIS/I2M) *Straight-Through Meets Sparse Recovery: the Support Exploration Algorithm*

Sho Sonoda (RIKEN) *Deep Ridgelet Transform: Harmonic Analysis for Deep Neural Networks*

#### PROGRAMME

**13h30 – 14h05 : Sho Sonoda (RIKEN)** *Deep Ridgelet Transform: Harmonic Analysis for Deep Neural Networks*

The ridgelet transform was developed to study neural network parameters: it describes how the parameters are distributed. Mathematically, it is defined as a pseudo-inverse operator of a neural network. Namely, given a function $f$ and a network $NN[\gamma]$ with parameter $\gamma$, the ridgelet transform $R[f]$ for the network $NN$ satisfies the reconstruction formula $NN[R[f]]=f$. For depth-2 fully-connected networks on a Euclidean space, the ridgelet transform is known in closed form, so we can describe how the parameters are distributed. For many modern neural network architectures, however, no closed-form expression is known. In this talk, I will introduce a systematic method to induce generalized neural networks and their corresponding ridgelet transforms from group equivariant functions, and present an application to deep neural networks.
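To make the reconstruction formula concrete, here is the classical depth-2 case written out as a sketch. The activation $\sigma$, the auxiliary function $\rho$, and the parameter variables $(a,b)$ are notation assumed here for illustration; they do not appear in the abstract.

$$
NN[\gamma](x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a,b)\,\sigma(a \cdot x - b)\,da\,db,
\qquad
R[f](a,b) = \int_{\mathbb{R}^m} f(x)\,\overline{\rho(a \cdot x - b)}\,dx .
$$

Under a suitable admissibility condition on the pair $(\sigma,\rho)$, and after normalizing a constant that depends only on that pair, one recovers $NN[R[f]] = f$. In other words, $R[f]$ is one explicit parameter distribution that realizes $f$.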

**14h05 – 14h40 : Yuka Hashimoto (NTT & RIKEN)** *Koopman-based generalization bound for neural networks*

Understanding generalization has been one of the central topics in the analysis of neural networks. In this talk, we propose a new generalization bound for neural networks using Koopman operators. Whereas most existing works focus on low-rank weight matrices, we focus on full-rank weight matrices. Indeed, as supported by several existing empirical results, low-rankness is not the only reason for generalization. Our bound does not contradict the existing bounds but complements them. It is tighter than existing norm-based bounds when the condition numbers of the weight matrices are small, and it can be combined with the existing bounds to obtain a tighter bound. Our Koopman-based analysis gives a new perspective on the generalization of neural networks with full-rank weight matrices and connects operator-theoretic analysis with the generalization of neural networks.
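For readers unfamiliar with the operator-theoretic viewpoint, the following sketch recalls the standard definition of a Koopman (composition) operator and why it fits layered networks. The symbols $g_j$ (layer maps), $h$ (an observable) and $K_g$ are notation assumed here for illustration. The function spaces and the bound presented in the talk are not reproduced.

$$
K_g h = h \circ g,
\qquad
h \circ (g_L \circ \cdots \circ g_1) = K_{g_1} K_{g_2} \cdots K_{g_L}\, h .
$$

Each $K_g$ is linear in $h$ even when $g$ is nonlinear, so a composition of layers becomes a product of linear operators. This is what makes a layer-by-layer operator-norm analysis conceivable.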

**14h40 – 15h15 : Jérémie Chalopin (LIS, CNRS, AMU)** *Unlabeled sample compression schemes for maximum classes*

We examine connections between combinatorial notions that arise in machine learning and topological notions in cubical/simplicial geometry. These connections make it possible to export results from geometry to machine learning. Our first main result is based on a geometric construction by Tracy Hall (2004) that allows us to derive a maximum class of VC dimension 3 that has no corners. This construction shows that all previous constructions of optimal unlabeled sample compression schemes for maximum classes are erroneous.

On the positive side, we present a new construction of an unlabeled sample compression scheme for maximum classes. We leave open whether our unlabeled sample compression scheme extends to ample (a.k.a. lopsided or extremal) classes, which represent a natural and far-reaching generalization of maximum classes. Towards resolving this question, we provide a geometric characterization in terms of unique sink orientations of the 1-skeletons of associated cubical complexes.

Joint work with Victor Chepoi, Shay Moran and Manfred Warmuth

**15h15 – 15h45 : Coffee break**

**15h45 – 16h20 : Mimoun Mohamed (LIS/I2M)** *Straight-Through Meets Sparse Recovery: the Support Exploration Algorithm*

The straight-through estimator (STE) is commonly used to optimize quantized neural networks, yet the settings in which it performs well remain unclear despite its empirical successes. To improve this understanding, we apply the STE to a well-understood problem: sparse support recovery. We introduce the Support Exploration Algorithm (SEA), a novel sparsity-promoting algorithm, and we analyze its performance in support recovery (a.k.a. model selection) problems. SEA explores more supports than state-of-the-art methods, leading to superior performance in experiments, especially when the columns of the linear measurement matrix $A$ are strongly coherent. The theoretical analysis considers recovery guarantees when $A$ satisfies the Restricted Isometry Property (RIP). The sufficient conditions for recovery are comparable to, but more stringent than, those of state-of-the-art methods in sparse support recovery. Their significance lies mainly in their applicability to an instance of the STE.

https://arxiv.org/pdf/2301.13584
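The idea of applying a straight-through step to support recovery can be illustrated with a short sketch. This is a simplified illustration written for this page, not the authors' implementation: the function names, the step size, the iteration count, and the noise-free Gaussian test problem are all assumptions. A dense exploration variable `chi` is "quantized" to its `k` largest entries to pick a support, a least-squares fit is computed on that support, and the gradient evaluated at the sparse iterate is applied straight through to `chi`.

```python
import numpy as np

def top_k_support(v, k):
    """Indices of the k largest-magnitude entries of v (sorted)."""
    return np.sort(np.argpartition(np.abs(v), -k)[-k:])

def support_exploration(A, y, k, n_iter=50, step=1.0):
    """STE-style sparse support recovery (illustrative sketch, hypothetical helper)."""
    n = A.shape[1]
    chi = np.zeros(n)                      # dense exploration variable
    best_x, best_res = np.zeros(n), np.inf
    for _ in range(n_iter):
        S = top_k_support(chi, k)          # "quantization": keep the k largest entries
        x = np.zeros(n)
        x[S] = np.linalg.lstsq(A[:, S], y, rcond=None)[0]  # least squares on support S
        r = A @ x - y
        res = np.linalg.norm(r)
        if res < best_res:                 # remember the best sparse iterate seen so far
            best_x, best_res = x.copy(), res
        chi -= step * (A.T @ r)            # gradient at sparse x, applied to dense chi
    return best_x

# Assumed toy problem: noise-free measurements of a 3-sparse vector.
rng = np.random.default_rng(0)
m, n, k = 200, 300, 3
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)             # unit-norm columns
true_support = np.array([5, 120, 250])
x_true = np.zeros(n)
x_true[true_support] = [1.0, -1.0, 1.0]
y = A @ x_true

x_hat = support_exploration(A, y, k)
recovered_support = np.flatnonzero(x_hat)
print("recovered support:", recovered_support)
```

Because the support is re-selected from `chi` at every iteration, the loop can leave a wrong support and later return to a good one, which is the exploration behaviour the abstract refers to.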

**16h20 – 16h55 : Ryuichiro Hataya (RIKEN)** *Automatic Domain Adaptation by Transformers in In-Context Learning*

Selecting or designing an appropriate domain adaptation algorithm for a given problem remains challenging. This paper presents a Transformer model that can provably approximate and select domain adaptation methods for a given dataset in the in-context learning framework, where a foundation model performs new tasks without updating its parameters at test time. Specifically, we prove that Transformers can approximate instance-based and feature-based unsupervised domain adaptation algorithms and automatically select an algorithm suited to a given dataset. Numerical results indicate that in-context learning achieves adaptive domain adaptation that surpasses existing methods.

**16h55 – 17h30 : Caroline Chaux (CNRS, IPAL)** *Unrolled networks in signal and image processing*

In this talk, we are interested in inverse problems arising in signal and image processing. Solving such problems requires, first, formalising the direct problem by understanding the underlying physics and, second, solving the associated inverse problem through a variational formulation, that is, by solving an optimization problem. Such problems are encountered in many areas, such as biology, medical imaging, chemistry, and audio signal processing, where different tasks have to be tackled, such as deconvolution, restoration, unmixing, and missing data reconstruction. Classical optimization-based approaches consist in formulating the optimization problem and then proposing iterative procedures (e.g. proximal algorithms) that converge to a solution of the considered inverse problem. More recently, unrolled (or unfolded) neural networks have been proposed. They combine optimization and learning, yield interpretable networks, and integrate information about the direct model. We will study and describe such networks for solving two inverse problems: image deconvolution and robust PCA.

This work has been done in collaboration with Vincent Tan, Emmanuel Soubiès, Pascal Nguyen and Elisabeth Tan.
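As a rough illustration of what "unrolling" means, the sketch below writes a fixed number of proximal-gradient (ISTA) iterations as the layers of a network. It is a generic example under stated assumptions, not the architecture presented in the talk: the random linear operator `A`, the sparsity-promoting $\ell_1$ penalty, and all function names are hypothetical. The per-layer step sizes and thresholds are the quantities that would be learned from data; here they are simply set to their classical values.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unrolled_ista(A, y, steps, thresholds):
    """Forward pass of an unrolled network: one layer per proximal-gradient iteration."""
    x = np.zeros(A.shape[1])
    for step, tau in zip(steps, thresholds):
        grad = A.T @ (A @ x - y)           # gradient of the data-fidelity term
        x = soft_threshold(x - step * grad, tau)
    return x

def lasso_objective(A, y, x, lam):
    """0.5 * ||Ax - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

# Assumed toy direct model: a random linear operator and a sparse signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 80))
x_true = np.zeros(80)
x_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = A @ x_true

lam = 0.1
lipschitz = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
n_layers = 20
steps = np.full(n_layers, 1.0 / lipschitz) # per-layer step sizes (learnable in practice)
thresholds = lam * steps                   # per-layer thresholds (learnable in practice)

x_hat = unrolled_ista(A, y, steps, thresholds)
obj_start = lasso_objective(A, y, np.zeros(80), lam)
obj_end = lasso_objective(A, y, x_hat, lam)
print(f"objective: {obj_start:.3f} -> {obj_end:.3f}")
```

With the classical parameter values, each layer is exactly one ISTA iteration, so the network inherits the interpretation of the underlying algorithm. Training would replace `steps` and `thresholds` (and possibly the operators themselves) with learned values.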