Open Research Positions

Several kinds of positions are available. Feel free to contact me if you are interested or have any questions. You are also welcome to get in touch if you would like to work with me on other topics related to my research interests.

1 PhD: Learning Genetic Interactions with Deep Neural Networks

This is a 3-year position funded by the CNRS; it will start next fall (2022). The PhD is co-supervised by Philippe Nghe and me. We both work at ESPCI.

The application of AI to biology is leading to revolutionary achievements, with AlphaFold for protein folding and generative models of artificial enzymes. However, a question remains open at a higher level of organization in cells, one that is crucial for diagnosis and therapeutic strategies: how can we predict the impact of multiple gene perturbations on biological functions? The aim of the project is to develop a deep learning approach that leverages, on the one hand, recent work on deep neural networks in the MILES team of Dauphine [1] and, on the other hand, experiments and mechanistic models developed at the ESPCI laboratory, which show that interactions between mutations are largely explained by non-linear but smooth functions of intermediate features [2-3]. How can we design a deep-learning model that includes these two steps of decomposition to learn gene interactions?

For this purpose, the PhD student will develop three complementary lines of research: (i) build deep nets on published datasets of combined genetic and drug perturbations on cells; (ii) computationally simulate the response of cells based on typical gene network architectures; (iii) apply the developed models to design, predict and interpret the outcome of large-scale screening experiments done at ESPCI. The PhD will be co-supervised by Alexandre Allauzen, Professor at University Paris Dauphine and ESPCI (team Machine Intelligence and LEarning Systems), and Philippe Nghe (director of the Laboratory of Biophysics and Evolution, ESPCI Paris-PSL). It is funded by the CNRS. Suitable profiles include prior experience in deep learning, computational biology, physics, or mathematics.

Contacts: alexandre.allauzen [at] dauphine.psl.eu

  • [1] H. Le, L. Vial, J. Frej, V. Segonne, M. Coavoux, B. Lecouteux, A. Allauzen, B. Crabbé, L. Besacier, D. Schwab (2020). FlauBERT: Unsupervised language model pre-training for French. Proceedings of the 12th Language Resources and Evaluation Conference.
  • [2] Kemble, H., Eisenhauer, C., Couce, A., Chapron, A., Magnan, M., Gautier, G., Nghe, P. & Tenaillon, O. (2020). Flux, toxicity, and expression costs generate complex genetic interactions in a metabolic pathway. Science Advances, 6(23), eabb2236.
  • [3] Nghe, P., Kogenaru, M., & Tans, S. J. (2018). Sign epistasis caused by hierarchy within signalling cascades. Nature Communications, 9(1), 1-9.

2 PhD: Narrative planning for Data2Text generative models

This is a 3-year position funded by the ANR project ACDC; it will start next fall (2022). The whole project is a collaboration between the MLIA team at ISIR, the MILES team at LAMSADE, the Muséum national d'Histoire naturelle and RECITAL. In addition to this position, another PhD student will be hired at about the same time on a different topic (how to learn operators to describe tables and generate text). Both positions will be supervised by the academic teams (MLIA and MILES) in close collaboration, with all the partners ideally working side by side.

Data2Text (or Data-to-Text) is an emerging task in NLP. The goal is to automatically generate fluent and fact-based descriptions or utterances from data tables. Many applications can be built upon this kind of approach: generation of financial and sports news stories or of product descriptions, analysis and interpretation of reports or of IoT data, etc. In general, the goal is to generate narrative summaries of complex and structured datasets.

One of the challenges is narrative planning. While amazing progress has been achieved in language generation models at the sentence level, the discourse and document levels remain beyond the ability of recent models. In this project, we propose to address this challenge along two axes:

  • Design an architecture that can generate a narrative from tables by ordering some operators (see the other PhD).
  • Personalize the text by focusing on some parts of the tables or by taking the users into account.

Requirements:

  • Outstanding master's degree (or an equivalent university degree) in computer science or other related disciplines (e.g. mathematics, information sciences, computer engineering, etc.)
  • Proficiency in NLP and machine learning
  • Fluency in spoken and written English is required
  • Strong background in deep learning with PyTorch

Application: To apply, please email alexandre.allauzen [at] dauphine.psl.eu with:

  • a curriculum vitae
  • a cover letter
  • a sample of research work (e.g. master's thesis and/or published papers)
  • a transcript of grades

3 Master internship 2022

The following topics are proposed for funded internship positions. Most of them can also be extended by a funded PhD position.

  • On the Effectiveness of Explainability Methods for Neural Natural Language Processing
  • Stable and 1-Lipschitz architectures for NLP
  • Data2Text: learning to generate text from tables
  • Transformers for speech processing

If you are interested, feel free to contact me.

4 Research engineer (3-month position) (CLOSED)

We invite applications for a research engineer position (3 months). This position is funded by the ANR project SPEED (started in fall 2021). The whole project is a collaboration between the LAMSADE and the new LISN Lab in Orsay, where many researchers and PhD students work on the same subject. The position will start as soon as possible in January 2022. Contact me if you are interested.

4.1 Aims and scope of the project

The interaction between machine learning and Physics has recently emerged as a new and important research area. Examples include the simulation of complex physical systems with machine learning models or, conversely, the introduction of numerical methods in machine learning.

At the interface of artificial intelligence and Physics, the different tracks described below can be explored depending on the skills of the candidate.

4.2 Stability and robustness of deep learning architectures in light of dynamical systems

4.2.1 Context

Within the LAMSADE, we have started to study the stability and robustness of neural network architectures. Deep learning algorithms have shown their vulnerability to adversarial attacks: small and imperceptible perturbations of the inputs that maliciously fool the prediction. Since this discovery, building adversarial attacks and defenses against them has been an active research topic. However, the understanding of this phenomenon remains in its early stages, and there is still a gap to close before adversarial attacks are well understood.
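To make the notion of an adversarial perturbation concrete, here is a minimal sketch on a toy logistic-regression classifier. It uses the fast gradient sign method (FGSM), chosen here purely for illustration and not taken from the project description; the weights, the input and the value of epsilon are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Return the probability of class 1 under a logistic-regression model."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, x, label, epsilon):
    """Shift every input coordinate by at most epsilon so as to increase the loss.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to x is (p - label) * w, so its sign is -sign(w) when
    label == 1 and +sign(w) when label == 0.
    """
    direction = -1.0 if label == 1 else 1.0
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical toy model and input (assumptions for illustration only).
weights = [0.5] * 100
bias = 0.0
x = [0.02] * 100  # clean input whose true label is 1

x_adv = fgsm_perturb(weights, x, label=1, epsilon=0.05)

p_clean = predict(weights, bias, x)    # above 0.5: classified as 1
p_adv = predict(weights, bias, x_adv)  # below 0.5: classified as 0
```

Each coordinate moves by at most 0.05, yet the predicted class flips, because the small shifts add up across the 100 input dimensions.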

Recent papers have explored the interaction between machine learning and numerical methods, opening new perspectives for many questions. For instance, this paper proposes an interpretation of deep learning as a parameter estimation problem of nonlinear dynamical systems. This formulation allows us to analyze the stability and well-posedness of deep neural network architectures like ResNet.

4.2.2 Goals

The goal of this 3-month position is to develop a new kind of architecture, based on either recurrent models or transformers, that is more robust and stable. The development relies on PyTorch, and the experiments will use standard datasets (e.g. NLP or image classification) as well as datasets from Physics.

4.3 Application

Requirements:

  • Outstanding master's degree (or an equivalent university degree) in computer science or another related discipline (e.g. mathematics, information sciences, computer engineering, etc.).
  • Proficiency in machine learning, computer vision, or signal processing.
  • Fluency in spoken and written English is required.
  • Strong background in deep learning with PyTorch.

Application: To apply, please email alexandre.allauzen [at] dauphine.psl.eu with:

  • a curriculum vitae
  • a cover letter
  • a sample of research work (e.g. master's thesis and/or published papers)
  • a transcript of grades

5 PhD: Machine Learning and Physics (CLOSED)

We invite applications for a fully-funded PhD position on the topic of "Deep Learning for physical systems modelling". This is a 3-year position funded by the ANR project SPEED; it will start as soon as possible in fall 2021. The whole project is a collaboration between the IJLRA, the LAMSADE and the new LISN Lab in Orsay, where many other students work on the same subject. Frequent scientific discussions and meetings are planned. Contact me if you are interested.

5.1 Aims and scope

The interaction between machine learning and Physics has recently emerged as a new and important research area. Examples include the simulation of complex physical systems with machine learning models or, conversely, the introduction of numerical methods in machine learning.

At the interface of artificial intelligence and Physics, the different tracks described below can be explored depending on the skills of the candidate.

5.1.1 Noisy, scarce and partial observation

In modern machine learning, the cornerstone is to let the model learn its own representation of the process from observed data. While data are readily available for many applications (computer vision, natural language processing, etc.), this requirement is not met in the case of complex physical systems. Without loss of generality, let us consider the example of a turbulent flow field or the prediction of the sea surface temperature. The corresponding dataset is small and sparse compared with usual machine learning applications. More importantly, the state cannot be fully observed in many situations, and the data acquisition step often introduces noise.

The issues raised by noisy and scarce datasets are not new in the machine learning domain. There is, for instance, a long history of research on generative models and on how to represent high-dimensional datasets in a compressed mathematical model. However, in the context of Physics, we can leverage important properties such as symmetries and invariances to address these challenges.

5.1.2 Training algorithm to enforce physical properties

In some cases, a mathematical model of the system at hand is available, for instance dynamical systems such as the Lorenz (63 and 93) attractors, or the Kuramoto-Sivashinsky and Kardar-Parisi-Zhang equations. With these case studies, this step includes defining the physical properties we want to introduce in the machine learning models.

Two approaches can be considered:

  • Physical regularization: the loss function optimized during the training process can be augmented with tailored regularization terms. As an example, optimal transport-based (OT) loss definitions are often more relevant for physical systems featuring significant structure. This will be made computationally tractable with the convolutional Wasserstein flavor of OT, e.g., see this paper.
  • Adversarial training: the second approach relies on the recent adversarial learning trend to guide the model during the training process toward solutions that exhibit the desired properties. Early efforts are reported in this paper, where the solution and the test functions in the weak formulation of high-dimensional linear and nonlinear PDE problems are parameterized as primal and adversarial networks respectively.
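As a rough illustration of the first approach, the sketch below adds a physics penalty to a data-fitting loss. It is only a toy under stated assumptions: the known model is taken to be the decay equation du/dt = -k u, the penalty is a finite-difference residual rather than the optimal-transport loss mentioned above, and all function names and values are hypothetical.

```python
def data_loss(predicted, observed):
    """Mean squared error on observed time steps (None marks a missing observation)."""
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def physics_residual(predicted, dt, decay_rate):
    """Mean squared violation of du/dt = -decay_rate * u, using forward differences."""
    residuals = [
        (predicted[i + 1] - predicted[i]) / dt + decay_rate * predicted[i]
        for i in range(len(predicted) - 1)
    ]
    return sum(r ** 2 for r in residuals) / len(residuals)


def regularized_loss(predicted, observed, dt, decay_rate, weight):
    """Data term plus a weighted physics term."""
    return data_loss(predicted, observed) + weight * physics_residual(
        predicted, dt, decay_rate
    )


dt, decay_rate = 0.1, 1.0

# Trajectory that follows the discretized dynamics u[i+1] = u[i] * (1 - dt * k).
consistent = [(1.0 - dt * decay_rate) ** i for i in range(5)]

# Partial observations: only the first and last time steps are measured.
observed = [consistent[0], None, None, None, consistent[4]]

# Trajectory that fits the observations but ignores the dynamics in between.
inconsistent = [consistent[0], 2.0, 2.0, 2.0, consistent[4]]

loss_consistent = regularized_loss(consistent, observed, dt, decay_rate, weight=1.0)
loss_inconsistent = regularized_loss(inconsistent, observed, dt, decay_rate, weight=1.0)
```

Both trajectories fit the two available observations exactly, so the data term alone cannot separate them; only the physics term penalizes the second one. This is the situation of scarce and partial observations in which such regularization is expected to help.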

5.1.3 Neural Ordinary Differential Equations

The relationship between neural networks and differential equations has been studied in several recent works (Lu et al., 2018; Chen et al., 2018). In particular, the very efficient neural architecture ResNet (or Residual Network) can be interpreted as a discretized ordinary differential equation. This kind of architecture leads to a very large number of parameters. Hence, while the idea is really appealing in our context, architectures like ResNet are suited to applications beyond our scope, where data availability is not an issue. Pushing the discretization step towards its limit of zero, along with parameter tying, has given rise to a new family of models called Neural Ordinary Differential Equations (or Neural ODEs). In these recent papers and their extension (Dupont et al., 2019), the experimental setup mainly relies on conventional datasets used in image classification (MNIST or CIFAR10). Preliminary work on this new type of neural network has demonstrated its parameter efficiency for supervised learning tasks, which can be of great importance in our case.
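The link between residual blocks and discretized ODEs can be sketched in a few lines. In the toy example below, each "block" applies the update x <- x + h * f(x, theta) with tied parameters, which is one forward-Euler step for dx/dt = f(x, theta). The vector field f is a hypothetical linear decay chosen so that the exact solution is known; actual Neural ODE implementations rely on more elaborate solvers and training procedures, which are not shown here.

```python
import math


def vector_field(x, theta):
    """Hypothetical dynamics f(x, theta); here a simple linear decay."""
    return -theta * x


def residual_network(x0, theta, depth, total_time=1.0):
    """Apply `depth` residual blocks x <- x + h * f(x, theta) with shared theta.

    Each block is one forward-Euler step of size h = total_time / depth
    for the ODE dx/dt = f(x, theta).
    """
    h = total_time / depth
    x = x0
    for _ in range(depth):
        x = x + h * vector_field(x, theta)
    return x


# Exact solution of dx/dt = -x at t = 1, starting from x(0) = 1.
exact = math.exp(-1.0)

shallow = residual_network(1.0, theta=1.0, depth=4)
deep = residual_network(1.0, theta=1.0, depth=1000)

error_shallow = abs(shallow - exact)
error_deep = abs(deep - exact)
```

With tied parameters, increasing the depth does not add parameters: it only shrinks the step h, and the output of the stacked blocks approaches the solution of the continuous ODE.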

5.2 Application and contacts

Applications can be sent electronically and should include a cover letter, a full CV and, if possible, references. A first round of interviews will start in the first week of June 2021, so please submit as soon as possible. Feel free to contact us if you have questions about the topic or the position.

Alexandre Allauzen: alexandre.allauzen@dauphine.psl.eu

Sergio Chibbaro: sergio.chibbaro@sorbonne-universite.fr

5.3 Some References

"When deep learning meets ergodic theory", M.A. Bucci, O. Semeraro, S. Chibbaro, A. Allauzen, L. Mathelin. https://hal.archives-ouvertes.fr/LIMSI/hal-03101431v1

"Control of chaotic systems by deep reinforcement learning", M.A. Bucci, O. Semeraro, S. Chibbaro, A. Allauzen, et al. https://hal.archives-ouvertes.fr/LIMSI/hal-02406677v1

"Hamiltonian Neural Networks", Samuel Greydanus, Misko Dzamba, Jason Yosinski, NeurIPS 2019 proceedings. https://papers.nips.cc/paper/2019/hash/26cd8ecadce0d4efd6cc8a8725cbd1f8-Abstract.html

6 PhD: Stability and robustness of vision Transformers (CLOSED)

This is a 3-year PhD position funded by Foxstream, a software company (since 2004) specialized in real-time automated video content analysis. The PhD thesis is a collaboration with Paris-Dauphine University (the MILES team of the LAMSADE) under joint supervision (Quentin Barthélemy from Foxstream and Alexandre Allauzen from MILES). The PhD student will be located at Paris-Dauphine University and will work in close collaboration with Foxstream.

Over the last couple of decades, Deep Learning (DL) has given a huge boost to the already rapidly developing field of computer vision. While DL is the most successful approach for some kinds of data and tasks, this is not the case for all applications. For instance, the analysis of video streams generated by thermal cameras is still a research challenge because of the long-range perimeter, the depth of focus and the associated geometrical issues, along with the frequent calibration changes. Therefore, the stability and robustness of DL models must be better characterized and improved.

Very recently, Transformer architectures have achieved state-of-the-art performance in many domains, from natural language processing to computer vision. In this thesis, we will explore the use of Transformers for videos generated by thermal cameras, and study their properties.

From both theoretical and applied perspectives, the goals are to explore the stability of such architectures, their robustness against adversarial examples, and what kinds of invariances and symmetries they can capture.

Requirements:

  • Outstanding master's degree (or an equivalent university degree) in computer science or another related discipline (e.g. mathematics, information sciences, computer engineering, etc.).
  • Proficiency in machine learning, computer vision, or signal processing.
  • Fluency in spoken and written English is required.

Application: To apply, please email alexandre.allauzen [at] dauphine.psl.eu with:

  • a curriculum vitae, with the contact details of two or more referees
  • a cover letter
  • a sample of research work (e.g. master's thesis and/or published papers)
  • a transcript of grades

References:

Caron et al., Emerging Properties in Self-Supervised Vision Transformers, arXiv, 2021. https://arxiv.org/abs/2104.14294

Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, arXiv, 2020. https://arxiv.org/abs/2010.11929

Le, Vial, …, Allauzen et al., FlauBERT: Unsupervised Language Model Pre-training for French, LREC, 2020. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.302.pdf

7 Master internship 2021 (Closed for this spring and summer)

The following topics are proposed for funded internship positions. Most of them can also be extended by a funded PhD position.