Natural Language Processing, IASD


This course is an introduction to Natural Language Processing with deep-learning methods. The lab sessions use PyTorch (a Python library).

News

  • The course starts on the 11th of January: 8:30 in room B205 at Dauphine.

The resources / drive

See this drive for the slides and the lab session material.

Python and notebooks: how-to

We will use Python 3 (with the PyTorch library) and notebooks. If you want to work on your own computer, there are two options:

  • install Anaconda 3 on your computer (see this page);
  • use Colab with a Google account (the easiest option, nothing to install).

To use files stored on your Google Drive, add the following to your Colab notebook:

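from google.colab import drive

# mount your Google Drive under /content/gdrive (Colab asks for authorization)
drive.mount('/content/gdrive')
# in my drive, I have a directory "Colab Notebooks"
# the dataset is uploaded there
# use an absolute path so it works whatever the current directory is
# (recent Colab versions may name the root folder 'MyDrive' instead of 'My Drive')
root_path = '/content/gdrive/My Drive/Colab Notebooks/'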

If you are not familiar with Python notebooks, see this page.

Expected schedule

It starts in January 2020 (the 16th). Courses are scheduled on Tuesdays, starting at 8:30 in the morning.

11-jan, course: NLP, overview and the main tasks

18-jan, course: Text classification

25-jan, course: sequence models

1 or 8-feb, course on advanced models

  • The end of recurrent models, with LSTM
  • Bi-LSTM, attention
  • Transformer

Some readings:

  • ELMo paper
  • ULMFiT paper
  • You can also look at the BERT and Transformer papers, even though I do not find them easy to read.
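The attention mechanism at the heart of the Transformer can be written as softmax(Q K^T / sqrt(d_k)) V. Below is a minimal sketch of this scaled dot-product attention in plain Python. It is for illustration only: the lab sessions use PyTorch, and the function names here are our own, not a library API. Each query row is compared with every key, the scores are normalized by a softmax, and the output is the corresponding weighted average of the value rows.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # subtract the max before exponentiating, for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # similarity between this query and every key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        # weighted average of the value vectors
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    # the query is closer to the first key, so the first value gets more weight
    print(scaled_dot_product_attention(q, k, v))
```

In a Transformer, Q, K and V are linear projections of the token representations, and several such "heads" run in parallel. The sketch leaves out those projections and masking.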

15-feb, course on Representation learning and contrastive estimation

by Matthieu Labeau
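To give a concrete idea of contrastive estimation, here is a sketch of one common instance: the skip-gram negative-sampling loss used to learn word embeddings. The lecture may present a different formulation (for example noise contrastive estimation proper), and the function names are hypothetical. The loss rewards a high score for the observed (center, context) pair and low scores for randomly sampled "negative" words: loss = -log σ(c·o) - Σ_k log σ(-c·n_k).

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x: float) -> float:
    # numerically stable log(sigmoid(x))
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(center: Sequence[float],
                           context: Sequence[float],
                           negatives: List[Sequence[float]]) -> float:
    # pull the observed (center, context) pair together ...
    loss = -log_sigmoid(dot(center, context))
    # ... and push each sampled noise word away from the center word
    for neg in negatives:
        loss -= log_sigmoid(-dot(center, neg))
    return loss


if __name__ == "__main__":
    center = [1.0, 0.0]
    good = negative_sampling_loss(center, [2.0, 0.0], [[-2.0, 0.0]])
    bad = negative_sampling_loss(center, [-2.0, 0.0], [[2.0, 0.0]])
    print(good, bad)  # the well-separated configuration has the lower loss
```

The appeal of this kind of objective is that it avoids normalizing over the whole vocabulary: each update only touches the true context word and a handful of sampled negatives.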

22-feb, course on syntax!

by Benoit Crabbé

8-march, course on model probing

by Guillaume Wisniewski

15-march

Lab sessions

Not organized yet!

Two notebooks, one for each part (see the drive):

  • PyTorch 101
  • text classification

Further work: text classification with convolutional networks
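As a rough illustration of what a convolutional text classifier computes, here is a plain-Python sketch (the lab itself uses PyTorch). The toy embeddings, filter and function names are hypothetical. Each filter slides over windows of consecutive word embeddings, a ReLU is applied, and max-over-time pooling keeps the strongest response. The resulting features feed a linear layer that produces one score per class.

```python
from typing import Dict, List

Vector = List[float]


def conv1d_max_pool(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """One feature per filter: ReLU(filter . window), max over all window positions.

    A filter is a list of `width` weight vectors. If the sentence is shorter
    than the filter, the feature stays at 0.0.
    """
    features = []
    for f in filters:
        width = len(f)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            score = sum(w_j * x_j
                        for w_vec, x_vec in zip(f, window)
                        for w_j, x_j in zip(w_vec, x_vec))
            best = max(best, max(0.0, score))
        features.append(best)
    return features


def classify(tokens: List[str], emb: Dict[str, Vector], filters: List[List[Vector]],
             weights: List[Vector], bias: Vector) -> Vector:
    """Return one logit per class (linear layer on top of the pooled features)."""
    x = [emb[t] for t in tokens if t in emb]  # unknown words are simply dropped here
    feats = conv1d_max_pool(x, filters)
    return [sum(w * h for w, h in zip(row, feats)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    # hand-picked 2-d embeddings and one width-2 filter that fires on "not good"
    emb = {"not": [1.0, 0.0], "good": [0.0, 1.0], "movie": [0.0, 0.0]}
    filters = [[[1.0, 0.0], [0.0, 1.0]]]
    weights, bias = [[1.0], [-1.0]], [0.0, 0.5]  # class 0 = negative, class 1 = positive
    print(classify("not good movie".split(), emb, filters, weights, bias))  # [2.0, -1.5]
    print(classify("good movie".split(), emb, filters, weights, bias))      # [0.0, 0.5]
```

Max-over-time pooling makes the feature vector independent of sentence length, which is why a plain linear layer can follow. In practice, the embeddings, filters (usually several widths) and output layer are all learned by gradient descent.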

Evaluation

The evaluation has two parts. For both, first form your team (typically 3 students).

Reading

The goal is to read an article and give a presentation (on 27-feb). A list will be available soon, but you can also propose a paper (I must approve it beforehand). Each team selects one article, reads and analyses it, and gives a clear and concise presentation. Some questions you may use to guide your reading (among others):

  • Did you like the paper? Did you find it interesting? Be honest!
  • What are the most important things you learned from the paper? Why are they important?
  • Do the lessons learned generalize beyond the specific task? Do they contribute towards building an important system or application?
  • Is the experimental setup satisfying? Any experiments missing? Any obvious or important baseline missing?
  • Is the problem/approach well motivated?
  • Are you convinced by the results? Why?
  • Is the writing clear? Is the paper well structured?

The important dates are:

  • Form your team and select the paper before 1-feb
  • Report due date: 25th of March

Project

The list of projects is available on the drive, but you can also propose your own (I must approve it beforehand).

  • Team and project registration: before 1-feb
  • Deliverable for the 15th of February: 2 pages (PDF only) describing the data, the task and your plan
  • Deliverable for the 8th of March: a GitHub/GitLab repository
  • Final deliverable: a report in PDF and the code via the git repository
  • Final deadline: 15th of April

Feel free to use the Teams channel to interact with me or with the other groups.