Natural Language Processing, IASD
This course is an introduction to Natural Language Processing (with deep-learning methods). The lab sessions use PyTorch (a Python module).
News
- The course starts on the 11th of January: 8:30 in B205 at Dauphine
The resources / drive
Look at this drive for the slides and the material of lab sessions.
python and Notebooks: how to
We will use Python 3 (plus the PyTorch library) and notebooks. If you need to work on your own computer, there are two ways:
- install anaconda 3 on your computer: see this page.
- use Colab with a Google account (the easiest, nothing to install)
To use files stored on your Google Drive, you can add the following to your Colab notebook:
    from google.colab import drive
    drive.mount('/content/gdrive')
    # in my drive, I have a directory "Colab Notebooks"
    # the dataset is uploaded there
    root_path = 'gdrive/My Drive/Colab Notebooks/'
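If the same notebook has to run both on Colab and on your own computer, one option is to pick the data directory at runtime. The following is only a sketch: the helper names `in_colab` and `data_root`, the local directory `data` and the file name `dataset.txt` are assumptions made for illustration, not part of the course material.

```python
import importlib.util
from pathlib import Path


def in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent package "google" is missing or malformed.
        return False


def data_root(local_dir: str = "data") -> Path:
    """Return the directory holding the datasets (hypothetical layout)."""
    if in_colab():
        # Only importable on Colab; mounting asks for authorization once.
        from google.colab import drive

        drive.mount("/content/gdrive")
        return Path("/content/gdrive/My Drive/Colab Notebooks")
    # Outside Colab, fall back to a local directory (assumed name).
    return Path(local_dir)


root_path = data_root()
# "dataset.txt" is a placeholder file name.
print(root_path / "dataset.txt")
```

With this, the rest of the notebook can build every path from `root_path` without caring where it runs.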
If you are not familiar with python notebooks, see this page.
Expected schedule
The course starts on the 11th of January. Sessions are scheduled on Tuesdays, starting at 8:30 in the morning.
11-jan, course: NLP, overview and the main tasks
- For the linguistic part you can refer to https://faculty.washington.edu/ebender/100things-sem_prag.html
- For the NLP basics: https://web.stanford.edu/~jurafsky/slp3/
18-jan, course: Text classification
The basics and a first neural network with word2vec (W2V)
Further and essential readings:
- Natural Language Processing (Almost) from Scratch
- A Neural Probabilistic Language Model
- A Primer on Neural Network Models for Natural Language Processing
For word2vec and fasttext:
25-jan, course: sequence models
- Language modelling
- n-gram language model
- recurrent model
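To make the n-gram item above concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing, in plain Python. The toy corpus, the function names and the choice of add-one smoothing are assumptions made for illustration; the course may use a different estimator.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(sentence, contexts, bigrams, vocab):
    """Per-token perplexity of one sentence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(tokens[:-1], tokens[1:]))
    log_prob = sum(
        math.log(bigram_prob(p, w, contexts, bigrams, vocab)) for p, w in pairs
    )
    return math.exp(-log_prob / len(pairs))


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
contexts, bigrams, vocab = train_bigram(corpus)

print(bigram_prob("the", "cat", contexts, bigrams, vocab))
print(perplexity("the cat sat", contexts, bigrams, vocab))
print(perplexity("sat cat the", contexts, bigrams, vocab))
```

A word order seen in training gets a lower perplexity than the scrambled one. A recurrent model replaces the count table with a learned hidden state, so the context is no longer limited to one previous word.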
Some interesting readings:
1 or 8-feb, course on Advanced models
- Recurrent models, continued: LSTM
- Bi-LSTM, Attention
- Transformer
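As an illustration of the attention item above, here is a small sketch of scaled dot-product attention, the form used in the Transformer. It is written in plain Python so that it runs without any dependency; in the lab sessions the same computation would normally be done on PyTorch tensors. The function names and the toy vectors are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    Returns one output vector and one weight vector per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of the query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is the weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: two keys, two values, two queries.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[4.0, 0.0], [0.0, 4.0]]

outputs, weights = attention(queries, keys, values)
print(weights)
print(outputs)
```

Each query puts most of its weight on the key it is aligned with, so its output is close to the matching value vector. A query of zeros would give uniform weights and return the mean of the values.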
Some readings:
- ELMO paper
- ULMFit paper
- You can also look at the BERT paper and the Transformer paper, even though I found them not very easy to read.
15-feb, course on Representation learning and contrastive estimation
by Matthieu Labeau
22-feb, course on Syntax
by Benoit Crabbé
8-march, course on model probing
by Guillaume Wisniewski
15-march
Lab sessions
Not organized yet!
Two notebooks for two parts (see the drive)
- pytorch 101
- text classification
Further work: text classification with convolution
Evaluation
The evaluation is in two parts. For both, first form your team (typically 3 students).
Reading
The goal is to read an article and to make a presentation (on the 27th of February). A list will be available soon, but you can also propose an article (I must agree beforehand). Select one article per team, then read and analyse it in order to give a clear and concise presentation. Some questions you may use to guide your reading are (among others):
- Did you like the paper? Did you find it interesting? Be honest!
- What are the most important things you learned from the paper? Why are they important?
- Do the lessons learned generalize beyond the specific task? Do they contribute towards building an important system or application?
- Is the experimental setup satisfying? Any experiments missing? Any obvious or important baseline missing?
- Is the problem/approach well motivated?
- Are you convinced by the results? Why?
- Is the writing clear? Is the paper well structured?
The important dates are:
- Form your team and select the paper before the 1st of February
- Report due date: 25th of March
Project
The list of projects is available on the drive, but you can also propose one (I must agree beforehand).
- Team and project registration: before the 1st of February
- Deliverable for the 15th of February: 2 pages (pdf only) to describe the data, the task and your plan
- Deliverable for 8th of March: a github/gitlab repository
- Final deliverable: a report in pdf and the code via the git repository
- Final deadline: 15th of April
Feel free to use the Teams channel to interact with me or with the other groups.