DAT450 - Machine learning for natural language processing
Maskininlärning för språkteknologi
Syllabus adopted 2021-02-26 by Head of Programme (or corresponding)
Owner: MPDSC
7.5 Credits
Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail
Education cycle: Second-cycle
Main field of study: Software Engineering
Department: 37 - COMPUTER SCIENCE AND ENGINEERING


Course round 1


Teaching language: English
Application code: 87113
Open for exchange students: No
Block schedule: A
Minimum participants: 10
Maximum participants: 50

Module: 0120 Written and oral assignments, 7.5 c (Grading: TH)

In programmes

MPDSC DATA SCIENCE AND AI, MSC PROGR, Year 2 (elective)
MPDSC DATA SCIENCE AND AI, MSC PROGR, Year 1 (compulsory elective)

Examiner:

Richard Johansson



Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

The course requires at least 7.5 credits of programming, 7.5 credits of probability theory or statistics, and a first course in machine learning, such as DAT340, TDA233, SSY340 or MVE440.

Aim

The course gives an introduction to machine learning models and architectures used in modern natural language processing (NLP) systems.

Learning outcomes (after completion of the course the student should be able to)

Knowledge and understanding:
  • describe the fundamentals of storing textual data for the world's languages,
  • describe the most common types of natural language processing tasks,
  • describe the most common types of machine learning models used in modern natural language processing,
  • explain how text data can be annotated for a natural language processing task where machine learning techniques are used.
Competence and skills:
  • apply software libraries using machine learning for common natural language processing tasks,
  • write the code to implement some machine learning models for natural language processing,
  • apply evaluation methods to assess the quality of natural language processing systems.
Judgement and approach:
  • discuss the advantages and limitations of different machine learning models with respect to a given task in natural language processing,
  • reason about what type of data could be useful when training a model for a given natural language processing task,
  • select the appropriate evaluation methodology for a natural language processing system and motivate this choice,
  • reason about ethical questions pertaining to machine learning based natural language processing systems, such as stereotypes and under-representation.

Content

Rapid developments in machine learning have revolutionized the field of NLP, including commercially important applications such as translation, summarization, and information extraction. However, natural language data exhibit a number of peculiarities that make them more challenging to work with than many other types of data commonly encountered in machine learning: natural language is discrete, structured, and highly ambiguous. It is extremely diverse: not only are there thousands of languages in the world, but within each language there is substantial variation in style and genre. Furthermore, many of the phenomena encountered in language follow long-tail statistical distributions, which makes the production of training data more costly. For these reasons, machine learning architectures for NLP applications tend to be quite different from those used in other fields.

The course covers the following broad areas:
  • Working practically with text data, including fundamental tasks such as tokenization and word counting;
  • probabilistic models for text, such as topic models;
  • overview of the most common types of NLP applications;
  • architectures for representation in NLP models, including word embeddings, convolutional and recurrent neural networks, and attention models;
  • machine learning models for common types of NLP problems, mainly categorization, sequence labeling, structured prediction and generation;
  • approaches to transfer learning in NLP.
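The first topic above, working practically with text data, can be illustrated with a short sketch. This is not course material, only a minimal example assuming a deliberately naive regex tokenizer (modern NLP systems use subword tokenizers such as BPE instead); it also shows the long-tail word-frequency behaviour mentioned in the paragraph above, where a few function words dominate the counts:

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and extract runs of letters/digits -- a simple
    # whitespace-and-punctuation tokenizer for illustration only.
    return re.findall(r"[a-z0-9]+", text.lower())

# A toy corpus; real corpora contain millions of documents.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

counts = Counter()
for doc in corpus:
    counts.update(tokenize(doc))

# A handful of common words ("the") accounts for most tokens,
# while most word types occur only once -- a Zipf-like distribution.
print(counts.most_common(3))
```

Even in this toy corpus, "the" is twice as frequent as any content word, which hints at why rare phenomena require disproportionately large corpora to observe.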

Organisation

Lectures and computer labs

Literature

Course literature will be announced no later than 8 weeks before the start of the course.

Examination including compulsory elements

The course is examined through mandatory assignments submitted as written reports, together with a self-defined project that requires a written report and an oral presentation. Some of the assignments are carried out individually and others in groups, normally of 2-4 students. The project is carried out in groups of 2-4 students.
Late submission of an assignment or the project results in the grade Fail (U), unless special reasons exist. Students who fail an assignment or the project are given the opportunity to submit a new solution on subsequent occasions when the course is given.
A passing grade for the entire course requires at least a passing grade on all assignments and on the project.
To be awarded a higher passing grade for the entire course, the student must in addition achieve a correspondingly higher weighted average of the grades on the assignments and the project.


The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers on educational support due to disability.


Published: Mon 28 Nov 2016.