DAT470 - Computational techniques for large-scale data  
Beräkningsmetoder för storskaliga data
 
Syllabus adopted 2021-02-17 by Head of Programme (or corresponding)
Owner: MPDSC
7.5 credits
Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail
Education cycle: Second-cycle
Major subject: Computer Science and Engineering, Software Engineering
Department: 37 - COMPUTER SCIENCE AND ENGINEERING


Teaching language: English
Application code: 87121
Open for exchange students: No
Block schedule: A
Maximum participants: 100
Status, available places (updated regularly): Yes

Module   Credits   Grading
0121 Written and oral assignments   4.5 credits   UG
0221 Examination   3.0 credits   TH

In programmes

MPALG COMPUTER SCIENCE - ALGORITHMS, LANGUAGES AND LOGIC, MSC PROGR, Year 1 (elective)
MPDSC DATA SCIENCE AND AI, MSC PROGR, Year 1 (compulsory elective)

Examiner:

Alexander Schliep



Course specific prerequisites

To be eligible for the course, the student should have a Bachelor's degree in any subject, or have successfully completed 90 credits of studies in computer science, software engineering, or equivalent. Specifically, at least 15 credits of successfully completed courses in programming, including at least 7.5 credits in Python programming, or equivalent, are required. The student must also have successfully completed a course in probability theory or statistics, for example MVE051, TMS137, or similar.

The course cannot be included in a degree which contains DAT345 or DAT346. Neither can the course be included in a degree which is based on another degree in which the course DAT345 or DAT346 is included.

Aim

The advent of big data has led to the development of new programming paradigms, in particular for parallel systems that allow computation with big data on redundant clusters of commodity computers. This course provides an introduction to different programming paradigms, e.g. MapReduce and its extensions, which facilitate computations on terabytes of data. It also demonstrates that, for specific tasks, suitable algorithms and data structures can provide highly efficient alternatives to parallelization.
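As a rough illustration of the MapReduce paradigm mentioned above, the following minimal sketch counts words in plain Python. It is not taken from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs sequentially in one process, whereas a real framework would distribute the map and reduce tasks across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as a framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document could be mapped on a different worker; here we loop.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    # Each key could likewise be reduced on a different worker.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data on clusters", "big clusters of commodity computers"]
    print(word_count(docs))
```

Because the map step looks at one record at a time and the reduce step looks at one key at a time, both can in principle be spread over many machines without changing the program logic.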

Learning outcomes (after completion of the course the student should be able to)

Learning objectives
  • discuss important technological aspects when designing and implementing analysis solutions for large-scale data,
  • explain differences between parallel programming models,
  • describe data structures and algorithms for big data and discuss their utility

Skills and abilities
  • implement applications for transforming and analyzing large-scale data with different parallel software frameworks,
  • use algorithms and data structures for computations with large-scale data

Judgement ability and approach
  • suggest appropriate computational infrastructures and methodological approaches for analysis tasks and discuss their advantages and drawbacks,
  • discuss advantages and drawbacks of different strategies of parallelization,
  • decide between algorithmic and parallelization-based approaches for accelerating computational workloads


Content

The aim of this course is to deepen the students’ knowledge and skills and to familiarize them with the technical and technological side of data science, including both software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

In particular, the course will include:
  • an overview of computer architectures, algorithmic approaches, and high-performance computing infrastructures with a focus on limitations for processing large-scale data,
  • an introduction to relevant frameworks for cluster computing with large-scale data,
  • implementation of data analysis tools on a cluster using Python and appropriate software frameworks,
  • data structures and algorithms, such as index structures, which can greatly accelerate computations with large-scale data
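To make the last point concrete, here is a small sketch of how an index structure can replace repeated full scans. It is an assumption-laden illustration rather than course material: `SortedIndex` and `count_in_range_scan` are hypothetical names, and a sorted list queried with binary search stands in for the more elaborate index structures used on large-scale data.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def count_in_range_scan(values: Sequence[float], lo: float, hi: float) -> int:
    """Baseline: inspect every value, O(n) work per query."""
    return sum(1 for v in values if lo <= v <= hi)


class SortedIndex:
    """A sorted copy of the data, built once in O(n log n)."""

    def __init__(self, values: Sequence[float]) -> None:
        self._sorted: List[float] = sorted(values)

    def count_in_range(self, lo: float, hi: float) -> int:
        """Count values v with lo <= v <= hi using two binary searches."""
        left = bisect_left(self._sorted, lo)    # first position with v >= lo
        right = bisect_right(self._sorted, hi)  # first position with v > hi
        return max(0, right - left)             # empty range if lo > hi


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7, 3]
    index = SortedIndex(data)
    # Both approaches agree; the index answers each query in O(log n).
    print(count_in_range_scan(data, 3, 7), index.count_in_range(3, 7))
```

The one-off cost of building the index pays off when many queries are run against the same data, which is the kind of trade-off between algorithmic and parallelization-based acceleration that the learning outcomes refer to.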

Organisation

Lectures, computer lab sessions, and exercise sessions.

Literature

Course literature will be announced no later than 8 weeks before the start of the course.

Examination including compulsory elements

The course is examined by a written hall examination and by mandatory written assignments. Some assignments are carried out individually, others in groups of normally 2-4 students.
There will also be non-obligatory individual assignments which grant bonus points for the written examination. These bonus points remain valid for the next two scheduled re-examinations.


The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers on educational support due to disability.


Published: Mon 28 Nov 2016.