Syllabus for EDA122 - Fault-tolerant computer systems
Syllabus adopted 2015-02-11 by Head of Programme (or corresponding)
Owner: MPCSN
7,5 Credits
Grading: TH - Five, Four, Three, Not passed
Education cycle: Second-cycle
Major subject: Computer Science and Engineering, Information Technology
Department: 37 - COMPUTER SCIENCE AND ENGINEERING


Teaching language: English
Open for exchange students
Block schedule: C

Course module; Credits; Grading; Credit distribution (Sp1, Sp2, Sp3, Sp4, Summer course, No Sp); Examination dates
0107 Examination; 6,0 c; TH; 6,0 c; 27 Oct 2015 am M, 07 Jan 2016 pm M, 18 Aug 2016 pm M
0207 Laboratory; 1,5 c; UG; 1,5 c

In programs

MPSOF SOFTWARE ENGINEERING, MSC PROGR, Year 2 (elective)
MPEES EMBEDDED ELECTRONIC SYSTEM DESIGN, MSC PROGR, Year 1 (compulsory elective)
MPEES EMBEDDED ELECTRONIC SYSTEM DESIGN, MSC PROGR, Year 2 (elective)
MPSYS SYSTEMS, CONTROL AND MECHATRONICS, MSC PROGR, Year 2 (elective)
MPCSN COMPUTER SYSTEMS AND NETWORKS, MSC PROGR, Year 1 (compulsory)
TKITE SOFTWARE ENGINEERING, Year 3 (elective)

Examiner:

Professor  Johan Karlsson


Replaces

DAT270 Dependable computer systems
EDA120 Dependable distributed and embedded systems
EDA121 Fault-tolerant computer systems


Eligibility:


In order to be eligible for a second cycle course the applicant needs to fulfil the general and specific entry requirements of the programme that owns the course. (If the second cycle course is owned by a first cycle programme, second cycle entry requirements apply.)
Exemption from the eligibility requirement: Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling these requirements.

Course specific prerequisites

Students are expected to have basic knowledge in computer organization, programming and probability theory.

Aim

The course gives an introduction to dependable computing with an emphasis on system-level design of fault-tolerant systems. Dependability and fault tolerance are becoming increasingly important in a wide range of computer applications. Examples include safety-critical control systems for road vehicles, airplanes and medical devices, and business-critical systems for e-commerce, financial transactions and factory automation.

Learning outcomes (after completion of the course the student should be able to)

* Formulate dependability requirements for computer systems used in business-, safety- and mission-critical applications. (Learning goal ensured by written exam, lab classes and laboratory report)
* Describe the structure and principles of commonly used system architectures for fault-tolerant computers. (Written exam, lab classes and laboratory report)
* Perform probabilistic dependability analysis of computer systems using fault-trees, reliability block diagrams, time-continuous Markov chains and stochastic Petri nets. (Written exam, lab classes and laboratory report)
* Describe principles and properties of techniques for error detection, error masking and system recovery. (Written exam)
* Master the terminology of dependable computing. (Written exam and laboratory report)
* Describe basic concepts in life-cycle models and standards employed in the development of safety-critical systems. (Written exam)
* Write a technical report of good quality on the topic of dependability analysis of fault-tolerant computer systems. (Laboratory report)

Content

The course deals with design and analysis of fault-tolerant computer systems.
The content can be divided into five areas:

  1. Terminology and definitions: Includes terms such as dependability, reliability, maintainability, availability and safety, taxonomies for dependable systems, fault and failure models, etc.
  2. Design techniques for error detection and fault tolerance. Hardware redundancy: triple modular redundancy (TMR), dual modular redundancy (DMR), hot and cold standby systems, hybrid redundancy, forward and backward recovery, etc. Software redundancy: N-version programming, recovery blocks and run-time assertions. Time redundancy: Methods for detecting and tolerating transient faults. Fault tolerance in distributed systems: time-triggered systems, Byzantine failures.
  3. Dependability analysis of computer systems: Reliability block diagrams, fault-trees, Markov chain models, failure mode and effects analysis (FMEA), fault tree analysis (FTA), etc. Includes two laboratory classes in which Markov chain models and stochastic Petri nets are used to analyze fault-tolerant systems. The analysis is done using a special computer program. Students are required to document the results of one laboratory class in a technical report.
  4. Development processes: lifecycle models, hazard analysis, risk analysis, safety case, the IEC 61508 and ISO 26262 standards, etc.
  5. System examples: Fault-tolerant systems from areas such as space, aviation, automotive, telecommunication and transaction processing are described, some by guest lecturers from industry.
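
As an illustration of the kind of analysis named in areas 2 and 3, the following is a minimal sketch, not course material. It assumes independent module failures, a constant failure rate per module and a perfect voter; all function names are hypothetical and chosen only for this example. It shows a bitwise 2-of-3 majority voter, the reliability of a TMR system, and the series and parallel rules used in reliability block diagrams.

```python
import math


def simplex_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for one module with a constant failure rate (assumption)."""
    return math.exp(-failure_rate * t)


def tmr_reliability(r: float) -> float:
    """Probability that at least 2 of 3 independent modules work.

    Assumes a perfect voter: R_TMR = 3*R^2 - 2*R^3.
    """
    return 3 * r**2 - 2 * r**3


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit equals the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def series(*reliabilities: float) -> float:
    """Series block diagram: the system works only if every block works."""
    return math.prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """Parallel block diagram: the system fails only if every block fails."""
    return 1 - math.prod(1 - r for r in reliabilities)


if __name__ == "__main__":
    r = 0.9
    print("simplex:", r)
    print("TMR:", tmr_reliability(r))
    print("two in series:", series(r, r))
    print("two in parallel:", parallel(r, r))
    # One faulty module (third argument) is masked by the other two.
    print("voted output:", bin(majority_vote(0b1010, 0b1010, 0b0110)))
```

With a module reliability of 0.9, the TMR expression gives about 0.972. At a module reliability of 0.5 it gives exactly 0.5, and below that point TMR is less reliable than a single module, which is why the mission time matters when choosing a redundancy scheme.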

Organisation

Lectures, exercises and two laboratory classes.

Literature

Neil Storey, Safety-Critical Computer Systems, Prentice Hall, ISBN 0-201-42787-7. Compendium, reprints of articles, compendium of exercises.

Examination

Written exam. Compulsory participation in two laboratory classes. One laboratory report.


Published: Thu 04 Feb 2021.