Syllabus for

EDA121 - Fault-tolerant computer systems
 
Owner: DCMAS
4,0 Credits (ECTS 6)
Grading: TH - Five, Four, Three, Not passed
Level: A
Department: 37 - COMPUTER SCIENCE AND ENGINEERING


Teaching language: English

Course module   Credit distribution   Examination dates
Sp1 Sp2 Sp3 Sp4 No Sp
0103 Examination 3,0 c Grading: TH   3,0 c   26 Oct 2006 am M,  13 Jan 2007 pm V,  27 Aug 2007 am M
0203 Laboratory 1,0 c Grading: UG   1,0 c    

In programmes

DCMAS MSc PROGR IN DEPENDABLE COMPUTER SYSTEMS, Year 1 (compulsory)
TELTA ELECTRICAL ENGINEERING, Year 4 (elective)
TDATA COMPUTER SCIENCE AND ENGINEERING - Computer security, Year 4 (elective)
TDATA COMPUTER SCIENCE AND ENGINEERING - Engineering of Computer-Based Systems, Year 4 (elective)
TDATA COMPUTER SCIENCE AND ENGINEERING - Embedded computer systems engineering, Year 4 (elective)
TTFYA ENGINEERING PHYSICS, Year 4 (elective)
TAUTA AUTOMATION AND MECHATRONICS ENGINEERING, Year 4 (elective)
TITEA SOFTWARE ENGINEERING, Year 4 (elective)
TITEA SOFTWARE ENGINEERING, Year 3 (elective)

Examiner:

Professor  Johan Karlsson


Replaces

EDA120   Dependable distributed and embedded systems


Eligibility:

For single-subject courses within Chalmers programmes, the same eligibility requirements apply as for the programme(s) that the course is part of.

Course specific prerequisites

No formal requirements, but participants are expected to have basic knowledge of computer engineering, programming and probability theory.

Aim

Fault-tolerant systems are used in applications that require high dependability, such as safety-critical control systems in vehicles and airplanes, or business-critical systems for e-commerce, automatic teller machines and financial transactions. This is an introductory course that covers basic techniques for design and analysis of fault-tolerant systems, as well as project management and development processes for safety-critical systems.

Goal

After the course the student shall be able to:

  • Formulate requirements for fault-tolerant computer systems used in business-, safety- and mission-critical applications.
  • Design system architectures for fault-tolerant computer systems from a given requirements specification.
  • Perform probabilistic dependability analysis of fault-tolerant computer systems using fault trees, reliability block diagrams and continuous-time Markov chains.
  • Describe the principles and properties of techniques used for error detection, error recovery and error masking in computer systems.
  • Master the terminology of dependable computing and describe the major elements of relevant standards.
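
As a small illustration of the probabilistic analysis named in the goals above, the following Python sketch evaluates simple reliability block diagrams. It is not taken from the course material. It assumes independent components with constant failure rates (exponentially distributed lifetimes) and a perfect voter; the function names and the example failure rate are hypothetical.

```python
import math


def component_reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for a component with constant failure rate."""
    return math.exp(-failure_rate * t)


def series(*reliabilities):
    """Series block: the system works only if every component works."""
    product = 1.0
    for r in reliabilities:
        product *= r
    return product


def parallel(*reliabilities):
    """Parallel block: the system fails only if every component fails."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


def tmr(r):
    """Triple modular redundancy (2-of-3) with a perfect voter: 3r^2 - 2r^3."""
    return 3.0 * r ** 2 - 2.0 * r ** 3


if __name__ == "__main__":
    # Hypothetical numbers: failure rate 1e-4 per hour, mission time 1000 hours.
    r = component_reliability(1e-4, 1000.0)
    print("single component:", r)
    print("two in series:   ", series(r, r))
    print("two in parallel: ", parallel(r, r))
    print("TMR:             ", tmr(r))
```

Note that under these assumptions TMR improves on a single component only while the component reliability is above 0.5; at exactly 0.5 the two are equal.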

Content

The course covers techniques for tolerating hardware and software faults, analysis of fault-tolerant systems, project management and development processes for safety-critical systems.
The content can be divided into five areas:
1. Terminology and definitions: Includes terms such as dependability, reliability, maintainability, availability and safety, taxonomies for dependable systems, fault models, etc.
2. Design techniques for error detection and fault-tolerance: Fault-tolerance is achieved by introducing redundancy in the design. Various redundancy configurations are described. Hardware redundancy: triple modular redundancy (TMR), active redundancy, hot and cold standby systems, hybrid redundancy, etc. Software redundancy: N-version programming, recovery blocks. Information redundancy: error correcting codes and self-checking circuits. Time redundancy: Methods for detecting and tolerating transient and permanent faults. Fault-tolerance in distributed systems: time-triggered systems, forward recovery, backward recovery, checkpointing, domino effect, Byzantine failures, etc.
3. Analysis of fault-tolerant systems: Reliability block diagrams, fault trees, Markov chain models, failure mode and effects analysis (FMEA), failure rate prediction for integrated circuits, fault injection, etc. Includes a laboratory class in which Markov chain models are used to analyse a fault-tolerant system. The analysis is done using a special computer program.
4. Project management and development processes: Competence models, process models, resource balancing, risk analysis, safety case, the IEC 61508 standard, etc.
5. System examples: Fault-tolerant systems from areas such as space, aviation, automotive, telecommunication and transaction processing are described, some in guest lectures from industry.
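
To give a flavour of the Markov chain analysis mentioned in area 3, the following Python sketch models a single repairable unit as a two-state continuous-time Markov chain (up and down). It is an illustration only and does not represent the program used in the laboratory class. The constant failure and repair rates, the function names and the numerical values are all assumptions made for this example.

```python
import math


def availability(failure_rate, repair_rate, t):
    """Closed-form probability of being in the 'up' state at time t.

    Solves dP/dt = -failure_rate * P + repair_rate * (1 - P) with P(0) = 1.
    """
    total = failure_rate + repair_rate
    steady = repair_rate / total
    return steady + (failure_rate / total) * math.exp(-total * t)


def steady_state_availability(failure_rate, repair_rate):
    """Long-run fraction of time the unit is up."""
    return repair_rate / (failure_rate + repair_rate)


def availability_euler(failure_rate, repair_rate, t, steps=10000):
    """Same quantity by forward-Euler integration of the state equation.

    Larger models without a convenient closed form can be approached
    numerically in a similar way.
    """
    p_up = 1.0
    dt = t / steps
    for _ in range(steps):
        p_up += dt * (-failure_rate * p_up + repair_rate * (1.0 - p_up))
    return p_up


if __name__ == "__main__":
    # Hypothetical rates: one failure per 100 h, one repair per 10 h.
    lam, mu = 0.01, 0.1
    print("A(10 h), closed form:", availability(lam, mu, 10.0))
    print("A(10 h), Euler:      ", availability_euler(lam, mu, 10.0))
    print("steady state:        ", steady_state_availability(lam, mu))
```

The closed form and the numerical integration should agree closely, and both approach the steady-state value as t grows.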

Organisation

Lectures, exercises and one laboratory class.

Literature

Neil Storey, Safety-Critical Computer Systems, Prentice Hall, ISBN 0-201-42787-7. Additional material: a compendium, reprints of articles and a compendium of exercises.

Examination

Written exam. Compulsory laboratory class.


Page manager Published: Thu 03 Nov 2022.