ORCID
Ameer A. Mousa: https://orcid.org/0009-0007-1537-6528
Mahdi S. Almhanna: https://orcid.org/0000-0002-3144-5358
Article Type
Original Study
Abstract
Fault tolerance is a critical requirement in distributed systems, because node and network failures can cause significant data loss, service disruption, and performance degradation. Traditional fault-tolerance methods often provide only partial solutions: they rely on separate recovery models for errors and for delayed responses, and they incur high resource costs. This paper designs and implements a more robust and efficient fault-tolerant architecture based on the concept of a shared checkpoint replica. In the proposed system, a master server and multiple slave servers collaboratively process client requests, share checkpoint replicas, and recover seamlessly when failures occur. To evaluate the approach, a distributed system for extracting text from images was developed as a testbed. The primary objectives are to minimize fault recovery time, reduce network resource consumption, and maintain request integrity even in the presence of failures. Experimental results show that the shared checkpoint replica mechanism achieves a recovery time of only 1.5 seconds, compared with 9 seconds for conventional checkpointing and 5 seconds for replica-based fault tolerance. It also delivers a throughput of 28.5 requests per second with an average latency of 3.5 milliseconds, meeting the research goal of faster and more reliable distributed processing.
Keywords
Fault tolerance, Checkpoint, Replica, Distributed system
How to Cite This Article
Mousa, Ameer A. and Almhanna, Mahdi S. (2025) "Enhancing Fault Tolerance in Distributed Systems Using Shared Checkpoint Replica Mechanisms," Journal of Intelligent Informatics, Networking, and Cybersecurity: Vol. 1: Iss. 2, Article 1.
Available at: https://jiinc.uobabylon.edu.iq/journal/vol1/iss2/1
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.