PLEASE NOTE: This project is dead. The website is kept online for historical reasons.

Terms and Definitions for Database Replication (1)

These terms and definitions are meant to make it easier to talk about replication. I compiled them after many discussions, mostly with other Postgres developers, but the definitions are intended to be universally usable and should apply to other RDBMSes as well.

Replication

For database systems, replication is the process of sharing transactional data to ensure consistency between redundant database nodes. This improves fault tolerance, which in turn makes the overall system more reliable. Replication systems for databases can also be called distributed databases, especially when combined with the other nice features described below. I tend to count only eager multi-master systems as distributed databases, because anything less compromises the ACID guarantees and isn't transparent to the application.

Load Balancing

Replication is often coupled with load balancing to improve read performance. Some replication solutions have an integrated load balancer that knows about the underlying system. Others rely on OS-dependent or third-party load balancers.
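To illustrate what "knowing about the underlying system" can mean, here is a minimal sketch of a load balancer that is aware of which node accepts writes. It assumes a single writable node plus read-only replicas, which is only one possible setup. The class name `ReadWriteRouter`, the node names, and the rule "anything that starts with SELECT is a read" are all hypothetical and chosen for illustration.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if there are no replicas to read from.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Naive classification (an assumption): only statements whose first
        # word is SELECT are treated as reads.
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._readers)  # round-robin over the replicas
        return self.primary


router = ReadWriteRouter("node-a", ["node-b", "node-c"])
targets = [
    router.route(query)
    for query in (
        "SELECT * FROM accounts",
        "UPDATE accounts SET balance = 0",
        "select 1",
        "SELECT 2",
    )
]
print(targets)
```

A real balancer would need a better classifier than this one: a SELECT that takes row locks or calls a function with side effects would be misrouted by the rule above.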

Replication Methods

Databases can be kept coherent in many different ways. A very common and simple approach is statement-based replication, where SQL statements are distributed between the nodes. Non-deterministic functions, like now() or random(), are a problem for this method, because each node may evaluate them to a different result. Another very common method is log shipping. Unfortunately, the database system's log is often not designed as an exchange format for replication, which makes it hard to replicate only parts of a database. Some replication solutions therefore use their own binary format, designed specifically for replication.
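The problem with non-deterministic functions can be reproduced in a few lines. The sketch below uses two in-memory SQLite databases as stand-ins for replication nodes, which is an assumption made purely for convenience. The helpers `ship_statement` and `ship_rows` are hypothetical, and `ship_rows` only handles single-row INSERTs into the example table. It stands in for any method that ships the resulting data instead of the SQL text.

```python
import sqlite3

INSERT_RANDOM = "INSERT INTO tokens (value) VALUES (random())"


def make_node():
    """Create an in-memory SQLite database that plays the role of one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, value INTEGER)")
    return conn


def ship_statement(nodes, sql):
    """Statement-based: every node executes the same SQL text itself."""
    for node in nodes:
        node.execute(sql)


def ship_rows(origin, replicas, sql):
    """Data-based: only the origin evaluates the SQL; replicas get the row."""
    cursor = origin.execute(sql)
    row = origin.execute(
        "SELECT id, value FROM tokens WHERE id = ?", (cursor.lastrowid,)
    ).fetchone()
    for replica in replicas:
        replica.execute("INSERT INTO tokens (id, value) VALUES (?, ?)", row)


def contents(node):
    return node.execute("SELECT id, value FROM tokens ORDER BY id").fetchall()


# Statement-based: each node calls random() on its own and gets its own values.
a, b = make_node(), make_node()
for _ in range(3):
    ship_statement([a, b], INSERT_RANDOM)
statement_diverged = contents(a) != contents(b)

# Data-based: random() is evaluated once, on the origin only.
c, d = make_node(), make_node()
for _ in range(3):
    ship_rows(c, [d], INSERT_RANDOM)
rows_match = contents(c) == contents(d)

print("statement-based nodes diverged:", statement_diverged)
print("row-shipping nodes match:", rows_match)
```

With statement shipping, the two nodes end up with the same ids but different values. With row shipping they hold identical rows, at the cost of sending data rather than short SQL text.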

Divergence

Keeping data coherent across multiple nodes is quite expensive in terms of network latency. Many systems therefore try to avoid network delays by allowing the nodes to diverge slightly, meaning they allow conflicting transactions to commit. To return to a coherent and consistent database, those conflicts need to be resolved, either automatically or manually. Such conflicts violate the ACID properties of the database system, so the application needs to be aware of them.
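As a rough sketch of divergence and automatic conflict resolution, the following toy model lets two nodes commit locally and reconcile later. The resolution policy used here is "last writer wins", which is just one common choice and not something the definitions above prescribe. The `Node` class, the `reconcile` function, and the explicit timestamps are hypothetical. The model also assumes that every value on a node arrived through `commit` or `reconcile`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)     # key -> (version, value)
    pending: list = field(default_factory=list)  # writes not yet sent to peers

    def commit(self, key, value, timestamp):
        # Commit locally without consulting the other nodes first.
        version = (timestamp, self.name)  # the node name breaks timestamp ties
        self.data[key] = (version, value)
        self.pending.append((key, version, value))


def reconcile(nodes):
    """Exchange pending writes; on conflict the highest version wins.

    Returns the writes that had committed on some node but were overwritten.
    """
    writes = [write for node in nodes for write in node.pending]
    for node in nodes:
        for key, version, value in writes:
            current = node.data.get(key)
            if current is None or version > current[0]:
                node.data[key] = (version, value)
        node.pending.clear()
    return [
        (key, value)
        for key, version, value in writes
        if nodes[0].data[key][0] != version
    ]


a, b = Node("a"), Node("b")
a.commit("balance", 100, timestamp=1)
reconcile([a, b])                      # both nodes now agree on 100

a.commit("balance", 80, timestamp=2)   # conflicting transactions: both commit
b.commit("balance", 50, timestamp=3)   # locally before the nodes talk again
lost = reconcile([a, b])

print(a.data["balance"][1], b.data["balance"][1], lost)
```

After the second `reconcile`, both nodes agree again, but the write of 80 has silently disappeared even though node `a` reported it as committed. That is the kind of violation the application has to be prepared for.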