PLEASE NOTE: This project is dead. The website is kept online for historical reasons.

Terms and Definitions for Database Replication

Data Partitioning

A common distinction is vertical vs. horizontal partitioning. Vertical partitioning splits a table into multiple tables with fewer columns, much like the process of normalization does. When partitioning horizontally, the database system stores different rows in different tables.
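The two directions of splitting can be illustrated with a minimal Python sketch. It is not taken from any database system: the toy table, its column names and the helper functions vertical_partition and horizontal_partition are all hypothetical and exist only to show the idea.

```python
# Toy table: a list of rows, each a dict. All names here are illustrative assumptions.
customers = [
    {"id": 1, "name": "Alice", "country": "CH", "notes": "prefers email"},
    {"id": 2, "name": "Bob", "country": "DE", "notes": "VIP"},
    {"id": 3, "name": "Carol", "country": "CH", "notes": ""},
]


def vertical_partition(rows, key, column_groups):
    """Split rows into several tables with fewer columns; each part keeps the key."""
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


def horizontal_partition(rows, part_of):
    """Split rows into several tables with the same columns; part_of(row) names the part."""
    parts = {}
    for row in rows:
        parts.setdefault(part_of(row), []).append(row)
    return parts


# Vertical: same rows, fewer columns per table (joined back together via "id").
core, extra = vertical_partition(customers, "id", [["name", "country"], ["notes"]])

# Horizontal: same columns, different rows per table.
by_country = horizontal_partition(customers, lambda row: row["country"])
```

Here core and extra each contain all three rows but only some of the columns, while by_country holds complete rows split into one table per country.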

In a multi-master system, both methods can be used to split data across multiple nodes, so that every node holds only part of the complete database. This decreases reliability, as fewer nodes store the same tuple, and it requires some sort of distributed querying to reach all the data a transaction needs. In return, it increases the total capacity of the cluster and reduces the total load caused by writing transactions.
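The trade-off can be made concrete with a small sketch. It assumes integer keys and a simple modulo placement scheme, which is only one of many possible schemes and not the method of any particular replication system; the functions nodes_for and nodes_to_contact are hypothetical.

```python
def nodes_for(key, node_count, copies):
    """Return the nodes that hold the tuple with integer `key` (modulo placement, an assumption)."""
    first = key % node_count
    return [(first + i) % node_count for i in range(copies)]


def nodes_to_contact(keys, node_count, copies):
    """Nodes a transaction has to reach if it reads each key from its first holder."""
    return sorted({nodes_for(key, node_count, copies)[0] for key in keys})


NODES = 4

# Full replication: every node stores the tuple, so any single node can serve it,
# but every write has to be applied on all 4 nodes.
full = nodes_for(10, NODES, copies=NODES)

# Partitioned: the tuple lives on only 2 of the 4 nodes. A write touches fewer
# nodes, but fewer nodes can fail before the tuple becomes unavailable.
partial = nodes_for(10, NODES, copies=2)

# A transaction reading several tuples may need data from several nodes,
# which is where distributed querying comes in.
touched = nodes_to_contact([10, 11, 13], NODES, copies=2)
```

With these assumptions, partial lists two nodes instead of four, and touched shows that the three-key transaction has to reach three different nodes.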

Shared-disk vs. Shared-nothing Clusters

A shared-disk cluster is a set of nodes which share a common disk subsystem but otherwise have independent hardware. While that term makes sense, "shared-nothing" often confuses people. It means that the nodes of a cluster share nothing, not even the disks. Of course, the nodes of both types are commonly connected via some type of packet-switching network. While shared-nothing clusters mostly consist of commodity hardware, shared-disk clusters often use specialized and expensive storage systems.

Shared-disk systems are a good base for single-master replication solutions with failover capability. Using a clustered filesystem in a shared-nothing environment can be an inexpensive alternative.

It's a common misconception that a shared-disk cluster would allow a faster eager multi-master replication system than a shared-nothing one. The reasoning is that less data needs to be transferred. But that's not where the problem is: it's not the network throughput but the latency that matters. In other words: doing conflict detection using CPU, memory and a network is faster than doing it using CPU, memory and shared disks.

Clustering

While clustering is a very popular term, it does not have a well-defined meaning with regard to database systems. Lots of different techniques, such as replication, load balancing and distributed querying, are called clustering here and there.

Grids

With regard to database replication, the very same applies to "grids", only more so. It's a plain marketing term, IMO.