Distributed data processing is an effective way to improve the reliability, availability, and performance of a database system. Efficient allocation of fragments requires a balance between costs (storage, processing, and transmission of data), performance (especially response time), and data distribution restrictions. The allocation of fragments is closely related to the replication of data in distributed databases. In addition, we analyzed the cost of fragmentation and replication.
Fragmentation
Fragmentation is useful in a distributed database because applications usually work with only a subset of a relation rather than the whole of it. For data distribution, it is therefore better to use subsets of relations as the unit of distribution (Ariwa & El-Qawasmeh, 2011).
Distributed database design uses fragmentation because it improves efficiency. With fragmentation, a transaction can be divided into several sub-queries that operate on fragments, which increases the degree of parallelism. It is also good for security: data not required by local applications is not stored locally, so it is not available to unauthorized users at that site.
Horizontal fragmentation is well suited to companies that have a nationally or globally distributed database. The data can be fragmented naturally by geographical location: where the branches or warehouses are, where the data will mostly be used, or where most customers are located. A partial list of the benefits of horizontal fragmentation: different applications access or update only portions of classes, so fragmentation reduces the amount of irrelevant data accessed by applications; it allows greater concurrency, because locks can be taken at the granularity of a fragment rather than the whole relation; and it reduces the amount of data transferred when migration is required.
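Geographic horizontal fragmentation can be illustrated with a minimal sketch. The customer relation, its column names, and the region values below are hypothetical and chosen only for illustration; a real system would fragment at the storage layer rather than in application code.

```python
from collections import defaultdict

# Hypothetical customer relation; columns and regions are illustrative only.
customers = [
    {"id": 1, "name": "Ada", "region": "east"},
    {"id": 2, "name": "Ben", "region": "west"},
    {"id": 3, "name": "Cy", "region": "east"},
]


def horizontal_fragment(rows, key):
    """Split rows into disjoint fragments, one per distinct value of `key`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


def reconstruct(fragments):
    """Union of all fragments; recovers the original relation (order aside)."""
    return [row for frag in fragments.values() for row in frag]


fragments = horizontal_fragment(customers, "region")
```

Each fragment holds complete rows, the fragments do not overlap, and their union gives back the original relation, so an application at the "east" site only needs to touch the "east" fragment.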
Allocation
Ray (2009) states that allocation of fragments is a critical performance issue in the context of distributed database design. Before fragments are allocated to the different sites of a distributed system, it is necessary to identify whether the fragments are replicated or not. The allocation of non-replicated fragments can be handled easily with a “best-fit” approach, in which the most cost-effective allocation strategy is selected from several possible alternatives.
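The best-fit idea for a single non-replicated fragment can be sketched as follows. The cost model here (storage cost plus access frequency times per-access transfer cost) is an assumption made for illustration, not the formulation given by Ray (2009); the site names and numbers are likewise hypothetical.

```python
def best_fit_site(access_freq, transfer_cost, storage_cost):
    """Return (site, cost) with the lowest total cost for one fragment.

    access_freq[j]      : accesses to the fragment issued from site j
    transfer_cost[j][k] : cost of one access from site j if the fragment is at k
    storage_cost[k]     : cost of storing the fragment at site k
    """
    best = None
    for k in storage_cost:
        # Assumed cost model: storage at k plus all remote access traffic to k.
        cost = storage_cost[k] + sum(
            freq * transfer_cost[j][k] for j, freq in access_freq.items()
        )
        if best is None or cost < best[1]:
            best = (k, cost)
    return best


# Hypothetical figures for three sites.
access_freq = {"A": 50, "B": 10, "C": 5}
transfer_cost = {
    "A": {"A": 0, "B": 2, "C": 3},
    "B": {"A": 2, "B": 0, "C": 1},
    "C": {"A": 3, "B": 1, "C": 0},
}
storage_cost = {"A": 10, "B": 5, "C": 5}

site, cost = best_fit_site(access_freq, transfer_cost, storage_cost)
```

With these figures the fragment is placed at site A, which issues most of the accesses, even though storage is cheaper at the other two sites.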
Replication
Allocation is harder with replicated fragments: the total number of replicas of a fragment may vary, and it is difficult to design read-only applications because they can access any of several alternative replicas of a fragment at different sites of the distributed system.
According to Ozsu & Valduriez (2011), one benefit of replication is system availability: replicating data removes single points of failure. Replication can also improve performance to some degree by locating data closer to its access points, which removes communication overhead and localizes most accesses, reducing response time. Replication also helps a distributed database scale. As the system grows geographically and in terms of the number of sites, replication offers a way to support this growth with acceptable response times.
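The availability benefit can be made concrete with a small calculation. The sketch below assumes that sites fail independently and that a read succeeds if at least one replica is reachable; both are simplifying assumptions that are not taken from the cited text, and the 0.9 figure is hypothetical.

```python
def read_availability(site_availability, replicas):
    """Probability that at least one of `replicas` independent copies is up."""
    return 1 - (1 - site_availability) ** replicas


# Hypothetical per-site availability of 0.9.
one_copy = read_availability(0.9, 1)
three_copies = read_availability(0.9, 3)
```

Under these assumptions, going from one copy to three raises read availability from about 0.9 to about 0.999. Updates are a different matter, since they must reach every copy.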
For a company with a distributed database, such as a big national retail company, it would be wise to implement both synchronous and asynchronous replication. The required timeliness of the data and the performance of the system should both be kept in mind when deciding how many replicas to keep and where to locate them. I recommend replicating the product catalog and inventory tables synchronously so that customers see exact information on product availability. Asynchronous replication can be used for other replicas, such as order information, customer data, and warehouse data.
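The difference between the two replication modes can be sketched with a toy in-memory model. The class and method names below are hypothetical, and the model ignores failures and concurrency; a real synchronous scheme would typically rely on an atomic commit protocol such as two-phase commit.

```python
from collections import deque


class Replica:
    """A replica site holding its own copy of the table."""

    def __init__(self):
        self.data = {}


class ReplicatedTable:
    """Toy model: one primary copy plus replicas, updated eagerly or lazily."""

    def __init__(self, replicas, synchronous):
        self.primary = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet applied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: every replica is updated before the write returns.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: queue the update and return immediately.
            self.pending.append((key, value))

    def propagate(self):
        """Apply queued updates to all replicas, e.g. on a periodic schedule."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


inventory = ReplicatedTable([Replica(), Replica()], synchronous=True)
orders = ReplicatedTable([Replica()], synchronous=False)

inventory.write("sku-1", 7)         # visible on all replicas at once
orders.write("order-9", "placed")   # visible on replicas only after propagate()
```

The synchronous table pays the propagation cost on every write but never serves stale stock levels; the asynchronous table returns faster and tolerates replicas that lag behind the primary for a while.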
Since every update must be propagated to each copy, having more replicas adversely affects the performance of the system. I therefore recommend keeping two to three replicas of customer profiles and data warehouse information in data centers close to their physical locations.
References
- Ariwa, E. & El-Qawasmeh, E. (2011). Digital Enterprise and Informational Conference, DEIS 2011. London, UK, July 2011 Proceedings. Springer. New York, NY.
- Ozsu, M. T. & Valduriez, P. (2011). Principles of Distributed Database Systems, 3rd Edition. Springer. New York, NY.
- Ray (2009). Distributed Database Systems. Dorling Kindersley (India) Pvt. Ltd. New Delhi, India.