In Part I of this multi-part blog some of the most common issues for synchronisation were discussed.
This second post will describe the strategies available for synchronising data between applications, their advantages and pitfalls.
The strategy is the means to determine when a record should be inserted, updated, or deleted.
Data Structures
It is necessary to first understand the issues surrounding more complex data structures.
Simple single level data structures, such as customer records, can be readily synchronised. However, as data structures become more complex so does the strategy.
The above example is illustrative, both order and order lines must be synchronised; an order without lines is not an order. However, the third level of data – the serial–numbers add several complications:
Is it necessary to synchronise them? If they are used for transactional purposes, yes, but a view-only application may not require them.
What do the serial numbers mean when an order line is updated? Are they a fresh set of serials, or just additions? While this question applies also to lines, the deeper the structure the more pronounced the issue becomes. This is due to the feedback of how the update is made on each level of data.
Means of Identifying Updates
Inserting & Updating Records
Inserts and updates follow a simple pattern where the destination application is queried to identify if it contains an existing record. An update is performed where a match is found; otherwise when no matches are found, an insert is performed.
Deleted Records
Handling deletions is a much harder issue, primarily as the delete needs to be identified. There are three approaches:
Tombstoning
Tombstoning is a technique which instead of actually deleting the record, marks it as deleted, via a field within the record’s table. What makes this a preferred method for handling deletes is that the tombstone makes the delete operation explicit and in turn reduces complexity around synchronisation.
Tombstoning however requires the source application to provide such functionality (usually only a feature when data synchronisation/integration is integral to the application e.g. CRM).
For applications that don’t support tombstoning, it is necessary to record the record deletion(s) in a separate interface table or file, usually by way of a trigger (to be discussed in part 3 of this blog).
Record Comparison
Record comparison involves comparing a set of data from the source application to the set of data in the destination application. A delete is performed when a record is found in the destination but not in the source.
In contrast to ‘Tombstoning’ this suffers from three primary problems:
Complex: Record comparison is implicit; it requires the ‘data synchroniser’ to determine if a record has been deleted.
Auditing: There is a lack of an audit trail where records just disappear, with no facility to check why/when a deletion occurred.
OK for child records not for the primary entity: Using comparison for child records in hierarchical structure, e.g. order lines within an order, can be appropriate where there are limited records and the detection of deleted records is cheap and relatively straight forward.
Using comparison for a primary entity (e.g. the order) requires the examination of a large set of data to determine whether a delete is required.
Re-create Everything
A third alternative is to delete and re-create all records on each ‘sync’. This strategy is easiest to implement, covering inserts, updates and deletes, since it doesn’t require any logic to detect and handle changes. However, its use is only limited to the following scenarios:
Data is loaded at a database level e.g. reporting & data warehouses and CRMs. Applications with a sophisticated data structure and/or API usually frown on this approach as business logic and data validation is side-stepped and can lead to integrity issues.
The destination application is view only and non-transactional.
There are no (or very limited) foreign-key constraints: Foreign keys enforce relationships between records. When a record is deleted so must all the associated records. For example, deleting a customer record also requires deleting all the customer’s associated order & invoice records.
Data size: Deleting and re-inserting all records in a large dataset may take a long time to complete and can be expensive in terms of computing power, particularly with hosted environments where there can be CPU cycle costs. Pruning the dataset, usually by date, is an effective strategy to limit the load size.
Handling Application Logic
When synchronising data between applications it is necessary to negotiate and handle the restrictions or logic the API applies to the data.
Business logic is much stricter than any database enforcement.
Transactional data moves through a process, where the status of the transaction (or line) dictates whether modifications can be made.
Deletes are never easy – Particularly with ERP applications, deletes are handled in a number of ways: master items may need to be marked as inactive (a form of tombstoning); transactions can either be deleted or cancelled.
Strategies for handling application logic
Atomicity: An update should be all or nothing; never have a situation where some updates to a single transaction or master item fail and some succeed. Whilst this should apply whether synchronising at data level or application level, it’s easier to fall into a situation where a partial update is made due to lack of understanding of the API.
Is your synchronisation dumb or smart: You will need to make a choice as to how much logic you embed into your synchronisation, and often they start dumb and evolve to smart. What this means is the amount of logic required to cover certain scenarios or use cases. At the stupid end, there is no logic and a simply try/catch block is placed around your processing to catch any exceptions. Conversely, at the smart end you may check for an entity’s existence, raise workflow events, or handle specific edge cases.
Summary
This post hopefully has helped to explain the most common synchronisation strategies and pitfalls. It is by no means complete and we would like to hear your opinions.
The next post will discuss the various means i.e. file transfer, database, etc. available for data synchronisation.
Contact
Realisable Software Ltd provides code-free, cost-effective applications integration solutions for SMEs. Our core IMan product is designed to integrate almost any application with a number of Sage solutions and online payment processors.
Looking to purchase IMan, please see our resellers here.
Realisable Software Ltd provides code-free, cost-effective applications integration solutions for SMEs. Our core IMan product is designed to integrate almost any application with a number of Sage solutions and online payment processors.
Looking to purchase IMan, please see our resellers here.
Realisable Software Ltd provides code-free, cost-effective applications integration solutions for SMEs. Our core IMan product is designed to integrate almost any application with a number of Sage solutions and online payment processors.
Looking to purchase IMan, please see our resellers here.