In this final part of our multi-part post (see part 1 and part 2), I will discuss the various transports which can be used to synchronise data between applications.
As opposed to the previous two parts, the strategies in this post are not limited to synchronisation and apply equally to integration.
File transfer
This old-school method of data exchange has existed for as long as computers have interchanged data. File transfer involves one application creating a file and placing it in an agreed location, whilst the receiving application reads, processes and then does something with the file.
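To make the handoff concrete, here is a minimal sketch of the pattern in Python (the directory paths and file naming are hypothetical). The sender writes to a temporary name and then renames the file into the agreed pickup location so the receiver never sees a half-written file; the receiver processes each pending file and then moves it to an archive rather than deleting it.

```python
import json
import os
import shutil
import uuid
from pathlib import Path

PICKUP_DIR = Path("/data/exchange/orders")    # agreed location (hypothetical)
ARCHIVE_DIR = Path("/data/exchange/archive")  # processed files are kept, not deleted

def send(transaction: dict) -> Path:
    """Sender: write to a temporary name, then rename into the pickup
    directory so the receiver never picks up a half-written file."""
    PICKUP_DIR.mkdir(parents=True, exist_ok=True)
    tmp = PICKUP_DIR / f".{uuid.uuid4()}.tmp"
    tmp.write_text(json.dumps(transaction))
    final = PICKUP_DIR / f"order_{uuid.uuid4()}.json"
    os.replace(tmp, final)                    # atomic on the same filesystem
    return final

def receive(process) -> None:
    """Receiver: process each pending file, then move it to the archive."""
    ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
    for path in sorted(PICKUP_DIR.glob("*.json")):
        process(json.loads(path.read_text()))
        shutil.move(str(path), ARCHIVE_DIR / path.name)
```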
Pros
Secure: File transfer can be easily locked down, making it a relatively secure means to exchange data. SSH-based transfer (SFTP/SCP) and FTPS are solid, well-understood protocols, so remote exchange over the internet can be made secure.
Cons
Difficult to audit: Compared with other mechanisms, files typically have no natural means of auditing or traceability.
Extra infrastructure: Remote servers require an FTP/SSH solution to exchange files, adding another link (and potential point of failure) to the chain of events.
Fragile: Sender and receiver can easily get out of sync, and a failure can require significant manual intervention to re-establish normal operation.
Brittle formats: CSV and other column-based formats such as fixed-width text and Excel suffer from brittleness, where file schema modifications (field additions and deletions) require significant re-configuration on the receiving end.
Tips for use
One transaction per file: It’s much easier to identify a particular transaction, and re-processing is easy should something go awry.
For CSV and other column-based formats, always add new fields to the end of the file and never remove existing ones; better still, don't use these formats at all!
Use a resilient, self-describing data format such as Xml or JSON. Don't go overboard, however:
Try to avoid Xml schemas where possible. We feel schemas add unnecessary constraints, sacrificing much of the format's inherent flexibility, and introduce a level of complexity not required in the SME space.
Always use a recognised API/library when consuming and producing Xml/JSON data. This will, at a minimum, ensure the output is standards-compliant. We've seen several instances where organisations (that should know better) provide us with improperly escaped Xml, simply because they weren't using an API to produce it (see the sketch after these tips).
After files are processed, move them to an archive directory; don't delete them.
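To illustrate the point about using a recognised library, here is a small Python sketch (the field names are made up). Hand-built string concatenation produces invalid Xml the moment a value contains '&' or '<'; the standard-library API escapes these automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record containing characters that must be escaped.
order = {"customer": "Smith & Jones <Ltd>", "total": "125.00"}

# Naive string concatenation: invalid Xml, because '&' and '<' are not escaped.
bad = "<order><customer>" + order["customer"] + "</customer></order>"

# Using the standard library guarantees well-formed, correctly escaped output.
root = ET.Element("order")
ET.SubElement(root, "customer").text = order["customer"]
ET.SubElement(root, "total").text = order["total"]
good = ET.tostring(root, encoding="unicode")
# -> <order><customer>Smith &amp; Jones &lt;Ltd&gt;</customer><total>125.00</total></order>
```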
Database & staging tables
Our preferred means for both integration and synchronisation, databases & staging tables provide a much more robust transport than file exchange.
Staging tables are an intermediate set of database tables where the sender records pending data for the recipient(s) to later process.
Pros
SQL: Provides a vendor- and format-independent way to query and update the database, and table schema changes are usually non-breaking.
Audit and Traceability: SQL/databases provide an easy means to record and query any key data required for auditing purposes, e.g. timestamps and error & warning messages.
Durability and Recoverability: Databases are naturally durable and far less vulnerable to corruption. In the event of a failure, recovery can be made quickly, assuming a backup is available.
Cons
Security: Where interchange takes place over the internet, the database needs to be open to the outside world, exposing it to potential attacks. These can be mitigated by restricting the connecting IP addresses and using complex passwords, amongst other measures.
Tips for Use
Always have the following fields in a staging table (a minimal sketch follows these tips):
Status: Used to record whether a record is pending or processed. Always include an error status, which allows you to identify erroneous transactions quickly. It is recommended that an index is set up on the status field to speed up querying.
Last Modified Date: A timestamp to record when the last update was made.
Last Update By: Used to record which application made the last update.
Reason/error/message: Where an error or warning is generated as part of the integration or synchronisation, it should be recorded against the original record. This gives you a robust audit trail which can be used later to trace problems.
Foreign Record Id: Where a recipient application generates a transaction or record id (such as an Order, Invoice, or even Customer number) update the staging table with this reference.
Back up frequently: Depending on the volume of data and the location of the database (e.g. high-volume sites or publicly accessible databases), this may mean several times a day.
Always update the staging table immediately after writing to the destination application.
Where processing takes a long time, consider setting the status field to an intermediate value to prevent other integration processes from picking up the same data.
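A minimal sketch of a staging table and the claim/process/update pattern described above, using Python and SQLite purely for illustration (the table and column names are assumptions; any SQL database will do):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("staging.db")  # illustrative; any SQL database will do

# Staging table with the recommended fields; the index on status speeds up
# the "find pending records" query.
conn.executescript("""
CREATE TABLE IF NOT EXISTS order_staging (
    id                INTEGER PRIMARY KEY,
    payload           TEXT NOT NULL,
    status            TEXT NOT NULL DEFAULT 'PENDING',  -- PENDING/PROCESSING/PROCESSED/ERROR
    last_modified     TEXT,
    last_updated_by   TEXT,
    message           TEXT,          -- reason / error / warning text
    foreign_record_id TEXT           -- id generated by the destination application
);
CREATE INDEX IF NOT EXISTS ix_order_staging_status ON order_staging(status);
""")

def process_pending(write_to_destination):
    now = datetime.now(timezone.utc).isoformat()
    rows = conn.execute(
        "SELECT id, payload FROM order_staging WHERE status = 'PENDING'").fetchall()
    for row_id, payload in rows:
        # Claim the record first, so a second integration process skips it.
        conn.execute(
            "UPDATE order_staging SET status='PROCESSING', last_modified=?, last_updated_by=? WHERE id=?",
            (now, "integration", row_id))
        conn.commit()
        try:
            foreign_id = write_to_destination(payload)  # e.g. returns the new Order number
            conn.execute(
                "UPDATE order_staging SET status='PROCESSED', foreign_record_id=?, last_modified=? WHERE id=?",
                (foreign_id, now, row_id))
        except Exception as exc:
            conn.execute(
                "UPDATE order_staging SET status='ERROR', message=?, last_modified=? WHERE id=?",
                (str(exc), now, row_id))
        conn.commit()
```

The 'PROCESSING' status acts as the intermediate value that stops another process from picking up the same records, and foreign_record_id records the id generated by the destination application.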
Message queue
A message queue is a transport-level asynchronous protocol where a sender places a message into a queue until the receiver removes the message to process it. One of the key attributes of a message queue, especially for data synchronisation, is that messages are guaranteed to be delivered in FIFO order.
Message queues have traditionally been limited to larger-scale integration and synchronisation, and to cases where both sender and receiver exist on the same network. However, with the explosion of cloud computing and an agreed standard (AMQP), there is an increasing prevalence of low-cost services:
Storm MQ: Currently in beta, this 'free' service looks promising.
Pros
FIFO: Messages are guaranteed to be delivered in order.
Robust and Transactional: Message queues can be backed up, and are naturally transactional.
Cons
Message Queues are a Transport: The data format of the message still needs to be agreed between sender and recipient.
Tips for use
Use a flexible data format such as Xml.
File and database recovery is well known; message queue recovery less so. Understand how to administer your message queue technology and how to recover in the event of a failure.
Prevent message build-up by having an application that is constantly listening for incoming messages (see the sketch after these tips).
Use different queues for each data flow; don’t try to multiplex multiple message types onto a single queue.
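A minimal publish/consume sketch using Python and the pika AMQP client, assuming a broker such as RabbitMQ is reachable (the queue name is hypothetical). The consumer stays connected and acknowledges each message only after it has been processed, so unprocessed messages remain on the queue if the receiver fails.

```python
import json
import pika  # AMQP client; assumes a broker such as RabbitMQ is reachable

QUEUE = "orders"  # hypothetical queue name: one queue per data flow

def publish(record: dict) -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_publish(
        exchange="",
        routing_key=QUEUE,
        body=json.dumps(record),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )
    connection.close()

def listen(process) -> None:
    """Receiver that stays connected so messages never build up."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)

    def on_message(ch, method, properties, body):
        process(json.loads(body))
        ch.basic_ack(delivery_tag=method.delivery_tag)  # acknowledge only after success

    channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
    channel.start_consuming()
```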
No interchange
In this scenario there is no intermediate data form; instead, each application holds its pending data until the recipient application retrieves it.
Unless an application specifically integrates with another, this style of interchange requires some form of middleware to mediate the interaction (sketched below), since:
Each application has its own interface, e.g. a web service or a COM/Java/.Net API.
The data usually requires some transformation for it to 'fit' the other application.
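A minimal sketch of such middleware in Python (the endpoints, field names and mapping are entirely hypothetical): it pulls pending data from one application's interface, reshapes it into the form the other application expects, and pushes it through the other interface.

```python
import requests  # illustrative HTTP client; endpoints below are hypothetical

SOURCE_URL = "https://source.example.com/api/pending-orders"
TARGET_URL = "https://target.example.com/api/sales-orders"

def transform(order: dict) -> dict:
    """Map the source application's fields onto the target application's schema."""
    return {
        "customerCode": order["account_ref"],
        "lines": [{"sku": l["item"], "qty": l["quantity"]} for l in order["lines"]],
    }

def run_once() -> None:
    # Middleware pulls pending data from one interface...
    pending = requests.get(SOURCE_URL, timeout=30).json()
    for order in pending:
        # ...reshapes it, and pushes it through the other application's interface.
        response = requests.post(TARGET_URL, json=transform(order), timeout=30)
        response.raise_for_status()
```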
Summary
If a choice can be made, use databases/staging tables as they are by far the easiest and most robust strategy.
We expect an increased use of message queues, especially with hosted infrastructure environments such as Azure and Amazon.
CSV: Dear oh dear! It may be easy to produce and simple to read, interpret and load into Excel, but it's 2013! There are more sophisticated and better means of exchanging data, so why are we still using this archaic format?
Contact
Realisable Software Ltd provides code-free, cost-effective applications integration solutions for SMEs. Our core IMan product is designed to integrate almost any application with a number of Sage solutions and online payment processors.
Looking to purchase IMan, please see our resellers here.