Automated Data Validation Using FME

There are many scenarios in which data needs to be validated, and FME can be one of the best available tools, not only for validating data but also for automating the validation process. If you’re familiar with FME, you’ll already know how powerful it is for working with data in just about any format, and how its wide range of tools (Transformers) allows for virtually any type of validation you can dream up. What may be less familiar is the power of FME Server to automate workflows.

This blog post gives a bird’s-eye view of the architecture of a system developed to automatically validate and report on incoming data.

Conceptual Process

[Figure: Process flow]

As can be seen in the above diagram, the overall process is simple:

  1. A trigger causes the validation to run.

  2. An FME workspace is run on FME Server to apply the validation rules and verify the submitted data.

  3. If the data is valid, it is uploaded to the corporate data store.

  4. Notifications are sent to report validation success or failure.
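The four steps above can be sketched as a small orchestration function. This is a minimal, hypothetical illustration in Python, not how FME Server works internally: `validate`, `upload` and `notify` stand in for the FME workspace, the load into the corporate data store and the notification service, and step 1 (the trigger) is simply whatever calls the function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ValidationResult:
    """Outcome of one validation run (hypothetical structure)."""
    valid: bool
    errors: List[str] = field(default_factory=list)


def run_validation_pipeline(
    submission: Any,
    validate: Callable[[Any], ValidationResult],
    upload: Callable[[Any], None],
    notify: Callable[[Any, str], None],
) -> ValidationResult:
    """Steps 2-4: validate, upload only if valid, then notify either way."""
    # Step 2: apply the validation rules (an FME workspace in the real system).
    result = validate(submission)
    if result.valid:
        # Step 3: only valid data reaches the corporate data store.
        upload(submission)
        # Step 4 (success path): confirm the upload to the submitter.
        notify(submission, "Validation succeeded; data uploaded.")
    else:
        # Step 4 (failure path): report the reasons instead of uploading.
        notify(submission, "Validation failed: " + "; ".join(result.errors))
    return result
```

Passing the components in as callables keeps the control flow easy to exercise with simple stubs; in the real system each one would be backed by the corresponding FME Server component.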

Trigger Mechanisms

FME Server provides many options for triggering a workflow; some of the most common are:

  • email - when an email is sent to FME Server, it can trigger a process to run

  • file added to folder - FME Server can be configured to monitor a local file folder, an FTP folder or a cloud service (e.g. Amazon S3) for incoming files. When a file arrives or is updated, it can trigger a process to run

  • web form submitted - web forms can be developed to integrate with the FME Server REST API, which allows jobs to be triggered directly.
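As a rough sketch of the web form route, the Python below builds (but does not send) the HTTP request a form backend might use to ask FME Server to run a validation workspace. It assumes the FME Server REST API v3 `transformations/submit/<repository>/<workspace>` endpoint and token authentication through an `Authorization: fmetoken token=...` header; the host, repository, workspace and published parameter names are hypothetical, so check the details against your own server's REST API documentation.

```python
import json
import urllib.request
from typing import Dict
from urllib.parse import quote


def build_submit_request(
    host: str, token: str, repository: str, workspace: str, params: Dict[str, str]
) -> urllib.request.Request:
    """Build a POST request asking FME Server to run a workspace asynchronously.

    Endpoint path and auth header are assumptions based on REST API v3
    conventions; the request is only constructed here, never sent.
    """
    url = (
        f"{host.rstrip('/')}/fmerest/v3/transformations/submit/"
        f"{quote(repository)}/{quote(workspace)}"
    )
    # Published parameters are sent as a list of name/value pairs.
    body = {"publishedParameters": [{"name": k, "value": v} for k, v in params.items()]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"fmetoken token={token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of calling `urllib.request.urlopen(req)`. Keeping construction separate from sending makes the payload easy to inspect and test before any job is actually triggered.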

Notification Mechanisms

As with process triggers, FME Server provides multiple options for sending notifications. The most common is email, but other methods such as Amazon SNS, JMS and Apple Push notifications are also available.

Solution Architecture

Although there are many options within FME Server for triggering processes to run and for sending notifications, one possible solution is to utilise a web form for authorised parties to upload data and submit a validation request.

If the submitter needs to be authorised to upload data, then a custom web form will likely need to be developed. If no authorisation is required and FME Server 2019 or newer is in use, an FME Server App could be used instead, avoiding the need for custom web development. In both cases, the FME Server REST API is the underlying communication mechanism that triggers the validation process.

The advantage of the web form architecture (versus an alternative trigger mechanism such as email or folder watch) is that additional metadata about the incoming submission can be captured in a controlled manner. For example, the web form could capture the submitter’s username and email address, the area of interest, and the coordinate system or format of the data being uploaded.
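To illustrate what capturing metadata in a controlled manner might look like, the sketch below checks form fields before a job is submitted and maps them to published parameters for the validation workspace. The field names, the allowed formats and coordinate systems, and the parameter names are all hypothetical, and the email check is deliberately loose.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical whitelists; a real system would likely load these from configuration.
ALLOWED_FORMATS = {"ESRI Shapefile", "GeoPackage", "CSV"}
ALLOWED_COORDINATE_SYSTEMS = {"EPSG:4326", "EPSG:3857"}
# Loose sanity check only: something@something.something with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class SubmissionMetadata:
    """Metadata captured by a (hypothetical) submission web form."""
    username: str
    email: str
    area_of_interest: str
    coordinate_system: str
    data_format: str

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means acceptable."""
        issues = []
        if not self.username.strip():
            issues.append("username is required")
        if not EMAIL_PATTERN.match(self.email):
            issues.append(f"invalid email address: {self.email!r}")
        if not self.area_of_interest.strip():
            issues.append("area of interest is required")
        if self.coordinate_system not in ALLOWED_COORDINATE_SYSTEMS:
            issues.append(f"unsupported coordinate system: {self.coordinate_system}")
        if self.data_format not in ALLOWED_FORMATS:
            issues.append(f"unsupported data format: {self.data_format}")
        return issues

    def as_published_parameters(self) -> Dict[str, str]:
        """Map form fields to hypothetical workspace published parameter names."""
        return {
            "SubmitterUsername": self.username,
            "SubmitterEmail": self.email,
            "AreaOfInterest": self.area_of_interest,
            "SourceCoordSys": self.coordinate_system,
            "SourceFormat": self.data_format,
        }
```

Only when `problems()` returns an empty list would the form go on to submit the job, passing `as_published_parameters()` to the validation workspace.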

[Figure: Solution architecture]

Data Validation

If you were paying attention to the solution architecture diagram above, you may have noticed the “Rules DB” component that feeds into FME Server. What is the Rules DB, you ask? Well, that’s the topic of another blog post, coming shortly!

Suffice it to say that the Rules DB defines most of the validation rules applied in the FME workspace. This keeps the workspace as simple and generic as possible and helps to minimise development and ongoing maintenance costs. It is a key component of a flexible architecture: validation rules can be defined by subject matter experts and business users, and no FME workspace (code) changes are required when the rules change.
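Since the Rules DB itself is the subject of a follow-up post, the sketch below is only a generic illustration of the rule-driven idea, not the author’s implementation. Rules are plain data (attribute, check type, argument) and a single generic routine applies whatever rules it is given. The `Rule` structure and its three check types are hypothetical; in the described architecture the rule rows would be read from the Rules DB rather than defined in code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Rule:
    """One validation rule as it might be stored in a (hypothetical) rules table."""
    attribute: str
    check: str            # one of: "required", "max_length", "allowed_values"
    argument: Any = None  # e.g. a length limit or a tuple of allowed values
    message: str = ""     # optional friendly message for reports


def _passes(rule: Rule, value: Any) -> bool:
    """Evaluate a single rule against a single attribute value."""
    if rule.check == "required":
        return value not in (None, "")
    if rule.check == "max_length":
        return value is None or len(str(value)) <= rule.argument
    if rule.check == "allowed_values":
        return value is None or value in rule.argument
    raise ValueError(f"unknown check type: {rule.check}")


def validate_features(features: List[Dict[str, Any]], rules: List[Rule]) -> List[str]:
    """Apply every rule to every feature and collect failure messages."""
    errors = []
    for index, feature in enumerate(features):
        for rule in rules:
            if not _passes(rule, feature.get(rule.attribute)):
                label = rule.message or rule.check
                errors.append(f"feature {index}: {label} ({rule.attribute})")
    return errors
```

Adding or changing a rule is then a data change rather than a code change, which is the property the paragraph above describes; only a genuinely new kind of check would require touching `_passes`.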
