FADI - Ingest, store and analyse big data flows


FADI User guide

This page provides documentation on how to use the FADI big data framework using a sample use case: monitoring CETIC offices building.

FADI sample use case - building monitoring

In this simple example, we will ingest temperature measurements from sensors, store them and display them in a simple dashboard.

1. Install FADI

To install the FADI framework on your workstation or on a cloud, see the installation instructions.

The components needed for this use case are the following:

These components are configured in the following sample config file. Once the platform is ready, you can start working with it.

The following instructions assume that you deployed FADI on your workstation inside minikube.

Unless specified otherwise, all services can be accessed using the username and password pair admin / password1. See the user management documentation for detailed information on how to configure user identification and authorization (LDAP, RBAC, …).

See the logs management documentation for information on how to configure the management of the various service logs.

2. Prepare the database to store measurements

First, set up the data lake by creating a table in the PostgreSQL database.

To achieve this you need to:
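The table itself amounts to a single CREATE TABLE statement. Here is a minimal sketch, assuming a table named measurements with a timestamp column and a temperature column (both names are assumptions, not taken from the sample config). For illustration it runs against an in-memory SQLite database; on FADI you would execute the same DDL against the PostgreSQL service (e.g. with psql):

```python
import sqlite3

# DDL for the measurements table; on FADI you would run this against
# PostgreSQL (table and column names here are assumptions).
DDL = """
CREATE TABLE IF NOT EXISTS measurements (
    measure_ts TIMESTAMP,
    temperature NUMERIC
)
"""

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL data lake
conn.execute(DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['measurements']
```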

3. Ingest measurements

“An easy to use, powerful, and reliable system to process and distribute data.”

Apache Nifi provides ingestion mechanisms (e.g. connecting to a database, a REST API, or CSV/JSON/Avro files on an FTP server, …): in this case we want to read the temperature sensor data from our HVAC system and store it in a database.

Temperature measurements from the last 5 days (see the HVAC sample temperatures CSV extract) are ingested:

2019-06-23 14:05:03.503,22.5
2019-06-23 14:05:33.504,22.5
2019-06-23 14:06:03.504,22.5
2019-06-23 14:06:33.504,22.5
2019-06-23 14:07:03.504,22.5
2019-06-23 14:07:33.503,22.5
2019-06-23 14:08:03.504,22.5
2019-06-23 14:08:33.504,22.5
2019-06-23 14:09:03.503,22.5
2019-06-23 14:09:33.503,22.5
2019-06-23 14:10:03.503,22.5
2019-06-23 14:10:33.504,22.5
2019-06-23 14:11:03.503,22.5
2019-06-23 14:11:33.503,22.5
2019-06-23 14:12:03.503,22.5
2019-06-23 14:12:33.504,22.5
2019-06-23 14:13:03.504,22.5
2019-06-23 14:13:33.504,22.5
2019-06-23 14:14:03.504,22.5
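Before wiring up the flow, it can help to sanity-check the file format. A short stdlib-only sketch that parses a few of the rows above (timestamp, then temperature in °C):

```python
import csv
import io
from datetime import datetime

# A few rows from the sample extract above
SAMPLE = """2019-06-23 14:05:03.503,22.5
2019-06-23 14:05:33.504,22.5
2019-06-23 14:06:03.504,22.5
"""

rows = []
for ts, temp in csv.reader(io.StringIO(SAMPLE)):
    rows.append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), float(temp)))

mean_temp = sum(t for _, t in rows) / len(rows)
print(len(rows), mean_temp)  # 3 22.5
```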

To start, head to the Nifi web interface. If you are using minikube, you can use the following command:

minikube service -n fadi fadi-nifi

Nifi web interface

Now we need to tell Nifi to read the CSV file and store the measurements in the data lake.

To do so, create the following components:

Nifi Ingest CSV and store in PostgreSQL

See also the nifi template that corresponds to this example.

For more information on how to use Apache Nifi, see the official Nifi user guide and this Awesome Nifi resources.

Finally, start the Nifi flow in the Operate window.
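Conceptually, the flow reads each CSV record and inserts it into the database table (the actual processor names depend on the flow; GetFile and PutDatabaseRecord are typical choices). The same logic in miniature, with SQLite standing in for PostgreSQL and assumed table/column names:

```python
import csv
import io
import sqlite3

SAMPLE = """2019-06-23 14:05:03.503,22.5
2019-06-23 14:05:33.504,22.5
2019-06-23 14:06:03.504,22.5
"""

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL data lake
conn.execute("CREATE TABLE measurements (measure_ts TIMESTAMP, temperature NUMERIC)")

# What the ingestion flow achieves: one INSERT per CSV record
for ts, temp in csv.reader(io.StringIO(SAMPLE)):
    conn.execute("INSERT INTO measurements VALUES (?, ?)", (ts, float(temp)))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()[0]
print(count)  # 3
```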

4. Display dashboards and configure alerts

Once the measurements are stored in the database, we will want to display the results in a dashboard.

“Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.”

Grafana provides a dashboard and alerting interface.

Head to the Grafana interface. If you are using minikube, you can use the following command:

minikube service -n fadi fadi-grafana

(the default credentials are admin/password1)

Grafana web interface

First we will define the PostgreSQL data source. To do that, in the Grafana Home Dashboard:

Grafana datasource

Then we will configure a simple dashboard that shows the temperatures captured in the PostgreSQL database:

A pre-filled SQL query is provided and shown in the Queries tab.

For example, you can complete the WHERE clause with the expression temperature > 20.

To show the dashboard, it is necessary to specify a time frame between 2019-06-23 16:00:00 and 2019-06-28 16:00:00.
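Assuming the column names used earlier, the resulting query is roughly equivalent to the following (sketched against SQLite with a few sample rows; in Grafana's SQL data sources the time range is usually injected through the $__timeFilter macro rather than written by hand):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL data lake
conn.execute("CREATE TABLE measurements (measure_ts TIMESTAMP, temperature NUMERIC)")
conn.executemany("INSERT INTO measurements VALUES (?, ?)", [
    ("2019-06-23 14:05:03.503", 22.5),
    ("2019-06-23 14:05:33.504", 19.0),
    ("2019-06-23 14:06:03.504", 25.0),
])

# The dashboard query with the extra WHERE clause (temperature > 20);
# in Grafana the time range would come from $__timeFilter(measure_ts).
rows = conn.execute(
    "SELECT measure_ts, temperature FROM measurements "
    "WHERE temperature > 20 ORDER BY measure_ts"
).fetchall()
print(len(rows))  # 2
```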

Grafana dashboard

Then, a diagram is displayed in the Grafana dashboard.

Grafana dashboard

Finally, we will configure some alerts using very simple rules:
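As an illustration of the kind of rule involved, here is a sketch of an "average above threshold" alert condition; the 25 °C threshold is an assumption, not a value from this use case:

```python
THRESHOLD = 25.0  # assumed alert threshold in degrees Celsius

def should_alert(recent_temperatures, threshold=THRESHOLD):
    """Mimic an 'avg() above threshold' alert condition on recent samples."""
    if not recent_temperatures:
        return False
    return sum(recent_temperatures) / len(recent_temperatures) > threshold

print(should_alert([22.5, 22.5, 22.5]))  # False
print(should_alert([26.0, 27.5, 28.0]))  # True
```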

For more information on how to use Grafana, see the official Grafana user guide.

5. Explore

“BI tool with a simple interface, feature-rich when it comes to views, that allows the user to create and share dashboards. This tool is simple and doesn’t require programming, and allows the user to explore, filter and organise data.”

Apache Superset provides some interesting features to explore your data and build basic dashboards.

Head to the Superset interface. If you are using minikube, you can use the following command:

minikube service -n fadi fadi-superset

(the default credentials are admin/password1).

First we will define the datasource:

Superset datasource

Superset table

Then we will explore our data and build a simple dashboard with the data stored in the database:

A diagram will be shown.

Superset dashboard

For more information on how to use Superset, see the official Superset user guide.

6. Process

“Apache Spark™ is a unified analytics engine for large-scale data processing.”

Jupyter notebooks provide an easy interface to the Spark processing engine that runs on your cluster.

In this simple use case, we will try to access the data that is stored in the data lake.
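In the notebook you would typically read the table through Spark's JDBC data source and aggregate it with Spark SQL. As a runnable stdlib-only stand-in, the same kind of aggregation (daily average temperature) can be sketched like this, again with SQLite in place of PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL data lake
conn.execute("CREATE TABLE measurements (measure_ts TIMESTAMP, temperature NUMERIC)")
conn.executemany("INSERT INTO measurements VALUES (?, ?)", [
    ("2019-06-23 14:05:03.503", 22.5),
    ("2019-06-23 23:59:59.000", 23.5),
    ("2019-06-24 09:00:00.000", 21.0),
])

# Daily average temperature -- the kind of query you would run
# from the notebook with Spark SQL against the data lake.
daily = conn.execute(
    "SELECT date(measure_ts) AS day, AVG(temperature) "
    "FROM measurements GROUP BY day ORDER BY day"
).fetchall()
print(daily)  # [('2019-06-23', 23.0), ('2019-06-24', 21.0)]
```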

Head to the Jupyter notebook interface. If you are using minikube, you can use the following command:

minikube service -n fadi proxy-public

Then you can log in using the default credentials admin/password1.

A Jupyter dashboard is shown.

Choose Minimal environment and click on Spawn.

Jupyter web interface

Jupyter exploration

Jupyter results1 Jupyter results2

Jupyter web interface

Jupyter processing

For more information on how to use Jupyter, see the official Jupyter documentation.

7. Summary

In this use case, we have demonstrated a simple configuration for FADI, where we use various services to ingest, store, analyse and explore data, and to provide dashboards and alerts.

You can find the various resources for this sample use case (Nifi flowfile, Grafana dashboards, …) in the examples folder.

The examples section contains other, more specific examples (e.g. Kafka streaming ingestion).