FADI - Ingest, store and analyse big data flows
This page provides documentation on how to use the FADI big data framework through a sample use case: monitoring the CETIC office building.
In this simple example, we will ingest temperature measurements from sensors, store them and display them in a simple dashboard.
To install the FADI framework on your workstation or on a cloud, see the installation instructions. The following instructions assume that you deployed FADI on your workstation inside minikube.
The components needed for this use case are Apache NiFi (ingestion), PostgreSQL (storage), Grafana (dashboards and alerting), Apache Superset (exploration) and JupyterHub (notebooks for analysis).
Those components are configured in the following sample config file; once the platform is ready, you can start working with it.
To access services through domain names, open a new terminal and enter this command to give Traefik an external IP address:

```
minikube tunnel
```
Update your `hosts` file with Traefik's external IP address:

```
$ kubectl get svc -n fadi
NAME           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
fadi-traefik   LoadBalancer   10.104.68.59   10.104.68.59   80:31633/TCP,443:30041/TCP   59m
```
You can find here a user guide for Linux, Mac and Windows. Your `hosts` file should look like this:

```
127.0.0.1 localhost
...
10.104.68.59 grafana.example.cetic.be adminer.example.cetic.be superset.example.cetic.be nifi.example.cetic.be
```
Unless specified otherwise, all services can be accessed using the username and password pair `admin` / `Z2JHHezi4aAA`; see the user management documentation for detailed information on how to configure user identification and authorization (LDAP, RBAC, …).
See the logs management documentation for information on how to configure the management of the various service logs.
First, set up the data lake by creating a table in the PostgreSQL database.
To achieve this, head to the Adminer interface:

```
kubectl port-forward service/fadi-adminer 8081:80
```

You can then access Adminer from your browser at http://localhost:8081. Log in to the Adminer service and to the PostgreSQL database using the following credentials:
- System: `PostgreSQL`
- Server: `fadi-postgresql`
- Username: `admin`
- Password: `Z2JHHezi4aAA`
- Database: `postgres`
In the Adminer browser, launch the query tool by clicking `SQL command`.
Copy/paste the table creation script into the query editor (a minimal sketch of such a script is shown below), then execute it by clicking the `Execute` command.
Once these steps are finished, you should see that a new table `example_basic` has been created in the `Tables` section of the Adminer browser.
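If you prefer a scripted approach over the Adminer UI, here is a minimal sketch that creates the same table with Python's `psycopg2`. The `kubectl port-forward` command and the two-column layout (inferred from the sample CSV below) are assumptions; the linked creation script remains the reference.

```python
# Minimal sketch: create the example_basic table without the Adminer UI.
# Assumes the database port is forwarded locally first, e.g.:
#   kubectl port-forward service/fadi-postgresql 5432:5432
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="postgres", user="admin", password="Z2JHHezi4aAA",
)
with conn, conn.cursor() as cur:
    # Column layout assumed from the sample CSV (measure_ts, temperature).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS example_basic (
            measure_ts  timestamp,
            temperature numeric
        )
    """)
conn.close()
```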
“An easy to use, powerful, and reliable system to process and distribute data.”
Apache NiFi provides the ingestion mechanism (connecting to e.g. a database, a REST API, or CSV/JSON/Avro files on an FTP server): in this case, we want to read the temperature sensor data from our HVAC system and store it in a database.
Temperature measurements from the last 5 days (see the HVAC sample temperatures CSV extract) are ingested:
```
measure_ts,temperature
2019-06-23 14:05:03.503,22.5
2019-06-23 14:05:33.504,22.5
2019-06-23 14:06:03.504,22.5
2019-06-23 14:06:33.504,22.5
2019-06-23 14:07:03.504,22.5
2019-06-23 14:07:33.503,22.5
2019-06-23 14:08:03.504,22.5
2019-06-23 14:08:33.504,22.5
2019-06-23 14:09:03.503,22.5
2019-06-23 14:09:33.503,22.5
2019-06-23 14:10:03.503,22.5
2019-06-23 14:10:33.504,22.5
2019-06-23 14:11:03.503,22.5
2019-06-23 14:11:33.503,22.5
2019-06-23 14:12:03.503,22.5
2019-06-23 14:12:33.504,22.5
2019-06-23 14:13:03.504,22.5
2019-06-23 14:13:33.504,22.5
2019-06-23 14:14:03.504,22.5
(...)
```
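The `InvokeHTTP` processor configured below essentially performs an HTTP GET on this file every two minutes. As a sketch, you can reproduce the request with Python's `requests` to inspect the data before wiring up the flow:

```python
# Fetch the sample CSV exactly as the InvokeHTTP processor will.
import requests

url = ("https://raw.githubusercontent.com/cetic/fadi/master/"
       "examples/basic/sample_data.csv")
response = requests.get(url)
response.raise_for_status()
print("\n".join(response.text.splitlines()[:5]))  # header + first measurements
```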
To start, head to the NiFi web interface by typing the `nifi.traefikIngress.host` value in your browser, e.g. http://nifi.example.cetic.be.
Then, you can log in using the default credentials `username`/`changemechangeme`.
A NiFi dashboard is shown.
Now we need to tell NiFi to read the CSV file and store the measurements in the data lake. To do so, create the following components:
- An `InvokeHTTP` processor:
  - `Configure` > `Settings` tab > Automatically Terminate Relationships: all except `Response`
  - `Configure` > `Properties` tab > Remote URL: https://raw.githubusercontent.com/cetic/fadi/master/examples/basic/sample_data.csv
  - `Configure` > `Scheduling` tab > Run Schedule: `120s` (this will download the sample file every 120 seconds)
- A `PutDatabaseRecord` processor:
  - `Configure` > `Settings` tab > Automatically Terminate Relationships: all
  - `Configure` > `Properties` tab > Record Reader > `Create a new service` > `CSV Reader`
    - `Go To` > `Configure` > `Properties` > Treat First Line as Header: `true`
  - `Configure` > `Properties` tab > Statement Type: `INSERT`
  - `Configure` > `Properties` tab > Database Connection Pooling Service > `DBCPConnectionPool`
    - `Go To` > `Configure` > `Properties` and set:
      - Database Connection URL: `jdbc:postgresql://fadi-postgresql:5432/postgres?stringtype=unspecified`
      - Database Driver Class Name: `org.postgresql.Driver`
      - Database Driver Location(s): `/opt/nifi/psql/postgresql-42.2.6.jar`
      - Database User: `admin`
      - Password: `Z2JHHezi4aAA`
  - `Configure` > `Properties` tab > Schema Name: `public`
  - `Configure` > `Properties` tab > Table Name: `example_basic`
  - `Configure` > `Properties` tab > Translate Field Names: `false`
  - Apply the settings by clicking the configuration button.
- Two output ports: `success_port` and `failure_port`.
- A connection from the `InvokeHTTP` processor to the `PutDatabaseRecord` processor for the `Response` relationship.
- A Success connection from `PutDatabaseRecord` to the `success_port` output port for the `success` relationship.
- A Failure connection from `PutDatabaseRecord` to the `failure_port` output port for the `failure` relationship.
- A Retry connection from `PutDatabaseRecord` back to itself for the `retry` relationship.

Start the flow by clicking `Start`. See also the NiFi template that corresponds to this example.
Alternatively, to use the template:
- Click `Upload template` in the Operate frame, select the template, and upload it.
- Click `Configuration` > `View configuration` on the `DBCPConnectionPool` controller service.
- In the `Properties` tab, complete the `password` field with `Z2JHHezi4aAA`.
- Enable the `CSVReader` and `DBCPConnectionPool` controller services.
- Click `Start`.

For more information on how to use Apache NiFi, see the official NiFi user guide and these Awesome NiFi resources.
Finally, start the NiFi flow in the Operate window. You can verify that the measurements are arriving in the database before moving on.
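A minimal sketch of such a check, assuming the same port-forward and credentials as in the database setup step above:

```python
# Sanity check: confirm the NiFi flow is inserting rows into the data lake.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="postgres", user="admin", password="Z2JHHezi4aAA",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*), max(measure_ts) FROM example_basic")
    rows, latest = cur.fetchone()
    print(f"{rows} rows ingested, latest measurement at {latest}")
conn.close()
```

With the flow running, the row count should grow as `InvokeHTTP` periodically re-downloads the file.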
Once the measurements are stored in the database, we will want to display the results in a dashboard.
“Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.”
Grafana provides a dashboard and alerting interface.
Head to the Grafana web interface by typing the `grafana.traefikIngress.host` value in your browser, e.g. http://grafana.example.cetic.be (the default credentials are `admin`/`Z2JHHezi4aAA`).
First we will define the PostgreSQL data source. To do that, in the Grafana Home Dashboard, click `Add data source`, choose `PostgreSQL`, and fill in the connection settings:
- Host: `fadi-postgresql:5432`
- Database: `postgres`
- User: `admin`
- Password: `Z2JHHezi4aAA`
- SSL Mode: `disable`
- Version: `10`
Then we will configure a simple dashboard that shows the temperatures captured in the PostgreSQL database:
- Click `New dashboard`, then `Choose Visualization`.
- A pre-filled SQL query is provided and shown in the Queries tab.
- You can complete the `WHERE` clause with an expression such as `temperature > 20`.
- To show the data, specify a time frame between `2019-06-23 16:00:00` and `2019-06-28 16:00:00`.

A diagram is then displayed in the Grafana dashboard.
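To preview outside Grafana what the panel will chart, here is a minimal sketch using `pandas` with the same query shape (the time column aliased to `time`, the same `WHERE` clause); the local port-forward is again an assumption:

```python
# Preview the data behind the Grafana panel.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://admin:Z2JHHezi4aAA@localhost:5432/postgres")
df = pd.read_sql(
    'SELECT measure_ts AS "time", temperature '
    "FROM example_basic "
    "WHERE temperature > 20 "
    "ORDER BY measure_ts",
    engine,
)
print(df.head())
```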
And finally we will configure some alerts using very simple rules: open the panel's `Alert` tab and click `Create Alert`.
For more information on how to use Grafana, see the official Grafana user guide.
“BI tool with a simple interface, feature-rich when it comes to views, that allows the user to create and share dashboards. This tool is simple and doesn’t require programming, and allows the user to explore, filter and organise data.”
Apache Superset provides some interesting features to explore your data and build basic dashboards.
Head to the Superset web interface by typing the `superset.traefikIngress.host` value in your browser, e.g. http://superset.example.cetic.be (the default credentials are `admin`/`Z2JHHezi4aAA`):
First we will define the datasource:
- On the top menu of Superset, click on `Sources` -> `Databases`, then on the `add a new record` button.
- Database: `example_basic`
- SQLAlchemy URI: `postgresql://admin:Z2JHHezi4aAA@fadi-postgresql:5432/postgres` (a connectivity sketch is shown after this list)
- Click `Test Connection` to check the connection to the database.

Then register the table:
- On the top menu, click on `Sources` -> `Tables`, then on the `add a new record` button.
- Database: choose `example_basic`.
- Table Name: `example_basic`.
- Click `Save`.

Finally, mark the timestamp column as temporal:
- In the table list, on the `example_basic` row, click the `Edit record` button.
- In the `List Columns` tab, on the `measure_ts` row, click the `Edit record` button.
- Set the expression to `measure_ts ::timestamptz`.
- Click `Save`.
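As a quick sanity check of that SQLAlchemy URI, here is a minimal sketch; note that `fadi-postgresql` only resolves inside the cluster, so run it from a pod, or swap in `localhost` with the port-forward used earlier:

```python
# Verify the SQLAlchemy URI that Superset will use.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://admin:Z2JHHezi4aAA@fadi-postgresql:5432/postgres")
with engine.connect() as conn:
    count = conn.execute(text("SELECT count(*) FROM example_basic")).scalar()
    print(f"example_basic currently holds {count} rows")
```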
Then we will explore our data and build a simple dashboard with the data that is inside the database:
- On the top menu, click on `Charts`, then on the `add a new record` button.
- Choose a datasource: `example_basic`.
- Choose a visualization type: `Time Series - Line Chart`.
- Click `Create new chart`.
- In the `Data` tab:
  - In the `Time` section, set the time grain to `hour` and the time range to `Last quarter`.
  - In the `Query` section, set the metric to `AVG(temperature)` and click `Save`.
- Click `Run Query`; a diagram will be shown.
- Click `Save`, name the chart `Basic example`, add it to a new `Basic example dashboard`, and click `Save & go to dashboard`.

For more information on how to use Superset, see the official Superset user guide.
“Apache Spark™ is a unified analytics engine for large-scale data processing.”
Jupyter notebooks provide an easy interface to the Spark processing engine that runs on your cluster.
In this simple use case, we will try to access the data that is stored in the data lake.
Head to the Jupyter notebook interface by typing the `jupyter.traefikIngress.host` value in your browser, e.g. http://jupyterhub.example.cetic.be.
Then, you can log in using the default credentials `admin`/`Z2JHHezi4aAA`.
A Jupyter dashboard is shown.
Choose `Minimal environment` and click on `Spawn`.
Open the `jupyter_exploration.ipynb` notebook and run the different scripts.
To use Spark, go to the `Control panel`, click `Stop my server`, then `Start server`, choose `Spark environment` and click on `Spawn`.
For more information on how to use Jupyter, see the official Jupyter documentation.
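As a sketch of what the Spark environment offers once spawned, the following notebook cell reads the `example_basic` table over JDBC. That the Spark image ships a PostgreSQL JDBC driver is an assumption (the NiFi pod above used `postgresql-42.2.6.jar`); adjust `spark.jars` if it does not:

```python
# Read the example_basic table from the data lake into a Spark DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fadi-basic-example").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://fadi-postgresql:5432/postgres")
    .option("dbtable", "example_basic")
    .option("user", "admin")
    .option("password", "Z2JHHezi4aAA")
    .option("driver", "org.postgresql.Driver")
    .load()
)

df.printSchema()
df.agg({"temperature": "avg"}).show()  # average temperature over the dataset
```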
In this use case, we have demonstrated a simple configuration for FADI, where various services are used to ingest, store, analyse and explore data, and to provide dashboards and alerts.
You can find the various resources for this sample use case (NiFi flowfile, Grafana dashboards, …) in the examples folder.
The examples section contains other, more specific examples (e.g. Kafka streaming ingestion).