Fivetran vs Nifi

This article evaluates and demonstrates the capabilities of Fivetran and showcases the improvements that adopting Fivetran over Nifi could bring to the ingestion process currently implemented at Regeneron for its data ecosystem.

Nifi Limitations

Nifi has several limitations, listed below, that are likely to become problems as data volumes grow and the ingestion of that data into the overall data ecosystem needs to be streamlined.

  • No managed service offering.
  • No alerting mechanism provided by the tool itself.
  • No hook for the Airflow/MWAA scheduler to submit jobs to Spark once ingestion is complete.
  • Scaling is difficult.

Comparative Analysis

The following sections compare Nifi and Fivetran based on the high-level capabilities both products offer.

Supported Formats:

Nifi:
Nifi can read different file formats and convert them to CSV.
- Avro
- JSON
- ConvertRecord processor
- GetFile processor

Fivetran:
- Separated value files (CSV, TSV, etc.)
- JSON text files delimited by new lines
- JSON arrays
- Avro
- Compressed (Zip, tar, GZ)
- Parquet
- Excel

Source Integration

Nifi:
Nifi can connect to the following sources:
- S3
- Google Cloud
- Azure Blob

A CData JDBC Driver is required for the following sources:
- Box
- Dropbox
- Google Drive
- OneDrive
- SharePoint

Fivetran:
Syncs with the following cloud-based storage services:
- S3
- Azure Blob
- Google Cloud
- Magic Folder (Magic Folder connectors sync any supported file from your cloud folder as a table within a schema in your destination.)

Sync supported through Magic Folder:
- Box
- Dropbox
- Google Drive
- OneDrive
- SharePoint

File Transfer Protocols

Nifi:
- FTP
- SFTP

Fivetran:
- FTP
- FTPS
- SFTP

Supported Database Sources

Nifi:
- MongoDB
- Postgres
- MySQL
- Oracle
- MS SQL
- CData JDBC Driver for MariaDB

Fivetran:
- MongoDB
- MariaDB
- MySQL
- Oracle
- PostgreSQL
- SQL Server

Logging

Nifi:
- nifi-bootstrap.log
- nifi-user.log
- nifi-app.log

Fivetran:
- In the dashboard
- External logging service
- In your destination using the Fivetran Log Connector

Transformations

Nifi:
- Jolt (JoltTransformJSON processor)
- XSLT (TransformXml processor)
- Data transformation using scripts (ExecuteScript processor)

Fivetran:
- Basic SQL transformations
- dbt transformations

dbt is open-source software that enables you to perform sophisticated data transformations in your destination using simple SQL statements.

With dbt, you can:
- Write and test SQL transformations
- Use version control with your transformations
- Create and share documentation about your dbt transformations
- View data lineage graphs

Alerting

Nifi:
You can use the MonitorActivity processor to alert on changes in flow activity by routing the alert to the PutEmail processor.

Fivetran:
Alerts appear only on the dashboard, but if a sync fails Fivetran can send an email notification, provided notifications are enabled.

NOTE: Tasks describe a problem that keeps Fivetran from syncing your data. Warnings describe a problem that you may need to fix, but that does not keep Fivetran from syncing your data.

Listener

Nifi:
- Maintains state for incremental loads using a state object
- Event-based triggering is supported
- Scheduling is also supported

Fivetran:
- Maintains state for incremental loads using a state object
- Event-based triggering is supported
- Scheduling is also supported
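
As a rough illustration of what maintaining state for an incremental load means in either tool, the Python sketch below keeps a cursor (the latest modification time already ingested) and returns only newer files. The names `incremental_batch` and `last_modified`, and the file records, are hypothetical and are not part of Nifi or Fivetran.

```python
from typing import Dict, List, Tuple


def incremental_batch(files: List[Dict], state: Dict) -> Tuple[List[Dict], Dict]:
    """Return files newer than the stored cursor, plus the updated state."""
    cursor = state.get("last_modified", 0)
    new_files = [f for f in files if f["modified"] > cursor]
    if new_files:
        # Advance the cursor to the newest file seen in this batch.
        cursor = max(f["modified"] for f in new_files)
    return new_files, {"last_modified": cursor}


# Hypothetical file listing; "modified" is an illustrative timestamp.
files = [
    {"name": "a.csv", "modified": 100},
    {"name": "b.csv", "modified": 200},
]
first_batch, state = incremental_batch(files, {})      # first run: no prior state
second_batch, state = incremental_batch(files, state)  # rerun: nothing new to load
```

Because the state object is returned alongside the batch, a rerun over the same listing picks up nothing until a newer file arrives.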

Scalability

Nifi:
Possible but difficult

Fivetran:
Possible but difficult

Trigger for Auto-Start Transformation Job

Nifi:
No trigger; transformation jobs must rely on scheduled times.

Fivetran:
Integration with Apache Airflow is supported, so a completed Fivetran sync can trigger data transformations.
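
The difference can be sketched in Python: rather than starting the transformation at a fixed time, an orchestrator such as Airflow waits for the sync to report completion and then starts the job. This is a minimal sketch under stated assumptions; `get_sync_status`, `start_transformation`, and the status strings are hypothetical placeholders, not Fivetran or Airflow APIs.

```python
import time
from typing import Callable


def run_after_sync(
    get_sync_status: Callable[[], str],
    start_transformation: Callable[[], None],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> bool:
    """Poll until the sync succeeds, then start the transformation.

    Returns True if the transformation was started, False if the sync
    failed or did not finish within max_polls checks.
    """
    for _ in range(max_polls):
        status = get_sync_status()
        if status == "SUCCEEDED":
            start_transformation()
            return True
        if status == "FAILED":
            return False
        time.sleep(poll_seconds)
    return False


# Simulated run: the sync reports RUNNING twice, then SUCCEEDED.
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
started = []
triggered = run_after_sync(
    get_sync_status=lambda: next(statuses),
    start_transformation=lambda: started.append("spark-job"),
    poll_seconds=0.0,
)
```

With a schedule-only setup, the equivalent step has to guess how long ingestion takes; here the job starts only once the sync has actually finished.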

Destination / Warehouses

Nifi:
- S3
- Postgres
- MongoDB
- MySQL
- Oracle
- MS SQL
- CData JDBC Driver for MariaDB

Fivetran:
- Azure Synapse
- BigQuery
- Databricks
- MySQL (BETA)
- Panoply
- Periscope
- PostgreSQL
- Redshift
- Snowflake
- SQL Server

Account management

Nifi:
- Client certificates
- Username/password
- Apache Knox
- OpenID Connect

Fivetran:
IAM / user authentication is possible through:
- Azure AD (BETA)
- Google Workspace (BETA)
- Okta
- OneLogin
- PingOne

Version Control

Nifi:
GitHub

Fivetran:
GitHub, with an account that has permissions for the following GitHub scopes:
- repo
- read:org
- admin:org_hook
- admin:repo_hook

Configuration REST API

Nifi:
The configuration API can manage:
- Access
- Controller
- Controller Services
- Reporting Tasks
- Flow
- Process Groups
- Processors
- Connections
- FlowFile Queues
- Remote Process Groups
- Provenance

Fivetran:
This feature is available only for Standard, Enterprise, and Business Critical accounts.
- User Management API
- Group Management API
- Destination Management API
- Connector Management API
- Certificate Management API
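
As a hedged illustration of the Fivetran side, the sketch below only builds (and does not send) an authenticated request that would list groups. It assumes the base URL `https://api.fivetran.com/v1` and HTTP Basic authentication with an API key and secret, per Fivetran's public REST API documentation; the helper `build_request` and the credentials are placeholders, so verify the details against the current API reference.

```python
import base64
import urllib.request

FIVETRAN_API = "https://api.fivetran.com/v1"  # assumed base URL


def build_request(path: str, api_key: str, api_secret: str) -> urllib.request.Request:
    """Build a GET request with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        f"{FIVETRAN_API}/{path.lstrip('/')}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Placeholder credentials; real ones would come from a secrets store.
request = build_request("groups", "my-key", "my-secret")
# urllib.request.urlopen(request) would send it; deliberately omitted here.
```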

Functions\Templates

Nifi:
Yes

Fivetran:
Yes. For a data source or private API that Fivetran does not support, you can write a custom function and build a serverless ELT data pipeline using Function connectors.
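
A function connector of this kind is essentially a small handler that receives the previous state and returns new rows plus the next state. The Python sketch below is illustrative only: the field names (`state`, `insert`, `schema`, `hasMore`) follow the response format described in Fivetran's function connector documentation as understood here and should be verified there, while `fetch_rows`, the `orders` table, and the `cursor` key are hypothetical.

```python
from typing import Any, Dict, List


def fetch_rows(since: int) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for a call to a private API."""
    data = [{"id": 1, "total": 10.0}, {"id": 2, "total": 25.5}]
    return [row for row in data if row["id"] > since]


def handler(request: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Return rows newer than the stored cursor and the updated state."""
    cursor = request.get("state", {}).get("cursor", 0)
    rows = fetch_rows(cursor)
    new_cursor = max((row["id"] for row in rows), default=cursor)
    return {
        "state": {"cursor": new_cursor},                # saved for the next call
        "insert": {"orders": rows},                     # rows to upsert, keyed by table
        "schema": {"orders": {"primary_key": ["id"]}},  # assumed schema hint format
        "hasMore": False,                               # no further pages in this sketch
    }


first = handler({"state": {}})
second = handler({"state": first["state"]})
```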

Language Supported

Nifi:
- Python
- Java

Fivetran:
- Python
- Java
- Go
- Node.js

Streaming

Nifi:
- Apache Kafka
- Amazon Kinesis

Fivetran:
- Apache Kafka
- Amazon Kinesis
- Snowplow Analytics
- Segment
- Webhooks