Azure Data Factory Interview Questions

Azure Data Factory is a cloud-based Microsoft data integration service that collects raw business data and transforms it into usable information. It is an ETL (extract, transform, and load) service that automates the movement and transformation of that raw data. This Azure Data Factory interview questions blog covers the questions most likely to be asked during Azure job interviews.

Why Azure Data Factory

  • The cloud needs an orchestration tool
  • Extract data from cloud sources
  • Transform data using Data Factory workflows
  • Load/sink the data into a data lake or data warehouse

Azure Data Factory Concepts

  • Pipeline: The list of steps we want to perform in our data transformation or copy task. The steps are called activities.
  • Activities: Activities represent the processing steps in a pipeline. A pipeline can have one or more activities. Activities can be of the following types:
    – Data movement activities, e.g. Copy
    – Data transformation activities, e.g. Databricks, Azure Functions
    – Control flow activities, e.g. ForEach loop, Lookup, Set Variable
  • Datasets: Sources of data. In simple words, a dataset is a data structure that points to the data an activity reads or writes.
  • Linked services: Connection strings, i.e. the connection information Data Factory needs to reach external data stores and compute services (see the sketch after this list).
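To make these concepts concrete, here is a minimal sketch of a pipeline in ADF's JSON authoring format, assuming a source dataset named SqlServerTable1 and a sink dataset named LakeFolder1 (hypothetical names used only for illustration); each dataset would in turn point to a linked service.

```json
{
  "name": "CopySqlToLake",
  "properties": {
    "activities": [
      {
        "name": "CopyFromSqlServer",
        "type": "Copy",
        "inputs":  [ { "referenceName": "SqlServerTable1", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "LakeFolder1", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "SqlServerSource" },
          "sink": { "type": "AzureDataLakeStoreSink" }
        }
      }
    ]
  }
}
```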

Mapping Data Factory Concepts to SSIS

Data Factory    | SSIS
----------------|----------------------------------
Pipeline        | Package (*.dtsx) or Control Flow
Activity        | Control Flow component
Linked Services | Connections
Dataset         | Connection mapping
Workflow        | Data Flow

ADF Integration Runtime

The Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide the following data integration capabilities across different network environments:
– Data Flow
– Data Movement
– Activity Dispatch
– SSIS Package Execution

  • Azure Integration Runtime: The Azure Integration Runtime can copy data between cloud data stores, and it can dispatch activities to a variety of compute services, such as Azure HDInsight or SQL Server, where the transformation takes place.
  • Self-Hosted Integration Runtime: The self-hosted Integration Runtime is software with essentially the same code as the Azure Integration Runtime, but you install it on an on-premises machine or on a virtual machine inside a virtual network. A self-hosted IR can run copy activities between a public cloud data store and a data store in a private network, and it can dispatch transformation activities against compute resources in a private network. We use a self-hosted IR because Data Factory cannot directly access on-premises data sources, as they sit behind a firewall. A linked service selects its runtime through the connectVia property (see the sketch after this list).
  • Azure-SSIS Integration Runtime: With the Azure-SSIS Integration Runtime, you can natively execute SSIS packages in a managed environment.
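As an illustration, binding a linked service to a self-hosted IR is a matter of setting connectVia. The runtime name (MySelfHostedIR), server, and database below are placeholders, and in practice the connection string would be stored as a secure string or in Azure Key Vault.

```json
{
  "name": "OnPremSqlServer",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Server=<onprem-server>;Database=<db>;Integrated Security=True"
    },
    "connectVia": {
      "referenceName": "MySelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
```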

ETL process in Azure Data Factory

The following are the high-level steps to create a simple pipeline in ADF. The example copies data from a SQL Server database to Azure Data Lake Store; the JSON sketch after the steps shows what the resulting dataset definitions might look like.

Steps for Creating ETL

  1. Create a linked service for the source data store, which is the SQL Server database
  2. Create the source dataset using the linked service from step 1
  3. Create a linked service for the destination data store, which is Azure Data Lake Store
  4. Create the sink/target dataset using the linked service created in step 3
  5. Create the pipeline and add a Copy activity
  6. Schedule the pipeline by adding a trigger
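Continuing the earlier sketch, steps 2 and 4 could produce dataset definitions like the following. The dataset, linked service, table, and folder names are hypothetical, and newer dataset versions may split tableName into separate schema and table properties.

```json
[
  {
    "name": "SqlServerTable1",
    "properties": {
      "type": "SqlServerTable",
      "linkedServiceName": { "referenceName": "OnPremSqlServer", "type": "LinkedServiceReference" },
      "typeProperties": { "tableName": "dbo.SalesOrders" }
    }
  },
  {
    "name": "LakeFolder1",
    "properties": {
      "type": "AzureDataLakeStoreFile",
      "linkedServiceName": { "referenceName": "DataLakeStore1", "type": "LinkedServiceReference" },
      "typeProperties": { "folderPath": "raw/sales" }
    }
  }
]
```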

Schedule a pipeline

  • You can use a schedule trigger or a tumbling window trigger to schedule a pipeline.
  • A trigger executes pipelines on a schedule, e.g. daily at 6 AM (see the sketch below).
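A minimal schedule trigger for the "daily at 6 AM" case could look like the sketch below; the trigger name, start time, and pipeline reference are placeholders.

```json
{
  "name": "DailyAt6AM",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T06:00:00Z",
        "schedule": { "hours": [6], "minutes": [0] }
      }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "CopySqlToLake", "type": "PipelineReference" } }
    ]
  }
}
```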

Pass parameters to a Pipeline

You can define parameters at the pipeline level and pass arguments as you execute the pipeline run on demand or by using a trigger.
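For example, the hypothetical pipeline from earlier could declare a targetFolder parameter and reference it with the @pipeline().parameters expression; this sketch assumes the LakeFolder1 dataset declares a matching folderPath parameter.

```json
{
  "name": "CopySqlToLake",
  "properties": {
    "parameters": { "targetFolder": { "type": "String" } },
    "activities": [
      {
        "name": "CopyFromSqlServer",
        "type": "Copy",
        "inputs":  [ { "referenceName": "SqlServerTable1", "type": "DatasetReference" } ],
        "outputs": [
          {
            "referenceName": "LakeFolder1",
            "type": "DatasetReference",
            "parameters": { "folderPath": "@pipeline().parameters.targetFolder" }
          }
        ],
        "typeProperties": {
          "source": { "type": "SqlServerSource" },
          "sink": { "type": "AzureDataLakeStoreSink" }
        }
      }
    ]
  }
}
```

A trigger or a manual run then supplies the argument, e.g. "parameters": { "targetFolder": "raw/sales/2024" } in the trigger's pipeline reference.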

Can you define default values for the pipeline parameters?

You can define default values for the parameters in the pipelines.
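For instance, the parameters block of the hypothetical pipeline above could carry a default:

```json
{
  "parameters": {
    "targetFolder": { "type": "String", "defaultValue": "raw/sales" }
  }
}
```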

Can an activity output property be consumed in another activity?

An activity output can be consumed in a subsequent activity with the @activity construct.
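For example, a Set Variable activity could capture how many rows a preceding copy activity wrote. The activity and variable names are hypothetical, and rowsCopied is one of the properties the Copy activity reports in its output.

```json
{
  "name": "RecordRowsCopied",
  "type": "SetVariable",
  "dependsOn": [
    { "activity": "CopyFromSqlServer", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "variableName": "rowsCopied",
    "value": "@string(activity('CopyFromSqlServer').output.rowsCopied)"
  }
}
```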
