Microsoft Azure – Introduction to Azure Data Factory

Azure Data Factory, commonly known as ADF, is an ETL (Extract, Transform, Load) tool that integrates data of various formats and sizes from different sources. In other words, it is a fully managed, serverless data integration service for ingesting, preparing, and transforming all your data at scale. Azure Data Factory pipelines are commonly used to transfer data from on-premises systems to the cloud at scheduled intervals.

Table of Contents

  • What Is Azure Data Factory(ADF)?
  • How does Azure Data Factory Work?
  • Azure Data Factory(ADF) Architecture
  • What are the differences between Azure Data Factory and Azure Databricks?
  • What are the differences between Azure Data Factory and Azure Data Lakes?
  • Features of Azure Data Factory(ADF)
  • Benefits of Azure Data Factory (ADF)
  • Use Cases And Usage Scenarios of Azure Data Factory
  • Azure Data Factory(ADF) Pricing
  • Microsoft Azure Data Factory (ADF) – FAQs

What Is Azure Data Factory(ADF)?

Azure Data Factory helps you automate and manage data workflows between on-premises and cloud-based sources and destinations. It manages the pipelines of these data-driven workflows. Azure Data Factory stands out among ETL tools because it is easy to use, cost-effective, and provides a powerful, intelligent, code-free service.

As the volume of data grows day by day around the world, many enterprises and businesses are moving to cloud-based technology to make their operations scalable. With this increase in cloud adoption comes the need for reliable cloud-based ETL tools to handle the integration.

How does Azure Data Factory Work?

Azure Data Factory (ADF) is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. It enables you to create data-driven workflows for orchestrating data movement and transforming data at scale. With its graphical interface, ADF makes it easy to build complex ETL (Extract, Transform, Load) processes that integrate data from various sources and formats. The following are some of the key points regarding Azure Data Factory; a minimal setup sketch in Python follows the list:

  • Data Ingestion: Azure Data Factory can connect to a wide range of data sources, including on-premises databases and cloud-based storage services.
  • Data Transformation: Using mapping data flows and various transformation activities, ADF can clean, aggregate, and transform data to meet business needs, often in combination with Azure services such as Azure Databricks or Azure HDInsight.
  • Scheduling and Monitoring: It provides strong scheduling capabilities to automate workflows, along with monitoring tools to track the progress and health of data pipelines.
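As a minimal sketch of getting started, the snippet below provisions a data factory with the Python SDK, assuming the azure-identity and azure-mgmt-datafactory packages are installed; the subscription ID, resource group, and factory name are hypothetical placeholders, and this is one possible approach rather than the only way to work with ADF.

```python
# Minimal sketch: create an Azure Data Factory instance with the Python SDK.
# Assumes `pip install azure-identity azure-mgmt-datafactory`; the placeholder
# subscription ID, resource group, and factory name must be replaced with real
# values, and exact model/method names can vary slightly between SDK versions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

SUBSCRIPTION_ID = "<subscription-id>"   # hypothetical placeholder
RESOURCE_GROUP = "<resource-group>"     # hypothetical placeholder
FACTORY_NAME = "demo-data-factory"      # hypothetical placeholder

# Authenticate with whatever credential is available (Azure CLI login, managed identity, ...).
credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)

# Create (or update) the data factory in the chosen region.
factory = adf_client.factories.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, Factory(location="eastus")
)
print(f"Factory '{factory.name}' provisioned in {factory.location}")
```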

Azure Data Factory(ADF) Architecture

The figure below describes the architecture of a data engineering flow using Azure Data Factory. The flow starts from the source data, which can come from a variety of sources such as on-premises databases, cloud storage services, and SaaS applications.

After ingestion, the data is transferred to a staging area, where it is stored temporarily and arranged into a form suitable for further processing. The data is then processed with the help of data flows. The main components involved are listed below; a short SDK sketch of a linked service and a dataset follows the list.

  1. Integration runtime: It executes the pipelines, whether they are hosted on-premises or in the cloud.
  2. Linked service: It connects the data source and destination.
  3. Dataset: A dataset represents the data that is being processed by a pipeline.
  4. Pipeline: A pipeline is a sequence of activities that are executed in order to process data.
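As a rough illustration of the linked service and dataset components, the sketch below registers a Blob Storage linked service and a dataset that points at it, continuing from the client created earlier; the connection string, container path, and resource names are hypothetical placeholders.

```python
# Sketch: register a Blob Storage linked service and a dataset that points at it.
# Continues from the `adf_client`, RESOURCE_GROUP, and FACTORY_NAME defined above;
# all names and the connection string are hypothetical placeholders, and the
# `type` fields may be constants in some SDK versions.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService, SecureString,
    LinkedServiceReference, DatasetResource, AzureBlobDataset,
)

# Linked service: effectively the connection string that tells ADF how to reach the store.
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "StagingBlobLinkedService", blob_ls
)

# Dataset: a named reference to the data (here, a folder/file in that storage account).
staging_ds = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="StagingBlobLinkedService"
        ),
        folder_path="staging/input",
        file_name="sales.csv",
    )
)
adf_client.datasets.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "StagingSalesDataset", staging_ds
)
```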

Azure Data Factory transfers the required data from an on-premises data centre to the cloud. For example, suppose a company needs to analyze its data in Azure Synapse Analytics on a daily basis. The company can build a three-step procedure for this using an Azure Data Factory pipeline:

  1. A copy activity copies the data from the on-premises database to a staging area in Azure Blob Storage.
  2. A data flow activity transforms the data in the staging area.
  3. A copy activity copies the transformed data from the staging area to the data warehouse in Azure Synapse Analytics.

The pipeline is set to be triggered on a daily basis; whenever it is triggered, the data is transferred from the on-premises source to the cloud destination.
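A hedged sketch of the first copy step and its daily schedule trigger might look roughly like the following, assuming the hypothetical datasets "SourceSalesDataset" and "StagingSalesDataset" already exist in the factory; the names, start time, and trigger settings are placeholders, not a definitive implementation.

```python
# Sketch: a pipeline with one copy activity plus a daily schedule trigger.
# Assumes the hypothetical datasets "SourceSalesDataset" and "StagingSalesDataset"
# already exist; for a genuine on-premises database source you would instead use
# a SQL-type dataset/source and a self-hosted integration runtime.
from datetime import datetime, timezone
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference,
)

# Step 1: copy from the source dataset into the Blob Storage staging dataset.
copy_to_staging = CopyActivity(
    name="CopyToStaging",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceSalesDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="StagingSalesDataset")],
    source=BlobSource(),   # source/sink types depend on the actual stores involved
    sink=BlobSink(),
)
pipeline = PipelineResource(activities=[copy_to_staging])
adf_client.pipelines.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, "DailyLoadPipeline", pipeline
)

# Schedule trigger: run the pipeline once per day.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day", interval=1,
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc), time_zone="UTC",
)
trigger = TriggerResource(
    properties=ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[TriggerPipelineReference(
            pipeline_reference=PipelineReference(
                type="PipelineReference", reference_name="DailyLoadPipeline"
            )
        )],
    )
)
adf_client.triggers.create_or_update(RESOURCE_GROUP, FACTORY_NAME, "DailyTrigger", trigger)
adf_client.triggers.begin_start(RESOURCE_GROUP, FACTORY_NAME, "DailyTrigger").result()  # `start` in older SDK versions
```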

What are the differences between Azure Data Factory and Azure Databricks?

The following are the differences between Azure Data Factory and Azure Databricks:

| Aspect | Azure Data Factory (ADF) | Azure Databricks |
| --- | --- | --- |
| Purpose | Data integration and orchestration service. | Big data analytics and machine learning platform. |
| Primary Function | Orchestrates data workflows, ETL processes. | Provides an environment for big data processing and analytics. |
| Data Transformation | Basic transformations using data flows and mapping. | Advanced data transformations using Apache Spark. |
| Development Interface | Graphical user interface for creating pipelines. | Notebooks for interactive data analysis and development. |
| Scalability | Scales through integration with other Azure services. | Highly scalable with built-in Spark clusters. |

What are the differences between Azure Data Factory and Azure Data Lakes?

The following are the differences between Azure Data Factory and Azure Data Lakes:

| Aspect | Azure Data Factory (ADF) | Azure Data Lake (ADL) |
| --- | --- | --- |
| Purpose | Data integration and orchestration service. | Storage service optimized for big data analytics. |
| Primary Function | Orchestrates data workflows, ETL processes. | Provides scalable storage for structured and unstructured data. |
| Data Management | Manages and automates data movement and transformation. | Stores large volumes of raw data for analytics and processing. |
| Interface | Graphical user interface for creating and managing pipelines. | Managed via Azure portal, SDKs, and REST APIs for storage operations. |
| Use Cases | ETL processes, data migration, data integration. | Data storage for big data analytics, data warehousing, and data lakes. |

Features of Azure Data Factory(ADF)

The following are the features of Azure Data Factory:

  1. Data flows: Data flows use Apache Spark to move and transform data from source to destination. A data flow is a code-free way to transform data: you simply drag and drop the source and destination, and ADF generates the complex pipeline logic needed to transfer the data.
  2. Pipelines: Pipelines play a major role in data transfer; they orchestrate data movement and transformation processes. Pipelines can be triggered by events or scheduled at time intervals (a sketch of running and monitoring a pipeline follows this list).
  3. Data Sets: Datasets simply point to or reference the data that we want to use in our activities as input or output.
  4. Activity: Activities in a pipeline define the actions to perform on data. For example, a copy data activity can read data from one location in Blob Storage and load it into another location in Blob Storage.
  5. Integration Runtime: The Integration Runtime (IR) is the compute infrastructure used by ADF to provide capabilities such as Data Flow, Data Movement, Activity Dispatch, and SSIS Package Execution across different network environments.
  6. Linked Services: Linked services are used to connect other sources with Azure Data Factory. A linked service acts as a connection string for the resource it connects to.
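To show how pipelines and activities come together at run time, the sketch below starts an on-demand run of the hypothetical "DailyLoadPipeline" from the earlier examples and polls its status; the status strings and client methods are those commonly seen in the SDK, but details may differ across versions.

```python
# Sketch: start a pipeline run on demand and monitor it until it finishes.
# Uses the hypothetical "DailyLoadPipeline" and the `adf_client` from the
# earlier sketches; "Succeeded"/"Failed"/"Cancelled" are typical terminal states.
import time

run = adf_client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, "DailyLoadPipeline", parameters={}
)
print(f"Started pipeline run {run.run_id}")

# Poll the run status until it reaches a terminal state.
while True:
    pipeline_run = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
    print(f"Status: {pipeline_run.status}")
    if pipeline_run.status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(30)
```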

Benefits of Azure Data Factory (ADF)

The following are the benefits of Azure Data Factory:

  1. Scalability and Flexibility: Azure Data Factory is scalable by nature. The volume of data transferred between on-premises and cloud-based sources and destinations is unpredictable: sometimes it is high and sometimes it is low, and Azure Data Factory scales to meet these changing requirements.
  2. Hybrid data integration: Data held in both on-premises and cloud-based sources can be managed by Azure Data Factory.
  3. Data Orchestration: Azure Data Factory helps us manage large amounts of data in a centralized manner, which makes the data easy to maintain.
  4. Integration with Azure services: A few of the Azure services that work closely with Azure Data Factory include Azure Synapse Analytics, Azure Databricks, and Azure Blob Storage. This makes it simple to create and manage data pipelines that utilise a variety of services.

Use Cases And Usage Scenarios of Azure Data Factory

The following are the use cases and usage scenarios of Azure Data Factory:

  1. Data Integration: Azure Data Factory is commonly used for integrating data from various sources such as on-premises databases, cloud-based storage, and SaaS applications, enabling organizations to consolidate and centralize their data for analysis and reporting.
  2. ETL Processes: Organizations leverage Azure Data Factory to orchestrate Extract, Transform, Load (ETL) processes, automating the movement and transformation of data between different systems to ensure data quality and consistency.
  3. Real-time Data Processing: With its ability to schedule and execute data workflows on-demand or on a schedule, Azure Data Factory is employed for real-time data processing scenarios, enabling organizations to react quickly to changes in data and business requirements.
  4. Hybrid Data Scenarios: Azure Data Factory supports hybrid data scenarios, allowing organizations to seamlessly integrate data from on-premises systems with cloud-based data sources, facilitating hybrid cloud deployments and ensuring data accessibility and consistency across environments.
  5. Analytics and Business Intelligence: By preparing and transforming data for analytics and business intelligence purposes, Azure Data Factory enables organizations to derive insights and make informed decisions based on their data, empowering data-driven decision-making processes.

Azure Data Factory(ADF) Pricing

Data Pipelines: Help to integrate data from cloud and hybrid data sources at scale. Pricing starts from ₹72.046 per 1,000 activity runs per month.

SQL Server Integration Services: Helps you easily move your existing on-premises SQL Server Integration Services projects to a fully managed environment in the cloud. Pricing for SQL Server Integration Services integration runtime nodes starts from ₹60.498/hour.

  1. No upfront cost
  2. No termination fees
  3. Pay only for what you use

| Component | Pricing Model | Cost |
| --- | --- | --- |
| Data Movement | Pay-as-you-go based on data volume | $0.45 per TB processed |
| Data Transformation | Pay-as-you-go based on activity runs | $1 per 1,000 activity runs |
| Data Flows | Pay-as-you-go based on compute usage | $1 per 8 vCore hours |
| Pipelines | Pay-as-you-go based on activity runs | $0.20 per 1,000 activity runs |
| Integration Runtimes | Pay-as-you-go based on integration | Pricing varies by integration |
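As a back-of-the-envelope illustration only, the sketch below estimates a monthly bill from the illustrative rates in the table above; the assumed workload figures (runs, terabytes, vCore hours) are made up, and real charges should always be taken from the Azure pricing calculator.

```python
# Back-of-the-envelope monthly estimate using the illustrative rates from the
# table above; the workload figures below are made-up assumptions, and actual
# charges should be taken from the Azure pricing calculator for your region.
PIPELINE_RATE_PER_1000_RUNS = 0.20       # $ per 1,000 pipeline activity runs
TRANSFORM_RATE_PER_1000_RUNS = 1.00      # $ per 1,000 transformation activity runs
DATA_FLOW_RATE_PER_8_VCORE_HOURS = 1.00  # $ per 8 vCore hours of data flow compute
DATA_MOVEMENT_RATE_PER_TB = 0.45         # $ per TB processed

# Hypothetical monthly workload.
activity_runs = 50_000
transform_runs = 10_000
data_flow_vcore_hours = 400
terabytes_moved = 12

estimate = (
    activity_runs / 1_000 * PIPELINE_RATE_PER_1000_RUNS
    + transform_runs / 1_000 * TRANSFORM_RATE_PER_1000_RUNS
    + data_flow_vcore_hours / 8 * DATA_FLOW_RATE_PER_8_VCORE_HOURS
    + terabytes_moved * DATA_MOVEMENT_RATE_PER_TB
)
print(f"Estimated monthly cost: ${estimate:,.2f}")  # ≈ $75.40 with these assumptions
```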

Microsoft Azure Data Factory (ADF) – FAQs

Where is Azure Data Factory Available?

The Azure Data Factory service is available in all regions where Azure services are available.

What is the SLA for Data Factory?

Azure Data Factory offers a service-level agreement (SLA) of 99.9% for data flow and pipeline availability.

What is the Integration runtime?

Integration Runtime is the compute infrastructure used by Azure Data Factory to provide data integration capabilities across different network environments.

Is Azure Data Factory an ETL?

Yes, Azure Data Factory is an Extract, Transform, Load (ETL) service that orchestrates and automates data movement and transformation workflows.

What is an Azure Data Factory?

Azure Data Factory is a cloud-based data integration service that allows organizations to create, schedule, and manage data pipelines for data movement and transformation.

Are Azure and Azure Data Factory the same?

No, Azure is the cloud computing platform provided by Microsoft, while Azure Data Factory is a specific service within Azure for data integration and orchestration.

Is Azure Data Factory a data warehouse?

No, Azure Data Factory is not a data warehouse itself but can be used to orchestrate data movement and transformation workflows between data sources and data warehouses.