ADF Data Flows Custom Logging and Auditing Video
ADF has a number of built-in capabilities for logging, monitoring, alerting, and auditing your pipelines. There are UI monitoring tools, telemetry logs, and integration with Azure Monitor to provide a rich set of tools for the administration of your ETL and data integration processes.
However, if you’d like to apply additional custom logging and auditing to your ETL data flows, you can use the techniques below, all of which are based on existing functionality found natively within ADF:
Data Flow pipeline activity output
With this technique, you query the output metrics from your data flow activities in the pipeline and pass the values you are interested in to another data flow. The first data flow activity (ExecDataFlow) is the data transformation worker and the second (Log Data) is the logger.
If you look at the output from your data flow activity execution, you will see the JSON payload returned by the activity.
You can pick out different metrics to log, such as the time for each transformation stage, source rows read, sink rows written, bytes read/written, and so on. For this example, I’m going to log the processing time for the sink (the total time it took to write the rows to the sink) and the number of rows written:
@activity('ExecDataFlow').output.runStatus.metrics.sink1.rowsWritten
@activity('ExecDataFlow').output.runStatus.metrics.sink1.sinkProcessingTime
I assign those values to the logger data flow, which takes in several integer parameters and simply writes those parameters out to a delimited text file, with no header, in my output folder in ADLS. This makes the data flow very generic and reusable for logging.
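To make the wiring concrete, here is a rough sketch of what the logger activity could look like in the pipeline JSON. The data flow name LoggerDataFlow and the parameter names rowsWritten and sinkProcessingTime are illustrative, and the exact JSON that ADF generates for data flow parameters may differ slightly from this shape:

{
    "name": "Log Data",
    "type": "ExecuteDataFlow",
    "dependsOn": [
        { "activity": "ExecDataFlow", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "dataflow": {
            "referenceName": "LoggerDataFlow",
            "type": "DataFlowReference",
            "parameters": {
                "rowsWritten": {
                    "value": "@activity('ExecDataFlow').output.runStatus.metrics.sink1.rowsWritten",
                    "type": "Expression"
                },
                "sinkProcessingTime": {
                    "value": "@activity('ExecDataFlow').output.runStatus.metrics.sink1.sinkProcessingTime",
                    "type": "Expression"
                }
            }
        }
    }
}

The dependsOn entry makes the logger run only after the worker data flow has finished, so its output metrics are available to the expressions.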
The logger data flow uses an ADF data flow technique of pointing to a source CSV file in Blob Storage that contains just a single row and a single column, with no header.
The file content is simply this:
1
I call this file “dummyfile.txt” and I recommend keeping one around in your blob storage with an ADF dataset pointing to it. It allows you to build data flows that don’t really use the source data. Instead, data flows like this logger generate their values from parameters via Derived Column transformations.
It’s an important technique to learn and repeat in ADF data flows. This way, I can set my source to this dummy file, set the incoming parameter values in a Derived Column, and then write each logger parameter out to a delimited text file, as sketched below.
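As a rough sketch, the data flow script behind a three-parameter version of this logger could look something like the following. The parameter, stream, and column names are my own, the sink would be bound to a headerless delimited text dataset in ADLS, and the third parameter anticipates the row count verification technique covered further down:

parameters{
	rowsWritten as integer (0),
	sinkProcessingTime as integer (0),
	tableRowCount as integer (0)
}
source(output(
		dummy as string
	),
	allowSchemaDrift: true,
	validateSchema: false) ~> DummySource
DummySource derive(rowsWritten = $rowsWritten,
		sinkProcessingTime = $sinkProcessingTime,
		tableRowCount = $tableRowCount) ~> SetLogValues
SetLogValues sink(allowSchemaDrift: true,
	validateSchema: false,
	skipDuplicateMapInputs: true,
	skipDuplicateMapOutputs: true) ~> LogSink

The single dummy column from the source can be dropped with a Select transformation or in the sink mapping so that only the three logged values land in the output file.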
This data flow can be re-used in other pipelines as your logger.
Row Count aggregation inside the data flow
With this technique, you add logging directly inside your data flows. Here, you can simply log row counts and sink those values to a text file or a database table. Use a New Branch in your data flow to create a separate logging branch, then add an Aggregate transformation with no grouping and use the count() function. You can also write out the counts of each insert, update, and upsert operation coming from your logic to audit and log your database operations. In either case, the separate logging branch does not affect your transformation logic. However, this technique requires you to add the logic, non-reusably, inside each data flow, whereas the separate logger data flow described above can be reused.
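In data flow script terms, the logging branch is simply the same upstream stream feeding both your regular sink and an Aggregate. A minimal sketch, with illustrative stream and column names, could look roughly like this:

CleanedRows sink(allowSchemaDrift: true,
	validateSchema: false) ~> MainSink
CleanedRows aggregate(rowsProcessed = count(1)) ~> CountRows
CountRows sink(allowSchemaDrift: true,
	validateSchema: false) ~> AuditSink

The insert/update/upsert audit follows the same pattern, with the Aggregate producing one count per Alter Row policy instead of a single count(1).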
Row Count lookup activity verification
Another common technique is to count the number of rows from your data flow logic and compare it against the actual number of rows written to the database sink. This is important for auditing and validation.
In this example, we use the Lookup activity in the pipeline to query the total number of rows in the destination database so that we can compare it to the number of rows reported by our data flow logger.
This pipeline expression is the 3rd parameter sent to our data flow logger:
@activity('GetRowCount').output.firstRow.myrowcount
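For reference, assuming an Azure SQL sink, the GetRowCount Lookup activity behind that expression could be configured something like this. The dataset and table names are placeholders; the only detail that matters for the expression above is the myrowcount alias in the query:

{
    "name": "GetRowCount",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT COUNT(*) AS myrowcount FROM dbo.MyTargetTable"
        },
        "dataset": {
            "referenceName": "TargetTableDataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}

With firstRowOnly set to true, the activity returns a single row that the pipeline expression reads via output.firstRow.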
Now, when we look at the output file from our data flow logger, it shows the number of rows written by the activity, the time it took to execute in milliseconds, and the number of rows counted in the actual database itself, so that we can see any discrepancy:
9128,14913,9125