Hortonworks on Windows – Microsoft HDInsight & SQL Server – Part 1
I’m going to start a series here on using Microsoft’s Windows distribution of the Hadoop stack, which Microsoft has released in community preview here together with Hortonworks:...
PSSUG November 2012 Presentation: Big Data with SQL Server
Thank you all for coming out on a rainy, snowy, cold evening to join us for this month’s PSSUG meeting! Here is a link to the slides that I used tonight during my presentation of Big Data with SQL...
Big Data with SQL Server, part 2: Sqoop
I started off my series on Hadoop on Windows with the new Windows distribution of Hadoop known as Microsoft HDInsight, by talking about installing the local version of Hadoop on Windows. There is also...
What Makes your Data Warehouse a “Big Data Warehouse”?
I’ve been closely observing, with great interest, how the marketing of the classic database and data warehouse products has evolved over the past 2 years. Now that Big Data is top of mind for most CIOs in...
SQL Dude Adventures and BI Blogging
Hi Everyone! My apologies for the gap in blogging … It’s been a long couple of months transitioning over to the Open Source Software world and relocating to Orlando for Pentaho. It’s been an amazing...
OPASS Discussions & Big Data in the Real World
If you are in the Orlando Sanford area tomorrow night (Thursday, October 24 @ 6:30; register here) then come join the Orlando PASS gang where I will lead a discussion on Big Data and Big Data Analytics...
Migrate Adventure Works Sales Cubes from SSAS MOLAP to Mondrian ROLAP
In the 2 previous blog entries about recreating the SSAS example cubes for Adventure Works sales BI on Pentaho, I focused on generating new departmental-sized ROLAP cubes through the thin client Auto...
Pentaho and HP Vertica – Big Data Analytics
Anyone going to the HP Vertica conference next week in Boston? (August 12, 2014) If so, stop by and say Hi at the Pentaho booth in the expo center! BTW, I uploaded a quick & short video that I...
ADF Mapping Data Flows: Optimize for Azure SQL Database
I’m going to use this blog post as a dynamic list of performance optimizations to consider when using Azure Data Factory’s Mapping Data Flow. I am going to focus only on Azure SQL DB here. I will post...
ADF Mapping Data Flows: Optimize for Azure SQL Data Warehouse
I’m going to use this blog post as a dynamic list of performance optimizations to consider when using Azure Data Factory’s Mapping Data Flow. I am going to focus only on Azure SQL DW here. I will post...
ADF Mapping Data Flows: Optimize for File Source and Sink
I’m going to use this blog post as a dynamic list of performance optimizations to consider when using Azure Data Factory’s Mapping Data Flow. I am going to focus only on files here. I will post...
Dynamic SQL Table Names with Azure Data Factory Data Flows
You can leverage ADF’s parameters feature with Mapping Data Flows to create pipelines that dynamically create new target tables. You can set those table names through Lookups or other activities. I’ve...
ADF Mapping Data Flows Parameters
Using Azure Data Factory Mapping Data Flows, you can make your data transformations flexible and general-purpose by using parameters. Use Data Flow parameters to create dynamic transformation...
ETL with ADF: Convert Pig to Data Flows
Here’s a brief post on converting an ETL script written in Pig. I took an ETL example using Pig from the Hortonworks tutorials site and migrated it to ADF using Mapping Data Flows. It took me...
ADF Data Flows: Distinct Rows
Below is a method to use in ADF’s Mapping Data Flows to reduce the data stream in your data flows to only include distinct rows. This sample is available as a pipeline template here. Choose your...
Reduce Execution Time for Data Flow Activities in ADF Pipelines
In ADF Mapping Data Flows, there are 2 working modes: Debug mode and Pipeline mode. Debug mode is active when you turn on the Data Flow debug switch and the light is green, showing debug as active. You...
Custom logging and auditing of ADF Data Flows
Video: ADF Data Flows Custom Logging and Auditing. ADF has a number of built-in capabilities for logging, monitoring, alerting, and auditing your pipelines. There are UI monitoring tools, telemetry logs,...
Median function in Azure Data Factory
To calculate a median (the middle value of a sorted list), you need to put a couple of transformations together. Below are the steps needed to use median in ADF using data flows. Sort your data by the field...
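For reference, the median definition used above (the middle value of a sorted list) can be sketched in plain Python. This is only an illustration of the calculation, not the ADF data flow itself: the function name `median` and the choice to average the two middle values for an even-length list are assumptions, not something taken from the post.

```python
def median(values):
    """Return the middle value of the sorted input (hypothetical helper)."""
    ordered = sorted(values)          # step 1: sort by the field of interest
    n = len(ordered)                  # step 2: count the rows
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2                      # step 3: locate the middle position
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # even count: assumed convention of averaging the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(median([5, 1, 3]))
    print(median([4, 1, 3, 2]))
```

The sort-then-pick-the-middle structure is why a single aggregate is not enough and several steps have to be chained together.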