PowerBI Incremental refresh Parquet files, without a Database.

TL;DR: you can use incremental refresh in PowerBI with Parquet files stored in Azure Storage, without any database in the middle. You can download a sample pbix here.

I am using the approach from this blog post by Gilbert Quevauvilliers, which is based on a technique from Rafael Mendonça. Please read it first.

You may also want to read this post: it uses Synapse Serverless, but it has a section on using Python to partition your data into Parquet files.

1- Add a new table, Parquet

Make sure it is not loaded. Here is the M code:

let
    // List every file in the container (the storage account name is redacted)
    Source = AzureStorage.DataLake("https://xxxxxx.core.windows.net/parquet"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Folder Path"}),
    // Extract the date from the folder path: the text after the second "D", up to the next "/"
    #"Inserted Text Between Delimiters" = Table.AddColumn(#"Removed Other Columns", "Text Between Delimiters", each Text.BetweenDelimiters([Folder Path], "D", "/", 1, 0), type text),
    #"Renamed Columns" = Table.RenameColumns(#"Inserted Text Between Delimiters",{{"Text Between Delimiters", "Date"}}),
    // datetime, so the column can be compared with the incremental refresh parameters
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type datetime}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Folder Path"})
in
    #"Removed Columns"

The result is a table with two columns: Content (the binary of each file) and Date.
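
If the Text.BetweenDelimiters call in the M code looks cryptic, here is a rough Python sketch of the same logic: skip the first "D", then take everything from the second "D" up to the next "/". The folder path and the helper name between_delimiters are made up for illustration (the real folder layout is not shown in this post), and what happens when a delimiter is missing is my own choice, not necessarily what M does.

```python
from datetime import datetime


def between_delimiters(text, start, end, start_index=0):
    """Return the text after the (start_index + 1)-th `start` and before the next `end`.

    Hypothetical helper that mimics the call used in the M code above.
    Returns "" if `start` does not occur often enough (an assumption).
    """
    pos = -1
    for _ in range(start_index + 1):
        pos = text.find(start, pos + 1)
        if pos == -1:
            return ""
    begin = pos + len(start)
    stop = text.find(end, begin)
    return text[begin:] if stop == -1 else text[begin:stop]


# Hypothetical folder path: the first "D" is in "Data", the second one starts the date folder.
folder_path = "https://example.dfs.core.windows.net/parquet/Data/D2021-01-03/"

date_text = between_delimiters(folder_path, "D", "/", start_index=1)
file_date = datetime.strptime(date_text, "%Y-%m-%d")
print(date_text, file_date)
```

The match is case sensitive, so the lowercase "d" in "dfs" or "windows" is not counted.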

3- Merge using Inner Join

To read the content of the Parquet files we use the function below. Notice that we used an inner join in the previous step to avoid reading null Content, which generates errors when you refresh in the service.

Parquet.Document([Content])
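
To see why the join type matters, here is a small Python sketch. It assumes the left side of the merge is a list of days and the right side is the file listing; both tables and the function names are hypothetical. With a left join, a day that has no file gets a null Content, and that is the row that would fail when passed to Parquet.Document. With an inner join, that row never exists.

```python
from datetime import date

# Hypothetical left table: one row per day.
days = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)]

# Hypothetical right table: Date -> Content. There is no file for 2 January.
files = {
    date(2021, 1, 1): b"parquet-bytes-1",
    date(2021, 1, 3): b"parquet-bytes-3",
}


def left_join(days, files):
    """Keep every day; Content is None when no file matches."""
    return [(d, files.get(d)) for d in days]


def inner_join(days, files):
    """Keep only the days that have a matching file."""
    return [(d, files[d]) for d in days if d in files]


left_rows = left_join(days, files)
inner_rows = inner_join(days, files)

print("left join: ", left_rows)   # contains a row with Content = None
print("inner join:", inner_rows)  # every row has Content
```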

And here is the final table.

We configure incremental refresh to refresh the last 2 days.
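
As a rough illustration of what that setting means for the storage account, the Python sketch below filters a hypothetical listing of one file per day with a two-day window. The dates and the function name are assumptions for the example, and the exact window boundaries PowerBI picks may differ; the point is only that files outside the window are never opened.

```python
from datetime import datetime, timedelta


def files_in_window(file_dates, range_start, range_end):
    """Keep the files whose date satisfies range_start <= date < range_end."""
    return [d for d in file_dates if range_start <= d < range_end]


# Hypothetical listing: one Parquet file per day, 1 to 10 January.
file_dates = [datetime(2021, 1, 1) + timedelta(days=i) for i in range(10)]

# A refresh window covering the last 2 days of that listing.
range_end = datetime(2021, 1, 11)
range_start = range_end - timedelta(days=2)

to_read = files_in_window(file_dates, range_start, range_end)
print(len(to_read), "of", len(file_dates), "files would be read")
```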

4- Testing in PowerBI Service

As you can see, the second refresh is way faster than the first one.

Here is the partition table.

Now let's check the transaction history from Azure Storage. I refreshed again just to be sure.

The second refresh read substantially less data, as only two files were read.

I think that with PowerBI Desktop supporting Parquet, we will see more exciting scenarios. I can't wait for Dataflow to support export to Parquet!

If you are still reading, I would appreciate a vote on this idea: having an option in Dataflow to export to a dynamic file name.
