The first time I used Direct Lake I was blown away by the performance; it seemed too good to be true: import-mode performance with nearly instant refresh times. A month later, I have a better understanding. The first impression is still valid, but it is more nuanced; there is always a tradeoff.
Import Mode
Let’s take a simple analytical pipeline using the PowerBI service, reading a csv file from a web site. You will have something like this:
Ingestion:
- Download the csv
- Sort it, compress it, and save it in PowerBI’s proprietary columnar file format in Azure Storage
Query:
- When a query arrives in the service, load the needed columns from the PowerBI file in Azure Storage into the server’s RAM
When people say import mode they usually mean those two stages (ingestion and query); a refresh in import mode means ingesting new data and loading it into RAM.
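To make the two stages concrete, here is a minimal sketch in Python of the same idea, with Parquet standing in for PowerBI’s proprietary columnar format; the URL and column names are made up for illustration.

```python
import pandas as pd

CSV_URL = "https://example.com/sales.csv"  # hypothetical source file

# --- Ingestion: download the csv, sort it, save it compressed and columnar ---
df = pd.read_csv(CSV_URL)
df = df.sort_values("order_date")                    # sorting helps compression
df.to_parquet("sales.parquet", compression="zstd")   # columnar + compressed at rest

# --- Query: load only the columns the query needs into RAM ---
cols = pd.read_parquet("sales.parquet", columns=["order_date", "amount"])
print(cols.groupby("order_date")["amount"].sum().head())
```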
Direct Lake Mode
In Direct Lake mode, the query stage is more or less the same, which is a great technical achievement from the Vertipaq team: instead of scanning the PowerBI storage file, the engine does the same thing with Parquet files.
But here is the catch: refresh is nearly instantaneous because it is not doing much; the ingestion of new data did not magically disappear, it just has to be done elsewhere.
Who does the Ingestion then?
You can use Spark, Dataflow Gen2, Fabric DWH, etc. to create and update the Parquet files, or, if you are lucky and your system already produces Delta Lake tables with the right file sizes, you hit the jackpot.
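As a rough illustration of doing the ingestion with Spark, here is what it might look like in a Fabric notebook with PySpark; the file path and table name are placeholders, not a specific setup.

```python
from pyspark.sql import SparkSession

# In a Fabric notebook a SparkSession already exists; getOrCreate() just reuses it.
spark = SparkSession.builder.getOrCreate()

# Read the raw csv and land it as a Delta Lake table (Parquet files + Delta log).
raw = spark.read.option("header", "true").csv("Files/raw/sales.csv")
raw.write.format("delta").mode("overwrite").saveAsTable("sales")

# Occasional maintenance so the reader gets reasonably sized files.
spark.sql("OPTIMIZE sales")
```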
Why you may not want to use Direct Lake
- It is a Fabric-only feature. (Edit: you can use a shortcut to an existing Azure storage bucket, but you still need OneLake running. Hopefully one day we will be able to run queries directly against an object store using PowerBI Desktop.)
- You use calculated columns or calculated tables, which Direct Lake does not support.
- If your users interact with the data using only PowerBI reports, then it is not worth the added complexity.
- It does not work with PowerBI Desktop; my understanding is that it will fall back to the Fabric SQL engine.
- Storing all the tables and metadata in one file is a very powerful concept and works very well in the desktop, which is still the main development environment for users.
- Not specific to PowerBI, but for pure performance, proprietary storage files are usually faster for a simple reason: they are highly optimized for the query engine and have no compatibility requirements.
Why you may want to use Direct Lake
- If you have a use case where import mode was painful or literally did not work.
- You already have a system that produces Delta Lake tables with the right file sizes and row groups (see the sketch after this list for a quick way to check).
- Because ingestion is done outside PowerBI, you may get less memory pressure, which is still the biggest bottleneck in PowerBI.
- You want to share data with non-PowerBI users.
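On the “right file size and row groups” point, a quick way to check what your system actually produces is to read the Parquet metadata. A small sketch with pyarrow; the file name is a placeholder:

```python
import pyarrow.parquet as pq

meta = pq.ParquetFile("part-00000.parquet").metadata  # any file from the table
print("rows:", meta.num_rows)
print("row groups:", meta.num_row_groups)
for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    # total_byte_size is the uncompressed size of the row group
    print(f"row group {i}: {rg.num_rows} rows, {rg.total_byte_size / 1e6:.1f} MB uncompressed")
```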
What would be really nice
Today, neither import nor Direct Lake pushes down filters to the storage file; I think there is an expectation that a BI query engine should support data that doesn’t fit in memory.
It would also be nice if Direct Lake supported more table formats, like Hive and Iceberg.