First Look at Incremental Framing in Power BI

TL;DR: Incremental framing is like CDC to RAM 🙂 It significantly improves cold-run performance of Direct Lake mode in some scenarios, and there is excellent documentation that explains everything in detail.

What Is Incremental Framing?

One of the most important improvements to Direct Lake mode in Power BI is incremental framing.

Power BI’s OLAP engine, VertiPaq (probably the most widely deployed OLAP engine, though many outside the Power BI world may not know it), relies heavily on dictionaries. This works well because it is a read-only database. Another core trick is its ability to do calculations directly on encoded data. This makes it extremely efficient and embarrassingly fast (I just like this expression for some reason).


Direct Lake Breakthrough

Direct Lake’s breakthrough is that dictionary building is fast enough to be done at runtime.

Typical workflow:

  1. A user opens a report.
  2. The report generates DAX queries.
  3. These queries trigger scans against the Delta table.
  4. VertiPaq scans only the required columns.
  5. It builds a global dictionary per column, loads the data from Parquet into memory, and executes queries.

The encoding step happens once at the start, and since BI data doesn’t usually change that much, this model works well.


The Problem with Continuous Appends

In scenarios where data is appended frequently (e.g., every few minutes), the initial approach does not work very well. Each update requires rebuilding dictionaries and reloading all the data into RAM, effectively paying the cost of a cold run every time (reading from remote storage will always be slower).


How Incremental Framing Fixes This

Incremental framing solves the problem by:

  • Incrementally loading new data into RAM.
  • Encoding only what’s necessary.
  • Removing obsolete Parquet data when not needed.

This substantially improves cold-run performance. Hot-run performance remains largely unchanged.


Benchmark: Australian Electricity Market

To test this feature, I used my go-to workload: the Australian electricity market, where data is appended every 5 minutes—an ideal test case.

  • Incremental framing is on by default; I turned it off for the baseline runs by following this blog.
  • For benchmarking, I adapted an existing tool, Direct Lake load testing (I just changed it to write the results to Delta instead of CSV; a minimal sketch of that change follows this list). I used 8 concurrent users, the main fact table is around 120 M records, and the queries reflect a typical user session. This is a real-life use case, not some theoretical benchmark.
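
To make the Delta-write change concrete, here is a minimal sketch using the deltalake Python package; the target path and column names are assumptions for illustration, not the tool’s actual schema:

import pandas as pd
from deltalake import write_deltalake

# Dummy rows standing in for one batch of load-test timings
results = pd.DataFrame({"query_id": [1, 2], "user": ["u1", "u2"], "duration_ms": [180.0, 240.0]})

# Append the batch to a Delta table instead of writing a CSV file
write_deltalake("/lakehouse/default/Tables/load_test_results", results, mode="append")  # hypothetical path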

Results

P99

P99 (the 99th percentile latency, often used to show worst-case performance):

  • Improvement of 9x–10x. Again, your results may vary depending on workload, Parquet layout, and data distribution.

P90

P90 (90th percentile latency):

  • Less dramatic but still strong.
  • Improved from 500 ms → 200 ms.
  • Faster queries also reduce capacity unit usage.

Geomean

Just for fun, and to show how fast VertiPaq is, let’s look at the geomean: it went from 11 ms to 8 ms. General-purpose OLAP engines are cool, but specialized engines are just at another level!

This Does Not Solve the Bad Table Layout Problem

This feature improves support for Delta tables with frequent appends and deletes. However, performance still degrades if you have too many small Parquet row groups.

VertiPaq does not rewrite data layouts—it reads data as-is. To maintain good performance:

  • Compact your tables regularly (a minimal compaction sketch follows this list).
  • In my case, I backfill data nightly. The small Parquet files added during the day don’t cause major issues, but I still compact every 100 files as a precaution.
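
If the table is maintained outside Fabric, compaction can be done with the deltalake (delta-rs) Python package. A minimal sketch, assuming a hypothetical table path (V-Order is a separate, Fabric-specific concern not covered here):

from deltalake import DeltaTable

dt = DeltaTable("/lakehouse/default/Tables/aemo/summary")  # hypothetical table path
dt.optimize.compact()  # rewrite many small files into fewer, larger ones
dt.vacuum(retention_hours=168, dry_run=False)  # then clean up the superseded files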

If your data is produced inside Fabric, V-Order helps manage this. For external engines (Snowflake, Databricks, Delta Lake with Python), you’ll need to actively manage the table layout yourself.

First Look at Geometry Types in Parquet

Getting different parties in the software industry to agree on a common standard is rare. Most of the time, a dominant player sets the rules. Occasionally, however, collaboration happens organically and multiple teams align on a shared approach. Geometry types in Parquet are a good example of that.

In short: there is now a defined way to store GIS data in Parquet. Both Delta Lake and Apache Iceberg have adopted the standard ( at least the spec). The challenge is that actual implementation across engines and libraries is uneven.

  • Iceberg: no geometry support yet in Java or Python; see the spec.
  • Delta: it’s unclear whether it’s supported in the open-source implementation (I need to try Sedona and report back); nothing in the spec so far?
  • DuckDB: recently added support (you need a nightly build or to wait for 1.4).
  • PyArrow: has included support for a few months; just use the latest release.
  • Arrow Rust: no support, which means no Delta Python support 😦

The important point is that agreeing on a specification does not guarantee broad implementation. And even where there is a standard spec, that does not mean the initial implementation will be open source. It is hard to believe we still have this situation in 2025!

Let’s run it in Python Notebook

To test things out, I built a Python notebook that downloads public geospatial data, merges it with population data, writes it to Parquet, and renders a map using GeoPandas (a minimal sketch follows the install cell). Make sure to install the latest versions of duckdb, pyarrow, and geopandas:

!pip install -q duckdb --pre --upgrade   # pre-release builds include the geometry support
!pip install -q pyarrow --upgrade
!pip install -q geopandas --upgrade

# Stop execution here; restart the notebook so the upgraded packages are loaded
import sys
sys.exit(0)
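
Here is a minimal sketch of the rest of the notebook flow; the file names and the join key are assumptions for illustration, not the actual datasets I used:

import geopandas as gpd
import pandas as pd

countries = gpd.read_file("countries.geojson")   # hypothetical public boundaries file
population = pd.read_csv("population.csv")       # hypothetical table with iso_a3, population columns
merged = countries.merge(population, on="iso_a3")

# GeoPandas writes the geometry column with geospatial metadata in the Parquet file
merged.to_parquet("/lakehouse/default/Files/countries.parquet")

# Quick choropleth to sanity-check the result
merged.plot(column="population", legend=True)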

At first glance, that may not seem groundbreaking. After all, the same visualization could be done with GeoJSON. The real advantage comes from how geometry types in Parquet store bounding box coordinates. With this metadata, spatial filters can be applied directly during reads, avoiding the need to scan entire datasets.

That capability is what makes the feature truly valuable: efficient filtering and querying at scale. Note that DuckDB does not yet push those filters down; you will probably need to wait until early 2026 (it is hard to believe 2025 is nearly gone).

👉 Workaround if your favorite engine doesn’t support it

A practical workaround is to read the Parquet file with DuckDB (or any library that supports geometry types) and export the geometry column back as WKT text. This allows Fabric to handle the data, albeit without the benefits of native geometry support. For example, Power BI can read WKT just fine:

duckdb.sql("select geom, ST_AsText(geom) as wkt  from '/lakehouse/default/Files/countries.parquet' ")

For Power BI support for WKT, I have written some blogs before. Some people may argue that you need a specialized tool for spatial work; personally, I think BI tools are the natural place to display map data 🙂

Using gpt-oss 20B for Text to SQL 

TL;DR:

As a quick first impression, I tested generating SQL queries based on a YAML-based semantic model; all the files are stored here. Considering I have only 4 GB of VRAM, it is not bad at all!

To be clear, this is not a very rigorous benchmark. I just used the results of the last runs, and different runs will give you slightly different results; it is just to get a feeling for the model. But it is a workload I do care about, which is the only thing that really matters.

The Experimental Setup

The experiment uses a SQL generation test based on the TPC-DS dataset (scale factor 0.1), featuring a star schema with two fact tables (store_sales, store_returns) and multiple dimension tables (date_dim, store, customer, item). The models were challenged with 20 questions ranging from simple aggregations to complex analytical queries requiring proper dimensional modeling techniques.

The Models Under Test

  1. O3-Mini (Reference Model): Cloud-based Azure model serving as the “ground truth”
  2. Qwen3-30B-A3B-2507: Local model via LM Studio
  3. GPT-OSS-20B: Local model via LM Studio

Usually I use Ollama, but I moved to LM Studio because of its MCP tools support. As for Qwen3, strangely, code did not perform well at all, and its thinking mode is simply too slow.

Key Technical Constraints

  • Memory Limit: Both local models run on 4 GB of VRAM; my laptop has 32 GB of RAM
  • Timeout: 180 seconds per query
  • Retry Logic: Up to one retry for syntax error correction (see the sketch after this list)
  • Validation: Results compared using value-based matching (exact, superset, subset)
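
A minimal sketch of how the timeout and single retry can be wired up, assuming LM Studio’s OpenAI-compatible local endpoint and DuckDB as the execution engine (the port, model id, and prompt wording are assumptions, not the framework’s exact code):

import duckdb
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def generate_sql(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",  # assumed model id as it appears in LM Studio
        messages=[{"role": "user", "content": prompt}],
        timeout=180,  # 180-second budget per query
    )
    return resp.choices[0].message.content

def run_question(con: duckdb.DuckDBPyConnection, prompt: str):
    sql = generate_sql(prompt)
    try:
        return con.sql(sql).fetchall()
    except duckdb.Error as err:  # one retry, feeding the error back to the model
        sql = generate_sql(f"{prompt}\n\nThe previous SQL failed with: {err}\nReturn corrected SQL only.")
        return con.sql(sql).fetchall()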

The Testing Framework

The experiment employs a robust testing framework with several features:

Semantic Model Enforcement

The models were provided with a detailed semantic model that explicitly defines the following (a minimal sketch of how it feeds the prompt comes after the list):

  • Proper dimensional modeling principles
  • Forbidden patterns (direct fact-to-fact joins)
  • Required CTE patterns for combining fact tables
  • Specific measure definitions and business rules
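
As an illustration only, here is a minimal sketch of loading such a YAML model and injecting it into the system prompt; the file name and keys are hypothetical, not the actual files linked above:

import yaml

with open("semantic_model.yaml") as f:  # hypothetical file name
    model = yaml.safe_load(f)

# Turn the declared rules into a system prompt so they travel with every question
system_prompt = "\n".join([
    "You write SQL for a star schema. Follow these rules strictly.",
    f"Tables: {', '.join(model['tables'])}",
    f"Measures: {model['measures']}",
    f"Forbidden patterns: {model['forbidden_patterns']}",
    f"Pattern for combining fact tables: {model['cte_pattern']}",
])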

Multi-Level Result Validation

Results are categorized into five match types (a minimal classification sketch follows the list):

  • Exact Match: Identical results including order
  • Superset: Model returns additional valid data
  • Subset: Model returns partial but correct data
  • Mismatch: Different results
  • Error: Execution or generation failures
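
A minimal sketch of that value-based comparison, as I understand it (a simplified stand-in, not the framework’s actual code; order checking for exact matches is omitted here):

def classify_result(reference_rows, candidate_rows):
    if candidate_rows is None:
        return "error"  # generation or execution failure upstream
    ref = set(map(tuple, reference_rows))
    cand = set(map(tuple, candidate_rows))
    if ref == cand:
        return "exact"      # same values (row order would be checked separately)
    if ref < cand:
        return "superset"   # extra but still-valid rows
    if cand < ref:
        return "subset"     # partial yet correct rows
    return "mismatch"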

The Results: Not bad at all

Overall Performance Summary

Both local models achieved 75–85% accuracy, which is remarkable considering they’re running on consumer-grade hardware with just 4 GB of VRAM. The GPT-OSS-20B model slightly outperformed Qwen3, with 85% accuracy versus 75%, although it is considerably slower.

I guess we are not there yet for interactive use cases; it is simply too slow for a local setup, especially for complex queries.

Tool calling

A more practical use case is tool calling: you can basically use it to interact with a database or Power BI through an MCP server, and because it is totally local, you can go ahead, read the data, and do whatever you want, as it is completely isolated to your own computer.

The Future is bright

I don’t want to sound negative. Just 6 months ago, I could not make it work at all, and now I have the choice between multiple vendors and it is all open source. I am very confident that these models will get even more efficient with time.

AI is Coming for Us

There are moments in life when you know things will never be the same. I remember distinctly when Gary showed me PowerPivot 10 years ago, and I knew that working with data would become as easy as playing with Excel. Another such moment was two days ago when I connected Claude Desktop to a database and asked, “What do you think?”

It was a strange experience. It wasn’t your typical “chat with your data and give me a nice chart” interaction. It was more like talking to a human and asking them to create a report. The LLM started by listing all the tables, examining the data, and making sense of what the dataset was about. Somehow, it figured out that the power generation figures were in MW and that to convert them to MWh, you need to divide by 12.

There’s a simple reason why this approach is so powerful compared to a typical chat with your data workflow: the LLM has read access to the data. It’s still secure and can only read what you’re authorized to access. As far as I know, these LLMs don’t auto-learn and don’t use the data for training, at least when you use an enterprise API.

Another interesting observation: as a non-programmer, I watched AI’s progress in coding with great excitement and never felt much sympathy for human coders. I thought they were exaggerating the threat. Somehow, my reaction changed when I noticed that AI will get very good at analytics too.

Note: I’ll refer to LLMs as AI for simplicity. Kurt has an excellent blog post worth reading, and thanks to Pawel for telling me about this whole MCP thing.

Typical “Chat with Your Data” Workflow

The important thing here is that the AI doesn’t have access to your data at all. You collect as much knowledge about your data as possible and send your questions along with that knowledge. You get back SQL or DAX statements that you send to your server to get answers. If the question is not clear enough, the AI will ask for clarification; for example, for “what is the biggest country in the world?” it will reply: is it by size, by GDP, etc.? It’s much more complex in real life, but that’s the core idea.

Basically, we spend a lot of effort making sure AI can’t see your data. Sometimes, as a user, you wonder why this AI can’t answer some very obvious questions. Just imagine: as a data analyst, if someone asked you to give them a report without even seeing any numbers!

Using MCP

In this setup, the AI is unleashed. It can read the data directly (again, using only what you’re allowed to access, and ideally read-only). Basically, the AI acts like an agent and has more autonomy; it is not limited to your metadata.

Example Using Data from OneLake

I have this data in OneLake, and it’s cleansed data:

Because we don’t have an MCP server yet for Fabric DWH, I used the DuckDB MCP server to read the data from OneLake. For convenience, instead of using direct query, I imported the data into a local DuckDB file:

import duckdb

con = duckdb.connect()
con.sql("ATTACH 'aemo_delta.duckdb' AS db; USE db")

for tbl in ['duid', 'summary', 'calendar', 'mstdatetime']:
    con.sql(f"""
        CREATE OR REPLACE TABLE {tbl} AS 
        FROM delta_scan('abfss://serving@onelake.dfs.fabric.microsoft.com/datamart.Lakehouse/Tables/aemo/{tbl}')
    """)

con.close()

You need to install MCP and configure the connection with Claude Desktop. To be clear, it should work with any MCP client, but so far, that’s the best I could find. Who knows, maybe one day Power BI Desktop will act as an MCP client (I literally made up this idea; this is not a hint or anything).

Then you add this config to Claude Desktop:

{
  "mcpServers": {
    "mcp-server-motherduck": {
      "command": "uvx",
      "args": [
        "mcp-server-motherduck",
        "--db-path",
        "/tmp/llm/aemo_delta.duckdb"
      ]
    }
  }
}

For me, it feels like ODBC for AI. The protocol is getting adopted by everyone.

The Experience

Since the data is public, I shared the whole chat. What I really like is how AI approaches the problem, first by looking at the tables. This is very human-like behavior.

If you read the chat, you’ll see it’s not perfect. It casually skipped hydro from the renewable conversation and didn’t calculate MWh correctly, although it did yesterday.

Some Observations

  • Even for a simple use case, you still need a semantic model. If I had a measure MWh = MW/12, the AI would always use it, at least in theory. For a complex model, it’s even more critical. Having said that, AI can do modeling just fine 🙂 do we need humans for that?
  • Surprisingly, in that simple workflow I can replace every compute engine; what’s really critical is the storage!
  • All my data is publicly available, so I wasn’t worried about security. For any enterprise work, you can’t really use something like Claude Desktop, but rather solutions like Azure AI Foundry.
  • For now, most models don’t acquire new knowledge during serving, but who knows what will happen in the next 10 years? You can imagine an AI that learns just from interaction with users and data, which opens all kinds of new questions. Do you need specific models for every tenant, for every user? We’re not there yet, but it is something we will have to deal with.
  • Never give MCP write access to anything.