Querying a OneLake Table with RLS and CLS Using DuckDB’s MSSQL Extension

OneLake supports Row Level Security and Column Level Security. These protections work when you use trusted engines such as Power BI or Spark running inside Microsoft Fabric. In those environments, the compute engine operates within a controlled boundary, so security rules can be enforced properly.

However, if you try to access the storage directly from a Python notebook or a local engine running on your laptop, including open source Spark, direct access is blocked. Otherwise, Row Level Security and Column Level Security would be meaningless. Security only works when the engine itself is trusted and governed.

This blog shows a workaround that leverages the SQL endpoint: the policies are enforced at the SQL endpoint, meaning any external tool that connects through it — including DuckDB — automatically respects them.

Let’s walk through a quick example.

The Data

We have a power.duid table containing 626 rows of Australian power generation facilities. Columns include DUID, Region, FuelSourceDescriptor, Participant, State, latitude, and longitude.

Configuring Row-Level Security

In the Lakehouse role readsometables, we add an RLS rule that restricts visibility to a single region:

SELECT * FROM power.duid WHERE Region='WA1'

Members of this role will only see rows where Region = 'WA1'.

Configuring Column-Level Security

On the same role, we enable CLS and grant Read visibility only to specific columns: DUID, Region, FuelSourceDescriptor, State, latitude, and longitude. The Participant column is excluded.
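
As a side note, Fabric's SQL side also supports column-level security through plain T-SQL column grants. A rough equivalent of the column list above is sketched below; it assumes a database role named readsometables exists at the SQL endpoint, whereas in this post the CLS is configured through the Lakehouse role UI.

-- grant read access only to the permitted columns (sketch, not the role UI used above)
GRANT SELECT ON power.duid
    (DUID, Region, FuelSourceDescriptor, State, latitude, longitude)
    TO readsometables;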

Querying with DuckDB’s MSSQL Extension

From any Python environment, we can connect to the SQL endpoint using DuckDB’s community MSSQL extension and Azure authentication:

import duckdb
from azure.identity import DefaultAzureCredential

conn = duckdb.connect()
# acquire an Entra ID token for the SQL endpoint
token = DefaultAzureCredential().get_token("https://database.windows.net/.default").token

conn.sql(f"""
    INSTALL mssql FROM community;
    ATTACH IF NOT EXISTS
    'Server=<your-sql-endpoint>;Database=data'
    AS data (TYPE mssql, ACCESS_TOKEN '{token}')
""")

If you are running it inside a Fabric notebook, first you need to upgrade DuckDB and restart the Python session:

!pip install duckdb --upgrade
# stop the current session so the upgraded duckdb is picked up on the next run
import sys
sys.exit(0)

then run this code:

import duckdb

conn = duckdb.connect()
# Fabric notebooks can issue a token for the SQL endpoint directly
token = notebookutils.credentials.getToken('sql')

conn.sql(f"""
    INSTALL mssql FROM community;
    ATTACH IF NOT EXISTS
    'Server=<your-sql-endpoint>;Database=data'
    AS data (TYPE mssql, ACCESS_TOKEN '{token}')
""")

Now when we query, RLS and CLS are enforced server-side:

conn.sql("SELECT DISTINCT(Region) FROM data.power.duid").show()

Only WA1 — the RLS filter is working. And if we select all columns:

conn.sql("SELECT * FROM data.power.duid LIMIT 4").show()

you get an error saying that you cannot select Participant.

CLS is doing its job: the Participant column is off limits. If you drop it from the select list, everything works fine.
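
For instance, a query that names only the permitted columns (the same list granted to the role above) runs without any error:

conn.sql("""
    SELECT DUID, Region, FuelSourceDescriptor, State, latitude, longitude
    FROM data.power.duid
    LIMIT 4
""").show()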

First Look at Dynamic M Query parameter using SQL Server

Edit 26 May 22: Please vote to have table functions available in Datamart.

With the PowerBI February 2022 release, we can finally use dynamic M parameters with SQL Server in DirectQuery mode. I was really excited: I had a couple of patterns where I used M parameters with BigQuery to do calculations on the fly that are not supported natively in PowerBI, for example geospatial calculations.

My first example was dynamically changing a dimension; it just works, as it is relatively simple (see the example here). Very exciting that it works.

Then I tried to port this example from BigQuery: basically you select some points on a map, and you get back the polygon and the area. The calculation has to be done on the fly; pre-calculating the results is not practical, as generating all possible combinations is just too much.
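
To give an idea of the kind of on-the-fly calculation involved, here is a minimal T-SQL sketch (the coordinates are made up for illustration, and this is not the query from the original report): it builds a polygon from a handful of selected points and returns its area using SQL Server's geography type.

-- a few hypothetical selected points (longitude latitude), WGS84
DECLARE @pts geography = geography::STGeomFromText(
    'MULTIPOINT((115.86 -31.95), (115.89 -32.00), (115.75 -32.05))', 4326);

-- the polygon enclosing the points and its area in square metres
SELECT @pts.STConvexHull() AS poly,
       @pts.STConvexHull().STArea() AS area_m2;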

The first step, getting the selected points as a nice list, was very easy; see the code here:

let
    TagsList =
        if Type.Is(Value.Type(tag_selection), List.Type) then
            Text.Combine({"'", Text.Combine(tag_selection, ","), "'"})
        else
            Text.Combine({"'", tag_selection, "'"}),

    finalQuery = "select 1 as poly, value from string_split(" & TagsList & ",',') ",

    Source = Sql.Database("XXXXXXXX", "DB", [Query=finalQuery])
in
    Source
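
To make it concrete: if two hypothetical tags P1 and P2 are selected on the map, TagsList evaluates to 'P1,P2' (quotes included) and finalQuery becomes:

select 1 as poly, value from string_split('P1,P2',',')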

I selected some points in Icon Map and the dynamic M parameter got populated. I was really excited; the hard part was done, and all I needed was to write some T-SQL.

T-SQL Rabbit hole

I am no SQL expert by any means; by some weird coincidence, my first database was BigQuery (I used MS Access a long time ago), so this is the first time I have tried to use T-SQL in a non-trivial way (at work I use T-SQL to retrieve data, maybe doing some joins and things of that nature, but no GIS for sure).

The good thing is, the amount of resources available on SQL Server is phenomenal. I got some pointers on Stack Overflow, but something weird happened.

I started writing T-SQL code in SSMS and it worked fine, but when I copied it to PowerBI it generated errors. I was really angry and couldn't understand what was going on; I thought it was something weird about PowerBI.

I know that PowerBI embeds any custom SQL inside a subquery; that's very standard, and actually Tableau does the same, as does Google Data Studio.

It turns out SQL Server doesn't support a CTE inside a subquery.
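
A minimal sketch of the problem, using made-up logic: the first query runs fine on its own, but the second one, which mimics the subquery PowerBI wraps around custom SQL, fails with a syntax error because WITH cannot appear inside a derived table.

-- runs fine on its own
WITH pts AS (
    SELECT value FROM string_split('P1,P2', ',')
)
SELECT * FROM pts;

-- roughly what PowerBI sends: fails, WITH is not allowed inside the derived table
SELECT *
FROM (
    WITH pts AS (
        SELECT value FROM string_split('P1,P2', ',')
    )
    SELECT * FROM pts
) AS t;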

Chris has blogged about it here, which was very kind of him. Basically his point is to just write a view in the database; it is better to have the logic upstream anyway, which totally makes sense, except it is not a realistic solution: business users don't just get write access to the database, actually they are very lucky to get even read access.
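
For completeness, a rough sketch of that workaround (names and logic are placeholders): once the CTE lives inside a view, the subquery PowerBI generates contains no WITH, so it parses fine.

-- one-time setup by someone with write access to the database
CREATE VIEW dbo.poly_points AS
WITH pts AS (
    SELECT value FROM string_split('P1,P2', ',')   -- placeholder for the real logic
)
SELECT 1 AS poly, value FROM pts;
GO

-- PowerBI's wrapped query is now valid, since the subquery contains no WITH
SELECT * FROM (SELECT poly, value FROM dbo.poly_points) AS t;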

Update: we now have Datamart, which will change everything.