powerquery – Page 2 – Small Data And self service

Share Data from a PowerBI Dataset using Dataflow

Using PowerBI Dataflow to share data between departments is increasingly getting more popular, with the rise of managed self service offering, it is not unusual to find a non centralised Data warehouse but rather every department managing it is own data, I don’t know enough to say anything useful on which model works better in the long term.

this blog is about one use case I encountered recently, and thought it maybe worth sharing and to be very clear it is always good practise to share the data from the upstream Data source , but in this case, there was a non trivial business logic applied inside PowerBI dataset.

Usually we have two options.

A user can just create a new thin report assuming he has build permission
With Direct Query on composite model, a user can mix between live connection to a dataset and import different sources.

Both approach are fine, but in this case, the data consumer from another departement is just interested in 1 table (a couple of dimension and some measures) and because there is a non trivial logic in the Model, we can’t get the data from the source system, and it is rather important not to provide Build permission.

In my humble opinion sometimes giving access to a whole Data model with a lot of tables, relationship etc does seems an overkill and potentially confusing , instead I am suggesting an easier approach, sharing a dataflow that run a DAX Query against the model, and return exactly the agreed schema,basically a flat table which will not change even if the structure of Model change.

it is very straightforward and works with both Pro and premium ( Premium works better with large models, the timeout duration is more generous, and anyway PPU is very cheap)

Build a visual that show the required data

I will use a random dataset as an example, let’s say the data consumer wants this data updated daily ( real life example is substantially more complex)

click on performance analyzer , refresh visual then copy the Query

Clean the Query in DAX Studio

Paste the Query in DAX studio, don’t get intimidated by the apparent complexity , The main Query start by VAR __DS0Core

Now in a simplified format

// DAX Query

  EVALUATE 

    SUMMARIZECOLUMNS(
      'MstDate'[SETTLEMENTDATE],
      'REGION'[REGION],
      TREATAS({"QLD"}, 'REGION'[REGION]),
      TREATAS({"5 Minutes"}, 'unit'[description]),
      "average_Price", '00-Measures'[average_Price],
      "INITIALMW_Sum", '00-Measures'[INITIALMW_Sum]
    )

Alternatively you can build The Query straight in DAX Studio using the Query Builder

Connect to PowerBI service endpoint using PowerBI Desktop

you can read the documentation here, basically you connect to SQL Server Analysis Services which is the Engine of PowerBI,

Note : I am using my personal tenant.

Copy PowerQuery code to Dataflow

Dataflow don’t have a connector to Analysis Services at least in the UI, but coping the Query just works ( notice the double Quotes in QLD)

let
    Query1 = AnalysisServices.Database("powerbi://api.powerbi.com/v1.0/myorg/aemo_data", "aemo_data",
     [Query=
    "
    // DAX Query

  EVALUATE 

    SUMMARIZECOLUMNS(
      'MstDate'[SETTLEMENTDATE],
      'REGION'[REGION],
      TREATAS({""QLD""}, 'REGION'[REGION]),
      TREATAS({""5 Minutes""}, 'unit'[description]),
      ""average_Price"", '00-Measures'[average_Price],
      ""INITIALMW_Sum"", '00-Measures'[INITIALMW_Sum]
    )
    "
    ,Implementation="2.0"])
in
    Query1

I am not suggesting it is a good practise, in an ideal world everything is in a nice Cloud DWH with business logic and all, and the BI tool is just a front end to run aggregation and show nice charts, but real life is not perfect and it is useful to have the option when you need it.

Edit : The original title of this blog was using Dataflow as Poor Man’s Data mesh, but I had to be realistic as of this writing Dataflow can only be consumed by a PowerBI Client, not even Excel is supported and I know about Azure Storage, but with the current paranoia about security, it is extremely hard to have that option on for a lot of companies , Hopefully Dataflow product team open up a little bit for other Client too.

Calculate the shortest Path in a network in PowerBI

This blog is just another use case where we can leverage M parameter to perform calculation that can not be done in PowerBI, see example here for clustering and Area calculation

To be clear, it is always possible to precalculate outside PowerBI and just import the results, but in the current case, it can be tricky, let’s say you have 1000 points and you want to check the distance between two points using an existing route network, you will need to calculate 1000 X 1000 combination, instead the idea here, you select two points then M Parameter will send the selection to a Database using Direct Query to do the calculation and get the result back.

For this example I am using BigQuery, but you can use any Database that support M Parameter (Snowflake, Azure ADX etc)

Import all the bus stop location in a particular Area

Because of Covid situation, the Openstreetmap dataset is free to Query, here is the SQL Query, I am just using Brisbane as a reference

CREATE OR REPLACE TABLE
  test-187010.gis_free.Brisbane_Bus_Stop AS
SELECT
  ST_CENTROID(geometry) AS center_location,
  tags.key,
  tags.value
FROM
  `bigquery-public-data.geo_openstreetmap.planet_features`,
  UNNEST(all_tags) tags
WHERE
  key = 'highway'
  AND value='bus_stop'
  AND ST_INTERSECTS(geometry,
    -- Selecing Brisbane area
    ST_GEOGFROMtext("POLYGON((152.9432678222656 -27.33776203832722,153.2563781738281 -27.33776203832722,153.2563781738281 -27.594864493271448,152.9432678222656 -27.594864493271448,152.9432678222656 -27.33776203832722))"))

you can use this handy website to generate a polygon from a map

Import Road network

The same we will use a subset of Openstreetmap dataset , first we select Key= Highway and to improve performance we only the main values ( primary, secondary road etc)

CREATE OR REPLACE TABLE
  `test-187010.gis_free.brisbane_Road_Network`
CLUSTER BY
  geometry AS
SELECT
  geometry,
  tags.key,
  tags.value
FROM
  `bigquery-public-data.geo_openstreetmap.planet_ways`,
  UNNEST(all_tags) tags
WHERE
  key = 'highway'
  AND value IN ("motorway",
    "motorway_link",
    "primary",
    "primary_link",
    "secondary",
    "secondary_link",
    "tertiary",
    "tertiary_linkt",
    "runktrunk_link")
  AND ST_INTERSECTS(geometry,
    --Select you Area Here:
    ST_GEOGFROMtext("POLYGON((152.9432678222656 -27.33776203832722,153.2563781738281 -27.33776203832722,153.2563781738281 -27.594864493271448,152.9432678222656 -27.594864493271448,152.9432678222656 -27.33776203832722))"))

jut to get an idea he is how the road network looks

Time for calculation

Unfortunately as of this writing BigQuery GIS does not support the function to find the shortest path between two points in a network, luckily I find this user defined function written in Javascript, the good news it works as expected but javascript will always be slower compared to a native SQL function, anyway here is the SQL Query

WITH
  initial_parameter AS (
  SELECT
    *
  FROM
    UNNEST(['POINT(153.023194 -27.563979)','POINT(152.979212 -27.49549)'] ) AS element ),
  mynetwork AS (
  SELECT
    ARRAY_AGG(geometry) roads
  FROM
    `test-187010.gis_free.brisbane_Road_Network_cluster` ),
  calculation AS(
  SELECT
    `libjs4us.routing.geojson_path_finder`(roads,
      st_geogfromtext(a.element),
      st_geogfromtext(b.element)) AS tt
  FROM
    mynetwork,
    initial_parameter a,
    initial_parameter b
  WHERE
    a.element>b.element
  LIMIT
    100)
SELECT
  ST_ASTEXT (tt.path) AS GEO,
  tt.weight AS len
FROM
  calculation

And here’s the result, a linestring and the length in Km, the Query took 7 second, to be honest I have no idea about the calculation complexity, so not sure if it is fast or not 🙂

PowerBI M Parameter

After we make sure the Query works with two fixed points, now we need just to make it interactive, so the user can select any two points, and that exactly what M parameter do

The Table path is using Direct Query

The Table Bus_Stop is import mode, which is used to as the Parameter filter

The Parameter is Tag_Selection , for a very detailed explanation, Please read this blog first

and here is the M Query

let
TagsList = 
    if 
    //check to see if the parameter is a list
      Type.Is(
        Value.Type(tag_selection), 
        List.Type
      ) then 
        //if it is a list
        let
          //add single quotes around each value in the list
          AddSingleQuotes = List.Transform(
              tag_selection, 
              each "'" & _ & "'"
            ),
          //then turn it into a comma-delimited list
          DelimitedList = "[" & Text.Combine(
              AddSingleQuotes, 
              ","
            ) &"]"
          
        in
          DelimitedList
    else 
      //if the parameter isn't a list
      //just add single quotes around the parameter value
      "['" & tag_selection & "']",
    Source = 
    Value.NativeQuery(GoogleBigQuery.Database([BillingProject="testing-bi-engine"]){[Name="test-187010"]}[Data],
     "WITH
  initial_parameter AS (
  SELECT
    *
  FROM
    UNNEST("& TagsList &" ) AS element ),
  mynetwork AS (
  SELECT
    ARRAY_AGG(geometry) roads   FROM   `test-187010.gis_free.brisbane_Road_Network_cluster` ),
  calculation AS(
  SELECT
    `libjs4us.routing.geojson_path_finder`(roads, st_geogfromtext(a.element), st_geogfromtext(b.element)) AS tt
  FROM
    mynetwork,    initial_parameter a,    initial_parameter b
  WHERE
    a.element>b.element
  LIMIT
    100)
SELECT
  1 AS DUMMY,  CASE     WHEN ARRAY_LENGTH("& TagsList &") =2 THEN ST_ASTEXT (tt.path)  ELSE  NULL END  AS GEO,
  CASE    WHEN ARRAY_LENGTH("& TagsList &") =2 THEN tt.weight  ELSE  0 END   AS len
FROM  calculation"
       , null, [EnableFolding=true])
   
    in
       Source

Notice I added this condition ARRAY_LENGTH(“& TagsList &”) =2 then 0, just to reduce the calculation when the user select only 1 point, Currently in PowerBI, there is no way to have Query reduction option for cross filtering

Icon Map

Icon map is the only visual that can render WKT geometry in PowerBI, this previous blog explain how we simulate multi layer interaction

Performance

The Performance unfortunately it is a bit disappointing, around 20 second, javascript UDF is slow and PowerBI is very chatty , which is a nice way to say, PowerBI send a lot of SQL Queries,everytime I select two points, PowerBI send 4 Queries

The first Query is when I select the first point, hopefully one day we will have an option to action cross filtering only after we finish the selection

Query 2 and 3 are identical and are used to check the field type of the Table, I wonder why PowerQuery is sending the same Query twice

Query 4 is the real Query that bring the result

you can download the pbix here

Edit : Carto which is the same company that released the javascript function is now offering a native SQL functions which should be substantially faster, I have not used it as it is commercial, but if you have a massive network, maybe it is worth it, just to be clear I have no affiliation with them.

Edit : added the same report but now using Tableau

Using PowerBI M Parameter to calculate a polygon Area

TL;DR : PowerBI does not support GIS Area calculation, in this blog we use M Parameter to leverage third party Database to do the calculation, Works only In Direct Query Mode.

ink to Public report

With the Update of August 2021, M Parameter support multi value selection, see this Previous blog for a little background, this open some interesting new use cases that was not possible before in PowerBI.

Again, before you get too excited, it works only with Direct Query and Database that support M Parameter ( BigQuery, Snowflake, Azure ADX etc)

One example is calculating the area of a group of arbitrary points

In Icon Map, I select a group of points (Notice here, Icon map is acting as Parameter Action filter )
The List of points are used in an M Parameter
A SQL Query is sent back to the Database that support M Parameter in my case I am using BigQuery
BigQuery generate a wkt and calculate the area , you can calculate the distance too or any metrics that use geometry
Plot the results in another Icon Map

The Query is straightforward, and the best part because we are not running a Query against any table in BigQuery , it does not cost anything, you can register here for free and no credit card is required

Here is the M Query

let
TagsList = 
    if 
    //check to see if the parameter is a list
      Type.Is(
        Value.Type(tag_selection), 
        List.Type
      ) then 
        //if it is a list
        let
          //add single quotes around each value in the list
          AddSingleQuotes = List.Transform(
              tag_selection, 
              each "'" & _ & "'"
            ),
          //then turn it into a comma-delimited list
          DelimitedList = "[" & Text.Combine(
              AddSingleQuotes, 
              ","
            ) &"]"
          
        in
          DelimitedList
    else 
      //if the parameter isn't a list
      //just add single quotes around the parameter value
      "['" & tag_selection & "']",
      
Source = Value.NativeQuery(GoogleBigQuery.Database([BillingProject="xxxxxx"]){[Name="xxxxx"]}[Data], "WITH
  xxx AS (
  SELECT
    *
  FROM
    UNNEST( "& TagsList &" ) AS element),
  yyy AS (
  SELECT
    ST_CONVEXHULL( ST_UNION_AGG(ST_GEOGFROMTEXT(element))) AS geo
  FROM
    xxx)
SELECT
  ST_ASTEXT(geo) AS WKT,
  st_area(geo) AS area
FROM
  yyy", null, [EnableFolding=true])
in
    Source

And here is the result.

Future Improvement

One thing I would really like is the possibility to show the result in the same Map, unfortunately to the best of my knowledge a Table in DAX can not filter itself, see this example

in Icon map it is possible to display wkt and points at the same time , but as you can see from the screenshot wkt geometry in the table does not change based on internal filter selection, the other viz works fine.

M Parameters has a very interesting application, and I am excited to try other tricks 🙂

Edit : I appreciate some votes here for the option to pass filter selection to the same Visual

PowerBI Incremental refresh Parquet files, without a Database.

TL;DR, you can incremental refresh PowerBI using Parquet files stored in an Azure Storage without using any Database in the middle, you can download sample pbix here

I am using this blog post by Gilbert Quevauvilliers which is based on a technique from Rafael Mendonça, Please read it first

Maybe read this, it is using Synapse Serverless , but has a section where you can Partition your data using Python to Parquet

1-Add a new Table, Parquet

make sure it is not loaded, here is the M code

let
     Source = AzureStorage.DataLake("https://xxxxxx.core.windows.net/parquet"),
     #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Folder Path"}),
     #"Inserted Text Between Delimiters" = Table.AddColumn(#"Removed Other Columns", "Text Between Delimiters", each Text.BetweenDelimiters([Folder Path], "D", "/", 1, 0), type text),
     #"Renamed Columns" = Table.RenameColumns(#"Inserted Text Between Delimiters",{{"Text Between Delimiters", "Date"}}),
     #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type datetime}}),
     #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Folder Path"})
 in
     #"Removed Columns"

here is the result

3-Merge using Inner Join

to read the parquet file content we use this function , notice we used inner join in the previous step to avoid reading null Content, which generate errors when you refresh in the service

Parquet.Document([Content])

and here is the final table

we configure incremental refresh to refresh the Last 2 days

4- Testing in PowerBI Service

as you can see the second refresh is way faster then the First one

here is the partition Table

now let’s check the transaction history from Azure storage, I refreshed again just to be sure

The second refresh read substantially less data as only two files are read

I Think with PowerBI desktop supporting Parquet, we will see more exciting scenarios, I can’t wait for Dataflow to support export to Parquet !!!!

if you are still reading, I appreciate a vote on this idea, Having an option in Dataflow to export to a dynamic file name

	Power BI with DuckDB… on Using DuckDB with PowerBI
	Querying a Fabric La… on Writing to SQL Server using…
	Benjamin on Running DuckDB at 10 TB s…
	mim on Running DuckDB at 10 TB s…
	Benjamin on Running DuckDB at 10 TB s…

Build a visual that show the required data

Clean the Query in DAX Studio

Connect to PowerBI service endpoint using PowerBI Desktop

Copy PowerQuery code to Dataflow

Share this:

Import all the bus stop location in a particular Area

Import Road network

Time for calculation

PowerBI M Parameter

Icon Map

Performance

Share this:

Future Improvement

Share this:

1-Add a new Table, Parquet

3-Merge using Inner Join

4- Testing in PowerBI Service

Share this: