Expanded Table Behavior in DAX and Malloy

Expanded tables are a core concept in DAX, Malloy has something similar although with a default behavior:).

To see the difference let’s build the same Model in DAX and Malloy and see where it is the same and where it differ.

The Model is based on TPC-H Dataset, it is a simple model as it contains only 1 Base Table ” Lineitem”

The Same Model using Malloy

you can download the Malloy here : it is just a text file

Count the Number of customers

Malloy : results 999 982

Query: custoners_bought_something is {  
    aggregate: cnt is count( distinct customer.C_CUSTKEY)
                 }

DAX : 1 500 000

I know the table contains 1.5 M, so why Malloy is giving me wrong results, it turn out , it is by design, Malloy consider only the customers that bought something in lineitem, you can see it from the SQL Generated

DAX by default ignore the “graph” if the measure target only 1 table, to get the number of customers who bought an item, you need something like this

Take away

Maybe I am biased but I think DAX behavior make more sense, if I target only a table then the graph should be ignored, I think the relationship should be used only when I use fields from different tables.

How an Open Source PowerBI Desktop May Look like

Was playing with Malloy and DuckDB and had this silly idea, can I just reproduce something Like PowerBI Desktop Just using Open Source Tools, obviously I am talking about the Overall architecture, not exact functionalities.

For a Typical Data Analytical Workload, Broadly Speaking you get something like this :

Using a Combination of Python script for ETL, DuckDB as a SQL Engine and Malloy as a Data Model you get this nice folder of files, Currently Malloy Don’t support DuckDB as a storage format, instead I am just using a Parquet file as a local Cache, but it is possible to Query the remote DB Directly, you can find the files here

And here is a nice chart in Visual Studio Code, as of this writing, Malloy extension in VSC don’t support interactivity, they have a Viewer App, but it is not available in Windows yet.

Obviously it is a nice theoretical exercise, but one can imagine, if all those assets were bundled together in a zip file and a viewer that can natively read it. who knows, we may have something like an Open Source PowerBI Desktop equivalent.

I know it is a weird blog post, what I am trying to convey , Building a full stack BI Application is doable, it is just a matter of a lot engineering effort, and contrary to 10 years ago, we have a lot of open source building Block, Tableau had to pay for hyper, now not only a very fast SQL Engine is available as open source but a Modeling layer too, and if you care about analytics, it is a very good thing to the overall market.

Building Complex Data Model using Nested Data in Malloy

Last year, I wrote a blog on how to build a nested complex Data Model using BigQuery , please read it first, unfortunately using standard SQL to nest and unnest Data was a painful exercise, maybe it was due to my lack of experience, or simply my brain is wired to think only about Tabular Data. happy to say, using the new Data Modeling language Malloy the experience became way simpler.

for a start Malloy will show a nice expended view of the nest table

The Model is self explanatory.

Now you need just to write Queries, Malloy will generate a SQL behind the scene

For example Total values of all measures :

No Idea what the SQL Means to be honest

Now Grouped by Category

Now More tricky, let’s group by year, notice, the Budget has no date values, the results is not correct !!!!

instead, you first group by year then nest the second level of grouping something like this

Malloy has a native export to nested Data Structure, but for this exercise I want a tabular results, you can easily flatten the previous Query using Project, notice how you can simply reference a previous Query, the measure Progress is simply Actual/Budget

I think it is simple enough that a “Normal” BI person can work it, you can find the Model here, The Model currently works only with BigQuery, found some bugs with DuckDB connector, hopefully it will be fixed in the next update of Malloy.

if you works with nested data, Malloy make it nearly easy 🙂

Update :

Malloy just added a semantic “All”, to ignore a group by dimension, see the same Query using the new syntax , this is Amazing !!!!

First Look at Google Malloy

Malloy is a new Modeling Language created by the original Author of Looker, it was released last year under an Open Source license, and made available to Windows users just last week, as someone who is enthusiast about Data Modeling I thought it is worth having a look at it.

Currently, Malloy is available under a free extension in Visual Studio Core, Malloy don’t have any “calculation Engine”, all it does, you build your model in a text where you define Source Tables, measures, relationship and dimension and then you write Queries, Malloy will parse those Queries and Generate SQL Code, it is more or less how every BI tools works behind the scene.

Interestingly, Malloy extension has DuckDB installed by default, it means, we have a full semantic layer and a fast OLAP engine as an open source offering, that’s a very big deal !!!

I spend sometime building a simple model, just one fact table and two dimensions, you can see the code here

When you run a Query, it will show the results in tabular format and you can see the SQL generated, you can even define some basic Visual like bar chart et

BI Engine not supported

Malloy has excellent support for BigQuery ( for obvious reason) and does support PostgreSQL too, but the SQL Generated is non trivial, BigQuery default Engine render the SQL extremely fast, no problem with that, but BI Engine struggle, basically any non trivial calculation are not supported. ( cross join and correlated variable etc)

Measure Behavior

let’s create three trivial measures

red is Quantity { where:  color = 'red' } 
black is Quantity { where:  color = 'BLACK' } 
red_or_black is Quantity { where:  color = 'red' or  color = 'black'}

then I run a simple group by

As a DAX users the results are rather unexpected

  • Measure Black : Return 0, it is case sensitive , maybe Malloy should maybe add a setting for DuckDB to ignore case sensitivity ( or it is by design, I don’t know)
  • Measure Red : return 2 only for red and 0 elsewhere, in DAX by default it will return 2 everywhere
  • red_or_black : was expecting to see the sum of both red and blacked which is 4 repeated in all rows

here is the same using DAX ( you can change that behavior by using keepfilters but I am interested only in the default behavior)

I don’t know enough about the language yet, but it would have being useful to have an option in the measure to ignore the group by ( it seems it is coming )

Why You Should care

For some historical reason, all Modeling language were proprietary and based only on vendor implementation, as far as I know, this is the first fully open source implementation, I am sure Google has a long term vision for Malloy and will show up in more service, I would not be surprised if BigQuery somehow integrated Malloy as a free semantic layer, after all it works well in a consumption model, I have not used it enough to have any good intuition, but I like the direction of the product, and Good on Google for making it Open Source, and DuckDB for being such an awesome SQL Engine.

%d bloggers like this: