Dynamic Geospatial Clustering using BigQuery GIS

I was reading this blog post and thought of a new use case, using OpenstreetMap Data and generate polygons based on the user Selection

First to reduce cost, we will select only all a subset of OpenstreetMap Data, you can use this post as a reference

my base table is OPENSTREETMAPAUSTRALIAPOINTS , which contains 614,111 rows

The idea is to provide some tag selection ( School, cafe etc) and let BigQuery generate a new polygons on the fly, the key function in this SQL script is ST_CLUSTERDBSCAN

WITH
  z AS (
  SELECT
    *
  FROM
    `test-187010.GIS.OPENSTREETMAPAUSTRALIAPOINTS`
  WHERE
    value IN UNNEST(@tags_selection)),
  points AS (
  SELECT
    st_geogpoint(x,
      y) AS geo_point,
    value AS type
  FROM
    z ),
  points_clustered AS (
  SELECT
    geo_point,
    type,
    st_clusterdbscan(geo_point,
      200,
      @ct) OVER() AS cluster_num
  FROM
    points),
  selection AS (
  SELECT
    cluster_num AS spot,
    COUNT(DISTINCT(type))
  FROM
    points_clustered
  WHERE
    cluster_num IS NOT NULL
  GROUP BY
    1
  HAVING
    COUNT(DISTINCT(type))>=@ct
  ORDER BY
    cluster_num)
SELECT
  spot AS Cluster,
  st_convexhull(st_union_agg(geo_point)) as geo_point,
  "Cluster" as type
FROM
  selection
LEFT JOIN
  points_clustered
ON
  selection.spot=points_clustered.cluster_num
  group by 1
union all
SELECT
  spot AS Cluster,
  geo_point ,
type
FROM
  selection
LEFT JOIN
  points_clustered
ON
  selection.spot=points_clustered.cluster_num

Usually I use PowerBI to Visualize the results using the Excellent icon map custom Viz , but as of this writing, PowerBI does not support yet Passing Parameters to a SQL Query when using BigQuery

Technically you can hardcode the values for Tags, but the whole point is to have a dynamic selection

I will use Data Studio instead and because the Query is not accelerated by BI Engine , and in order to reduce the cost, I made only 6 Tags available for user selection and hard code the distance between two points to 200 m.

Here is an example when selecting the tags (restaurant, school and fuel), I get 136 cluster

here when I zoom on 1 location, the result are pretty accurate

I am really looking forward to reproduce the same report using PowerBI when they add custom SQL Queries to BigQuery connector, it seems it is coming soon.

I think it is a good use case for parameters, GIS calculation are extremely heavy and sometimes all you need from PowerBI is to send Parameter values to a Database and get back the result.

you can play with the report here

Drill Down to another page in Google Data Studio, the easy way

I have blogged already how to drill down to another page using Parameter, today, I find another approach, it is extremely easy, and a bit embarrassed why I have not thought about before !!!

I will be using administrative division of Algeria as an example, see the report here, please notice the pattern is not only for maps, personally I use drill down to another page extensively with finance reports, for example drilling down by Project, vendor, user etc

1- Create a map at the region Level

2- Add a new empty tab

3- Make the Map at the report level

4-make sure apply filter is on

5- in the second tab , hide the map either using send to back or add a big rectangle

6- add a new map at the city level

7- add a new button to navigate between page 1 and 2

basically when you create a report level map, the selection from page 1 get synchronized to page 2 , then the hidden map in page 2 will filter the map at the city level.

Three years to finish a Dashboard

in 2017, at my previous job, we were using PowerBI Desktop as our reporting solution, but there was a big limitation, we couldn’t use the service, so sharing the reports was either in Excel or pdf.

I remember I did try different solutions (Rstudio, Qlik, SSRS), they were great Products, but you need some kind of server to share the reports. At that time all I wanted is a simple web app where people can click on a slicer and get a fancy charts.

At that time Google made their reporting solution free, I was really excited about Data Studio, a free product, extremely easy to share but unfortunately a bit slow and lacked some basic functionality, I still managed to build something but it was not really good

It is all history now, moved to another job, we have PowerBI service ( and Tableau), but still for some reason, I felt like a missed opportunity, what if Data Studio became a good enough to be used as a free report tool.

If I remember correctly 2017 and 2018, there was no major progress but then they released custom viz, which basically means you can port any javacript library relatively easily , I managed to build a custom viz see example here

and in sept 2019, BI Engine showed up !!

It was really a big Deal, BI Engine is an analytics in-memory Database , and it is fast and they gave away 1 GB for free, it means you can connect your data from BigQuery and pay nothing ( with a fair limit of course), this made this report possible

In May 2020, they finally released Google Map Integration , although with a limit of 10K points, it was not useful for my use cases ( Solar farm needs a lot of point around 40k to 60K)

That was great and all, but still I couldn’t write complex measures easily ( or maybe did not know how), but something changed in August 2020

At last we have Proper support for parameter, that changed everything, now you can write any complex business logic using SQL in BigQuery and visualize the results using Google Data Studio, and you can do a lot of fancy stuff see those examples

Still there was still a major bug, Pivot table in Data Studio show 0 for null values needless to say, it is extremely annoying although you can build workaround, it was a hack and not sustainable.

That was fixed last week

So yes, it took me three years to finish this report, BI Engine + Parameter + Custom Viz and a bug fix in the Pivot Table to make this report possible

I added a workflow explanation in the report, but basically create a reporting dataset as large flat fact table and show the results in BigQuery with further control by SQL Parameter, if the native visual are not satisfying, you can show pretty much anything using Vega-lite custom viz.

One aspect was impossible to do without Parameter is the dynamic grouping of dates, in the time series, the weekend update dynamically based on the cut off selected.

Please don’t get me wrong, there is still a lot of work to be done, but the foundation of the product is already there, I can see clearly the vision of the product team, hopefully they keep investing but faster this time ( Parameter Action, support for BigQuery geography field, analytics functions, Tiles for Custom Viz ……)

Take away:

– If you need near real time reports

– You want a reporting solution and don’t have a decent budget.

– Used Data Studio in 2017 and dismiss it.

I have a good news for you, BigQuery/Data Studio is a viable option now, and you get 1 TB free for BigQuery and 1 GB compressed in -memory for BI Engine, that’s a lot of free resources, and there is no catch, you can share securely with anyone, again totally free.

Although I am a PowerBI developer and I love it, I think it is very healthy for the industry to have more choices, 2021 will be exciting !!!

Row Level Security Using Parameter in Google Data Studio

saw this on twitter (Thanks Pablo), and I had an idea of using Regex_Match as currently comparing dimension does not work !!!

How RLS Works in Data Studio

Data Studio support RLS using google account ( can work with any email address not only gmail), so when you login using your account, Data Studio get your email adress which is unique to you, then you can use that email to filter rows .

in Some cases, maybe you want to implement security using username/Password, it turn out using parameter, it is trivial to implement, notice we are using Google sheet as a data Source, if you need more control, using SQL Parameter is even more Powerful

Warning : Storing password as plain text is dangerous

now for a very simple scenario, you have data and you want people to have access only to some rows

1- Create two parameters

usernameParameter and PasswordParameter

let’s start with usernameParameter, it has to be any value, and default value null or none etc

do the same for PasswordParameter

2- Create calculated field RLSfilter

basically, if both the values typed in the parameter control match then return 1 else 0

3- Create a Filter using the calculated field RLSfilter

4- Apply the filter to Visual

go to file, report setting, and add the filter, so it will automatically apply to all the visual in your report, or you can apply to every visual where it make sense only ( you may have multiple dataset where it does not apply)

you can add an image on the filter control to hide the password when the user type it.

You can see the report here

Limitation

This RLS apply at the report level, it means report editors can see everything.

if you want the report editors not to have total access, then you need to push down the filter upstream, either using email, or leveraging SQL Parameter in BigQuery