BigQuery GEOGRAPHY Support in Data Studio

Google Data Studio recently added support for the BigQuery GEOGRAPHY field, which is a fantastic development and opens up all kinds of new scenarios for creating free-to-share maps.

It is straightforward: you just add a geography field and it renders. For example, I have this dataset that contains polygons, lines and points:

And here is the result:

Some Observations

  • It seems the initial focus of the dev team was on polygons, which are fully supported
  • You can't set a fill color for lines yet
  • Although points support color coding, I could not make them smaller (the piles in a solar farm are obviously much smaller)

BI Engine to the rescue

Now the confusing part: BI Engine for Data Studio does not accelerate GEOGRAPHY yet, so you will incur BigQuery costs. However, if you connect through the new BI Engine SQL interface, the query will be accelerated. According to the devs, the BI Engine used in Data Studio should be considered version 1 and the SQL interface the next version, and the two will eventually be merged. (This should be in the documentation.)

Here is an example of a query generated by Data Studio. I would say it is a very complex SQL query, with analytic functions, UNNEST, you name it, and the new BI Engine supports it just fine (I really like "Bytes billed: 0 B").

A more interesting use case

I came across this excellent dataset https://openflights.org/data.html and thought, let's try it with Data Studio. First I imported the two tables, airports.dat and routes.dat, then used this SQL query to generate the routes, each one a line between the source and destination coordinates:

-- one row per route: the source and destination airports are joined from the
-- airport table, and a straight line is built between their coordinates
WITH
  xxxx AS (
  SELECT
    yy.Name AS source_name,
    yy.City AS source_city,
    yy.Country AS source_country,
    zz.Name AS destination_name,
    zz.City AS destination_city,
    zz.Country AS destination_country,
    xx.DestinationairportID,
    xx.SourceairportID,
    ST_MAKELINE(ST_GEOGPOINT(yy.Longitude, yy.Latitude),
      ST_GEOGPOINT(zz.Longitude, zz.Latitude)) AS route
  FROM
    `testing-bi-engine.test.airportroute` xx
  LEFT JOIN
    `testing-bi-engine.test.airport` yy
  ON
    xx.SourceairportID = yy.AirportID
  LEFT JOIN
    `testing-bi-engine.test.airport` zz
  ON
    xx.DestinationairportID = zz.AirportID),
  -- GEOGRAPHY columns cannot be used in GROUP BY, so the line is converted
  -- to WKT text in order to remove duplicate routes
  ttt AS (
  SELECT
    source_name,
    source_city,
    source_country,
    destination_name,
    destination_city,
    destination_country,
    DestinationairportID,
    SourceairportID,
    ST_ASTEXT(route) AS route_wkt
  FROM
    xxxx
  GROUP BY
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9)
-- convert the WKT back to GEOGRAPHY and give every route a unique id
SELECT
  *,
  ST_GEOGFROMTEXT(route_wkt) AS route,
  ROW_NUMBER() OVER() AS id
FROM
  ttt
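To make the join-then-deduplicate logic of the query concrete, here is a small Python sketch of the same idea. The in-memory dictionaries, the `build_routes` and `route_wkt` helpers, and the sample airports are all hypothetical stand-ins for the BigQuery tables. The WKT text is only shaped like what ST_ASTEXT returns and is not guaranteed to match it character for character.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a row of airports.dat: (name, longitude, latitude).
Airport = Tuple[str, float, float]


def route_wkt(src: Optional[Airport], dst: Optional[Airport]) -> Optional[str]:
    """Two-point LINESTRING in WKT (lon lat order), or None if an endpoint is missing."""
    if src is None or dst is None:
        # With a LEFT JOIN, an unknown airport ID leaves the route empty (NULL).
        return None
    return f"LINESTRING({src[1]} {src[2]}, {dst[1]} {dst[2]})"


def build_routes(airports: Dict[int, Airport],
                 routes: List[Tuple[int, int]]) -> List[dict]:
    """Join each route to the airport table twice, then drop duplicate routes."""
    seen = set()
    rows: List[dict] = []
    for src_id, dst_id in routes:
        src = airports.get(src_id)  # first join: source airport
        dst = airports.get(dst_id)  # second join: destination airport
        wkt = route_wkt(src, dst)
        key = (src_id, dst_id, wkt)
        if key in seen:  # same effect as grouping on the WKT text
            continue
        seen.add(key)
        rows.append({
            "id": len(rows) + 1,  # plays the role of ROW_NUMBER()
            "source_name": src[0] if src else None,
            "destination_name": dst[0] if dst else None,
            "route_wkt": wkt,
        })
    return rows


if __name__ == "__main__":
    sample_airports = {1: ("Airport A", 3.0, 36.0), 2: ("Airport B", 2.5, 49.0)}
    for row in build_routes(sample_airports, [(1, 2), (1, 2), (2, 1), (1, 999)]):
        print(row)
```

The detour through text is the important part. A geography value cannot be grouped on directly, so it is serialized, deduplicated, and parsed back.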

I saved the query result in a table, then plotted it using Data Studio.

And here is the result, which I shared on Reddit 🙂

I think it is fair to say people love maps, and a lot of users appreciate that they can download the data straight from Data Studio. You can play with the report here.

Really excellent work by the Data Studio team.

Drill Down to another page in Google Data Studio, the easy way

I have already blogged about how to drill down to another page using a parameter. Today I found another approach. It is extremely easy, and I am a bit embarrassed I had not thought of it before!

I will use the administrative divisions of Algeria as an example; see the report here. Note that the pattern is not only for maps: personally, I use drill-down to another page extensively in finance reports, for example drilling down by project, vendor, user, etc.

1- Create a map at the region level

2- Add a new empty page

3- Make the map a report-level component

4- Make sure Apply filter is turned on

5- On the second page, hide the map, either with Send to back or by covering it with a big rectangle

6- Add a new map at the city level

7- Add a button to navigate between page 1 and page 2

Basically, when you make a map report-level, the selection on page 1 is synchronized to page 2. The hidden map on page 2 then filters the city-level map.
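As a mental model only, the filtering chain can be sketched in Python. Data Studio exposes no such API. The `city_map_rows` function and the region/city rows are hypothetical, and they only illustrate how a page-1 selection ends up restricting the page-2 city map.

```python
from typing import Dict, List, Set

# Toy model (hypothetical, not a Data Studio API): the report-level region map
# carries the user's selection to every page, and on page 2 that hidden map
# filters the rows of the city-level map.


def city_map_rows(cities: List[Dict[str, str]],
                  selected_regions: Set[str]) -> List[Dict[str, str]]:
    """Rows the page-2 city map would show for a given page-1 selection."""
    if not selected_regions:
        # Nothing selected: the filter is inactive and every city is shown.
        return list(cities)
    return [c for c in cities if c["region"] in selected_regions]


if __name__ == "__main__":
    sample = [
        {"region": "R1", "city": "C1"},
        {"region": "R1", "city": "C2"},
        {"region": "R2", "city": "C3"},
    ]
    print(city_map_rows(sample, {"R1"}))
```

The same shape applies to the finance use case: swap region/city for project/vendor.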

How to plot a Digital Elevation Model in Data Studio

TL;DR: a sample dataset with x, y, z, red, green and blue columns, and a custom viz in Google Data Studio using a deck.gl point cloud; see the example here.

I added a new dataset so you can test it yourself. You can either load it into BigQuery or use the file upload connector in Data Studio.

Section 1 explains how I got the data; if you are only interested in testing the visual, go to section 2.

1-How to get the Data

For some reason, it is extremely painful to get a dataset with x, y, z, r, g, b.

Luckily, a couple of days ago I was on Twitter and saw this tweet by Michael Sumner:

It turns out extracting coordinates and elevation is extremely easy using R. All you need is the center location and the dimensions of the area you are interested in, and the R package ceramic will extract x, y, z automatically into a nice data frame. I then uploaded that data to BigQuery using the bigrquery package and plotted it with a custom viz I built using deck.gl (see the linked report).

Here is the script I used:

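library(raster)
library(ceramic)
library(bigrquery)

# authenticate with a service-account key file
bq_auth(path = "XXXXXXXX.json")
Sys.setenv(MAPBOX_API_KEY = "DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD")

# satellite imagery (3-band RGB raster) with a 2000 m buffer around the centre point (lon, lat)
cc <- ceramic::cc_location(cbind(14.428778, 40.822973), buffer = c(2000, 2000), zoom = 15)
# elevation for the same extent, resampled onto the imagery grid so the cells line up
el <- ceramic::cc_elevation(cc)
el1 <- resample(el, cc, method = "bilinear")

# one row per cell: x, y plus the band values
df1 <- as.data.frame(cc, xy = TRUE)   # x, y, layer.1, layer.2, layer.3 (red, green, blue)
df2 <- as.data.frame(el1, xy = TRUE)  # x, y, layer (elevation)
df <- merge(x = df1, y = df2, by = c("x", "y"), all.x = TRUE)

# ceramic returns Web Mercator rasters, so x/y are metres; dividing by 100000
# only rescales them for the visual, it is not a true lon/lat conversion
df <- transform(df, lng = x/100000, lat = y/100000, red = layer.1, green = layer.2, blue = layer.3)
df <- df[c("lng", "lat", "layer", "red", "green", "blue")]

# upload to BigQuery (waits for the load job), replacing the table if it exists
bq_table_upload(bq_table("PROJECT_ID", "GIS", "VOLCANO"), df,
                create_disposition = "CREATE_IF_NEEDED",
                write_disposition = "WRITE_TRUNCATE")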
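For readers who do not use R, here is a rough Python sketch of the reshaping step only: left-joining the RGB cells to the elevation cells on (x, y), rescaling x/y by 100000, and mapping bands 1, 2, 3 to red, green, blue. It assumes the rasters have already been flattened into dictionaries keyed by cell centre. The `merge_rgb_elevation` name and the sample values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[float, float]  # (x, y) cell centre, in the raster's projected units


def merge_rgb_elevation(rgb: Dict[Cell, Tuple[int, int, int]],
                        elevation: Dict[Cell, float],
                        scale: float = 100000.0) -> List[dict]:
    """Left-join RGB cells to elevation cells and emit lng/lat/layer/red/green/blue rows."""
    rows = []
    for (x, y), (red, green, blue) in sorted(rgb.items()):
        # Left join: a cell with no elevation keeps a None (NULL) height.
        z: Optional[float] = elevation.get((x, y))
        rows.append({
            "lng": x / scale,  # simple rescale, not a projection to lon/lat
            "lat": y / scale,
            "layer": z,
            "red": red,        # band 1
            "green": green,    # band 2
            "blue": blue,      # band 3
        })
    return rows


if __name__ == "__main__":
    sample_rgb = {(1600000.0, 4980000.0): (10, 20, 30)}
    sample_z = {(1600000.0, 4980000.0): 1200.5}
    print(merge_rgb_elevation(sample_rgb, sample_z))
```

The join only works cleanly because the elevation raster was resampled onto the imagery grid first, so both sides share identical (x, y) keys.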

2-Plot the Data using Point Cloud Viz

The custom viz address is

Or you can just copy the report and use your own data.

All fields are required except the tooltip; by default, the tooltip shows the coordinates.

I used Mount Tahat as an example. It is the highest mountain in the south of Algeria, an extremely beautiful area.

Data Studio limits the number of rows passed to a custom visual to 1 million. Here I made sure the dataset has fewer than 750K rows, as that is the maximum that can be downloaded from the visual.
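If an area comes out too dense, one option (an assumption on my part, not what the script above does) is to thin the grid by keeping every k-th row and column of cells. The hypothetical helper below finds the smallest stride k that brings the point count under the 750K download limit.

```python
import math


def decimation_step(n_rows: int, n_cols: int, max_points: int = 750_000) -> int:
    """Smallest stride k such that keeping every k-th row and column fits under max_points."""
    if max_points < 1:
        raise ValueError("max_points must be positive")
    k = 1
    # Keeping every k-th cell leaves ceil(n/k) cells along each axis.
    while math.ceil(n_rows / k) * math.ceil(n_cols / k) > max_points:
        k += 1
    return k


if __name__ == "__main__":
    # A 1000 x 1000 grid is 1M points, over the limit; stride 2 leaves 500 x 500.
    print(decimation_step(1000, 1000))
```

Shrinking the buffer or lowering the zoom level in the R script achieves the same goal at the source. The stride approach keeps the full extent but reduces the level of detail.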

3-The end Results

Mount Uluru in Australia

Mount Vesuvius in Italy

Using the new Convexhull function in BigQuery to reduce Geometry complexity

BigQuery recently introduced two new GIS functions: ST_CONVEXHULL and ST_DUMP.

Read the announcement here. As soon as I saw it, I thought about this use case.

The Problem

Although showing maps in BI software has improved dramatically in the last couple of years, unless you use Tableau there is always a hard limit on how much data you can show in a map. Even when you can show more, it is better to reduce the data volume for performance's sake: users are so spoiled these days that they complain when their report does not show up in less than 2 seconds.

Although the example used here is very specific, I am sure it can be extended to other use cases.

Let's say you want to show a lot of points with one color-coded attribute. In many cases, the end user only wants to see the distribution of the attribute, not the individual points; see here:

That's a lot of points (my real case has 58 thousand).

Convexhull to the rescue

Convex hull is very handy: the input is a group of points and the output is a closed polygon. I use it a lot in QGIS, but here is the killer feature. Because this is SQL and the attributes are dynamic (in my use case they change daily), you can write a query that dynamically generates new geometries: polygons, line strings, or even the original points if they can't be grouped.

Now the trick is to group by status and by the existing groupings (area and row in this dataset):
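To show what "group of points in, closed polygon out" means, here is a small Python sketch using Andrew's monotone chain algorithm. It works on a flat plane, whereas BigQuery computes hulls on the sphere; for a small site like a solar farm, the planar result should be a close approximation. The `convex_hull` and `hull_wkt` helpers are hypothetical. They also show how the hull degenerates: a single point stays a POINT, and collinear points give a LINESTRING.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Positive when o -> a -> b turns counter-clockwise, zero when collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise, without repeats."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def hull_wkt(points: List[Point]) -> str:
    """Hull as WKT: POINT, LINESTRING or POLYGON depending on how many vertices remain."""
    hull = convex_hull(points)
    coords = ", ".join(f"{x} {y}" for x, y in hull)
    if len(hull) == 1:
        return f"POINT({coords})"
    if len(hull) == 2:
        return f"LINESTRING({coords})"
    # A polygon ring is closed by repeating its first vertex.
    return f"POLYGON(({coords}, {hull[0][0]} {hull[0][1]}))"


if __name__ == "__main__":
    print(hull_wkt([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) disappears from the output. That is exactly where the row savings come from.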

  1. Check whether all the points in an area have the same status, using COUNT DISTINCT; if they do, the whole area becomes one polygon.
  2. If an area has multiple statuses, and hence multiple colors, we drop down to the row level and generate line strings.
  3. If a row has multiple colors, we drop down to the individual points.

I built this SQL view with the help of Mikhail Berlyant. The source data is here; replace “xxxxx.SolarFarm” with your table.
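The three rules above can be sketched in Python as a single pass that picks the coarsest level whose points all share one status. The `assign_groups` helper, the `area`/`row`/`status`/`id` keys, and the sample points are hypothetical. As in the query, a row is identified by (area, row). The prefixes are there so an area key can never collide with a row key or a point id.

```python
from collections import defaultdict
from typing import Dict, List


def assign_groups(points: List[Dict]) -> Dict[int, str]:
    """Map each point id to a group key: whole area, else whole row, else the point itself."""
    area_status = defaultdict(set)  # distinct statuses per area
    row_status = defaultdict(set)   # distinct statuses per (area, row)
    for p in points:
        area_status[p["area"]].add(p["status"])
        row_status[(p["area"], p["row"])].add(p["status"])

    groups: Dict[int, str] = {}
    for p in points:
        if len(area_status[p["area"]]) == 1:
            groups[p["id"]] = f"area:{p['area']}"             # -> polygon
        elif len(row_status[(p["area"], p["row"])]) == 1:
            groups[p["id"]] = f"row:{p['area']}/{p['row']}"   # -> line string
        else:
            groups[p["id"]] = f"point:{p['id']}"              # -> single point
    return groups


if __name__ == "__main__":
    sample = [
        {"id": 1, "area": "A", "row": 1, "status": "done"},
        {"id": 2, "area": "B", "row": 1, "status": "done"},
        {"id": 3, "area": "B", "row": 2, "status": "pending"},
    ]
    print(assign_groups(sample))
```

Each group key then feeds one convex hull, so the number of output rows equals the number of distinct keys.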

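WITH
  source AS (
  SELECT
    *,
    ST_GEOGFROMTEXT(CONCAT( "POINT (",x," ", y,")")) AS POINT,
    -- distinct statuses within the point's row (rows are numbered inside an area)
    COUNT(DISTINCT status) OVER (PARTITION BY area, ROW) AS multiple_status,
    -- distinct statuses within the whole area
    COUNT(DISTINCT status) OVER (PARTITION BY area) AS multiple_status_area
  FROM
    `xxxxxxx.SolarFarm`),
  tt AS (
  SELECT
    id, pole_nr, color, area, ROW, status, POINT,
    -- pick the coarsest level whose points all share one status; the prefixes
    -- keep area, row and point keys from colliding with each other
    CASE
      WHEN multiple_status_area = 1 THEN CONCAT("area_", CAST(area AS STRING))
      WHEN multiple_status = 1 THEN CONCAT("row_", CAST(area AS STRING), "_", CAST(ROW AS STRING))
    ELSE
    CONCAT("point_", CAST(id AS STRING))
  END
    AS newgroup
  FROM
    source),
  -- one convex hull per group, kept as WKT so it can be used in GROUP BY
  ff AS (
  SELECT newgroup, ST_ASTEXT(ST_CONVEXHULL(ST_UNION_AGG(POINT))) AS WKT
  FROM
    tt
  GROUP BY
    1),
  xx AS (
  SELECT
    tt.newgroup,
    wkt,
    tt.status
  FROM
    tt
  LEFT JOIN
    ff
  ON
    tt.newgroup = ff.newgroup)
-- every group has a single status, so this returns one row per group
SELECT
  newgroup,
  wkt,
  status
FROM
  xx
GROUP BY
  1,
  2,
  3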

And here is the result side by side with the original data: from 3,528 rows down to 283, a big improvement.

As of July 2020, Google Data Studio does not support geometry, and the total number of points is limited to 10K. You can use other custom visuals, but currently map tiles are blocked.

If you are using Power BI to view the data, you need to use the excellent Icon Map visual, as it supports WKT geometry.