Today I was playing with a report in PowerBI and got the idea of exporting data to BigQuery from PowerQuery. Let me tell you something: it is very easy and works rather well. PowerQuery is an amazing technology (and it is free).
In PowerBI, you can export from R or Python visuals, but there is a limitation of 150K rows. If you use PowerQuery instead, there is no such limitation (I tried with a table of 23 million records and it worked).
Here is the code using Python, but you could do the same in R:
import os
import pandas as pd
from google.cloud import bigquery

# point the client at a service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:/BigQuery/test-990c2f64d86d.json"

client = bigquery.Client()
dataset_ref = client.dataset('work')
table_ref = dataset_ref.table('test')

job_config = bigquery.LoadJobConfig()
# overwrite the destination table on every run
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
# an explicit schema could be set via job_config.schema; here BigQuery infers it from the DataFrame

# 'dataset' is the DataFrame that PowerQuery passes into the Python script step
# note: load_table_from_dataframe needs the pyarrow package installed
job = client.load_table_from_dataframe(dataset, table_ref, job_config=job_config)
job.result()  # waits for the table load to complete
Interestingly, after the Python step we get a table back in PowerQuery; simply expand it.
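As an aside, any DataFrame left in the script's scope shows up in PowerQuery, so you can also return a small summary instead of the full data. A minimal sketch (the name load_status is just illustrative, not part of the original code):

# a hypothetical one-row summary frame for PowerQuery to expand;
# any DataFrame still in scope at the end of the script will be offered
load_status = pd.DataFrame({"rows_loaded": [job.output_rows], "state": [job.state]})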
Here is the total number of rows of the table in PowerBI:
And the results in BigQuery:
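If you prefer to verify the load from Python rather than the BigQuery console, a quick count query should do it; a small sketch, assuming the client's default project contains the work dataset:

# optional sanity check: count the rows that landed in the destination table
count_job = client.query("SELECT COUNT(*) AS row_count FROM `work.test`")
print(count_job.to_dataframe())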
OK, a PowerQuery flow can execute multiple times per refresh; that's black-magic knowledge that only a handful of people know. In this case, though, it does not matter: the BigQuery job truncates the table every time, so there is no risk of data duplication.
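For reference, that truncate-versus-append behaviour comes down to one setting on the job config; a minimal sketch of the two options:

# WRITE_TRUNCATE replaces the table contents on every run (what the script above uses)
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
# WRITE_APPEND would instead add the rows on every execution,
# so repeated PowerQuery evaluations would duplicate the data
# job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND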
You may ask why do this when there are plenty of data preparation tools that natively support BigQuery. Based on my own experience, most of my data sources are Excel files, and PowerQuery is just very powerful and versatile, especially if you deal with "dirty" formats.
The second question is probably: what's the added value? Why not just load the data directly into PowerBI? The answer is very easy: data ubiquity.
I want everyone to be able to access the data, regardless of the front-end tool.