DuckDB is one of the most promising OLAP engines on the market. It is open source, very lightweight, has virtually no dependencies, and works in-process (think of the good old MS Access). It is extremely fast, especially at reading and querying Parquet files, and has amazing SQL support.
The ODBC driver is getting more stable, so I thought it was a good opportunity to test it with PowerBI. Note that JDBC has always been supported and can be used with SQL frontends like DBeaver, and obviously Python and R have native integrations.
I downloaded the ODBC driver from the latest version at the time, 0.3.3. Always check for the latest release and make sure it is the right file.
Installing the binary is straightforward, but unfortunately you need to be an administrator.
Select ODBC. If the driver was installed correctly, you should see an entry for DuckDB.
As of this writing, there is a bug in the driver: if you add a path to the DuckDB database file, the driver will not recognise the tables and views inside it. Instead I selected
I then defined the base table as a CTE that reads directly from a folder of Parquet files.
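The original query is not reproduced here, so the following is only a minimal sketch of the idea, assuming DuckDB's `read_parquet` table function with a glob pattern. The helper `build_query`, the folder `C:/data/parquet` and the `count(*)` aggregate are hypothetical placeholders, not the query used in this post.

```python
from pathlib import Path


def build_query(parquet_dir: str) -> str:
    """Return a DuckDB SQL query whose base table is a CTE over a folder of Parquet files."""
    # Glob pattern matching every Parquet file in the folder.
    pattern = (Path(parquet_dir) / "*.parquet").as_posix()
    # Double any single quotes so the path stays a valid SQL string literal.
    pattern = pattern.replace("'", "''")
    return (
        "WITH base AS (\n"
        f"    SELECT * FROM read_parquet('{pattern}')\n"
        ")\n"
        # Placeholder aggregate: replace with the real query against `base`.
        "SELECT count(*) AS row_count FROM base"
    )


if __name__ == "__main__":
    folder = "C:/data/parquet"  # hypothetical folder, adjust to your own data
    sql = build_query(folder)
    print(sql)

    # Optional: run the query with the DuckDB Python package, if it is
    # installed and the folder actually contains Parquet files.
    try:
        import duckdb
    except ImportError:
        duckdb = None
    if duckdb is not None and any(Path(folder).glob("*.parquet")):
        print(duckdb.connect().execute(sql).fetchall())
```

The generated SQL text is what matters: the same statement should work in any DuckDB client, since the CTE hides the file layout behind a single `base` table that the rest of the query can treat like a normal table.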
Just for fun, I duplicated the Parquet file to reach the 1 billion row mark.
The total size is 30 GB compressed.
1 billion rows on a laptop
And here are the results in PowerBI
The PowerBI report is only 48 KB, as I import only the results of the query, not the whole 30 GB of data. Yes, separation of storage and compute makes a lot of sense in this case.
The POC in this blog was just for fun. The query takes 70 seconds using the ODBC driver in PowerBI (which is still in an alpha stage), while the same query in DBeaver takes 19 seconds using the more mature JDBC driver. It also works only with import; for DirectQuery you need a custom connector and the Gateway. Still, I see a lot of potential.
A lot of people are building very interesting scenarios, like extremely fast and cheap ETL pipelines using just Parquet and DuckDB running on cloud functions. I think we will hear more about DuckDB in the coming years.