iceberg – Page 2 – Small Data And self service

I recently had a conversation about this topic and realized that it’s not widely known that Snowflake can read Delta tables hosted in OneLake. So, I thought I’d share this in a blog post.

Fundamentally, this process is similar to how XTable in Fabric works, but in reverse—it converts a Delta table to Iceberg by translating the table metadata ( AFAIK, Snowflake don’t use Xtable but an internal tool)

How It Works

External Volume and File Section

When creating an external volume in Snowflake that points to OneLake, only the Files section is supported. This isn’t an issue because you can simply add a shortcut that points to a schema.

SQL Code to Set Up External Volume and Map an Existing Table

CREATE OR REPLACE EXTERNAL VOLUME onelake_personal_tenant_delta
   STORAGE_LOCATIONS =
      (
         (
            NAME = 'onelake_personal_tenant_delta',
            STORAGE_PROVIDER = 'AZURE',
            STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/python/data.Lakehouse/Files',
            AZURE_TENANT_ID = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
         )
      );

DESC EXTERNAL VOLUME onelake_personal_tenant_delta;

One-Time Setup: Authentication

You’ll need to complete a one-time authentication step:

Copy the AZURE_CONSENT_URL from the output.
Open it in your browser: "AZURE_CONSENT_URL":"https://login.microsoftonline.com/xx/oauth2/authorize?client_id=yy&response_type=code"
Ensure you select the correct email address that has access to your tenant.
This will create an Azure multi-tenant app with a service principal.
Add this service principal to your workspace so that Snowflake can access the data.

Setting Up the Iceberg Table

CREATE OR REPLACE CATALOG INTEGRATION delta_catalog_integration
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = DELTA
  ENABLED = TRUE;

CREATE DATABASE IF NOT EXISTS ONELAKE;

CREATE SCHEMA IF NOT EXISTS delta;

CREATE ICEBERG TABLE ONELAKE.delta.test
  CATALOG='delta_catalog_integration'
  EXTERNAL_VOLUME = 'onelake_personal_tenant_delta'
  BASE_LOCATION = 'aemo/test/';

SELECT COUNT(*) FROM ONELAKE.delta.test;

and here is the result

Limitations

Deletion Vectors Are Not Supported:
If your table contains deletion vectors, make sure to run OPTIMIZE to handle them properly.

This is a quick guide on correctly reading Iceberg tables from BigQuery. Currently, there are two types of Iceberg tables in BigQuery, based on the writer:

BigQuery Iceberg Table

This is the table written using BigQuery engine

Iceberg Tables Written Using the BigQuery Metastore

Currently, only Spark is supported. I assume that, at some point, other engines will be added. The implementation is entirely open source but currently supports only Java (having a REST API would have been a nice addition).

How OneLake Iceberg Shortcuts Work

OneLake reads both the data and metadata of an Iceberg table from its storage location and dynamically generates a Delta Lake log. This is a quick and cost-effective operation, as it involves only generating JSON files. See an example here

The Delta log is added to OneLake, while the source data remains read-only. Whenever you make changes to the Iceberg table, new metadata is generated and translated accordingly. The process is straightforward.

BigQuery Iceberg Doesn’t Publish Metadata Automatically

BigQuery uses an internal system to manage transactions. When querying data from the BigQuery SQL endpoint, the results are always consistent. However, reading directly from storage may return an outdated state of the table.

For BigQuery Iceberg tables, you need to manually run the following command to update the metadata:

EXPORT TABLE METADATA FROM dataset.iceberg_table;

you can run it on a schedule, or make it the last step in an ETL pipeline.

Iceberg Tables Using the BigQuery Metastore (Written by Spark)

If the Iceberg table is written using the BigQuery metastore (e.g., by Spark), no additional steps are required. The metadata is automatically updated.

The interesting part about Iceberg’s translation to a Delta table in OneLake is that it is completely transparent to Fabric workloads. For example, Power BI simply recognizes it as a regular Delta table. 😊

	Querying a Fabric La… on Writing to SQL Server using…
	Benjamin on Running DuckDB at 10 TB s…
	mim on Running DuckDB at 10 TB s…
	Benjamin on Running DuckDB at 10 TB s…
	Running DuckDB at 10… on Running DuckDB at 10 TB s…

Category: iceberg

Reading a Delta Table Hosted in OneLake from Snowflake

Recommended Documentation

How It Works

External Volume and File Section

SQL Code to Set Up External Volume and Map an Existing Table

One-Time Setup: Authentication

Setting Up the Iceberg Table

Limitations

Recommended Documentation

How It Works

External Volume and File Section

SQL Code to Set Up External Volume and Map an Existing Table

One-Time Setup: Authentication

Setting Up the Iceberg Table

Limitations

Share this:

BigQuery Iceberg Table

Iceberg Tables Written Using the BigQuery Metastore

How OneLake Iceberg Shortcuts Work

BigQuery Iceberg Doesn’t Publish Metadata Automatically

Iceberg Tables Using the BigQuery Metastore (Written by Spark)

Share this: