duckrun

duckrun is a package I built using AI exclusively, to solve pain points I hit when using Fabric Python notebooks. I like DuckDB very much, but I was tired of manually discovering table names every time and writing long Python deltalake code just to write a Delta table, so I combined those two packages under one helper package to make my workflow smoother. Recently I discovered how awesome dbt is, so why not add a dbt adapter too. Then someone trolled me about the lack of snapshot isolation when doing read-modify-write on the same table. Luckily DuckDB now exposes the read version in delta_scan (before, you had to do the weird attach thing), so I have something that might actually be useful. The rest of the blog is written by AI. It wrote my code, so I don’t see the issue; it’ll even write my blog.

I also added a web page with some projects I built: dbt projects I ported, and some non-trivial SQL statements with proper snapshot isolation.

https://djouallah.github.io/duckrun

So yes, it’s a hack, until Iceberg matures or we get a better Delta write story in DuckDB.

At its core, duckrun is a four-part split: DuckDB executes the SQL, Arrow streams the result, delta-rs commits it to Delta, and dbt (optionally) orchestrates the whole DAG.

It’s storage-agnostic, running anywhere DuckDB and delta-rs can reach: local filesystem, S3, GCS, ADLS, OneLake. In practice I test it on the local filesystem and Microsoft Fabric OneLake. S3 and GCS use the same code path, but I barely touch them, so treat them as untested.

The gap it fills

DuckDB reads Delta well (delta_scan). It does not write it well: its Delta support is blind INSERT only (no UPDATEDELETE, or MERGE), and its trajectory points at writing through Unity Catalog, which defeats the point of filesystem-native Delta. So if you want DuckDB’s engine but need upserts on Delta tables, there is no single tool that does both today.

ApproachReads DeltaWrites Delta (merge/update/delete)DuckDB SQL engine
DuckDB alone❌ (blind INSERT only)
delta-rs alone
duckrun

duckrun’s answer is a split:

  • DuckDB runs all SQL and model logic, and reads Delta through delta_scan views.
  • delta-rs handles every write: overwrite, append, merge, delete, update.
  • Arrow bridges the two: a DuckDB relation is streamed to delta-rs over the C-stream interface.
  • Snapshot isolation ties it together: each read pins a Delta version, and each read-modify-write commits against the version it read, so a concurrent commit errors instead of silently clobbering.

That’s the whole architecture. The README calls it glue, and that’s accurate: each layer does only the one thing it’s set up to do.

Two costs come with this. Two engines split one RAM budget with no shared allocator, and the Arrow handoff isn’t zero-copy: DuckDB’s native vector format isn’t Arrow, so each batch is decoded and re-encoded as it streams across the Arrow C Data Interface (a batch-at-a-time ArrowArrayStream, so even a large write never fully materializes in memory). duckrun manages this with a cgroup-aware memory split, sampled per job so it doesn’t get OOM-killed on Fabric/k8s, where DuckDB otherwise sees the whole node. The fractions are of the effective limit: on a merge, 0.3 to DuckDB and 0.6 to delta-rs spill, leaving 0.1 slack; on a plain write, 0.85 to DuckDB. The split leans toward delta-rs because that’s where the memory goes: profiling a merge attributes ~99% of resident memory to delta-rs and only ~15 MB to DuckDB. Keeping every write behind delta-rs also means the bridge (and the memory juggling) can be deleted the day DuckDB ships a real Delta writer, without touching the read or state model.

connect: read first

connect() is read-only by default, so you can point it at a lakehouse and explore with no chance of an accidental write. Tables are discovered for you, with no manual name bookkeeping:

import duckrun
conn = duckrun.connect("abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>/Tables/dbo")
conn.sql("SHOW TABLES").show()
conn.sql("select status, count(*) from orders group by status").show()
df = conn.table("orders").toPandas() # or .toArrow() for a streaming reader
# time travel
from duckrun import DeltaTable
DeltaTable.forName(conn, "orders").history() # newest-first: version, timestamp, operation
conn.read.format("delta").option("versionAsOf", 0).load(".../Tables/dbo/orders").show()

Multiple catalogs: attach

Attach more lakehouses and query across them by three-part name. This is where the data warehouse case shows up: in Fabric a Warehouse is just a write-locked Lakehouse, so you attach it read_only=True next to a writable Lakehouse and join the two:

conn.attach("abfss://…/warehouse.Warehouse/Tables", name="warehouse", read_only=True)
conn.attach("/data/reference", name="local")
conn.sql("""
select *
from warehouse.mart.facts f
join local.dbo.lookup l on l.id = f.id
""").show()

Same code against a local path, s3://gs://, or az://.

Writing: DML and merge

Opt into writes with read_only=False. Then just write SQL: plain DML routes straight to delta-rs, with no Python deltalake boilerplate:

conn = duckrun.connect("abfss://…/Tables/dbo", read_only=False)
conn.sql("create or replace table clean_orders as select * from orders where amount > 0")
conn.sql("insert into clean_orders select * from late_orders")
conn.sql("update clean_orders set status = 'shipped' where status = 'packed'")
conn.sql("delete from clean_orders where amount = 0")

MERGE works the same way; reference the literal target / source aliases:

conn.sql("""
merge into clean_orders as target
using updates as source
on target.id = source.id
when matched then update set *
when not matched then insert *
""")

…or, if you’d rather build it, the DeltaTable API mirrors Delta’s:

from duckrun import DeltaTable
src = conn.sql("select * from updates")
DeltaTable.forName(conn, "clean_orders").merge(src, "target.id = source.id") \
.whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

conn.sql accepts CREATE [OR REPLACE] TABLE ASINSERTUPDATEDELETEALTER … ADD COLUMNMERGE, and DROP (a soft tombstone: delta-rs has no drop, so data files persist until you purge them). CREATE TEMP TABLE and CREATE VIEW stay native DuckDB. Things that can’t be honored cleanly, like multi-statement strings or UPDATE … FROM, are rejected rather than silently mishandled.

The surface is small. It mirrors the Delta/DeltaTable API so notebook code reads familiarly, but there is no fluent transform builder and no second SQL engine. Transforms are SQL, run by DuckDB.

Snapshot isolation

This started as someone trolling the project over read-modify-write on the same table, and it turned into the guarantee I rely on most. Honestly, I come from a Dataflow Gen1 background (basically a single writer), so all this concurrency stuff never made much sense to me. In a Lakehouse, where anyone can write to a table, even accidentally, it suddenly becomes a real factor. (Thanks Raki for the harsh feedback, lol.) A lakehouse has no transaction manager and no single-writer guarantee: two pipelines, a double-fired job, or a notebook racing a scheduled run can all commit to the same table. The dangerous shape is read → compute → write: if someone commits in between, a naïve write at HEAD silently overwrites them, a lost update with no error. duckrun’s job is to turn that into a CommitFailedError.

What’s fenced, and what isn’t. deleteupdate, and merge are read-modify-writes (they read the current rows, compute a change, then commit), so duckrun pins each to the version it read and delta-rs’s OCC validates the commit over (read, HEAD]. A conflicting concurrent commit makes them fail. Plain append and overwrite are not fenced, by design: they match Spark’s SaveMode. An append rebases onto HEAD (appends don’t conflict), and an overwrite is last-writer-wins. So the fence is automatic exactly where a lost update is possible, and absent where it isn’t.

How the OCC works. There’s no lock manager anywhere. A Delta commit is the atomic creation of the next log entry, _delta_log/…{N+1}.json, so if a racing writer already wrote it, your put-if-absent loses and delta-rs raises CommitFailedError. Optimistic, filesystem-native, no coordinator. The gap OCC alone leaves: it only checks the commit instant, not the version you read.

Pinning the version you read, the same whether you write SQL or a DataFrame. A DeltaTable handle captures the version at forName() (call it vB), and every merge() / delete() / update() through it commits against vB, so OCC validates the whole (vB, HEAD] window, not just the instant of the commit. conn.sql("delete …" / "update …" / "merge …") funnels into the exact same engine path, pinned the same way: there is no second code path for SQL. And DuckDB exposing the read version in delta_scan(…, version => vB) (the reason for the duckdb >= 1.5.4 floor) lets the read sit on vB too, so a read-modify-write split across statements still lands on one snapshot. Spark/Delta fences only the commit instant; this fences the version you actually read.

append_if_unchanged / overwrite_if_unchanged are the fenced siblings of plain append/overwrite. I had to coin the terms, because Delta/Spark has no built-in fenced append (you’d hand-write a MERGE for it). They’re a version compare-and-swap: load the table at the version you read and pass max_commit_retries=0 so delta-rs won’t rebase. If anything committed since, the target version is already taken and the commit fails. For the watermark/idempotent-append case this is cheaper than a merge, with no target scan and no key join. (safeappend is the deprecated alias.)

The dbt adapter

A thin wrapper over dbt-duckdb that adds Delta-backed table and incremental materializations; everything else (views, seeds, sources, tests, plugins) is inherited. Point a profile at a lakehouse and dbt run:

my_project:
outputs:
dev:
type: duckrun
root_path: "abfss://<ws>@onelake.dfs.fabric.microsoft.com/<lh>/Tables"

dbt with no catalog. Normally dbt leans on a metastore to know what exists and to resolve {{ this }}ref(), and is_incremental(); duckrun has none, and state still survives across separate dbt build processes. At run start the adapter discovers Delta tables on disk (glob for local/az/s3/gs; the OneLake DFS REST API for abfss, since DuckDB can’t glob it reliably) and registers each as a delta_scan view named to match dbt’s database.schema.identifier. That view is what makes those references resolve against real Delta tables. Each materialization pre-registers its own {{ this }} view before running, then recreates the view after the delta-rs write, since that write lands a new Delta version and the old view would otherwise point at stale files. The namespace is rebuilt from storage on every run instead of read from a catalog.

Incremental strategies:

StrategyBehavior
merge (default with unique_key)upsert
insertinsert new keys only
append (default without unique_key)blind append
append_if_unchanged / safeappendappend, commit only if version unchanged: cheap, no target scan, errors on conflict
microbatchdelete+insert per event_time window

Compaction and 7-day vacuum run automatically (every run for overwrites; past a file-count threshold for incrementals).

Two limits worth knowing up front. First, writes are single-threaded within a run (in-process delta-rs isn’t thread-safe; cross-process concurrency is fully supported). Second, constraints are enforced at the write boundary, not stored in the table: a contract with not_null columns is checked by a guard query before the write, so a null fails with NOT NULL constraint failed and the prior Delta version is left untouched. Two caveats there: delta-rs can’t persist column constraints into Delta metadata, and timestampNtz columns can’t be written yet.

On versions, duckrun is deliberately conservative because the underlying libraries move fast and break: duckdb >= 1.5.4 (first stable with delta_scan(version => N)) and deltalake == 1.5.0, the first release with MERGE max_spill_size. On Microsoft Fabric, pip install --upgrade and restart the kernel, since the bundled DuckDB is older than the floor.

Testing: the only way to trust AI code

If the AI writes the code, what makes it trustworthy? Not that it compiles, and not that the unit tests are green: an AI will happily write a test that passes for the wrong reason. The only signal I actually trust is integration testing: run a real dbt project end to end and check the tables it lands on real storage.

So that’s where most of the test weight sits. duckrun runs a few hundred tests across 26 files, but the ones that matter most are the 8 integration projects that build for real, most against live Microsoft Fabric OneLake (abfss://), not a mock. To keep them honest, with real models rather than toys I wrote to flatter the adapter, I ported existing dbt projects from the web: other people’s models kept as close to original as I could, with attribution:

  • sde_dbt_tutorial, a port of josephmachado/simple_dbt_project: raw tables → bronze typing → a Delta-backed SCD2 customer snapshot → a merge-incremental clickstream fact → an orders_obt gold mart.
  • coffee, ported from JosueBogran/coffeeshopdatageneratorv2: CSV ingest over https, a deduped SCD2 product dim, a region-partitioned fact, a revenue mart.
  • aemo, my own dbt_fabric_python_delta, built against live OneLake. The full run is published as browsable dbt docs: fct_scada is a 360M-row Delta table you can inspect yourself, not a screenshot.
  • snapshot_pin — a concurrent-writer test that asserts the guarantee above end to end: one writer reads a version, a second commits underneath it, and the first writer’s stale commit is rejected on real storage rather than silently overwriting.
  • plus a TPCH merge/append/overwrite spill benchmark, a connection-API demo on live NYC TLC taxi data, and a multi-catalog lakehouse + warehouse + local join.

The rendered catalogs for these (real Delta stats, row counts, last-modified) are on the project page.

On top of the projects, the adapter runs the official dbt adapter test suite (dbt-tests-adapter, the same conformance suite every dbt adapter is measured against) at 126/135 passing (93%), regenerated on every push to main. The documented failures are deliberate choices: no persistent views in open Delta, and rejecting merge configs that would silently diverge.

The evidence that matters is tables on real storage, hit the same way a user would, not a passing test count and not “the AI said it works.”

What does it mean to build a package you don’t understand?

I should be honest: I don’t understand the code in detail. But that was always true. duckrun is glue over DuckDB and delta-rs (written in C++ and Rust), and I don’t have the slightest idea how those work internally either. Almost nobody who builds on a library understands its guts. So what does “writing a package” actually mean?

For me, two things. Expressing the problem, knowing the pain well enough to say exactly what should happen, and making the design decisions that follow: delta-rs for every write, delta_scan views for reads, snapshot isolation as the contract. The AI writes the code; I own the problem and the shape of the solution.

The third thing is what makes it real: tests, and a lot of them. duckrun runs an extensive unit and integration suite, but I only actually trust it when I see tables land in OneLake. Code that passes locally and code that materializes correctly on real storage are not the same claim.

One trick I’ve learned: use a second agent to verify the first one’s work, not the agent that wrote it. The catch is that AI still cheats to make a test pass: it’ll weaken or game the check even when it plainly knows that isn’t the right thing. I hope that improves. Until it does: don’t trust anything it produces. Verify it against reality.

How far Python alone can take you on Delta

1. delta-rs is an ACID Delta writer

delta-rs implements the Delta Lake protocol natively. mergeupdate, and delete go through optimistic concurrency control on every commit. No external coordinator, no catalog service. Two writers race for the same version of the log, one wins, the other retries.

All you need is a path. No metastore to provision, no catalog endpoint, no JDBC connection, no warehouse to wake up. A folder on disk (or on ADLS / S3 / GCS) is the whole interface.

Setup: B is a Delta table being fed a series of CSV batches (batch_001.csvbatch_002.csv, …). Each merge should ingest only files B hasn’t seen yet.

A naming note: the project is delta-rs but the Python package is deltalake (pip install deltalake). On Fabric, stick with what’s preinstalled — Python notebooks already ship with deltalake and OneLake access configured.

From the notebook:

# Bootstrap target B with batch_001 already ingested
write_deltalake(Target_PATH, pa.table({...}), mode="overwrite")
vB = DeltaTable(Target_PATH).version() # v0
# Compute the rows to ingest from the target's current state
con.sql(f"ATTACH '{Target_PATH}' AS tgt (TYPE delta, VERSION {vB});")
our_rows = con.sql("""
SELECT s.id, s.value, parse_filename(s.filename) AS filename
FROM read_csv_auto('source_csv/*.csv', filename=true) s
WHERE parse_filename(s.filename) NOT IN (SELECT DISTINCT filename FROM tgt)
""").arrow()
# → 80 new rows from batch_002..005
# First merge: 80 inserts, commits cleanly
DeltaTable(Target_PATH).merge(
source=our_rows,
predicate="t.filename = s.filename",
source_alias="s", target_alias="t",
).when_not_matched_insert_all().execute()
# Same merge re-run: 0 inserts. The predicate is idempotent.
DeltaTable(Target_PATH).merge(...).when_not_matched_insert_all().execute()

Two commits, both correct. The second run does nothing because the predicate already sees the rows. The transaction model travels with the table itself: move the folder, open it from another machine, and the next writer continues from the last commit.

write_deltalake(mode="append") and write_deltalake(mode="overwrite") are blind on purpose. Blind append means N concurrent appenders all succeed and the result is the union of their rows — exactly what you want for event streams or log ingestion. Blind overwrite means the new data wins and whatever was there is gone — what you want when the writer is the authoritative source for the table. OCC only kicks in for operations that actually read the target (mergeupdatedelete), since those are the only ones where a concurrent change can invalidate what you just computed.

2. I want the full read-to-write transaction, Python API is fine

A common pattern: DuckDB or Polars reads, transforms, and hands an Arrow table to delta-rs to commit. The notebook above is exactly that shape — DuckDB computes “filenames not yet in B” and delta-rs merges the result.

Inside delta-rs, OCC still works. What it cannot see is the read on the other side of the engine boundary. delta-rs knows about the merge it is about to commit; it does not know that DuckDB read B at version vB thirty seconds ago.

Carry the snapshot across the boundary by pinning both sides to the same version:

vB = DeltaTable(Target_PATH).version()
import duckdb
con = duckdb.connect()
con.sql(f"ATTACH '{Target_PATH}' AS tgt (TYPE delta, VERSION {vB});")
our_rows = con.sql("SELECT ...").arrow()
DeltaTable(Target_PATH, version=vB).merge( # ← pinned
source=our_rows,
predicate="t.filename = s.filename",
source_alias="s", target_alias="t",
).when_not_matched_insert_all().execute()

The OCC check now compares against vB instead of HEAD. If another process touched B in the meantime — say a parallel job deleted batch_001.csv — the pinned merge raises:

Failed to commit transaction: Commit failed: a concurrent transaction deleted data this operation read.

Catch it, recompute the diff against fresh state, retry. On the Polars side, pl.read_delta(path, version=vB) accepts the same pin, so the pattern works for any reader that exposes versioned reads.

The pin is just a number. No new infrastructure, no shared coordinator, still path-based.

3. I don’t want the Python API, I want SQL only

If you would rather write SQL — say, drive the pipeline from dbt — your options on Delta today are Spark and Fabric Data Warehouse. Both have supported dbt adapters and work great in production. I have to admit, I was hoping DuckDB would fill that gap, since it is a database and SQL-level transactions are what you expect from a database. The market went the other way: investment is going into catalog-based lakehouse formats (DuckLake, Iceberg), and the DuckDB Delta writer that does exist is tied to Unity Catalog and limited to blind appends. I don’t see them investing in a file-based conflict resolver any time soon 🙂 Lakesail seems interested in this use case, but it is still too early to call.

Takeaway

I personally use delta-rs for CSV ingestion, appends, and recording results from high-concurrency performance tests — it is fast, cheap, and bullet-proof in those scenarios. The open source maintainers are very helpful and care deeply about the product, as they use it themselves in production. But it is not the right tool for every case; Data Warehouse and Spark are more appropriate for complex workloads. With time you intuitively pick the tool that makes sense for a particular job and how much compute you can spend. None of that has to be an either/or: at the end of the day it is a lakehouse, and the whole concept of a lakehouse is having the option to choose the engine. That option matters — if we say only one engine (open source or not) is blessed for writes, then there is no point in the concept of a lakehouse.


Notebook: https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/TableFormat/delta/occ.ipynb

Thanks Raki for keeping me honest:)

Thanks to Ion for explaining how version worked when doing merge: https://www.linkedin.com/in/ionkoutsouris/

Edit : how about Spark

Thanks to Frithjof for explaining Spark behaviour : The merge fixes one snapshot at transaction start (current HEAD = post-delete) and uses it for both its scan and its conflict check. Internally consistent — but bound to HEAD-at-merge-start, which Spark chose, not to the state our read saw, same behaviour when using delta_rs with a lazy dataframe : https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/TableFormat/delta/occ_spark.ipynb

Power BI with DuckDB, 4 years later

Four years ago I wrote a blog about using DuckDB with Power BI in DirectQuery. It got a fair number of likes on LinkedIn 🙂 along with the one comment I didn’t want to hear: how does this work in production? (Craig, if you’re reading this, you were right.)

Back then I thought the technology was the hard part and the rest would sort itself out. It didn’t.

The ODBC driver never really worked in any non-trivial setup. Filters didn’t push down, decimal precision was buggy. It has gotten better since, but two show stoppers remained:

  • DuckDB is in-process, so the driver is the database. There’s no warm, long-running session. Every query starts from scratch.
  • I don’t think those drivers can realistically be certified (personal opinion). And Power BI Service, or any hosted BI service for that matter, is not going to host an in-process engine for free. An on-prem data gateway is not really a good option either.

In 2026 things are way better. MotherDuck (DuckDB’s SaaS) shipped a PostgreSQL endpoint. Problem solved: Power BI speaks Postgres, and it works out of the box.

Then last week DuckDB released Quack. For my own sanity I’ll just call it “DuckDB Server.” It is just an extension; a single function call and you have a server !!

My first reaction was annoyance. Four years of waiting, and they shipped a proprietary wire protocol. I was hoping for pg wire. I want my driver to work. I don’t really care about a 2x improvement if nothing interoperates.

Luckily I was partially wrong. Within two days there was an ADBC driver from gizmodata/adbc-driver-quack, and, to my surprise, a Power BI custom connector from Curt Hagenlocher (think of him as the Linus of Power Query). my understanding it is a side project, not official Microsoft.

And somehow, the whole thing worked. It was beautiful.

But lesson learned from last time: this is experimental, with no guarantee the connector will ever be certified.

The main change from the 2022 post is that instead of pointing at parquet files, I’m pointing at a catalog and getting tables back, like an actual database instead of a pile of files and duckdb got way better.

High level architecture

  1. OneLake Iceberg Catalog — OneLake exposes data as tables. You need three things:
    • Endpoint: https://onelake.table.fabric.microsoft.com/iceberg
    • An Entra ID auth token
    • Path to the Lakehouse/Warehouse: workspace_name/Lakehouse_name.Lakehouse

  1. DuckDB + iceberg extension — reads the catalog and the underlying parquet over HTTPS.

  1. Entra IDaz account get-access-token --resource https://storage.azure.com/ mints a short-lived bearer token. No service principal, no app registration. I have a script that grabs the token, and I opened duckdb-azure#170 hoping to make this much simpler.

  1. DuckDB Endpoint — turns the engine into a TCP server on 127.0.0.1:9494, speaking DuckDB’s native wire protocol (whatever that means).

  1. The ADBC Driver — Python client and Power BI share the same DLL, you need to manually install it from curt github page

You can download all the files here

Power BI

Let’s just share a video. Yes, 600M rows, warm run in my laptop

Python Notebook

TPC-H SF=10 (10 GB), 22 queries, run twice in the same session via client.ipynb. Numbers are seconds, copied straight from the notebook output.

ColdWarm
Total~5 min 29 s~30 s

Cold time is dominated by parquet I/O over HTTPS from OneLake. Bandwidth and seek count, not CPU. Warm runs hit DuckDB’s in-process buffer cache, Onelake endpoint is in another continent and my internet provider is horrible 🙂

Optimization on this stack should target bytes read and seeks (codec, row-group size, predicate pushdown, range prefetch), not query plans.

This is exactly why server mode make sense, as the warm cache is shared by all client (notebook, Power BI, AI Agent)

Not production ready

  • The Entra token has a ~1h TTL. As far as I can tell, DuckDB has no way to auto-refresh tokens.
  • The driver is not certified, so it can’t be used in the service, if you want it added to PowerBI, create an idea in Fabric forum and vote
  • DuckDB Server is new. Don’t expect SQL Server maturity yet 🙂
  • DuckDB’s remote file cache is RAM only. When you restart DuckDB, you lose it and have to deal with the cold-run pain again and egress fees 😦
  • The DuckDB Azure extension is still pretty rough in places. To be fair, they’ve openly said they don’t have the bandwidth.

Hopefully it won’t take another four years to make this production ready.

Still, seeing DuckDB as a single binary serving a 600M row table to Power BI was genuinely fun. and The Iceberg catalog is awesome !!!

Ensuring safe single-writer for DuckLake on OneLake using file lease

DuckLake supports multi-writer just fine — but only if your catalog is a real database, like Postgres (there’s some interest in SQL Server support too). But if all you have is object storage and a SQLite or DuckDB file as the catalog, you’re stuck with single-writer: object stores aren’t real filesystems, so the DB file can’t be locked. Nothing stops two processes from writing to it at the same time and corrupting it.

If single-writer is enough for you (one notebook, one pipeline, one user), you don’t need to stand up a database server. You just need accidental concurrent runs to fail fast.

The trick: take a blob lease

OneLake speaks the ADLS API, so you can take a lease on a blob — a mutex for free (it seems S3 needs DynamoDB and GCS needs a homemade lock object). Each run does:

  1. Acquire a lease on metadata.db in abfss://.
  2. Download it to local disk of the notebook.
  3. Point DuckLake at the local copy and do the work.
  4. Upload the modified file under the lease.
  5. Release the lease.

A second notebook that starts while the lease is held fails immediately on acquire_lease. It can’t even read a stale copy. and you can’t delete the file using the UI , I can see already some uses cases here:)

What about crashed runs?

ADLS leases are either 15–60 seconds fixed, or infinite. Fixed leases need a heartbeat — annoying inside a notebook. Infinite leases work until something crashes — then the file is stuck.

The fix: take an infinite lease, but stamp acquired_at = <utc iso> into the blob’s own metadata when you acquire. When the next run hits a lease conflict, read that timestamp. Older than 12 hours? Call break_lease and re-acquire. A crashed run self-heals within 12 hours. You can shorten that window, or break the lease manually with a one-line script if you can’t wait — there’s a snippet in the README.

Code is here.