dlt.destinations.impl.ducklake.ducklake
DuckDB object hierarchy
Here are short definitions and relationships between DuckDB objects. This should help disambiguate names used in DuckDB, DuckLake, and dlt.
TL;DR:
- scalar < column < table < schema (dataset) < database = catalog
- Typically, in duckdb, you have one catalog = one database = one file
- When using `ATTACH`, you're adding a `Catalog` to your `Database`
  - Though if you do `SHOW ALL TABLES`, the result column "database" should be "catalog" to be precise
Hierarchy:
- A `Table` can have many `Column`s
- A `Schema` can have many `Table`s
- A `Database` can have many `Schema`s (corresponds to a dataset in dlt)
- A `Database` is a single physical file (e.g., `db.duckdb`)
- A `Database` has a single `Catalog`
- A `Catalog` is the internal metadata structure of everything found in the database
- Using `ATTACH` adds a `Catalog` to the `Database` (see the sketch below)
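To make the hierarchy concrete, here is a minimal sketch using the duckdb Python API; the file names (`main.duckdb`, `other.duckdb`), schema, and table are illustrative placeholders, not anything defined by this module.

```python
import duckdb

# Illustrative names only: "main.duckdb", "other.duckdb", "my_dataset", "items".
con = duckdb.connect("main.duckdb")  # one file = one Database = one Catalog (named "main")

con.execute("CREATE SCHEMA IF NOT EXISTS my_dataset")  # a Schema in the Database
con.execute("CREATE TABLE IF NOT EXISTS my_dataset.items (id INTEGER, name VARCHAR)")  # a Table with Columns

con.execute("ATTACH 'other.duckdb' AS other")  # adds a second Catalog to this duckdb instance

# The "database" column returned here lists catalogs: "main" and "other"
print(con.sql("SHOW ALL TABLES"))
```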
In dlt:
- dlt creates a duckdb `Database` per pipeline when using `dlt.pipeline(..., destination="duckdb")`
- dlt stores the data inside a `Schema` that matches the name of the `dlt.Dataset`
- when setting the pipeline destination to a specific duckdb `Database`, you can store multiple `dlt.Dataset` inside the same instance (each with its own duckdb `Schema`), as shown in the sketch below.
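As a sketch of that last point (all pipeline, dataset, table, and file names below are hypothetical), two pipelines can point at the same duckdb file, and each `dlt.Dataset` lands in its own duckdb `Schema`:

```python
import dlt

# Hypothetical names throughout; both pipelines share one duckdb Database file,
# and each dataset becomes its own duckdb Schema inside that file.
shared_db = dlt.destinations.duckdb("pipelines.duckdb")

orders_pipeline = dlt.pipeline(
    pipeline_name="orders_pipeline",
    destination=shared_db,
    dataset_name="orders",  # stored as duckdb Schema "orders"
)
users_pipeline = dlt.pipeline(
    pipeline_name="users_pipeline",
    destination=shared_db,
    dataset_name="users",  # stored as duckdb Schema "users"
)

orders_pipeline.run([{"order_id": 1, "total": 9.99}], table_name="orders")
users_pipeline.run([{"user_id": 1, "name": "alice"}], table_name="users")
```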
DuckLake object hierarchy
TL;DR:
- scalar < column < table < schema < snapshot < database = catalog
 
Hierarchy:
- A `Catalog` is an SQL database to store metadata
  - In duckdb terms, it's a duckdb `Database` that implements the duckdb `Catalog` for the DuckLake
- A `Catalog` has many `Schema`s (namespaces if you compare it to Iceberg) that correspond to `dlt.Dataset`
- A `Storage` is a file system or object store that can store parquet files
- A `Snapshot` references the `Catalog` at a particular point in time
  - This places `Snapshot` at the top of the hierarchy because it scopes other constructs
Using the ducklake extension, the following command in duckdb adds the ducklake `Catalog` to your duckdb database:

```sql
ATTACH 'ducklake:{catalog_database}' (DATA_PATH '{data_storage}');
```
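The same thing through the duckdb Python API, as a minimal sketch; `metadata.ducklake`, the alias `my_lake`, and the `lake_data/` directory are placeholder names:

```python
import duckdb

con = duckdb.connect()

# Make the ducklake extension available (downloaded on first INSTALL).
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Placeholder paths: the catalog lives in 'metadata.ducklake', the parquet
# data files are written under 'lake_data/'.
con.execute("ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 'lake_data/')")

con.execute("CREATE TABLE my_lake.main.events (id INTEGER, payload VARCHAR)")
con.execute("INSERT INTO my_lake.main.events VALUES (1, 'hello')")
print(con.sql("SELECT * FROM my_lake.main.events"))
```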
DuckLakeCopyJob Objects
class DuckLakeCopyJob(DuckDbCopyJob)
metrics
def metrics() -> Optional[LoadJobMetrics]
Generate remote URL metrics that point to the table in storage.
DuckLakeClient Objects
class DuckLakeClient(DuckDbClient)
Destination client to interact with a DuckLake
A DuckLake has 3 components:
- ducklake client: this is a `duckdb` instance with the `ducklake` extension
- catalog: this is an SQL database storing metadata. It can be a duckdb instance (typically the ducklake client) or a remote database (sqlite, postgres, mysql)
- storage: this is a filesystem where data is stored in files
The dlt DuckLake destination gives access to the "ducklake client". You never have to manage the catalog and storage directly; this is done through the ducklake client.
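A minimal end-to-end sketch, assuming the DuckLake destination is registered under the name "ducklake" and that the default (or config-provided) catalog and storage settings apply; the pipeline, dataset, and table names are made up:

```python
import dlt

# Assumption: the destination is addressable as "ducklake"; all other names
# here are placeholders. dlt only talks to the ducklake client (a duckdb
# instance with the ducklake extension), which manages catalog and storage.
pipeline = dlt.pipeline(
    pipeline_name="ducklake_demo",
    destination="ducklake",
    dataset_name="demo_dataset",  # becomes a DuckLake Schema
)

load_info = pipeline.run(
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
    table_name="people",
)
print(load_info)
```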