Data Writers
The problem: You’ve queried your data, now you need to export it. Do you write CSV? Excel? JSON? Load it into another database? Each format requires different code and libraries.
The solution: DBTK writers provide a unified interface for exporting to any format. All writers accept either a cursor or materialized results (lists of Records/namedtuples/dicts), making it trivial to export the same data to multiple formats.
Basic Usage
from dbtk import writers
# CSV export
writers.to_csv(cursor, 'northern_tribe_waterbenders.csv', delimiter='\t')
# Excel workbooks with multiple sheets
writers.to_excel(cursor, 'fire_nation_report.xlsx', sheet='Q1 Intelligence')
# JSON output
writers.to_json(cursor, 'air_temples/meditation_records.json')
# NDJSON (newline-delimited JSON) for streaming
writers.to_ndjson(cursor, 'battle_logs.ndjson')
# XML with custom elements
writers.to_xml(cursor, 'citizens.xml', record_element='earth_kingdom_citizen')
# Fixed-width format for legacy systems
from dbtk.utils import FixedColumn
columns = [FixedColumn('name', 1, 20), FixedColumn('region', 21, 35), FixedColumn('population', 36, 45)]
writers.to_fixed_width(cursor, columns, 'ba_sing_se_daily_announcements.txt')
# Direct database-to-database transfer
source_cursor.execute("SELECT * FROM water_tribe_defenses")
count = writers.cursor_to_cursor(source_cursor, target_cursor, 'intel_archive')
print(f"Transferred {count} strategic records")
Export Once, Write Everywhere
If your result set fits comfortably in memory you can fetch it once and export to multiple formats:
# Fetch once, write many times
data = cursor.fetchall()
writers.to_csv(data, 'output.csv')
writers.to_excel(data, 'output.xlsx')
writers.to_json(data, 'output.json')
For large result sets, skip the fetchall() entirely and pass the cursor directly to a writer — it streams row-by-row without materializing anything:
cursor.execute("SELECT * FROM large_table")
writers.to_csv(cursor, 'output.csv') # Cursor consumed once, no list in memory
Quick Preview to Stdout
Pass None as the filename to preview data to stdout — perfect for debugging or quick checks:
# Preview first 20 records to console before writing to file
cursor.execute("SELECT * FROM soldiers")
writers.to_csv(cursor, None) # Prints to stdout
# Then export the full dataset
cursor.execute("SELECT * FROM soldiers")
writers.to_csv(cursor, 'soldiers.csv')
Compressed Output
All file writers support transparent compression. By default compression='infer' detects the format from the file extension — no extra code required:
writers.to_csv(cursor, 'archive.csv.gz') # gzip
writers.to_csv(cursor, 'archive.csv.bz2') # bz2
writers.to_csv(cursor, 'archive.csv.xz') # lzma
writers.to_ndjson(cursor, 'events.ndjson.gz') # gzip
writers.to_json(cursor, 'data.json.gz') # gzip
Pass an explicit value to override extension inference, or None to disable it:
# Force gzip even though the extension doesn't say so
writers.to_csv(cursor, 'output.csv', compression='gzip')
# Write plain text despite the .gz extension
writers.to_csv(cursor, 'output.csv.gz', compression=None)
Supported values: 'infer' (default), 'gzip', 'bz2', 'lzma', None.
Compression also works with batch writers — the file is opened compressed once on entry and closed on exit:
from dbtk.writers import CSVWriter
with CSVWriter(file='large_archive.csv.gz') as writer:
while batch := cursor.fetchmany(10_000):
writer.write_batch(batch)
Writing in Batches
The to_* helper functions are single-shot: they create a writer, write all data, then close and discard the writer.
For incremental writes — pagination, chunked ETL, or appending from multiple sources — you need to instantiate a BatchWriter
subclass directly and call write_batch() in a loop. Calling a to_* function multiple times with the same file will overwrite
the previous file. The exception is to_excel which will overwrite a worksheet if it already exists but not the entire workbook.
Supported batch writers: CSVWriter, NDJSONWriter, ExcelWriter, LinkedExcelWriter, XMLStreamer, FixedWidthWriter, EDIWriter.
from dbtk.writers import CSVWriter
# Open writer once, write in pages, close at the end
with CSVWriter(file='large_export.csv') as writer:
while batch := cursor.fetchmany(10_000):
writer.write_batch(batch)
The first write_batch() call writes the header row; subsequent calls append data rows without repeating it.
Why not just use to_csv() for this?
to_csv(cursor, 'output.csv') works fine for a single cursor — it streams row-by-row internally. But if your data comes
from multiple queries, or multiple cursors, or you need to write large datasets to multiple targets, write_batch() is the way to go.
from dbtk.writers import ExcelWriter
with ExcelWriter(file='combined.xlsx') as writer:
for region in ['north', 'south', 'east', 'west']:
cursor.execute("SELECT * FROM sales WHERE region = :r", {'r': region})
writer.write_batch(cursor, sheet_name=region)
Multiple Sheets and Excel Formatting
ExcelWriter keeps the workbook open across write_batch() calls and saves on context exit, making it the right tool for multi-sheet reports. By default it auto-sizes columns by sampling the first 15 rows, bolds and freezes the header row, and applies date formatting automatically — no configuration needed. For column styles, auto-rotating headers, hyperlinked reports, and the full formatting dict reference, see Excel Reports.
from dbtk.writers import ExcelWriter
with ExcelWriter(file='monthly_report.xlsx') as wb:
cursor.execute("SELECT * FROM sales WHERE month = 'January'")
wb.write_batch(cursor, sheet_name='Sales')
cursor.execute("SELECT * FROM expenses WHERE month = 'January'")
wb.write_batch(cursor, sheet_name='Expenses')
# Workbook saved and closed automatically
For hyperlinked reports with internal navigation or external URLs, see LinkedExcelWriter.
Streaming XML with XMLStreamer
For large XML exports, XMLStreamer writes records incrementally without building the entire tree in memory:
from dbtk.writers import XMLStreamer
# Stream millions of records to XML
with XMLStreamer(file='large_export.xml', root_element='records',
record_element='item') as writer:
for batch in data_source.batches(10000):
writer.write_batch(batch)
This is memory-efficient for large datasets where to_xml() would consume too much memory building the DOM.
XMLStreamer vs to_xml():
Feature |
XMLStreamer |
to_xml() |
|---|---|---|
Memory usage |
Constant (streaming) |
O(n) — loads all in memory |
Best for |
Millions of records |
< 100K records |
Control |
Fine-grained batching |
Single operation |
Speed |
Slower (incremental writes) |
Faster (bulk write) |
When to use XMLStreamer:
Exporting > 100K records to XML
Memory-constrained environments
Long-running exports that need progress tracking
Need to process multiple cursors into one XML file
Fixed-Width Files with FixedWidthWriter
FixedWidthWriter writes fixed-width text files driven by a List[FixedColumn] schema — the same schema used by FixedReader on the read side. Each column definition specifies position, width, alignment, and padding, so the writer handles all formatting automatically.
from dbtk.utils import FixedColumn
from dbtk.writers import FixedWidthWriter, to_fixed_width
COLS = [
FixedColumn('record_type', 1, 2),
FixedColumn('account', 3, 14, align='right', pad_char='0'),
FixedColumn('amount', 15, 24, align='right', pad_char='0', column_type='int'),
FixedColumn('description', 25, 54),
]
# Single-shot
to_fixed_width(records, COLS, 'output.txt')
# Batch / streaming
with FixedWidthWriter(file='output.txt', columns=COLS) as w:
for batch in source.batches(10_000):
w.write_batch(batch)
Input records can be FixedWidthRecord instances (written directly via to_line()), dicts, lists, tuples, or namedtuples — all are cast positionally into the column schema.
By default truncate_overflow=True silently truncates values that exceed their column width. Pass truncate_overflow=False to raise ValueError instead.
Building a typed record class with fixed_record_factory
When you’re generating fixed-width output rather than transforming existing records, fixed_record_factory lets you define a named record type from a compact column spec — similar to collections.namedtuple. Pass a list of (name, width) tuples (positions are assigned automatically) or FixedColumn objects (used as-is), or mix both.
from dbtk import fixed_record_factory
AchDetail = fixed_record_factory([
('record_type', 1),
('priority_code', 2),
('routing_number', 9),
('account_number', 17),
('amount', 10),
], name='AchDetail')
record = AchDetail('6', '22', '123456789', '00012345678', 100)
print(record.to_line())
# '622123456789000123456780000000100'
For columns that need explicit alignment, padding, or type coercion, drop in a FixedColumn — positions auto-advance past it:
from dbtk import fixed_record_factory
from dbtk.utils import FixedColumn
AchHeader = fixed_record_factory([
FixedColumn('record_type', 1, 1),
('priority_code', 2),
FixedColumn('routing_number', 4, 12, column_type='int', align='right'),
('filler', 39),
])
The returned class is a full FixedWidthRecord subclass — you can pass its instances directly to FixedWidthWriter or call .to_line() yourself.
EDI (Electronic Data Interchange) Fixed-Width with EDIWriter
EDIWriter is the write-side counterpart to EDIReader. It handles Electronic Data Interchange files where different record types have different layouts — NACHA ACH, COBOL bank extracts, X12 835 remittances, and similar formats.
The schema is a Dict[str, List[FixedColumn]] mapping type codes to column definitions. The type code is always the first field of each record; EDIWriter reads it to select the right layout for each row.
Read-modify-write EDI Files:
from dbtk.readers.fixed_width import EDIReader
from dbtk.writers.fixed_width import EDIWriter
from dbtk.formats.edi import ACH_COLUMNS
with open('in.ach') as fp, EDIWriter(file='out.ach', columns=ACH_COLUMNS) as w:
batch = []
for record in EDIReader(fp, ACH_COLUMNS):
# records are FixedWidthRecord — modify fields, then write
if record[0] == '6': # Entry Detail
record.update(amount=record.amount + 100)
batch.append(record)
w.write_batch(batch)
Single-shot from a list:
from dbtk.writers import to_edi
records = list(EDIReader(open('in.ach'), ACH_COLUMNS))
to_edi(records, ACH_COLUMNS, 'out.ach')
Pre-built schemas for common formats are in dbtk.formats.edi:
from dbtk.formats.edi import ACH_COLUMNS, COBOL_BANK_EXTRACT_COLUMNS, X12_835_COLUMNS, FORMATS
By default truncate_overflow=False — EDI files are length-strict and silent truncation could result in data loss. Pass truncate_overflow=True only if you know what you’re doing.
Performance Comparison
For large datasets, here’s when to use each writer:
Records |
CSV |
Excel |
JSON |
NDJSON |
XML |
XMLStreamer |
|---|---|---|---|---|---|---|
< 10K |
✅ Fast |
✅ Fast |
✅ Fast |
✅ Fast |
✅ Fast |
❌ Overkill |
10K – 100K |
✅ Fast |
✅ OK |
✅ OK |
✅ OK |
⚠️ Slow |
⚠️ Better |
100K – 1M |
✅ Fast |
⚠️ Slow |
⚠️ Slow |
✅ OK |
❌ Memory |
✅ Use this |
> 1M |
✅ Fast |
❌ Not Allowed |
❌ Memory |
✅ OK |
❌ Memory |
✅ Use this |
Recommendations:
CSV: Best for all sizes, especially large datasets
Excel: Great for < 100K records, business reports
JSON: Good for < 100K records, API integration
NDJSON: Good for all sizes, streaming/log formats, API integration
XML: Use XMLStreamer for > 100K records
ExcelWriter / LinkedExcelWriter: Multi-sheet reports (any size per sheet < 1M)
Fixed-width / EDI: Legacy system integration, NACHA ACH, mainframe extracts
Writing Additional Formats
For formats DBTK doesn’t cover natively — parquet, avro, Arrow IPC, and anything else polars or pandas supports — no adapter code is needed. DBTK Records are dict-compatible, so both libraries consume them directly:
import polars as pl
import pandas as pd
# polars — write parquet, avro, Arrow IPC, and more
df = pl.from_dicts(cursor)
df.write_parquet('output.parquet')
df.write_avro('output.avro')
# pandas — write parquet, feather, HDF5, and more
with dbtk.readers.get_reader('data.csv') as reader:
df = pd.DataFrame(reader)
df.to_parquet('output.parquet')