# Data Writers

**The problem:** You've queried your data, now you need to export it. Do you write CSV? Excel? JSON? Load it into another database? Each format requires different code and libraries.

**The solution:** DBTK writers provide a unified interface for exporting to any format. All writers accept either a cursor or materialized results (lists of Records/namedtuples/dicts), making it trivial to export the same data to multiple formats.

## Basic Usage

```python
from dbtk import writers

# CSV export
writers.to_csv(cursor, 'northern_tribe_waterbenders.csv', delimiter='\t')

# Excel workbooks with multiple sheets
writers.to_excel(cursor, 'fire_nation_report.xlsx', sheet='Q1 Intelligence')

# JSON output
writers.to_json(cursor, 'air_temples/meditation_records.json')

# NDJSON (newline-delimited JSON) for streaming
writers.to_ndjson(cursor, 'battle_logs.ndjson')

# XML with custom elements
writers.to_xml(cursor, 'citizens.xml', record_element='earth_kingdom_citizen')

# Fixed-width format for legacy systems
from dbtk.utils import FixedColumn
columns = [FixedColumn('name', 1, 20), FixedColumn('region', 21, 35), FixedColumn('population', 36, 45)]
writers.to_fixed_width(cursor, columns, 'ba_sing_se_daily_announcements.txt')

# Direct database-to-database transfer
source_cursor.execute("SELECT * FROM water_tribe_defenses")
count = writers.cursor_to_cursor(source_cursor, target_cursor, 'intel_archive')
print(f"Transferred {count} strategic records")
```

## Export Once, Write Everywhere

If your result set fits comfortably in memory you can fetch it once and export to multiple formats:

```python
# Fetch once, write many times
data = cursor.fetchall()
writers.to_csv(data, 'output.csv')
writers.to_excel(data, 'output.xlsx')
writers.to_json(data, 'output.json')
```

For large result sets, skip the `fetchall()` entirely and pass the cursor directly to a writer — it streams row-by-row without materializing anything:

```python
cursor.execute("SELECT * FROM large_table")
writers.to_csv(cursor, 'output.csv')  # Cursor consumed once, no list in memory
```
## Quick Preview to Stdout

Pass `None` as the filename to preview data to stdout — perfect for debugging or quick checks:

```python
# Preview first 20 records to console before writing to file
cursor.execute("SELECT * FROM soldiers")
writers.to_csv(cursor, None)  # Prints to stdout

# Then export the full dataset
cursor.execute("SELECT * FROM soldiers")
writers.to_csv(cursor, 'soldiers.csv')
```


## Compressed Output

All file writers support transparent compression. By default `compression='infer'` detects the format from the file extension — no extra code required:

```python
writers.to_csv(cursor, 'archive.csv.gz')       # gzip
writers.to_csv(cursor, 'archive.csv.bz2')      # bz2
writers.to_csv(cursor, 'archive.csv.xz')       # lzma
writers.to_ndjson(cursor, 'events.ndjson.gz')  # gzip
writers.to_json(cursor, 'data.json.gz')        # gzip
```

Pass an explicit value to override extension inference, or `None` to disable it:

```python
# Force gzip even though the extension doesn't say so
writers.to_csv(cursor, 'output.csv', compression='gzip')

# Write plain text despite the .gz extension
writers.to_csv(cursor, 'output.csv.gz', compression=None)
```

Supported values: `'infer'` (default), `'gzip'`, `'bz2'`, `'lzma'`, `None`.

Compression also works with batch writers — the file is opened compressed once on entry and closed on exit:

```python
from dbtk.writers import CSVWriter

with CSVWriter(file='large_archive.csv.gz') as writer:
    while batch := cursor.fetchmany(10_000):
        writer.write_batch(batch)
```


## Writing in Batches

The `to_*` helper functions are single-shot: they create a writer, write all data, then close and discard the writer. 
For incremental writes — pagination, chunked ETL, or appending from multiple sources — you need to instantiate a `BatchWriter` 
subclass directly and call `write_batch()` in a loop. Calling a `to_*` function multiple times with the same file will overwrite
the previous file. The exception is `to_excel` which will overwrite a worksheet if it already exists but not the entire workbook.

Supported batch writers: `CSVWriter`, `NDJSONWriter`, `ExcelWriter`, `LinkedExcelWriter`, `XMLStreamer`, `FixedWidthWriter`, `EDIWriter`.

```python
from dbtk.writers import CSVWriter

# Open writer once, write in pages, close at the end
with CSVWriter(file='large_export.csv') as writer:
    while batch := cursor.fetchmany(10_000):
        writer.write_batch(batch)
```

The first `write_batch()` call writes the header row; subsequent calls append data rows without repeating it.

**Why not just use `to_csv()` for this?**
`to_csv(cursor, 'output.csv')` works fine for a single cursor — it streams row-by-row internally. But if your data comes 
from multiple queries, or multiple cursors, or you need to write large datasets to multiple targets, `write_batch()` is the way to go.

```python
from dbtk.writers import ExcelWriter

with ExcelWriter(file='combined.xlsx') as writer:
    for region in ['north', 'south', 'east', 'west']:
        cursor.execute("SELECT * FROM sales WHERE region = :r", {'r': region})
        writer.write_batch(cursor, sheet_name=region)
```
## Multiple Sheets and Excel Formatting

`ExcelWriter` keeps the workbook open across `write_batch()` calls and saves on context exit, making it the right tool for multi-sheet reports. By default it auto-sizes columns by sampling the first 15 rows, bolds and freezes the header row, and applies date formatting automatically — no configuration needed. For column styles, auto-rotating headers, hyperlinked reports, and the full `formatting` dict reference, see **[Excel Reports](06b-excel.md)**.

```python
from dbtk.writers import ExcelWriter

with ExcelWriter(file='monthly_report.xlsx') as wb:
    cursor.execute("SELECT * FROM sales WHERE month = 'January'")
    wb.write_batch(cursor, sheet_name='Sales')

    cursor.execute("SELECT * FROM expenses WHERE month = 'January'")
    wb.write_batch(cursor, sheet_name='Expenses')
# Workbook saved and closed automatically
```

For hyperlinked reports with internal navigation or external URLs, see [`LinkedExcelWriter`](06b-excel.md#hyperlinked-reports-with-linkedexcelwriter).

## Streaming XML with XMLStreamer

For large XML exports, `XMLStreamer` writes records incrementally without building the entire tree in memory:

```python
from dbtk.writers import XMLStreamer

# Stream millions of records to XML
with XMLStreamer(file='large_export.xml', root_element='records',
                 record_element='item') as writer:
    for batch in data_source.batches(10000):
        writer.write_batch(batch)
```

This is memory-efficient for large datasets where `to_xml()` would consume too much memory building the DOM.

**XMLStreamer vs to_xml():**

| Feature      | XMLStreamer                 | to_xml()                   |
|--------------|-----------------------------|----------------------------|
| Memory usage | Constant (streaming)        | O(n) — loads all in memory |
| Best for     | Millions of records         | < 100K records             |
| Control      | Fine-grained batching       | Single operation           |
| Speed        | Slower (incremental writes) | Faster (bulk write)        |

**When to use XMLStreamer:**
- Exporting > 100K records to XML
- Memory-constrained environments
- Long-running exports that need progress tracking
- Need to process multiple cursors into one XML file

## Fixed-Width Files with FixedWidthWriter

`FixedWidthWriter` writes fixed-width text files driven by a `List[FixedColumn]` schema — the same schema used by `FixedReader` on the read side. Each column definition specifies position, width, alignment, and padding, so the writer handles all formatting automatically.

```python
from dbtk.utils import FixedColumn
from dbtk.writers import FixedWidthWriter, to_fixed_width

COLS = [
    FixedColumn('record_type',  1,  2),
    FixedColumn('account',      3, 14, align='right', pad_char='0'),
    FixedColumn('amount',      15, 24, align='right', pad_char='0', column_type='int'),
    FixedColumn('description', 25, 54),
]

# Single-shot
to_fixed_width(records, COLS, 'output.txt')

# Batch / streaming
with FixedWidthWriter(file='output.txt', columns=COLS) as w:
    for batch in source.batches(10_000):
        w.write_batch(batch)
```

Input records can be `FixedWidthRecord` instances (written directly via `to_line()`), dicts, lists, tuples, or namedtuples — all are cast positionally into the column schema.

By default `truncate_overflow=True` silently truncates values that exceed their column width. Pass `truncate_overflow=False` to raise `ValueError` instead.

### Building a typed record class with `fixed_record_factory`

When you're *generating* fixed-width output rather than transforming existing records, `fixed_record_factory` lets you define a named record type from a compact column spec — similar to `collections.namedtuple`. Pass a list of `(name, width)` tuples (positions are assigned automatically) or `FixedColumn` objects (used as-is), or mix both.

```python
from dbtk import fixed_record_factory

AchDetail = fixed_record_factory([
    ('record_type',    1),
    ('priority_code',  2),
    ('routing_number', 9),
    ('account_number', 17),
    ('amount',         10),
], name='AchDetail')

record = AchDetail('6', '22', '123456789', '00012345678', 100)
print(record.to_line())
# '622123456789000123456780000000100'
```

For columns that need explicit alignment, padding, or type coercion, drop in a `FixedColumn` — positions auto-advance past it:

```python
from dbtk import fixed_record_factory
from dbtk.utils import FixedColumn

AchHeader = fixed_record_factory([
    FixedColumn('record_type', 1, 1),
    ('priority_code', 2),
    FixedColumn('routing_number', 4, 12, column_type='int', align='right'),
    ('filler', 39),
])
```

The returned class is a full `FixedWidthRecord` subclass — you can pass its instances directly to `FixedWidthWriter` or call `.to_line()` yourself.

## EDI (Electronic Data Interchange) Fixed-Width with EDIWriter

`EDIWriter` is the write-side counterpart to `EDIReader`. It handles Electronic Data Interchange files where different record types have different layouts — NACHA ACH, COBOL bank extracts, X12 835 remittances, and similar formats.

The schema is a `Dict[str, List[FixedColumn]]` mapping type codes to column definitions. The type code is always the first field of each record; `EDIWriter` reads it to select the right layout for each row.

**Read-modify-write EDI Files:**

```python
from dbtk.readers.fixed_width import EDIReader
from dbtk.writers.fixed_width import EDIWriter
from dbtk.formats.edi import ACH_COLUMNS

with open('in.ach') as fp, EDIWriter(file='out.ach', columns=ACH_COLUMNS) as w:
    batch = []
    for record in EDIReader(fp, ACH_COLUMNS):
        # records are FixedWidthRecord — modify fields, then write
        if record[0] == '6':   # Entry Detail
            record.update(amount=record.amount + 100)
            batch.append(record)
    w.write_batch(batch)
```

**Single-shot from a list:**

```python
from dbtk.writers import to_edi

records = list(EDIReader(open('in.ach'), ACH_COLUMNS))
to_edi(records, ACH_COLUMNS, 'out.ach')
```

**Pre-built schemas** for common formats are in `dbtk.formats.edi`:

```python
from dbtk.formats.edi import ACH_COLUMNS, COBOL_BANK_EXTRACT_COLUMNS, X12_835_COLUMNS, FORMATS
```

By default `truncate_overflow=False` — EDI files are length-strict and silent truncation could result in data loss. Pass `truncate_overflow=True` only if you know what you're doing.


## Performance Comparison

For large datasets, here's when to use each writer:

| Records    | CSV    | Excel         | JSON     | NDJSON    | XML      | XMLStreamer |
|------------|--------|---------------|----------|-----------|----------|-------------|
| < 10K      | ✅ Fast | ✅ Fast        | ✅ Fast   | ✅ Fast    | ✅ Fast   | ❌ Overkill  |
| 10K – 100K | ✅ Fast | ✅ OK          | ✅ OK     | ✅ OK      | ⚠️ Slow  | ⚠️ Better   |
| 100K – 1M  | ✅ Fast | ⚠️ Slow       | ⚠️ Slow  | ✅ OK      | ❌ Memory | ✅ Use this  |
| > 1M       | ✅ Fast | ❌ Not Allowed | ❌ Memory | ✅ OK      | ❌ Memory | ✅ Use this  |

**Recommendations:**
- **CSV**: Best for all sizes, especially large datasets
- **Excel**: Great for < 100K records, business reports
- **JSON**: Good for < 100K records, API integration
- **NDJSON**: Good for all sizes, streaming/log formats, API integration
- **XML**: Use XMLStreamer for > 100K records
- **ExcelWriter / LinkedExcelWriter**: Multi-sheet reports (any size per sheet < 1M)
- **Fixed-width / EDI**: Legacy system integration, NACHA ACH, mainframe extracts

## Writing Additional Formats

For formats DBTK doesn't cover natively — parquet, avro, Arrow IPC, and anything else polars or pandas supports — no adapter code is needed. DBTK Records are dict-compatible, so both libraries consume them directly:

```python
import polars as pl
import pandas as pd

# polars — write parquet, avro, Arrow IPC, and more
df = pl.from_dicts(cursor)
df.write_parquet('output.parquet')
df.write_avro('output.avro')

# pandas — write parquet, feather, HDF5, and more
with dbtk.readers.get_reader('data.csv') as reader:
    df = pd.DataFrame(reader)
    df.to_parquet('output.parquet')
```