# Database Connections

DBTK provides a unified interface for connecting to multiple database types 
with consistent APIs and smart cursor handling.

## Quick Start

```python
import dbtk

# From configuration file
db = dbtk.connect('production_db')

# Direct connection
from dbtk.database import postgres, oracle, mysql, sqlserver, sqlite
# non-standard connection parameters are automatically mapped, standard ports are defaulted
db = postgres(user='admin', password='secret', database='mydb', host='localhost')
db = oracle(user='admin', password='secret', database='ORCL', host='localhost')
db = mysql(user='admin', password='secret', database='mydb')
db = sqlserver(user='admin', password='secret', database='mydb', host='localhost')
db = sqlite('path/to/database.db')
```

## Supported Databases

DBTK supports multiple database drivers with automatic detection and fallback:

| Database   | Driver          | Install Command                      | Notes                                                     |
|------------|-----------------|--------------------------------------|-----------------------------------------------------------|
| PostgreSQL | psycopg2        | `pip install psycopg2-binary`        | Recommended, most mature                                  |
| PostgreSQL | psycopg (3)     | `pip install psycopg-binary`         | Newest version, async support                             |
| PostgreSQL | pgdb            | `pip install pgdb`                   | DB-API compliant                                          |
| Oracle     | oracledb        | `pip install oracledb`               | Thin mode - no Oracle client required                     |
| Oracle     | cx_Oracle       | `pip install cx_Oracle`              | Requires Oracle client installation                       |
| MySQL      | mysqlclient     | `pip install mysqlclient`            | Fastest option, C extension, module name MySQLdb          |
| MySQL      | mariadb         | `pip install mariadb`                | Official MariaDB connector, C extension, MySQL compatible |
| MySQL      | mysql.connector | `pip install mysql-connector-python` | Official MySQL connector                                  |
| MySQL      | pymysql         | `pip install pymysql`                | Pure Python, lightweight                                  |
| SQL Server | pyodbc          | `pip install pyodbc`                 | ODBC driver required on system                            |
| SQL Server | pymssql         | `pip install pymssql`                | Lightweight, no ODBC needed                               |
| SQLite     | sqlite3         | Built-in                             | No installation needed                                    |

**Driver priority:** DBTK automatically selects the best available driver based on priority. Override with `driver='driver_name'` in your connection config or function call.

## The Database Object

The `Database` class wraps database connections and provides a consistent interface:

```python
db = dbtk.connect('my_database')

# Connection info
print(db.database_type)  # 'postgres', 'oracle', 'mysql', 'sqlserver', 'sqlite'
print(db.database_name)  # Database/schema name
print(db.driver)         # The underlying driver module (psycopg2, oracledb, etc.)
print(db.placeholder)    # Parameter placeholder for this driver

# Create cursors
cursor = db.cursor()

# Transaction management
db.commit()
db.rollback()
db.close()

# Parameter style help
db.param_help()  # Shows this driver's parameter style with examples
# psycopg2's parameter style is "pyformat"
# "SELECT * FROM people WHERE name = %s AND age > %s", ("Smith", 30)
# "SELECT * FROM people WHERE name = %(name)s AND age > %(age)s", {"name": "Smith", "age": 30}
```

### Context Managers

```python
# Connection automatically closed
with dbtk.connect('production_db') as db:
    cursor = db.cursor()
    cursor.execute("SELECT * FROM users")
    users = cursor.fetchall()

# Transaction - auto-commit on success, rollback on exception
with db.transaction():
    cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
```

## Access the Full Stack

DBTK maintains a clean reference hierarchy for accessing the underlying driver:

```python
cursor = dbtk.connect('imdb').cursor()

# The Cursor maintains a reference to the Database connection
print(cursor.connection.connection_name) # imdb

# The Database maintains a reference to the driver
print(cursor.connection.driver.__name__)       # 'psycopg2', 'oracledb', etc.

# Access the wrapped connection or cursor
cursor.connection._connection
cursor._cursor

# Use driver exceptions
try:
    cursor.execute(sql)
except cursor.connection.driver.DatabaseError as e:
    cursor.connection.rollback()
    logger.error(f"Database error: {e}")
```

## Cursors and Records

All DBTK cursors return **Record** objects - a hybrid data structure that is memory efficient like at tuple but the 
functionality of dict or an object.

```python
cursor = db.cursor()
cursor.execute("SELECT id, name, email FROM users WHERE status = :status",
               {'status': 'active'})

for user in cursor:
    user['name']            # Dict-style access
    user.email              # Attribute access
    user[0]                 # Index access
    user[:2]                # Slicing
    id, name, email = user  # Tuple unpacking
```

Records store both the original column names, and also normalize column names for attribute access, so `row.employee_id` (and `row['employee_id']`) works whether the source column is `Employee_ID`, `EMPLOYEE ID`, or `employee_id`. This makes your Table field mappings resilient to source naming inconsistencies.

See [Record Objects](04-record.md) for full documentation on access patterns, normalization, mutation, and performance characteristics.

### Cursor Configuration

```python
cursor = db.cursor(
    batch_size=5000,           # Rows per batch in bulk operations
    debug=True,                # Print SQL queries and bind variables
    return_cursor=True,        # execute() returns cursor for chaining
)

# With return_cursor=True, you can chain calls
user = cursor.execute("SELECT * FROM users WHERE status = 'active'").fetchone()
```

Default cursor settings can be configured per-connection in the YAML config file or passed to `dbtk.connect()`.
See [Configuration](02-configuration.md#database-connections) for detailed connection configuration documentation.

```yaml
connections:
  my_database:
    type: postgres
    host: localhost
    database: mydb
    user: myuser
    cursor:
      batch_size: 4000
      debug: false
      return_cursor: true
```

## Parameter Styles

### Native Parameter Styles 

| Database            | Dictionary Style | Positional Style | Placeholder     |
|---------------------|------------------|------------------|-----------------|
| PostgreSQL          | pyformat         | format           | `%(name)s` `%s` |
| Oracle              | named            | numeric          | `:name` `:1`    | 
| MySQL               |                  | format           | `%s`            |
| SQL Server (pyodbc) |                  | qmark            | `?`             |
| SQLite              |                  | qmark            | `?`             |  


## Cursor Methods

### Executing Queries

```python
# MySQL — %s (format) paramstyle
mysql_cur.execute("SELECT * FROM users WHERE id = %s", (42,))
# Oracle — :named paramstyle
ora_cur.execute("SELECT * FROM users WHERE id = :id", {'id': 42})
# MS SQL Server — ? (qmark) paramstyle, convert_params rewrites :named → ?
mssql_cur.execute("SELECT * FROM users WHERE id = :id", {'id': 42}, convert_params=True)

# Multiple rows (batch)
cursor.executemany(
    "INSERT INTO users (name, email) VALUES (:name, :email)",
    [
        {'name': 'Alice', 'email': 'alice@example.com'},
        {'name': 'Bob', 'email': 'bob@example.com'}
    ]
)

# Execute from SQL file with portable parameter conversion
cursor.execute_file('queries/create_schema.sql', {'status': 'active'})
```

### Prepared Statements
For queries executed repeatedly with different parameters, PreparedStatement transforms the SQL to match the database's `paramstyle` and caches the parameter mapping.

```python
import dbtk

# From a query string — mirrors cursor.prepare_file() for inline SQL
orders_stmt = cursor.prepare_query("SELECT * FROM orders WHERE customer_id = :id")

# From a query file
movies_stmt = cursor.prepare_file('queries/list_movies.sql')

# Execute many times efficiently
for user in users:
    orders_stmt.execute({'id': user.id})
    order = orders_stmt.fetchone()
    process(user, order)
```

`PreparedStatement` is also available directly from `dbtk` or `dbtk.cursors` if you need to construct one explicitly:

```python
import dbtk
stmt = dbtk.PreparedStatement(cursor, query="SELECT * FROM orders WHERE customer_id = :id")
stmt = dbtk.PreparedStatement(cursor, filename='queries/get_user.sql')
```
### Parameter Conversion

DBTK has tools to handle different parameter styles. You can use _named_ (`:name`) or _pyformat_ (`%(name)s`) in queries - DBTK can convert to the driver's native style.

By default, `cursor.execute()` passes the query and parameters directly to the underlying driver — no rewriting is done. Use `convert_params=True` for a one-off conversion. `cursor.execute_file()` and `PreparedStatement` convert automatically.

Oracle and PostgreSQL support both dictionary and positional parameters. Their default (db.driver.paramstyle) will be 
the dictionary style. If you want force positional mode, you can override the paramstyle as in the example below.   

```python
from dbtk.utils import process_sql_parameters, ParamStyle
sql = 'SELECT * FROM users WHERE name = :name AND age > :age'
# query automatically rewritten and parameters formatted to match paramstyle
statement = dbtk.cursors.PreparedStatement(cursor, sql)

# Use a positional style if you are doing a large executemany,
positional_style = ParamStyle.get_positional_style(cur.paramstyle)
# `process_sql_parameters` to manually convert a query and get parameter order
query, param_names = process_sql_parameters(sql, positional_style)
print(param_names) # ['name', 'age']
# extra params ignored, missing defaulted
params = {'name': 'Aang', 
          'rank': 'Avatar'} 
# `prepare_params` converts params to whatever is needed by the query
bind_vars = cursor.prepare_params(param_names, params, 
                                  paramstyle=positional_style)
# ['Aang', None] or {'name': 'Aang', 'age': None} depending on paramstyle
```
### Fetching Results

```python
# Fetch one record
user = cursor.fetchone()

# Fetch many records
users = cursor.fetchmany(100)

# Fetch all records
all_users = cursor.fetchall()

# Iterate (memory efficient for large result sets)
for user in cursor:
    process(user)

# Fetch exactly one (raises error if 0 or >1 rows)
user = cursor.selectinto("SELECT * FROM users WHERE id = :id", {'id': 42})
```

### Cursor Properties

```python
# Column names (after executing a query)
columns = cursor.columns()                   # Original names
normalized = cursor.columns(normalized=True) # Sanitized for Python identifiers

# Row count (if available)
count = cursor.rowcount

# Description (DB-API standard)
desc = cursor.description
```

## Transaction Management

### Manual Transactions

```python
try:
    cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    cursor.connection.commit()
except Exception:
    cursor.connection.rollback()
    raise
```

### Context Manager Transactions

```python
with db.transaction():
    cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    # Commits automatically on success, rolls back on exception
```

## Error Handling

```python
try:
    cursor.execute("SELECT * FROM nonexistent_table")
except db.driver.DatabaseError as e:
    print(f"Database error: {e}")
except db.driver.IntegrityError as e:
    print(f"Integrity constraint violated: {e}")
```

**Benefits of SQL files:**
- Keep SQL separate from Python for better organisation and editor syntax highlighting
- Test queries independently before integration
- Reuse the same query across different scripts
- Write once, run on any database

## Best Practices

1. **Use context managers** - Ensures connections are properly closed
2. **Use named/pyformat parameters** - More readable and portable across databases
3. **Iterate large result sets** - Don't `fetchall()` on millions of rows
4. **Use `transaction()` context manager** - Safe commit/rollback handling
5. **Use configuration files** - Keep credentials out of code
6. **Use `cursor.execute_file()`** - Portable SQL with automatic parameter conversion
7. **Use `PreparedStatement`** - Portable SQL to be executed repeatedly

## See Also

- [Record Objects](04-record.md) - Full documentation on DBTK's universal data structure
- [Configuration & Security](02-configuration.md) - YAML config files and password encryption
- [ETL: Table & Transforms](07-table.md) - Using cursors with Table and DataSurge
- [ETL: Tools & Logging](09-etl-tools.md) - IdentityManager and ValidationCollector
- [Readers](05-readers.md) - Reading data from files and databases
- [Writers](06-writers.md) - Writing data to files and databases