# Database Connections DBTK provides a unified interface for connecting to multiple database types with consistent APIs and smart cursor handling. ## Quick Start ```python import dbtk # From configuration file db = dbtk.connect('production_db') # Direct connection from dbtk.database import postgres, oracle, mysql, sqlserver, sqlite # non-standard connection parameters are automatically mapped, standard ports are defaulted db = postgres(user='admin', password='secret', database='mydb', host='localhost') db = oracle(user='admin', password='secret', database='ORCL', host='localhost') db = mysql(user='admin', password='secret', database='mydb') db = sqlserver(user='admin', password='secret', database='mydb', host='localhost') db = sqlite('path/to/database.db') ``` ## Supported Databases DBTK supports multiple database drivers with automatic detection and fallback: | Database | Driver | Install Command | Notes | |------------|-----------------|--------------------------------------|-----------------------------------------------------------| | PostgreSQL | psycopg2 | `pip install psycopg2-binary` | Recommended, most mature | | PostgreSQL | psycopg (3) | `pip install psycopg-binary` | Newest version, async support | | PostgreSQL | pgdb | `pip install pgdb` | DB-API compliant | | Oracle | oracledb | `pip install oracledb` | Thin mode - no Oracle client required | | Oracle | cx_Oracle | `pip install cx_Oracle` | Requires Oracle client installation | | MySQL | mysqlclient | `pip install mysqlclient` | Fastest option, C extension, module name MySQLdb | | MySQL | mariadb | `pip install mariadb` | Official MariaDB connector, C extension, MySQL compatible | | MySQL | mysql.connector | `pip install mysql-connector-python` | Official MySQL connector | | MySQL | pymysql | `pip install pymysql` | Pure Python, lightweight | | SQL Server | pyodbc | `pip install pyodbc` | ODBC driver required on system | | SQL Server | pymssql | `pip install pymssql` | Lightweight, no ODBC needed | | SQLite | sqlite3 | Built-in | No installation needed | **Driver priority:** DBTK automatically selects the best available driver based on priority. Override with `driver='driver_name'` in your connection config or function call. ## The Database Object The `Database` class wraps database connections and provides a consistent interface: ```python db = dbtk.connect('my_database') # Connection info print(db.database_type) # 'postgres', 'oracle', 'mysql', 'sqlserver', 'sqlite' print(db.database_name) # Database/schema name print(db.driver) # The underlying driver module (psycopg2, oracledb, etc.) print(db.placeholder) # Parameter placeholder for this driver # Create cursors cursor = db.cursor() # Transaction management db.commit() db.rollback() db.close() # Parameter style help db.param_help() # Shows this driver's parameter style with examples # psycopg2's parameter style is "pyformat" # "SELECT * FROM people WHERE name = %s AND age > %s", ("Smith", 30) # "SELECT * FROM people WHERE name = %(name)s AND age > %(age)s", {"name": "Smith", "age": 30} ``` ### Context Managers ```python # Connection automatically closed with dbtk.connect('production_db') as db: cursor = db.cursor() cursor.execute("SELECT * FROM users") users = cursor.fetchall() # Transaction - auto-commit on success, rollback on exception with db.transaction(): cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1") cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2") ``` ## Access the Full Stack DBTK maintains a clean reference hierarchy for accessing the underlying driver: ```python cursor = dbtk.connect('imdb').cursor() # The Cursor maintains a reference to the Database connection print(cursor.connection.connection_name) # imdb # The Database maintains a reference to the driver print(cursor.connection.driver.__name__) # 'psycopg2', 'oracledb', etc. # Access the wrapped connection or cursor cursor.connection._connection cursor._cursor # Use driver exceptions try: cursor.execute(sql) except cursor.connection.driver.DatabaseError as e: cursor.connection.rollback() logger.error(f"Database error: {e}") ``` ## Cursors and Records All DBTK cursors return **Record** objects - a hybrid data structure that is memory efficient like at tuple but the functionality of dict or an object. ```python cursor = db.cursor() cursor.execute("SELECT id, name, email FROM users WHERE status = :status", {'status': 'active'}) for user in cursor: user['name'] # Dict-style access user.email # Attribute access user[0] # Index access user[:2] # Slicing id, name, email = user # Tuple unpacking ``` Records store both the original column names, and also normalize column names for attribute access, so `row.employee_id` (and `row['employee_id']`) works whether the source column is `Employee_ID`, `EMPLOYEE ID`, or `employee_id`. This makes your Table field mappings resilient to source naming inconsistencies. See [Record Objects](04-record.md) for full documentation on access patterns, normalization, mutation, and performance characteristics. ### Cursor Configuration ```python cursor = db.cursor( batch_size=5000, # Rows per batch in bulk operations debug=True, # Print SQL queries and bind variables return_cursor=True, # execute() returns cursor for chaining ) # With return_cursor=True, you can chain calls user = cursor.execute("SELECT * FROM users WHERE status = 'active'").fetchone() ``` Default cursor settings can be configured per-connection in the YAML config file or passed to `dbtk.connect()`. See [Configuration](02-configuration.md#database-connections) for detailed connection configuration documentation. ```yaml connections: my_database: type: postgres host: localhost database: mydb user: myuser cursor: batch_size: 4000 debug: false return_cursor: true ``` ## Parameter Styles ### Native Parameter Styles | Database | Dictionary Style | Positional Style | Placeholder | |---------------------|------------------|------------------|-----------------| | PostgreSQL | pyformat | format | `%(name)s` `%s` | | Oracle | named | numeric | `:name` `:1` | | MySQL | | format | `%s` | | SQL Server (pyodbc) | | qmark | `?` | | SQLite | | qmark | `?` | ## Cursor Methods ### Executing Queries ```python # MySQL — %s (format) paramstyle mysql_cur.execute("SELECT * FROM users WHERE id = %s", (42,)) # Oracle — :named paramstyle ora_cur.execute("SELECT * FROM users WHERE id = :id", {'id': 42}) # MS SQL Server — ? (qmark) paramstyle, convert_params rewrites :named → ? mssql_cur.execute("SELECT * FROM users WHERE id = :id", {'id': 42}, convert_params=True) # Multiple rows (batch) cursor.executemany( "INSERT INTO users (name, email) VALUES (:name, :email)", [ {'name': 'Alice', 'email': 'alice@example.com'}, {'name': 'Bob', 'email': 'bob@example.com'} ] ) # Execute from SQL file with portable parameter conversion cursor.execute_file('queries/create_schema.sql', {'status': 'active'}) ``` ### Prepared Statements For queries executed repeatedly with different parameters, PreparedStatement transforms the SQL to match the database's `paramstyle` and caches the parameter mapping. ```python import dbtk # From a query string — mirrors cursor.prepare_file() for inline SQL orders_stmt = cursor.prepare_query("SELECT * FROM orders WHERE customer_id = :id") # From a query file movies_stmt = cursor.prepare_file('queries/list_movies.sql') # Execute many times efficiently for user in users: orders_stmt.execute({'id': user.id}) order = orders_stmt.fetchone() process(user, order) ``` `PreparedStatement` is also available directly from `dbtk` or `dbtk.cursors` if you need to construct one explicitly: ```python import dbtk stmt = dbtk.PreparedStatement(cursor, query="SELECT * FROM orders WHERE customer_id = :id") stmt = dbtk.PreparedStatement(cursor, filename='queries/get_user.sql') ``` ### Parameter Conversion DBTK has tools to handle different parameter styles. You can use _named_ (`:name`) or _pyformat_ (`%(name)s`) in queries - DBTK can convert to the driver's native style. By default, `cursor.execute()` passes the query and parameters directly to the underlying driver — no rewriting is done. Use `convert_params=True` for a one-off conversion. `cursor.execute_file()` and `PreparedStatement` convert automatically. Oracle and PostgreSQL support both dictionary and positional parameters. Their default (db.driver.paramstyle) will be the dictionary style. If you want force positional mode, you can override the paramstyle as in the example below. ```python from dbtk.utils import process_sql_parameters, ParamStyle sql = 'SELECT * FROM users WHERE name = :name AND age > :age' # query automatically rewritten and parameters formatted to match paramstyle statement = dbtk.cursors.PreparedStatement(cursor, sql) # Use a positional style if you are doing a large executemany, positional_style = ParamStyle.get_positional_style(cur.paramstyle) # `process_sql_parameters` to manually convert a query and get parameter order query, param_names = process_sql_parameters(sql, positional_style) print(param_names) # ['name', 'age'] # extra params ignored, missing defaulted params = {'name': 'Aang', 'rank': 'Avatar'} # `prepare_params` converts params to whatever is needed by the query bind_vars = cursor.prepare_params(param_names, params, paramstyle=positional_style) # ['Aang', None] or {'name': 'Aang', 'age': None} depending on paramstyle ``` ### Fetching Results ```python # Fetch one record user = cursor.fetchone() # Fetch many records users = cursor.fetchmany(100) # Fetch all records all_users = cursor.fetchall() # Iterate (memory efficient for large result sets) for user in cursor: process(user) # Fetch exactly one (raises error if 0 or >1 rows) user = cursor.selectinto("SELECT * FROM users WHERE id = :id", {'id': 42}) ``` ### Cursor Properties ```python # Column names (after executing a query) columns = cursor.columns() # Original names normalized = cursor.columns(normalized=True) # Sanitized for Python identifiers # Row count (if available) count = cursor.rowcount # Description (DB-API standard) desc = cursor.description ``` ## Transaction Management ### Manual Transactions ```python try: cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1") cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2") cursor.connection.commit() except Exception: cursor.connection.rollback() raise ``` ### Context Manager Transactions ```python with db.transaction(): cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1") cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2") # Commits automatically on success, rolls back on exception ``` ## Error Handling ```python try: cursor.execute("SELECT * FROM nonexistent_table") except db.driver.DatabaseError as e: print(f"Database error: {e}") except db.driver.IntegrityError as e: print(f"Integrity constraint violated: {e}") ``` **Benefits of SQL files:** - Keep SQL separate from Python for better organisation and editor syntax highlighting - Test queries independently before integration - Reuse the same query across different scripts - Write once, run on any database ## Best Practices 1. **Use context managers** - Ensures connections are properly closed 2. **Use named/pyformat parameters** - More readable and portable across databases 3. **Iterate large result sets** - Don't `fetchall()` on millions of rows 4. **Use `transaction()` context manager** - Safe commit/rollback handling 5. **Use configuration files** - Keep credentials out of code 6. **Use `cursor.execute_file()`** - Portable SQL with automatic parameter conversion 7. **Use `PreparedStatement`** - Portable SQL to be executed repeatedly ## See Also - [Record Objects](04-record.md) - Full documentation on DBTK's universal data structure - [Configuration & Security](02-configuration.md) - YAML config files and password encryption - [ETL: Table & Transforms](07-table.md) - Using cursors with Table and DataSurge - [ETL: Tools & Logging](09-etl-tools.md) - IdentityManager and ValidationCollector - [Readers](05-readers.md) - Reading data from files and databases - [Writers](06-writers.md) - Writing data to files and databases