# Configuration & Security
DBTK uses YAML configuration files to manage database connections and keep credentials secure with encryption. Running `dbtk checkup` will copy a well commented sample config file to _~/.config/dbtk_sample.yml_. DBTK also has several [command line tools](#command-line-tools) to assist with configuration and encryption. ## Quick Start Create a `dbtk.yml` file in your project folder or in ~/.config folder: ```yaml connections: dev_db: type: postgres host: localhost database: myapp_dev user: developer password: dev_password prod_db: type: postgres host: db.example.com database: myapp_prod user: app_user encrypted_password: gAAAAABh... # Use dbtk encrypt-password ``` Connect from code: ```python import dbtk db = dbtk.connect('prod_db') # or with context manager with dbtk.connect('prod_db') as db: cur = db.cursor() ``` ## Configuration File Locations DBTK searches for config files in this order: 1. Explicitly set path: `dbtk.set_config_file('path/to/config.yml')` 2. Current directory: `./dbtk.yml` or `./dbtk.yaml` 3. User config: `~/.config/dbtk.yml` or `~/.config/dbtk.yaml` If no config file is found, a sample is automatically created at `~/.config/dbtk_sample.yml`. ```python import dbtk # Use specific config file dbtk.set_config_file('/path/to/production.yml') db = dbtk.connect('database_name') # Use default search path db = dbtk.connect('database_name') ``` ## Configuration File Structure The config file has two main sections: `settings`, `connections`, and optionally `passwords` and `drivers`. ### Settings ```yaml settings: default_batch_size: 1000 default_country: US default_timezone: UTC # Logging settings can be overwritten in a script level dbtk.setup_logging(level='DEBUG') logging: directory: ./logs level: INFO format: '%(asctime)s [%(levelname)s] %(name)s: %(message)s' timestamp_format: '%Y-%m-%d %H:%M:%S' filename_format: '%Y%m%d_%H%M%S' split_errors: true # If True, separate error log will be created (only if critical or errors are encountered console: true retention_days: 30 # File I/O settings compressed_file_buffer_size: 1048576 # 1MB buffer for reading compressed files (.gz, .bz2, .xz) data_dump_dir: /tmp # Default directory for BulkSurge dump() operations # CSV writer settings null_string_csv: '' # How to represent NULL values in CSV output ``` ### Database Connections Each connection is defined under the `connections:` key with a `type` field indicating the database type. Some database drivers use non-standard parameter names (`db` instead `database`). DBTK will automatically map standard names to driver specific names if needed. ```yaml connections: my_database: type: postgres # Database type: postgres, oracle, mysql, sqlserver, sqlite host: localhost # Server hostname port: 5432 # Port (optional, uses driver default) database: mydb # Database name user: myuser # Username password: mypassword # Plaintext - encrypt it! See below encrypted_password: gAAAAABh... #Use `dbtk encrypt-password mypassword` ``` **Driver selection:** You can optionally specify a `driver` field to choose a specific database adapter. If omitted, DBTK automatically selects the best available driver by priority. If you specify `driver`, `type` is optional since DBTK can infer it from the driver name. ```yaml connections: # Use psycopg (v3) instead of default psycopg2 pg_v3: driver: psycopg # run `dbtk checkup` to show available drivers host: localhost database: mydb user: myuser encrypted_password: gAAAAABh... # ODBC connection with DSN odbc_db: driver: pyodbc_postgres dsn: MY_DSN password: '${MY_PASSWORD}' # Pull from environment variable ``` ### Driver-Specific Examples **PostgreSQL:** ```yaml connections: postgres_db: type: postgres host: localhost port: 5432 database: mydb user: myuser encrypted_password: gAAAAABh... ``` **Oracle:** ```yaml connections: oracle_db: type: oracle host: oracle.company.com port: 1521 database: prod.company.com # Service name user: app_user encrypted_password: gAAAAABh... ``` **MySQL:** ```yaml connections: mysql_db: type: mysql host: localhost port: 3306 database: mydb user: myuser password: secret ``` **SQL Server:** ```yaml connections: sqlserver_db: driver: pyodbc_sqlserver type: sqlserver host: localhost\SQLEXPRESS # Instance name supported database: mydb user: myuser password: secret cursor: fast_executemany: true # Recommended for bulk operations ``` **SQLite:** ```yaml connections: sqlite_db: type: sqlite database: /path/to/database.db ``` ### Cursor Settings Set default cursor behavior for all cursors created from a connection: ```yaml connections: my_database: type: postgres host: localhost database: mydb user: myuser encrypted_password: gAAAAABh... cursor: batch_size: 4000 # Rows to process at once in bulk operations debug: false # Print SQL queries and bind variables return_cursor: true # execute() returns cursor for method chaining fast_executemany: true # For pyodbc SQL Server bulk inserts ``` ## Password Encryption DBTK uses Fernet symmetric encryption (from the `cryptography` library) for password storage. Before you can begin encrypting and decrypting passwords, you must generate and store an encryption key. If system keyring is available, this is as easy as running `dbtk store-key`. See [Command-Line Tools](#command-line-tools) and [Programatic Encryption](#programmatic-encryption) sections for help. ### Encryption Key Management DBTK looks for encryption keys in this order: 1. Environment variable: `DBTK_ENCRYPTION_KEY` 2. System keyring: service `dbtk`, key `encryption_key` (requires `keyring` library) ### Command-Line Tools All encryption operations use the `dbtk` CLI with subcommands: ```bash # Generate a new encryption key $ dbtk generate-key # Generate a key to manually store in environmental variable # Store key in system keyring (requires keyring library) $ dbtk store-key [key] # Store provided key or generate new one $ dbtk store-key --force # Overwrite existing key # Encrypt a single password $ dbtk encrypt-password mypassword gAAAAABh... # Encrypt all passwords in a config file # Finds plaintext 'password:' entries and converts to 'encrypted_password:' $ dbtk encrypt-config [dbtk.yml] # Migrate config to a new encryption key $ dbtk migrate-config old_config.yml new_config.yml --new-key "new_key_here" # Check dependencies, drivers, and configuration $ dbtk checkup # Interactive configuration setup wizard $ dbtk config-setup ``` If the `dbtk` command is not found (e.g. pip-installed without admin privileges so the scripts directory is not in PATH), every command above can be run as `python -m dbtk` instead: ```bash python -m dbtk config-setup python -m dbtk checkup python -m dbtk encrypt-config ``` ### Using Encrypted Passwords In your config file, use `encrypted_password` instead of `password`: ```yaml connections: prod_db: type: postgres host: db.example.com database: production user: app_user encrypted_password: gAAAAABh... ``` ### Programmatic Encryption ```python from dbtk import config # Generate encryption key key = config.generate_encryption_key() # Encrypt/decrypt passwords cfg = config.ConfigManager() encrypted = cfg.encrypt_password('my_secret') decrypted = cfg.decrypt_password(encrypted) # Encrypt all passwords in a config file config.encrypt_config_file('dbtk.yml') # Store key in system keyring config.store_key(key, force=True) ``` ### Environment Variables in Connection Config Reference environment variables in any connection parameter using `${VAR_NAME}` syntax. You can also provide a default value with `${VAR_NAME:default}`: ```yaml connections: # Required env var (fails if not set) prod_db: type: postgres host: '${PROD_HOST}' password: '${PROD_PASSWORD}' # With defaults (uses default if env var not set) dev_db: type: postgres host: '${DB_HOST:localhost}' port: '${DB_PORT:5432}' database: '${DB_NAME:myapp_dev}' user: '${DB_USER:developer}' password: '${DB_PASSWORD:dev_password}' ``` This is especially useful for Docker/CI environments where you want a config that works both locally (using defaults) and in production (using env vars). ### Recommended Setup for Production ```bash # 1. Generate encryption key $ dbtk generate-key # Output: kL7xP9... (your Fernet key) # 2. Store key securely $ export DBTK_ENCRYPTION_KEY=kL7xP9... # For containers/CI # OR $ dbtk store-key kL7xP9... # For workstations with keyring # 3. Encrypt passwords in config $ dbtk encrypt-config dbtk.yml ``` **Key rotation:** Use `dbtk migrate-config` to migrate a config file to a new environment without having shared encryption keys. The passwords will be decrypted with your current key and encrypted with the new key. ```bash export DBTK_ENCRYPTION_KEY="current_key" NEW_KEY=$(dbtk generate-key) dbtk migrate-config dbtk.yml dbtk_new.yml --new-key "$NEW_KEY" export DBTK_ENCRYPTION_KEY="$NEW_KEY" mv dbtk_new.yml dbtk.yml ``` ## Standalone Passwords Store non-database credentials (API keys, etc.) in the `passwords` section: ```yaml passwords: openai_key: description: "OpenAI API key for data processing" encrypted_password: gAAAAABh... ``` ## Custom Driver Registration Register custom database drivers in the config file. By registering a driver for any DB-API 2.0 compliant library, DBTK can support additional databases. Plain reads, writes, and `DataSurge` inserts/updates/deletes work without any additional configuration. The following features are database-specific and will not work without a matching `DatabaseDialect` subclass: `Table.upsert`, `DataSurge.upsert`, `Table.merge`, `DataSurge.merge`, `BulkSurge`, and `column_defs_from_db`. ```yaml drivers: firebird: database_type: firebird module: firebird.driver # Only needed if name doesn't match module priority: 1 param_map: {'database': 'db', 'password': 'passwd'} # Map non-standard parameter names required_params: [{'host', 'database', 'user'}, {'dsn'}] optional_params: {'port', 'protocol'} connection_method: kwargs # connection_string (postgres), dsn (oracle), odbc_string (odbc), kwargs (all others) default_port: 3050 ``` Or register programmatically: ```python from dbtk.database import register_user_drivers register_user_drivers({ 'my_driver': { 'database_type': 'postgres', 'priority': 10, 'param_map': {'database': 'dbname'}, 'required_params': [{'host', 'database', 'user'}], 'optional_params': {'port', 'password'}, 'connection_method': 'kwargs', 'default_port': 5432 } }) ``` ## Security Best Practices 1. Use `encrypted_password` in production config files 2. Never commit encryption keys to version control 3. Use `DBTK_ENCRYPTION_KEY` environment variable in containerized environments 4. Use system keyring on workstations (`dbtk store-key`) 5. Set restrictive permissions on config files (`chmod 600`) 6. Use separate configs for dev/staging/production ## See Also - [Database Connections](03-database-connections.md) - Using the Database and Cursor classes - [ETL Framework](09-etl-tools.md) - Building production ETL pipelines - [Advanced Features](10-advanced.md) - Custom drivers and performance tuning