# Troubleshooting Guide

Common issues and solutions when using DBTK.

## Installation Issues

### `dbtk` Command Not Found (Windows)

**Problem:** After `pip install dbtk`, running `dbtk checkup` gives *'dbtk' is not recognized* or a similar error.

**Cause:** pip installs console scripts to a `Scripts` folder that is not on your `PATH`.

**Solution:** Find the Scripts folder and add it to your PATH.

```powershell
# Find the Scripts folder
python -c "import sysconfig; print(sysconfig.get_path('scripts'))"
```

Then add that path to your Windows PATH:

1. Open **Start** → search *"environment variables"* → **Edit the system environment variables**
2. Click **Environment Variables…**
3. Under *User variables*, select **Path** and click **Edit**
4. Click **New** and paste the Scripts folder path
5. Click **OK** on all dialogs, then open a new terminal

`dbtk checkup` should now work from any directory.

---

### Import Errors

**Problem:** `ImportError: No module named 'dbtk'`

**Solution:**
```bash
# Verify installation
pip show dbtk

# Reinstall if needed
pip install --upgrade dbtk
```

### Driver Import Errors

**Problem:** `ModuleNotFoundError: No module named 'psycopg2'`

**Solution:** Install the appropriate database driver:
```bash
pip install psycopg2-binary  # PostgreSQL
pip install oracledb         # Oracle
pip install mysqlclient      # MySQL
pip install pyodbc           # SQL Server
```

**Check available drivers:**
```bash
dbtk checkup
```

---

## Database Connection Issues

### Connection Refused

**Problem:** `psycopg2.OperationalError: could not connect to server: Connection refused`

**Causes:**
1. Database server not running
2. Wrong host/port
3. Firewall blocking connection

**Solution:**
```python
# Verify connection parameters
from dbtk.database import postgres
db = postgres(
    host='localhost',  # Try 127.0.0.1 if localhost fails
    port=5432,         # Check actual port
    database='mydb',
    user='myuser',
    password='mypassword'
)

# Test connection
try:
    cursor = db.cursor()
    cursor.execute("SELECT 1")
    print("Connected!")
except Exception as e:
    print(f"Connection failed: {e}")
```

### Authentication Failed

**Problem:** `psycopg2.OperationalError: FATAL:  password authentication failed`

**Solutions:**
```python
# 1. Verify credentials
# 2. Check user permissions
# 3. For PostgreSQL, check pg_hba.conf

# Test with different authentication method
db = postgres(
    host='localhost',
    database='mydb',
    user='myuser',
    password='correctpassword'  # Verify this is correct
)
```

### Connection Timeout

**Problem:** Connection hangs or times out

**Solution:**
```python
# Add timeout parameter (driver-specific)
db = postgres(
    host='remote-db.example.com',
    database='mydb',
    user='myuser',
    password='mypassword',
    connect_timeout=10  # Seconds
)
```

### Encrypted Password Not Working

**Problem:** `cryptography.fernet.InvalidToken`

**Causes:**
1. Wrong encryption key
2. Key not found
3. Password encrypted with different key

**Solution:**
```bash
# Verify encryption key is set
echo $DBTK_ENCRYPTION_KEY

# Or check system keyring
dbtk store-key --show

# Re-encrypt with correct key
dbtk encrypt-config dbtk.yml
```

---

## SQL and Query Issues

### Parameter Style Errors

**Problem:** `TypeError: not all arguments converted during string formatting`

**Cause:** Using wrong parameter style for your database

**Solution:**
```python
# ✓ CORRECT: Use named parameters - DBTK converts automatically
cursor.execute(
    "SELECT * FROM users WHERE status = :status AND age > :age",
    {'status': 'active', 'age': 18}
)

# ✗ WRONG: Don't mix parameter styles
cursor.execute(
    "SELECT * FROM users WHERE status = %s AND age > :age",  # Mixed!
    ['active', 18]
)

# Use PreparedStatement or execute_file() for automatic conversion
stmt = cursor.prepare_file('query.sql')
stmt.execute({'status': 'active', 'age': 18})
```

### SQL Syntax Errors

**Problem:** `psycopg2.errors.SyntaxError: syntax error at or near "..."`

**Debugging:**
```python
# Enable debug mode to see generated SQL
cursor = db.cursor(debug=True)
cursor.execute("SELECT * FROM users WHERE id = :id", {'id': 123})
# Prints: SELECT * FROM users WHERE id = %(id)s
# Params: {'id': 123}

# Or use get_sql() with Table
table.set_values(record)
sql = table.get_sql('insert')
print(f"Generated SQL: {sql}")
```

### Column Not Found

**Problem:** `psycopg2.errors.UndefinedColumn: column "user_name" does not exist`

**Solution:**
```python
# Check column names in database
cursor.execute("SELECT * FROM users LIMIT 1")
print(cursor.columns())  # Show actual column names

# Use correct mapping in Table
table = Table('users', {
    'username': {'field': 'user_name'},  # Map to correct source field
    # 'user_name': {}  # ✗ If DB column is 'username' not 'user_name'
}, cursor=cursor)
```

---

## File I/O Issues

### File Not Found

**Problem:** `FileNotFoundError: [Errno 2] No such file or directory: 'data.csv'`

**Solution:**
```python
import os
from pathlib import Path

# Use absolute paths
file_path = Path('/full/path/to/data.csv')
with dbtk.readers.get_reader(file_path) as reader:
    process(reader)

# Or check current directory
print(f"Current directory: {os.getcwd()}")
print(f"File exists: {Path('data.csv').exists()}")
```

### Character Encoding Errors

**Problem:** `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff`

**Solutions:**
```python
# 1. Use utf-8-sig to handle BOM (Byte Order Mark)
with dbtk.readers.CSVReader(open('data.csv', encoding='utf-8-sig')) as reader:
    process(reader)

# 2. Try different encodings
encodings = ['utf-8-sig', 'latin-1', 'cp1252', 'iso-8859-1']
for enc in encodings:
    try:
        with open('data.csv', encoding=enc) as f:
            reader = dbtk.readers.CSVReader(f)
            print(f"Success with encoding: {enc}")
            break
    except UnicodeDecodeError:
        continue

# 3. Detect encoding
# pip install chardet
import chardet
with open('data.csv', 'rb') as f:
    result = chardet.detect(f.read(10000))
    print(f"Detected encoding: {result['encoding']}")
```

### Corrupted Compressed Files

**Problem:** `gzip.BadGzipFile: Not a gzipped file`

**Solution:**
```python
# Verify file is actually compressed
import gzip
try:
    with gzip.open('data.csv.gz', 'rt') as f:
        f.read(100)  # Try reading first 100 bytes
    print("File is valid gzip")
except gzip.BadGzipFile:
    print("File is not gzipped or is corrupted")
    # Try reading as plain file
    with open('data.csv.gz', 'r') as f:
        reader = dbtk.readers.CSVReader(f)
```

### Excel File Errors

**Problem:** `ModuleNotFoundError: No module named 'openpyxl'`

**Solution:**
```bash
# Install Excel support
pip install openpyxl

# Or install full format support
pip install dbtk[formats]
```

---

## ETL and Table Issues

### Missing Required Fields

**Problem:** `ValueError: Required columns missing: {'email', 'name'}`

**Debugging:**
```python
# Check what's missing
table.set_values(record)
if not table.is_ready('insert'):
    missing = table.reqs_missing('insert')
    print(f"Missing fields: {missing}")
    print(f"Current values: {table.values}")

# Check source data
print(f"Record keys: {list(record.keys())}")
print(f"Record (normalized): {list(record.keys(normalized=True))}")
```

**Solutions:**
```python
# 1. Add defaults for missing fields
table = Table('users', {
    'email': {'field': 'email_address', 'nullable': False},
    'status': {'default': 'active'},  # Always has value
}, cursor=cursor)

# 2. Skip incomplete records
for record in reader:
    table.set_values(record)
    if table.is_ready('insert'):
        table.execute('insert')
    # Silently skip incomplete
```

### Transform Function Errors

**Problem:** `ValueError: invalid literal for int() with base 10: 'N/A'`

**Solution:**
```python
# Use transforms that handle None/empty gracefully
from dbtk.etl import transforms as tx

# ✓ CORRECT: get_int() returns None for invalid values
table = Table('users', {
    'age': {'field': 'age_string', 'fn': tx.get_int},  # Returns None if invalid
}, cursor=cursor)

# Or use string shorthand — returns None for invalid, use column default: 0 for zero-default
table = Table('users', {
    'age': {'field': 'age_string', 'fn': 'int', 'default': 0},
}, cursor=cursor)

# For custom error handling
def safe_int(value):
    try:
        return int(value) if value else None
    except (ValueError, TypeError):
        logger.warning(f"Invalid integer: {value}")
        return None

table = Table('users', {
    'age': {'field': 'age_string', 'fn': safe_int},
}, cursor=cursor)
```

### Database Expression Errors

**Problem:** `RuntimeError: BulkSurge does not support db_expr columns`

**Solution:**
```python
# BulkSurge can't use database functions (db_expr)
# Use DataSurge instead
surge = DataSurge(table)  # Supports db_expr
surge.insert(records)

# Or remove db_expr from column config for BulkSurge
table = Table('users', {
    'created_at': {'db_expr': 'CURRENT_TIMESTAMP'},  # ✗ Won't work with BulkSurge
}, cursor=cursor)

# For BulkSurge, compute in Python
table = Table('users', {
    'created_at': {'default': lambda: datetime.now()},  # ✓ Works with BulkSurge
}, cursor=cursor)
```

---

## Performance Issues

### Slow Bulk Loading

**Problem:** Insert/update operations taking too long

**Solutions:**
```python
# 1. Increase batch size
surge = DataSurge(table, batch_size=5000)  # Default is 1000

# 2. Use BulkSurge for maximum speed
surge = BulkSurge(table)  # 2-3x faster than DataSurge
surge.load(records)

# 3. Use transactions
surge = DataSurge(table, use_transaction=True)

# 4. Disable indexes during load, re-enable after
cursor.execute("ALTER TABLE users DISABLE TRIGGER ALL")  # PostgreSQL
surge.insert(records)
cursor.execute("ALTER TABLE users ENABLE TRIGGER ALL")
db.commit()
```

### Memory Issues with Large Files

**Problem:** `MemoryError` when processing large files

**Solutions:**
```python
# 1. Don't call fetchall() on huge result sets
# ✗ BAD: Loads everything into memory
results = cursor.fetchall()

# ✓ GOOD: Iterate (constant memory)
for row in cursor:
    process(row)

# 2. Use reader streaming (already efficient)
with dbtk.readers.get_reader('huge_file.csv.gz') as reader:
    for record in reader:  # Streams, doesn't load all
        process(record)

# 3. For DataFrames, use lazy/streaming
import polars as pl
df = pl.scan_csv('huge.csv').collect(streaming=True)

# 4. Increase compressed file buffer if needed
from dbtk.defaults import settings
settings['compressed_file_buffer_size'] = 2 * 1024 * 1024  # 2MB
```

### Slow Queries

**Problem:** Queries taking too long

**Debugging:**
```python
import time

# Time the query
start = time.time()
cursor.execute("SELECT * FROM huge_table WHERE status = :status",
               {'status': 'active'})
elapsed = time.time() - start
print(f"Query took {elapsed:.2f} seconds")

# Check query plan (PostgreSQL)
cursor.execute("EXPLAIN ANALYZE SELECT * FROM huge_table WHERE status = 'active'")
print(cursor.fetchall())
```

**Solutions:**
1. Add database indexes on frequently queried columns
2. Use `LIMIT` for development/testing
3. Filter data in database, not Python
4. Use database-side aggregation (GROUP BY, etc.)

---

## BulkSurge Specific Issues

### MySQL: "LOAD DATA LOCAL INFILE forbidden"

**Problem:** `mysql.connector.errors.DatabaseError: 1148: LOAD DATA LOCAL INFILE command is denied`

**Solution:**
```python
# Check server setting
cursor.execute("SELECT @@local_infile")
result = cursor.fetchone()
print(f"local_infile: {result[0]}")  # Must be 1

# If 0, contact DBA or use external method
surge.load(reader, method='external')  # Dumps CSV with instructions
```

### SQL Server: "connection_name missing"

**Problem:** `RuntimeError: BCP needs credentials. Please set up a named connection in the config file.`

**Solution:**
```python
# ✗ WRONG: Direct connection won't work for BCP
db = dbtk.sqlserver(host='server', user='sa', password='secret')

# ✓ CORRECT: Use named connection from config
db = dbtk.connect('mssql_prod')  # From dbtk.yml

# In dbtk.yml:
# connections:
#   mssql_prod:
#     type: sqlserver
#     host: server
#     database: mydb
#     user: sa
#     encrypted_password: gAAAAABh...
```

### Oracle: "direct_path_load not supported"

**Problem:** `AttributeError: module 'oracledb' has no attribute 'direct_path_load'`

**Solution:**
```bash
# Upgrade python-oracledb to 3.4+
pip install --upgrade oracledb

# Or use external method (SQL*Loader)
surge.load(reader, method='external')
```

---

## IdentityManager Issues

### Resolver Not Finding Records

**Problem:** All entities end up as `NOT_FOUND`

**Debugging:**
```python
# Test resolver independently
stmt = cursor.prepare_file('sql/resolve_user.sql')
stmt.execute({'user_id': '12345'})
result = stmt.fetchone()
print(f"Resolver result: {result}")

# Check IdentityManager configuration
im = IdentityManager('user_id', 'person_id', resolver=stmt)
entity = im.resolve({'user_id': '12345'})
print(f"Status: {entity['_status']}")
print(f"Entity: {entity}")
```

**Common Issues:**
1. SQL query returns no rows (check WHERE clause)
2. Wrong key column name in resolver query
3. Source data has wrong key values

### State File Errors

**Problem:** `JSONDecodeError: Expecting value`

**Solution:**
```python
# State file corrupted or incomplete
# Delete and regenerate
import os
os.remove('state/entities.json')

# Or validate before loading
import json
try:
    with open('state/entities.json') as f:
        data = json.load(f)
    im = IdentityManager.load_state('state/entities.json', resolver=stmt)
except json.JSONDecodeError:
    print("State file corrupted, starting fresh")
    im = IdentityManager('source_id', 'target_id', resolver=stmt)
```

---

## Logging Issues

### Log Files Not Created

**Problem:** No log files in logs directory

**Solution:**
```python
# Ensure logging is initialized
import dbtk
dbtk.setup_logging()  # Must call this first

# Check settings
from dbtk.defaults import settings
print(settings['logging'])

# Verify directory exists and is writable
from pathlib import Path
log_dir = Path('./logs')
log_dir.mkdir(parents=True, exist_ok=True)
print(f"Log directory exists: {log_dir.exists()}")
print(f"Log directory writable: {os.access(log_dir, os.W_OK)}")
```

### Errors Not Detected

**Problem:** `errors_logged()` returns None but errors occurred

**Solution:**
```python
# Ensure split_errors is enabled in config
# dbtk.yml:
# settings:
#   logging:
#     split_errors: true  # Required for errors_logged() to work

# Or check main log file if split_errors=false
dbtk.setup_logging()
# ... your code ...
error_log = dbtk.errors_logged()
if error_log:
    print(f"Errors in: {error_log}")
else:
    # Check if any ERROR/CRITICAL was logged
    import logging
    logger = logging.getLogger()
    # Review log file manually
```

---

## Getting Help

If you're still stuck:

1. **Enable debug mode:**
   ```python
   cursor = db.cursor(debug=True)  # Prints SQL and parameters
   ```

2. **Check the examples:**
   - Look in `/examples` folder for working code
   - Review [Getting Started](01-getting-started.md)

3. **Search documentation:**
   - [Database Connections](03-database-connections.md)
   - [ETL: Table & Transforms](07-table.md)
   - [ETL: DataSurge & BulkSurge](08-datasurge.md)
   - [API Reference](api.rst)

4. **Report issues:**
   - GitHub Issues: https://github.com/yourusername/dbtk/issues
   - Include: DBTK version, Python version, database type, error message

5. **Check dependencies:**
   ```bash
   dbtk checkup  # Shows installed drivers and configuration
   ```