- Updated `CODING_STANDARDS.md` with async SQLAlchemy patterns, modern Python type hints, and new error handling examples. - Introduced a new `COMMON_PITFALLS.md` file detailing frequent implementation mistakes and explicit rules to prevent them. - Covered database optimizations, validation best practices, FastAPI design guidelines, security considerations, and Python language issues. - Aimed to enhance code quality and reduce recurring mistakes during development.
699 lines
18 KiB
Markdown
699 lines
18 KiB
Markdown
# Common Pitfalls & How to Avoid Them
|
|
|
|
> **Purpose**: This document catalogs common mistakes encountered during implementation and provides explicit rules to prevent them. **Read this before writing any code.**
|
|
|
|
## Table of Contents
|
|
|
|
- [SQLAlchemy & Database](#sqlalchemy--database)
|
|
- [Pydantic & Validation](#pydantic--validation)
|
|
- [FastAPI & API Design](#fastapi--api-design)
|
|
- [Security & Authentication](#security--authentication)
|
|
- [Python Language Gotchas](#python-language-gotchas)
|
|
|
|
---
|
|
|
|
## SQLAlchemy & Database
|
|
|
|
### ❌ PITFALL #1: Using Mutable Defaults in Columns
|
|
|
|
**Issue**: Using `default={}` or `default=[]` creates shared state across all instances.
|
|
|
|
```python
|
|
# ❌ WRONG - All instances share the same dict!
|
|
class User(Base):
|
|
metadata = Column(JSON, default={}) # DANGER: Mutable default!
|
|
tags = Column(JSON, default=[]) # DANGER: Shared list!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Use callable factory
|
|
class User(Base):
|
|
metadata = Column(JSON, default=dict) # New dict per instance
|
|
tags = Column(JSON, default=list) # New list per instance
|
|
```
|
|
|
|
**Rule**: Always use `default=dict` or `default=list` (without parentheses), never `default={}` or `default=[]`.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #2: Forgetting to Index Foreign Keys
|
|
|
|
**Issue**: Foreign key columns without indexes cause slow JOIN operations.
|
|
|
|
```python
|
|
# ❌ WRONG - No index on foreign key
|
|
class UserSession(Base):
|
|
user_id = Column(UUID, ForeignKey('users.id'), nullable=False)
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Always index foreign keys
|
|
class UserSession(Base):
|
|
user_id = Column(UUID, ForeignKey('users.id'), nullable=False, index=True)
|
|
```
|
|
|
|
**Rule**: ALWAYS add `index=True` to foreign key columns. SQLAlchemy doesn't do this automatically.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #3: Missing Composite Indexes
|
|
|
|
**Issue**: Queries filtering by multiple columns cannot use single-column indexes efficiently.
|
|
|
|
```python
|
|
# ❌ MISSING - Slow query on (user_id, is_active)
|
|
class UserSession(Base):
|
|
user_id = Column(UUID, ForeignKey('users.id'), index=True)
|
|
is_active = Column(Boolean, default=True, index=True)
|
|
# Query: WHERE user_id=X AND is_active=TRUE uses only one index!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Composite index for common query pattern
|
|
class UserSession(Base):
|
|
user_id = Column(UUID, ForeignKey('users.id'), index=True)
|
|
is_active = Column(Boolean, default=True, index=True)
|
|
|
|
__table_args__ = (
|
|
Index('ix_user_sessions_user_active', 'user_id', 'is_active'),
|
|
)
|
|
```
|
|
|
|
**Rule**: Add composite indexes for commonly used multi-column filters. Review query patterns and create indexes accordingly.
|
|
|
|
**Performance Impact**: Can reduce query time from seconds to milliseconds for large tables.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #4: Not Using Soft Deletes
|
|
|
|
**Issue**: Hard deletes destroy data and audit trails permanently.
|
|
|
|
```python
|
|
# ❌ RISKY - Permanent data loss
|
|
def delete_user(user_id: UUID):
|
|
user = db.query(User).filter(User.id == user_id).first()
|
|
db.delete(user) # Data gone forever!
|
|
db.commit()
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Soft delete with audit trail
|
|
class User(Base):
|
|
deleted_at = Column(DateTime(timezone=True), nullable=True)
|
|
|
|
def soft_delete_user(user_id: UUID):
|
|
user = db.query(User).filter(User.id == user_id).first()
|
|
user.deleted_at = datetime.now(timezone.utc)
|
|
db.commit()
|
|
```
|
|
|
|
**Rule**: For user data, ALWAYS use soft deletes. Add `deleted_at` column and filter queries with `.filter(deleted_at.is_(None))`.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #5: Missing Query Ordering
|
|
|
|
**Issue**: Queries without `ORDER BY` return unpredictable results, breaking pagination.
|
|
|
|
```python
|
|
# ❌ WRONG - Random order, pagination broken
|
|
def get_users(skip: int, limit: int):
|
|
return db.query(User).offset(skip).limit(limit).all()
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Stable ordering for consistent pagination
|
|
def get_users(skip: int, limit: int):
|
|
return (
|
|
db.query(User)
|
|
.filter(User.deleted_at.is_(None))
|
|
.order_by(User.created_at.desc()) # Consistent order
|
|
.offset(skip)
|
|
.limit(limit)
|
|
.all()
|
|
)
|
|
```
|
|
|
|
**Rule**: ALWAYS add `.order_by()` to paginated queries. Default to `created_at.desc()` for newest-first.
|
|
|
|
---
|
|
|
|
## Pydantic & Validation
|
|
|
|
### ❌ PITFALL #6: Missing Size Validation on JSON Fields
|
|
|
|
**Issue**: Unbounded JSON fields enable DoS attacks through deeply nested objects.
|
|
|
|
```python
|
|
# ❌ WRONG - No size limit (JSON bomb vulnerability)
|
|
class UserCreate(BaseModel):
|
|
metadata: dict[str, Any] # No limit!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Validate serialized size
|
|
import json
|
|
from pydantic import field_validator
|
|
|
|
class UserCreate(BaseModel):
|
|
metadata: dict[str, Any]
|
|
|
|
@field_validator("metadata")
|
|
@classmethod
|
|
def validate_metadata_size(cls, v: dict[str, Any]) -> dict[str, Any]:
|
|
metadata_json = json.dumps(v, separators=(",", ":"))
|
|
max_size = 10_000 # 10KB limit
|
|
|
|
if len(metadata_json) > max_size:
|
|
raise ValueError(f"Metadata exceeds {max_size} bytes")
|
|
|
|
return v
|
|
```
|
|
|
|
**Rule**: ALWAYS validate the serialized size of dict/JSON fields. Typical limits:
|
|
- User metadata: 10KB
|
|
- Configuration: 100KB
|
|
- Never exceed 1MB
|
|
|
|
**Security Impact**: Prevents DoS attacks via deeply nested JSON objects.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #7: Missing max_length on String Fields
|
|
|
|
**Issue**: Unbounded text fields enable memory exhaustion attacks and database errors.
|
|
|
|
```python
|
|
# ❌ WRONG - No length limit
|
|
class UserCreate(BaseModel):
|
|
email: str
|
|
name: str
|
|
bio: str | None = None
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Explicit length limits matching database
|
|
class UserCreate(BaseModel):
|
|
email: str = Field(..., max_length=255)
|
|
name: str = Field(..., min_length=1, max_length=100)
|
|
bio: str | None = Field(None, max_length=500)
|
|
```
|
|
|
|
**Rule**: Add `max_length` to ALL string fields. Limits should match database column definitions:
|
|
- Emails: 255 characters
|
|
- Names/titles: 100-255 characters
|
|
- Descriptions/bios: 500-1000 characters
|
|
- Error messages: 5000 characters
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #8: Inconsistent Validation Between Create and Update
|
|
|
|
**Issue**: Adding validators to Create schema but not Update schema.
|
|
|
|
```python
|
|
# ❌ INCOMPLETE - Only validates on create
|
|
class UserCreate(BaseModel):
|
|
email: str = Field(..., max_length=255)
|
|
|
|
@field_validator("email")
|
|
@classmethod
|
|
def validate_email_format(cls, v: str) -> str:
|
|
if "@" not in v:
|
|
raise ValueError("Invalid email format")
|
|
return v.lower()
|
|
|
|
class UserUpdate(BaseModel):
|
|
email: str | None = None # No validator!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Same validation on both schemas
|
|
class UserCreate(BaseModel):
|
|
email: str = Field(..., max_length=255)
|
|
|
|
@field_validator("email")
|
|
@classmethod
|
|
def validate_email_format(cls, v: str) -> str:
|
|
if "@" not in v:
|
|
raise ValueError("Invalid email format")
|
|
return v.lower()
|
|
|
|
class UserUpdate(BaseModel):
|
|
email: str | None = Field(None, max_length=255)
|
|
|
|
@field_validator("email")
|
|
@classmethod
|
|
def validate_email_format(cls, v: str | None) -> str | None:
|
|
if v is None:
|
|
return v
|
|
if "@" not in v:
|
|
raise ValueError("Invalid email format")
|
|
return v.lower()
|
|
```
|
|
|
|
**Rule**: Apply the SAME validators to both Create and Update schemas. Handle `None` values in Update validators.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #9: Not Using Field Descriptions
|
|
|
|
**Issue**: Missing descriptions make API documentation unclear.
|
|
|
|
```python
|
|
# ❌ WRONG - No descriptions
|
|
class UserCreate(BaseModel):
|
|
email: str
|
|
password: str
|
|
is_superuser: bool = False
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Clear descriptions
|
|
class UserCreate(BaseModel):
|
|
email: str = Field(
|
|
...,
|
|
description="User's email address (must be unique)",
|
|
examples=["user@example.com"]
|
|
)
|
|
password: str = Field(
|
|
...,
|
|
min_length=8,
|
|
description="Password (minimum 8 characters)",
|
|
examples=["SecurePass123!"]
|
|
)
|
|
is_superuser: bool = Field(
|
|
default=False,
|
|
description="Whether user has superuser privileges"
|
|
)
|
|
```
|
|
|
|
**Rule**: Add `description` and `examples` to all fields for automatic OpenAPI documentation.
|
|
|
|
---
|
|
|
|
## FastAPI & API Design
|
|
|
|
### ❌ PITFALL #10: Missing Rate Limiting
|
|
|
|
**Issue**: No rate limiting allows abuse and DoS attacks.
|
|
|
|
```python
|
|
# ❌ WRONG - No rate limits
|
|
@router.post("/auth/login")
|
|
def login(credentials: OAuth2PasswordRequestForm):
|
|
# Anyone can try unlimited passwords!
|
|
...
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Rate limit sensitive endpoints
|
|
from slowapi import Limiter
|
|
|
|
limiter = Limiter(key_func=lambda request: request.client.host)
|
|
|
|
@router.post("/auth/login")
|
|
@limiter.limit("5/minute") # Only 5 attempts per minute
|
|
def login(request: Request, credentials: OAuth2PasswordRequestForm):
|
|
...
|
|
```
|
|
|
|
**Rule**: Apply rate limits to ALL endpoints:
|
|
- Authentication: 5/minute
|
|
- Write operations: 10-20/minute
|
|
- Read operations: 30-60/minute
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #11: Returning Sensitive Data in Responses
|
|
|
|
**Issue**: Exposing internal fields like passwords, tokens, or internal IDs.
|
|
|
|
```python
|
|
# ❌ WRONG - Returns password hash!
|
|
@router.get("/users/{user_id}")
|
|
def get_user(user_id: UUID, db: Session = Depends(get_db)) -> User:
|
|
return user_crud.get(db, id=user_id) # Returns ORM model with ALL fields!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Use response schema
|
|
@router.get("/users/{user_id}", response_model=UserResponse)
|
|
def get_user(user_id: UUID, db: Session = Depends(get_db)):
|
|
user = user_crud.get(db, id=user_id)
|
|
if not user:
|
|
raise HTTPException(status_code=404, detail="User not found")
|
|
return user # Pydantic filters to only UserResponse fields
|
|
|
|
class UserResponse(BaseModel):
|
|
"""Public user data - NO sensitive fields."""
|
|
id: UUID
|
|
email: str
|
|
is_active: bool
|
|
created_at: datetime
|
|
# NO: password, hashed_password, tokens, etc.
|
|
|
|
model_config = ConfigDict(from_attributes=True)
|
|
```
|
|
|
|
**Rule**: ALWAYS use dedicated response schemas. Never return ORM models directly.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #12: Missing Error Response Standardization
|
|
|
|
**Issue**: Inconsistent error formats confuse API consumers.
|
|
|
|
```python
|
|
# ❌ WRONG - Different error formats
|
|
@router.get("/users/{user_id}")
|
|
def get_user(user_id: UUID):
|
|
if not user:
|
|
raise HTTPException(404, "Not found") # Format 1
|
|
|
|
if not user.is_active:
|
|
return {"error": "User inactive"} # Format 2
|
|
|
|
try:
|
|
...
|
|
except Exception as e:
|
|
return {"message": str(e)} # Format 3
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Consistent error format
|
|
class ErrorResponse(BaseModel):
|
|
success: bool = False
|
|
errors: list[ErrorDetail]
|
|
|
|
class ErrorDetail(BaseModel):
|
|
code: str
|
|
message: str
|
|
field: str | None = None
|
|
|
|
@router.get("/users/{user_id}")
|
|
def get_user(user_id: UUID):
|
|
if not user:
|
|
raise NotFoundError(
|
|
message="User not found",
|
|
error_code="USER_001"
|
|
)
|
|
|
|
# Global exception handler ensures consistent format
|
|
@app.exception_handler(APIException)
|
|
async def api_exception_handler(request: Request, exc: APIException):
|
|
return JSONResponse(
|
|
status_code=exc.status_code,
|
|
content={
|
|
"success": False,
|
|
"errors": [
|
|
{
|
|
"code": exc.error_code,
|
|
"message": exc.message,
|
|
"field": exc.field
|
|
}
|
|
]
|
|
}
|
|
)
|
|
```
|
|
|
|
**Rule**: Use custom exceptions and global handlers for consistent error responses across all endpoints.
|
|
|
|
---
|
|
|
|
## Security & Authentication
|
|
|
|
### ❌ PITFALL #13: Logging Sensitive Information
|
|
|
|
**Issue**: Passwords, tokens, and secrets in logs create security vulnerabilities.
|
|
|
|
```python
|
|
# ❌ WRONG - Logs credentials
|
|
logger.info(f"User {email} logged in with password: {password}") # NEVER!
|
|
logger.debug(f"JWT token: {access_token}") # NEVER!
|
|
logger.info(f"Database URL: {settings.database_url}") # Contains password!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Never log sensitive data
|
|
logger.info(f"User {email} logged in successfully")
|
|
logger.debug("Access token generated")
|
|
logger.info(f"Database connected: {settings.database_url.split('@')[1]}") # Only host
|
|
```
|
|
|
|
**Rule**: NEVER log:
|
|
- Passwords (plain or hashed)
|
|
- Tokens (access, refresh, API keys)
|
|
- Full database URLs
|
|
- Credit card numbers
|
|
- Personal data (SSN, passport, etc.)
|
|
|
|
**Use Pydantic's `SecretStr`** for sensitive config values.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #14: Weak Password Requirements
|
|
|
|
**Issue**: No password strength requirements allow weak passwords.
|
|
|
|
```python
|
|
# ❌ WRONG - No validation
|
|
class UserCreate(BaseModel):
|
|
password: str
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Enforce minimum standards
|
|
class UserCreate(BaseModel):
|
|
password: str = Field(..., min_length=8)
|
|
|
|
@field_validator("password")
|
|
@classmethod
|
|
def validate_password_strength(cls, v: str) -> str:
|
|
if len(v) < 8:
|
|
raise ValueError("Password must be at least 8 characters")
|
|
|
|
# For admin/superuser, enforce stronger requirements
|
|
has_upper = any(c.isupper() for c in v)
|
|
has_lower = any(c.islower() for c in v)
|
|
has_digit = any(c.isdigit() for c in v)
|
|
|
|
if not (has_upper and has_lower and has_digit):
|
|
raise ValueError(
|
|
"Password must contain uppercase, lowercase, and number"
|
|
)
|
|
|
|
return v
|
|
```
|
|
|
|
**Rule**: Enforce password requirements:
|
|
- Minimum 8 characters
|
|
- Mix of upper/lower case and numbers for sensitive accounts
|
|
- Use bcrypt with appropriate cost factor (12+)
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #15: Not Validating Token Ownership
|
|
|
|
**Issue**: Users can access other users' resources using valid tokens.
|
|
|
|
```python
|
|
# ❌ WRONG - No ownership check
|
|
@router.delete("/sessions/{session_id}")
|
|
def revoke_session(
|
|
session_id: UUID,
|
|
current_user: User = Depends(get_current_user),
|
|
db: Session = Depends(get_db)
|
|
):
|
|
session = session_crud.get(db, id=session_id)
|
|
session_crud.deactivate(db, session_id=session_id)
|
|
# BUG: User can revoke ANYONE'S session!
|
|
return {"message": "Session revoked"}
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Verify ownership
|
|
@router.delete("/sessions/{session_id}")
|
|
def revoke_session(
|
|
session_id: UUID,
|
|
current_user: User = Depends(get_current_user),
|
|
db: Session = Depends(get_db)
|
|
):
|
|
session = session_crud.get(db, id=session_id)
|
|
|
|
if not session:
|
|
raise NotFoundError("Session not found")
|
|
|
|
# CRITICAL: Check ownership
|
|
if session.user_id != current_user.id:
|
|
raise AuthorizationError("You can only revoke your own sessions")
|
|
|
|
session_crud.deactivate(db, session_id=session_id)
|
|
return {"message": "Session revoked"}
|
|
```
|
|
|
|
**Rule**: ALWAYS verify resource ownership before allowing operations. Check `resource.user_id == current_user.id`.
|
|
|
|
---
|
|
|
|
## Python Language Gotchas
|
|
|
|
### ❌ PITFALL #16: Using is for Value Comparison
|
|
|
|
**Issue**: `is` checks identity, not equality.
|
|
|
|
```python
|
|
# ❌ WRONG - Compares object identity
|
|
if user.role is "admin": # May fail due to string interning
|
|
grant_access()
|
|
|
|
if count is 0: # Never works for integers outside -5 to 256
|
|
return empty_response
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Use == for value comparison
|
|
if user.role == "admin":
|
|
grant_access()
|
|
|
|
if count == 0:
|
|
return empty_response
|
|
```
|
|
|
|
**Rule**: Use `==` for value comparison. Only use `is` for:
|
|
- `is None` (checking for None)
|
|
- `is True` / `is False` (checking for exact boolean objects)
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #17: Mutable Default Arguments
|
|
|
|
**Issue**: Default mutable arguments are shared across all function calls.
|
|
|
|
```python
|
|
# ❌ WRONG - list is shared!
|
|
def add_tag(user: User, tags: list = []):
|
|
tags.append("default")
|
|
user.tags.extend(tags)
|
|
# Second call will have ["default", "default"]!
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Use None and create new list
|
|
def add_tag(user: User, tags: list | None = None):
|
|
if tags is None:
|
|
tags = []
|
|
tags.append("default")
|
|
user.tags.extend(tags)
|
|
```
|
|
|
|
**Rule**: Never use mutable defaults (`[]`, `{}`). Use `None` and create inside function.
|
|
|
|
---
|
|
|
|
### ❌ PITFALL #18: Not Using Type Hints
|
|
|
|
**Issue**: Missing type hints prevent catching bugs at development time.
|
|
|
|
```python
|
|
# ❌ WRONG - No type hints
|
|
def create_user(email, password, is_active=True):
|
|
user = User(email=email, password=password, is_active=is_active)
|
|
db.add(user)
|
|
return user
|
|
```
|
|
|
|
```python
|
|
# ✅ CORRECT - Full type hints
|
|
def create_user(
|
|
email: str,
|
|
password: str,
|
|
is_active: bool = True
|
|
) -> User:
|
|
user = User(email=email, password=password, is_active=is_active)
|
|
db.add(user)
|
|
return user
|
|
```
|
|
|
|
**Rule**: Add type hints to ALL functions. Use `mypy` to enforce type checking.
|
|
|
|
---
|
|
|
|
## Checklist Before Committing
|
|
|
|
Use this checklist to catch issues before code review:
|
|
|
|
### Database
|
|
- [ ] No mutable defaults (`default=dict`, not `default={}`)
|
|
- [ ] All foreign keys have `index=True`
|
|
- [ ] Composite indexes for multi-column queries
|
|
- [ ] Soft deletes with `deleted_at` column
|
|
- [ ] All queries have `.order_by()` for pagination
|
|
|
|
### Validation
|
|
- [ ] All dict/JSON fields have size validators
|
|
- [ ] All string fields have `max_length`
|
|
- [ ] Validators applied to BOTH Create and Update schemas
|
|
- [ ] All fields have descriptions
|
|
|
|
### API Design
|
|
- [ ] Rate limits on all endpoints
|
|
- [ ] Response schemas (never return ORM models)
|
|
- [ ] Consistent error format with global handlers
|
|
- [ ] OpenAPI docs are clear and complete
|
|
|
|
### Security
|
|
- [ ] No passwords, tokens, or secrets in logs
|
|
- [ ] Password strength validation
|
|
- [ ] Resource ownership verification
|
|
- [ ] CORS configured (no wildcards in production)
|
|
|
|
### Python
|
|
- [ ] Use `==` not `is` for value comparison
|
|
- [ ] No mutable default arguments
|
|
- [ ] Type hints on all functions
|
|
- [ ] No unused imports or variables
|
|
|
|
---
|
|
|
|
## Prevention Tools
|
|
|
|
### Pre-commit Checks
|
|
|
|
Add these to your development workflow:
|
|
|
|
```bash
|
|
# Format code
|
|
black app tests
|
|
isort app tests
|
|
|
|
# Type checking
|
|
mypy app --strict
|
|
|
|
# Linting
|
|
flake8 app tests
|
|
|
|
# Run tests
|
|
pytest --cov=app --cov-report=term-missing
|
|
|
|
# Check coverage (should be 80%+)
|
|
coverage report --fail-under=80
|
|
```
|
|
|
|
---
|
|
|
|
## When to Update This Document
|
|
|
|
Add new entries when:
|
|
1. A bug makes it to production
|
|
2. Multiple review cycles catch the same issue
|
|
3. An issue takes >30 minutes to debug
|
|
4. Security vulnerability discovered
|
|
|
|
---
|
|
|
|
**Last Updated**: 2025-10-31
|
|
**Issues Cataloged**: 18 common pitfalls
|
|
**Remember**: This document exists because these issues HAVE occurred. Don't skip it.
|