Files
fast-next-template/backend/docs/COMMON_PITFALLS.md
Felipe Cardoso e8156b751e Add async coding standards and common pitfalls documentation
- Updated `CODING_STANDARDS.md` with async SQLAlchemy patterns, modern Python type hints, and new error handling examples.
- Introduced a new `COMMON_PITFALLS.md` file detailing frequent implementation mistakes and explicit rules to prevent them.
- Covered database optimizations, validation best practices, FastAPI design guidelines, security considerations, and Python language issues.
- Aimed to enhance code quality and reduce recurring mistakes during development.
2025-10-31 19:24:00 +01:00

18 KiB

Common Pitfalls & How to Avoid Them

Purpose: This document catalogs common mistakes encountered during implementation and provides explicit rules to prevent them. Read this before writing any code.

Table of Contents


SQLAlchemy & Database

PITFALL #1: Using Mutable Defaults in Columns

Issue: Using default={} or default=[] creates shared state across all instances.

# ❌ WRONG - All instances share the same dict!
class User(Base):
    metadata = Column(JSON, default={})  # DANGER: Mutable default!
    tags = Column(JSON, default=[])     # DANGER: Shared list!
# ✅ CORRECT - Use callable factory
class User(Base):
    metadata = Column(JSON, default=dict)  # New dict per instance
    tags = Column(JSON, default=list)      # New list per instance

Rule: Always use default=dict or default=list (without parentheses), never default={} or default=[].


PITFALL #2: Forgetting to Index Foreign Keys

Issue: Foreign key columns without indexes cause slow JOIN operations.

# ❌ WRONG - No index on foreign key
class UserSession(Base):
    user_id = Column(UUID, ForeignKey('users.id'), nullable=False)
# ✅ CORRECT - Always index foreign keys
class UserSession(Base):
    user_id = Column(UUID, ForeignKey('users.id'), nullable=False, index=True)

Rule: ALWAYS add index=True to foreign key columns. SQLAlchemy doesn't do this automatically.


PITFALL #3: Missing Composite Indexes

Issue: Queries filtering by multiple columns cannot use single-column indexes efficiently.

# ❌ MISSING - Slow query on (user_id, is_active)
class UserSession(Base):
    user_id = Column(UUID, ForeignKey('users.id'), index=True)
    is_active = Column(Boolean, default=True, index=True)
    # Query: WHERE user_id=X AND is_active=TRUE uses only one index!
# ✅ CORRECT - Composite index for common query pattern
class UserSession(Base):
    user_id = Column(UUID, ForeignKey('users.id'), index=True)
    is_active = Column(Boolean, default=True, index=True)

    __table_args__ = (
        Index('ix_user_sessions_user_active', 'user_id', 'is_active'),
    )

Rule: Add composite indexes for commonly used multi-column filters. Review query patterns and create indexes accordingly.

Performance Impact: Can reduce query time from seconds to milliseconds for large tables.


PITFALL #4: Not Using Soft Deletes

Issue: Hard deletes destroy data and audit trails permanently.

# ❌ RISKY - Permanent data loss
def delete_user(user_id: UUID):
    user = db.query(User).filter(User.id == user_id).first()
    db.delete(user)  # Data gone forever!
    db.commit()
# ✅ CORRECT - Soft delete with audit trail
class User(Base):
    deleted_at = Column(DateTime(timezone=True), nullable=True)

def soft_delete_user(user_id: UUID):
    user = db.query(User).filter(User.id == user_id).first()
    user.deleted_at = datetime.now(timezone.utc)
    db.commit()

Rule: For user data, ALWAYS use soft deletes. Add deleted_at column and filter queries with .filter(deleted_at.is_(None)).


PITFALL #5: Missing Query Ordering

Issue: Queries without ORDER BY return unpredictable results, breaking pagination.

# ❌ WRONG - Random order, pagination broken
def get_users(skip: int, limit: int):
    return db.query(User).offset(skip).limit(limit).all()
# ✅ CORRECT - Stable ordering for consistent pagination
def get_users(skip: int, limit: int):
    return (
        db.query(User)
        .filter(User.deleted_at.is_(None))
        .order_by(User.created_at.desc())  # Consistent order
        .offset(skip)
        .limit(limit)
        .all()
    )

Rule: ALWAYS add .order_by() to paginated queries. Default to created_at.desc() for newest-first.


Pydantic & Validation

PITFALL #6: Missing Size Validation on JSON Fields

Issue: Unbounded JSON fields enable DoS attacks through deeply nested objects.

# ❌ WRONG - No size limit (JSON bomb vulnerability)
class UserCreate(BaseModel):
    metadata: dict[str, Any]  # No limit!
# ✅ CORRECT - Validate serialized size
import json
from pydantic import field_validator

class UserCreate(BaseModel):
    metadata: dict[str, Any]

    @field_validator("metadata")
    @classmethod
    def validate_metadata_size(cls, v: dict[str, Any]) -> dict[str, Any]:
        metadata_json = json.dumps(v, separators=(",", ":"))
        max_size = 10_000  # 10KB limit

        if len(metadata_json) > max_size:
            raise ValueError(f"Metadata exceeds {max_size} bytes")

        return v

Rule: ALWAYS validate the serialized size of dict/JSON fields. Typical limits:

  • User metadata: 10KB
  • Configuration: 100KB
  • Never exceed 1MB

Security Impact: Prevents DoS attacks via deeply nested JSON objects.


PITFALL #7: Missing max_length on String Fields

Issue: Unbounded text fields enable memory exhaustion attacks and database errors.

# ❌ WRONG - No length limit
class UserCreate(BaseModel):
    email: str
    name: str
    bio: str | None = None
# ✅ CORRECT - Explicit length limits matching database
class UserCreate(BaseModel):
    email: str = Field(..., max_length=255)
    name: str = Field(..., min_length=1, max_length=100)
    bio: str | None = Field(None, max_length=500)

Rule: Add max_length to ALL string fields. Limits should match database column definitions:

  • Emails: 255 characters
  • Names/titles: 100-255 characters
  • Descriptions/bios: 500-1000 characters
  • Error messages: 5000 characters

PITFALL #8: Inconsistent Validation Between Create and Update

Issue: Adding validators to Create schema but not Update schema.

# ❌ INCOMPLETE - Only validates on create
class UserCreate(BaseModel):
    email: str = Field(..., max_length=255)

    @field_validator("email")
    @classmethod
    def validate_email_format(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v.lower()

class UserUpdate(BaseModel):
    email: str | None = None  # No validator!
# ✅ CORRECT - Same validation on both schemas
class UserCreate(BaseModel):
    email: str = Field(..., max_length=255)

    @field_validator("email")
    @classmethod
    def validate_email_format(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v.lower()

class UserUpdate(BaseModel):
    email: str | None = Field(None, max_length=255)

    @field_validator("email")
    @classmethod
    def validate_email_format(cls, v: str | None) -> str | None:
        if v is None:
            return v
        if "@" not in v:
            raise ValueError("Invalid email format")
        return v.lower()

Rule: Apply the SAME validators to both Create and Update schemas. Handle None values in Update validators.


PITFALL #9: Not Using Field Descriptions

Issue: Missing descriptions make API documentation unclear.

# ❌ WRONG - No descriptions
class UserCreate(BaseModel):
    email: str
    password: str
    is_superuser: bool = False
# ✅ CORRECT - Clear descriptions
class UserCreate(BaseModel):
    email: str = Field(
        ...,
        description="User's email address (must be unique)",
        examples=["user@example.com"]
    )
    password: str = Field(
        ...,
        min_length=8,
        description="Password (minimum 8 characters)",
        examples=["SecurePass123!"]
    )
    is_superuser: bool = Field(
        default=False,
        description="Whether user has superuser privileges"
    )

Rule: Add description and examples to all fields for automatic OpenAPI documentation.


FastAPI & API Design

PITFALL #10: Missing Rate Limiting

Issue: No rate limiting allows abuse and DoS attacks.

# ❌ WRONG - No rate limits
@router.post("/auth/login")
def login(credentials: OAuth2PasswordRequestForm):
    # Anyone can try unlimited passwords!
    ...
# ✅ CORRECT - Rate limit sensitive endpoints
from slowapi import Limiter

limiter = Limiter(key_func=lambda request: request.client.host)

@router.post("/auth/login")
@limiter.limit("5/minute")  # Only 5 attempts per minute
def login(request: Request, credentials: OAuth2PasswordRequestForm):
    ...

Rule: Apply rate limits to ALL endpoints:

  • Authentication: 5/minute
  • Write operations: 10-20/minute
  • Read operations: 30-60/minute

PITFALL #11: Returning Sensitive Data in Responses

Issue: Exposing internal fields like passwords, tokens, or internal IDs.

# ❌ WRONG - Returns password hash!
@router.get("/users/{user_id}")
def get_user(user_id: UUID, db: Session = Depends(get_db)) -> User:
    return user_crud.get(db, id=user_id)  # Returns ORM model with ALL fields!
# ✅ CORRECT - Use response schema
@router.get("/users/{user_id}", response_model=UserResponse)
def get_user(user_id: UUID, db: Session = Depends(get_db)):
    user = user_crud.get(db, id=user_id)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    return user  # Pydantic filters to only UserResponse fields

class UserResponse(BaseModel):
    """Public user data - NO sensitive fields."""
    id: UUID
    email: str
    is_active: bool
    created_at: datetime
    # NO: password, hashed_password, tokens, etc.

    model_config = ConfigDict(from_attributes=True)

Rule: ALWAYS use dedicated response schemas. Never return ORM models directly.


PITFALL #12: Missing Error Response Standardization

Issue: Inconsistent error formats confuse API consumers.

# ❌ WRONG - Different error formats
@router.get("/users/{user_id}")
def get_user(user_id: UUID):
    if not user:
        raise HTTPException(404, "Not found")  # Format 1

    if not user.is_active:
        return {"error": "User inactive"}  # Format 2

    try:
        ...
    except Exception as e:
        return {"message": str(e)}  # Format 3
# ✅ CORRECT - Consistent error format
class ErrorResponse(BaseModel):
    success: bool = False
    errors: list[ErrorDetail]

class ErrorDetail(BaseModel):
    code: str
    message: str
    field: str | None = None

@router.get("/users/{user_id}")
def get_user(user_id: UUID):
    if not user:
        raise NotFoundError(
            message="User not found",
            error_code="USER_001"
        )

# Global exception handler ensures consistent format
@app.exception_handler(APIException)
async def api_exception_handler(request: Request, exc: APIException):
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "success": False,
            "errors": [
                {
                    "code": exc.error_code,
                    "message": exc.message,
                    "field": exc.field
                }
            ]
        }
    )

Rule: Use custom exceptions and global handlers for consistent error responses across all endpoints.


Security & Authentication

PITFALL #13: Logging Sensitive Information

Issue: Passwords, tokens, and secrets in logs create security vulnerabilities.

# ❌ WRONG - Logs credentials
logger.info(f"User {email} logged in with password: {password}")  # NEVER!
logger.debug(f"JWT token: {access_token}")  # NEVER!
logger.info(f"Database URL: {settings.database_url}")  # Contains password!
# ✅ CORRECT - Never log sensitive data
logger.info(f"User {email} logged in successfully")
logger.debug("Access token generated")
logger.info(f"Database connected: {settings.database_url.split('@')[1]}")  # Only host

Rule: NEVER log:

  • Passwords (plain or hashed)
  • Tokens (access, refresh, API keys)
  • Full database URLs
  • Credit card numbers
  • Personal data (SSN, passport, etc.)

Use Pydantic's SecretStr for sensitive config values.


PITFALL #14: Weak Password Requirements

Issue: No password strength requirements allow weak passwords.

# ❌ WRONG - No validation
class UserCreate(BaseModel):
    password: str
# ✅ CORRECT - Enforce minimum standards
class UserCreate(BaseModel):
    password: str = Field(..., min_length=8)

    @field_validator("password")
    @classmethod
    def validate_password_strength(cls, v: str) -> str:
        if len(v) < 8:
            raise ValueError("Password must be at least 8 characters")

        # For admin/superuser, enforce stronger requirements
        has_upper = any(c.isupper() for c in v)
        has_lower = any(c.islower() for c in v)
        has_digit = any(c.isdigit() for c in v)

        if not (has_upper and has_lower and has_digit):
            raise ValueError(
                "Password must contain uppercase, lowercase, and number"
            )

        return v

Rule: Enforce password requirements:

  • Minimum 8 characters
  • Mix of upper/lower case and numbers for sensitive accounts
  • Use bcrypt with appropriate cost factor (12+)

PITFALL #15: Not Validating Token Ownership

Issue: Users can access other users' resources using valid tokens.

# ❌ WRONG - No ownership check
@router.delete("/sessions/{session_id}")
def revoke_session(
    session_id: UUID,
    current_user: User = Depends(get_current_user),
    db: Session = Depends(get_db)
):
    session = session_crud.get(db, id=session_id)
    session_crud.deactivate(db, session_id=session_id)
    # BUG: User can revoke ANYONE'S session!
    return {"message": "Session revoked"}
# ✅ CORRECT - Verify ownership
@router.delete("/sessions/{session_id}")
def revoke_session(
    session_id: UUID,
    current_user: User = Depends(get_current_user),
    db: Session = Depends(get_db)
):
    session = session_crud.get(db, id=session_id)

    if not session:
        raise NotFoundError("Session not found")

    # CRITICAL: Check ownership
    if session.user_id != current_user.id:
        raise AuthorizationError("You can only revoke your own sessions")

    session_crud.deactivate(db, session_id=session_id)
    return {"message": "Session revoked"}

Rule: ALWAYS verify resource ownership before allowing operations. Check resource.user_id == current_user.id.


Python Language Gotchas

PITFALL #16: Using is for Value Comparison

Issue: is checks identity, not equality.

# ❌ WRONG - Compares object identity
if user.role is "admin":  # May fail due to string interning
    grant_access()

if count is 0:  # Never works for integers outside -5 to 256
    return empty_response
# ✅ CORRECT - Use == for value comparison
if user.role == "admin":
    grant_access()

if count == 0:
    return empty_response

Rule: Use == for value comparison. Only use is for:

  • is None (checking for None)
  • is True / is False (checking for exact boolean objects)

PITFALL #17: Mutable Default Arguments

Issue: Default mutable arguments are shared across all function calls.

# ❌ WRONG - list is shared!
def add_tag(user: User, tags: list = []):
    tags.append("default")
    user.tags.extend(tags)
    # Second call will have ["default", "default"]!
# ✅ CORRECT - Use None and create new list
def add_tag(user: User, tags: list | None = None):
    if tags is None:
        tags = []
    tags.append("default")
    user.tags.extend(tags)

Rule: Never use mutable defaults ([], {}). Use None and create inside function.


PITFALL #18: Not Using Type Hints

Issue: Missing type hints prevent catching bugs at development time.

# ❌ WRONG - No type hints
def create_user(email, password, is_active=True):
    user = User(email=email, password=password, is_active=is_active)
    db.add(user)
    return user
# ✅ CORRECT - Full type hints
def create_user(
    email: str,
    password: str,
    is_active: bool = True
) -> User:
    user = User(email=email, password=password, is_active=is_active)
    db.add(user)
    return user

Rule: Add type hints to ALL functions. Use mypy to enforce type checking.


Checklist Before Committing

Use this checklist to catch issues before code review:

Database

  • No mutable defaults (default=dict, not default={})
  • All foreign keys have index=True
  • Composite indexes for multi-column queries
  • Soft deletes with deleted_at column
  • All queries have .order_by() for pagination

Validation

  • All dict/JSON fields have size validators
  • All string fields have max_length
  • Validators applied to BOTH Create and Update schemas
  • All fields have descriptions

API Design

  • Rate limits on all endpoints
  • Response schemas (never return ORM models)
  • Consistent error format with global handlers
  • OpenAPI docs are clear and complete

Security

  • No passwords, tokens, or secrets in logs
  • Password strength validation
  • Resource ownership verification
  • CORS configured (no wildcards in production)

Python

  • Use == not is for value comparison
  • No mutable default arguments
  • Type hints on all functions
  • No unused imports or variables

Prevention Tools

Pre-commit Checks

Add these to your development workflow:

# Format code
black app tests
isort app tests

# Type checking
mypy app --strict

# Linting
flake8 app tests

# Run tests
pytest --cov=app --cov-report=term-missing

# Check coverage (should be 80%+)
coverage report --fail-under=80

When to Update This Document

Add new entries when:

  1. A bug makes it to production
  2. Multiple review cycles catch the same issue
  3. An issue takes >30 minutes to debug
  4. Security vulnerability discovered

Last Updated: 2025-10-31 Issues Cataloged: 18 common pitfalls Remember: This document exists because these issues HAVE occurred. Don't skip it.