Training Monitor

Real-time monitoring system for ML training processes, specialized for diffusion model training visualization.

Features

  • Real-time training progress visualization
  • Training metrics monitoring (loss, learning rate, ETA)
  • Sample image gallery with timeline view
  • Remote SFTP data source integration
  • Docker-based deployment
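The ETA listed under training metrics can be derived from step progress alone. Below is a minimal sketch that assumes the backend knows the current step, the total step count, and the elapsed wall-clock time. `ProgressSnapshot`, `estimate_eta_seconds` and `format_eta` are hypothetical names. It uses the plain mean time per step, which may differ from how this project actually computes its ETA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressSnapshot:
    """Hypothetical progress record; field names are illustrative only."""
    current_step: int
    total_steps: int
    elapsed_seconds: float  # wall-clock time since training started


def estimate_eta_seconds(snapshot: ProgressSnapshot) -> Optional[float]:
    """Estimate remaining time as remaining steps x mean seconds per step.

    Returns None when no steps have completed yet, since no rate is known.
    """
    if snapshot.current_step <= 0:
        return None
    remaining = max(snapshot.total_steps - snapshot.current_step, 0)
    seconds_per_step = snapshot.elapsed_seconds / snapshot.current_step
    return remaining * seconds_per_step


def format_eta(seconds: Optional[float]) -> str:
    """Render an ETA as H:MM:SS, or 'unknown' when it cannot be estimated."""
    if seconds is None:
        return "unknown"
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

For example, 500 of 2000 steps done in 1000 s gives 2 s per step, so 1500 remaining steps come to 3000 s, shown as `0:50:00`. A mean over the whole run is simple but slow to react to speed changes. A moving average over recent steps is a common alternative.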

Project Structure

training-monitor/
├── frontend/                 # React frontend application
│   ├── src/
│   ├── public/
│   ├── Dockerfile
│   └── package.json
├── backend/                  # FastAPI backend application
│   ├── app/
│   │   ├── api/             # API routes
│   │   ├── core/            # Core functionality
│   │   ├── models/          # Data models
│   │   └── services/        # Business logic
│   ├── tests/
│   └── Dockerfile
├── docker/
│   └── docker-compose.yml
└── scripts/                  # Utility scripts

Tech Stack

  • Frontend: React (v18+)

    • Tailwind CSS for styling
    • React Query for data fetching
    • React Router for navigation
  • Backend: FastAPI

    • Paramiko for SFTP operations
    • Pydantic for data validation
    • SQLAlchemy for database access (optional)
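For the timeline view, the sample images fetched over SFTP have to be ordered by training step. Paramiko's `SFTPClient.listdir(path)` returns plain filenames, so the ordering can be done on those strings. The sketch below assumes a hypothetical naming convention such as `sample_001200.png`, with the step number just before the extension. `STEP_PATTERN`, `parse_step` and `build_timeline` are illustrative names, not part of this project.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical naming convention: "<prefix>_<step>.<ext>", e.g. "sample_001200.png".
# Adjust the pattern to whatever the training script actually writes.
STEP_PATTERN = re.compile(r"_(\d+)\.(?:png|jpe?g|webp)$", re.IGNORECASE)


def parse_step(filename: str) -> Optional[int]:
    """Return the training step encoded in a sample filename, or None."""
    match = STEP_PATTERN.search(filename)
    return int(match.group(1)) if match else None


def build_timeline(filenames: List[str]) -> List[Tuple[int, str]]:
    """Pair each recognised sample image with its step, ordered by step.

    Files that do not match the pattern are skipped. Sorting by the parsed
    integer (not the raw string) keeps step 900 ahead of step 1200 even
    when the numbers are not zero-padded.
    """
    timeline = []
    for name in filenames:
        step = parse_step(name)
        if step is not None:
            timeline.append((step, name))
    timeline.sort()
    return timeline
```

Sorting by the parsed integer, not the raw filename, matters. A plain string sort would put `sample_1200.png` before `sample_900.png`.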

Setup Instructions

  1. Clone the repository:
     git clone https://github.com/yourusername/training-monitor.git
     cd training-monitor
  2. Start the development environment (the compose file lives in docker/):
     docker-compose -f docker/docker-compose.yml up --build

The application will be available at:

Development

Frontend Development

cd frontend
npm install
npm run dev

Backend Development

cd backend
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
uvicorn app.main:app --reload

Configuration

Create a .env file in the root directory:

# Backend
SFTP_HOST=42.42.42.42
SFTP_USER=username
SFTP_PASSWORD=your_password
SFTP_PATH=/data/samples

# Frontend
REACT_APP_API_URL=http://localhost:8000
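To show how the backend settings above might be consumed, here is a stdlib-only sketch that parses simple `KEY=VALUE` lines and builds a typed SFTP settings object. This is an assumption: the project may load configuration differently (for example through Pydantic, which the stack lists). `parse_env_file`, `SFTPSettings` and `load_sftp_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and '#' comments.

    Deliberately minimal: no quoting, escaping, or variable expansion.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


@dataclass(frozen=True)
class SFTPSettings:
    host: str
    user: str
    password: str
    path: str


REQUIRED_KEYS = ("SFTP_HOST", "SFTP_USER", "SFTP_PASSWORD", "SFTP_PATH")


def load_sftp_settings(file_values: Mapping[str, str],
                       environ: Optional[Mapping[str, str]] = None) -> SFTPSettings:
    """Build SFTP settings; real SFTP_* environment variables override the file."""
    env = os.environ if environ is None else environ
    overrides = {k: v for k, v in env.items() if k.startswith("SFTP_")}
    merged = {**file_values, **overrides}
    missing = [k for k in REQUIRED_KEYS if k not in merged]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    return SFTPSettings(
        host=merged["SFTP_HOST"],
        user=merged["SFTP_USER"],
        password=merged["SFTP_PASSWORD"],
        path=merged["SFTP_PATH"],
    )
```

Letting process environment variables win over the file is a common convention. It allows a container to override `SFTP_HOST` or `SFTP_PASSWORD` without editing `.env`. Failing early on missing keys surfaces configuration mistakes at startup rather than on the first SFTP call.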

Contributing

  1. Create a feature branch
  2. Commit your changes
  3. Push to the branch
  4. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.