
PostgreSQL Data Migration Tool

A modern web-based tool for migrating data between PostgreSQL databases with an intuitive UI for selecting tables and monitoring progress.

Features

  • Web-based UI: Modern React frontend with responsive design
  • Database Connection Testing: Test connections before migration
  • Table Selection: Browse and select specific tables to migrate
  • Incremental Migration: Process large tables in configurable chunks to reduce memory usage
  • Memory Management: Automatic memory monitoring and chunk splitting
  • Streaming Support: Server-side cursors for very large tables (500K+ rows)
  • Data Safety: Truncates target tables before insertion to ensure clean data
  • Real-time Statistics: Detailed migration progress with chunk-level tracking
  • Resume Capability: Resume failed migrations from the last successful chunk
  • Error Handling: Comprehensive error reporting and validation
  • Flexible Processing: Both file-based and direct migration strategies with incremental support

Architecture

  • Backend: FastAPI with PostgreSQL connectivity using psycopg2
  • Frontend: React with TypeScript and Tailwind CSS
  • Database: PostgreSQL (source and target)

Prerequisites

  • Python 3.8+
  • Node.js 16+
  • PostgreSQL databases (source and target)
  • Network access between the tool and both databases

Installation & Setup

1. Clone or Download

# If you have the project files, navigate to the directory
cd postgres-data-migration

2. Backend Setup

# Install Python dependencies
pip install -r requirements.txt

3. Frontend Setup

# Install Node.js dependencies
npm install

Running the Application

1. Start the Backend Server

# Run the FastAPI server
python main.py

The API will be available at http://localhost:8000

2. Start the Frontend

# In a new terminal, start the React development server
npm start

The web application will open at http://localhost:3000

Usage

1. Database Connection

  • Enter credentials for your source database (where data will be copied from), then click "Test Connection" to verify connectivity
  • Repeat for your target database (where data will be copied to)

2. Table Selection

  • After successful source connection, available tables will be displayed
  • Select the tables you want to migrate
  • Review table row counts to understand data volume

3. Migration Process

  • Click "Migrate" to start the data transfer
  • ⚠️ Warning: Selected tables in the target database will be truncated (all existing data deleted)
  • Monitor real-time progress and statistics
  • Review detailed results after completion
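The truncate-then-load step above can be sketched as a single target-side transaction, so a failed copy never leaves a half-truncated table. This is an illustrative sketch assuming psycopg2 connections; the function names and batching here are not the tool's actual API (see main.py for the real implementation).

```python
# Sketch of the truncate-then-insert step, assuming psycopg2 connections.
# Function names here are illustrative, not the tool's actual API.

def insert_sql(table, n_cols):
    """Build a parameterized INSERT (psycopg2 uses %s placeholders)."""
    placeholders = ", ".join(["%s"] * n_cols)
    return f'INSERT INTO "{table}" VALUES ({placeholders})'

def migrate_table(src_conn, tgt_conn, table, batch_size=1000):
    """Copy one table: truncate the target, then insert rows in batches.

    Everything runs in one target transaction, so an error rolls back
    both the TRUNCATE and any partial inserts.
    """
    with src_conn.cursor() as src, tgt_conn.cursor() as tgt:
        src.execute(f'SELECT * FROM "{table}"')
        tgt.execute(f'TRUNCATE TABLE "{table}"')
        while True:
            rows = src.fetchmany(batch_size)
            if not rows:
                break
            tgt.executemany(insert_sql(table, len(rows[0])), rows)
    tgt_conn.commit()  # nothing is visible in the target until here
```

Keeping the TRUNCATE inside the same transaction as the inserts is what makes the warning above recoverable: if the copy fails midway, the target table rolls back to its pre-migration contents.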

API Endpoints

  • POST /test-connection - Test database connectivity
  • POST /list-tables - Get list of tables with row counts
  • POST /migrate-data - Perform data migration
  • GET /health - Health check endpoint
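The endpoints can be exercised directly from a script. A minimal standard-library client is sketched below; the payload field names are an assumption inferred from the connection form, so check main.py for the exact request schema.

```python
# Minimal client for the API above, using only the standard library.
# The payload fields are an assumption inferred from the connection form;
# check main.py for the backend's exact request schema.
import json
import urllib.request

API_URL = "http://localhost:8000"

def post(endpoint, payload):
    """POST a JSON payload to the API and return the decoded response."""
    req = urllib.request.Request(
        f"{API_URL}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def connection_payload(host, port, database, username, password):
    """Assumed shape of the /test-connection request body."""
    return {"host": host, "port": port, "database": database,
            "username": username, "password": password}

if __name__ == "__main__":
    payload = connection_payload("localhost", 5432, "source_db",
                                 "readonly_user", "secret")
    print(post("/test-connection", payload))
```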

Configuration

Database Configuration (YAML)

The application supports predefined database configurations using a YAML file. This allows you to:

  • Store multiple source and destination database configurations
  • Select databases from a dropdown instead of manual entry
  • Keep sensitive credentials in a secure configuration file
  • Set default migration settings

Setup Configuration

  1. Copy the sample configuration:

    cp config.sample.yaml config.yaml
  2. Edit config.yaml with your database details:

    databases:
      sources:
        local_source:
          name: "Local Source Database"
          host: "localhost"
          port: 5432
          database: "source_db"
          username: "readonly_user"
          password: "your_password"
          description: "Local source database - READ ONLY"
        
        prod_source:
          name: "Production Source Database"
          host: "prod-db.company.com"
          port: 5432
          database: "production_db"
          username: "migration_readonly"
          password: "secure_password"
          description: "Production source database - READ ONLY"
    
      destinations:
        local_target:
          name: "Local Target Database"
          host: "localhost"
          port: 5432
          database: "target_db"
          username: "postgres"
          password: "your_password"
          description: "Local target database"
    
    default_settings:
      chunk_size: 50000
      max_connections: 2
      enable_incremental: true
      analyze_dependencies: true
  3. Security Note: The config.yaml file is excluded from version control to protect sensitive credentials.
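Reading the file and picking one entry can be sketched as below. The key names follow the sample YAML above; load_config requires PyYAML, and the helper names are illustrative rather than the backend's actual functions.

```python
# Sketch of loading config.yaml, following the structure shown above.
# load_config requires PyYAML (pip install pyyaml); helper names are
# illustrative, not the backend's actual API.

def load_config(path="config.yaml"):
    import yaml  # third-party dependency, imported lazily
    with open(path) as f:
        return yaml.safe_load(f)

def pick_database(config, role, key):
    """role is 'sources' or 'destinations'; key is e.g. 'local_source'."""
    return config["databases"][role][key]

# In-memory equivalent of the sample config above, for illustration
SAMPLE = {
    "databases": {
        "sources": {
            "local_source": {"host": "localhost", "port": 5432,
                             "database": "source_db",
                             "username": "readonly_user"},
        },
        "destinations": {},
    },
    "default_settings": {"chunk_size": 50000},
}
```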

Using Configuration

  • Database Selection: Choose from predefined databases in the dropdown
  • Custom Entry: Select "Custom Configuration" to manually enter credentials
  • Auto-populate: Selecting a predefined database automatically fills in all connection details
  • Reload: Use the "Reload Configurations" button to refresh after updating the YAML file

Environment Variables

You can set the following environment variables:

# Frontend API URL (default: http://localhost:8000)
REACT_APP_API_URL=http://localhost:8000

Database Requirements

  • Both source and target databases must be PostgreSQL
  • Target tables must already exist with schemas compatible with the source
  • User accounts need appropriate permissions:
    • Source: SELECT permissions only (recommended: use readonly user)
    • Target: TRUNCATE and INSERT permissions on tables
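These permission requirements can be verified up front with PostgreSQL's has_table_privilege() function. The sketch below assumes a psycopg2 connection; the privilege lists mirror the requirements above, but the helper itself is hypothetical.

```python
# Pre-flight permission check using PostgreSQL's has_table_privilege().
# The privilege lists mirror the requirements above; check_privileges is
# a sketch assuming a psycopg2 connection, not part of the tool itself.

REQUIRED_PRIVILEGES = {
    "source": ["SELECT"],
    "target": ["TRUNCATE", "INSERT"],
}

def check_privileges(conn, role, tables):
    """Return (table, privilege) pairs the current user is missing."""
    missing = []
    with conn.cursor() as cur:
        for table in tables:
            for priv in REQUIRED_PRIVILEGES[role]:
                cur.execute(
                    "SELECT has_table_privilege(current_user, %s, %s)",
                    (table, priv),
                )
                if not cur.fetchone()[0]:
                    missing.append((table, priv))
    return missing
```

Running this against both databases before clicking "Migrate" surfaces permission problems early, instead of partway through a long migration.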

🔒 Security Best Practices

Source Database Access:

  • Always use a readonly user for source databases to prevent accidental data modification
  • Create a dedicated readonly user with minimal permissions:
    -- Create readonly user for source database
    CREATE USER migration_readonly WITH PASSWORD 'secure_password';
    GRANT CONNECT ON DATABASE your_database TO migration_readonly;
    GRANT USAGE ON SCHEMA public TO migration_readonly;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO migration_readonly;
    GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO migration_readonly;
  • This ensures your source data remains safe during migration operations

Security Considerations

  • Database credentials are only stored in memory during the session
  • No persistent storage of sensitive information
  • Use strong database passwords and limit network access
  • Consider using connection pooling for production environments

Performance Tips

  • Incremental Migration: Large tables (100K+ rows) are automatically processed in chunks (default 50K rows per chunk)
  • Memory Efficiency: Configurable memory limits prevent out-of-memory errors
  • Streaming: Tables with 500K+ rows use server-side cursors for minimal memory usage
  • Chunk Processing: Each chunk is processed independently with immediate cleanup
  • Adaptive Sizing: Chunk sizes are automatically adjusted based on table characteristics
  • Pause Control: Configurable pauses between chunks help reduce database load
  • Resume Support: Failed migrations can be resumed from the last successful chunk
  • Consider running migrations during low-traffic periods
  • Monitor database performance during large migrations
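The chunking arithmetic behind incremental migration and resume can be illustrated with a small helper: a table splits into (offset, limit) ranges at the default 50K chunk size, and resuming means restarting from the first unfinished range. The helper below is illustrative, not the tool's internal API.

```python
# How a table splits into chunks at the default 50K chunk size.
# chunk_bounds is an illustrative helper, not the tool's internal API.

DEFAULT_CHUNK_SIZE = 50_000

def chunk_bounds(total_rows, chunk_size=DEFAULT_CHUNK_SIZE):
    """Yield (offset, limit) pairs covering total_rows."""
    offset = 0
    while offset < total_rows:
        yield offset, min(chunk_size, total_rows - offset)
        offset += chunk_size

# A 120,000-row table becomes three chunks: two full 50K chunks and a
# final 20K chunk. Resuming a failed migration means restarting from
# the first (offset, limit) pair that did not complete.
```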

Troubleshooting

Common Issues

  1. Connection Failed

    • Verify database host, port, and credentials
    • Check network connectivity and firewall settings
    • Ensure PostgreSQL is running and accepting connections
  2. Permission Denied

    • Verify user has necessary permissions on source and target tables
    • Check that target user can TRUNCATE and INSERT
  3. Schema Mismatch

    • Ensure target tables exist with compatible schemas
    • Check column names and data types match between source and target
  4. Migration Timeout

    • For very large tables, migrate them individually or in smaller batches rather than all at once
    • Check network stability and database performance

Logs

  • Backend logs are displayed in the terminal running python main.py
  • Frontend errors appear in the browser console
  • Check browser network tab for API request details

Development

Project Structure

postgres-data-migration/
├── main.py                 # FastAPI backend
├── requirements.txt        # Python dependencies
├── package.json           # Node.js dependencies
├── src/
│   ├── App.tsx            # Main React component
│   ├── components/        # React components
│   ├── types.ts           # TypeScript interfaces
│   └── api.ts             # API client
└── README.md

Adding Features

  • Backend logic: Modify main.py
  • Frontend components: Add to src/components/
  • API client: Update src/api.ts
  • Types: Define in src/types.ts

License

This project is provided as-is for educational and practical use. Please ensure you have appropriate backups before running migrations on production data.