System Architecture Design

Overview

The Intelligent Anime Generation System is a CLI tool that converts a novel into an anime video. It runs a six-stage sequential pipeline, passes all data between stages through the file system, and calls third-party AI service APIs directly.

Core Philosophy:

Input: novel text file (novel.txt)
Output: finished anime video (anime_episode.mp4)
Processing: six stages run in sequence, each reading from and writing to the file system
Dependencies: external AI service APIs (set up in the configuration file)

Processing Pipeline

novel.txt
    ↓
┌─────────────────────────────────┐
│  Stage 1: Semantic Processing   │  → 01_story/story_structure.json
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│  Stage 2: Worldview & Style     │  → 02_world_style/*.json
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│  Stage 3: Asset Generation      │  → 03_assets/character_assets/
└─────────────────────────────────┘     03_assets/scene_assets/
    ↓
┌─────────────────────────────────┐
│  Stage 4: Shot Generation       │  → 04_shots/shot_clips/*.mp4
└─────────────────────────────────┘
    ↓
┌─────────────────────────────────┐
│  Stage 5: Audio Generation      │  → 05_audio/dialogue_tracks/
└─────────────────────────────────┘     05_audio/sfx_tracks/
                                        05_audio/bgm_tracks/
    ↓
┌─────────────────────────────────┐
│  Stage 6: Editing & Delivery    │  → 06_edit_delivery/deliverables/anime_episode.mp4
└─────────────────────────────────┘

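Core Components

1. CLI Tool

Responsibilities:

Accept user commands and arguments
Read the configuration file (AI service keys, generation parameters, and so on)
Run the six processing stages in sequence
Display progress and status information
Handle errors and resume after interruptions

Basic Commands: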

# Full pipeline: from novel to video
anime-gen process --input novel.txt --output output/

# Run a single stage
anime-gen stage1 --workdir /path/to/project
anime-gen stage2 --workdir /path/to/project
# ... and so on

# Resume from a given stage
anime-gen process --input novel.txt --output output/ --resume-from stage3
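
The `--resume-from` flag implies a mapping from a stage name to a position in the pipeline. The following is a minimal sketch of that mapping, assuming stage names run from `stage1` to `stage6` as in the commands above. `parseResumeFrom` is a hypothetical helper, and the standard `flag` package stands in for Cobra or urfave/cli to keep the example dependency-free.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
	"strings"
)

const stageCount = 6

// parseResumeFrom converts "stage3" into the zero-based index 2.
// An empty value means "start from the first stage".
func parseResumeFrom(v string) (int, error) {
	if v == "" {
		return 0, nil
	}
	num, ok := strings.CutPrefix(v, "stage")
	if !ok {
		return 0, fmt.Errorf("invalid stage %q: want stage1..stage%d", v, stageCount)
	}
	n, err := strconv.Atoi(num)
	// The Itoa round trip rejects forms such as "stage+3" or "stage03".
	if err != nil || n < 1 || n > stageCount || strconv.Itoa(n) != num {
		return 0, fmt.Errorf("invalid stage %q: want stage1..stage%d", v, stageCount)
	}
	return n - 1, nil
}

func main() {
	// Sketch only: a real CLI would dispatch on the "process" subcommand first.
	fs := flag.NewFlagSet("process", flag.ContinueOnError)
	input := fs.String("input", "", "path to the novel text file")
	output := fs.String("output", "output/", "output directory")
	resume := fs.String("resume-from", "", "stage to resume from, e.g. stage3")
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	start, err := parseResumeFrom(*resume)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("input=%s output=%s starting at stage %d\n", *input, *output, start+1)
}
```

The returned index can be used directly as the starting offset into the ordered list of stage processors.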

2. Configuration

Config File: config.yaml

# AI service configuration
ai_services:
  llm:
    provider: "openai"  # openai / anthropic / local
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4"
  
  image_gen:
    provider: "stable-diffusion"
    api_key: "${SD_API_KEY}"
    model: "sdxl-1.0"
  
  video_gen:
    provider: "runway"
    api_key: "${RUNWAY_API_KEY}"
  
  tts:
    provider: "azure"
    api_key: "${AZURE_TTS_KEY}"
    region: "eastus"

# Generation parameters
generation:
  target_quality: "standard"  # low / standard / high
  target_duration: 300  # target duration in seconds
  frame_rate: 24
  resolution: "1920x1080"

# Output directory layout
output:
  base_dir: "./output"
  keep_intermediate: true  # keep intermediate files
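
Once decoded, the configuration can be checked before any stage runs. The sketch below mirrors the fields of config.yaml in Go structs and validates them. The struct and field names are assumptions chosen to match the YAML keys, the rule that a `local` provider needs no API key is also an assumption, and YAML decoding itself is left out because it needs a third-party package.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceConfig mirrors one entry under ai_services (hypothetical shape).
type ServiceConfig struct {
	Provider string
	APIKey   string
	Model    string // optional for some services
	Region   string // used by the TTS entry only
}

// GenerationConfig mirrors the generation section.
type GenerationConfig struct {
	TargetQuality  string // low / standard / high
	TargetDuration int    // seconds
	FrameRate      int
	Resolution     string // "WIDTHxHEIGHT"
}

type AIServices struct {
	LLM, ImageGen, VideoGen, TTS ServiceConfig
}

type Config struct {
	AIServices AIServices
	Generation GenerationConfig
}

// Validate collects every problem it finds instead of stopping at the first.
func (c Config) Validate() error {
	var errs []error
	services := []struct {
		name string
		cfg  ServiceConfig
	}{
		{"llm", c.AIServices.LLM}, {"image_gen", c.AIServices.ImageGen},
		{"video_gen", c.AIServices.VideoGen}, {"tts", c.AIServices.TTS},
	}
	for _, s := range services {
		if s.cfg.Provider == "" {
			errs = append(errs, fmt.Errorf("ai_services.%s.provider is empty", s.name))
		}
		// Assumption: a local provider does not need an API key.
		if s.cfg.APIKey == "" && s.cfg.Provider != "local" {
			errs = append(errs, fmt.Errorf("ai_services.%s.api_key is empty", s.name))
		}
	}
	switch c.Generation.TargetQuality {
	case "low", "standard", "high":
	default:
		errs = append(errs, fmt.Errorf("generation.target_quality %q is not low, standard, or high", c.Generation.TargetQuality))
	}
	if c.Generation.TargetDuration <= 0 || c.Generation.FrameRate <= 0 {
		errs = append(errs, errors.New("generation.target_duration and frame_rate must be positive"))
	}
	var w, h int
	if _, err := fmt.Sscanf(c.Generation.Resolution, "%dx%d", &w, &h); err != nil || w <= 0 || h <= 0 {
		errs = append(errs, fmt.Errorf("generation.resolution %q is not WIDTHxHEIGHT", c.Generation.Resolution))
	}
	return errors.Join(errs...)
}

func main() {
	svc := ServiceConfig{Provider: "openai", APIKey: "placeholder"}
	cfg := Config{
		AIServices: AIServices{LLM: svc, ImageGen: svc, VideoGen: svc, TTS: svc},
		Generation: GenerationConfig{TargetQuality: "standard", TargetDuration: 300, FrameRate: 24, Resolution: "1920x1080"},
	}
	fmt.Println("valid config:", cfg.Validate())
	cfg.Generation.Resolution = "full-hd"
	fmt.Println("broken config:", cfg.Validate())
}
```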

3. Stage Processors

Each stage is an independent processing module that implements a common interface:

type StageProcessor interface {
    // Process runs the stage
    Process(inputDir string, outputDir string) error

    // ValidateInput checks the stage's inputs
    ValidateInput(inputDir string) error

    // GetInfo returns the stage's metadata
    GetInfo() StageInfo
}

type StageInfo struct {
    Name        string
    Description string
    InputFiles  []string  // required input files
    OutputFiles []string  // output files produced
}

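The six stage processors:

SemanticProcessor: semantic understanding and story structuring
StyleProcessor: worldview and tone definition
AssetProcessor: character and visual asset generation
ShotProcessor: storyboarding and shot generation
AudioProcessor: audio and dialogue generation
EditProcessor: editing and output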
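
To make the interface concrete, here is a minimal sketch of a type that satisfies it. `passthroughStage` and `missingInputs` are hypothetical names used only for illustration: `ValidateInput` checks that every file listed in `InputFiles` exists, and `Process` does nothing except create the output directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

type StageInfo struct {
	Name        string
	Description string
	InputFiles  []string
	OutputFiles []string
}

type StageProcessor interface {
	Process(inputDir string, outputDir string) error
	ValidateInput(inputDir string) error
	GetInfo() StageInfo
}

// missingInputs returns the entries of files that cannot be found under dir.
func missingInputs(dir string, files []string) []string {
	var missing []string
	for _, f := range files {
		if _, err := os.Stat(filepath.Join(dir, f)); err != nil {
			missing = append(missing, f)
		}
	}
	return missing
}

// passthroughStage is a hypothetical stage that only checks its inputs.
type passthroughStage struct{ info StageInfo }

func (s passthroughStage) GetInfo() StageInfo { return s.info }

func (s passthroughStage) ValidateInput(inputDir string) error {
	if m := missingInputs(inputDir, s.info.InputFiles); len(m) > 0 {
		return fmt.Errorf("%s: missing input files %v", s.info.Name, m)
	}
	return nil
}

func (s passthroughStage) Process(inputDir, outputDir string) error {
	return os.MkdirAll(outputDir, 0o755)
}

// Compile-time check that the type implements the interface.
var _ StageProcessor = passthroughStage{}

func main() {
	stage := passthroughStage{info: StageInfo{
		Name:       "StyleProcessor",
		InputFiles: []string{"story_structure.json"},
	}}
	dir, err := os.MkdirTemp("", "stage-demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)

	fmt.Println("before:", stage.ValidateInput(dir)) // reports the missing file
	if err := os.WriteFile(filepath.Join(dir, "story_structure.json"), []byte("{}"), 0o644); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("after:", stage.ValidateInput(dir))
}
```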

4. AI Service Clients

Thin wrappers around the API calls:

// LLM client
type LLMClient struct {
    provider string
    apiKey   string
    model    string
}

func (c *LLMClient) Complete(prompt string) (string, error)

// Image generation client
type ImageGenClient struct {
    provider string
    apiKey   string
}

func (c *ImageGenClient) Generate(prompt string, params ImageParams) ([]byte, error)

// Video generation client
type VideoGenClient struct {
    provider string
    apiKey   string
}

func (c *VideoGenClient) Generate(prompt string, params VideoParams) ([]byte, error)

// TTS client
type TTSClient struct {
    provider string
    apiKey   string
}

func (c *TTSClient) Synthesize(text string, voiceID string) ([]byte, error)

Data Flow

Filesystem Organization

project/
├── config.yaml              # configuration file
├── novel.txt                # input novel
└── output/                  # output directory
    ├── 01_story/
    │   ├── novel_cleaned.txt
    │   └── story_structure.json
    ├── 02_world_style/
    │   ├── world_config.json
    │   └── style_bible.json
    ├── 03_assets/
    │   ├── character_assets/
    │   │   ├── characters.json
    │   │   └── char_001/
    │   │       ├── front.png
    │   │       ├── side.png
    │   │       └── expressions/
    │   └── scene_assets/
    │       ├── environments.json
    │       └── loc_001/
    │           └── background.png
    ├── 04_shots/
    │   ├── shot_plan.json
    │   ├── shot_metadata.json
    │   └── shot_clips/
    │       ├── shot_001.mp4
    │       └── shot_002.mp4
    ├── 05_audio/
    │   ├── alignment.json
    │   ├── dialogue_tracks/
    │   ├── sfx_tracks/
    │   └── bgm_tracks/
    └── 06_edit_delivery/
        ├── timeline_edit.xml
        ├── deliverables/
        │   └── anime_episode.mp4
        ├── subtitles/
        │   └── episode.srt
        └── metadata.json

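Data Passing

Each stage reads its input from the file system (the previous stage's output)
Each stage writes its results to the file system
Structured data is stored as JSON
Media files are stored in standard formats (PNG/MP4/WAV, etc.)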
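
Because the next stage reads whatever is on disk, a half-written JSON file left by a crash could be mistaken for valid output. One way to reduce that risk, sketched here as an assumption and not as the project's actual approach, is to write to a temporary file and rename it into place. `StoryStructure` is a hypothetical, heavily reduced shape for story_structure.json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// writeJSON marshals v and moves it into place with a rename, so readers
// never observe a partially written file.
func writeJSON(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// readJSON loads the file at path into v.
func readJSON(path string, v any) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	return json.Unmarshal(data, v)
}

// StoryStructure is a hypothetical shape used only for this example.
type StoryStructure struct {
	Title    string   `json:"title"`
	Chapters []string `json:"chapters"`
}

func main() {
	dir, err := os.MkdirTemp("", "data-passing")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "01_story", "story_structure.json")
	in := StoryStructure{Title: "Example", Chapters: []string{"Chapter 1"}}
	if err := writeJSON(path, in); err != nil {
		fmt.Println(err)
		return
	}
	var out StoryStructure
	fmt.Println(readJSON(path, &out), out.Title, len(out.Chapters))
}
```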

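Technology Stack

Programming Language

Go 1.21+: primary development language, used for the CLI tool and pipeline orchestration

CLI Framework

Cobra or urfave/cli: command-line interface
Progress display: text progress bar and status output

File Processing

JSON: structured data storage
YAML: configuration file

Video/Audio Processing

FFmpeg: video editing, transcoding, and audio/video muxing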
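
FFmpeg would typically be driven from Go through `os/exec`. The sketch below builds the argument list for combining one shot clip with one audio track; the file names are placeholders, `muxArgs` is a hypothetical helper, and the command is only printed, not executed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// muxArgs builds ffmpeg arguments that copy the video stream of videoPath,
// encode audioPath as AAC, and stop at the end of the shorter input.
func muxArgs(videoPath, audioPath, outPath string) []string {
	return []string{
		"-y", // overwrite the output file if it exists
		"-i", videoPath,
		"-i", audioPath,
		"-map", "0:v:0", // first video stream of the first input
		"-map", "1:a:0", // first audio stream of the second input
		"-c:v", "copy",
		"-c:a", "aac",
		"-shortest",
		outPath,
	}
}

func main() {
	args := muxArgs("shot_001.mp4", "dialogue_001.wav", "shot_001_mixed.mp4")
	if _, err := exec.LookPath("ffmpeg"); err != nil {
		fmt.Println("ffmpeg not found on PATH; would run: ffmpeg", args)
		return
	}
	cmd := exec.Command("ffmpeg", args...)
	fmt.Println("would run:", cmd.String())
}
```

Building the arguments as a slice, and not as a single shell string, avoids quoting problems with file names that contain spaces.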

AI Services

LLM: OpenAI API / Anthropic Claude API
Image generation: Stable Diffusion API / Midjourney API
Video generation: Runway API / Pika API
TTS: Azure TTS / Coqui TTS / VITS

Execution Flow

Main Flow

import (
    "fmt"
    "log"
)

func main() {
    // 1. Load the configuration
    config := LoadConfig("config.yaml")

    // 2. Initialize the AI service clients
    llmClient := NewLLMClient(config.AIServices.LLM)
    imageClient := NewImageGenClient(config.AIServices.ImageGen)
    videoClient := NewVideoGenClient(config.AIServices.VideoGen)
    ttsClient := NewTTSClient(config.AIServices.TTS)

    // 3. Create the stage processors
    stages := []StageProcessor{
        NewSemanticProcessor(llmClient),
        NewStyleProcessor(llmClient, imageClient),
        NewAssetProcessor(imageClient),
        NewShotProcessor(videoClient),
        NewAudioProcessor(ttsClient),
        NewEditProcessor(),
    }

    // 4. Run the stages in order
    for i, stage := range stages {
        fmt.Printf("Running stage %d: %s\n", i+1, stage.GetInfo().Name)

        inputDir := GetStageInputDir(i)
        outputDir := GetStageOutputDir(i)

        if err := stage.ValidateInput(inputDir); err != nil {
            log.Fatalf("Stage %d input validation failed: %v", i+1, err)
        }

        if err := stage.Process(inputDir, outputDir); err != nil {
            log.Fatalf("Stage %d processing failed: %v", i+1, err)
        }

        fmt.Printf("Stage %d complete\n", i+1)
    }

    fmt.Println("All processing complete!")
}
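
The main flow relies on `GetStageInputDir` and `GetStageOutputDir`, which are not defined in this document. One possible shape for them is sketched below, assuming each stage reads from the previous stage's output directory and the first stage reads from the project directory that holds novel.txt. The `Layout` type and its method names are hypothetical. Later stages probably need outputs from several earlier stages, so a single input directory is a simplification.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stageDirs lists the per-stage output directories from the file tree.
var stageDirs = []string{
	"01_story", "02_world_style", "03_assets",
	"04_shots", "05_audio", "06_edit_delivery",
}

// Layout is a hypothetical helper holding the two roots of a project.
type Layout struct {
	ProjectDir string // contains novel.txt and config.yaml
	BaseDir    string // output.base_dir from config.yaml
}

// OutputDir returns the directory that stage i (zero-based) writes to.
func (l Layout) OutputDir(i int) string {
	return filepath.Join(l.BaseDir, stageDirs[i])
}

// InputDir returns the directory that stage i reads from: the project
// directory for the first stage, otherwise the previous stage's output.
func (l Layout) InputDir(i int) string {
	if i == 0 {
		return l.ProjectDir
	}
	return l.OutputDir(i - 1)
}

func main() {
	l := Layout{ProjectDir: "project", BaseDir: filepath.Join("project", "output")}
	for i := range stageDirs {
		fmt.Printf("stage %d: %s -> %s\n", i+1, l.InputDir(i), l.OutputDir(i))
	}
}
```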

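Error Handling

Each stage handles its own errors
Progress is saved when a stage fails
Execution can resume from the point of failure
Detailed logs are written to a file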
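
Saving progress and resuming from the point of failure can be approximated with a small checkpoint file. The sketch below is one possible design, not the documented one: the file name checkpoint.json, the `Checkpoint` shape, and the function names are all assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Checkpoint records pipeline progress (hypothetical shape).
type Checkpoint struct {
	LastCompleted int `json:"last_completed"` // zero-based index, -1 if none
}

// loadCheckpoint treats a missing file as "nothing completed yet".
func loadCheckpoint(path string) (Checkpoint, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return Checkpoint{LastCompleted: -1}, nil
	}
	if err != nil {
		return Checkpoint{}, err
	}
	var cp Checkpoint
	err = json.Unmarshal(data, &cp)
	return cp, err
}

func saveCheckpoint(path string, cp Checkpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// runPipeline skips stages already recorded as complete and saves a
// checkpoint after each stage that succeeds.
func runPipeline(stages []func() error, cpPath string) error {
	cp, err := loadCheckpoint(cpPath)
	if err != nil {
		return err
	}
	for i := cp.LastCompleted + 1; i < len(stages); i++ {
		if err := stages[i](); err != nil {
			return fmt.Errorf("stage %d failed: %w", i+1, err)
		}
		cp.LastCompleted = i
		if err := saveCheckpoint(cpPath, cp); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "checkpoint-demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)
	cpPath := filepath.Join(dir, "checkpoint.json")

	failOnce := true
	stages := []func() error{
		func() error { fmt.Println("stage 1 ran"); return nil },
		func() error {
			if failOnce {
				failOnce = false
				return errors.New("simulated API error")
			}
			fmt.Println("stage 2 ran")
			return nil
		},
	}
	fmt.Println("first run:", runPipeline(stages, cpPath))  // stops at stage 2
	fmt.Println("second run:", runPipeline(stages, cpPath)) // skips stage 1
}
```

Returning the error to the caller, instead of exiting on the spot, gives the program a chance to flush logs and report where to resume.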

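Performance Optimization

Parallel Processing

Within a single stage, some tasks can run in parallel:

Stage 3: assets for multiple characters can be generated in parallel
Stage 4: multiple shot clips can be generated in parallel
Stage 5: multiple audio tracks can be generated in parallel

// Example: generate multiple characters in parallel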
var wg sync.WaitGroup
for _, char := range characters {
    wg.Add(1)
    go func(c Character) {
        defer wg.Done()
        generateCharacterAsset(c)
    }(char)
}
wg.Wait()
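
The example above starts one goroutine per character and discards any error. Since every task calls a remote API, it may be worth capping concurrency and collecting failures. The following sketch does both with a buffered channel used as a semaphore; `generateAll` and the stubbed `generateCharacterAsset` are hypothetical, and the limit is an assumption to be tuned against the provider's rate limits.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Character struct{ ID string }

// generateCharacterAsset is a stand-in for the real image generation call.
func generateCharacterAsset(c Character) error {
	if c.ID == "" {
		return errors.New("character has no ID")
	}
	return nil
}

// generateAll processes every character with at most limit calls in flight
// and returns all failures joined into one error (nil if none failed).
func generateAll(chars []Character, limit int) error {
	if limit < 1 {
		limit = 1
	}
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	sem := make(chan struct{}, limit)
	for _, c := range chars {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are running
		go func(c Character) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := generateCharacterAsset(c); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("character %q: %w", c.ID, err))
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return errors.Join(errs...)
}

func main() {
	ok := []Character{{ID: "char_001"}, {ID: "char_002"}, {ID: "char_003"}}
	fmt.Println("all valid:", generateAll(ok, 2))
	fmt.Println("one invalid:", generateAll(append(ok, Character{}), 2))
}
```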

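Caching

Cache AI service responses (identical prompts skip the repeated call)
Reuse character assets (the same character across different scenes)
Reuse scene backgrounds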
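
A response cache keyed by prompt fits naturally on the file system the pipeline already uses. The sketch below hashes the model name and prompt with SHA-256 and stores each response as a file. The cache directory, the key scheme, and the function names are assumptions, and the approach only makes sense where a repeated prompt may reuse an earlier response.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// cacheKey derives a stable file name from the model and the prompt.
func cacheKey(model, prompt string) string {
	sum := sha256.Sum256([]byte(model + "\x00" + prompt))
	return hex.EncodeToString(sum[:])
}

// cachedComplete returns the stored response for (model, prompt) if one
// exists; otherwise it calls complete and stores the result.
func cachedComplete(cacheDir, model, prompt string, complete func(string) (string, error)) (string, error) {
	path := filepath.Join(cacheDir, cacheKey(model, prompt)+".txt")
	if data, err := os.ReadFile(path); err == nil {
		return string(data), nil
	}
	out, err := complete(prompt)
	if err != nil {
		return "", err
	}
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return "", err
	}
	return out, os.WriteFile(path, []byte(out), 0o644)
}

func main() {
	dir, err := os.MkdirTemp("", "llm-cache")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)

	calls := 0
	fake := func(prompt string) (string, error) { // stand-in for a real API call
		calls++
		return "summary of: " + prompt, nil
	}
	for i := 0; i < 2; i++ {
		out, err := cachedComplete(dir, "gpt-4", "Summarize chapter 1", fake)
		fmt.Println(out, err)
	}
	fmt.Println("API calls made:", calls) // the second call is served from the cache
}
```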

Security

API Key Management

Read sensitive values from environment variables
Use ${ENV_VAR} placeholders in the configuration file
Never hard-code keys in source code or write them to logs
Input Validation

Validate the novel file's format and encoding
Check the configuration file for completeness
Guard against path traversal attacks
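
Path traversal becomes a concern as soon as file names come from untrusted data, for example identifiers taken from generated JSON. A minimal sketch of a guard, assuming such names are always meant to be relative to the output directory, uses `filepath.IsLocal` (available since Go 1.20). `safeJoin` is a hypothetical helper, and the check is purely lexical, so it does not follow symbolic links.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// safeJoin joins an untrusted relative name onto baseDir and rejects names
// that are absolute, empty, or that climb out of baseDir with "..".
func safeJoin(baseDir, name string) (string, error) {
	if !filepath.IsLocal(name) {
		return "", fmt.Errorf("path %q escapes the base directory", name)
	}
	return filepath.Join(baseDir, name), nil
}

func main() {
	for _, name := range []string{
		"03_assets/character_assets/char_001/front.png",
		"../../etc/passwd",
		"/etc/passwd",
	} {
		p, err := safeJoin("output", name)
		fmt.Printf("%q -> %q, err=%v\n", name, p, err)
	}
}
```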

Observability

Logging

Record the start and end time of every stage
Record details of AI service calls (without keys)
Record errors and warnings
Log file: output/process.log
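
Stage timing can be logged with the standard `log/slog` package, which is available from Go 1.21. The sketch below wraps a stage function and logs its start, end, and duration. `timeStage` is a hypothetical helper, and the demo writes to a temporary file where the real tool would use output/process.log.

```go
package main

import (
	"errors"
	"io"
	"log/slog"
	"os"
	"time"
)

// logger writes to stderr by default; main swaps in a file-backed logger.
var logger = slog.New(slog.NewTextHandler(os.Stderr, nil))

// errSimulated stands in for a real stage failure in the demo.
var errSimulated = errors.New("simulated failure")

// timeStage runs a stage and logs its start, its outcome, and how long it took.
func timeStage(name string, run func() error) error {
	start := time.Now()
	logger.Info("stage started", "stage", name)
	err := run()
	elapsed := time.Since(start)
	if err != nil {
		logger.Error("stage failed", "stage", name, "elapsed", elapsed, "error", err)
		return err
	}
	logger.Info("stage finished", "stage", name, "elapsed", elapsed)
	return nil
}

func main() {
	f, err := os.CreateTemp("", "process-*.log")
	if err != nil {
		logger.Error("cannot create log file", "error", err)
		return
	}
	defer f.Close()

	// Send every record to both the terminal and the log file.
	logger = slog.New(slog.NewTextHandler(io.MultiWriter(os.Stderr, f), nil))
	_ = timeStage("SemanticProcessor", func() error { return nil })
	_ = timeStage("StyleProcessor", func() error { return errSimulated })
}
```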

Progress Display

Running stage 1: Semantic understanding and story structuring
[██████████████████░░░░░░░░░░░░] 60% (processing chapter 3/5)

Estimated time remaining: 5 minutes
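
A text progress bar like the one above needs only a few lines of Go. In this sketch `renderBar` is a hypothetical helper that scales the completed item count to a fixed bar width using integer arithmetic.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar draws a text progress bar of the given width for done out of
// total items, followed by the percentage.
func renderBar(done, total, width int) string {
	if total <= 0 {
		total = 1
	}
	if done < 0 {
		done = 0
	}
	if done > total {
		done = total
	}
	filled := done * width / total
	percent := done * 100 / total
	return fmt.Sprintf("[%s%s] %d%%",
		strings.Repeat("█", filled),
		strings.Repeat("░", width-filled),
		percent)
}

func main() {
	fmt.Println(renderBar(3, 5, 30), "(processing chapter 3/5)")
}
```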

Extensibility

Pluggable AI Services

Providers sit behind an interface, so switching AI service providers is straightforward:

type LLMProvider interface {
    Complete(prompt string) (string, error)
}

// OpenAI implementation
type OpenAIProvider struct { ... }

// Anthropic implementation
type AnthropicProvider struct { ... }

// Local model implementation
type LocalLLMProvider struct { ... }
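
Selecting an implementation from the `provider` value in config.yaml could be done with a small registry. The sketch below is an assumption about how that wiring might look: `echoProvider` is a stand-in that makes no network calls, and `registry` and `NewLLMProvider` are hypothetical names.

```go
package main

import "fmt"

type LLMProvider interface {
	Complete(prompt string) (string, error)
}

// echoProvider is a stand-in for the real implementations; it only echoes.
type echoProvider struct{ name string }

func (p echoProvider) Complete(prompt string) (string, error) {
	return p.name + ": " + prompt, nil
}

// registry maps the provider value from config.yaml to a constructor.
var registry = map[string]func(apiKey, model string) LLMProvider{
	"openai":    func(apiKey, model string) LLMProvider { return echoProvider{"openai"} },
	"anthropic": func(apiKey, model string) LLMProvider { return echoProvider{"anthropic"} },
	"local":     func(apiKey, model string) LLMProvider { return echoProvider{"local"} },
}

// NewLLMProvider looks up the constructor for the configured provider.
func NewLLMProvider(provider, apiKey, model string) (LLMProvider, error) {
	ctor, ok := registry[provider]
	if !ok {
		return nil, fmt.Errorf("unknown LLM provider %q", provider)
	}
	return ctor(apiKey, model), nil
}

func main() {
	p, err := NewLLMProvider("openai", "placeholder-key", "gpt-4")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(p.Complete("hello"))

	_, err = NewLLMProvider("unknown", "", "")
	fmt.Println(err)
}
```

Adding a provider then means adding one registry entry, with no change to the stages that consume `LLMProvider`.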

Custom Stages

Processing stages can be added or replaced:

Add a quality-check stage
Add a pause point for manual review
Customize post-processing effects
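
If the pipeline is an ordered slice, adding a stage amounts to inserting an element at the right position. The sketch below uses the `slices` package from Go 1.21; the reduced `Stage` type and the `insertAfter` helper are hypothetical and stand in for the full `StageProcessor` interface.

```go
package main

import (
	"fmt"
	"slices"
)

// Stage is a reduced stand-in for a stage processor.
type Stage struct {
	Name string
	Run  func() error
}

// insertAfter returns a copy of stages with extra placed directly after the
// stage called name; the input slice is left unchanged.
func insertAfter(stages []Stage, name string, extra Stage) ([]Stage, error) {
	i := slices.IndexFunc(stages, func(s Stage) bool { return s.Name == name })
	if i < 0 {
		return nil, fmt.Errorf("no stage named %q", name)
	}
	return slices.Insert(slices.Clone(stages), i+1, extra), nil
}

func main() {
	pipeline := []Stage{{Name: "AssetProcessor"}, {Name: "ShotProcessor"}}
	// QualityCheck is a hypothetical custom stage.
	withCheck, err := insertAfter(pipeline, "AssetProcessor", Stage{Name: "QualityCheck"})
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range withCheck {
		fmt.Println(s.Name)
	}
}
```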

Deployment

Single Machine

┌────────────────────────────────────┐
│           Local Machine            │
│  ┌──────────────────────────────┐  │
│  │   CLI tool (Go program)      │  │
│  │   - Pipeline orchestration   │  │
│  │   - Stage processing         │  │
│  │   - AI service calls         │  │
│  └──────────────────────────────┘  │
│  ┌──────────────────────────────┐  │
│  │   Local file system          │  │
│  │   - Input novel              │  │
│  │   - Intermediate results     │  │
│  │   - Final video              │  │
│  └──────────────────────────────┘  │
│  ┌──────────────────────────────┐  │
│  │   FFmpeg                     │  │
│  └──────────────────────────────┘  │
└────────────────────────────────────┘
         │
         ▼
┌────────────────────────────────────┐
│      Third-Party AI Services       │
│  - OpenAI / Claude (LLM)           │
│  - Stable Diffusion (image)        │
│  - Runway / Pika (video)           │
│  - Azure TTS (speech)              │
└────────────────────────────────────┘

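System Requirements

Operating system: Linux / macOS / Windows
Memory: 8GB+ (16GB+ recommended)
Storage: depends on novel length, roughly 10-50GB per chapter
Network: stable internet connection (for AI service API calls)
Dependencies: FFmpeg, Go 1.21+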
✅ Third-party API calls
