
OpenSeeSimE-Full

Shared utilities and infrastructure for evaluating vision-language models on the OpenSeeSimE benchmark datasets.

This repository provides standardized tools for prompt construction, response parsing, checkpoint management, and evaluation protocols to ensure reproducible and fair comparison of VLM performance on engineering simulation visualization tasks.


About OpenSeeSimE

OpenSeeSimE is a large-scale benchmark for evaluating vision-language models on engineering simulation interpretation tasks. The benchmark consists of two comprehensive datasets covering different physics domains:

Available Datasets:

| Dataset | Domain | Examples | HuggingFace Repository |
| --- | --- | --- | --- |
| OpenSeeSimE-Structural | Structural Mechanics & FEA | ~103K | 🤗 cmudrc/OpenSeeSimE-Structural |
| OpenSeeSimE-Fluid | Computational Fluid Dynamics | ~98K | 🤗 cmudrc/OpenSeeSimE-Fluid |

What This Repository Provides:

While the full datasets are hosted on HuggingFace, this repository contains the shared utilities for working with both datasets, including prompt construction, video processing, response parsing, checkpoint management, and evaluation protocols.


Features

  • Dataset Loading: Load and filter OpenSeeSimE datasets by media type (image/video)
  • Prompt Construction: Build standardized system and user prompts for consistent evaluation
  • Video Processing: Extract frames with middle-frame-centered symmetric sampling
  • Response Parsing: Parse and validate model responses with exact-match checking
  • Evaluation: Calculate accuracy metrics overall and per question type
  • Checkpoint Management: Save and resume evaluation progress automatically

Installation & Setup

Clone the Repository

git clone https://github.com/cmudrc/OpenSeeSimE-Full.git
cd OpenSeeSimE-Full

Install Dependencies

pip install -r requirements.txt

Core dependencies include: datasets, transformers, torch, pillow, opencv-python, numpy, pandas, tqdm

Environment Configuration

Set your HuggingFace token (required for dataset access):

export HUGGING_FACE_HUB_TOKEN="hf_..."

Or login via CLI:

huggingface-cli login

Verify Setup

from utils import load_benchmark_dataset

# Load Structural dataset
dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Structural",
    media_type='image'
)
print(f"Successfully loaded {len(dataset)} examples")

Quick Start

from utils import (
    load_benchmark_dataset,
    build_system_prompt,
    build_user_prompt,
    parse_model_response,
    evaluate_response
)

# Load dataset
dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Structural",
    media_type='image'
)

# Get an example
example = dataset[0]

# Build prompts
system_prompt = build_system_prompt()
user_prompt = build_user_prompt(
    question=example['question'],
    answer_choices=example['answer_choices'],
    is_video=False
)

# Call your model
# model_response = your_model.generate(system_prompt, user_prompt, example['image'])

# Parse and evaluate
model_answer, explanation = parse_model_response(
    model_response,
    example['answer_choices']
)

is_correct = evaluate_response(
    model_answer,
    example['answer'],
    example['answer_choices']
)
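
Putting these pieces together, a complete evaluation run might look like the sketch below. The your_model.generate(...) call is a placeholder for whatever inference API you use, and the sketch assumes load_checkpoint returns an empty set and list on a first run; everything else uses the utilities described in this README.

from collections import defaultdict

from utils import (
    load_benchmark_dataset, build_system_prompt, build_user_prompt,
    parse_model_response, evaluate_response,
    load_checkpoint, save_checkpoint, cleanup_checkpoint,
)

dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Structural",
    media_type='image'
)
system_prompt = build_system_prompt()

# Resume from a previous run if a checkpoint exists
# (assumed to yield an empty set/list on the first run)
processed_indices, results = load_checkpoint("checkpoint.pkl")

for idx, example in enumerate(dataset):
    if idx in processed_indices:
        continue  # already evaluated in an earlier run

    user_prompt = build_user_prompt(
        question=example['question'],
        answer_choices=example['answer_choices'],
        is_video=False
    )

    # Placeholder: substitute your own inference call here
    model_response = your_model.generate(system_prompt, user_prompt, example['image'])

    model_answer, explanation = parse_model_response(model_response, example['answer_choices'])
    results.append({
        'question_type': example['question_type'],
        'correct': evaluate_response(model_answer, example['answer'], example['answer_choices']),
    })
    processed_indices.add(idx)

    if idx % 50 == 0:  # checkpoint periodically
        save_checkpoint("checkpoint.pkl", processed_indices, results)

cleanup_checkpoint("checkpoint.pkl")

# Accuracy overall and per question type
overall = sum(r['correct'] for r in results) / len(results)
by_type = defaultdict(list)
for r in results:
    by_type[r['question_type']].append(r['correct'])
print(f"Overall accuracy: {overall:.3f}")
for qtype, flags in sorted(by_type.items()):
    print(f"  {qtype}: {sum(flags) / len(flags):.3f}")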

Key Utilities

Dataset Loading

# Load Structural dataset
dataset = load_benchmark_dataset(dataset_name="cmudrc/OpenSeeSimE-Structural")

# Load Fluid dataset, images only
dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Fluid",
    media_type='image'
)

Prompt Construction

system_prompt = build_system_prompt()
user_prompt = build_user_prompt(question, answer_choices, is_video=False)

Video Processing

# Extract 8 frames with middle frame guaranteed
frames = extract_video_frames(video_path, num_frames=8)
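
The exact implementation lives in utils; as a mental model, middle-frame-centered symmetric sampling can be sketched with OpenCV as follows (illustrative only — index spacing details may differ):

import cv2
import numpy as np

def sample_middle_centered(video_path, num_frames=8):
    """Illustrative sketch: choose frame indices that always include the
    middle frame, spread across both halves of the clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    middle = total // 2

    half = num_frames // 2
    left = np.linspace(0, middle, half, endpoint=False)        # frames before the middle
    right = np.linspace(middle, total - 1, num_frames - half)  # middle frame onward
    indices = np.concatenate([left, right]).round().astype(int)

    frames = []
    for i in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames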

Response Parsing and Evaluation

# Parse model response
answer, explanation = parse_model_response(response_text, answer_choices)

# Evaluate against ground truth
is_correct = evaluate_response(model_answer, ground_truth, answer_choices)
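
Conceptually, parsing takes the first non-empty line as the answer and validates it against the choices with exact matching. A simplified sketch (not the repository's exact code):

def parse_response_sketch(response_text, answer_choices):
    """Simplified illustration: first line is the answer, the rest is the
    explanation; the answer must exactly match one of the choices."""
    lines = [ln.strip() for ln in response_text.strip().splitlines() if ln.strip()]
    if not lines:
        return None, ""
    answer, explanation = lines[0], " ".join(lines[1:])
    if answer in answer_choices:          # exact match
        return answer, explanation
    for choice in answer_choices:         # tolerant fallback on casing
        if answer.lower() == choice.lower():
            return choice, explanation
    return None, explanation              # no valid answer found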

Checkpoint Management

# Load checkpoint to resume
processed_indices, results = load_checkpoint("checkpoint.pkl")

# Save progress
save_checkpoint("checkpoint.pkl", processed_indices, results)

# Clean up after completion
cleanup_checkpoint("checkpoint.pkl")
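
If you want to understand what these helpers do before trusting them with a long run, the behavior is roughly the following sketch (assuming a pickle-backed file holding a set of processed indices and a list of results; the repository's implementation may differ in details):

import os
import pickle

def load_checkpoint_sketch(path):
    """Return (processed_indices, results); empty containers if no checkpoint."""
    if not os.path.exists(path):
        return set(), []
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["processed_indices"], state["results"]

def save_checkpoint_sketch(path, processed_indices, results):
    # Write to a temp file, then rename, so an interrupted save
    # cannot corrupt an existing checkpoint
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"processed_indices": processed_indices, "results": results}, f)
    os.replace(tmp, path)

def cleanup_checkpoint_sketch(path):
    if os.path.exists(path):
        os.remove(path)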

Evaluation Best Practices

  1. Use standardized prompts: Always use build_system_prompt() and build_user_prompt() for consistency
  2. Validate responses: Use parse_model_response() to ensure answers match provided choices
  3. Enable checkpointing: Save progress frequently for long evaluations
  4. Use deterministic settings: Set temperature=0.0 and do_sample=False for evaluation (see the sketch after this list)
  5. Middle-frame sampling: For videos, use middle_frame_guarantee=True (frame 100 contains maximum deformation/flow development)
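
For point 4, here is one way to get deterministic decoding with a HuggingFace transformers model; with do_sample=False generation is greedy, so sampling parameters such as temperature have no effect (hosted chat APIs instead expose temperature=0.0 directly):

import torch

def generate_deterministic(model, tokenizer, prompt, max_new_tokens=128):
    """Greedy, reproducible decoding for evaluation runs."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=False,              # greedy decoding: no sampling randomness
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)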

Dataset Information

Both datasets share the same structure:

| Field | Type | Description |
| --- | --- | --- |
| question | str | Question text |
| answer | str | Ground truth answer |
| answer_choices | List[str] | Multiple choice options |
| question_id | int | Question identifier (1-20) |
| question_type | str | Binary, Multiple Choice, or Spatial |
| media_type | str | "image" or "video" |
| image | PIL.Image | Image for image examples |
| video | str | Video path for video examples |
| file_name | str | Original file identifier |
| source_file | str | Source simulation model |

OpenSeeSimE-Structural

  • 5 structural models: Dog Bone, Hip Implant, Pressure Vessel, Thermal Beam, Wall Bracket
  • Physics: Stress analysis, deformation patterns, structural mechanics
  • Visualizations: Stress contours, displacement fields, strain distributions

OpenSeeSimE-Fluid

  • 5 fluid models: Bent Pipe, Converging Nozzle, Mixing Pipe, Heat Sink, Heat Exchanger
  • Physics: Turbulent flow, heat transfer, complex flow patterns
  • Visualizations: Velocity contours, pressure fields, streamlines, pathlines

For complete dataset details, see the HuggingFace repositories.


Standardized Prompts

The system prompt enforces structured output:

  • Line 1: Exact copy of answer from choices
  • Line 2+: Brief explanation (10-15 words)
  • No paraphrasing or summarizing of the answer

User prompt format:

{question}

Answer options:
- {choice_1}
- {choice_2}
...

Instructions:
1. First line: Provide ONLY your answer exactly as it appears in the options above.
2. Second line onwards: Provide a brief summary explaining your reasoning.

Answer:
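
If you need to replicate this format outside the repo, it is straightforward to assemble; a minimal sketch (the actual build_user_prompt may differ in details such as video-specific wording):

def build_user_prompt_sketch(question, answer_choices):
    """Illustrative reconstruction of the user prompt template above."""
    choices = "\n".join(f"- {c}" for c in answer_choices)
    return (
        f"{question}\n\n"
        f"Answer options:\n{choices}\n\n"
        "Instructions:\n"
        "1. First line: Provide ONLY your answer exactly as it appears in the options above.\n"
        "2. Second line onwards: Provide a brief summary explaining your reasoning.\n\n"
        "Answer:"
    )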

Citation

If you use the OpenSeeSimE benchmark or these utilities, please cite:

@article{ezemba2025openseesime,
  title={OpenSeeSimE: A Large-Scale Benchmark to Assess Vision-Language Model Question Answering Capabilities in Engineering Simulations},
  author={Ezemba, Jessica and Pohl, Jason and Tucker, Conrad and McComb, Christopher},
  year={2025}
}

License

MIT License - See LICENSE file for details.


Contact

Authors: Jessica Ezemba ([email protected]), Jason Pohl, Conrad Tucker, Christopher McComb
Institution: Department of Mechanical Engineering, Carnegie Mellon University

For questions or issues, open an issue on GitHub or email [email protected]


Last Updated: December 24, 2025
