Shared utilities and infrastructure for evaluating vision-language models on the OpenSeeSimE benchmark datasets.
This repository provides standardized tools for prompt construction, response parsing, checkpoint management, and evaluation protocols to ensure reproducible and fair comparison of VLM performance on engineering simulation visualization tasks.
OpenSeeSimE is a large-scale benchmark for evaluating vision-language models on engineering simulation interpretation tasks. The benchmark consists of two comprehensive datasets covering different physics domains:
Available Datasets:
| Dataset | Domain | Examples | HuggingFace Repository |
|---|---|---|---|
| OpenSeeSimE-Structural | Structural Mechanics & FEA | ~103K | 🤗 cmudrc/OpenSeeSimE-Structural |
| OpenSeeSimE-Fluid | Computational Fluid Dynamics | ~98K | 🤗 cmudrc/OpenSeeSimE-Fluid |
What This Repository Provides:
While the full datasets are hosted on HuggingFace, this repository contains the shared utilities for working with both datasets: prompt construction, video processing, response parsing, checkpoint management, and evaluation protocols.
- Dataset Loading: Load and filter OpenSeeSimE datasets by media type (image/video)
- Prompt Construction: Build standardized system and user prompts for consistent evaluation
- Video Processing: Extract frames with middle-frame-centered symmetric sampling
- Response Parsing: Parse and validate model responses with exact-match checking
- Evaluation: Calculate accuracy metrics overall and per question type
- Checkpoint Management: Save and resume evaluation progress automatically
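The evaluation step in the list above (overall and per-question-type accuracy) can be sketched as follows. This is a hedged illustration, not the repository's implementation; the `results` record shape (`question_type`, `is_correct` keys) is an assumption:

```python
from collections import defaultdict

def accuracy_by_question_type(results):
    # results: list of dicts with 'question_type' and 'is_correct' keys (assumed shape)
    overall_correct = 0
    per_type = defaultdict(lambda: [0, 0])  # type -> [correct, total]
    for r in results:
        per_type[r["question_type"]][1] += 1
        per_type[r["question_type"]][0] += r["is_correct"]
        overall_correct += r["is_correct"]
    overall = overall_correct / len(results) if results else 0.0
    return overall, {t: c / n for t, (c, n) in per_type.items()}
```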
```bash
git clone https://github.com/cmudrc/OpenSeeSimE-Full.git
cd OpenSeeSimE-Full
pip install -r requirements.txt
```

Core dependencies include: `datasets`, `transformers`, `torch`, `pillow`, `opencv-python`, `numpy`, `pandas`, `tqdm`
Set your HuggingFace token (required for dataset access):
```bash
export HUGGING_FACE_HUB_TOKEN="hf_..."
```

Or log in via the CLI:

```bash
huggingface-cli login
```

```python
from utils import load_benchmark_dataset

# Load Structural dataset
dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Structural",
    media_type='image'
)

print(f"Successfully loaded {len(dataset)} examples")
```

```python
from utils import (
    load_benchmark_dataset,
    build_system_prompt,
    build_user_prompt,
    parse_model_response,
    evaluate_response
)

# Load dataset
dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Structural",
    media_type='image'
)

# Get an example
example = dataset[0]

# Build prompts
system_prompt = build_system_prompt()
user_prompt = build_user_prompt(
    question=example['question'],
    answer_choices=example['answer_choices'],
    is_video=False
)

# Call your model
# model_response = your_model.generate(system_prompt, user_prompt, example['image'])

# Parse and evaluate
model_answer, explanation = parse_model_response(
    model_response,
    example['answer_choices']
)
is_correct = evaluate_response(
    model_answer,
    example['answer'],
    example['answer_choices']
)
```

```python
# Load Structural dataset
dataset = load_benchmark_dataset(dataset_name="cmudrc/OpenSeeSimE-Structural")

# Load Fluid dataset, images only
dataset = load_benchmark_dataset(
    dataset_name="cmudrc/OpenSeeSimE-Fluid",
    media_type='image'
)
```

```python
system_prompt = build_system_prompt()
user_prompt = build_user_prompt(question, answer_choices, is_video=False)
```

```python
# Extract 8 frames with middle frame guaranteed
frames = extract_video_frames(video_path, num_frames=8)
```

```python
# Parse model response
answer, explanation = parse_model_response(response_text, answer_choices)

# Evaluate against ground truth
is_correct = evaluate_response(model_answer, ground_truth, answer_choices)
```

```python
# Load checkpoint to resume
processed_indices, results = load_checkpoint("checkpoint.pkl")

# Save progress
save_checkpoint("checkpoint.pkl", processed_indices, results)

# Clean up after completion
cleanup_checkpoint("checkpoint.pkl")
```

- Use standardized prompts: Always use `build_system_prompt()` and `build_user_prompt()` for consistency
- Validate responses: Use `parse_model_response()` to ensure answers match provided choices
- Enable checkpointing: Save progress frequently for long evaluations
- Use deterministic settings: Set `temperature=0.0` and `do_sample=False` for evaluation
- Middle-frame sampling: For videos, use `middle_frame_guarantee=True` (frame 100 contains maximum deformation/flow development)
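`extract_video_frames` handles the actual decoding; the index selection behind middle-frame-centered symmetric sampling might look like the sketch below. `symmetric_frame_indices` is a hypothetical helper, not the repository's implementation:

```python
def symmetric_frame_indices(total_frames, num_frames=8):
    # Hypothetical sketch: pick frame indices centered on the middle frame,
    # spaced symmetrically toward both ends of the clip.
    middle = total_frames // 2
    step = max(1, total_frames // num_frames)
    half = num_frames // 2
    indices = [middle + (i - half) * step for i in range(num_frames)]
    # Clamp to the valid frame range while preserving order
    return [min(max(idx, 0), total_frames - 1) for idx in indices]
```

For a 200-frame clip with `num_frames=8`, this yields indices spaced 25 frames apart with frame 100 (the middle frame) always included.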
Both datasets share the same structure:
| Field | Type | Description |
|---|---|---|
| `question` | `str` | Question text |
| `answer` | `str` | Ground truth answer |
| `answer_choices` | `List[str]` | Multiple choice options |
| `question_id` | `int` | Question identifier (1-20) |
| `question_type` | `str` | Binary, Multiple Choice, or Spatial |
| `media_type` | `str` | "image" or "video" |
| `image` | `PIL.Image` | Image for image examples |
| `video` | `str` | Video path for video examples |
| `file_name` | `str` | Original file identifier |
| `source_file` | `str` | Source simulation model |
- 5 structural models: Dog Bone, Hip Implant, Pressure Vessel, Thermal Beam, Wall Bracket
- Physics: Stress analysis, deformation patterns, structural mechanics
- Visualizations: Stress contours, displacement fields, strain distributions
- 5 fluid models: Bent Pipe, Converging Nozzle, Mixing Pipe, Heat Sink, Heat Exchanger
- Physics: Turbulent flow, heat transfer, complex flow patterns
- Visualizations: Velocity contours, pressure fields, streamlines, pathlines
For complete dataset details, see the HuggingFace repositories.
The system prompt enforces structured output:
- Line 1: Exact copy of answer from choices
- Line 2+: Brief explanation (10-15 words)
- No paraphrasing or summarizing of the answer
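The repository's `parse_model_response()` is the canonical parser; as a sketch of the exact-match protocol described above, it might behave roughly like this (`parse_first_line_answer` is a hypothetical name):

```python
def parse_first_line_answer(response_text, answer_choices):
    # Hypothetical sketch of the exact-match protocol: the first non-empty
    # line must match one of the provided choices verbatim; the remaining
    # lines are treated as the explanation.
    lines = [ln.strip() for ln in response_text.strip().splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines:
        return None, ""
    answer = lines[0] if lines[0] in answer_choices else None
    explanation = " ".join(lines[1:])
    return answer, explanation
```

A paraphrased first line (e.g. "It is tension.") fails the exact-match check and yields no answer, which is why the system prompt forbids paraphrasing.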
User prompt format:
```
{question}
Answer options:
- {choice_1}
- {choice_2}
...
Instructions:
1. First line: Provide ONLY your answer exactly as it appears in the options above.
2. Second line onwards: Provide a brief summary explaining your reasoning.
Answer:
```
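Filling the template above is straightforward; a hypothetical re-creation (the repository's `build_user_prompt()` is the canonical version, and `format_user_prompt` is an assumed name):

```python
def format_user_prompt(question, answer_choices):
    # Hypothetical sketch that fills the documented user-prompt template;
    # use the repository's build_user_prompt() in real evaluations.
    options = "\n".join(f"- {choice}" for choice in answer_choices)
    return "\n".join([
        question,
        "Answer options:",
        options,
        "Instructions:",
        "1. First line: Provide ONLY your answer exactly as it appears in the options above.",
        "2. Second line onwards: Provide a brief summary explaining your reasoning.",
        "Answer:",
    ])
```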
If you use the OpenSeeSimE benchmark or these utilities, please cite:
```bibtex
@article{ezemba2024opensesime,
  title={OpenSeeSimE: A Large-Scale Benchmark to Assess Vision-Language Model Question Answering Capabilities in Engineering Simulations},
  author={Ezemba, Jessica and Pohl, Jason and Tucker, Conrad and McComb, Christopher},
  year={2025}
}
```

MIT License - See LICENSE file for details.
Authors: Jessica Ezemba ([email protected]), Jason Pohl, Conrad Tucker, Christopher McComb
Institution: Department of Mechanical Engineering, Carnegie Mellon University
For questions or issues, open an issue on GitHub or email [email protected]
Last Updated: December 24, 2025