Skip to content

fix: buffer safety and thread safety in util.cu#377

Open
randomizedcoder wants to merge 2 commits into
NVIDIA:masterfrom
randomizedcoder:sa/01-sprintf-localtime
Open

fix: buffer safety and thread safety in util.cu#377
randomizedcoder wants to merge 2 commits into
NVIDIA:masterfrom
randomizedcoder:sa/01-sprintf-localtime

Conversation

@randomizedcoder

Copy link
Copy Markdown

Static analysis findings

cpplint (Level 5) and flawfinder (Level 2, CWE-120) independently flagged src/util.cu for unsafe C string operations:

Finding Tool Location Severity
sprintf() with no buffer size limit cpplint, flawfinder util.cu:426,461-466 (7 calls) CWE-120 buffer overflow
strcpy() with no bounds checking cpplint, flawfinder util.cu:576 CWE-120 buffer overflow
localtime() returns shared static data cpplint util.cu:280 CWE-362 race condition

These were the top three findings across all 10 static analysis tools run on the nccl-tests codebase (see static-analysis/README.md on the static-analysis branch).

Why these matter

sprintf in getFloatStr() can overflow its buffer. The function is called with tight buffers (char timeStr[8] for width=7), but sprintf treats width as a minimum field width, not a maximum. With extreme values (e.g., bandwidth > 10M GB/s), sprintf writes past the buffer end. The adversarial test (test_sprintf_overflow_exploit) demonstrates this: a canary byte placed at buf[8] is silently corrupted.

strcpy at line 576 copies a fixed string into the middle of a buffer with no bounds checking. While the current usage is safe by construction (the target offset is always MAX_LINE-5 and the string is 4 bytes), the pattern is fragile — any change to MAX_LINE or the string could introduce an overflow.

localtime() returns a pointer to a process-wide static struct tm. The function formatNow() is called from writeBenchmarkLineBody(), which runs per-thread. Concurrent calls corrupt each other's results. The adversarial test (test_localtime_race_exploit) demonstrates year corruption across 8 threads running 10K iterations each.

Git history

All affected code in src/util.cu traces to a single commit:

Commit Date Author Description
00f5281 2025-10-17 David Addison "Add support for JSON output to perf test framework"

This was a large commit that introduced the JSON output infrastructure, the getFloatStr() helper, the line-truncation logic, and the formatNow() timestamp function. The sprintf/strcpy/localtime patterns were carried over from the kind of quick C idioms that are natural in a test harness but become a concern as the codebase grows. util.cu itself didn't exist before this commit — the file was extracted to support the new output framework.

Note that src/util.cu has had 8 commits from 5 contributors since then. The unsafe patterns have persisted through all of those changes, suggesting they haven't been noticed rather than being a deliberate choice.

Changes

Commit 1: Tests (TDD — intentionally FAIL before fix)

  • tests/c/test_util_safety.c — 11 test cases:
    • Source verification (scans util.cu for unsafe patterns)
    • snprintf truncation, return value, and buffer size tests
    • Adversarial exploit proofs: test_sprintf_overflow_exploit (canary corruption), test_strcpy_overflow_exploit (canary corruption), test_localtime_race_exploit (thread corruption)
    • memcpy truncation marker safety
    • localtime_r basic + thread safety (8 threads × 10K iterations)

Commit 2: Fixes

  • getFloatStr(): Add size parameter, replace 6x sprintfsnprintf
  • writeBenchmarkLinePreamble(): sprintfsnprintf with sizeof(rootName)
  • Line 576: strcpymemcpy + explicit null-termination
  • formatNow(): localtimelocaltime_r with stack-allocated struct tm

Test plan

  • make -C tests/c test — 11/11 pass
  • Verify commit 1 shows 10/11 pass (source-verification intentionally fails)
  • Verify commit 2 shows 11/11 pass (all fixed)
  • Adversarial tests prove the vulnerabilities are real:
    • test_sprintf_overflow_exploit — canary byte at buf[8] corrupted by sprintf
    • test_strcpy_overflow_exploit — canary byte at buf[5] corrupted by strcpy
    • test_localtime_race_exploit — year corruption across 8 concurrent threads

🤖 Generated with Claude Code

randomizedcoder and others added 2 commits April 28, 2026 14:30
Table-driven tests for three classes of unsafe C function usage
found by static analysis (cpplint, flawfinder, semgrep):

  - sprintf overflow: demonstrates buffer overrun when output exceeds
    the caller-provided buffer (getFloatStr uses 7-8 byte buffers)
  - strcpy overflow: demonstrates write past buffer end vs safe memcpy
  - localtime race: 8-thread concurrent test shows data corruption
    when threads share localtime's static buffer

Source-verification test intentionally FAILS at this commit to
demonstrate the unsafe patterns exist in util.cu before the fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: randomizedcoder dave.seddon.ca@gmail.com <dave.seddon.ca@gmail.com>
Static analysis (cpplint, flawfinder, semgrep) flagged three classes
of unsafe C function usage in util.cu:

  - sprintf: 8 calls without buffer size limit (CWE-120)
  - strcpy: 1 call without bounds checking (CWE-120)
  - localtime: 1 call using thread-unsafe static buffer (CWE-362)

Replace sprintf with snprintf (add size parameter to getFloatStr),
strcpy with memcpy + explicit null-termination (compile-time constant
sizes), and localtime with localtime_r (stack-allocated struct tm).

All 11 tests now PASS including source-verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: randomizedcoder dave.seddon.ca@gmail.com <dave.seddon.ca@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant