fix: buffer safety and thread safety in util.cu#377
Open
randomizedcoder wants to merge 2 commits into
Open
Conversation
Table-driven tests for three classes of unsafe C function usage
found by static analysis (cpplint, flawfinder, semgrep):
- sprintf overflow: demonstrates buffer overrun when output exceeds
the caller-provided buffer (getFloatStr uses 7-8 byte buffers)
- strcpy overflow: demonstrates write past buffer end vs safe memcpy
- localtime race: 8-thread concurrent test shows data corruption
when threads share localtime's static buffer
Source-verification test intentionally FAILS at this commit to
demonstrate the unsafe patterns exist in util.cu before the fix.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: randomizedcoder dave.seddon.ca@gmail.com <dave.seddon.ca@gmail.com>
Static analysis (cpplint, flawfinder, semgrep) flagged three classes of unsafe C function usage in util.cu: - sprintf: 8 calls without buffer size limit (CWE-120) - strcpy: 1 call without bounds checking (CWE-120) - localtime: 1 call using thread-unsafe static buffer (CWE-362) Replace sprintf with snprintf (add size parameter to getFloatStr), strcpy with memcpy + explicit null-termination (compile-time constant sizes), and localtime with localtime_r (stack-allocated struct tm). All 11 tests now PASS including source-verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: randomizedcoder dave.seddon.ca@gmail.com <dave.seddon.ca@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Static analysis findings
cpplint (Level 5) and flawfinder (Level 2, CWE-120) independently flagged
src/util.cufor unsafe C string operations:sprintf()with no buffer size limitutil.cu:426,461-466(7 calls)strcpy()with no bounds checkingutil.cu:576localtime()returns shared static datautil.cu:280These were the top three findings across all 10 static analysis tools run on the nccl-tests codebase (see
static-analysis/README.mdon thestatic-analysisbranch).Why these matter
sprintfingetFloatStr()can overflow its buffer. The function is called with tight buffers (char timeStr[8]forwidth=7), butsprintftreatswidthas a minimum field width, not a maximum. With extreme values (e.g., bandwidth > 10M GB/s),sprintfwrites past the buffer end. The adversarial test (test_sprintf_overflow_exploit) demonstrates this: a canary byte placed atbuf[8]is silently corrupted.strcpyat line 576 copies a fixed string into the middle of a buffer with no bounds checking. While the current usage is safe by construction (the target offset is alwaysMAX_LINE-5and the string is 4 bytes), the pattern is fragile — any change toMAX_LINEor the string could introduce an overflow.localtime()returns a pointer to a process-wide staticstruct tm. The functionformatNow()is called fromwriteBenchmarkLineBody(), which runs per-thread. Concurrent calls corrupt each other's results. The adversarial test (test_localtime_race_exploit) demonstrates year corruption across 8 threads running 10K iterations each.Git history
All affected code in
src/util.cutraces to a single commit:00f5281This was a large commit that introduced the JSON output infrastructure, the
getFloatStr()helper, the line-truncation logic, and theformatNow()timestamp function. Thesprintf/strcpy/localtimepatterns were carried over from the kind of quick C idioms that are natural in a test harness but become a concern as the codebase grows.util.cuitself didn't exist before this commit — the file was extracted to support the new output framework.Note that
src/util.cuhas had 8 commits from 5 contributors since then. The unsafe patterns have persisted through all of those changes, suggesting they haven't been noticed rather than being a deliberate choice.Changes
Commit 1: Tests (TDD — intentionally FAIL before fix)
tests/c/test_util_safety.c— 11 test cases:test_sprintf_overflow_exploit(canary corruption),test_strcpy_overflow_exploit(canary corruption),test_localtime_race_exploit(thread corruption)Commit 2: Fixes
getFloatStr(): Addsizeparameter, replace 6xsprintf→snprintfwriteBenchmarkLinePreamble():sprintf→snprintfwithsizeof(rootName)strcpy→memcpy+ explicit null-terminationformatNow():localtime→localtime_rwith stack-allocatedstruct tmTest plan
make -C tests/c test— 11/11 passtest_sprintf_overflow_exploit— canary byte at buf[8] corrupted by sprintftest_strcpy_overflow_exploit— canary byte at buf[5] corrupted by strcpytest_localtime_race_exploit— year corruption across 8 concurrent threads🤖 Generated with Claude Code