UBSAN reports writing to misaligned address. #279

@Please-just-dont

Describe the bug

When I construct a string with a char*, the string is initialised like this:

void init(string_view other) noexcept(false) {
    // "other" is the string_view I passed in the constructor
    sz_ptr_t start; // after allocating memory start = 0x7bfff5900029
    if (!_with_alloc(
            [&](sz_alloc_type &alloc) { return (start = sz_string_init_length(&string_, other.size(), &alloc)); }))
        throw std::bad_alloc();
    sz_copy(start, (sz_cptr_t)other.data(), other.size());
}

The allocation returns address 0x7b......29, which seems strange because it is not 8-byte aligned. Then the first copy to "target" here:


#if SZ_USE_MISALIGNED_LOADS
    while (length >= 8) *(sz_u64_t *)target = *(sz_u64_t const *)source, target += 8, source += 8, length -= 8;
#endif

results in the UBSAN warning:

runtime error: store to misaligned address 0x7bfff5900029 for type 'sz_u64_t' (aka 'unsigned long'), which requires 8 byte alignment
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
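
For reference, a minimal sketch of how the same 8-bytes-at-a-time copy can be written without dereferencing a misaligned `sz_u64_t *`: each word goes through `std::memcpy`, which has no alignment requirement. The name `copy_words_unaligned` is hypothetical and not part of StringZilla. On x86-64, mainstream compilers typically lower these fixed-size `memcpy` calls to plain unaligned moves, but that is an optimisation rather than a guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not StringZilla's API: copies `length` bytes from
// `source` to `target` eight bytes at a time without ever forming a
// misaligned std::uint64_t access.
inline void copy_words_unaligned(char *target, char const *source, std::size_t length) {
    while (length >= 8) {
        std::uint64_t word;
        std::memcpy(&word, source, sizeof(word)); // alignment-agnostic load
        std::memcpy(target, &word, sizeof(word)); // alignment-agnostic store
        target += 8, source += 8, length -= 8;
    }
    while (length--) *target++ = *source++; // byte-wise tail
}
```

Calling it with a deliberately odd target address, such as `buffer + 1`, should not trigger the UBSAN alignment check, since no `std::uint64_t` lvalue is ever formed at that address.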

There are a number of things I don't understand about this.

  1. I'm on an x86_64 platform, so unaligned reads and stores should be supported, according to the comment in stringzilla.h:

/**
 *  @brief  A misaligned load can be - trying to fetch eight consecutive bytes from an address
 *          that is not divisible by eight. On x86 enabled by default. On ARM it's not.
 *
 *  Most platforms support it, but there is no industry standard way to check for those.
 *  This value will mostly affect the performance of the serial (SWAR) backend.
 */
#ifndef SZ_USE_MISALIGNED_LOADS
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#define SZ_USE_MISALIGNED_LOADS (1) // true or false

  2. This custom copy function exists because memcpy(NULL) is UB, so it copies 8 bytes at a time, or 1 byte at a time, in a loop. Wouldn't it be more efficient to check for null and call memcpy? memcpy typically performs better than a manual loop.
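
The "check for null and memcpy" idea from the second point could look roughly like the sketch below. The wrapper name `copy_guarded` is hypothetical and not part of StringZilla. It assumes the only concern is a null pointer arriving with an empty range, and it does not address whatever other reasons the library may have for its own loop.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper, not part of StringZilla: returns early when there is
// nothing to copy or a pointer is null, so std::memcpy never receives null.
inline void copy_guarded(char *target, char const *source, std::size_t length) {
    if (length == 0 || target == nullptr || source == nullptr) return;
    std::memcpy(target, source, length); // no alignment requirement on either pointer
}
```

Because `std::memcpy` accepts arbitrarily aligned pointers, this form also avoids the misaligned-store report without needing a separate SWAR path.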

Steps to reproduce

Construct a StringZilla string as a static global variable, initialised during dynamic initialisation, passing a string literal (const char*) to the constructor of the string class.

Expected behavior

???

StringZilla version

3.12.6

Operating System

Linux Mint

Hardware architecture

x86

Which interface are you using?

C++ bindings

Contact Details

No response

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Edit: I have posted a question on Stack Overflow here. Apparently it is undefined behaviour, and apparently the address is 0x.....9 because that's where the SSO buffer starts? Wouldn't it just be easier to use memcpy?
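
To illustrate how an inline (SSO) buffer can end up at an address ending in 9, here is a sketch with a hypothetical layout. The struct `sso_string_sketch` and its field sizes are assumptions made for illustration, and StringZilla's real `sz_string_t` may be laid out differently. The point is only that a character buffer placed after an 8-byte pointer and a 1-byte length starts at offset 9, so it is never 8-byte aligned when the object itself is.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical small-string layout for illustration only; the real
// sz_string_t may differ. A pointer and a one-byte length precede the buffer.
struct sso_string_sketch {
    char *start;         // points at `chars` while the string is stored inline
    std::uint8_t length; // one-byte length for the inline case
    char chars[23];      // inline character buffer, alignment 1
};

// On typical 64-bit ABIs the pointer takes 8 bytes and the length 1 byte,
// so the inline buffer begins at offset 9.
static_assert(sizeof(char *) != 8 || offsetof(sso_string_sketch, chars) == 9,
              "inline buffer is expected right after pointer and length");
```

Under this assumed layout, an object at an 8-byte-aligned address ending in 0x20 would have its buffer at an address ending in 0x29, which is consistent with the address in the report.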


Labels: bug (Something isn't working)
