sahil-time/distributed_systems
To compile and run "seq_consistency_poc.c":

gcc -Wall seq_consistency_poc.c -o seq_c_poc -pthread
/*
CONCURRENT & PARALLEL PROGRAMMING REF:
    1. Dmitry Vyukov => https://www.1024cores.net/home
    2. Jeff Preshing => https://preshing.com/archives/
    3. Bartosz Milewski  => https://bartoszmilewski.com/2008/11/11/who-ordered-sequential-consistency/
    4. Maurice Herlihy => https://www.youtube.com/playlist?list=PLbsY-4I8oat9o7p4re3308L4uk0YJe8ez
    5. https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
    6. Martin Kleppman => https://www.youtube.com/watch?v=D5iCl12MuRw
    7. https://www.youtube.com/watch?v=YmhmsHE7be4&t=613s [ Sequential Consistency ]

- https://learn.microsoft.com/en-us/windows/win32/dxtecharts/lockless-programming?redirectedfrom=MSDN
- https://www.youtube.com/watch?v=5sZo3SrLrGA&t=2500s [ https://ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/ ]
- https://www.reddit.com/r/programming/comments/to192/memory_reordering_caught_in_the_act/
- https://www.youtube.com/watch?v=5sZo3SrLrGA&t=2500s
*/

/*
SEQUENTIAL CONSISTENCY

REFS:
    1. https://www.youtube.com/watch?v=YmhmsHE7be4&t=613s [ Sequential Consistency ]
    2. https://www.youtube.com/watch?v=D5iCl12MuRw
    3. https://github.com/CoffeeBeforeArch/CoffeeBeforeArch.github.io/blob/master/_posts/2020-11-29-hardware-memory-ordering.md

- Once the threads finish executing, we want to see whether there EXISTS a single
    total order of all operations that (a) keeps each thread's operations in its
    program order, and (b) makes every READ return the value of the most recent
    WRITE before it. If such a LEGAL Total Order exists, the execution is
    Sequentially Consistent.
    For e.g.

    Thread1:        WRITE(X,1)
    Thread2:                    WRITE(X,2)  READ(X,1)
    Legal Total-Order:  WRITE(X,2)  WRITE(X,1)  READ(X,1)

    The ONLY way READ can see X as 1 is IF Thread1's WRITE(X,1) is JUST BEFORE it.
    And since WRITE(X,2) still precedes READ(X,1), Thread2's program order is
    respected, so this execution is Sequentially Consistent.

    HOW ABOUT the case below, where BOTH reads return 0:

    Thread1:    WRITE(X,1)  READ(Y,0)
    Thread2:    WRITE(Y,1)  READ(X,0)

    Try to find a Total-Order for the above :D

    The only total orders in which both reads return the observed values are the
    ones where each READ precedes the WRITE to the same variable:

    READ(X,0), WRITE(X,1), READ(Y,0), WRITE(Y,1) OR
    READ(X,0), READ(Y,0), WRITE(X,1), WRITE(Y,1) OR
    READ(X,0), READ(Y,0), WRITE(Y,1), WRITE(X,1) OR
    READ(Y,0), WRITE(Y,1), READ(X,0), WRITE(X,1) OR
    READ(Y,0), READ(X,0), WRITE(X,1), WRITE(Y,1) OR
    READ(Y,0), READ(X,0), WRITE(Y,1), WRITE(X,1)

    But EVERY one of these violates the program order of at least one thread
    (e.g. the first puts Thread2's READ(X,0) before its WRITE(Y,1)). So no legal
    total order exists, and this outcome does NOT satisfy Sequential Consistency.

- The memory reordering that we talk about here is BECAUSE of the STORE buffers.
    The CPU executes its instructions in program order... BUT since STORE's are
    first LOCALLY observable (in the store buffer) and only later become GLOBALLY
    observable, the LOAD's "SEEM" to be executed BEFORE the STORE's from the
    Memory POV. BUT the CPU issued them sequentially.

MIKE HAERTEL:

    I'm a CPU architect who has worked on x86 for both AMD and Intel in the past.

    Although the original author's explanation ("x86/64 processors ... are allowed to reorder machine instructions") is technically correct, it is misleading. In particular, the observed apparent re-ordering has nothing to do with out-of-order execution in the CPU core. X86 processors take great pains to ensure that out-of-order execution is not observable from other CPUs interacting only through regular cacheable shared memory. (Out-of-order execution is observable using certain non-cacheable memory types.)

    What's really happening is this:

    Each x86 CPU (behaves as if it) executes its loads in order with respect to its other loads, and (as if it) executes its stores in order with respect to other stores. Each CPU also behaves as if it executes its loads in order with respect to its own stores.

    What really happens when a CPU "executes" a store is that it puts the store into a (per-CPU) queue of pending stores. The contents of this queue are not visible to other CPUs. Sometime later, contents of the queue are actually written, in order, to the coherent memory image shared by all CPUs.

    Load execution snoops the contents of the local CPU's store queue (but only those stores that are logically older than the load in program order). This snooping of its own store queue is how the CPU appears to execute its own loads in order with respect to its own stores.

    So logically speaking, every CPU is executing both its loads and its stores in order; the only reason anything appears to happen out of order is that store "execution" is not an atomic event: it happens in two stages, with the store becoming first locally and then later globally observable.

    So in fact x86 memory accesses are actually pretty strongly ordered.

    For example, suppose you have a "mutator" CPU that is continually incrementing a memory location, and an "observer" CPU that is continually loading from the same memory location.

    On x86, the observer will see a stream of load results that is monotonically non-decreasing.

    On a more weakly-ordered CPU architecture, such as ARM, the observer might actually see a stream of load results in which some pair of consecutive loads return decreasing values.

    I'm splitting hairs now, but I think it would be more accurate to characterize the original author's observed results as "transitory non-coherence" than "memory reordering".
*/

/*
MARTIN KLEPPMAN:
    https://www.youtube.com/watch?v=D5iCl12MuRw

*/

/*
-  Real-time programming deals with determinism. Meaning we do NOT want any
   construct that can take an indeterminate amount of time, for e.g. a heap memory
   allocation, since it may make a system call. Or a mutex, because it can cause
   priority inversion.
- https://moodycamel.com/blog/2013/a-fast-lock-free-queue-for-c++
- Lock-free programming is a thread-safe paradigm where, even in case of contention,
  the system as a whole will still advance.
  Wait-free takes this a step further: each thread can always advance regardless
  of what the others are doing.


- The compiler is allowed to reorder LOAD's and STORE's as long as the result appears
  the same on that thread. This is fine for single-threaded code, but when another
  thread comes into play, this reordering can become observable.

- In fact you CAN force the compiler to NOT reorder the instructions, but then the CPU
  can still do it. Most modern CPU's do NOT obey sequential consistency.
*/

/*
 * This code is a POC to show that the CPU re-orders instructions during normal
 * execution! We disable Compiler re-ordering, so that the instructions in memory
 * are in the order we expect. However, the CPU will re-order them while executing!
 *
 * We are NOT talking about race conditions here! Race conditions do NOT imply
 * re-ordering of instructions!
 *
 * Example used here:
 *
 *      thread1                  thread2
 * 1:   X = 1               3:   Y = 1
 * 2:   read Y              4:   read X
 *
 * Intuitively, we always assume that the instructions are Sequentially Consistent!
 * That means that "read Y" is ALWAYS executed after "X = 1". This seems like a given!
 * However we will see that this is NOT the case!
 *
 * If a system were Sequentially Consistent, what would be the possible values of X & Y
 * at the end of the execution?
 *
 * All possible Interleavings in a Sequentially Consistent System:
 * 
 * [ 1,2,3,4 ]      [ 1,3,2,4 ]     [ 1,3,4,2 ]     [ 3,1,2,4 ]     [ 3,4,1,2 ]     [ 3,1,4,2 ]
 * X = 1, Y = 0     X = 1, Y = 1    X = 1, Y = 1    X = 1, Y = 1    X = 0, Y = 1    X = 1, Y = 1
 *
 * In all the above Interleavings, 2 follows 1 and 4 follows 3!
 * 
 * As you see that there is NO case where X = 0 and Y = 0 in a Seq Consistent system!
 *
 * We will prove that the processors that we work on, like x86 or any commercial processor,
 * do NOT guarantee Sequential Consistency!
 *
 * The issue ONLY comes up if the threads execute on different CPUs. When we say that the
 * instructions are OUT OF ORDER, it is relative to the observer. On a single CPU, execution
 * is ALWAYS Sequentially Consistent. But when you look at it from outside, e.g. from the
 * POV of memory, you can see a LOAD happen before a STORE, because the STORE is still
 * sitting in the Write Buffer.
 *
 * UT RESULTS:
 *     distributed_systems > ./seq_c_poc 
 *     Number of processors: 16
 *     1 reorders detected after 9754303 iterations
 *     2 reorders detected after 13322298 iterations
 * 
 * sched_getcpu() gives the CPU on which the current thread is running! If you pin both
 * threads to the same CPU (set their thread affinity), the execution trivially becomes
 * sequentially consistent!
 *
 */
