How hardware-level caches affect programs

Three-level cache

JVM: write combining

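When the CPU modifies a value, it first tries to write the data to L1. If that write misses, the CPU goes down to the next level and writes to L2. Obtaining the L2 cache line takes a comparatively long time, and the variable may be modified again in the meantime, so the CPU uses a technique called write combining: the pending writes to the same cache line are collected and written out together in one operation.
Write combining relies on dedicated buffers (WC buffers). Each buffer covers one cache line, and a CPU has only a few of them; the test below assumes four are available, as on some Intel CPUs. If a loop writes to no more than four distinct cache lines at a time, the writes benefit from write combining; if it writes to more than four, the CPU has to wait for a buffer to drain and the benefit is lost.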

final class WriteCombining {

   private static final int ITERATIONS = Integer.MAX_VALUE;
   private static final int ITEMS = 1 << 24;
   private static final int MASK = ITEMS - 1;

   private static final byte[] arrayA = new byte[ITEMS];
   private static final byte[] arrayB = new byte[ITEMS];
   private static final byte[] arrayC = new byte[ITEMS];
   private static final byte[] arrayD = new byte[ITEMS];
   private static final byte[] arrayE = new byte[ITEMS];
   private static final byte[] arrayF = new byte[ITEMS];

   public static void main(final String[] args) {

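       // Note: the elapsed nanoseconds are divided by 10000 before printing,
       // so the printed numbers are in units of 10 microseconds despite the "(ns)" label.
       for (int i = 1; i <= 3; i++) {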
           System.out.println(i + " SingleLoop duration (ns) = " + runCaseOne()/10000);
           System.out.println(i + " SplitLoop  duration (ns) = " + runCaseTwo()/10000);
       }
   }

   public static long runCaseOne() {
       long start = System.nanoTime();
       int i = ITERATIONS;

       while (--i != 0) {
           int slot = i & MASK;
           byte b = (byte) i;
           arrayA[slot] = b;
           arrayB[slot] = b;
           arrayC[slot] = b;
           arrayD[slot] = b;
           arrayE[slot] = b;
           arrayF[slot] = b;
       }
       return System.nanoTime() - start;
   }

   public static long runCaseTwo() {
       long start = System.nanoTime();
       int i = ITERATIONS;
       while (--i != 0) {
           int slot = i & MASK;
           byte b = (byte) i;
           arrayA[slot] = b;
           arrayB[slot] = b;
           arrayC[slot] = b;
       }
       i = ITERATIONS;
       while (--i != 0) {
           int slot = i & MASK;
           byte b = (byte) i;
           arrayD[slot] = b;
           arrayE[slot] = b;
           arrayF[slot] = b;
       }
       return System.nanoTime() - start;
   }
}

1 SingleLoop duration (ns) = 428328
1 SplitLoop  duration (ns) = 415759

runCaseTwo

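In runCaseTwo, each loop writes one byte (note that b occupies a single byte) into each of three arrays, so it touches three distinct cache lines per iteration. Assuming the CPU has four write-combining buffers, all three writes fit into the buffers and can be combined.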

runCaseOne

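In runCaseOne, each iteration writes one byte into each of six arrays, so it touches six distinct cache lines. Assuming the CPU has four write-combining buffers, the first four writes fill the buffers and the remaining two have to wait until a buffer has been flushed to the cache. The same happens on every following iteration, and so on, which is why this version is slower. In the run shown above the gap is small: 428328 versus 415759, about 3%.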

JVM: cache lines, cache alignment, and false sharing

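Registers are extremely fast: roughly 100 times faster than main memory and on the order of 10^6 times faster than a hard disk.
So when the CPU needs data it looks in the registers first; if the data is not there, it checks L1, L2, and L3 in turn.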

Cache line

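So does the system read exactly the bytes it needs and nothing more? No: because of cache lines, it reads more than that. For example, when it reads a 4-byte int, it loads the whole 64-byte block that contains those 4 bytes, so the other 60 bytes come along as well; every read moves 64 bytes (the usual size on x86 CPUs). That block is a cache line.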
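The arithmetic behind that description can be sketched as follows. This is only an illustration, assuming 64-byte lines; the class name `CacheLineMath` and its methods are made up for this example, and the addresses are plain numbers rather than real pointers.

```java
final class CacheLineMath {
    // Assumed line size; 64 bytes is common on x86 but not universal.
    static final long LINE_SIZE = 64;

    // First byte address of the cache line that contains addr.
    static long lineStart(long addr) {
        return addr - (addr % LINE_SIZE);
    }

    // True when both addresses fall into the same cache line.
    static boolean sameLine(long a, long b) {
        return lineStart(a) == lineStart(b);
    }

    public static void main(String[] args) {
        long intAddr = 100; // hypothetical address of a 4-byte int
        long start = lineStart(intAddr);
        System.out.println("reading the int at " + intAddr
                + " loads bytes " + start + ".." + (start + LINE_SIZE - 1));
        System.out.println("100 and 120 share a line: " + sameLine(100, 120));
        System.out.println("100 and 130 share a line: " + sameLine(100, 130));
    }
}
```

Two values whose addresses map to the same line start are always loaded together, which is the situation the scenario in the following part relies on.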

[Figure: cache line]

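Scenario:
When a multi-core CPU reads data:

core1 reads x=1; because of the cache line, it also has to pull y=2 and the rest of the neighbouring data into core1.
core2 reads y=2; because of the cache line, it also has to pull x=1 and the rest of the neighbouring data into core2.
core1 then performs a computation on x, so that x=11.
core2 then performs a computation on y, so that y=22.

The data now held by core1 and core2 is as follows: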

[Figure: false sharing]

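The two cores now hold different copies of the same cache line: core1 has x=11 and y=2, while core2 has x=1 and y=22. The copies are inconsistent even though the two cores never touched the same variable, and the hardware has to reconcile them on every write. This is the false sharing problem.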
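The cost of that reconciliation can be made visible with a toy model. The sketch below is not how real hardware is implemented; it assumes a simple write-invalidate rule (a write by one core makes the other core's copy of that line unusable) and counts how often a core has to reload a line. `FalseSharingSim` and `countReloads` are hypothetical names.

```java
final class FalseSharingSim {
    // core0 repeatedly writes x, core1 repeatedly writes y.
    // lineOfX and lineOfY are hypothetical line numbers and must be 0 or 1.
    static int countReloads(int lineOfX, int lineOfY, int rounds) {
        // valid[core][line]: does this core currently hold a usable copy of the line?
        boolean[][] valid = new boolean[2][2];
        int reloads = 0;
        for (int r = 0; r < rounds; r++) {
            for (int core = 0; core < 2; core++) {
                int line = (core == 0) ? lineOfX : lineOfY;
                if (!valid[core][line]) {
                    reloads++;                 // miss: the line must be fetched again
                    valid[core][line] = true;
                }
                valid[1 - core][line] = false; // the write invalidates the other core's copy
            }
        }
        return reloads;
    }

    public static void main(String[] args) {
        System.out.println("x and y in the same line: " + countReloads(0, 0, 1000) + " reloads");
        System.out.println("x and y in separate lines: " + countReloads(0, 1, 1000) + " reloads");
    }
}
```

In this model every write misses when x and y share a line, while with separate lines each core loads its line once and then keeps it.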

Cache alignment can improve efficiency

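Knowing that cache lines exist, we can lay data out along cache-line boundaries (64 bytes at a time) when writing code in order to improve efficiency. Does that really work?
When independently written variables take up less than 64 bytes, several of them end up in the same cache line, and each core has to wait for the line to be synchronized before it can write; that waiting time lowers efficiency.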

public class One {
    private static class T {
        public volatile long x = 0L;
    }

    //Two T objects, each holding one 8-byte long; they will most likely be loaded into the same cache line.
    public static T[] arr = new T[2];

    static {
        arr[0] = new T();
        arr[1] = new T();
    }

    public static void main(String[] args) throws Exception {
        //Thread t1 modifies the first long on core1; a coherence mechanism has to keep the copies consistent, which costs performance
        Thread t1 = new Thread(()->{
            for (long i = 0; i < 1000_0000L; i++) {
                arr[0].x = i;
            }
        });

        //Thread t2 modifies the second long on core2; a coherence mechanism has to keep the copies consistent, which costs performance
        Thread t2 = new Thread(()->{
            for (long i = 0; i < 1000_0000L; i++) {
                arr[1].x = i;
            }
        });

        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println((System.nanoTime() - start)/100_0000);
    }
}

With cache alignment, efficiency is higher and there is no false sharing

public class Two {
    private static class Padding {
        public volatile long p1, p2, p3, p4, p5, p6, p7;
    }

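    //T extends Padding, so it already contains p1, p2, p3, p4, p5, p6, p7 (56 bytes of data); together with long x it most likely fills a whole cache line.
    //When thread 2 gets its T, that object also has the 56 bytes of p1..p7 in front of its long x (assuming the JVM lays out superclass fields first),
    //so the two x fields are at least 64 bytes apart and do not share a cache line; no coherence mechanism has to step in, and efficiency is higher.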
    private static class T extends Padding {
        public volatile long x = 0L;
    }

    public static T[] arr = new T[2];

    static {
        arr[0] = new T();
        arr[1] = new T();
    }

    public static void main(String[] args) throws Exception {
        Thread t1 = new Thread(()->{
            for (long i = 0; i < 1000_0000L; i++) {
                arr[0].x = i;
            }
        });

        Thread t2 = new Thread(()->{
            for (long i = 0; i < 1000_0000L; i++) {
                arr[1].x = i;
            }
        });

        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println((System.nanoTime() - start)/100_0000);
    }
}
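Why the padding helps can be checked with simple arithmetic. The sketch below assumes a 16-byte object header, 64-byte cache lines, the two objects allocated back to back, and superclass fields placed before subclass fields; real layouts depend on the JVM and its flags, and `PaddingLayout` and `countSharing` are hypothetical names. For every 8-byte-aligned position of the first object inside a line, it checks whether the two x fields land in the same line.

```java
final class PaddingLayout {
    static final int LINE_SIZE = 64; // assumed cache line size
    static final int HEADER = 16;    // assumed object header size

    // xOffset: offset of field x inside one object; objectSize: size of one object.
    // Returns the number of start positions for which both x fields share a line.
    static int countSharing(int xOffset, int objectSize) {
        int shared = 0;
        for (int base = 0; base < LINE_SIZE; base += 8) {
            int firstX = base + xOffset;
            int secondX = base + objectSize + xOffset;
            if (firstX / LINE_SIZE == secondX / LINE_SIZE) {
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        int plainSize = HEADER + 8;           // header + x
        int paddedSize = HEADER + 7 * 8 + 8;  // header + p1..p7 + x
        System.out.println("unpadded: shared in " + countSharing(HEADER, plainSize) + " of 8 positions");
        System.out.println("padded:   shared in " + countSharing(HEADER + 7 * 8, paddedSize) + " of 8 positions");
    }
}
```

Under these assumptions the padded fields are 80 bytes apart, which is more than one line, so they cannot share a line wherever the first object starts; the unpadded fields are only 24 bytes apart and share a line in most positions.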

False sharing leads to the data-consistency problem

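Solutions to the data-consistency problem
There are two ways to deal with the consistency problem:

Bus lock: a lock is placed on the bus between L3 and L2; a core must acquire the lock before it can carry on with its work.
Coherence protocol: MESI (a cache coherence protocol), in which each cache line is in one of four states: Modified, Exclusive, Shared, or Invalid.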
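The state changes in MESI can be sketched as a small transition function. This is a simplified textbook-style model of one cache's copy of a single line, not a description of any particular CPU; write-backs and bus messages are left out, and the names `MesiSketch`, `State`, `Event`, and `next` are invented for this example.

```java
final class MesiSketch {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }
    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    // Next state of this cache's copy of the line.
    // othersHoldCopy only matters for a LOCAL_READ that starts from INVALID.
    static State next(State s, Event e, boolean othersHoldCopy) {
        switch (e) {
            case LOCAL_READ:
                if (s == State.INVALID) {
                    return othersHoldCopy ? State.SHARED : State.EXCLUSIVE;
                }
                return s; // a valid copy can be read without changing state
            case LOCAL_WRITE:
                return State.MODIFIED; // other copies are invalidated
            case REMOTE_READ:
                // another core reads: a valid copy is downgraded to SHARED
                return s == State.INVALID ? State.INVALID : State.SHARED;
            case REMOTE_WRITE:
                return State.INVALID; // another core writes: this copy is stale
            default:
                throw new IllegalArgumentException("unknown event " + e);
        }
    }

    public static void main(String[] args) {
        State s = State.INVALID;
        Event[] trace = { Event.LOCAL_READ, Event.REMOTE_READ, Event.LOCAL_WRITE, Event.REMOTE_WRITE };
        for (Event e : trace) {
            State n = next(s, e, false);
            System.out.println(s + " --" + e + "--> " + n);
            s = n;
        }
    }
}
```

With false sharing, the REMOTE_WRITE transition fires on every write by the other thread, which is why the unpadded example keeps losing its copy of the line.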