我正在尝试编写一个线程安全的已排序单链表.我写了两个版本:粗粒度同步和细粒度同步.以下是两种实现:
细粒度:
public void add(T t) { Node curr = head; curr.lock.lock(); while (curr.next != null) { // Invariant: curr is locked // Invariant: curr.data < t curr.next.lock.lock(); if (t.compareto(curr.next.data) <= 0) { break; } Node tmp = curr.next; curr.lock.unlock(); curr = tmp; } // curr is acquired curr.next = new Node(curr.next,t); if (curr.next.next != null) { // old curr's next is acquired curr.next.next.lock.unlock(); } curr.lock.unlock(); }
粗粒度:
public void add(T t) { lock.lock(); Node curr = head; while (curr.next != null) { if (t.compareto(curr.next.data) <= 0) { break; } curr = curr.next; } curr.next = new Node(curr.next,t); lock.unlock(); }
我将4个线程(在4个逻辑cpu核心上)的两个版本定时插入20000个整数.每个线程的时间显示cpu时间(即它不包括等待时间).
Fine grained: Worked 1 spent 1080 ms Worked 2 spent 1230 ms Worked 0 spent 1250 ms Worked 3 spent 1260 ms wall time: 1620 ms Coarse grained: Worked 1 spent 190 ms Worked 2 spent 270 ms Worked 3 spent 410 ms Worked 0 spent 280 ms wall time: 1298 ms
我最初的想法是.lock()和.unlock()是问题,但我分析了实现,他们一起只消耗了30%的时间.我的第二个猜测是,细粒度的解决方案有更多的缓存未命中,但我怀疑它,因为单个链表与数组不同,本质上容易出现缓存未命中.
知道为什么我没有得到预期的并行化吗?
解决方法
是的,这可能是由于缓存未命中.包含锁的缓存行在cpu之间不断反弹.
另外,请注意你已经获得了很多平行:
Fine grained: Worked 1 spent 1080 ms Worked 2 spent 1230 ms Worked 0 spent 1250 ms Worked 3 spent 1260 ms wall time: 1620 ms Coarse grained: Worked 1 spent 190 ms Worked 2 spent 270 ms Worked 3 spent 410 ms Worked 0 spent 280 ms wall time: 1298 ms
虽然每个单独的线程由于缓存未命中(以及增加的开销)而花费更多时间,但整个过程仅稍微慢一些.