Categories
Java Software

The Cost of Contention

Martin Thompson first reported on the cost of contention using a simple benchmark that measures the time to increment a 64-bit counter 500 million times using various strategies. Results were reported here (section 3.1) and here (Managing Contention vs. Doing Real Work).

I re-implemented this benchmark here.

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
public class TestIncrement {
public static void main(String[] args) throws InterruptedException, BrokenBarrierException {
runTest(new CounterBare(), "Single thread", 1);
runTest(new CounterWithVolatile(), "Single thread with volatile", 1);
runTest(new CounterWithCAS(), "Single thread with CAS", 1);
runTest(new CounterWithSynchronized(), "Single thread with synchronized", 1);
runTest(new CounterWithLock(), "Single thread with lock", 1);
runTest(new CounterWithCAS(), "Two threads with CAS", 2);
runTest(new CounterWithSynchronized(), "Two threads with synchronized", 2);
runTest(new CounterWithLock(), "Two threads with lock", 2);
}
private static void runTest(final Counter counter, final String name, final int threadCount)
throws InterruptedException, BrokenBarrierException {
final CountDownLatch endLatch = new CountDownLatch(threadCount);
final int warmUpIterations = 100_000;
final int iterations = 500_000_000;
final int perThreadIterations = iterations / threadCount;
final CyclicBarrier startBarrier = new CyclicBarrier(threadCount + 1);
for (int i = 0; i < threadCount; i++) {
@SuppressWarnings("Convert2Lambda")
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
for (int a = 0; a < warmUpIterations; a++) {
runIterations(counter, 5);
}
for (int a = 0; a < 5; a++) {
runIterations(counter, warmUpIterations);
}
try {
startBarrier.await();
} catch (InterruptedException | BrokenBarrierException e) {
throw new IllegalStateException(e);
}
runIterations(counter, perThreadIterations);
endLatch.countDown();
}
});
thread.start();
}
startBarrier.await();
long startNanos = System.nanoTime();
endLatch.await();
long elapsedNanos = System.nanoTime() startNanos;
assert counter.getValue() == iterations + (threadCount * 10 * warmUpIterations);
System.out.printf("%40s: %,d ms%n", name, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
}
private static void runIterations(Counter c, int n) {
for (int j = 0; j < n; j++) {
c.increment();
}
}
private interface Counter {
void increment();
long getValue();
}
private static class CounterBare implements Counter {
private long value = 0;
@Override
public void increment() {
value++;
}
@Override
public long getValue() {
return value;
}
}
private static class CounterWithVolatile implements Counter {
private volatile long value = 0;
@SuppressWarnings("NonAtomicOperationOnVolatileField")
@Override
public void increment() {
value++;
}
@Override
public long getValue() {
return value;
}
}
private static class CounterWithCAS implements Counter {
private AtomicLong value = new AtomicLong(0);
@Override
public void increment() {
value.incrementAndGet();
}
@Override
public long getValue() {
return value.get();
}
}
private static class CounterWithSynchronized implements Counter {
private final Object lock = new Object();
private long value = 0;
@Override
public void increment() {
synchronized (lock) {
value++;
}
}
@Override
public long getValue() {
return value;
}
}
private static class CounterWithLock implements Counter {
private final Lock lock = new ReentrantLock();
private long value = 0;
@Override
public void increment() {
lock.lock();
try {
value++;
} finally {
lock.unlock();
}
}
@Override
public long getValue() {
return value;
}
}
}

view raw
TestIncrement.java
hosted with ❤ by GitHub

The results I observed (running on Java 9 with a 2017 MacBook Pro with a 2.9 GHz 7th Generation Kaby Lake Intel Core i7 processor) are comparable to those reported by Martin 7 years ago.

Method Time (ms)
Kaby Lake, Java 10
Time (ms)
Westmere
Single thread 70 300
Single thread with volatile 2,700 4,700
Single thread with CAS 3,500 5,700
Single thread with synchronized 2,000
Single thread with lock 9,300 10,000
Two threads with CAS 10,800 18,000
Two threads with synchronized 22,400
Two threads with lock 52,500 118,000

While this micro-benchmark is not representative of real-world workloads (as explained here), tempted by its simplicity I plan to use it as the first benchmark to track optimizations to the air-java concurrency library. This would be followed up by a more comprehensive benchmark like this one, which measure both latency  and throughput under various configurations, and finally a real-world application.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.