Martin Thompson first reported on the cost of contention using a simple benchmark that measures the time to increment a 64-bit counter 500 million times using various strategies. Results were reported here (section 3.1) and here (Managing Contention vs. Doing Real Work).
I re-implemented this benchmark here.
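The strategies compared in the table can be sketched roughly as follows. This is an illustration under stated assumptions, not the linked implementation: the class name `CounterSketch`, the helper `timeMillis`, the reduced iteration count, and the choice to split the work evenly between two threads are all hypothetical, and the sketch does no JIT warm-up.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the counter-increment strategies; not the original benchmark code.
final class CounterSketch {
    static volatile long volatileCounter;          // safe only with a single writer thread
    static long syncCounter;                       // guarded by 'monitor'
    static long lockCounter;                       // guarded by 'lock'
    static final AtomicLong casCounter = new AtomicLong();
    static final Object monitor = new Object();
    static final ReentrantLock lock = new ReentrantLock();

    // Volatile read-modify-write; not atomic, so only valid on one thread.
    static void incrementVolatile(long n) {
        for (long i = 0; i < n; i++) {
            volatileCounter++;
        }
    }

    // Explicit compare-and-set retry loop.
    static void incrementCas(long n) {
        for (long i = 0; i < n; i++) {
            long current;
            do {
                current = casCounter.get();
            } while (!casCounter.compareAndSet(current, current + 1));
        }
    }

    // Intrinsic lock (synchronized block) around each increment.
    static void incrementSynchronized(long n) {
        for (long i = 0; i < n; i++) {
            synchronized (monitor) {
                syncCounter++;
            }
        }
    }

    // java.util.concurrent lock around each increment.
    static void incrementLock(long n) {
        for (long i = 0; i < n; i++) {
            lock.lock();
            try {
                lockCounter++;
            } finally {
                lock.unlock();
            }
        }
    }

    // Runs 'task' on the given number of threads and returns the elapsed wall time in ms.
    static long timeMillis(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumption: a small count for a quick run; the benchmark above uses 500 million.
        long n = 1_000_000L;
        System.out.println("1 thread, volatile:      " + timeMillis(1, () -> incrementVolatile(n)) + " ms");
        System.out.println("1 thread, CAS:           " + timeMillis(1, () -> incrementCas(n)) + " ms");
        System.out.println("1 thread, synchronized:  " + timeMillis(1, () -> incrementSynchronized(n)) + " ms");
        System.out.println("1 thread, lock:          " + timeMillis(1, () -> incrementLock(n)) + " ms");
        // Assumption: with two threads, each thread performs half of the increments.
        System.out.println("2 threads, CAS:          " + timeMillis(2, () -> incrementCas(n / 2)) + " ms");
        System.out.println("2 threads, synchronized: " + timeMillis(2, () -> incrementSynchronized(n / 2)) + " ms");
        System.out.println("2 threads, lock:         " + timeMillis(2, () -> incrementLock(n / 2)) + " ms");
    }
}
```

The volatile variant is run on a single thread only, because `volatileCounter++` is a non-atomic read-modify-write and would lose updates under contention. A serious measurement would use a harness that handles warm-up rather than one-shot timing with `System.nanoTime()`.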
The results I observed (running Java 9 on a 2017 MacBook Pro with a 2.9 GHz 7th-generation Kaby Lake Intel Core i7 processor) are comparable to those Martin reported 7 years ago.
| Strategy | Kaby Lake, Java 10 | |
|---|---|---|
| Single thread with volatile | 2,700 | 4,700 |
| Single thread with CAS | 3,500 | 5,700 |
| Single thread with synchronized | 2,000 | |
| Single thread with lock | 9,300 | 10,000 |
| Two threads with CAS | 10,800 | 18,000 |
| Two threads with synchronized | 22,400 | |
| Two threads with lock | 52,500 | 118,000 |
While this micro-benchmark is not representative of real-world workloads (as explained here), its simplicity tempts me to use it as the first benchmark for tracking optimizations to the air-java concurrency library. It would be followed by a more comprehensive benchmark like this one, which measures both latency and throughput under various configurations, and finally by a real-world application.