Cache-line contention makes RwLock five times slower than Mutex

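Redstone's author found that, on commercial hardware, guarding a read-heavy Tensor Cache with an RwLock made it about 5× slower than guarding it with a Mutex. The cause was cache-line ping-pong and atomic contention: acquiring a read lock still performs an atomic write to the lock's shared state, so that cache line bounces between cores even when no writer is present.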
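To see the effect on your own machine, a minimal sketch of a read-heavy comparison follows. It is not the author's benchmark: the 1024-entry map standing in for the cache and the names sample_map, read_heavy_mutex and read_heavy_rwlock are all hypothetical. Each reader thread takes the lock once per lookup, so the critical section is very short and lock overhead dominates.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-in for the cache: 1024 keys, each mapped to 2 * key.
fn sample_map() -> HashMap<u64, u64> {
    (0..1024u64).map(|k| (k, k * 2)).collect()
}

/// Spawns `threads` readers, each doing `reads` lookups under a Mutex.
/// Returns the elapsed wall time and the sum of all values read.
pub fn read_heavy_mutex(threads: usize, reads: u64) -> (Duration, u64) {
    let cache = Arc::new(Mutex::new(sample_map()));
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                let mut sum = 0u64;
                for i in 0..reads {
                    // One lock acquisition per lookup; guard drops each iteration.
                    let guard = cache.lock().unwrap();
                    sum += *guard.get(&(i % 1024)).unwrap();
                }
                sum
            })
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (start.elapsed(), total)
}

/// Same workload, but readers take a shared read lock on an RwLock.
pub fn read_heavy_rwlock(threads: usize, reads: u64) -> (Duration, u64) {
    let cache = Arc::new(RwLock::new(sample_map()));
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                let mut sum = 0u64;
                for i in 0..reads {
                    // Readers do not exclude each other, but each acquisition
                    // still updates the lock's shared atomic state.
                    let guard = cache.read().unwrap();
                    sum += *guard.get(&(i % 1024)).unwrap();
                }
                sum
            })
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (start.elapsed(), total)
}
```

Calling both functions from main with the same arguments, for example 8 threads and a few million reads each, and printing the two durations gives a rough comparison. Build with optimizations (cargo run --release). The ratio depends heavily on core count, CPU, the standard library's lock implementation, and how long the critical section is, so it may not match the 5× figure reported here.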
