Does require custom hardware or kernel cooperation for speed (e.g. it needs to do batched MMU operations without flushing the TLB on each 2MB page). Looks like it's got a better read barrier than the Pauseless one; that barrier does of course cost extra on stock hardware.