Azul's concurrent GC is much more clever. They use memory-mapping tricks to make read barriers really cheap (traditional VMs have considered read barriers too costly). This allows compaction to happen while mutator threads keep running.
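To make "read barrier" concrete, here is a toy model in Python. It is only a sketch of the general idea, not Azul's implementation: the names `Obj`, `relocate` and `load_ref` are made up for illustration, and the explicit forwarding check stands in for whatever the real collector does with memory mapping.

```python
class Obj:
    """Toy heap object (hypothetical, for illustration only)."""
    def __init__(self, value):
        self.value = value
        self.fields = {}      # named reference slots
        self.forward = None   # set by the collector once the object has moved


def relocate(obj):
    """Collector side: copy the object and leave a forwarding pointer behind."""
    copy = Obj(obj.value)
    copy.fields = obj.fields
    obj.forward = copy
    return copy


def load_ref(holder, name):
    """Read barrier: every reference load by the mutator goes through here."""
    ref = holder.fields[name]
    # If the target has been moved, follow the forwarding pointer(s)...
    while ref is not None and ref.forward is not None:
        ref = ref.forward
    # ...and write the new address back so the next load is cheap.
    holder.fields[name] = ref
    return ref
```

Because stale references get fixed up at load time, the collector can move objects without first stopping every mutator thread. The cost is that the check runs on every reference load, which is why making it cheap matters so much.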
That is an awesome GC and I wish I could have it. AFAIK they are having trouble because they mmap/munmap a lot, and the virtual memory systems in modern OSes don't handle that efficiently enough.
Yes, they currently depend on kernel patches to provide bulk map/unmap APIs as well as some other vm_area trickery to allow parts of physical memory to be mapped to multiple places in virtual memory with different protection flags.
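For anyone unfamiliar with the "same physical memory, several virtual addresses, different protection" part, stock user-space APIs can show a small-scale version of it. The sketch below assumes a Unix-like OS and uses a temporary file as the shared backing store. `map_twice` is a made-up helper name, and this is not the bulk map/unmap API from the kernel patches, just the ordinary `mmap` call via Python's `mmap` module.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def map_twice():
    """Map the same page twice: once read-write, once read-only (Unix only)."""
    # An anonymous temp file stands in for the shared physical page.
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * PAGE)
        f.flush()
        fd = f.fileno()
        rw = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
        ro = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ)
    # mmap duplicates the descriptor, so both views outlive the file object.
    return rw, ro


if __name__ == "__main__":
    rw, ro = map_twice()
    rw[0:5] = b"hello"   # write through the writable view
    print(ro[0:5])       # the read-only view sees the same bytes
```

Both views are backed by the same page, so a write through `rw` is visible through `ro`, while a write through `ro` is refused. Doing this per call is fine for a demo; presumably the trouble starts when a collector has to remap large numbers of pages at high frequency, which is where the bulk APIs come in.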
Could anyone with a better understanding of GC comment?