c++ - Why is std::mutex so slow on OSX?


I have the following benchmark: https://gist.github.com/leifwalsh/10010580

Essentially, it spins up k threads, and each thread does roughly 16 million / k lock/increment/unlock cycles, first using a spinlock and then using std::mutex. On OS X, std::mutex is devastatingly slower than the spinlock when contended, whereas on Linux it's competitive or even a bit faster.

OS X:

spinlock 1:     334ms
spinlock 2:    3537ms
spinlock 3:    4815ms
spinlock 4:    5653ms
std::mutex 1:   813ms
std::mutex 2: 38464ms
std::mutex 3: 44254ms
std::mutex 4: 47418ms

Linux:

spinlock 1:     305ms
spinlock 2:    1590ms
spinlock 3:    1820ms
spinlock 4:    2300ms
std::mutex 1:   377ms
std::mutex 2:  1124ms
std::mutex 3:  1739ms
std::mutex 4:  2668ms

The processors are different, but not that different (OS X: Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz; Linux: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz), so this seems like a library or kernel problem. Does anyone know the source of the slowness?

To clarify the question: I understand that "there are different mutex implementations that optimize for different things, and this isn't a problem, it's expected". My question is: what actual differences in the implementations cause this? Or, if it's a hardware issue (maybe the cache is just a lot slower on the MacBook), that's an acceptable answer too.

You're measuring the library's choice of trade-off between throughput and fairness. The benchmark is heavily artificial and penalizes any attempt to provide fairness at all.

The implementation can do two things. It can let the same thread get the mutex twice in a row, or it can change which thread gets the mutex. This benchmark heavily penalizes a change of threads, both because a context switch takes time and because ping-ponging the mutex and val from cache to cache takes time.

Most likely, you are just seeing the different trade-offs the implementations have to make. The benchmark heavily rewards implementations that prefer to give the mutex back to the thread that last held it. It even rewards implementations that waste CPU to do that! It rewards implementations that waste CPU to avoid context switches, even when there's other useful work the CPU could be doing! And it doesn't penalize an implementation for inter-core traffic that can slow down other, unrelated threads.

Also, people who implement mutexes generally presume that performance in the uncontended case matters more than performance in the contended case. There are numerous trade-offs you can make between these cases, such as presuming that a thread might be waiting versus specifically checking whether one is. The benchmark tests only (or at least, almost only) the case that is typically traded off in favor of the case presumed more common.

Bluntly, this is a senseless benchmark, incapable of identifying a problem.

The specific explanation is almost certainly that the Linux implementation is a spinlock/futex hybrid, while the OS X implementation is conventional, equivalent to locking a kernel object. The spinlock portion of the Linux implementation favors allowing the same thread that just released the mutex to lock it again, which your benchmark heavily rewards.
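The spin-then-block shape of such a hybrid can be sketched as follows. This is an illustration only: the real glibc mutex spins in user space and then calls the futex(2) syscall to sleep in the kernel, whereas here std::this_thread::yield() stands in for the kernel wait:

```cpp
#include <atomic>
#include <thread>

// Hedged sketch of a spin-then-block ("futex-style") lock.
struct HybridLock {
    std::atomic<bool> held{false};

    void lock() {
        int spins = 0;
        while (held.exchange(true, std::memory_order_acquire)) {
            if (++spins < 1000) {
                // Spin phase: stay on-CPU hoping the owner releases soon.
                // A thread that just unlocked and immediately relocks almost
                // always succeeds here, without any kernel involvement.
            } else {
                // Block phase: give up the CPU. Real implementations call
                // futex(FUTEX_WAIT) here and futex(FUTEX_WAKE) in unlock.
                std::this_thread::yield();
            }
        }
    }
    void unlock() { held.store(false, std::memory_order_release); }
};
```

A conventional kernel-object mutex skips the spin phase entirely, so every contended acquisition pays for a trip into the kernel, which matches the OS X numbers above.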
