9 changes: 9 additions & 0 deletions istesting.go
@@ -0,0 +1,9 @@
//go:build go1.21

package locker

import "testing"

func isTesting() bool {
	return testing.Testing()

Member:

I'm not sure I like importing testing like this, though admittedly I'd not seen this function before.
Are we expecting users of this library to have the different behavior under test?


Member:

Perhaps some package-level bool that's set as part of tests in this repository (so both "testing" to be true, and the bool to be set?)


Contributor Author:

> admittedly I'd not seen this function before.

It's new as of go1.21. The proposal was https://go.dev/issue/52600
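
For illustration only (this snippet is mine, not part of the PR), testing.Testing reports whether the running program is a test binary built by go test:

```go
//go:build go1.21

// Standalone illustration, not code from this PR.
package main

import (
	"fmt"
	"testing"
)

func main() {
	// Prints "false" when run as a regular program; the same call
	// returns true inside a binary produced by "go test".
	fmt.Println(testing.Testing())
}
```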

> Are we expecting users of this library to have the different behavior under test?

If this library is free of bugs, there will be no observed difference in behaviour between testing and application code. But if this library does contain latent bugs, I figure this strikes the best compromise between stability in production and discoverability of bugs.

There are a few ways to go about defending against lock counter / concurrency bugs:

  1. Defensively reset the structs unconditionally. Heisenbugs in production may never get caught, or could get misattributed to the application. This is closest to the v1 behaviour.
  2. Unconditionally assert the invariants. Bugs will get noticed and correctly attributed to the library, but at the risk of programs panicking in production.
  3. Defensively reset the structs in applications, assert the invariants in test binaries (sketched below). Test coverage of the library expands to the test suites of our users, without causing any grief in production.
  4. Assert the invariants in moby/locker CI; defensive resetting everywhere else, including importer test suites. Only marginally better than option 1 as bugs will only be caught if we get really lucky in a CI run of the moby/locker tests.

I would rather go with option 2 if we can't come to a consensus on 3. Better to fail fast and definitively than send application maintainers on a wild goose chase for "impossible" unbounded concurrency as a consequence of bugs in moby/locker.
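
To make option 3 concrete, here is a rough sketch of the shape it takes. checkInvariant is a hypothetical helper of mine, not code from this PR; isTesting is the build-tag-selected function added by this change:

```go
package locker

// Sketch only: checkInvariant is a hypothetical helper illustrating
// option 3, not part of this PR. Under "go test" it panics so that
// library bugs surface in importers' test suites; in regular builds it
// quietly repairs the state instead.
func checkInvariant(ok bool, repair func(), msg string) {
	if ok {
		return
	}
	if isTesting() {
		panic("locker: " + msg)
	}
	repair()
}
```

A call site would then express an invariant such as "the waiters counter is zero before the lock goes back into the pool" as a single checkInvariant call with the defensive reset passed in as the repair function.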

> Perhaps some package-level bool that's set as part of tests in this repository (so both "testing" to be true, and the bool to be set?)

There's no point in calling testing.Testing() in _test.go files, as execution of code in those files already implies that a go test program is being executed.

}
7 changes: 7 additions & 0 deletions istesting_go120.go
@@ -0,0 +1,7 @@
//go:build !go1.21

package locker

func isTesting() bool {
	return false
}
42 changes: 37 additions & 5 deletions mutexmap.go
@@ -5,6 +5,7 @@ free up more global locks to handle other tasks.
package locker

import (
"strconv"
"sync"
"sync/atomic"
)
@@ -37,13 +38,29 @@ func (l *MutexMap[T]) Lock(key T) {
	if !exists {
		nameLock = lockCtrPool.Get().(*lockCtr)
		l.locks[key] = nameLock

		if isTesting() {
			if !nameLock.TryLock() {
				panic("MutexMap: lock taken from pool is already locked")
			}
			nameLock.Unlock()
			if w := nameLock.waiters.Load(); w != 0 {
				panic("MutexMap: lock taken from pool has nonzero waiters counter" +
					" (waiters=" + strconv.Itoa(int(w)) + ")")
			}
		}
		// Defensive coding: force the waiters counter to a known value
		// just in case the lockCtr we got from the pool was recycled
		// and not properly reset.
		nameLock.waiters.Store(1)
	} else {
		// Increment the nameLock waiters while inside the main mutex.
		// This makes sure that the lock isn't deleted if `Lock` and
		// `Unlock` are called concurrently.
		nameLock.waiters.Add(1)
	}

	// Increment the nameLock waiters while inside the main mutex.
	// This makes sure that the lock isn't deleted if `Lock` and `Unlock` are called concurrently.
	nameLock.waiters.Add(1)
	l.mu.Unlock()

	// Lock the nameLock outside the main mutex so we don't block other operations.
	// Once locked then we can decrement the number of waiters for this lock.
	nameLock.Lock()
@@ -62,8 +79,23 @@ func (l *MutexMap[T]) Unlock(key T) {
		(&sync.Mutex{}).Unlock()
	}

	if nameLock.waiters.Load() <= 0 {
	if w := nameLock.waiters.Load(); w <= 0 {
		delete(l.locks, key)

		// Putting a lock back into the pool when the waiters counter is
		// nonzero would result in some really nasty, subtle
		// misbehaviour -- which would only manifest when pool items get
		// reused, i.e. in situations with lots of lock churn. The
		// counters should be 0 when the lock is deleted from the map
		// and returned to the pool.
		if isTesting() && w < 0 {
			panic("MutexMap: lock unlocked with negative counter" +
				" (waiters=" + strconv.Itoa(int(w)) + ")")
		}
		// But given the consequences of that invariant being violated,
		// zero out the counter just in case to minimize the impact of
		// such bugs.
		nameLock.waiters.Store(0)
		defer lockCtrPool.Put(nameLock)
	}
	nameLock.Unlock()
41 changes: 38 additions & 3 deletions rwmutexmap.go
@@ -1,6 +1,7 @@
package locker

import (
"strconv"
"sync"
"sync/atomic"
)
@@ -21,8 +22,42 @@ type rwlockCtr struct {
	readers atomic.Int32 // Number of readers currently holding the lock
}

// assertReset asserts that l is zeroed out. Under testing it panics if the
// assertions are false. Otherwise it zeroes out the struct to minimize the
// impact of whichever bugs caused the invariant to be violated.
//
// This function returns l.
func (l *rwlockCtr) assertReset() *rwlockCtr {
	// Putting a lock back into the pool when the counters are nonzero would
	// result in some really nasty, subtle misbehaviour -- which would only
	// manifest when pool items get reused, i.e. in situations with lots of
	// lock churn. The counters should be 0 when the lock is deleted from
	// the map and returned to the pool.
	if isTesting() {
		if !l.TryLock() {
			panic("RWMutexMap: lock is unexpectedly locked")
		}
		l.Unlock()
		r, w := l.readers.Load(), l.waiters.Load()
		if r != 0 || w != 0 {
			panic("RWMutexMap: lock has nonzero counters " +
				"(readers=" + strconv.Itoa(int(r)) +
				", waiters=" + strconv.Itoa(int(w)) + ")")
		}
	}
	// But given the consequences of that invariant being violated, zero out
	// the counters just in case to minimize the impact of such bugs.
	l.waiters.Store(0)
	l.readers.Store(0)
	return l
}

var rwlockCtrPool = sync.Pool{New: func() any { return new(rwlockCtr) }}

func rwPoolPut(nameLock *rwlockCtr) {
	rwlockCtrPool.Put(nameLock.assertReset())
}

func (l *RWMutexMap[T]) get(key T) *rwlockCtr {
	if l.locks == nil {
		l.locks = make(map[T]*rwlockCtr)
@@ -31,7 +66,7 @@ func (l *RWMutexMap[T]) get(key T) *rwlockCtr {
	nameLock, exists := l.locks[key]
	if !exists {
		nameLock = rwlockCtrPool.Get().(*rwlockCtr)
		l.locks[key] = nameLock
		l.locks[key] = nameLock.assertReset()
	}
	return nameLock
}
@@ -80,7 +115,7 @@ func (l *RWMutexMap[T]) Unlock(key T) {

	if nameLock.waiters.Load() <= 0 && nameLock.readers.Load() <= 0 {
		delete(l.locks, key)
		defer rwlockCtrPool.Put(nameLock)
		defer rwPoolPut(nameLock)
	}
	nameLock.Unlock()
}
@@ -96,7 +131,7 @@ func (l *RWMutexMap[T]) RUnlock(key T) {

	if nameLock.waiters.Load() <= 0 && nameLock.readers.Load() <= 0 {
		delete(l.locks, key)
		defer rwlockCtrPool.Put(nameLock)
		defer rwPoolPut(nameLock)
	}
	nameLock.RUnlock()
}