-
Notifications
You must be signed in to change notification settings - Fork 96
Remove readDirStreamWithPtr; replace readdir_r() with readdir() #349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Yeah, the Not sure what thread-safety guarantees directory streams in the Perhaps it's easier to use |
getRealDirType, | ||
unsafeOpenDirStreamFd, | ||
readDirStreamWith, | ||
readDirStreamWithPtr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exported from System.Posix.Directory.Internals
, but the module says "no PVP guarantees", so it's fine. Although maybe we can shim it for time being with readDirStreamWithPtr _ = readDirStreamWith
?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I reintroduced it, it would be with a deprecation warning stating it's implemented like that. Happy to do that—let me know.
foreign import ccall unsafe "__hscore_free_dirent" | ||
c_freeDirEnt :: Ptr CDirent -> IO () | ||
|
||
foreign import ccall unsafe "readdir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this works on mac? We've had horrible bugs before when not using capi
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this works on mac?
It passed CI so I assume so. 32-bit ARM failed though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing that to capi
fails locally with
/tmp/ghc233912_0/ghc_160.c:7:102: error:
error: unknown type name ‘DIR’
7 | struct dirent* ghczuwrapperZC0ZCunixzm2zi8zi7zi0zminplaceZCSystemziPosixziDirectoryziCommonZCreaddir(DIR* a1) {return readdir(a1);}
|
Busy with work so not looking into it right now.
This replaces all use of
readdir_r()
withreaddir()
, and removesSystem.Posix.Directory.Internals.readDirStreamWithPtr
as a consequence. The rationale is 4-fold: safety, portability, performance, and maintainability.Safety
According to the
readdir_r(3)
manpage, it's unsafe because it's impossible to compute the maximum filename length, hence the size of the buffer to be allocated. Currently we callpathconf(".", _PC_NAME_MAX)
whenNAME_MAX
isn't defined. However, that's the longest filename a process is allowed to create, not the longest than can already be there. Thefpathconf(3)
manpage explicitly statesEven if
NAME_MAX
is defined, the glibc manual statesThe
readdir_r(3)
manpage goes on to say that glibc may fail withENAMETOOLONG
, but implies there's no way to detect that until the final dirent is read. On other systems,d_name
may be truncated, or worse, not null-terminated.The original reason for
readdir_r()
was to provide a reentrant version ofreaddir()
. However, "in modern implementations"readdir()
is thread-safe, except for concurrent readers to the sameDIR *
stream. In that specific case it says the user should do their own synchronization. This PR adds a warning to that effect. It's expected (by whom? I don't know) that a future POSIX.1 version will makereaddir()
thread-safe (including to the sameDIR *
, I guess?).Portability
readdir_r()
is deprecated by glibc for the reasons listed above.Performance
I compiled directory-ospath-streaming against unix before and after these changes and executed a simple benchmark counting files recursively under my home directory (about 500K). (Here is the change in the streaming library for the after benchmark.)
readDirStreamWithPtr
readdir_r()
readDirStreamWith
readdir()
That 1% improvement was oddly consistent for me, even though tasty-bench claimed ±15ms of error. What's important here is that it isn't worse. This makes sense. You're not supposed to free the pointer returned by
readdir()
because it "may be statically allocated". My guess is a buffer is allocated byopendir()
or the first call toreaddir()
, and is freed when the kernel cleans up the file descriptor. If byopendir()
, then it's a sunk cost—either way, a user isn't going to do any better by attempting to manage their ownstruct dirent
(notwithstanding the safety issues).For
readdir_r()
, Haskell adds another small cost by allocating thePtr (Ptr CDirent)
out param.Maintainability
When you glance at
readDirStreamWith
versusreadDirStreamWithPtr
and their types, you might think: "ThePtr
one is more low-level for when I want to manage memory myself for extra performance/control; the other one probably does that for me each time, trading control for simplicity." Indeed, currentlyreadDirStreamWith
does do that, and is defined in terms ofreadDirStreamWithPtr
.That intuition is understandable. But the thing being allocated is simply the extra bookkeeping for reentrancy required by
readdir_r()
. Currently the note says,This is not true. The pointer to that pointer is freed, but the
Ptr CDirent
itself is not. It's moot anyway since this PR reimplementsreadDirStreamWith
in terms ofreaddir()
. The note is replaced accordingly.Also note that the above change to directory-ospath-streaming means the "cache" mechanism there can likely be removed. It's only there for the
Ptr (Ptr CDirent)
argument. I'm curious if this leads to more small performance gains.What I'm getting at in terms of code maintainability is
readdir()
iscc @sergv