-
Couldn't load subscription status.
- Fork 144
optimize createFpAndTrim with shrinkMutableByteArray# #576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -138,7 +138,7 @@ import Data.Word | |
| import Data.Data (Data(..), mkNoRepType) | ||
|
|
||
| import GHC.Base (nullAddr#,realWorld#,unsafeChr) | ||
| import GHC.Exts (IsList(..)) | ||
| import GHC.Exts (IsList(..), shrinkMutableByteArray#) | ||
| import GHC.CString (unpackCString#) | ||
| import GHC.Exts (Addr#, minusAddr#) | ||
|
|
||
|
|
@@ -160,7 +160,8 @@ import GHC.ForeignPtr (ForeignPtr(ForeignPtr) | |
| #if __GLASGOW_HASKELL__ < 900 | ||
| , newForeignPtr_ | ||
| #endif | ||
| , mallocPlainForeignPtrBytes) | ||
|
|
||
| , mallocPlainForeignPtrBytes, ForeignPtrContents (PlainPtr)) | ||
|
|
||
| #if MIN_VERSION_base(4,10,0) | ||
| import GHC.ForeignPtr (plusForeignPtr) | ||
|
|
@@ -183,6 +184,7 @@ import GHC.ForeignPtr (unsafeWithForeignPtr) | |
|
|
||
| import qualified Language.Haskell.TH.Lib as TH | ||
| import qualified Language.Haskell.TH.Syntax as TH | ||
| import Data.Functor (($>)) | ||
|
|
||
| #if !MIN_VERSION_base(4,15,0) | ||
| unsafeWithForeignPtr :: ForeignPtr a -> (Ptr a -> IO b) -> IO b | ||
|
|
@@ -641,10 +643,13 @@ createFpUptoN' l action = do | |
| -- | ||
| createFpAndTrim :: Int -> (ForeignPtr Word8 -> IO Int) -> IO ByteString | ||
| createFpAndTrim l action = do | ||
| fp <- mallocByteString l | ||
| l' <- action fp | ||
| if assert (0 <= l' && l' <= l) $ l' >= l | ||
| then return $! BS fp l | ||
| fp <- mallocByteString l | ||
| l' <- action fp | ||
| if assert (0 <= l' && l' <= l) $ l' >= l | ||
| then return $! BS fp l | ||
| else | ||
| if l < 4096 | ||
| then shrinkFp fp l' $> BS fp l' | ||
| else createFp l' $ \fp' -> memcpyFp fp' fp l' | ||
| {-# INLINE createFpAndTrim #-} | ||
|
|
||
|
|
@@ -1023,6 +1028,12 @@ memcpy p q s = void $ c_memcpy p q (fromIntegral s) | |
| memcpyFp :: ForeignPtr Word8 -> ForeignPtr Word8 -> Int -> IO () | ||
| memcpyFp fp fq s = unsafeWithForeignPtr fp $ \p -> | ||
| unsafeWithForeignPtr fq $ \q -> memcpy p q s | ||
|
|
||
| shrinkFp :: ForeignPtr Word8 -> Int -> IO () | ||
| shrinkFp (ForeignPtr _ (PlainPtr marr)) (I# l#) = | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd write |
||
| IO $ \s1# -> case shrinkMutableByteArray# marr l# s1# of | ||
| s2# -> (# s2#, () #) | ||
| shrinkFp _ _ = error "Must be PlainPtr" | ||
|
|
||
| {- | ||
| foreign import ccall unsafe "string.h memmove" c_memmove | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is the main tricky bit.
MutableByteArray#s do not return their memory to the RTS when we shrink them in-place. But they do return their memory to the RTS when we make a copy.MutableByteArray#s may well retain all of their memory regardless of whether we shrink in place or create a copy, because of how they are currently allocated in GHC. (There are probably several GHC issues related to this topic. See for example this one and this one. It's not clear if or when this will be improved.)So we want to make a copy when:
If I recall correctly, the exact threshold for when a buffer is allocated as a large heap object depends on both the platform and the GHC version, but has been somewhere around 3K for 64-bit systems and half that for 32-bit systems for some time now.
(This reasoning should be documented in the code.)