Store first value in Dict directly in innerjoin #29

non-Jedi · 2020-08-28T03:36:17Z

I saw some of the discussion in
JuliaData/DataFrames.jl#2340 and got
curious about what was possible.

This avoids allocating a Vector for the case where l does not have
multiple indices with the same value. For the smoke-test benchmark in
JuliaData/DataFrames.jl#2340 (comment),
this reduces allocations by half and overall runtime by 10%.
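
To illustrate the idea, here is a minimal sketch rather than the code from this PR. It assumes plain `Vector` inputs, and the names `build_index` and `innerjoin_indices` are hypothetical. A key seen once is stored as a bare `Int`, and a `Vector{Int}` is allocated only when a second index with the same key shows up.

```julia
# Hypothetical helper, not the package's code: index the keys of `l`,
# storing a bare Int for a key seen once and promoting to Vector{Int}
# only when a duplicate appears.
function build_index(l::Vector{K}) where {K}
    index = Dict{K, Union{Int, Vector{Int}}}()
    for (i, key) in pairs(l)
        existing = get(index, key, nothing)
        if existing === nothing
            index[key] = i               # first occurrence: no Vector allocated
        elseif existing isa Int
            index[key] = [existing, i]   # second occurrence: promote to a Vector
        else
            push!(existing, i)           # later occurrences: append in place
        end
    end
    return index
end

# Inner join on equal keys, returning (left index, right index) pairs.
function innerjoin_indices(l::Vector, r::Vector)
    index = build_index(l)
    out = Tuple{Int, Int}[]
    for (j, key) in pairs(r)
        hit = get(index, key, nothing)
        hit === nothing && continue
        if hit isa Int
            push!(out, (hit, j))         # unique key on the left: single match
        else
            for i in hit
                push!(out, (i, j))       # duplicated key: one pair per left index
            end
        end
    end
    return out
end
```

The trade-off is a `Union` value type in the Dict and an extra branch at lookup time, in exchange for skipping one small allocation per unique key.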

Most of the remaining allocations come from this
line,
where it is much less clear how to reduce them. I'm not sure
how much JuliaLang/julia#24909 affects
performance in this case. One option would be to heuristically
estimate the size of out based on the sizes of l and r and call
sizehint! on it; this didn't seem to help in my testing.
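
For reference, the `sizehint!` idea could look roughly like the sketch below. The function name `innerjoin_indices_hinted` and the `min(length(l), length(r))` estimate are assumptions made for illustration; the PR does not say which estimate was tried, and the author reports no measurable gain from it.

```julia
# Hypothetical sketch: reserve capacity for `out` before filling it.
function innerjoin_indices_hinted(l::Vector, r::Vector)
    # Plain index: every key maps to a Vector of left indices.
    index = Dict{eltype(l), Vector{Int}}()
    for (i, key) in pairs(l)
        push!(get!(() -> Int[], index, key), i)
    end

    out = Tuple{Int, Int}[]
    # Assumed heuristic: a guess at the result size. It is an upper bound
    # only when keys are unique on both sides; with duplicates the result
    # can be larger, and `out` simply grows as usual.
    sizehint!(out, min(length(l), length(r)))

    for (j, key) in pairs(r)
        hits = get(index, key, nothing)
        hits === nothing && continue
        for i in hits
            push!(out, (i, j))
        end
    end
    return out
end
```

`sizehint!` only reserves capacity; it does not change the length or contents of `out`, so the result is the same with or without the hint.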

I realize I'm only optimizing a single method of innerjoin, but I'm
not super familiar with this field nor with the inner workings of this
package, so I leave it to you to decide if this is a worthwhile
complication and if it's relevant elsewhere in the package.

andyferris · 2021-11-09T00:12:50Z

Ah thanks.

Another strategy I tried a while back was to first assume the join keys are distinct and then bail to a more general implementation when that's not the case. It would be awesome to compare the performance versus this.
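
A rough sketch of that "assume distinct, then bail" strategy, under the assumption of plain `Vector` inputs, might look like this. The names `innerjoin_general` and `innerjoin_distinct_first` are hypothetical and this is not the implementation that was tried.

```julia
# General path: every key maps to a Vector of left indices.
function innerjoin_general(l::Vector, r::Vector)
    index = Dict{eltype(l), Vector{Int}}()
    for (i, key) in pairs(l)
        push!(get!(() -> Int[], index, key), i)
    end
    out = Tuple{Int, Int}[]
    for (j, key) in pairs(r)
        hits = get(index, key, nothing)
        hits === nothing && continue
        for i in hits
            push!(out, (i, j))
        end
    end
    return out
end

# Fast path: assume the keys of `l` are distinct and store a bare Int per key.
# On the first duplicate, discard the partial index and use the general path.
function innerjoin_distinct_first(l::Vector, r::Vector)
    index = Dict{eltype(l), Int}()
    for (i, key) in pairs(l)
        haskey(index, key) && return innerjoin_general(l, r)  # bail out
        index[key] = i
    end
    out = Tuple{Int, Int}[]
    for (j, key) in pairs(r)
        i = get(index, key, 0)        # 0 is never a valid index of a Vector
        i == 0 || push!(out, (i, j))
    end
    return out
end
```

Compared with storing the first value directly in the Dict, this keeps the fast path free of `Union` values, but pays for the work done so far a second time whenever a duplicate key appears late in `l`.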

(Sorry @non-Jedi I haven't been keeping track of PRs...)
