-
Notifications
You must be signed in to change notification settings - Fork 302
Optimize the nameMap of ZipFile #378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
7a3e348
4019b26
77585b4
c07b7df
eec6ab5
71c3641
d0e5365
b7cb0f0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,6 +30,7 @@ | |
| import java.nio.file.Files; | ||
| import java.nio.file.Path; | ||
| import java.nio.file.StandardOpenOption; | ||
| import java.util.ArrayList; | ||
| import java.util.Arrays; | ||
| import java.util.Collections; | ||
| import java.util.Comparator; | ||
|
|
@@ -368,9 +369,14 @@ public static void closeQuietly(final ZipFile zipFile) { | |
| private final List<ZipArchiveEntry> entries = new LinkedList<>(); | ||
|
|
||
| /** | ||
| * Maps String to list of ZipArchiveEntrys, name -> actual entries. | ||
| * Maps a string to the first entry by that name. | ||
| */ | ||
| private final Map<String, LinkedList<ZipArchiveEntry>> nameMap = new HashMap<>(HASH_SIZE); | ||
| private final Map<String, ZipArchiveEntry> nameMap = new HashMap<>(HASH_SIZE); | ||
|
|
||
| /** | ||
| * If multiple entries have the same name, maps the name to entries by that name. | ||
| */ | ||
| private Map<String, List<ZipArchiveEntry>> duplicateNameMap; | ||
|
|
||
| /** | ||
| * The encoding to use for file names and the file comment. | ||
|
|
@@ -749,8 +755,22 @@ private void fillNameMap() { | |
| // entries are filled in populateFromCentralDirectory and | ||
| // never modified | ||
| final String name = ze.getName(); | ||
| final LinkedList<ZipArchiveEntry> entriesOfThatName = nameMap.computeIfAbsent(name, k -> new LinkedList<>()); | ||
| entriesOfThatName.addLast(ze); | ||
| final ZipArchiveEntry firstEntry = nameMap.putIfAbsent(name, ze); | ||
|
|
||
| if (firstEntry != null) { | ||
| if (duplicateNameMap == null) { | ||
| duplicateNameMap = new HashMap<>(); | ||
| } | ||
|
|
||
| final List<ZipArchiveEntry> entriesOfThatName = duplicateNameMap.computeIfAbsent(name, k -> { | ||
| // Create a list when there are two entries with the same name | ||
| final ArrayList<ZipArchiveEntry> list = new ArrayList<>(2); | ||
Glavo marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| list.add(firstEntry); | ||
| return list; | ||
| }); | ||
|
|
||
| entriesOfThatName.add(ze); | ||
| } | ||
| }); | ||
| } | ||
|
|
||
|
|
@@ -820,7 +840,13 @@ public Enumeration<ZipArchiveEntry> getEntries() { | |
| * @since 1.6 | ||
| */ | ||
| public Iterable<ZipArchiveEntry> getEntries(final String name) { | ||
| return nameMap.getOrDefault(name, ZipArchiveEntry.EMPTY_LINKED_LIST); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You mentioned a potential 30% slowdown in this method in return for additional safety, which may or may not be a fair tradeoff. But I'm a bit concerned that this change potentially breaks the current behaviour. As you say, this currently returns a modifiable LinkedList, which some people may (though I hope not) depend on? Because it requires an explicit cast, I'm not too worried.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the slowdown is only in case there are duplicit entries which is extremely unlikely. I'd still simplify the code, as according to earlier comment. |
||
| final List<ZipArchiveEntry> entriesOfThatName = duplicateNameMap == null ? null : duplicateNameMap.get(name); | ||
| if (entriesOfThatName == null) { | ||
| final ZipArchiveEntry entry = nameMap.get(name); | ||
| return entry != null ? Collections.singletonList(entry) : Collections.emptyList(); | ||
| } else { | ||
| return Collections.unmodifiableList(entriesOfThatName); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -847,8 +873,16 @@ public Enumeration<ZipArchiveEntry> getEntriesInPhysicalOrder() { | |
| * @since 1.6 | ||
| */ | ||
| public Iterable<ZipArchiveEntry> getEntriesInPhysicalOrder(final String name) { | ||
| final LinkedList<ZipArchiveEntry> linkedList = nameMap.getOrDefault(name, ZipArchiveEntry.EMPTY_LINKED_LIST); | ||
| return Arrays.asList(sortByOffset(linkedList.toArray(ZipArchiveEntry.EMPTY_ARRAY))); | ||
| if (duplicateNameMap != null) { | ||
| final List<ZipArchiveEntry> list = duplicateNameMap.get(name); | ||
| if (list != null) { | ||
| final ZipArchiveEntry[] entriesOfThatName = list.toArray(ZipArchiveEntry.EMPTY_ARRAY); | ||
| return Arrays.asList(sortByOffset(entriesOfThatName)); | ||
| } | ||
| } | ||
|
|
||
| final ZipArchiveEntry entry = nameMap.get(name); | ||
| return entry != null ? Collections.singletonList(entry) : Collections.emptyList(); | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -862,8 +896,7 @@ public Iterable<ZipArchiveEntry> getEntriesInPhysicalOrder(final String name) { | |
| * @return the ZipArchiveEntry corresponding to the given name - or {@code null} if not present. | ||
| */ | ||
| public ZipArchiveEntry getEntry(final String name) { | ||
| final LinkedList<ZipArchiveEntry> entries = nameMap.get(name); | ||
| return entries != null ? entries.getFirst() : null; | ||
| return nameMap.get(name); | ||
| } | ||
|
|
||
| /** | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is extremely unlikely, would it make sense to keep the original Map and change the value to Object and let it hold either ZipArchiveEntry or the List? It would eventually complicate the lookup but that should be a single place only.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I REALLY don't like that. When you look at the pack200 code, which is written at the pre-generics Java version level, and some of its lists, you can't tell if some of the lists are buggy, sloppy, or trying to be clever. That or the lack of generics let's you get away with not having fully thought through the hierarchy of types you put in those lists.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, such optimizations typically come at cost of readability and safety. However, keeping the amount of code touching the data structures limited may help - we'll need only three methods -
first-by-name,all-by-nameandadd-entry. And document the field to not touch except via those three methods :-)Just a suggestion if it comes to true optimization, given this PR is purely about improving time and space performance. Note that the
getEntries(name)is not slower only because of wrapping into aCollectionbut also because of performing double lookup.