Another batch of insight fixes #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes from 7 commits (11 commits total):

af452cf  dup files fixes (trevor-e)
033ea4f  fix (trevor-e)
420ccb6  fix (trevor-e)
c8be8dd  refactor (trevor-e)
b069b88  tests (trevor-e)
06191b0  fix (trevor-e)
ce0ad49  clean up (trevor-e)
81be538  fix duplicate file analysis (trevor-e)
49ffe49  fix (trevor-e)
e483be1  fix (trevor-e)
cd4c23d  fix (trevor-e)
src/launchpad/size/insights/apple/main_binary_export_metadata.py
43 changes: 27 additions & 16 deletions
```diff
@@ -1,31 +1,42 @@
 from launchpad.size.insights.insight import Insight, InsightsInput
 from launchpad.size.models.apple import MachOBinaryAnalysis, MainBinaryExportMetadataResult
 from launchpad.size.models.insights import FileSavingsResult


 class MainBinaryExportMetadataInsight(Insight[MainBinaryExportMetadataResult]):
-    """Insight for analyzing the exported symbols metadata in the main binary."""
+    """Insight for analyzing the exported symbols metadata in all main binaries."""

     MIN_EXPORTS_THRESHOLD = 1024

     def generate(self, input: InsightsInput) -> MainBinaryExportMetadataResult | None:
-        """Generate insight for main binary exported symbols analysis."""
+        """Generate insight for all main binary exported symbols analysis."""

-        main_binary_analysis = None
+        export_files: list[FileSavingsResult] = []
+
+        # Analyze all main binaries (main app, app extensions, watch apps)
         for analysis in input.binary_analysis:
             if isinstance(analysis, MachOBinaryAnalysis) and analysis.is_main_binary:
-                main_binary_analysis = analysis
-                break
+                if not analysis.binary_analysis:
+                    continue

-        if not main_binary_analysis or not main_binary_analysis.binary_analysis:
-            return None
+                # Look for dyld_exports_trie component in this main binary
+                for component in analysis.binary_analysis.components:
+                    if component.name == "dyld_exports_trie":
+                        if component.size >= self.MIN_EXPORTS_THRESHOLD:
+                            export_files.append(
+                                FileSavingsResult(
+                                    file_path=analysis.binary_relative_path,
+                                    total_savings=component.size,
+                                )
+                            )
+                        break

-        dyld_exports_trie_component = None
-        for component in main_binary_analysis.binary_analysis.components:
-            if component.name == "dyld_exports_trie":
-                dyld_exports_trie_component = component
-                break
-
-        if not dyld_exports_trie_component or dyld_exports_trie_component.size < self.MIN_EXPORTS_THRESHOLD:
-            return None
+        if not export_files:
+            return None

-        return MainBinaryExportMetadataResult(total_savings=dyld_exports_trie_component.size)
+        total_savings = sum(file.total_savings for file in export_files)
+
+        return MainBinaryExportMetadataResult(
+            total_savings=total_savings,
+            files=export_files,
+        )
```
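With this change the insight no longer stops at the first main binary it finds: every main binary (main app, app extensions, watch apps) whose `dyld_exports_trie` is at least `MIN_EXPORTS_THRESHOLD` bytes contributes a `FileSavingsResult`, and the result reports the per-file entries plus their sum. A minimal, self-contained sketch of that aggregation; `Component` and `Binary` are hypothetical stand-ins for the real `MachOBinaryAnalysis` models, and the paths and sizes are made up:

```python
from dataclasses import dataclass

MIN_EXPORTS_THRESHOLD = 1024  # same threshold the insight uses


@dataclass
class Component:  # hypothetical stand-in for a Mach-O binary component
    name: str
    size: int


@dataclass
class Binary:  # hypothetical stand-in for a main-binary analysis result
    path: str
    components: list[Component]


def export_metadata_savings(binaries: list[Binary]) -> dict[str, int]:
    """Collect the dyld_exports_trie size per main binary, mirroring the new loop."""
    savings: dict[str, int] = {}
    for binary in binaries:
        for component in binary.components:
            if component.name == "dyld_exports_trie":
                if component.size >= MIN_EXPORTS_THRESHOLD:
                    savings[binary.path] = component.size
                break  # at most one exports trie per binary
    return savings


# A main app plus a watch app, each with a large exports trie:
per_file = export_metadata_savings(
    [
        Binary("MyApp.app/MyApp", [Component("dyld_exports_trie", 4096)]),
        Binary("MyApp.app/Watch/WatchApp", [Component("dyld_exports_trie", 2048)]),
    ]
)
assert per_file == {"MyApp.app/MyApp": 4096, "MyApp.app/Watch/WatchApp": 2048}
assert sum(per_file.values()) == 6144  # the total_savings reported by the insight
```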
The second file in the diff reworks `DuplicateFilesInsight`:

```diff
@@ -1,45 +1,126 @@
+import hashlib
 import os
 from collections import defaultdict
+from pathlib import Path
 from typing import Dict, List

 from launchpad.size.insights.insight import Insight, InsightsInput
-from launchpad.size.models.common import FileInfo
-from launchpad.size.models.insights import DuplicateFileGroup, DuplicateFilesInsightResult
+from launchpad.size.models.common import FileInfo, TreemapType
+from launchpad.size.models.insights import (
+    DuplicateFileGroup,
+    DuplicateFilesInsightResult,
+)


 class DuplicateFilesInsight(Insight[DuplicateFilesInsightResult]):
-    def generate(self, input: InsightsInput) -> DuplicateFilesInsightResult:
-        files_by_hash: Dict[str, List[FileInfo]] = defaultdict(list)
-        for file in input.file_analysis.files:
-            if file.hash_md5:
-                files_by_hash[file.hash_md5].append(file)
+    EXTENSION_ALLOWLIST = [".xcprivacy"]

+    # Make sure to group all duplicates in a directory that has one of these extensions
+    DIRECTORY_EXTENSIONS = [".bundle"]
+
+    def generate(self, input: InsightsInput) -> DuplicateFilesInsightResult | None:
         groups: List[DuplicateFileGroup] = []
         total_savings = 0

-        for file_list in files_by_hash.values():
-            if len(file_list) > 1:
-                # Calculate savings: total size - size of one copy we keep
-                total_file_size = sum(f.size for f in file_list)
-                savings_for_this_group = total_file_size - file_list[0].size
-
-                if savings_for_this_group > 0:  # Only include if there are actual savings
-                    sorted_files = sorted(file_list, key=lambda f: (-f.size, f.path))
-                    filenames = sorted(set(os.path.basename(f.path) for f in sorted_files))
-                    group_filename = filenames[0]
-
-                    group = DuplicateFileGroup(
-                        filename=group_filename,
-                        files=sorted_files,
-                        total_savings=savings_for_this_group,
-                    )
-                    groups.append(group)
-                    total_savings += savings_for_this_group
-
-        groups = sorted(groups, key=lambda g: (-g.total_savings, g.filename))
-
-        return DuplicateFilesInsightResult(
-            groups=groups,
-            total_savings=total_savings,
-        )
+        covered_containers: set[str] = set()
+        for infos in self._duplicate_directories(input.file_analysis.files).values():
+            if len(infos) < 2:
+                continue
+
+            infos.sort(key=lambda f: (-f.size, f.path))
+            group_size = sum(fi.size for fi in infos)
+            savings = group_size - infos[0].size
+            if savings <= 0:
+                continue
+
+            groups.append(
+                DuplicateFileGroup(
+                    filename=os.path.basename(infos[0].path),
+                    files=infos,
+                    total_savings=savings,
+                )
+            )
+            total_savings += savings
+            for info in infos:
+                covered_containers.add(info.path)
+
+        files_by_hash: Dict[str, List[FileInfo]] = defaultdict(list)
+        for f in input.file_analysis.files:
+            if (
+                f.hash_md5
+                and not self._is_allowed_extension(f.path)
+                # New guard: skip files already covered by a duplicate-directory group
+                and not any(f.path.startswith(c + "/") or f.path == c for c in covered_containers)
+            ):
+                files_by_hash[f.hash_md5].append(f)
+
+        for dup_files in files_by_hash.values():
+            if len(dup_files) < 2:
+                continue
+
+            dup_files.sort(key=lambda f: (-f.size, f.path))
+            savings = sum(f.size for f in dup_files) - dup_files[0].size
+            if savings <= 0:
+                continue
+
+            container = self._directory_grouping(dup_files[0].path)
+            name = os.path.basename(container) if container else os.path.basename(dup_files[0].path)
+
+            groups.append(
+                DuplicateFileGroup(
+                    filename=name,
+                    files=dup_files,
+                    total_savings=savings,
+                )
+            )
+            total_savings += savings
+
+        groups.sort(key=lambda g: (-g.total_savings, g.filename))
+
+        if len(groups) > 0:
+            return DuplicateFilesInsightResult(groups=groups, total_savings=total_savings)
+
+        return None
+
+    def _is_allowed_extension(self, file_path: str) -> bool:
+        return any(file_path.endswith(ext) for ext in self.EXTENSION_ALLOWLIST)
+
+    def _directory_grouping(self, file_path: str) -> str | None:
+        p = Path(file_path)
+        for i, part in enumerate(p.parts):
+            if any(part.endswith(ext) for ext in self.DIRECTORY_EXTENSIONS):
+                return str(Path(*p.parts[: i + 1]))
+        return None
+
+    def _duplicate_directories(self, files: List[FileInfo]) -> Dict[str, List[FileInfo]]:
+        dir_to_children: Dict[str, List[FileInfo]] = defaultdict(list)
+        for f in files:
+            if f.hash_md5:
+                root = self._directory_grouping(f.path)
+                if root:
+                    dir_to_children[root].append(f)
+
+        dup_dirs: Dict[str, List[FileInfo]] = defaultdict(list)
+        for root, children in dir_to_children.items():
+            if not children:
+                continue
+
+            md5 = hashlib.md5()
+            for h in sorted(c.hash_md5 for c in children if c.hash_md5):
+                md5.update(h.encode())
+            folder_hash = md5.hexdigest()
+
+            dup_dirs[folder_hash].append(
+                FileInfo(
+                    full_path=(
+                        children[0].full_path.parent / root if children[0].full_path is not None else Path(root)
+                    ),
+                    path=root,
+                    size=sum(c.size for c in children),
+                    file_type="directory",
+                    hash_md5=folder_hash,
+                    treemap_type=TreemapType.FILES,
+                    children=children,
+                )
+            )
+        return dup_dirs
```
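The reworked insight runs in two passes: it first groups whole duplicate directories (any path component ending in `.bundle`), hashing the sorted child MD5s so two bundles with identical contents compare equal, and only then groups the remaining files by their own hash, skipping `.xcprivacy` files and anything already covered by a directory group. A short sketch of the two rules doing that work; `directory_grouping` mirrors the PR's `_directory_grouping` helper, and the paths, sizes, and hashes are invented for illustration:

```python
import hashlib
from pathlib import Path

DIRECTORY_EXTENSIONS = [".bundle"]


def directory_grouping(file_path: str) -> str | None:
    """Return the enclosing *.bundle directory of a path, if any."""
    p = Path(file_path)
    for i, part in enumerate(p.parts):
        if any(part.endswith(ext) for ext in DIRECTORY_EXTENSIONS):
            return str(Path(*p.parts[: i + 1]))
    return None


def folder_hash(child_md5s: list[str]) -> str:
    """Order-independent hash of a directory's child file hashes."""
    md5 = hashlib.md5()
    for h in sorted(child_md5s):
        md5.update(h.encode())
    return md5.hexdigest()


# Files inside a bundle roll up to the bundle itself:
assert directory_grouping("Frameworks/Assets.bundle/icon.png") == "Frameworks/Assets.bundle"
assert directory_grouping("Frameworks/libFoo.dylib") is None

# Two bundles whose children hash the same (in any order) form one duplicate group:
assert folder_hash(["aa11", "bb22"]) == folder_hash(["bb22", "aa11"])

# Savings for any duplicate group: total size minus the single copy we keep.
sizes = [300, 300, 300]  # three identical copies
assert sum(sizes) - max(sizes) == 600
```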