RANGER-5406: Support export policies in a segmented manner #741
base: ranger-2.3
Conversation
@mneethiraj @kumaab

Thank you @yunyezhang-work for the patch! Please raise a PR for the
        return ret;
    }

    private List<RangerPolicy> cutRangerPolicyList(List<RangerPolicy> policyList, SearchFilter filter) {
Suggested name: getRangerPoliciesInRange
        int startIndex = filter.getBeginIndex();
        int pageSize = filter.getOffsetIndex();
        int toIndex = Math.min(startIndex + pageSize, totalCount);
        LOG.info("==>totalCount: " + totalCount + " startIndex: " + startIndex + " pageSize: " + pageSize + " toIndex: " + toIndex);
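The slicing logic in the fragment above, under the reviewer's suggested name, could be sketched as a standalone method. This is a hedged sketch, not the actual patch: the generic element type stands in for `RangerPolicy`, and returning an empty list for out-of-range input is an assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PolicyRangeExample {
    // Sketch of getRangerPoliciesInRange: return the policies in
    // [startIndex, startIndex + pageSize), clamped to the list size.
    public static <T> List<T> getPoliciesInRange(List<T> policyList, int startIndex, int pageSize) {
        int totalCount = policyList.size();
        if (startIndex < 0 || startIndex >= totalCount || pageSize <= 0) {
            // Assumed fallback for invalid ranges: an empty result
            return Collections.emptyList();
        }
        int toIndex = Math.min(startIndex + pageSize, totalCount);
        // subList returns a view; copy it so the result is independent of the source list
        return new ArrayList<>(policyList.subList(startIndex, toIndex));
    }

    public static void main(String[] args) {
        List<Integer> policyIds = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(getPoliciesInRange(policyIds, 2, 3)); // prints [3, 4, 5]
    }
}
```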
Avoid string concatenation, use String.format()
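Applied to the log line in the diff above, the reviewer's `String.format` suggestion might look like the following sketch (the helper method and its name are illustrative, not part of the patch):

```java
public class LogFormatExample {
    // Same message text as the concatenated version, built with %d placeholders
    public static String formatRangeLog(int totalCount, int startIndex, int pageSize, int toIndex) {
        return String.format("==>totalCount: %d startIndex: %d pageSize: %d toIndex: %d",
                totalCount, startIndex, pageSize, toIndex);
    }

    public static void main(String[] args) {
        // In the patch this would be passed to LOG.info(...) instead of printed
        System.out.println(formatRangeLog(18, 0, 5, 5));
    }
}
```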
            LOG.info("Invalid or Unsupported sortType : " + sortType);
        }
    } else {
        LOG.info("Invalid or Unsupported sortBy property : " + sortBy);
Avoid string concat, check all references.
See: https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+Java+Style+Guide
agents-common/src/main/java/org/apache/ranger/plugin/util/SearchFilter.java
    public static final String UPDATE_TIME  = "updateTime"; // sort
    public static final String START_INDEX  = "startIndex";
    public static final String BEGIN_INDEX  = "beginIndex";
    public static final String OFFSET_INDEX = "offsetIndex";
I think OFFSET is more meaningful than OFFSET_INDEX; an offset is not an index. What do you think?
    private int startIndex;
    private int maxRows     = Integer.MAX_VALUE;
    private int beginIndex  = -1;
    private int offsetIndex = -1;
Since you've added new fields to the SearchFilter class, don't forget to modify the copy constructor (public SearchFilter(SearchFilter other)) accordingly to ensure the new attributes are properly copied.
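A minimal sketch of the copy-constructor update the reviewer asks for. The field names `beginIndex` and `offsetIndex` come from the diff above; the rest of `SearchFilter` is elided and the class name here is a stand-in, not the real Ranger class.

```java
public class SearchFilterSketch {
    private int startIndex;
    private int maxRows     = Integer.MAX_VALUE;
    private int beginIndex  = -1; // newly added field
    private int offsetIndex = -1; // newly added field

    public SearchFilterSketch() {}

    // Copy constructor: the two new fields must be copied along with the old ones
    public SearchFilterSketch(SearchFilterSketch other) {
        this.startIndex  = other.startIndex;
        this.maxRows     = other.maxRows;
        this.beginIndex  = other.beginIndex;
        this.offsetIndex = other.offsetIndex;
    }

    public int getBeginIndex()  { return beginIndex; }
    public int getOffsetIndex() { return offsetIndex; }

    public void setBeginIndex(int beginIndex)   { this.beginIndex = beginIndex; }
    public void setOffsetIndex(int offsetIndex) { this.offsetIndex = offsetIndex; }
}
```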
    }

    public void setBeginIndex(int beginIndex) {
        this.beginIndex = beginIndex;
I think we should validate that beginIndex >= 0. What’s your opinion?
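The validation the reviewer proposes could look like this sketch. Throwing `IllegalArgumentException` on a negative value is an assumption about the desired failure mode; keeping `-1` as the "unset" sentinel matches the field's initializer in the diff.

```java
public class BeginIndexValidation {
    private int beginIndex = -1; // -1 means "not set", per the diff above

    // Reject negative values instead of silently accepting them
    public void setBeginIndex(int beginIndex) {
        if (beginIndex < 0) {
            throw new IllegalArgumentException("beginIndex must be >= 0, got " + beginIndex);
        }
        this.beginIndex = beginIndex;
    }

    public int getBeginIndex() {
        return beginIndex;
    }
}
```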
What changes were proposed in this pull request?
In big data production environments, customers create a massive number of policies, often hundreds of thousands or even millions. Exporting the entire policy set for disaster recovery produces an enormous data volume and extremely slow imports into the backup cluster. Our experimental data shows that importing 10,000 policies via the API is very memory-intensive and takes approximately 15 minutes; importing 100,000 policies takes 2.5 hours or longer.
With an even larger number of policies, memory consumption increases significantly, and insufficient memory can interrupt the import. We therefore propose modifying the API to support segmented export. This saves memory and ensures data reliability when importing into other clusters for disaster recovery.
How was this patch tested?
To manually test this feature, send an HTTP request to Ranger. Using shell commands as an example:

Without the segmentation parameters, calling the export API getPoliciesInJson exports all policies. In this test environment there are 18 policies for the service hdfs-xxx:

curl -u$USER:$PASSWORD -XGET "http://$RANGER_HOST:$RANGER_PORT/service/plugins/policies/exportJson?serviceName=$SERVICE&checkPoliciesExists=true" -v -o export.json

Adding the segmentation parameters (beginIndex and offsetIndex) exports only the policies in the specified range; for example, policies 1-5 of hdfs-xxx:

curl -u$USER:$PASSWORD -XGET "http://$RANGER_HOST:$RANGER_PORT/service/plugins/policies/exportJson?serviceName=$SERVICE&checkPoliciesExists=true&beginIndex=$BEGIN_INDEX&offsetIndex=$OFFSET_INDEX" -v -o export_${BEGIN_INDEX}_${OFFSET_INDEX}.json