Skip to content

Conversation

@oliviagolden0
Copy link
Collaborator

@oliviagolden0 oliviagolden0 commented May 6, 2025

Issue https://github.ibm.com/alchemy-containers/armada-ballast/issues/3593

ENSURE THE FOLLOWING ARE MET:

Below is a brief summary of PR requirements (full list).

  • Your PR title summarizes the changes at a high level (not simply "updated X" or "changes Y")
  • Your PR description links to the issue/design. If not available, includes a full description for the change.
  • Your code contains relevant unit tests.

By opening this PR for review, the author has agreed that these criteria must be met.

By approving this PR, the reviewers have also agreed these criteria have been met and it is ready to be merged.

rules/options.go Outdated

// HighPriority places priority on fields
// that are associated with a rule during
// a crawler run
Copy link
Collaborator

@kramvan1 kramvan1 May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to mention that normal watcher processing is also done for these rules.
Do we want to consider an uint here, and let the use specific a specific priority order? 1-n, 1 being the highest? Would allow them to "group" priority 1 rules or which ever way they want. getPrioritizedPrefixes could just do a value sort. Rules that overlap, highest priority wins.

rules/engine.go Outdated
// This is a map; used to ensure there are no duplicates
for prefix := range prefixes {
prefixSlice = append(prefixSlice, prefix)
for prefix := range e.ruleMgr.watcherPrefixes {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have a method for access to this, so all of the internals is kept within the rule mgr.

add = false
// use the highest priority of any
// overlapping prefixes
if priorityCheck > priority {
Copy link
Collaborator

@kramvan1 kramvan1 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, this looks like it's in the wrong spot, with add=false right above it, the check outside the loop, if add, will cause the priority to never get used?

Maybe add a test just for this case, overlap and higher priority later in the map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh oops, yeah

check(err)

// Verify the crawler rule ran
tenSecCtx2, cancel := context.WithTimeout(context.Background(), 15*time.Second)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

15 vs tenSec?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, don't think there's a way for the callback to know the source watcher vs crawler. I don't know if we hide that in the rule callback attributes or not.
Or if you could check the task context for the info.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check to the callback

rulesBySlashCount: map[int]map[DynamicRule]int{},
prefixes: map[string]string{},
prefixes: map[string]uint{},
watcherPrefixes: map[string]uint{},
Copy link
Collaborator

@kramvan1 kramvan1 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to store these here, would be a dup list in most cases?
Could we instead use a
prefixes: map[string]struct{crawlerOnly: bool, priority: uint} and can generate any combination from that?

func (rm *ruleManager) getStaticRules(key string, value *string) (map[staticRule]int, []staticRule) {
slashCount := strings.Count(key, "/")
out := make(map[staticRule]int)
prioritized := make(map[staticRule]uint)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are really "prioritized" there just input to be sorted later, maybe we need a diff name here.

}
}
return out
return out, sortRulesByPriority(prioritized)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm, I worry about the overhead of doing this for every call when the result is always the same. Every field value, which can be large, 20 million each crawl bootstrap.
Maybe we need some debug logging or enable Go profiling to see how much of a hit this is going to be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants