-
Notifications
You must be signed in to change notification settings - Fork 123
Add property-based tests for probabilistic invariants #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add comprehensive property-based testing using the Rantly gem to verify that probabilistic invariants hold across random inputs. This addresses the gap in test coverage for edge cases and mathematical properties. Tests added for Bayes classifier: - Classification determinism (same input = same output) - Training order independence (commutativity) - Train/untrain inverse property - Word and category counts never go negative - Log probabilities are always finite - Multiple training equivalence Tests added for LSI: - Classification, search, and find_related determinism - Graceful handling of uncategorized items - Consistency after index rebuild Tests added for category operations: - Add/remove category consistency - Training data isolation between categories Closes #70
Greptile SummaryAdds comprehensive property-based testing using Major Changes
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Test as PropertyTest
participant Rantly as Rantly Generator
participant Bayes as Bayes Classifier
participant LSI as LSI Classifier
Note over Test: Setup Phase
Test->>Bayes: new('Spam', 'Ham')
Test->>Bayes: train_spam(sample_data)
Test->>Bayes: train_ham(sample_data)
Note over Test: Property Test Loop (50 iterations)
loop 50 times
Test->>Rantly: generate random_alpha_string
Rantly-->>Test: random_text
Test->>Bayes: classify(random_text)
Bayes-->>Test: category1
Test->>Bayes: classify(random_text)
Bayes-->>Test: category2
Test->>Test: assert_equal(category1, category2)
end
Note over Test: Training Order Independence
loop 30 times
Test->>Rantly: generate word_count & texts
Rantly-->>Test: random texts array
Test->>Bayes: c1.train_a(texts in order)
Test->>Bayes: c2.train_a(texts shuffled)
Test->>Bayes: c1.classifications(test_phrase)
Bayes-->>Test: scores1
Test->>Bayes: c2.classifications(test_phrase)
Bayes-->>Test: scores2
Test->>Test: assert_in_delta(scores1, scores2)
end
Note over Test: Train/Untrain Inverse
loop 30 times
Test->>Bayes: train_spam(text)
Test->>Bayes: untrain_spam(text)
Test->>Bayes: classifications(test_phrase)
Bayes-->>Test: restored_scores
Test->>Test: verify scores restored
end
Note over Test: LSI Property Tests
loop 20 times
Test->>LSI: new()
Test->>LSI: add_item(tech_docs, 'Tech')
Test->>LSI: add_item(sports_docs, 'Sports')
Test->>LSI: classify(test_doc)
LSI-->>Test: category1
Test->>LSI: classify(test_doc)
LSI-->>Test: category2
Test->>Test: assert_equal(category1, category2)
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 25 comments
test/property/property_test.rb
Outdated
| class PropertyTest < Minitest::Test | ||
| ITERATIONS = Integer(ENV.fetch('RANTLY_COUNT', 50)) | ||
|
|
||
| # Sample words to use for generating meaningful text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment restates what code shows - constant name SAMPLE_WORDS makes purpose obvious
| # Sample words to use for generating meaningful text | |
| SAMPLE_WORDS = %w[ |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 7:7
Comment:
**style:** Comment restates what code shows - constant name `SAMPLE_WORDS` makes purpose obvious
```suggestion
SAMPLE_WORDS = %w[
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| SAMPLE_WORDS.sample(word_count).join(' ') | ||
| end | ||
|
|
||
| # Test 1: Classification determinism - same input always yields same output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment restates method name - test name already says "determinism"
| # Test 1: Classification determinism - same input always yields same output | |
| def test_classification_is_deterministic |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 30:30
Comment:
**style:** Comment restates method name - test name already says "determinism"
```suggestion
def test_classification_is_deterministic
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| end | ||
| end | ||
|
|
||
| # Test 2: Classifications scores are deterministic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment restates method name
| # Test 2: Classifications scores are deterministic | |
| def test_classification_scores_are_deterministic |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 41:41
Comment:
**style:** Comment restates method name
```suggestion
def test_classification_scores_are_deterministic
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| end | ||
| end | ||
|
|
||
| # Test 3: Training order independence (commutativity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment restates method name
| # Test 3: Training order independence (commutativity) | |
| def test_training_order_independence |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 52:52
Comment:
**style:** Comment restates method name
```suggestion
def test_training_order_independence
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| # Test 3: Training order independence (commutativity) | ||
| def test_training_order_independence | ||
| 30.times do | ||
| # Use consistent word count and ensure words are meaningful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment obvious from code - variable is named word_count
| # Use consistent word count and ensure words are meaningful | |
| word_count = Rantly { range(3, 6) } |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 55:55
Comment:
**style:** Comment obvious from code - variable is named `word_count`
```suggestion
word_count = Rantly { range(3, 6) }
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| lsi.add_item 'This text revolves around programming. Programming!', 'Tech' | ||
| lsi.add_item 'This text involves football. Football!', 'Sports' | ||
|
|
||
| # Classification should be deterministic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment obvious from code - classify calls are self-documenting
| # Classification should be deterministic | |
| test_text = 'This is about programming and computers.' |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 287:287
Comment:
**style:** Comment obvious from code - `classify` calls are self-documenting
```suggestion
test_text = 'This is about programming and computers.'
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| end | ||
|
|
||
| class MultiCategoryPropertyTest < Minitest::Test | ||
| # Test 16: Adding and removing categories maintains consistency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment restates method name
| # Test 16: Adding and removing categories maintains consistency | |
| def test_category_operations_maintain_consistency |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 298:298
Comment:
**style:** Comment restates method name
```suggestion
def test_category_operations_maintain_consistency
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| # Test 16: Adding and removing categories maintains consistency | ||
| def test_category_operations_maintain_consistency | ||
| 20.times do | ||
| # Use simple category names that won't be transformed unexpectedly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment obvious from code - variable name category_name makes this clear
| # Use simple category names that won't be transformed unexpectedly | |
| category_name = "Category#{rand(1000)}" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 301:301
Comment:
**style:** Comment obvious from code - variable name `category_name` makes this clear
```suggestion
category_name = "Category#{rand(1000)}"
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| end | ||
| end | ||
|
|
||
| # Test 17: Training data isolation between categories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment restates method name
| # Test 17: Training data isolation between categories | |
| def test_training_data_isolation |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 320:320
Comment:
**style:** Comment restates method name
```suggestion
def test_training_data_isolation
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
test/property/property_test.rb
Outdated
| category_a = classifier.instance_variable_get(:@categories)[:A] | ||
| category_b = classifier.instance_variable_get(:@categories)[:B] | ||
|
|
||
| # Words from A's training should not appear in B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Comment obvious from code - assertion already states this
| # Words from A's training should not appear in B | |
| text1.downcase.split.each do |word| |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 336:336
Comment:
**style:** Comment obvious from code - assertion already states this
```suggestion
text1.downcase.split.each do |word|
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.|
Addressed review feedback:
Ready for re-review @greptile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 1 comment
| stemmed = word.stem | ||
| next unless category_a[stemmed.to_sym] | ||
|
|
||
| refute category_b[stemmed.to_sym], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: This assertion will always fail when it triggers. If category_a[stemmed.to_sym] exists (line 318), then the word was trained in A. The refute on line 320 expects it NOT to be in B, but this test trained text2 (containing B words) into category B on line 309. If any words happen to overlap between text1 and text2 due to random sampling, the assertion will fail even though the behavior is correct (both categories can have the same word with different counts).
| refute category_b[stemmed.to_sym], | |
| assert_operator(category_b[stemmed.to_sym] || 0, :<=, category_a[stemmed.to_sym], | |
| 'Words trained only in A should have higher count in A') |
Prompt To Fix With AI
This is a comment left during a code review.
Path: test/property/property_test.rb
Line: 320:320
Comment:
**logic:** This assertion will always fail when it triggers. If `category_a[stemmed.to_sym]` exists (line 318), then the word was trained in A. The `refute` on line 320 expects it NOT to be in B, but this test trained `text2` (containing B words) into category B on line 309. If any words happen to overlap between `text1` and `text2` due to random sampling, the assertion will fail even though the behavior is correct (both categories can have the same word with different counts).
```suggestion
assert_operator(category_b[stemmed.to_sym] || 0, :<=, category_a[stemmed.to_sym],
'Words trained only in A should have higher count in A')
```
How can I resolve this? If you propose a fix, please make it concise.|
Re: test_training_data_isolation concern The word lists are intentionally non-overlapping:
Since text1 samples from words_a and text2 samples from words_b, there's no possibility of overlap. The test correctly verifies that training data stays isolated to its category. The suggested change would test a different property (relative counts) rather than strict isolation. |
Summary
rantlygem for property-based testingTests Added
Bayes Classifier (10 tests)
LSI (5 tests)
Category Operations (2 tests)
Test Results
Test plan
rake testCloses #70