Add property-based tests for probabilistic invariants #99

cardmagic · 2025-12-28T23:17:38Z

Summary

Add rantly gem for property-based testing
Create 17 property-based tests verifying probabilistic invariants hold across random inputs
Tests cover Bayes classifier, LSI, and category operations

Tests Added

Bayes Classifier (10 tests)

Classification determinism (same input = same output)
Classification scores determinism
Training order independence (commutativity)
Train/untrain inverse property
Word counts never go negative
Category counts consistency
Valid category returns
All categories in scores hash
Finite log probabilities (no NaN/Infinity)
Multiple training equivalence

LSI (5 tests)

Classification determinism
find_related determinism
search determinism
Uncategorized items handling
Rebuild consistency

Category Operations (2 tests)

Add/remove category consistency
Training data isolation

Test Results

216 runs, 1607 assertions, 0 failures, 0 errors, 0 skips
Line Coverage: 96.31%

Test plan

All existing tests pass
New property tests pass with pure Ruby backend
Tests are included in rake test

Closes #70

Add comprehensive property-based testing using the Rantly gem to verify that probabilistic invariants hold across random inputs. This addresses the gap in test coverage for edge cases and mathematical properties. Tests added for Bayes classifier: - Classification determinism (same input = same output) - Training order independence (commutativity) - Train/untrain inverse property - Word and category counts never go negative - Log probabilities are always finite - Multiple training equivalence Tests added for LSI: - Classification, search, and find_related determinism - Graceful handling of uncategorized items - Consistency after index rebuild Tests added for category operations: - Add/remove category consistency - Training data isolation between categories Closes #70

greptile-apps · 2025-12-28T23:25:18Z

Greptile Summary

Adds comprehensive property-based testing using rantly gem to verify probabilistic invariants across random inputs. Creates 17 tests covering Bayes classifier (10 tests), LSI (5 tests), and category operations (2 tests). Tests verify critical invariants like classification determinism, training order independence, train/untrain inverse property, and data isolation. Also drops Ruby 3.2 from CI test matrix.

Major Changes

Added test/property/property_test.rb with 17 property-based tests (325 lines)
Tests run 50 iterations by default (configurable via RANTLY_COUNT env var)
Verifies Bayes invariants: determinism, commutativity, word count non-negativity, finite log probabilities
Verifies LSI invariants: determinism for classification/search/find_related, rebuild consistency
Verifies category operations: add/remove consistency, training data isolation
Removed Ruby 3.2 from CI test matrix in .github/workflows/ruby.yml
Added rantly gem to Gemfile test group

Confidence Score: 5/5

This PR is safe to merge with no issues identified
Score reflects well-structured property-based tests with sound logic, appropriate test coverage (17 tests verifying key invariants), clean dependency management, and passing CI. One logic comment was left but later withdrawn after deeper analysis confirmed the test logic is correct. The tests add significant value by verifying probabilistic invariants that unit tests cannot easily cover.
No files require special attention

Important Files Changed

Filename	Overview
test/property/property_test.rb	New property-based test file with 17 comprehensive tests verifying probabilistic invariants - well structured, logically sound
.github/workflows/ruby.yml	Removed Ruby 3.2 from CI test matrix - clean infrastructure change
Gemfile	Added `rantly` gem to test group for property-based testing

Sequence Diagram

sequenceDiagram
    participant Test as PropertyTest
    participant Rantly as Rantly Generator
    participant Bayes as Bayes Classifier
    participant LSI as LSI Classifier
    
    Note over Test: Setup Phase
    Test->>Bayes: new('Spam', 'Ham')
    Test->>Bayes: train_spam(sample_data)
    Test->>Bayes: train_ham(sample_data)
    
    Note over Test: Property Test Loop (50 iterations)
    loop 50 times
        Test->>Rantly: generate random_alpha_string
        Rantly-->>Test: random_text
        Test->>Bayes: classify(random_text)
        Bayes-->>Test: category1
        Test->>Bayes: classify(random_text)
        Bayes-->>Test: category2
        Test->>Test: assert_equal(category1, category2)
    end
    
    Note over Test: Training Order Independence
    loop 30 times
        Test->>Rantly: generate word_count & texts
        Rantly-->>Test: random texts array
        Test->>Bayes: c1.train_a(texts in order)
        Test->>Bayes: c2.train_a(texts shuffled)
        Test->>Bayes: c1.classifications(test_phrase)
        Bayes-->>Test: scores1
        Test->>Bayes: c2.classifications(test_phrase)
        Bayes-->>Test: scores2
        Test->>Test: assert_in_delta(scores1, scores2)
    end
    
    Note over Test: Train/Untrain Inverse
    loop 30 times
        Test->>Bayes: train_spam(text)
        Test->>Bayes: untrain_spam(text)
        Test->>Bayes: classifications(test_phrase)
        Bayes-->>Test: restored_scores
        Test->>Test: verify scores restored
    end
    
    Note over Test: LSI Property Tests
    loop 20 times
        Test->>LSI: new()
        Test->>LSI: add_item(tech_docs, 'Tech')
        Test->>LSI: add_item(sports_docs, 'Sports')
        Test->>LSI: classify(test_doc)
        LSI-->>Test: category1
        Test->>LSI: classify(test_doc)
        LSI-->>Test: category2
        Test->>Test: assert_equal(category1, category2)
    end

greptile-apps

_{2 files reviewed, 25 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2025-12-28T23:25:05Z

test/property/property_test.rb

+class PropertyTest < Minitest::Test
+  ITERATIONS = Integer(ENV.fetch('RANTLY_COUNT', 50))
+
+  # Sample words to use for generating meaningful text


style: Comment restates what code shows - constant name SAMPLE_WORDS makes purpose obvious

Suggested change

# Sample words to use for generating meaningful text

SAMPLE_WORDS = %w[

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 7:7 Comment: **style:** Comment restates what code shows - constant name `SAMPLE_WORDS` makes purpose obvious ```suggestion SAMPLE_WORDS = %w[ ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:06Z

test/property/property_test.rb

+    SAMPLE_WORDS.sample(word_count).join(' ')
+  end
+
+  # Test 1: Classification determinism - same input always yields same output


style: Comment restates method name - test name already says "determinism"

Suggested change

# Test 1: Classification determinism - same input always yields same output

def test_classification_is_deterministic

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 30:30 Comment: **style:** Comment restates method name - test name already says "determinism" ```suggestion def test_classification_is_deterministic ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:06Z

test/property/property_test.rb

+    end
+  end
+
+  # Test 2: Classifications scores are deterministic


style: Comment restates method name

Suggested change

# Test 2: Classifications scores are deterministic

def test_classification_scores_are_deterministic

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 41:41 Comment: **style:** Comment restates method name ```suggestion def test_classification_scores_are_deterministic ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:07Z

test/property/property_test.rb

+    end
+  end
+
+  # Test 3: Training order independence (commutativity)


style: Comment restates method name

Suggested change

# Test 3: Training order independence (commutativity)

def test_training_order_independence

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 52:52 Comment: **style:** Comment restates method name ```suggestion def test_training_order_independence ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:07Z

test/property/property_test.rb

+  # Test 3: Training order independence (commutativity)
+  def test_training_order_independence
+    30.times do
+      # Use consistent word count and ensure words are meaningful


style: Comment obvious from code - variable is named word_count

Suggested change

# Use consistent word count and ensure words are meaningful

word_count = Rantly { range(3, 6) }

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 55:55 Comment: **style:** Comment obvious from code - variable is named `word_count` ```suggestion word_count = Rantly { range(3, 6) } ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:16Z

test/property/property_test.rb

+      lsi.add_item 'This text revolves around programming. Programming!', 'Tech'
+      lsi.add_item 'This text involves football. Football!', 'Sports'
+
+      # Classification should be deterministic


style: Comment obvious from code - classify calls are self-documenting

Suggested change

# Classification should be deterministic

test_text = 'This is about programming and computers.'

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 287:287 Comment: **style:** Comment obvious from code - `classify` calls are self-documenting ```suggestion test_text = 'This is about programming and computers.' ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:16Z

test/property/property_test.rb

+end
+
+class MultiCategoryPropertyTest < Minitest::Test
+  # Test 16: Adding and removing categories maintains consistency


style: Comment restates method name

Suggested change

# Test 16: Adding and removing categories maintains consistency

def test_category_operations_maintain_consistency

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 298:298 Comment: **style:** Comment restates method name ```suggestion def test_category_operations_maintain_consistency ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:17Z

test/property/property_test.rb

+  # Test 16: Adding and removing categories maintains consistency
+  def test_category_operations_maintain_consistency
+    20.times do
+      # Use simple category names that won't be transformed unexpectedly


style: Comment obvious from code - variable name category_name makes this clear

Suggested change

# Use simple category names that won't be transformed unexpectedly

category_name = "Category#{rand(1000)}"

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 301:301 Comment: **style:** Comment obvious from code - variable name `category_name` makes this clear ```suggestion category_name = "Category#{rand(1000)}" ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:17Z

test/property/property_test.rb

+    end
+  end
+
+  # Test 17: Training data isolation between categories


style: Comment restates method name

Suggested change

# Test 17: Training data isolation between categories

def test_training_data_isolation

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 320:320 Comment: **style:** Comment restates method name ```suggestion def test_training_data_isolation ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2025-12-28T23:25:18Z

test/property/property_test.rb

+      category_a = classifier.instance_variable_get(:@categories)[:A]
+      category_b = classifier.instance_variable_get(:@categories)[:B]
+
+      # Words from A's training should not appear in B


style: Comment obvious from code - assertion already states this

Suggested change

# Words from A's training should not appear in B

text1.downcase.split.each do |word|

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 336:336 Comment: **style:** Comment obvious from code - assertion already states this ```suggestion text1.downcase.split.each do |word| ``` Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! How can I resolve this? If you propose a fix, please make it concise.

cardmagic · 2025-12-29T00:05:42Z

Addressed review feedback:

Fixed RuboCop offenses (gem ordering, string quotes, empty lines before assertions)
Removed redundant comments that restated method names

Ready for re-review @greptile

cardmagic · 2025-12-29T00:07:19Z

@greptile

greptile-apps

_{3 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2025-12-29T00:10:06Z

test/property/property_test.rb

+        stemmed = word.stem
+        next unless category_a[stemmed.to_sym]
+
+        refute category_b[stemmed.to_sym],


logic: This assertion will always fail when it triggers. If category_a[stemmed.to_sym] exists (line 318), then the word was trained in A. The refute on line 320 expects it NOT to be in B, but this test trained text2 (containing B words) into category B on line 309. If any words happen to overlap between text1 and text2 due to random sampling, the assertion will fail even though the behavior is correct (both categories can have the same word with different counts).

Suggested change

refute category_b[stemmed.to_sym],

assert_operator(category_b[stemmed.to_sym] || 0, :<=, category_a[stemmed.to_sym],

'Words trained only in A should have higher count in A')

Prompt To Fix With AI

This is a comment left during a code review. Path: test/property/property_test.rb Line: 320:320 Comment: **logic:** This assertion will always fail when it triggers. If `category_a[stemmed.to_sym]` exists (line 318), then the word was trained in A. The `refute` on line 320 expects it NOT to be in B, but this test trained `text2` (containing B words) into category B on line 309. If any words happen to overlap between `text1` and `text2` due to random sampling, the assertion will fail even though the behavior is correct (both categories can have the same word with different counts). ```suggestion assert_operator(category_b[stemmed.to_sym] || 0, :<=, category_a[stemmed.to_sym], 'Words trained only in A should have higher count in A') ``` How can I resolve this? If you propose a fix, please make it concise.

cardmagic · 2025-12-29T00:12:07Z

Re: test_training_data_isolation concern

The word lists are intentionally non-overlapping:

words_a = %w[apple banana cherry orange grape]
words_b = %w[dog elephant fox giraffe horse]

Since text1 samples from words_a and text2 samples from words_b, there's no possibility of overlap. The test correctly verifies that training data stays isolated to its category.

The suggested change would test a different property (relative counts) rather than strict isolation.

greptile-apps bot reviewed Dec 28, 2025

View reviewed changes

cardmagic added 2 commits December 28, 2025 16:01

style: fix RuboCop offenses in property tests

6bac00e

refactor: remove redundant comments from property tests

59e7396

ci: drop Ruby 3.2 from test matrix

d91567c

greptile-apps bot reviewed Dec 29, 2025

View reviewed changes

cardmagic merged commit 7d70749 into master Dec 29, 2025
6 checks passed

cardmagic deleted the feature/property-based-tests branch December 29, 2025 00:12

	# Sample words to use for generating meaningful text
	SAMPLE_WORDS = %w[

	# Test 1: Classification determinism - same input always yields same output
	def test_classification_is_deterministic

	# Test 2: Classifications scores are deterministic
	def test_classification_scores_are_deterministic

	# Test 3: Training order independence (commutativity)
	def test_training_order_independence

	# Use consistent word count and ensure words are meaningful
	word_count = Rantly { range(3, 6) }

	# Classification should be deterministic
	test_text = 'This is about programming and computers.'

	# Test 16: Adding and removing categories maintains consistency
	def test_category_operations_maintain_consistency

	# Use simple category names that won't be transformed unexpectedly
	category_name = "Category#{rand(1000)}"

	# Test 17: Training data isolation between categories
	def test_training_data_isolation

	# Words from A's training should not appear in B
	text1.downcase.split.each do \|word\|

	refute category_b[stemmed.to_sym],
	assert_operator(category_b[stemmed.to_sym] \|\| 0, :<=, category_a[stemmed.to_sym],
	'Words trained only in A should have higher count in A')

Add property-based tests for probabilistic invariants #99

Add property-based tests for probabilistic invariants #99

Uh oh!

Conversation

cardmagic commented Dec 28, 2025

Summary

Tests Added

Bayes Classifier (10 tests)

LSI (5 tests)

Category Operations (2 tests)

Test Results

Test plan

Uh oh!

greptile-apps bot commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Major Changes

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

cardmagic commented Dec 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cardmagic commented Dec 29, 2025

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

cardmagic commented Dec 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps bot commented Dec 28, 2025 •

edited

Loading

cardmagic commented Dec 29, 2025 •

edited

Loading