Conversation
**andrewdalpino** left a comment
Hey, this is great @Boorinio! Before I dive into the review, can I ask where you obtained the formula/calculation of the gradient for the 1D Conv? Is there a reference?
It's great that you've also included pooling layers. I'm going to change the title of the PR to reflect the additional classes.
Pull request overview
Adds new 1D sequence-processing layers to the NeuralNet stack (convolution + pooling) along with PHPUnit coverage to exercise initialization and basic forward/back/infer flows.
Changes:
- Introduce `Conv1D` parametric hidden layer with stride/padding/bias/L2 support.
- Introduce `AvgPool1D` and `MaxPool1D` hidden layers for 1D downsampling.
- Add initial PHPUnit tests for the new layers.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/NeuralNet/Layers/Conv1D.php | Implements 1D convolution forward/infer/backprop and parameter management. |
| src/NeuralNet/Layers/AvgPool1D.php | Implements 1D average pooling forward/infer/backprop. |
| src/NeuralNet/Layers/MaxPool1D.php | Implements 1D max pooling forward/infer/backprop with argmax tracking. |
| tests/NeuralNet/Layers/Conv1DTest.php | Adds shape/exception-path coverage for Conv1D. |
| tests/NeuralNet/Layers/AvgPool1DTest.php | Adds shape/exception-path coverage for AvgPool1D. |
| tests/NeuralNet/Layers/MaxPool1DTest.php | Adds shape/exception-path coverage for MaxPool1D. |
```php
$batchSize = (int) ($input->n() / $this->inputLength);
```

`$batchSize` is derived via integer division of `$input->n()` by `$this->inputLength` without validating divisibility. If `n` is not an exact multiple of `inputLength`, the layer will silently truncate the batch and then read undefined offsets. Consider throwing a `RuntimeException` when `($input->n() % $this->inputLength) !== 0`:

```suggestion
$totalColumns = $input->n();

if ($this->inputLength === 0 || $totalColumns % $this->inputLength !== 0) {
    throw new RuntimeException(
        'Input length mismatch: the number of columns (' . $totalColumns
            . ') must be an exact multiple of input length (' . $this->inputLength . ').'
    );
}

$batchSize = (int) ($totalColumns / $this->inputLength);
```
```php
$batchSize = (int) ($input->n() / $this->inputLength);
```

`$batchSize` is computed using integer division of `$input->n()` by `$this->inputLength` without checking the remainder. If the input length does not evenly divide `n`, pooling will read undefined offsets. Consider validating `($input->n() % $this->inputLength) === 0` and throwing a `RuntimeException` when it is not:

```suggestion
$sequenceLength = $input->n();

if ($sequenceLength % $this->inputLength !== 0) {
    throw new RuntimeException('Input length mismatch:'
        . " expected a multiple of {$this->inputLength}, got {$sequenceLength}.");
}

$batchSize = intdiv($sequenceLength, $this->inputLength);
```
```php
$flippedKernel = [];

foreach ($kernelArray as $f => $row) {
    $flippedKernel[$f] = array_reverse($row);
```

`computeInputGradient()` flips each filter row using `array_reverse($row)`. Since the kernel row is laid out as concatenated per-channel segments, reversing the entire row also reverses channel order, producing incorrect input gradients when `inputChannels > 1`. Flip only along the temporal axis within each channel segment (keep channel blocks in place) when building the flipped kernel:

```suggestion
    // Flip only along the temporal axis within each channel segment,
    // keeping channel blocks in place.
    $flippedRow = $row;

    for ($c = 0; $c < $this->inputChannels; ++$c) {
        $base = $c * $this->kernelSize;

        for ($k = 0; $k < $this->kernelSize; ++$k) {
            $flippedRow[$base + $k] = $row[$base + ($this->kernelSize - 1 - $k)];
        }
    }

    $flippedKernel[$f] = $flippedRow;
```
```php
$biasArray = $biases->param()->asArray();
$outputArray = $output->asArray();

$biased = [];

foreach ($outputArray as $f => $row) {
    $bias = $biasArray[$f];
    $biasedRow = [];

    foreach ($row as $value) {
        $biasedRow[] = $value + $bias;
    }

    $biased[] = $biasedRow;
}

return Matrix::quick($biased);
```

`addBias()` manually loops over all output values and also takes an unused `$batchSize` parameter. Since other layers (e.g. `Dense`) rely on Tensor broadcasting for `Matrix::add(Vector)`, consider replacing this with a direct add of the bias vector and dropping the unused parameter to simplify and improve performance:

```suggestion
// Use Tensor broadcasting: add the bias vector across the filter dimension.
return $output->add($biases->param());
```
```php
$batchSize = (int) ($input->n() / $this->inputLength);
```

`$batchSize` is computed using integer division of `$input->n()` by `$this->inputLength` without checking the remainder. If the input length does not evenly divide `n`, pooling will read undefined offsets. Consider validating `($input->n() % $this->inputLength) === 0` and throwing a `RuntimeException` when it is not:

```suggestion
$inputN = $input->n();

if ($inputN % $this->inputLength !== 0) {
    throw new RuntimeException(
        'Input length mismatch: expected the number of timesteps '
            . "({$inputN}) to be evenly divisible by input length ({$this->inputLength})."
    );
}

$batchSize = (int) ($inputN / $this->inputLength);
```
```php
$this->assertInstanceOf(Matrix::class, $infer);
$this->assertEquals(2, $infer->m());
$this->assertEquals(24, $infer->n());
```

The test only validates output/gradient shapes, not numerical correctness of the convolution (forward/infer) or backprop math. Other layer tests in this repo typically assert expected values; consider adding at least one small deterministic case (e.g., constant initializer / hand-computable kernel) with `assertEqualsWithDelta` for forward, gradient, and infer:

```suggestion
$this->assertEquals(24, $infer->n());

// -----------------------------------------------------------------
// Deterministic numeric test for forward, back, and infer
// using a simple hand-computable configuration.
// -----------------------------------------------------------------
// Configuration:
// - filters = 1
// - kernelSize = 3
// - inputLength = 3
// - inputChannels = 1
// - stride = 1
// - padding = 0
// - weights initialized to 1.0
// - bias initialized to 0.0
//
// Input: [[1, 2, 3]] (1 channel, length 3, batch size 1)
// Expected forward/infer output:
//   1*1 + 1*2 + 1*3 = 6
// Expected gradient w.r.t. input for upstream gradient [[1]]:
//   [[1, 1, 1]]
$deterministicLayer = new Conv1D(
    1, // filters
    3, // kernelSize
    3, // inputLength
    1, // inputChannels
    1, // stride
    0, // padding
    0.0,
    true,
    new Constant(1.0), // weight initializer
    new Constant(0.0) // bias initializer
);

$deterministicLayer->initialize(3);

// Single-batch input of length 3 on one channel.
$deterministicInput = Matrix::quick([
    [1.0, 2.0, 3.0],
]);

$forwardDet = $deterministicLayer->forward($deterministicInput);

$this->assertInstanceOf(Matrix::class, $forwardDet);

$forwardArray = $forwardDet->asArray();

$this->assertEquals(1, $forwardDet->m());
$this->assertEquals(1, $forwardDet->n());
$this->assertEqualsWithDelta(6.0, $forwardArray[0][0], 1e-8);

// Upstream gradient is 1 for the single output position.
$prevGradDet = Matrix::quick([
    [1.0],
]);

$gradientDet = $deterministicLayer->back($prevGradDet, $this->optimizer)->compute();

$this->assertInstanceOf(Matrix::class, $gradientDet);

$gradientArray = $gradientDet->asArray();

$this->assertEquals(1, $gradientDet->m());
$this->assertEquals(3, $gradientDet->n());

// Each input position should receive gradient 1.0.
$this->assertEqualsWithDelta(1.0, $gradientArray[0][0], 1e-8);
$this->assertEqualsWithDelta(1.0, $gradientArray[0][1], 1e-8);
$this->assertEqualsWithDelta(1.0, $gradientArray[0][2], 1e-8);

// Inference should produce the same numeric output as forward.
$inferDet = $deterministicLayer->infer($deterministicInput);

$this->assertInstanceOf(Matrix::class, $inferDet);

$inferArray = $inferDet->asArray();

$this->assertEquals(1, $inferDet->m());
$this->assertEquals(1, $inferDet->n());
$this->assertEqualsWithDelta(6.0, $inferArray[0][0], 1e-8);
```
```php
public function initializeForwardBackInfer() : void
{
    $fanOut = $this->layer->initialize($this->inputChannels);

    // Width should be number of input channels
    $this->assertEquals($this->inputChannels, $this->layer->width());

    // Output length: floor((10 - 3) / 1) + 1 = 8
    $this->assertEquals(8, $this->layer->outputLength());

    // Fan out should be inputChannels * outputLength = 2 * 8 = 16
    $this->assertEquals(16, $fanOut);

    // Forward pass
    $forward = $this->layer->forward($this->input);

    $this->assertInstanceOf(Matrix::class, $forward);

    // Output shape: (inputChannels, outputLength * batchSize) = (2, 8 * 3) = (2, 24)
    $this->assertEquals($this->inputChannels, $forward->m());
    $this->assertEquals(8 * $this->batchSize, $forward->n());

    // Backward pass
    $outputLength = 8;
    $gradData = [];

    for ($c = 0; $c < $this->inputChannels; ++$c) {
        $row = [];

        for ($b = 0; $b < $this->batchSize; ++$b) {
            for ($t = 0; $t < $outputLength; ++$t) {
                $row[] = 0.01 * ($t + $c);
            }
        }

        $gradData[] = $row;
    }

    $this->prevGrad = new Deferred(function () use ($gradData) {
        return Matrix::quick($gradData);
    });

    $gradient = $this->layer->back($this->prevGrad, $this->optimizer)->compute();

    $this->assertInstanceOf(Matrix::class, $gradient);

    // Gradient shape should match input shape
    $this->assertEquals($this->inputChannels, $gradient->m());
    $this->assertEquals($this->inputLength * $this->batchSize, $gradient->n());

    // Inference pass
    $infer = $this->layer->infer($this->input);

    $this->assertInstanceOf(Matrix::class, $infer);
    $this->assertEquals($this->inputChannels, $infer->m());
    $this->assertEquals(8 * $this->batchSize, $infer->n());
}
```

The test currently asserts only output/gradient shapes. To better protect against math regressions, consider adding assertions for actual pooled values and backpropagated gradients on a small deterministic input (similar to existing Dense/Dropout layer tests).
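To make the expected numbers concrete, here is a small NumPy sketch working out a deterministic max-pooling case with the same configuration as the shape assertions above (length-10 sequence, kernel 3, stride 1). Python is used only to compute the reference values by hand; all names and shapes here are illustrative, not this repo's API, and the PHPUnit assertions would simply mirror the resulting numbers:

```python
import numpy as np

# Hand-computable case: one channel, length-10 sequence, kernel size 3,
# stride 1 -> output length floor((10 - 3) / 1) + 1 = 8.
x = np.arange(10, dtype=float)  # [0, 1, ..., 9]

kernel, stride = 3, 1
out_len = (len(x) - kernel) // stride + 1

# Forward max pooling: take the maximum of each sliding window.
pooled = np.array([x[t * stride : t * stride + kernel].max() for t in range(out_len)])

# Track the argmax position (in input coordinates) of each window.
argmax = np.array([x[t * stride : t * stride + kernel].argmax() + t * stride
                   for t in range(out_len)])

# Backward pass: route each upstream gradient only to its argmax position.
up_grad = np.ones(out_len)
in_grad = np.zeros_like(x)
np.add.at(in_grad, argmax, up_grad)

print(pooled)   # [2. 3. 4. 5. 6. 7. 8. 9.]
print(in_grad)  # [0. 0. 1. 1. 1. 1. 1. 1. 1. 1.]
```

Because the input is strictly increasing, every window's maximum is its rightmost element, so positions 0 and 1 receive zero gradient while positions 2 through 9 each receive exactly 1.0.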
```php
public function initializeForwardBackInfer() : void
{
    $fanOut = $this->layer->initialize($this->inputChannels);

    // Width should be number of input channels
    $this->assertEquals($this->inputChannels, $this->layer->width());

    // Output length: floor((10 - 3) / 1) + 1 = 8
    $this->assertEquals(8, $this->layer->outputLength());

    // Fan out should be inputChannels * outputLength = 2 * 8 = 16
    $this->assertEquals(16, $fanOut);

    // Forward pass
    $forward = $this->layer->forward($this->input);

    $this->assertInstanceOf(Matrix::class, $forward);

    // Output shape: (inputChannels, outputLength * batchSize) = (2, 8 * 3) = (2, 24)
    $this->assertEquals($this->inputChannels, $forward->m());
    $this->assertEquals(8 * $this->batchSize, $forward->n());

    // Backward pass
    $outputLength = 8;
    $gradData = [];

    for ($c = 0; $c < $this->inputChannels; ++$c) {
        $row = [];

        for ($b = 0; $b < $this->batchSize; ++$b) {
            for ($t = 0; $t < $outputLength; ++$t) {
                $row[] = 0.01 * ($t + $c);
            }
        }

        $gradData[] = $row;
    }

    $this->prevGrad = new Deferred(function () use ($gradData) {
        return Matrix::quick($gradData);
    });

    $gradient = $this->layer->back($this->prevGrad, $this->optimizer)->compute();

    $this->assertInstanceOf(Matrix::class, $gradient);

    // Gradient shape should match input shape
    $this->assertEquals($this->inputChannels, $gradient->m());
    $this->assertEquals($this->inputLength * $this->batchSize, $gradient->n());

    // Inference pass
    $infer = $this->layer->infer($this->input);

    $this->assertInstanceOf(Matrix::class, $infer);
    $this->assertEquals($this->inputChannels, $infer->m());
    $this->assertEquals(8 * $this->batchSize, $infer->n());
}
```

The test currently asserts only output/gradient shapes. To better protect against math regressions, consider adding assertions for actual pooled values and backpropagated gradients on a small deterministic input (similar to existing Dense/Dropout layer tests).
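The corresponding average-pooling reference numbers for the same configuration (length 10, kernel 3, stride 1) can be worked out with an analogous NumPy sketch. As before, this is an illustrative hand-calculation, not this repo's API; the backward rule for average pooling spreads each upstream gradient evenly (1/kernel) over its window:

```python
import numpy as np

# Hand-computable case: one channel, length-10 sequence, kernel size 3,
# stride 1 -> output length 8, matching the shape assertions above.
x = np.arange(10, dtype=float)  # [0, 1, ..., 9]

kernel, stride = 3, 1
out_len = (len(x) - kernel) // stride + 1

# Forward average pooling: mean of each sliding window.
# For consecutive integers the mean is the centre element: [1, 2, ..., 8].
pooled = np.array([x[t * stride : t * stride + kernel].mean() for t in range(out_len)])

# Backward pass: distribute each upstream gradient equally over its window.
up_grad = np.ones(out_len)
in_grad = np.zeros_like(x)
for t in range(out_len):
    in_grad[t * stride : t * stride + kernel] += up_grad[t] / kernel

# Edge positions belong to fewer windows, so they accumulate less gradient:
# position 0 -> 1/3, position 1 -> 2/3, positions 2..7 -> 1, then 2/3, 1/3.
print(pooled)
print(in_grad)
```

A useful sanity check in the PHPUnit version: the backpropagated gradient should sum to the total upstream gradient (8.0 here), since average pooling conserves gradient mass.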
```php
public function negativeStride() : void
{
    $this->expectException(\Rubix\ML\Exceptions\InvalidArgumentException::class);

    new Conv1D(
        2, // filters
        3, // kernelSize
        10, // inputLength
        1, // inputChannels
        0, // stride (invalid)
        0 // padding
    );
```

The test name `negativeStride()` is misleading because it passes `0` for stride (the failure case here is a non-positive stride, not a negative one). Consider renaming the test (e.g. to `nonPositiveStride` or `zeroStride`) to match what it validates.
Added numerical checks for the layers, since Copilot's review makes sense. For the gradient derivation I used https://www.michaelpiseno.com/pages/conv-backprop.html, plus a light context read: https://cs231n.stanford.edu/slides/2025/lecture_4.pdf :P
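For anyone following along without opening the links, the standard 1D conv gradients from those references boil down to the following (my notation, single input channel, stride 1, no padding, and "conv" implemented as cross-correlation, as in most DL libraries):

```latex
% Forward pass: cross-correlation of the input with a length-K kernel.
y_t = b + \sum_{k=0}^{K-1} w_k \, x_{t+k}

% Backprop: correlate the input with the upstream gradient for the kernel
% and bias, and convolve the upstream gradient with the kernel for the input.
\frac{\partial L}{\partial w_k} = \sum_{t} \frac{\partial L}{\partial y_t} \, x_{t+k},
\qquad
\frac{\partial L}{\partial b} = \sum_{t} \frac{\partial L}{\partial y_t},
\qquad
\frac{\partial L}{\partial x_s} = \sum_{k=0}^{K-1} \frac{\partial L}{\partial y_{s-k}} \, w_k
```

The last identity is a full convolution of the upstream gradient with the kernel (equivalently, cross-correlation with the temporally flipped kernel), which is exactly why `computeInputGradient()` has to flip each filter along the time axis, per channel, before sliding it over the upstream gradient.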
Adds Conv1D, Max/Avg pooling layers :D