Currently the neural network is tested on one case at a time, which says little about how it performs on a larger dataset. The index.coffee script should randomly sample 80% of the beer recipes and use them for training. The remaining 20% should be stored in a file that can be read by something like output.coffee.
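The 80/20 split could be sketched roughly as follows. This is only an illustration in TypeScript rather than the project's CoffeeScript, and `splitDataset`, `trainFraction`, and `random` are hypothetical names, not anything that exists in index.coffee. It assumes the recipes are already loaded into an array.

```typescript
// Hypothetical helper: shuffle a copy of the recipes, then cut it into
// a training part and a held-out test part.
function splitDataset<T>(
  items: T[],
  trainFraction: number = 0.8,
  random: () => number = Math.random
): { train: T[]; test: T[] } {
  const shuffled = items.slice(); // leave the caller's array untouched

  // Fisher-Yates shuffle so every recipe is equally likely to land in either set.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = shuffled[i] as T;
    shuffled[i] = shuffled[j] as T;
    shuffled[j] = tmp;
  }

  // Number of recipes that go to training, rounded to the nearest integer.
  const cut = Math.round(shuffled.length * trainFraction);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}
```

The `test` part could then be serialized with `JSON.stringify` and written to disk for the evaluation script to pick up. The file name and format are left open here.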
The output.coffee file can then read the test cases from that file and classify each recipe, after which the number of correctly classified recipes is easy to calculate. We can use a strict criterion where the top result must be the correct one, or a looser one where the correct result only needs to be among the top N results.
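Both criteria can be covered by one top-N accuracy function, where N = 1 gives the strict variant. The sketch below is in TypeScript, not CoffeeScript, and assumes the network returns a score per class for each recipe. The `TestCase` shape and the name `topNAccuracy` are made up for illustration.

```typescript
// Hypothetical shape of one evaluated test case: the known correct class
// and the score the network assigned to each candidate class.
interface TestCase {
  label: string;
  scores: Record<string, number>;
}

// Fraction of test cases whose correct label is among the n highest-scoring classes.
function topNAccuracy(cases: TestCase[], n: number = 1): number {
  if (cases.length === 0) return 0; // avoid dividing by zero

  let correct = 0;
  for (const c of cases) {
    // Class names ordered from highest to lowest score.
    const ranked = Object.keys(c.scores).sort(
      (a, b) => (c.scores[b] as number) - (c.scores[a] as number)
    );
    if (ranked.slice(0, n).indexOf(c.label) !== -1) {
      correct++;
    }
  }
  return correct / cases.length;
}
```

Reporting both `topNAccuracy(cases, 1)` and a larger N side by side would show how often the network is nearly right rather than outright wrong.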