Conversation

@schmelczer

Motivation

I identified six new potential best practices while designing GreatAI, through two case studies conducted on a production ML pipeline. I believe they are generalisable enough to be useful for other practitioners.

I created this PR following the advice of @jstvssr.

Changes

The added best practices are:

  • Parallelise Feature Extraction
  • Allow Experimentation with the Inference Function
  • Implement Standard Schemas for Common Prediction Tasks
  • Cache Production Predictions
  • Allow Robustly Composing Inference Functions
  • Keep the Model and its Documentation Together

I also added some smaller fixes:

  • Updated the Gemfile to include missing dependencies
  • Updated the practices' unique_id values because two of them shared the same one (24)
  • Reformatted a template expression inside a JS section which stopped Disqus from loading
  • Resolved the build warnings
  • Reformatted the README
  • Removed the extra space above the title of /practices
  • Automated the sitemap generation so that it includes the new practices

Thank you for taking a look at my PR!

@kvdblom requested a review from xserban on October 24, 2022 08:17
@xserban (Member) commented Dec 9, 2022

Hi,

Thanks for the PR. Here is some initial feedback regarding the practices:

  1. Parallelise Feature Extraction -> I think this can be merged with the practice on testing feature extraction code into a broader practice on feature engineering. I reckon you are targeting the production environment here, where scaling feature extraction is of utmost importance (some inspiration can be found here: http://proceedings.mlr.press/v67/li17a/li17a.pdf). If you agree, we should then debate whether the practice belongs to deployment or to training. It is currently positioned in training, but I would be careful about suggesting that engineers focus on parallelising feature extraction while they are still experimenting with features (see the first sketch after this list). Let me know your thoughts.

  2. Allow Experimentation with the Inference Function and Allow Robustly Composing Inference Functions -> I will treat these two practices together, as they both refer to engineering the inference function. It would probably be better to merge them into one practice and to speak of the inference API rather than the inference function, since it will most likely serve as an interface to a larger inference service.

Moreover, judging by the nature of these practices, I would say they fit better in the Coding/Deployment group than among the training practices. The way we engineer inference APIs is similar to implementing continuous integration or shadow deployment. I would call the practice something along the lines of "Design a flexible inference API that can be used for experimentation". Flexibility implies both composition and fast experimentation (see the second sketch after this list).

  3. Implement Standard Schemas for Common Prediction Tasks -> this is a good practice. Restructuring it will depend on how we approach point 2 above. I would also drop "common" and emphasise the scenarios to which this practice applies (most likely tabular data); one possible shape is sketched below the list.

  4. Cache Production Predictions -> I think the description should say more about the scenarios where this practice applies; in particular, repetitive predictions that do not involve personalisation or any individual attributes of the users (see the last sketch after this list). If you agree, I can take a first pass at editing the practice.

  5. Keep the Model and its Documentation Together -> we already have a practice for versioning all model-related artefacts, so I suggest merging the content of this practice into that one. Otherwise, a stronger motivation for this practice is needed.
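
To make the points above more concrete, a few purely illustrative sketches follow. For point 1, this is roughly what parallelising feature extraction in production could look like; `extract_features` is a hypothetical placeholder for any CPU-bound, per-sample function, not code from the PR:

```python
from concurrent.futures import ProcessPoolExecutor

def extract_features(document: str) -> list[float]:
    # Placeholder: tokenise, embed, compute statistics, etc.
    return [float(len(document))]

def extract_all(documents: list[str]) -> list[list[float]]:
    # A process pool sidesteps the GIL for CPU-bound extraction;
    # map preserves the order of the input documents.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(extract_features, documents, chunksize=64))

if __name__ == "__main__":
    print(extract_all(["first text", "a second, longer text"]))
```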
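
For point 2, one reading of a "flexible inference API": inference steps as plain callables that compose robustly and can be swapped during experimentation. All names here (`compose`, `normalise`, `classify`) are hypothetical:

```python
from typing import Callable

Inference = Callable[[str], str]

def compose(*steps: Inference) -> Inference:
    # Robust composition: each step's output becomes the next step's input.
    def composed(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return composed

def normalise(text: str) -> str:
    return text.strip().lower()

def classify(text: str) -> str:
    # Stand-in for a real model call.
    return "positive" if "good" in text else "negative"

# The same interface serves production and experimentation: swap `classify`
# for a candidate model without touching the rest of the pipeline.
predict = compose(normalise, classify)
print(predict("  This library is GOOD  "))  # -> positive
```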
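
For point 3, one possible standard schema for a classification task, assuming a dataclass-based design (the field names are illustrative, not prescribed):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassificationOutput:
    label: str         # the predicted class
    confidence: float  # calibrated probability in [0, 1]

def predict(text: str) -> ClassificationOutput:
    # Placeholder model: the fixed return shape is the point, not the logic.
    return ClassificationOutput(label="spam", confidence=0.93)

print(predict("win a free prize now"))
```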
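
For point 4, a minimal caching sketch; note that this is only safe for deterministic models and inputs free of per-user personalisation:

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def predict(text: str) -> str:
    # Stand-in for an expensive model call; lru_cache returns the stored
    # result whenever the exact same input is seen again.
    return "positive" if "good" in text else "negative"

predict("a good product")   # computed once
predict("a good product")   # served from the cache
print(predict.cache_info())  # hits=1, misses=1
```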

My suggestion is to first debate the large structural changes (e.g., whether some practices should be merged) and to modify them one by one afterwards.
Let me know your thoughts.
