Commit c1d4220

Merge pull request #32 from JuliaAI/incremental
Add an observation-updatable density estimator to tests
2 parents 168e0c6 + d82eaa5 commit c1d4220

11 files changed: 208 additions, 40 deletions

Project.toml

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ julia = "1.6"

 [extras]
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
+Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
@@ -23,6 +24,7 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 [targets]
 test = [
 "DataFrames",
+"Distributions",
 "LinearAlgebra",
 "MLUtils",
 "Random",

docs/src/common_implementation_patterns.md

Lines changed: 3 additions & 5 deletions
@@ -1,8 +1,6 @@
 # Common Implementation Patterns

-```@raw html
-🚧
-```
+!!! warning

 This section is only an implementation guide. The definitive specification of the
 Learn API is given in [Reference](@ref reference).
@@ -25,7 +23,7 @@ implementations fall into one (or more) of the following informally understood p

 - [Iterative Algorithms](@ref)

-- Incremental Algorithms
+- [Incremental Algorithms](@ref): Algorithms that can be updated with new observations.

 - [Feature Engineering](@ref): Algorithms for selecting or combining features

@@ -48,7 +46,7 @@ implementations fall into one (or more) of the following informally understood p

 - Survival Analysis

-- Density Estimation: Algorithms that learn a probability distribution
+- [Density Estimation](@ref): Algorithms that learn a probability distribution

 - Bayesian Algorithms


docs/src/index.md

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ A key to enabling toolboxes to enhance LearnAPI.jl algorithm functionality is th
 implementation of two key additional methods, beyond the usual `fit` and
 `predict`/`transform`. Given any training `data` consumed by `fit` (such as `data = (X,
 y)` in the example above) [`LearnAPI.features(algorithm, data)`](@ref input) tells us what
-part of `data` comprises *features*, which is something that can be passsed onto to
+part of `data` comprises *features*, which is something that can be passed onto to
 `predict` or `transform` (`X` in the example) while [`LearnAPI.target(algorithm,
 data)`](@ref), if implemented, tells us what part comprises the target (`y` in the
 example). By explicitly requiring such methods, we free algorithms to consume data in
Lines changed: 4 additions & 0 deletions
@@ -1 +1,5 @@
 # Density Estimation
+
+See these examples from tests:
+
+- [normal distribution estimator](https://github.com/JuliaAI/LearnAPI.jl/blob/dev/test/patterns/incremental_algorithms.jl)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Incremental Algorithms
+
+See these examples from tests:
+
+- [normal distribution estimator](https://github.com/JuliaAI/LearnAPI.jl/blob/dev/test/patterns/incremental_algorithms.jl)
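
For orientation, here is a minimal sketch of what an observation-updatable normal distribution estimator might look like. It is not the code in `test/patterns/incremental_algorithms.jl`, and it does not use the LearnAPI.jl interface: `RunningNormal`, `fit_normal`, `update_normal` and `predict_normal` are hypothetical names, and the update rule (Welford's algorithm, with the maximum-likelihood variance) is an assumption made for illustration.

```julia
# Hypothetical sketch; none of these names belong to LearnAPI.jl.
struct RunningNormal
    n::Int        # number of observations seen so far
    mu::Float64   # running mean
    m2::Float64   # running sum of squared deviations from the mean
end

# Fold new observations into an existing state (Welford's algorithm).
function update_normal(state::RunningNormal, ynew)
    n, mu, m2 = state.n, state.mu, state.m2
    for y in ynew
        n += 1
        delta = y - mu
        mu += delta / n
        m2 += delta * (y - mu)
    end
    return RunningNormal(n, mu, m2)
end

# "fit": start from an empty state and consume an initial batch.
fit_normal(y) = update_normal(RunningNormal(0, 0.0, 0.0), y)

# "predict" with no data argument: report the estimated parameters
# (assumes at least one observation has been seen).
predict_normal(state::RunningNormal) = (mu = state.mu, var = state.m2 / state.n)

batch = fit_normal([1.0, 2.0, 3.0, 4.0])
incremental = update_normal(fit_normal([1.0, 2.0]), [3.0, 4.0])
```

Fitting on all four observations at once and fitting on two then updating with the other two give the same estimate, which is the property an incremental algorithm is expected to have.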

src/predict_transform.jl

Lines changed: 6 additions & 6 deletions
@@ -4,10 +4,6 @@ function DOC_IMPLEMENTED_METHODS(name; overloaded=false)
 "[`LearnAPI.functions`](@ref) trait. "
 end

-const OPERATIONS = (:predict, :transform, :inverse_transform)
-const DOC_OPERATIONS_LIST_SYMBOL = join(map(op -> "`:$op`", OPERATIONS), ", ")
-const DOC_OPERATIONS_LIST_FUNCTION = join(map(op -> "`LearnAPI.$op`", OPERATIONS), ", ")
-
 DOC_MUTATION(op) =
 """

@@ -66,6 +62,9 @@ which lists all supported target proxies.

 The argument `model` is anything returned by a call of the form `fit(algorithm, ...)`.

+If `LearnAPI.features(LearnAPI.algorithm(model)) == nothing`, then argument `data` is
+omitted. An example is density estimators.
+
 # Example

 In the following, `algorithm` is some supervised learning algorithm with
@@ -105,6 +104,7 @@ $(DOC_DATA_INTERFACE(:predict))

 """
 predict(model, data) = predict(model, kinds_of_proxy(algorithm(model)) |> first, data)
+predict(model) = predict(model, kinds_of_proxy(algorithm(model)) |> first)

 # automatic slurping of multiple data arguments:
 predict(model, k::KindOfProxy, data1, data2, datas...; kwargs...) =
@@ -167,8 +167,8 @@ $(DOC_MUTATION(:transform))
 $(DOC_DATA_INTERFACE(:transform))

 """
-transform(model, data1, data2...; kwargs...) =
-transform(model, (data1, datas...); kwargs...) # automatic slurping
+transform(model, data1, data2, datas...; kwargs...) =
+transform(model, (data1, data2, datas...); kwargs...) # automatic slurping

 """
 inverse_transform(model, data)
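
The `transform` slurping change above can be illustrated in isolation. In the old definition, `data2...` collects the trailing arguments under the name `data2`, so the `datas` used in the body is never defined; the signature also matches a call with a single data argument. The sketch below reproduces both shapes with toy stand-ins. `ToyModel`, `toy_transform` and `broken_transform` are hypothetical names, not LearnAPI.jl functions.

```julia
# Hypothetical stand-ins for illustration only.
struct ToyModel end

# Core method: consumes exactly one data object and returns it unchanged.
toy_transform(::ToyModel, data; kwargs...) = data

# New style: at least two positional data arguments are required, so this
# method can never match (and recurse on) a single-argument call.
toy_transform(model, data1, data2, datas...; kwargs...) =
    toy_transform(model, (data1, data2, datas...); kwargs...)

# Old style: the trailing arguments are bound to `data2`, not `datas`,
# so evaluating the body raises an `UndefVarError`.
broken_transform(model, data1, data2...; kwargs...) =
    broken_transform(model, (data1, datas...); kwargs...)

slurped = toy_transform(ToyModel(), [1, 2], [3, 4], [5, 6])
single = toy_transform(ToyModel(), [1, 2])

old_style_fails = try
    broken_transform(ToyModel(), [1, 2], [3, 4])
    false
catch err
    err isa UndefVarError
end
```

With the new signature, multiple data arguments are bundled into one tuple and forwarded to the core method, while a single data argument goes straight to the core method.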

src/types.jl

Lines changed: 27 additions & 27 deletions
@@ -22,27 +22,27 @@ See also [`LearnAPI.KindOfProxy`](@ref).

 | type | form of an observation |
 |:-------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `LearnAPI.Point` | same as target observations; may have the interpretation of a 50% quantile, 50% expectile or mode |
-| `LearnAPI.Sampleable` | object that can be sampled to obtain object of the same form as target observation |
-| `LearnAPI.Distribution` | explicit probability density/mass function whose sample space is all possible target observations |
-| `LearnAPI.LogDistribution` | explicit log-probability density/mass function whose sample space is possible target observations |
-| `LearnAPI.Probability`¹ | numerical probability or probability vector |
-| `LearnAPI.LogProbability`¹ | log-probability or log-probability vector |
-| `LearnAPI.Parametric`¹ | a list of parameters (e.g., mean and variance) describing some distribution |
-| `LearnAPI.LabelAmbiguous` | collections of labels (in case of multi-class target) but without a known correspondence to the original target labels (and of possibly different number) as in, e.g., clustering |
-| `LearnAPI.LabelAmbiguousSampleable` | sampleable version of `LabelAmbiguous`; see `Sampleable` above |
-| `LearnAPI.LabelAmbiguousDistribution` | pdf/pmf version of `LabelAmbiguous`; see `Distribution` above |
-| `LearnAPI.LabelAmbiguousFuzzy` | same as `LabelAmbiguous` but with multiple values of indeterminant number |
-| `LearnAPI.Quantile`² | same as target but with quantile interpretation |
-| `LearnAPI.Expectile`² | same as target but with expectile interpretation |
-| `LearnAPI.ConfidenceInterval`² | confidence interval |
-| `LearnAPI.Fuzzy` | finite but possibly varying number of target observations |
-| `LearnAPI.ProbabilisticFuzzy` | as for `Fuzzy` but labeled with probabilities (not necessarily summing to one) |
-| `LearnAPI.SurvivalFunction` | survival function |
-| `LearnAPI.SurvivalDistribution` | probability distribution for survival time |
-| `LearnAPI.SurvivalHazardFunction` | hazard function for survival time |
-| `LearnAPI.OutlierScore` | numerical score reflecting degree of outlierness (not necessarily normalized) |
-| `LearnAPI.Continuous` | real-valued approximation/interpolation of a discrete-valued target, such as a count (e.g., number of phone calls) |
+| `Point` | same as target observations; may have the interpretation of a 50% quantile, 50% expectile or mode |
+| `Sampleable` | object that can be sampled to obtain object of the same form as target observation |
+| `Distribution` | explicit probability density/mass function whose sample space is all possible target observations |
+| `LogDistribution` | explicit log-probability density/mass function whose sample space is possible target observations |
+| `Probability`¹ | numerical probability or probability vector |
+| `LogProbability`¹ | log-probability or log-probability vector |
+| `Parametric`¹ | a list of parameters (e.g., mean and variance) describing some distribution |
+| `LabelAmbiguous` | collections of labels (in case of multi-class target) but without a known correspondence to the original target labels (and of possibly different number) as in, e.g., clustering |
+| `LabelAmbiguousSampleable` | sampleable version of `LabelAmbiguous`; see `Sampleable` above |
+| `LabelAmbiguousDistribution` | pdf/pmf version of `LabelAmbiguous`; see `Distribution` above |
+| `LabelAmbiguousFuzzy` | same as `LabelAmbiguous` but with multiple values of indeterminant number |
+| `Quantile`² | same as target but with quantile interpretation |
+| `Expectile`² | same as target but with expectile interpretation |
+| `ConfidenceInterval`² | confidence interval |
+| `Fuzzy` | finite but possibly varying number of target observations |
+| `ProbabilisticFuzzy` | as for `Fuzzy` but labeled with probabilities (not necessarily summing to one) |
+| `SurvivalFunction` | survival function |
+| `SurvivalDistribution` | probability distribution for survival time |
+| `SurvivalHazardFunction` | hazard function for survival time |
+| `OutlierScore` | numerical score reflecting degree of outlierness (not necessarily normalized) |
+| `Continuous` | real-valued approximation/interpolation of a discrete-valued target, such as a count (e.g., number of phone calls) |

 ¹Provided for completeness but discouraged to avoid [ambiguities in
 representation](https://github.com/alan-turing-institute/MLJ.jl/blob/dev/paper/paper.md#a-unified-approach-to-probabilistic-predictions-and-their-evaluation).
@@ -86,9 +86,9 @@ space ``Y^n``, where ``Y`` is the space from which the target variable takes its

 | type `T` | form of output of `predict(model, ::T, data)` |
 |:-------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `LearnAPI.JointSampleable` | object that can be sampled to obtain a *vector* whose elements have the form of target observations; the vector length matches the number of observations in `data`. |
-| `LearnAPI.JointDistribution` | explicit probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
-| `LearnAPI.JointLogDistribution` | explicit log-probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
+| `JointSampleable` | object that can be sampled to obtain a *vector* whose elements have the form of target observations; the vector length matches the number of observations in `data`. |
+| `JointDistribution` | explicit probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
+| `JointLogDistribution` | explicit log-probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |

 """
 abstract type Joint <: KindOfProxy end
@@ -108,9 +108,9 @@ single object representing a probability distribution.

 | type `T` | form of output of `predict(model, ::T)` |
 |:--------------------------------:|:-----------------------------------------------------------------------|
-| `LearnAPI.SingleSampleable` | object that can be sampled to obtain a single target observation |
-| `LearnAPI.SingleDistribution` | explicit probability density/mass function for sampling the target |
-| `LearnAPI.SingleLogDistribution` | explicit log-probability density/mass function for sampling the target |
+| `SingleSampleable` | object that can be sampled to obtain a single target observation |
+| `SingleDistribution` | explicit probability density/mass function for sampling the target |
+| `SingleLogDistribution` | explicit log-probability density/mass function for sampling the target |

 """
 abstract type Single <: KindOfProxy end
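
The tables above describe outputs of `predict(model, ::T)`, and the new `predict(model)` fallback in `src/predict_transform.jl` picks the first kind of proxy the algorithm declares. A rough, self-contained sketch of that dispatch pattern follows. All names (`ToyKindOfProxy`, `ToyParams`, `ToySampler`, `ToyFitted`, `toy_kinds_of_proxy`, `toy_predict`) are hypothetical; as a simplification, the toy trait is defined on the fitted object itself rather than on `algorithm(model)`.

```julia
# Hypothetical stand-ins; not the LearnAPI.jl type hierarchy.
abstract type ToyKindOfProxy end
struct ToyParams <: ToyKindOfProxy end    # parameters of a distribution
struct ToySampler <: ToyKindOfProxy end   # something that can be sampled

# A fitted density estimator holding normal distribution parameters.
struct ToyFitted
    mu::Float64
    sigma::Float64
end

# Trait listing the supported kinds of proxy; the first entry is the default.
toy_kinds_of_proxy(::ToyFitted) = (ToyParams(), ToySampler())

# One method per kind of proxy; no `data` argument, as for density estimators.
toy_predict(m::ToyFitted, ::ToyParams) = (mu = m.mu, sigma = m.sigma)
toy_predict(m::ToyFitted, ::ToySampler) = () -> m.mu + m.sigma * randn()

# Fallback: with no kind of proxy given, use the first one declared.
toy_predict(m) = toy_predict(m, first(toy_kinds_of_proxy(m)))

fitted = ToyFitted(0.0, 1.0)
default_output = toy_predict(fitted)
draw = toy_predict(fitted, ToySampler())()
```

Here `toy_predict(fitted)` returns the parameter tuple because `ToyParams()` is listed first, while passing `ToySampler()` explicitly returns a closure that draws one sample.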