Here the default *MLJ* convention is being applied (cf. [docs](https://alan-turing-institute.github.io/ScientificTypes.jl/dev/#The-MLJ-convention-1)). Detail is obtained in the obvious way; for example:
+
+```julia
+julia> sch.names
+(:a, :b, :c, :d, :e)
+```

Now you could want to specify that `b` is actually a `Count`, and that `d` and `e` are `Multiclass`; this is done with the `coerce` function:
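
A minimal sketch of such a coercion follows. The table `X` is a hypothetical stand-in (a `NamedTuple` of vectors, which satisfies the Tables interface) rather than the table used in the README, and the sketch assumes an installed ScientificTypes version in which `coerce` accepts `column => scitype` pairs:

```julia
using ScientificTypes

# Hypothetical column table with columns a..e (not the README's data).
X = (a = [0.1, 0.5, 0.9],
     b = [2.0, 1.0, 3.0],
     c = [1, 2, 3],
     d = [0, 1, 0],
     e = ['M', 'F', 'M'])

# Reinterpret `b` as a Count, and `d` and `e` as Multiclass.
Xc = coerce(X, :b => Count, :d => Multiclass, :e => Multiclass)

# Inspect the scitypes of the coerced columns.
schema(Xc).scitypes
```

Columns that are not named in the call (`a` and `c` here) keep their original scitypes.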
docs/src/index.md: 17 additions & 5 deletions
@@ -35,7 +35,7 @@ Found

 - A single method `scitype` for articulating a convention about what scientific type each Julia object can represent. For example, one might declare `scitype(::AbstractFloat) = Continuous`.

-- A default convention called *mlj*, based on dependencies
+- A default convention called *MLJ*, based on dependencies
 `CategoricalArrays`, `ColorTypes`, and `Tables`, which includes a
 convenience method `coerce` for performing scientific type coercion
 on `AbstractVectors` and columns of tabular data (any table
@@ -122,12 +122,24 @@ Finally there is a `coerce!` method that does in-place coercion provided the dat
 - Developers can define their own conventions using the code in `src/conventions/mlj/` as a template. The active convention is controlled by the value of `ScientificTypes.CONVENTION[1]`.


+## Special note on binary data
+
+ScientificTypes does not define a separate "binary" scientific
+type. Rather, when binary data has an intrinsic "true" class (for example
+pass/fail in a product test), then it should be assigned an
+`OrderedFactor{2}` scitype, while data with no such class (e.g., gender)
+should be assigned a `Multiclass{2}` scitype. In the former case
+we recommend that the "true" class come after "false" in the ordering
+(corresponding to the usual assignment "false=0" and "true=1"). Of
+course, `Finite{2}` covers both cases of binary data.
+
+
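
The distinction drawn in the binary-data note can be illustrated with a short sketch. The two vectors are hypothetical examples, and the sketch assumes that `coerce` on a vector of strings orders the resulting levels lexicographically, so that "fail" precedes "pass":

```julia
using ScientificTypes

# Hypothetical product-test outcomes: "pass" is the intrinsic "true" class.
outcome = ["fail", "pass", "pass", "fail"]
y = coerce(outcome, OrderedFactor)

# Hypothetical binary data with no intrinsic "true" class.
gender = ["male", "female", "female", "male"]
g = coerce(gender, Multiclass)

scitype(y)  # element scitype is OrderedFactor{2}
scitype(g)  # element scitype is Multiclass{2}

# Both cases are covered by Finite{2}.
scitype(y) <: AbstractVector{<:Finite{2}}
scitype(g) <: AbstractVector{<:Finite{2}}
```

If the labels do not sort so that the "true" class comes last, the level order would need to be set explicitly, which this sketch does not cover.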
 ## Detailed usage examples

 ```@example 3
 using ScientificTypes
 # activate a convention
-mlj() # redundant as it's the default
+ScientificTypes.set_convention(MLJ) # redundant as it's the default
 *Performance note:* Computing type unions over large arrays is
 expensive and, depending on the convention's implementation and the
-array eltype, computing the scitype can be slow. (In the *mlj*
+array eltype, computing the scitype can be slow. (In the *MLJ*
 convention this is mitigated with the help of the
 `ScientificTypes.Scitype` method, of which other conventions could
 make use. Do `?ScientificTypes.Scitype` for details.) An eltype `Any`
 will always be slow and you may want to consider replacing an array
-`A` with `broadcast(idenity, A)` to collapse the eltype and speed up
+`A` with `broadcast(identity, A)` to collapse the eltype and speed up
 the computation.
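
The eltype-collapsing step recommended in the performance note can be sketched in plain Base Julia. The array `A` is a hypothetical example, and no ScientificTypes call is involved:

```julia
# Hypothetical array whose declared eltype is Any,
# although every element is actually a Float64.
A = Any[1.0, 2.5, 3.0]

# Broadcasting `identity` rebuilds the array with the
# narrowest eltype that fits the values it contains.
B = broadcast(identity, A)

eltype(A)  # Any
eltype(B)  # Float64
```

The values are unchanged; only the container's element type is narrowed, which is what lets type-based dispatch avoid inspecting each element.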

 Provided the [Tables.jl](https://github.com/JuliaData/Tables.jl) package is loaded, any table implementing the Tables interface has a scitype encoding the scitypes of its columns:
@@ -246,7 +258,7 @@ Note that `Table(Continuous,Finite)` is a *type* union and not a `Table` *instan

 ## The MLJ convention

-The table below summarizes the *mlj* convention for representing
+The table below summarizes the *MLJ* convention for representing
 scientific types:

 Type `T` | `scitype(x)` for `x::T` | package required