A light-weight Julia interface for implementing conventions about the
8
-
scientific interpretation of data, and for performing type coercions
9
-
enforcing those conventions.
7
+
A light-weight, dependency-free Julia interface for implementing conventions
8
+
about the scientific interpretation of data.
9
+
This package should only be used by developers who intend to define their own
10
+
scientific type convention.
11
+
The [MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl) package implements such a convention, used in the [MLJ](https://github.com/alan-turing-institute/MLJ.jl)
12
+
universe.
13
+
14
+
## Purpose
10
15
11
16
The package makes the distinction between **machine type** and **scientific type**:
12
17
@@ -22,89 +27,169 @@ is used for product numbers (a factor) but also for a person's weight
22
27
type is frequently represented by *different* machine types - both
23
28
`Int` and `Float64` are used to represent weights, for example.
24
29
30
+
### Type hierarchy
25
31
26
-
## Very quick start
32
+
The package provides a hierarchy of Julia types representing data types for use
33
+
in method dispatch (e.g., for trait values). Instances of the types play no
34
+
role.
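
As a rough illustration of dispatching on types rather than on instances, the sketch below defines a small stand-in hierarchy. The type names and the helper `describe` are assumptions chosen for illustration, not a statement of the package's exact definitions.

```julia
# Stand-in hierarchy for illustration only; the package's actual types may differ.
abstract type Found end
abstract type Known <: Found end
abstract type Infinite <: Known end
abstract type Continuous <: Infinite end
abstract type Count <: Infinite end

# Hypothetical helper: methods dispatch on the type itself, never on an instance.
describe(::Type{<:Infinite}) = "infinite"
describe(::Type{Count}) = "count"

describe(Continuous)  # falls back to the `Infinite` method
describe(Count)       # picks the more specific `Count` method
```

Because every type here is abstract, none of them can be instantiated, which is consistent with instances playing no role.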
27
35
28
-
For more information and examples please refer to [the
- An *interface*, for articulating a convention about the scientific
34
-
interpretation of data. This consists of a definition of a scientific
35
-
type hierarchy, and a single function `scitype` with scientific
36
-
types as values. Someone implementing a convention must add methods
37
-
to this function, while the general user just applies it to data, as
38
-
in `scitype(4.5)` (returning `Continuous` in the *MLJ* convention).
55
+
If you want to implement your own convention, you can use [MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl) as a blueprint.
39
56
40
-
- A built-in convention, called *MLJ*, active by default.
57
+
When defining a convention you may want to:
41
58
42
-
- Convenience methods for working with scientific types, the most commonly used being
59
+
* declare a new convention,
60
+
* declare new traits,
61
+
* implement custom `schema`, `show` and `info` functions,
62
+
* add explicit `scitype` and `Scitype` definitions,
63
+
* define a `coerce` function.
43
64
44
-
- `schema(X)`, which gives an extended schema of any Tables.jl
45
-
compatible table `X`, including the column scientific types
46
-
implied by the active convention.
47
-
48
-
- `coerce(X, ...)`, which coerces the machine types of `X` to
49
-
reflect a desired scientific type.
65
+
We explain below what these steps may look like, taking the MLJ convention as
66
+
an example.
50
67
51
-
For example,
68
+
### Declaring a new convention
69
+
70
+
In the module, define a
52
71
53
72
```julia
54
-
using ScientificTypes, DataFrames
55
-
X = DataFrame(
56
-
a = randn(5),
57
-
b = [-2.0, 1.0, 2.0, missing, 3.0],
58
-
c = [1, 2, 3, 4, 5],
59
-
d = [0, 1, 0, 1, 0],
60
-
e = ['M', 'F', missing, 'M', 'F'],
61
-
)
62
-
sch = schema(X) # schema is overloaded in ScientificTypes
Subsequently you will have functions dispatching over `::MyConvention`, for
85
+
instance in the MLJ case:
86
+
87
+
```julia
88
+
ScientificTypes.scitype(::Integer, ::MLJ) = Count
79
89
```
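
The dispatch pattern above can be sketched without loading any package. Everything in this block (the `Convention` supertype, the `MLJ` struct, and the local `scitype` function) is a stand-in defined here for illustration. In a real convention the methods would extend `ScientificTypes.scitype` instead.

```julia
# Stand-ins for illustration; in practice these come from the packages themselves.
abstract type Convention end
struct MLJ <: Convention end   # hypothetical convention type

abstract type Count end
abstract type Continuous end

# Each convention contributes its own methods, selected by the second argument.
scitype(::Integer, ::MLJ) = Count
scitype(::AbstractFloat, ::MLJ) = Continuous

scitype(3, MLJ())    # dispatches to the `Integer` method
scitype(4.5, MLJ())  # dispatches to the `AbstractFloat` method
```

Keeping the convention as an argument means a second convention can define its own methods for the same machine types without touching these.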
80
90
81
-
Here the default *MLJ* convention is being applied (cf. [docs](https://alan-turing-institute.github.io/ScientificTypes.jl/dev/#The-MLJ-convention-1)). Detail is obtained in the obvious way; for example:
91
+
### Declaring new traits
92
+
93
+
It's useful to mark containers that meet explicit traits; by default everything
94
+
is marked as `:other`. In the MLJ convention, we specifically consider all
95
+
containers that meet the [`Tables.jl`](https://github.com/JuliaData/Tables.jl)
96
+
interface. In order to declare this you have to add a key to the
97
+
`TRAIT_FUNCTION_GIVEN_NAME` dictionary with a boolean function that verifies
98
+
the trait. This must also be placed in your `__init__` function.
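
A minimal sketch of the registration idea follows. It uses a local dictionary as a stand-in for the package's `TRAIT_FUNCTION_GIVEN_NAME`. The trait name `:pairs`, the check `is_pair_container` and the lookup helper `lookup_trait` are hypothetical and exist only to show the mechanism.

```julia
# Local stand-in for the package's `TRAIT_FUNCTION_GIVEN_NAME` dictionary.
const TRAIT_FUNCTION_GIVEN_NAME = Dict{Symbol,Function}()

# Hypothetical boolean check; a real convention would test an actual interface.
is_pair_container(X) = X isa AbstractVector{<:Pair}

# In a real module this assignment would sit inside `__init__`.
TRAIT_FUNCTION_GIVEN_NAME[:pairs] = is_pair_container

# Hypothetical lookup: return the first registered trait that `X` satisfies,
# or `:other` when no registered check succeeds.
function lookup_trait(X)
    for (name, check) in TRAIT_FUNCTION_GIVEN_NAME
        check(X) && return name
    end
    return :other
end

lookup_trait([1 => 2, 3 => 4])  # matches the registered `:pairs` check
lookup_trait([1, 2, 3])         # no check matches, so the default applies
```

For the MLJ case described above, the registered function would instead be a check for the Tables.jl interface.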