Commit 6f6e706

Barebone scientifictypes (#84) (#85)
1 parent c581e77 commit 6f6e706

27 files changed (+417 / -2192 lines)

.travis.yml

Lines changed: 0 additions & 11 deletions
@@ -15,14 +15,3 @@ notifications:
 after_success:
 # push coverage results to Codecov
 - julia -e 'using Pkg; pkg"add Coverage"; using Coverage; Codecov.submit(Codecov.process_folder())'
-
-jobs:
-include:
-- stage: "Documentation"
-julia: 1.2
-os: linux
-script:
-- julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd()));
-Pkg.instantiate()'
-- julia --project=docs/ docs/make.jl
-after_success: skip

Project.toml

Lines changed: 3 additions & 15 deletions
@@ -1,26 +1,14 @@
 name = "ScientificTypes"
 uuid = "321657f4-b219-11e9-178b-2701a2544e81"
 authors = ["Anthony D. Blaom <[email protected]>"]
-version = "0.5.1"
-
-[deps]
-CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
-ColorTypes = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
-PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
-Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
+version = "0.6.0"
 
 [compat]
-CategoricalArrays = "^0.7.3"
-ColorTypes = "^0.8"
-PrettyTables = "^0.6"
-Tables = "^0.2"
 julia = "1"
 
 [extras]
-CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
-DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
-Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["Random", "Test", "CSV", "DataFrames"]
+test = ["Test", "Tables"]

README.md

Lines changed: 149 additions & 64 deletions
@@ -1,12 +1,17 @@
 # ScientificTypes
 
-| [MacOS/Linux] | Coverage | Documentation |
-| :-----------: | :------: | :-----------: |
-| [![Build Status](https://travis-ci.org/alan-turing-institute/ScientificTypes.jl.svg?branch=master)](https://travis-ci.org/alan-turing-institute/ScientificTypes.jl) | [![codecov.io](http://codecov.io/github/alan-turing-institute/ScientificTypes.jl/coverage.svg?branch=master)](http://codecov.io/github/alan-turing-institute/ScientificTypes.jl?branch=master) | [![](https://img.shields.io/badge/docs-stable-blue.svg)](https://alan-turing-institute.github.io/ScientificTypes.jl/stable) |
+| [MacOS/Linux] | Coverage |
+| :-----------: | :------: |
+| [![Build Status](https://travis-ci.org/alan-turing-institute/ScientificTypes.jl.svg?branch=master)](https://travis-ci.org/alan-turing-institute/ScientificTypes.jl) | [![codecov.io](http://codecov.io/github/alan-turing-institute/ScientificTypes.jl/coverage.svg?branch=master)](http://codecov.io/github/alan-turing-institute/ScientificTypes.jl?branch=master) |
 
-A light-weight Julia interface for implementing conventions about the
-scientific interpretation of data, and for performing type coercions
-enforcing those conventions.
+A light-weight, dependency-free Julia interface for implementing conventions
+about the scientific interpretation of data.
+This package should only be used by developers who intend to define their own
+scientific type convention.
+The [MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl) package implements such a convention, used in the [MLJ](https://github.com/alan-turing-institute/MLJ.jl)
+universe.
+
+## Purpose
 
 The package makes the distinction between **machine type** and **scientific type**:
 
@@ -22,89 +27,169 @@ is used for product numbers (a factor) but also for a person's weight
 type is frequently represented by *different* machine types - both
 `Int` and `Float64` are used to represent weights, for example.
 
+### Type hierarchy
 
-## Very quick start
+The package provides a hierarchy of Julia types representing scientific types,
+for use in method dispatch (e.g., for trait values). Instances of these types
+play no role.
 
-For more information and examples please refer to [the
-manual](https://alan-turing-institute.github.io/ScientificTypes.jl/dev).
+```
+Found
+├─ Known
+│ ├─ Finite
+│ │ ├─ Multiclass
+│ │ └─ OrderedFactor
+│ ├─ Infinite
+│ │ ├─ Continuous
+│ │ └─ Count
+│ ├─ Image
+│ │ ├─ ColorImage
+│ │ └─ GrayImage
+│ ├─ Textual
+│ └─ Table
+└─ Unknown
+```
 
-ScientificTypes.jl has three components:
+## Defining a new convention
 
-- An *interface*, for articulating a convention about the scientific
-interpretation of data. This consists of a definition of a scientific
-type hierarchy, and a single function `scitype` with scientific
-types as values. Someone implementing a convention must add methods
-to this function, while the general user just applies it to data, as
-in `scitype(4.5)` (returning `Continuous` in the *MLJ* convention).
+If you want to implement your own convention, you can use [MLJScientificTypes.jl](https://github.com/alan-turing-institute/MLJScientificTypes.jl) as a blueprint.
 
-- A built-in convention, called *MLJ*, active by default.
+When defining a convention you may want to:
 
-- Convenience methods for working with scientific types, the most commonly used being
+* declare a new convention,
+* declare new traits,
+* implement custom `schema`, `show` and `info` functions,
+* add explicit `scitype` and `Scitype` definitions,
+* define a `coerce` function.
 
-- `schema(X)`, which gives an extended schema of any Tables.jl
-compatible table `X`, including the column scientific types
-implied by the active convention.
-
-- `coerce(X, ...)`, which coerces the machine types of `X` to
-reflect a desired scientific type.
+Below we explain what these steps may look like, taking the MLJ convention as
+an example.
 
-For example,
+### Declaring a new convention
+
+In your module, define a subtype of `ScientificTypes.Convention`:
 
 ```julia
-using ScientificTypes, DataFrames
-X = DataFrame(
-a = randn(5),
-b = [-2.0, 1.0, 2.0, missing, 3.0],
-c = [1, 2, 3, 4, 5],
-d = [0, 1, 0, 1, 0],
-e = ['M', 'F', missing, 'M', 'F'],
-)
-sch = schema(X) # schema is overloaded in Scientifictypes
+struct MyConvention <: ScientificTypes.Convention end
 ```
 
-will print
+and add an `__init__` function that activates it:
 
+```julia
+function __init__()
+    ScientificTypes.set_convention(MyConvention())
+end
 ```
-_.table =
-┌─────────┬─────────────────────────┬────────────────────────────┐
-│ _.names │ _.types │ _.scitypes │
-├─────────┼─────────────────────────┼────────────────────────────┤
-│ a │ Float64 │ Continuous │
-│ b │ Union{Missing, Float64} │ Union{Missing, Continuous} │
-│ c │ Int64 │ Count │
-│ d │ Int64 │ Count │
-│ e │ Union{Missing, Char} │ Union{Missing, Unknown} │
-└─────────┴─────────────────────────┴────────────────────────────┘
-_.nrows = 5
+
+Subsequently, your methods will dispatch on `::MyConvention`; for
+instance, in the MLJ case (where the convention type is called `MLJ`):
+
+```julia
+ScientificTypes.scitype(::Integer, ::MLJ) = Count
 ```
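
The convention mechanism described in this section can be sketched without the package itself. It has three parts: a `Convention` subtype, a call to `set_convention` inside `__init__`, and `scitype` methods that dispatch on the convention. The following is a minimal, self-contained toy. It assumes that a one-argument `scitype(x)` forwards to the active convention. The modules `ToyScientificTypes` and `MyConventionPkg`, the `CONVENTION` slot and the `NoConvention` default are hypothetical names, not ScientificTypes.jl's actual internals.

```julia
# Toy stand-in for the convention machinery (hypothetical names throughout;
# this is not the source of ScientificTypes.jl).
module ToyScientificTypes

abstract type Convention end
struct NoConvention <: Convention end

# A few scientific types from the hierarchy shown earlier.
abstract type Found end
abstract type Known <: Found end
abstract type Unknown <: Found end
abstract type Continuous <: Known end
abstract type Count <: Known end

# The active convention, held in a Ref so it can be replaced at load time.
const CONVENTION = Ref{Convention}(NoConvention())

set_convention(c::Convention) = (CONVENTION[] = c; nothing)

# The one-argument form forwards to whichever convention is active ...
scitype(x) = scitype(x, CONVENTION[])
# ... and anything a convention does not claim falls back to Unknown.
scitype(x, ::Convention) = Unknown

end # module ToyScientificTypes

# A downstream package declaring and activating its own convention.
module MyConventionPkg

import ..ToyScientificTypes

struct MyConvention <: ToyScientificTypes.Convention end

# The rules of this convention: methods dispatching on ::MyConvention.
ToyScientificTypes.scitype(::Integer, ::MyConvention) = ToyScientificTypes.Count
ToyScientificTypes.scitype(::AbstractFloat, ::MyConvention) = ToyScientificTypes.Continuous

# Runs when the module is loaded, activating the convention.
function __init__()
    ToyScientificTypes.set_convention(MyConvention())
end

end # module MyConventionPkg
```

With both modules loaded, `ToyScientificTypes.scitype(3)` returns `Count` and `ToyScientificTypes.scitype("text")` falls back to `Unknown`. The downstream package only adds methods for its own convention type and never redefines the one-argument entry point.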
 
-Here the default *MLJ* convention is being applied ((cf. [docs](https://alan-turing-institute.github.io/ScientificTypes.jl/dev/#The-MLJ-convention-1)). Detail is obtained in the obvious way; for example:
+### Declaring new traits
+
+It's useful to mark containers that meet explicit traits; by default, everything
+is marked as `:other`. In the MLJ convention, we specifically consider all
+containers that meet the [`Tables.jl`](https://github.com/JuliaData/Tables.jl)
+interface. To declare this, add a key to the `TRAIT_FUNCTION_GIVEN_NAME`
+dictionary, mapped to a boolean function that checks the trait. This must also
+be done in your `__init__` function.
+In the case of the MLJ convention:
 
 ```julia
-julia> sch.names
-(:a, :b, :c, :d, :e)
+function __init__()
+    ScientificTypes.set_convention(MLJ())
+    ScientificTypes.TRAIT_FUNCTION_GIVEN_NAME[:table] = Tables.istable
+end
 ```
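
A toy version of the trait registry is sketched below. Only the dictionary name `TRAIT_FUNCTION_GIVEN_NAME` comes from the text. The `trait` lookup function, the module names and the `is_column_table` predicate are hypothetical. The predicate stands in for `Tables.istable` so that the example needs no dependencies.

```julia
# Hypothetical toy trait registry; only the dictionary name mirrors the text.
module ToyTraits

# Trait name => predicate deciding whether an object has that trait.
const TRAIT_FUNCTION_GIVEN_NAME = Dict{Symbol,Function}()

# Name of a registered trait that `X` satisfies, or :other if none does.
# (If several traits match, Dict iteration order decides which is returned.)
function trait(X)
    for (name, f) in TRAIT_FUNCTION_GIVEN_NAME
        f(X) && return name
    end
    return :other
end

end # module ToyTraits

module MyTablePkg

import ..ToyTraits

# Stand-in for Tables.istable: a non-empty NamedTuple of equal-length vectors.
function is_column_table(X)
    X isa NamedTuple || return false
    cols = values(X)
    isempty(cols) && return false
    all(c -> c isa AbstractVector, cols) || return false
    return length(unique(map(length, cols))) == 1
end

# Registration happens at load time, inside __init__.
function __init__()
    ToyTraits.TRAIT_FUNCTION_GIVEN_NAME[:table] = is_column_table
end

end # module MyTablePkg
```

Here `ToyTraits.trait((a = [1, 2], b = [3.0, 4.0]))` returns `:table`, while a plain vector is marked `:other`.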
 
-Now you could want to specify that `b` is actually a `Count`, and that `d` and `e` are `Multiclass`; this is done with the `coerce` function:
+### Adding scientific types
+
+You may want to extend the type hierarchy defined above. In the case of the
+MLJ convention, we consider a *table* as a scientific type:
 
 ```julia
-Xc = coerce(X, :b=>Count, :d=>Multiclass, :e=>Multiclass)
-schema(Xc)
+struct Table{K} <: Known end
 ```
 
-which prints
+where `K` is the union of the scientific types of the table's columns.
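
The following hypothetical, dependency-free sketch shows what the parameter `K` carries. A column table is represented as a `NamedTuple` of vectors, and `K` is built as the `Union` of its column scitypes. The helpers `column_scitype` and `table_scitype` are illustrative assumptions, not part of ScientificTypes.jl.

```julia
# Toy scientific types (hypothetical; not imported from ScientificTypes.jl).
abstract type Found end
abstract type Known <: Found end
abstract type Unknown <: Found end
abstract type Continuous <: Known end
abstract type Count <: Known end

# Table scitype parameterised by the union K of its column scitypes.
struct Table{K} <: Known end

# Column-level rules, keyed on the element type of each column.
column_scitype(::AbstractVector{<:Integer}) = Count
column_scitype(::AbstractVector{<:AbstractFloat}) = Continuous
column_scitype(::AbstractVector) = Unknown

# Scitype of a NamedTuple-of-vectors table: K = Union of column scitypes.
table_scitype(X::NamedTuple) = Table{Union{map(column_scitype, values(X))...}}
```

Because the column scitypes live in the type parameter, a method can dispatch on `Table{<:Continuous}` to accept only tables whose columns are all `Continuous`.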
 
+### Implementing custom `schema`, `show` and `info`
+
+If you have added new traits, you *may* want to extend the `schema` function
+for objects with that trait. You may then also want to extend the `show` of
+such schemas and the `info` of such objects.
+
+The `Schema` constructor takes four arguments:
+- the *names* of the features,
+- their *machine types*,
+- their *scientific types*,
+- the *number of rows*.
+
+In the MLJ convention:
+
+```julia
+function ScientificTypes.schema(X, ::Val{:table}; kw...)
+    sch = Tables.schema(X)
+    # ...
+    return Schema(names, types, stypes, nrows)
+end
 ```
-_.table =
-┌─────────┬──────────────────────────────────────────────┬───────────────────────────────┐
-│ _.names │ _.types │ _.scitypes │
-├─────────┼──────────────────────────────────────────────┼───────────────────────────────┤
-│ a │ Float64 │ Continuous │
-│ b │ Union{Missing, Int64} │ Union{Missing, Count} │
-│ c │ Int64 │ Count │
-│ d │ CategoricalValue{Int64,UInt8} │ Multiclass{2} │
-│ e │ Union{Missing, CategoricalValue{Char,UInt8}} │ Union{Missing, Multiclass{2}} │
-└─────────┴──────────────────────────────────────────────┴───────────────────────────────┘
-_.nrows = 5
 
+Extending `show` or `info` is then straightforward.
+
+```julia
+ScientificTypes.info(X, ::Val{:table}) = schema(X)
+
+function Base.show(io::IO, ::MIME"text/plain", s::ScientificTypes.Schema)
+    # ...
+end
 ```
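
The pieces of this section fit together roughly as sketched below:

- trait dispatch through `Val`,
- a schema built from names, machine types, scientific types and a row count,
- an `info` method,
- a custom `show`.

This is a self-contained toy under stated assumptions. `ToySchema` stands in for `ScientificTypes.Schema`, and a `NamedTuple` of vectors stands in for a Tables.jl table. `trait` and `column_scitype` are hypothetical helpers.

```julia
# Hypothetical toy schema machinery, not ScientificTypes.Schema itself.
abstract type Continuous end
abstract type Count end
abstract type Unknown end

# Element type -> scitype rules for columns.
column_scitype(::Type{<:Integer}) = Count
column_scitype(::Type{<:AbstractFloat}) = Continuous
column_scitype(::Type) = Unknown

# The four pieces: names, machine types, scientific types, row count.
struct ToySchema
    names::Tuple
    types::Tuple
    scitypes::Tuple
    nrows::Int
end

# Dispatch on a trait symbol via Val, as in `schema(X, ::Val{:table})`.
# Objects with trait :other have no schema method here (MethodError).
trait(X) = X isa NamedTuple ? :table : :other
schema(X) = schema(X, Val(trait(X)))

function schema(X, ::Val{:table})
    names = keys(X)
    types = map(eltype, values(X))
    stypes = map(column_scitype, types)
    nrows = isempty(X) ? 0 : length(first(X))
    return ToySchema(names, types, stypes, nrows)
end

# `info` for tables simply returns the schema.
info(X, ::Val{:table}) = schema(X)

# Custom plain-text display: a header line, then one line per column.
function Base.show(io::IO, ::MIME"text/plain", s::ToySchema)
    println(io, "ToySchema with ", s.nrows, " rows")
    for (n, t, st) in zip(s.names, s.types, s.scitypes)
        println(io, "  ", rpad(string(n), 6), rpad(string(t), 10), st)
    end
end
```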
+
+### Adding explicit `scitype` and `Scitype` definitions
+
+The `scitype` methods define default mappings from *machine type* to
+*scientific type*. For instance, in the MLJ convention:
+
+```julia
+ScientificTypes.scitype(::Integer, ::MLJ) = Count
+```
+
+where `::MLJ` refers to the convention.
+
+The `Scitype` methods typically mirror some of your `scitype` methods, but act
+on types rather than instances, so that the scientific type of an array can be
+obtained directly from its element type.
+For instance, in the MLJ convention:
+
+```julia
+ScientificTypes.Scitype(::Type{<:Integer}, ::MLJ) = Count
+```
+
+meaning that the scitype of an array such as `[1,2,3]` is inferred directly
+as an array of `Count`.
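
The division of labour between `scitype` (instances) and `Scitype` (types) can be illustrated with a hypothetical toy convention `MyConv`. The helper `array_scitype` is an assumption about how the two might be combined. It consults the element type first and inspects individual elements only when the element type alone gives `Unknown`, as for `Any[1, 2.5]`.

```julia
# Hypothetical toy illustrating scitype (instances) vs Scitype (types).
abstract type Continuous end
abstract type Count end
abstract type Unknown end

struct MyConv end  # stands in for a convention such as MLJ

# Instance-level rules (the role of `scitype`).
scitype(::Integer, ::MyConv) = Count
scitype(::AbstractFloat, ::MyConv) = Continuous
scitype(::Any, ::MyConv) = Unknown

# Type-level rules (the role of `Scitype`): the same mapping, on types.
Scitype(::Type{<:Integer}, ::MyConv) = Count
Scitype(::Type{<:AbstractFloat}, ::MyConv) = Continuous
Scitype(::Type, ::MyConv) = Unknown

# Array scitype: if the element type decides the answer, no element is
# inspected; otherwise take the union of the per-element scitypes.
function array_scitype(A::AbstractArray, c::MyConv)
    S = Scitype(eltype(A), c)
    if S === Unknown && !isempty(A)
        S = Union{(scitype(x, c) for x in A)...}
    end
    return AbstractArray{S, ndims(A)}
end
```

For `[1, 2, 3]` the element type `Int` already gives `Count`, so no element is examined. For `Any[1, 2.5]` each element is checked, giving `Union{Count, Continuous}`.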
+
+### Defining a `coerce` function
+
+It can be very useful to define a function that converts an object with one
+scitype to another scitype. In the MLJ convention, this role is played by the
+`coerce` function.
+
+For instance, consider this simplified method:
+
+```julia
+function coerce(y::AbstractArray{T}, T2::Type{<:Union{Missing,Continuous}}
+                ) where T <: Union{Missing,Real}
+    return float(y)
+end
+```
+
+This maps an array of `Real` values to an array of `AbstractFloat` values
+(which the MLJ convention maps to `Continuous`).
+
+Further, if you work with specific containers, you may want to define a
+`coerce` method that works on the container by applying `coerce` to each of
+its features. In the MLJ convention, we work with tabular objects and define a
+`coerce` method that applies the appropriate coercion to each column.
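
Both levels of coercion described here, per array and per container, can be sketched in a few lines. In this hypothetical toy, a `NamedTuple` of vectors plays the role of a table and `missing` values pass through unchanged. Coercion to `Count` rounds to the nearest integer. That rounding rule is an assumption made for illustration, not necessarily what the MLJ convention does.

```julia
# Hypothetical toy coercion, not the MLJ convention's implementation.
abstract type Continuous end
abstract type Count end

# Array-level coercion; `missing` entries are passed through unchanged.
coerce(y::AbstractVector{<:Union{Missing,Real}}, ::Type{Continuous}) =
    map(v -> ismissing(v) ? missing : float(v), y)

# Assumption: coercing to Count rounds to the nearest integer.
coerce(y::AbstractVector{<:Union{Missing,Real}}, ::Type{Count}) =
    map(v -> ismissing(v) ? missing : round(Int, v), y)

# Container-level coercion for a NamedTuple "table": each column named in
# `specs` is coerced to its target scitype; other columns are left as-is.
function coerce(X::NamedTuple, specs::Pair{Symbol}...)
    targets = Dict(specs...)
    cols = map(keys(X)) do name
        haskey(targets, name) ? coerce(X[name], targets[name]) : X[name]
    end
    return NamedTuple{keys(X)}(cols)
end
```

For example, `coerce((a = [1, 2], b = [1.7, 3.9]), :a => Continuous, :b => Count)` returns `(a = [1.0, 2.0], b = [2, 4])`. Columns that are not named in the call are returned unchanged.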

docs/Project.toml

Lines changed: 0 additions & 10 deletions
This file was deleted.

docs/make.jl

Lines changed: 0 additions & 22 deletions
This file was deleted.

docs/src/assets/custom.css

Lines changed: 0 additions & 54 deletions
This file was deleted.