Commit d72e3c8

Merge pull request #177 from JuliaAI/dev
For a 3.0.0 release
2 parents 13e90c2 + 893c291 commit d72e3c8

23 files changed: +1282 -1002 lines changed

Project.toml

Lines changed: 3 additions & 3 deletions
@@ -20,9 +20,9 @@ ColorTypes = "0.9, 0.10, 0.11"
 Distributions = "0.25.1"
 PrettyTables = "1"
 Reexport = "1.2"
-ScientificTypesBase = "2.2"
-StatisticalTraits = "2"
-Tables = "1"
+ScientificTypesBase = "3.0"
+StatisticalTraits = "3.0"
+Tables = "1.6.1"
 julia = "1"
 
 [extras]

README.md

Lines changed: 20 additions & 25 deletions
@@ -45,9 +45,8 @@ The module `ScientificTypes` defined in this repo rexports the
 scientific types and associated methods defined in [ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl)
 and provides:
 
-- a collection of `ScientificTypes.scitype` definitions that
-  articulate a default convention, importing the module automatically
-  activating the convention
+- a collection of `scitype` definitions that
+  articulate a default convention.
 
 - a `coerce` function, for changing machine types to reflect a specified
   scientific interpretation (scientific type)
@@ -75,17 +74,15 @@ sch = schema(X)
 will print
 
 ```
-_.table =
-┌─────────┬─────────────────────────┬────────────────────────────┐
-│ _.names │ _.types                 │ _.scitypes                 │
-├─────────┼─────────────────────────┼────────────────────────────┤
-│ a       │ Float64                 │ Continuous                 │
-│ b       │ Union{Missing, Float64} │ Union{Missing, Continuous} │
-│ c       │ Int64                   │ Count                      │
-│ d       │ Int64                   │ Count                      │
-│ e       │ Union{Missing, Char}    │ Union{Missing, Unknown}    │
-└─────────┴─────────────────────────┴────────────────────────────┘
-_.nrows = 5
+┌───────┬────────────────────────────┬─────────────────────────┐
+│ names │ scitypes                   │ types                   │
+├───────┼────────────────────────────┼─────────────────────────┤
+│ a     │ Continuous                 │ Float64                 │
+│ b     │ Union{Missing, Continuous} │ Union{Missing, Float64} │
+│ c     │ Count                      │ Int64                   │
+│ d     │ Count                      │ Int64                   │
+│ e     │ Union{Missing, Unknown}    │ Union{Missing, Char}    │
+└───────┴────────────────────────────┴─────────────────────────┘
 ```
 
 Detail is obtained in the obvious way; for example:
@@ -105,17 +102,15 @@ schema(Xc)
 which prints
 
 ```
-_.table =
-┌─────────┬──────────────────────────────────────────────┬───────────────────────────────┐
-│ _.names │ _.types                                      │ _.scitypes                    │
-├─────────┼──────────────────────────────────────────────┼───────────────────────────────┤
-│ a       │ Float64                                      │ Continuous                    │
-│ b       │ Union{Missing, Int64}                        │ Union{Missing, Count}         │
-│ c       │ Int64                                        │ Count                         │
-│ d       │ CategoricalValue{Int64,UInt32}               │ Multiclass{2}                 │
-│ e       │ Union{Missing, CategoricalValue{Char,UInt32}}│ Union{Missing, Multiclass{2}} │
-└─────────┴──────────────────────────────────────────────┴───────────────────────────────┘
-_.nrows = 5
+┌───────┬───────────────────────────────┬────────────────────────────────────────────────┐
+│ names │ scitypes                      │ types                                          │
+├───────┼───────────────────────────────┼────────────────────────────────────────────────┤
+│ a     │ Continuous                    │ Float64                                        │
+│ b     │ Union{Missing, Count}         │ Union{Missing, Int64}                          │
+│ c     │ Count                         │ Int64                                          │
+│ d     │ Multiclass{2}                 │ CategoricalValue{Int64, UInt32}                │
+│ e     │ Union{Missing, Multiclass{2}} │ Union{Missing, CategoricalValue{Char, UInt32}} │
+└───────┴───────────────────────────────┴────────────────────────────────────────────────┘
 
 ```
 

docs/Project.toml

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
 
 [compat]
 Documenter = "0.25"
-ScientificTypesBase = "2.1"
+ScientificTypesBase = "3"

docs/src/index.md

Lines changed: 0 additions & 2 deletions
@@ -463,8 +463,6 @@ before passing to `coerce`.
 
 ```@docs
 ScientificTypes.scitype
-ScientificTypes.elscitype
-ScientificTypes.scitype_union
 coerce
 autotype
 ```

src/ScientificTypes.jl

Lines changed: 14 additions & 12 deletions
@@ -3,33 +3,35 @@ module ScientificTypes
 # Dependencies
 using Reexport
 @reexport using ScientificTypesBase
+export scitype, elscitype, scitype_union
 using Tables
 using CategoricalArrays
 using ColorTypes
 using PrettyTables
 using Dates
 import Distributions
 
-import StatisticalTraits: info
-
-# re-export from StatisticalTraits
-export info
-
 # exports
 export coerce, coerce!, autotype, schema, levels, levels!
 
-# -------------------------------------------------------------
+# --------------------------------------------------------------------------------------
 # Abbreviations
 
-const ST = ScientificTypesBase
-const Arr = AbstractArray
+const ST = ScientificTypesBase
+const Arr = AbstractArray
 const CArr = CategoricalArray
-const Cat = CategoricalValue
+const Cat = CategoricalValue
+const COLS_SPECIALIZATION_THRESHOLD = 30
+const ROWS_SPECIALIZATION_THRESHOLD = 10000
+const SCHEMA_SPECIALIZATION_THRESHOLD = Tables.SCHEMA_SPECIALIZATION_THRESHOLD
 
-# Indicate the convention, see init.jl where it is set.
+#---------------------------------------------------------------------------------------
+# Define convention
 struct DefaultConvention <: Convention end
-
-include("init.jl")
+const CONV = DefaultConvention()
+# -------------------------------------------------------------
+# vtrait function, returns either `Val{:table}()` or `Val{:other}()`
+vtrait(X) = Val{ifelse(Tables.istable(X), :table, :other)}()
 
 # -------------------------------------------------------------
 # Includes
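
Reviewer note: the new `vtrait` helper wraps a run-time check in a `Val` so that `autotype`, `coerce` and `coerce!` can dispatch on `Val{:table}` or `Val{:other}`. The sketch below reproduces that pattern with Base only. It is an illustration under stated assumptions, not the package's code: `is_tabular`, `demo_vtrait`, `describe` and `_describe` are hypothetical names, and `is_tabular` is a crude stand-in for `Tables.istable`.

```julia
# Hypothetical stand-in for `Tables.istable`: treat a NamedTuple of vectors as a table.
is_tabular(X) = X isa NamedTuple && all(v -> v isa AbstractVector, values(X))

# Same shape as the `vtrait` added in this commit: compute the trait at run time
# and wrap it in a `Val` so that methods can dispatch on it.
demo_vtrait(X) = Val{ifelse(is_tabular(X), :table, :other)}()

# Public entry point forwards to a method selected by the trait value.
describe(X) = _describe(X, demo_vtrait(X))
_describe(X, ::Val{:table}) = "table with $(length(X)) column(s)"
_describe(X, ::Val{:other}) = "non-tabular object"

describe((a = [1, 2], b = [3.0, 4.0]))  # dispatches to the Val{:table} method
describe([1, 2, 3])                     # dispatches to the Val{:other} method
```

Because the trait value is only known at run time, the call through `demo_vtrait` is a dynamic dispatch; that is usually acceptable for entry points like these, which are called once per table rather than once per element.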

src/autotype.jl

Lines changed: 52 additions & 15 deletions
@@ -12,11 +12,13 @@ an array based on rules
   autotype, `only_changes` should be true.
 * `rules=(:few_to_finite,)`: the set of rules to apply.
 """
-autotype(X; kw...) = _autotype(X, Val(ST.trait(X)); kw...)
+autotype(X; kw...) = _autotype(X, vtrait(X); kw...)
 
 # For an array object (trait:other)
-function _autotype(X::Arr, ::Val{:other};
-                   rules::NTuple{N,Symbol} where N=(:few_to_finite,))
+function _autotype(
+    X::Arr, ::Val{:other};
+    rules::NTuple{N,Symbol} where N = (:few_to_finite,)
+)
     # check that the rules are recognised
     _check_rules(rules)
     # inspect the current element scitype
@@ -36,8 +38,11 @@ function _autotype(X::Arr, ::Val{:other};
 end
 
 # For a table object (trait:table)
-function _autotype(X, ::Val{:table}; only_changes::Bool=true,
-                   rules::NTuple{N,Symbol} where N=(:few_to_finite,))
+function _autotype(
+    X, ::Val{:table};
+    only_changes::Bool=true,
+    rules::NTuple{N,Symbol} where N = (:few_to_finite,)
+)
     # check that the rules are recognised
     _check_rules(rules)
     # recuperate the schema of `X`
@@ -59,7 +64,7 @@ function _autotype(X, ::Val{:table}; only_changes::Bool=true,
         # doesn't really matter, there are few rules and sugg is fast
         sugg_type = stype
         for rule in rules
-            sugg_type = eval(:($rule($sugg_type, $col, $sch.nrows)))
+            sugg_type = eval(:($rule($sugg_type, $col, $(nrows(X)))))
         end
         # store the suggested type
         suggested_types[name] = sugg_type
@@ -76,16 +81,49 @@ function _autotype(X, ::Val{:table}; only_changes::Bool=true,
 end
 
 # convenience functions to pass a single rule at the time
-autotype(X, rule::Symbol; args...) =
-    autotype(X; rules=(rule,), args...)
+autotype(X, rule::Symbol; args...) = autotype(X; rules=(rule,), args...)
 # convenience function to splat rules
-autotype(X, rules::NTuple{N,Symbol} where N; args...) =
-    autotype(X; rules=rules, args...)
+autotype(X, rules::NTuple{N,Symbol} where N; args...) = autotype(X; rules=rules, args...)
+
+"""
+    nrows(X)
+
+Helper method to return the number of rows a table `X` has.
+
+**Note**
+A more general version of this method is defined in `MLJModelInterface.jl`.
+This method is needed here in order for `auto_type` method to run.
+"""
+function nrows(X)
+    if !Tables.istable(X)
+        throw(ArgumentError("input argument must be a Tables.jl compatible table"))
+    end
+    if Tables.rowaccess(X)
+        rows = Tables.rows(X)
+        return _nrows_rat(Base.IteratorSize(typeof(rows)), rows)
+
+    else
+        cols = Tables.columns(X)
+        return _nrows_cat(cols)
+    end
+end
+
+# number of rows for columnaccessed table
+function _nrows_cat(cols)
+    names = Tables.columnnames(cols)
+    !isempty(names) || return 0
+    return length(Tables.getcolumn(cols, names[1]))
+end
+
+# number of rows for rowaccessed table
+_nrows_rat(::Base.HasShape, rows) = size(rows, 1)
+_nrows_rat(::Base.HasLength, rows) = length(rows)
+_nrows_rat(iter_size, rows) = length(collect(rows))
 
 # -----------------------------------------------------------------
 # rules
 
-function _check_rules(rules::NTuple{N,Symbol} where N)
+function _check_rules(rules::NTuple{N, Symbol} where N)
     for rule in rules
         rule in (:few_to_finite,
                  :discrete_to_continuous,
@@ -161,15 +199,14 @@ end
 Helper function to suggest a finite type corresponding to `T` when there are
 few unique values.
 """
-function sugg_finite(::Type{<:Union{Missing,T}}) where T
-    T <: Real && return OrderedFactor
-    return Multiclass
+function sugg_finite(type::Type)
+    return ifelse(nonmissing(type) <: Real, OrderedFactor, Multiclass)
 end
 
 """
     T_or_Union_Missing_T(type, T)
 
-Helper function to return either `T` or `Union{Missing,T}`.
+Helper function to return either `T` or `Union{Missing, T}`.
 """
 function T_or_Union_Missing_T(type::Type, T::Type)
     return ifelse(type >: Missing, Union{Missing, T}, T)
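
Reviewer note: the `nrows` helper added in this file picks a counting strategy for row-access tables by dispatching on `Base.IteratorSize`. A minimal Base-only sketch of that idea follows; `count_items` and `_count` are hypothetical names used for illustration and are not part of the package.

```julia
# Look up the iterator-size trait for the iterator's type, then dispatch on it.
count_items(itr) = _count(Base.IteratorSize(typeof(itr)), itr)

_count(::Base.HasShape, itr) = size(itr, 1)              # shaped iterators, e.g. arrays
_count(::Base.HasLength, itr) = length(itr)              # length known without iterating
_count(::Base.IteratorSize, itr) = length(collect(itr))  # fallback: materialise first

count_items(rand(4, 2))                     # HasShape{2}: first dimension, 4
count_items((1, 2, 3))                      # HasLength: 3
count_items(Iterators.filter(isodd, 1:10))  # SizeUnknown: collects 5 items
```

The fallback method materialises the iterator, so it costs memory proportional to the number of rows and would not terminate on an infinite iterator; the same caveat appears to apply to the untyped `_nrows_rat(iter_size, rows)` fallback in the diff.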

src/coerce.jl

Lines changed: 28 additions & 13 deletions
@@ -1,20 +1,35 @@
-const ColKey = Union{Symbol,AbstractString}
+const ColKey = Union{Symbol, AbstractString}
 
 """
-    coerce(A, specs...; tight=false, verbosity=1)
+    coerce(A, S)
 
-Given a table `A`, return a copy of `A`, ensuring that the element
+Return new version of the array `A` whose scientific element type is `S`.
+
+```
+julia> v = coerce([3, 7, 5], Continuous)
+3-element Vector{Float64}:
+ 3.0
+ 7.0
+ 5.0
+
+julia> scitype(v)
+AbstractVector{Continuous}
+
+```
+    coerce(X, specs...; tight=false, verbosity=1)
+
+Given a table `X`, return a copy of `X`, ensuring that the element
 scitypes of the columns match the new specification, `specs`. There
-are three valid specifiations:
+are three valid specifications:
 
 (i) one or more `column_name=>Scitype` pairs:
 
-    coerce(X, col1=>Sciyype1, col2=>Scitype2, ... ; verbosity=1)
+    coerce(X, col1=>Scitype1, col2=>Scitype2, ... ; verbosity=1)
 
 (ii) one or more `OldScitype=>NewScitype` pairs (`OldScitype` covering
 both the `OldScitype` and `Union{Missing,OldScitype}` cases):
 
-    coerce(X, OldScitype1=>NewSciyype1, OldScitype2=>NewScitype2, ... ; verbosity=1)
+    coerce(X, OldScitype1=>NewScitype1, OldScitype2=>NewScitype2, ... ; verbosity=1)
 
 (iii) a dictionary of scientific types keyed on column names:
 
@@ -24,7 +39,7 @@ where `ColKey = Union{Symbol,AbstractString}`.
 
 ### Examples
 
-Specifiying `column_name=>Scitype` pairs:
+Specifying `column_name=>Scitype` pairs:
 
 ```
 using CategoricalArrays, DataFrames, Tables
@@ -45,7 +60,7 @@ Xc = coerce(X, Count=>Continuous)
 schema(Xfixed).scitypes # (Continuous, Continuous, Continuous)
 ```
 """
-coerce(X, a...; kw...) = coerce(Val(ST.trait(X)), X, a...; kw...)
+coerce(X, a...; kw...) = coerce(vtrait(X), X, a...; kw...)
 
 # Non tabular data is not supported
 coerce(::Val{:other}, X, a...; kw...) =
@@ -156,14 +171,14 @@ end
 """
     coerce!(X, ...)
 
-Same as [`ScientificTypes.coerce`](@ref) except it does the modification in
-place provided `X` supports in-place modification (at the moment, only the
-DataFrame! does). An error is thrown otherwise. The arguments are the same as
-`coerce`.
+Same as [`ScientificTypes.coerce`](@ref) except it does the
+modification in place provided `X` supports in-place modification (eg,
+DataFrames). An error is thrown otherwise. The arguments are the same
+as `coerce`.
 
 """
 coerce!(X, a...; kw...) = begin
-    coerce!(Val(ST.trait(X)), X, a...; kw...)
+    coerce!(vtrait(X), X, a...; kw...)
 end
 
 coerce!(::Val{:other}, X, a...; kw...) =

src/convention/coerce.jl

Lines changed: 30 additions & 1 deletion
@@ -223,6 +223,35 @@ F = Real
 _2d{C} =AbstractArray{C} where C<:Union{ColorTypes.AbstractRGB, ColorTypes.Gray}
 
 # Single Image
+
+"""
+    coerce(image::AbstractArray{<:Real, N}, I)
+
+Given a an array called `image` representing one or more images,
+return a transformed version of the data so as to enforce an
+appropriate scientific interpretation `I`:
+
+single or collection ? | N | I                  | `scitype` of result
+-----------------------|---|--------------------|----------------------------
+single                 | 2 | `GrayImage`        | `GrayImage{W,H}`
+single                 | 3 | `ColorImage`       | `ColorImage{W,H}`
+collection             | 3 | `GrayImage`        | `AbstractVector{<:GrayImage}`
+collection             | 4 (W x H x {1} x C)| `GrayImage` | `AbstractVector{<:GrayImage}`
+collection             | 4 | `ColorImage`       | `AbstractVector{<:ColorImage}`
+
+```
+imgs = rand(10, 10, 3, 5)
+v = coerce(imgs, ColorImage)
+
+julia> typeof(v)
+Vector{Matrix{ColorTypes.RGB{Float64}}}
+
+julia> scitype(v)
+AbstractVector{ColorImage{10, 10}}
+
+```
+
+"""
 function coerce(y::Arr{<:F, 2}, T2::Type{GrayImage})
     return ColorTypes.Gray.(y)
 end
@@ -249,4 +278,4 @@ end
 
 function coerce(y::_4Dcollection, T2::Type{ColorImage})
     return [broadcast(ColorTypes.RGB, y[:,:,1, idx], y[:,:,2,idx], y[:,:,3, idx]) for idx=1:size(y,4)]
-end
+end
