Consider relaxing type requirement in df.cast(verify = true) operation to allow subtype relation #1549

@koperagen

cast(verify = true) fails if a column has a more specific runtime type than the schema expects, for example Int rather than Number?, or even Int?:

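import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
data class Input(val b: Number?)

fun process(df: DataFrame<Input>) {
  df.forEach { println("row ${index()}: $b") }
}

fun main() {
  // Fails: column "b" has runtime type Int, but the schema declares Number?
  dataFrameOf("b" to columnOf(123)).cast<Input>(verify = true)
}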

My intention with this operation is to accept any dataframe that can be processed and to fail fast on anything that would otherwise fail with a ClassCastException. That is not what happens here: this code works without verify = true, or if I switch to convertTo.
I think this makes it harder to write reliable programs. It's not clear that cast might fail, or what the alternative is. convertTo is more reliable, but it has its own caveat: it fills missing columns with nulls.

dataFrameOf("b" to columnOf(123)).cast<Input>(verify = false)
dataFrameOf("b" to columnOf(123)).convertTo<Input>()

We need to improve the reliability of cast so that it can serve as an accurate fail-fast mechanism.
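
To make the proposed relaxation concrete, here is a minimal sketch of a subtype-based compatibility check. It uses only the Kotlin standard library on the JVM and is not the DataFrame implementation: isCompatible is a hypothetical helper, and it deliberately ignores generic type arguments. The idea is that a column type should pass verification when it is a subtype of the schema type, not only when the two are equal.

```kotlin
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical helper, not part of the DataFrame API.
// Simplified subtype check: compares classifiers and nullability only,
// and ignores generic type arguments.
fun isCompatible(actual: KType, expected: KType): Boolean {
    // A nullable column can never satisfy a non-null schema type.
    if (actual.isMarkedNullable && !expected.isMarkedNullable) return false
    val actualClass = actual.classifier as? KClass<*> ?: return false
    val expectedClass = expected.classifier as? KClass<*> ?: return false
    // javaObjectType maps Int to java.lang.Integer, so Number is assignable from it.
    return expectedClass.javaObjectType.isAssignableFrom(actualClass.javaObjectType)
}

fun main() {
    // The case from this issue: an Int column checked against Number?
    println(isCompatible(typeOf<Int>(), typeOf<Number?>())) // true
    println(isCompatible(typeOf<Int>(), typeOf<Int?>()))    // true
    // These should still fail fast.
    println(isCompatible(typeOf<Int?>(), typeOf<Int>()))    // false
    println(isCompatible(typeOf<String>(), typeOf<Number?>())) // false
}
```

With a rule like this, the Int column from the example above would be accepted for Input, while a nullable or unrelated column type would still be rejected up front.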

Labels

enhancement