Description
Before I start explaining: I'm willing to work on the PR if you're interested, but I thought it better to discuss it with you first :-)
So, we're using flink-scala-api for type-information (I work with @arnaud-daroussin). One thing we've noticed is that if we use it "as intended" (by just importing org.apache.flinkx.api.serializers._ everywhere), it leads to very high compilation times. With the old Flink API, the full clean-compile took around 160 seconds, and with flink-scala-api it went up to 200 seconds. However, we managed to cut quite a lot of it by using semi-auto derivation instead of full-auto derivation: we've reduced the time to 140 seconds, even less than before the migration.
I'm not sure how familiar you are with semi-auto vs full-auto derivation? The idea is that instead of importing the macro everywhere, we declare implicit TypeInformation vals in the companion objects of all classes, and they're automatically found (hence semi-auto: they're declared manually, but found automatically). In addition to faster compile times, semi-auto also has the advantage of letting us create custom TypeInformations for certain classes where the macro would have worked, but wouldn't have been as optimized for runtime performance. In short, you trade convenience for control.
So for example, instead of:
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flinkx.api.serializers._
final case class Alert(message: String)
final case class Notification(alerts: List[Alert])
object Job {
  val info = implicitly[TypeInformation[Notification]]
}
We have:
import org.apache.flink.api.common.typeinfo.TypeInformation
// Don't import deriveTypeInformation
import org.apache.flinkx.api.serializers.{deriveTypeInformation => _, _}
final case class Alert(message: String)
object Alert {
  implicit val alertInfo: TypeInformation[Alert] = org.apache.flinkx.api.serializers.deriveTypeInformation
}
final case class Notification(alerts: List[Alert])
object Notification {
  implicit val notificationInfo: TypeInformation[Notification] = // some custom stuff
}
object Job {
  val info = implicitly[TypeInformation[Notification]]
}
The issue is that flink-scala-api doesn't really support semi-auto derivation natively.
So, we had to jump through some hoops. As you can see, we have to be careful to never import deriveTypeInformation, because it would have a higher priority as an implicit (being already in scope) than the one in the entity's companion object. That's very error-prone: it's easy to miss (we did it a few times), because if you do import it, everything still seems to work "mostly" fine. So instead, we just created our own class that copied everything from org.apache.flinkx.api.serializers except deriveTypeInformation.
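To make the priority problem concrete, here is a minimal sketch that does not depend on Flink at all. Info and AutoDerivation are hypothetical stand-ins (for TypeInformation and for the serializers object with its macro), and the sketch assumes the usual Scala 2 lookup order: implicits in lexical scope are tried first, and companions are only consulted if nothing there matches.

```scala
// Hypothetical stand-in for TypeInformation; not the Flink class.
trait Info[A] { def describe: String }

// Hypothetical stand-in for org.apache.flinkx.api.serializers:
// a single implicit that can produce an instance for any type, like the macro.
object AutoDerivation {
  implicit def deriveInfo[A]: Info[A] =
    new Info[A] { def describe: String = "derived" }
}

final case class Alert(message: String)
object Alert {
  // Hand-written instance, living in the implicit scope of Info[Alert].
  implicit val alertInfo: Info[Alert] =
    new Info[Alert] { def describe: String = "custom" }
}

object WithImport {
  import AutoDerivation._
  // The imported implicit is in lexical scope and matches,
  // so the companion instance is never consulted.
  val picked: String = implicitly[Info[Alert]].describe // "derived"
}

object WithoutImport {
  // Nothing matches in lexical scope, so the compiler falls back
  // to the companion object of Alert.
  val picked: String = implicitly[Info[Alert]].describe // "custom"
}
```

Both objects compile without any warning, which is why the mistake is so easy to miss: the only visible difference is which instance ends up being used at runtime.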
Another issue is that a missing type-information goes unnoticed, because deriveTypeInformation ends up calling itself recursively if necessary. So for example, this shouldn't compile in semi-auto, but it does:
import org.apache.flink.api.common.typeinfo.TypeInformation
// Don't import deriveTypeInformation
import org.apache.flinkx.api.serializers.{deriveTypeInformation => _, _}
final case class Alert(message: String)
object Alert {
  // No TypeInformation declared
}
final case class Notification(alerts: List[Alert])
object Notification {
  // note that deriveTypeInformation is not in the implicit context, we call it by its full name
  // so it shouldn't find a way to get a TypeInformation[Alert]
  implicit val notificationInfo: TypeInformation[Notification] = org.apache.flinkx.api.serializers.deriveTypeInformation
}
object Job {
  val info = implicitly[TypeInformation[Notification]]
}
OK, that was a wall of text, sorry 😅
So: what do you think about supporting both auto and semi-auto derivation?
That's something projects like Circe are already doing. The idea would be to have two separate packages for the derivation of serializers and type-informations, called auto and semiauto. The generic type-informations (for stuff like Option, List, etc.) would be in a parent trait, inherited both by auto and semi-auto, and the macro would be the only thing being different between the two. Note that on the semi-auto derivation, the cache is not necessary, because the declared type-information vals are doing the job.
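A rough sketch of what that layout could look like, with the macro replaced by a trivial stand-in so the example is self-contained. Everything here is hypothetical naming (Info, CommonInstances, LowPriorityAuto, deriveInfo); none of it is the current flink-scala-api API. It assumes Scala 2.13 implicit ranking. Putting the implicit deriver in a separate, unrelated trait is a choice made for this sketch so that the generic fallback does not become ambiguous with the shared instances; a real macro might not need that.

```scala
// Hypothetical stand-in for TypeInformation.
final case class Info[A](describe: String)

// Shared instances for built-in types, inherited by both flavours.
trait CommonInstances {
  implicit val stringInfo: Info[String] = Info("String")
  implicit def optionInfo[A](implicit a: Info[A]): Info[Option[A]] =
    Info(s"Option[${a.describe}]")
  implicit def listInfo[A](implicit a: Info[A]): Info[List[A]] =
    Info(s"List[${a.describe}]")
}

// Auto flavour only: the deriver is implicit, so it is found everywhere.
trait LowPriorityAuto {
  implicit def deriveInfo[A]: Info[A] = Info("derived") // stand-in for the macro
}

object auto extends CommonInstances with LowPriorityAuto

object semiauto extends CommonInstances {
  // Semi-auto flavour: same deriver, but NOT implicit. It must be called by hand.
  def deriveInfo[A]: Info[A] = Info("derived") // stand-in for the macro
}

final case class Alert(message: String)
object Alert {
  // Declared once, then found through the companion's implicit scope.
  implicit val alertInfo: Info[Alert] = semiauto.deriveInfo[Alert]
}

object SemiAutoJob {
  import semiauto._
  // listInfo comes from the import, Info[Alert] from the companion.
  val described: String = implicitly[Info[List[Alert]]].describe
}

object AutoJob {
  import auto._
  final case class Event(id: String) // no companion instance declared
  // optionInfo is more specific than the generic deriver, which fills in Info[Event].
  val described: String = implicitly[Info[Option[Event]]].describe
}
```

With the semiauto import, asking for an Info of a case class that has no instance in its companion would fail to compile in this sketch, because nothing implicit can produce it. A real macro would additionally have to avoid deriving nested case classes on its own to get that guarantee.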