Type safety and Spark Datasets in Scala

Hacker Noon CC BY-SA 2/01/2019

Type safety and #spark Datasets in #scala
▻https://hackernoon.com/type-safety-and-spark-datasets-in-scala-20fa582024fc?source=rss----3a814

Working with Spark Datasets has been quite interesting and, most of the time, rewarding in our current project. The API is simple yet powerful, and it abstracts away the need to hand-code complex transformations and computations. To be honest, we also have a fairly straightforward use case: a few domain entities and even fewer transformations, based on simple joins.

However, a few things have been counterproductive for us. I am going to focus on one of them: the lack of type safety in some operations, particularly joins.

    dataSetA.join(dataSetB, "columnA")

The above code will fail at runtime if either dataSetA or dataSetB (or both) doesn’t have a “columnA” column. This is a waste of resources at multiple levels, from precious CPU cycles to the developer’s time. In the remainder of this (...)
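To make the failure mode concrete, here is a minimal sketch of the join from the excerpt. It is not the article's own code. The case classes A and B, the object JoinFailureDemo, and the helper tryJoin are hypothetical names made up for illustration, and the local SparkSession settings are assumptions. B deliberately has no columnA field, yet the program compiles, because the column name is only a String as far as the compiler is concerned.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}
import scala.util.{Failure, Success, Try}

// Hypothetical domain entities (not from the article).
case class A(columnA: String, value: Int)
case class B(columnB: String, other: Int) // note: no columnA field

object JoinFailureDemo {
  // Attempts the string-keyed join and captures any exception it throws.
  def tryJoin(spark: SparkSession): Try[DataFrame] = {
    import spark.implicits._
    val dataSetA = Seq(A("x", 1)).toDS()
    val dataSetB = Seq(B("x", 2)).toDS()
    // Compiles fine: "columnA" is not checked against A or B at compile time.
    Try(dataSetA.join(dataSetB, "columnA"))
  }

  def main(args: Array[String]): Unit = {
    // Local session settings are assumptions for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-failure-demo")
      .getOrCreate()

    tryJoin(spark) match {
      case Failure(e: AnalysisException) =>
        println(s"Join failed at runtime: ${e.getMessage}")
      case Failure(other) =>
        throw other
      case Success(df) =>
        df.show()
    }
    spark.stop()
  }
}
```

In this sketch the mistake only surfaces once the job is actually running, when Spark tries to resolve the column against both sides of the join. That is the cost the excerpt describes: the compiler had all the information in the case classes but was never asked to use it.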

#dataset #generic-programming #shapeless
