Apache #spark — Tips and Tricks for better #performance
▻https://hackernoon.com/apache-spark-tips-and-tricks-for-better-performance-cf2397cac11?source=r
Apache Spark is quickly gaining steam both in the headlines and in real-world adoption. Top use cases include streaming data, machine learning, interactive analysis and more. Many well-known companies use it, such as Uber, Pinterest and others. After working with Spark in production for more than 3 years, I’m happy to share my tips and tricks for better performance.

Let’s start :)

1 - Avoid using custom UDFs:

UDF (user-defined function): Column-based functions that extend the vocabulary of Spark SQL’s DSL.

Why should we avoid them?

From the Apache Spark docs:

“Use the higher-level standard Column-based functions with Dataset operators whenever possible before reverting to using your own custom UDF functions since UDFs are a blackbox for Spark and so it does not even (...)
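
To make the UDF advice concrete, here is a minimal Scala sketch that is not taken from the original article. It computes the same upper-cased column twice, once through a custom UDF and once through the built-in `upper` Column function, then prints both query plans so they can be compared. The object name `UdfVsBuiltin`, the column names, the sample rows and the local `master` setting are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf, upper}

// Hypothetical example object; every name here is illustrative only.
object UdfVsBuiltin {
  def main(args: Array[String]): Unit = {
    // Local session, assumed only so the sketch can run on one machine.
    val spark = SparkSession.builder()
      .appName("udf-vs-builtin")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with a single string column.
    val df = Seq("uber", "pinterest").toDF("name")

    // Custom UDF: Spark treats the Scala closure as a black box.
    // Note: this closure would fail on null input; the sample has none.
    val upperUdf = udf((s: String) => s.toUpperCase)
    val withUdf = df.select(upperUdf(col("name")).as("name_upper"))

    // Built-in Column function producing the same result.
    val withBuiltin = df.select(upper(col("name")).as("name_upper"))

    // Print the physical plans of both variants for comparison.
    withUdf.explain()
    withBuiltin.explain()

    spark.stop()
  }
}
```

Both queries return the same values. The difference shows up in the plans: the built-in function is an expression Spark understands, while the UDF call is opaque to it. When a built-in function exists for the job, it is usually the safer default.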