As a software engineer at Spotify working on data and machine learning infrastructure, Neville Li has been driving the adoption of Scala and new tools for data processing, including Beam, Scalding, Spark, Storm and Parquet. Prior to Spotify, he worked on search quality at Yahoo! and on old-school distributed systems like MPI.
We spoke with Neville in advance of Teaching Scala: A Roundtable Discussion at Scala Days in New York on June 20th and asked him about his Scala story. He is joining the roundtable as a panelist alongside Martin Odersky, the inventor of the Scala language, as well as Ryan Tanner (Twitter), Kelley Robinson (Twilio), Maciej Gorywoda (WIRE), Mark Lewis (Trinity University) and Heather Miller (Scala Center).
Tell us about your background and your current role at Spotify.
I did research on information retrieval way back in grad school and worked on big data at Yahoo Search before joining Spotify. At Spotify I have worked on music recommendations and on data and machine learning infrastructure. I picked up Scala a few years ago as we were transitioning from Python-based data processing to Scalding, a Scala library for data processing on Cascading and Hadoop. My current role involves working with Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and various other Scala libraries for machine learning and data processing.
What’s the biggest highlight of your career so far?
I created Scio while Spotify was moving to Google Cloud. It’s been the API of choice for data processing on Google Cloud at Spotify since it was announced two years ago. Today there are 300+ developers at Spotify writing data pipelines in Scio on a daily basis and dozens of external companies using it. It’s been a great learning experience and a very rewarding journey so far.
Why did you pick Scala and what kind of problems does it solve for you?
We picked Scala mostly for data processing and machine learning since it combines the agility of Python, which a lot of people from a data science background are already familiar with, with the performance and ecosystem of the JVM and Hadoop. The functional programming paradigm is a great fit for data processing. Type safety and functional composition give us great confidence in building robust and correct data pipelines.
Whom would you like to connect with at the conference?
Anyone working with Scala and data, including those from the Spark, Flink, Scalding and Beam communities and those working with type-level Scala libraries like Shapeless, Cats and ScalaCheck.
If you could invite one person to Scala Days, who would that be and why?
Probably Oscar Boykin, the author of Scalding, Algebird and many other Twitter OSS libraries for data processing. I learned so much from these libraries early in my career working with Scala and would love to see his talks again.
Don’t miss Teaching Scala: A Roundtable Discussion at Scala Days in New York on June 20th.