A transgender, Canadian, open-source developer advocate at Google with a focus on Apache Spark, Beam, and related “big data” tools, Holden is the co-author of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML & Mahout projects. In her own words, she was “tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal”.
In advance of her talk, ‘Keeping the “fun” in Apache Spark: Datasets and FP’, at Scala Days in New York in June, we spoke to Holden about her dive into the world of big data, the biggest challenge of her career, and how her assembly code tattoo is connected with her first introduction to functional programming.
What’s your background and what does your current role involve?
I went to school at the University of Waterloo in Canada, which later led to me getting an assembly code tattoo on my back, and was also my first introduction to functional programming. I had a fascination for search and recommendation problems, so I went through a series of jobs at Amazon, Foursquare (where I was introduced to Scala), and Google. Since then I’ve worked primarily on big data at Databricks, Alpine, and IBM (because, as it turns out, most search and recommendation problems really end up being bottlenecked on the data). I’ve recently returned to Google as a developer advocate for OSS Big Data tools, including Spark, as well as other fun tools.
What’s the biggest highlight of your career so far?
It’s a toss-up between finishing co-writing Learning Spark and becoming an Apache Spark committer. Both are things that felt impossible as I neared the end, and that I’ll talk about if I’ve had a Bud Light Lime (or too many coffees).
Why did you pick Apache Spark and other big data tools? What kind of problems do they solve for you?
I love functional programming, and Spark seemed like a great way to not only work in a language I loved (Scala), but also teach folks important functional programming concepts and solve the very real problem of training my models and building my indexes.
What’s the biggest challenge Spark developers are facing today and what’s one thing that could address this challenge?
I think the big challenge we’re facing is integrating non-JVM code, and it’s getting a lot better with Apache Arrow’s integration. I’m looking forward to this getting even better.
Who should attend your talk at Scala Days and why?
I think primarily folks who are interested in continuing to keep up with the advances in Apache Spark but want to keep the awesome functional programming tools we are used to. Secondly, I think anyone who’s interested in what the pipeline looks like for new Scala developers. Spark has driven a lot of Scala adoption, and with the shifting APIs, the fundamentals folks are going to learn will be a bit different.
Whom would you like to connect with at the conference?
Since, just like all humans, I have an ego, I love hearing from people whom the software/books/videos I work on have helped. Those are the things I hold on to on the difficult days when nothing seems to work and I just want to go take a bath.
I’d love to talk to people who have parallel data problems that aren’t being served well in the Spark of today so we can figure out how to change Spark or develop other tools (like Beam or improve Arrow) to meet their needs.
I was also recently at a conference with a queer birds-of-a-feather session, and I really appreciated getting to know all those folks. I’d love to know more queer & trans folks in the Scala community, so please feel free to come say hi if that’s you.
If you could invite one person to Scala Days, who would that be and why?
Maybe this is silly, but I’d invite the person who introduced me to functional programming, Professor Prabhakar Ragde. Getting through university was difficult for me, and he did a really great job of encouraging me and making me feel supported with some challenges that arose in part from my learning disability. He also taught a class that led up to another class where one of the books I co-wrote was used and I think that’s incredibly neat.
Don’t miss Holden’s talk ‘Keeping the “fun” in Apache Spark: Datasets and FP’ at Scala Days in New York on June 20th. Book your spot now.