Meet Maxime Beauchemin, a 2023 Person to Watch
When it comes to prolific contributors to open source projects in the big data space, Maxime Beauchemin is definitely somebody you should know. As a data engineer at Airbnb, Beauchemin created multiple tools that he subsequently released to the world, including Apache Airflow, the popular data pipeline creation and management tool, and Apache Superset, which provides BI and analytics capabilities. He is also the founder and CEO of Preset, the commercial entity behind Superset.
We recently caught up with Beauchemin, who we named a Person to Watch for 2023.
Datanami: You’ve created two successful open source projects, Apache Superset and Apache Airflow. What do you attribute the success to? What made them successful?
Maxime Beauchemin: Most people are familiar with the idea of “product market fit” (PMF), a term coined by Marc Andreessen more than 15 years ago, and I like to think of a proxy for it in open source that I’d call “project community fit” (PCF). So it’s not just about the quality of the project, or how much you invest into it, it’s about building the right thing at the right time for the right people, and riding the momentum. I think reading about PMF and doing the mind exercise to translate the ideas to an open source project is fairly straightforward and informs finding PCF fairly well. The dynamics aren’t identical but they’re similar. If anything, open source has better network effects (because it’s free by definition, and welcomes contributions) and snowballs better than a product in a market.
In any case, the ideas behind PMF were foreign to me when I started both projects at Airbnb back in 2014/2016. I just wanted to build something that was going to be useful at Airbnb, and put it out there just in case someone outside of Airbnb might be interested in picking it up and collaborating, or even just using it. My thinking was “if I’m building something for Airbnb that’s not a competitive advantage, why limit my impact to Airbnb?” Looking back, I think what worked for me was to build with passion, and to engage as directly as possible with anyone showing any kind of interest, whether on GitHub, email, or Slack, or just looking for conversation. For a long time, I honored and handled every single touch point. I also went beyond just writing software and did a lot of things that I’d now call “product marketing”: finding good names for the project, doing some decent messaging/positioning, building half-decent websites with nice screenshots, maintaining decent docs, …
Both projects hit a point where I couldn’t keep up. From that point on, the projects have a life of their own. That’s OSS “escape velocity.” Feels great to reach this point!
Datanami: Do you think data engineering gets the respect it deserves? Why does it seem perpetually overlooked in the data space?
Beauchemin: The world isn’t always a fair place, but I think generally things (people, ideas, concepts, projects) tend to get the respect they deserve over time. In many ways, historically, data engineering (maybe thinking about the pre-pipeline-as-code era, call it the drag-and-drop ETL days) didn’t show a lot of self-respect either, especially when measured from the perspective of software engineering.
Arguably data engineering didn’t come into being until the mid-2010s, tried to catch up with and integrate software engineering practices, and while doing so missed out on the devops movement, only to try to catch up on some of that over the past five years or so through the lagging data ops movement. I think the gap in respect is reasonable when measured against software engineering practices, but is that fair? We don’t measure other functions by the standard of SWE practices.
In the end, respect should be based on business impact, not solely around code/PDLC rigor and maturity. On the impact front, there are some real problems too. I talk about it in an article titled “the downfall of the data engineer,” and some of these problems are preventing data engineering from delivering more impact and getting respect from the organization as a whole.
Datanami: Is it getting easier or harder to be a data engineer in 2023?
Beauchemin: Clearly easier. The role is better defined, the stack/tooling has evolved, best practices are increasingly well defined, and expectations around the role are clearer than ever before. Oh, and the modern data stack is amazing: you can get started in minutes, get a world-class, scale-to-infinity cloud data warehouse set up in minutes, set up Apache Superset instantly on top of it using Preset, do data integration with Airbyte or Fivetran without a hitch, set up Airflow through Astronomer, or use dbt Cloud. All this infrastructure is at your fingertips, pay-as-you-go and frankly amazing! The pool of articles and resources around best practices is only increasing too, communities exist now, … So much easier than it used to be.
Datanami: Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
Beauchemin: I’m a huge snowboarder. Grew up riding 50 days a year in the Quebec City scene in the 90s, and recently moved to Tahoe to be able to get back into riding regularly. Before the move, going to ride from the Bay Area while having three young kids was very difficult, so I didn’t ride much for the past decade. But now I’m back on the mountain! Oh, and the kids are getting good now, so we often ride together!
You can read the rest of the interviews with the 2023 class of Big Data Wire’s People to Watch here.