Why Truly Open Communities are Vital to Open Source Technology
Every company in the world that uses software uses open source software.
This happens by default, because open source is packaged into so many products that it arrives without anyone making a decision. The forms filed by publicly traded companies are processed by open source technology, and you can’t use Windows today without using open source.
In areas where there’s latitude to choose between open and proprietary technologies, many tech executives and engineers will choose open source, usually for its supposedly lower cost or to avoid vendor lock-in.
But these are the wrong reasons. Whether the code is open or not isn’t the point. What’s important is the people behind the code, and whether you have the opportunity to participate in the ongoing development of the technology.
Open Source Has Come a Long Way
On a personal level, I’ve been involved in open source for as long as I can remember, most notably since 2007 as a member, committer, and board member for The Apache Software Foundation. But my career isn’t all in open source. I also spent a decade in academia (as a principal investigator at New Mexico State University), I’ve been chief scientist at a number of startups with successful exits, and I’m now a senior technology fellow and CTO at a major corporation, Hewlett Packard Enterprise. I’ve seen open source from both sides.
Trust me when I say that those of us involved in the dawn of the open source era never envisioned that it would become ubiquitous. Back in the 1970s, open source was people standing in the parking lot after the monthly or weekly meetings of the 6502 Interest Group, trading floppies full of code.
The arc of open source, from people just hanging out and trading stuff with one another to it becoming the underpinnings of pretty much every business in the world, is rather stunning. We didn’t plan for that to happen. We were just doing what we thought was cool.
I think that heritage is pertinent today, because open source is still about people and community and sharing and participation. But now everything is on a much bigger scale. And of course, floppies are no longer involved.
Not All Open Source is Created Equal
It’s actually not easy to get open source right, which is highlighted by the fact that there are so few examples of super successful open source projects. Two success stories that come to mind are The Linux Foundation, and Linux in particular, and The Apache Software Foundation and its 300 projects. Python, Go, Julia, and R are also worth mentioning for their innovations in gluing in contributors very directly, meaning that the languages themselves and their libraries directly reference the contributors.
Another example that looks promising is the Presto Foundation, which has a vibrant community and some really significant contributions happening. For reasons outlined below, however, it’s too soon to put Presto Foundation squarely in the success category.
What these thriving open source examples have in common is a commitment to openness at the people level, meaning governance by the community and an environment where everyone can benefit from—and help direct—the technology.
On the flip side are a bunch of open source failures. For instance:
- The Couch community split into Couchbase and CouchDB. One was an Apache project, the other a private project. As a result of the split, neither succeeded at the level it plausibly could have.
- Apache Tinker was written to a large degree by one guy who brought the project to Apache but couldn’t let it go. He wanted to dominate and own the project, which is completely antithetical to Apache community building. As a result Apache Tinker has largely become irrelevant, not much more than an esoteric footnote.
- Cloudera and Hortonworks tried to dominate the Hadoop project. They fought over it and wanted to own it, which meant that Hadoop was confined to the vision and limitations of those two companies and didn’t catch the next wave of technology. As a result Hadoop is becoming irrelevant very quickly, to the point that companies today try hard to avoid being classified as a Hadoop company. It’s a poisoned term now, and I attribute that largely to the fact that the two companies tried to own it and had public arguments about who was contributing more lines of code and so on.
Disputes about ownership of an open source project, and by extension who runs the community, can kill the community along with the project.
How can you tell the difference between a true open source project and one that uses open source technology but isn’t truly open? Get answers to the following questions:
- Is one single entity in charge of the project?
- Does that entity act as ultimate gatekeeper, applying only its roadmap to the development of the technology?
- Is everyone welcome to join the community, or is membership restricted?
- Can anyone contribute code, or do contributions have a transactional basis?
- Is anyone trying to own the community as opposed to building the community?
Let me be clear: I’m not making the argument that open = good and proprietary = bad. Rather, the important thing is to make sure that any open source technology you choose is run by a genuinely open community. Because even if you never contribute code to the project, knowing you can if you want to is important. Just as living in a country where you can vote in free and fair elections is preferable to the alternative—even if you don’t exercise your right to vote—aligning with open source communities where everyone has the ability to influence the direction of the technology is an inherently good thing.
An Interesting Case Still to be Determined
Unfolding right now is a real-world drama about an open source community dealing with a fork in its recent history. I’m talking about Presto and Trino. The former is PrestoDB from the Presto Foundation, operated by the Linux Foundation. The latter is a fork of the Presto open source technology operated by Starburst, a commercial company.
Full disclosure: I’m on the governing board of the Presto Foundation, so I’m not an impartial observer here. But I have no more inside information on how this will unfold than anyone else in the world.
To give a brief synopsis of the situation, the PrestoDB SQL query engine was originally developed by Facebook in 2012. In 2018, the four main developers left Facebook and forked PrestoDB into PrestoSQL, now known as Trino. A year later, Facebook donated the original PrestoDB project to the Linux Foundation and established the Presto Foundation. That same year, the Trino creators joined Starburst. Confusing things further, the founding members of the Presto Foundation include Facebook, Uber, Twitter, and Alibaba.
Trino claims the same open source roots as Presto, but its governance is through Starburst, a private company. The Presto Foundation, in contrast, is a good example of community governance and participation.
Will the Trino fork cause the Presto technology to fizzle out, as happened with the Couchbase/CouchDB split? Will Trino’s strong governance of its fork of the technology lead to irrelevance in the future? And if so, will it take down the Presto open community with it?
Only time will tell. Whichever way the Presto/Trino situation ends up, it will probably serve as a lesson for future technologists—either about how it’s possible to survive a fork of open source technology into an open community and commercial ownership, or about the dangers of stifling a community.
Stay tuned, and let’s talk again in a few years.
About the author: Ted Dunning is a fellow in the Office of the CTO at Hewlett Packard Enterprise. He is also a member, committer, and PMC member at the Apache Software Foundation, where he worked on the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted previously was the CTO at MapR Technologies, and has a PhD in computing science from the University of Sheffield.