Anaconda’s 2021 State of Data Science Report Highlights Support for Open Source, Impacts from COVID-19
Data science platform provider Anaconda has released its 2021 State of Data Science report. From April to May 2021, the researchers surveyed 4,299 respondents from over 140 countries, diving into the changes, trends, and areas for growth in the data science industry. Responses highlighted Python’s continued popularity, bolstered the case for open-source software, and showed the damage done by the pandemic.
A difficult year – for most
Of course, much has changed since that same time last year, when the researchers were collecting data for the 2020 report. “Given that 2020 and 2021 were affected by the COVID-19 pandemic,” they wrote in the 2021 report, “we took this opportunity to ask questions around how the pandemic impacted work and how organizations invested in the field.” And, indeed, a plurality – 37% – reported that their organization’s investment in data science had decreased due to the pandemic (interestingly, 26% reported that it led to an increase in data science investment). Of those who said investment decreased, 39% reported team layoffs and 45% reported diminished budgets.
The state of data science
Python continued to dominate the languages used for data science, with a whopping 85% of respondents reporting that they used Python at least “sometimes” (34% “always”). SQL landed in second place, with 62% reporting they used it at least “sometimes” (15% “always”), while the least-used language listed, Go, was used at least “sometimes” by just 25% of respondents. In terms of system acquisition for data science, 60% reported that CPU/GPU performance was the most important consideration, followed by memory (46%) and the approval of the IT department (45%). Style and design (16%) and peripheral support (9%) brought up the rear.
Data scientists reported positive sentiments toward AutoML in the field, with 55% saying they hoped to see more automation and AutoML in data science and just 4% reporting concern about automation and AutoML. Open-source data science software is also experiencing success, with 87% affirming that their organization allowed the use of open-source software, 65% reporting that their employer encouraged the team to contribute to open-source projects, and 54% reporting that they had funding specific to open-source project development.
Data science in the workplace
Respondents reported spending a narrow plurality of their time (22%) on data preparation, followed by data cleansing (17%) and reporting and presentation (17%). Model training (12%) and deployment (11%), meanwhile, took up the smallest portions of time on average. In terms of roadblocks to deployment, 27% of respondents cited “meeting IT security standards” and 24% cited “re-coding models from Python/R to another language.”
Data scientists reported strong confidence in the applicability of their work: 53% said that “many” or “all” business decisions were based on the insights interpreted by their team, with just 12% unsure how their insights were being used. On the other hand, just 36% reported that their organization’s decision-makers were “very data literate,” which many of the respondents cited as an impediment to impacting business decisions.
Beyond 2021
Looking into the future, 31% of respondents cited social impacts from bias in data and models as the biggest problem to tackle in AI and machine learning, followed by impacts to individual privacy (21%), job loss from automation (19%), information warfare (15%), and lack of diversity and inclusion (10%). Forty percent of respondents indicated that their organization had an active plan for fairness and bias mitigation, with a similar percentage reporting that there was an active plan for ensuring model explainability and interpretability.
Of course, while data scientists face challenges from within, the profession also faces a wide range of misconceptions. Asked their views on the biggest myth about data science, 33% answered with the idea that data scientists would soon be replaced by AI; 31%, that having access to more data translated to greater accuracy; 19%, that data scientists don’t know how to code; and 15%, that lots of data was necessary for data science.
Anaconda distilled the insights from the report into four key lessons:
- Python will continue to dominate the data science field and beyond.
- Enterprises are ready to contribute to open-source innovation.
- Sentiment toward automation will continue to grow.
- Preventing bias and developing ethical data science is critical.
“Over the past year, we’ve seen the power of data science for good and the innovation that can happen when the open-source data science community is supported,” said Peter Wang, CEO and co-founder of Anaconda. “This year’s State of Data Science report indicates the field is continuing to show impact, as some organizations increase their investments, many leaders display baseline data literacy, and about half of the respondents see their work involved in many business decisions. There are still clear areas for growth, especially in implementing ethical frameworks and education for data science and machine learning work. I’m excited by the progress the industry continues to make, especially as new generations enter the field, and look forward to its ongoing transformation.”
The full 2021 State of Data Science report is available from Anaconda. Demographically, of the respondents – pulled from social media, email blasts, and Anaconda’s website – 72% were male; 50% were between the ages of 25 and 40; 68% held at least a bachelor’s degree; and 27% were students. Eighty-one percent reported working for a team, with 44% working for a team of six to ten people.