Data Science Ed: 5 Tips for Undergrads
Demand for data scientists continues to rise as organizations seek new ways to monetize their data. Those who want to be considered for top data science jobs need a solid education behind them, including postgraduate degrees in many cases. In the meantime, what sort of classes should undergraduates take to prepare them for data science careers? We talk to a pair of Silicon Valley analytic experts to find out.
Data scientist is currently the number one job in the country, according to Glassdoor’s latest rankings. With a base salary of $110,000 and nearly 4,200 openings, it’s clear that demand for the job is currently outpacing supply. In fact, four of the five top jobs in the country are related to data science, according to Glassdoor, which also found high demand for DevOps engineers, data engineers, and analytics managers. (Tax manager rounds out the top five.)
Universities around the country are responding to the ongoing surge in demand by creating formalized degree programs for data science, including master’s and PhD-level programs in many schools. It’s probably just a matter of time before colleges and universities begin creating entire departments devoted to data science.
For many students, getting the right educational foundation for a career in data science will require a bit of foresight and planning. The good news is that most colleges offer the undergraduate courses that are key to preparing for data science jobs.
Romer Rosales, the director of AI at LinkedIn, recently shared several tips on how undergrads can prepare themselves for data science careers.
1. Math and Comp Sci
Having a foundation in mathematics and computer science is the best way to prepare oneself for a data science career, Rosales says. “Math and computer science are critical,” he tells Datanami. “If you want to become an expert, those two areas are quite important.”
Without a mastery of math and distributed systems, a prospective data scientist will not have the basic skills required to truly understand the inner workings of machine learning. “A lot of the development and analysis of machine learning algorithms rely on things as core as linear algebra, for example, or basic calculus,” he says.
Knowing how distributed systems work will help data scientists build predictive applications that can crunch massive amounts of data.
“A lot of machine learning has to do with processing large amounts of data,” Rosales says. “The ability to solve larger and larger problems efficiently depends a lot on the ability to distribute the computations efficiently across different machines. A distributed systems background will help you enormously.”
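To make the idea of distributing a computation concrete, here is a minimal Python sketch (an illustration only, not something Rosales described). It assumes that threads on a single computer can stand in for separate machines, and the function names `split`, `partial_stats`, and `distributed_mean` are hypothetical. Each worker computes a sum and a count on its own slice of the data, and a final step combines those partial results into a mean.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Work done by one worker: a local sum and a local count."""
    return sum(chunk), len(chunk)


def split(data, n_parts):
    """Cut data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def distributed_mean(data, n_workers=4):
    """Mean of a non-empty list, computed from per-chunk partial results."""
    # Drop empty chunks, which appear when there are more workers than items.
    chunks = [c for c in split(data, n_workers) if c]
    # Threads stand in for machines here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    # Combine step: only the small (sum, count) pairs need to be gathered.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


values = list(range(1, 1001))
print(distributed_mean(values))
```

The point of the design is that each worker sends back two numbers rather than its whole slice, which is what lets this pattern scale as the data grows.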
2. Stats and Information Theory
Statistics and probability are also important to data science success, Rosales says. “So much of machine learning has to do with probability,” he says. “The language in which many of the problems are accommodated are given in terms of probability.”
Another valuable course that undergraduates can take to bolster their data science careers is information theory, Rosales says. This class, which is usually offered through a university’s computer science department, can set the stage for a better understanding of how machine learning works, including its limitations, he says.
Being able to measure the uncertainty level of a variable, and calculate how that can impact other variables, is critical for developing an expertise in machine learning, he says. For example, it allows students to ask questions, like “Can I classify this particular entity into black or white, or A or B, given the information that I have about this entity? Can I tell whether this thing that I’m looking at is a person or is not a person?” Rosales says.
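One common way to put numbers on that kind of question is Shannon entropy, which measures the uncertainty in a variable, and information gain, which measures how much knowing a second variable reduces it. The Python sketch below is only an illustration under assumed toy data: the labels, the two features, and the function names are all hypothetical, not something from Rosales.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, feature):
    """Average entropy of the labels left over once the feature is known."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(labels, feature):
    """How many bits of uncertainty about the labels the feature removes."""
    return entropy(labels) - conditional_entropy(labels, feature)


# Toy data (hypothetical): is the thing being looked at a person or not?
labels = ["person", "person", "not", "not"]
legs = ["two", "two", "four", "four"]    # separates the classes perfectly
color = ["dark", "light", "dark", "light"]  # says nothing about the class

print(entropy(labels))
print(information_gain(labels, legs))
print(information_gain(labels, color))
```

In this toy data, a feature that removes all of the uncertainty makes the classification possible, while a feature with zero information gain cannot help any classifier, however sophisticated.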
In addition to helping students think about machine learning at a higher level of abstraction, information theory also helps data scientists determine what cannot be done with machine learning, Rosales says.
“Many of the limits of what is possible in machine learning can be tied to some form of information theory,” Rosales says. “I wouldn’t say that it’s a requirement for applying machine learning. But if you really want to understand machine learning more deeply, I think information theory is… extremely useful.”
3. Optimization
Having a firm understanding of optimization is also very important for prospective data scientists because it helps them understand what they can do with machine learning algorithms, Rosales says.
“So many machine learning tasks are related to optimization,” he says. “Optimization is also helpful to understand how certain problems are easier to solve than others from a computational point of view. There is a large theory in the optimization field that tells you very formally that this problem will require an exponential amount of time to solve optimally versus this other problem that can be solved quadratically or in polynomial time.”
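As a small illustration of a machine learning task framed as optimization, the Python sketch below fits a straight line by gradient descent on mean squared error. It is a sketch under stated assumptions rather than a recommended implementation: the data, the learning rate, the step count, and the name `fit_line` are all hypothetical choices. Mean squared error for a line is a convex objective, which puts this problem on the easier side of the divide Rosales describes.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill; lr is an assumed value small enough for this data.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical) generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

The gradients come from basic calculus and the objective is a sum over data points, which is one reason the math and distributed systems foundations mentioned earlier carry over directly to optimization.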
You can’t go to a big data conference these days without running into a class on optimization, which just shows how important it has become for large-scale, real-world machine learning applications. “A grounding in optimization will help you not only to formulate problems but also to help you understand what are the limits in terms of scalability, of how your problem can be solved,” he says.
4. Soft Skills Matter
While it’s critical to have the “hard skills” like math and computer science, one shouldn’t overlook the roles that “soft skills” can play in data science success, says Ashish Thusoo, CEO and co-founder of Qubole.
“To be successful, data scientists need a mix of soft social skills and hard technical skills,” Thusoo tells Datanami. “For data science students, it’s not only crucial to understand the technology, but it’s equally as valuable to learn how to function in teams, collaborate, and teach.”
Having soft skills like curiosity, creativity, problem solving, communication, and collaboration in their quiver of personal tools will help students with some of the intangible aspects of being a successful data scientist. “Great data scientists will iterate quickly, looking at problems from a variety of angles to find the best approach to creating insights and answering questions,” Thusoo says.
5. Lifetime of Learning
Thusoo says data science students should practice ABLE, which stands for Always Be Learning and Educating. “Data science is still a nascent field so practitioners should keep their knowledge and skill sets fresh, and help educate those around them,” he says.
Even if you’re not planning on becoming a data scientist who builds machine learning algorithms from scratch for a living, having a solid foundation in analytics will be very helpful for your career, Thusoo says.
“The future of data science education is rapidly evolving,” he says. “Long gone are the days of data science being a specialized track for only a small portion of students. In fact, everyone entering the workforce should have some type of data analytics skills. Universities need to make data analytics skills a mandatory graduating requirement, just like programming has become part of the core curriculum.”
Related Items:
Continuing Your Data Science Education
Taking the Data Scientist Out of Data Science
Finding the Right Path for Your Data Science Education