Data science is a field that requires subject matter expertise (e.g., biology if you plan to do bioinformatics), programming skills, and training in mathematics and statistics. Data science as a service allows companies to get business insights leveraging advanced analytics technologies, including deep learning, without investing in in-house data science competencies. Data scientists help a company process a huge pool of information from a variety of sources. An expert data science team can help you quickly embrace data science for meeting particular advanced analytics objectives.
Glassdoor ranks data science as the #2 job in America for 2021. There are many professions in data science with similar names: for example, ML developers and ML engineers. In this article, we will talk about different roles in data science, how professions in this field differ and what is expected from candidates for different vacancies.
The main task of a data scientist is to improve the quality of machine learning models. In general, his or her work can be divided into two blocks. The first one is to work with the finished model in the project. It is necessary to continuously assess its quality and find what can be improved. Online and offline metrics, as well as feedback from testers, help in this. The second is the research part itself: finding new architectures and signals for prediction.
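As a rough illustration of the offline assessment described above, the following minimal sketch compares two model versions on a held-out set using precision and recall. The labels, the predictions, and the names `current_model` and `candidate_model` are made up for illustration; real projects usually track more metrics than these two.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels given as 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical held-out labels and predictions from two model versions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
current_model = [1, 0, 0, 1, 0, 1, 1, 0]
candidate_model = [1, 0, 1, 1, 0, 1, 1, 0]

print("current:  ", precision_recall(y_true, current_model))
print("candidate:", precision_recall(y_true, candidate_model))
```

If the candidate scores higher on the metrics that matter for the product, it becomes a reasonable contender for online testing.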
Here is what a data scientist needs to know:
Python to develop models.
C++ to put code into production.
Deep learning frameworks (TensorFlow, PyTorch, Caffe, or others).
Data structures and algorithms.
Data scientists spend much of their time collecting, cleaning, and analyzing data for useful insights. Once the data is ready, they spend the rest of their time training new models, which includes preparing data on a cluster and writing infrastructure for effective training. Rolling out the model is also part of the job: you have to implement it, check that it behaves as expected on real data, and then optimize its performance. Interestingly, the job of a data scientist also draws on skills from the roles of ML engineer, data engineer, and data analyst.
The responsibilities of an ML engineer are very similar to those of a data scientist. Unlike a data scientist, however, an ML engineer does not need to publish in scientific journals or regularly develop new technologies. What matters much more than it does for a data scientist is the ability to write efficient, readable code that colleagues can then make sense of.
Here is what an ML engineer needs to know:
Python and C++ to develop models and train algorithms.
Probability theory, statistics, and discrete mathematics.
Deep Learning frameworks (TensorFlow, PyTorch, Caffe, or others).
It is also useful for ML engineers to be familiar with collaborative development tools. They should be able not only to train high-quality models but also to build services on top of them that can withstand a high load. This may require mastery of both lower-level programming languages and techniques for optimizing machine learning models.
Data engineers are in charge of preparing data for subsequent analysis. Their job is to first gather data from social networks, websites, blogs, and other external and internal sources, and then bring it into a structured form that can be sent to the data analyst.
Imagine you need to make an apple pie. First, you need to find the flour, apples, eggs, milk, and other ingredients from the recipe. That’s what the data engineer does: looking for and bringing in the right data. The data analyst then bakes the pie, or rather looks for patterns in the data that was found.
Here is what a data engineer needs to know:
How to design storage, set up data collection, and build data pipelines.
How to build ETL processes.
C++, Python, or Java.
SQL for working with databases.
In addition, engineers create and maintain the storage infrastructure. They are also responsible for the ETL system – extracting, transforming, and loading data into one repository. It is safe to say that they are responsible for buying and storing the ingredients for the pie. Thus, the data analyst can pick them up any time to make a dish and be sure that everything is in place and nothing has gone bad.
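The extract, transform, and load steps can be sketched in a few lines with Python's standard library. This is only a toy example under stated assumptions: the `RAW_CSV` export, the `users` table, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the real repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """user_id,age,city
1, 34 ,Berlin
2,,Paris
3,41,berlin
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing age, normalize types and casing."""
    clean = []
    for row in rows:
        age = row["age"].strip()
        if not age:
            continue  # skip incomplete records
        city = row["city"].strip().title()
        clean.append((int(row["user_id"]), int(age), city))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(user_id INTEGER PRIMARY KEY, age INTEGER, city TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real repository
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT user_id, age, city FROM users ORDER BY user_id"
).fetchall()
print(stored)
```

Keeping the three steps as separate functions makes it easier to test each one and to swap the source or the destination later.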
Data analysts help a company improve its metrics and reach intermediate goals rather than blindly move toward big ones (such as doubling revenue in a year). More often than not, they work closely with salespeople.
The task of the data analyst is to process a large amount of data and find patterns in it. For example, they may find out that toothbrushes are most often bought by married men between 30 and 40 years old. Data analysts help companies better understand their customers, which in turn brings in more sales.
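A pattern like the toothbrush example can be found by counting purchases per customer segment. The sketch below uses made-up records and hypothetical helper names (`age_band`, `top_segment`). It groups ages into ten-year bands, which is only one of several reasonable choices.

```python
from collections import Counter

# Hypothetical purchase records: (product, gender, marital_status, age).
purchases = [
    ("toothbrush", "male", "married", 34),
    ("toothbrush", "male", "married", 38),
    ("toothbrush", "female", "single", 25),
    ("toothbrush", "male", "married", 31),
    ("shampoo", "female", "married", 45),
    ("toothbrush", "male", "single", 52),
]


def age_band(age, width=10):
    """Map an age to a band label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def top_segment(records, product):
    """Return the most frequent (gender, status, age band) segment and its count."""
    segments = Counter(
        (gender, status, age_band(age))
        for name, gender, status, age in records
        if name == product
    )
    return segments.most_common(1)[0]


segment, count = top_segment(purchases, "toothbrush")
print(segment, count)
```

In real work the same grouping would typically be done in SQL or with a dataframe library over far more rows, but the idea is the same.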
Here is what a data analyst needs to know:
Python to process data.
Mathematical statistics to choose the right methods to process the data.
SQL and its dialects, such as the one used by ClickHouse.
DataLens, Tableau, PowerBI, and other dashboard tools.
Big data tools such as Hadoop, Hive, or Spark.
In their work, data analysts rely on mathematical statistics, which allows them to find patterns and helps them predict user behavior. Data analysts also run tests, check how users react to a new interface, and help optimize business processes.
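One common way to check how users react to a new interface is to compare conversion rates between an old and a new version. The sketch below uses a two-proportion z-test, which is one possible method and not necessarily the one a given team uses. The visitor and conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: old interface (A) versus new interface (B).
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the two versions is unlikely to be due to chance alone. The significance threshold should be agreed on before the test starts.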
There are many directions and tasks in data science for those who like exact sciences. You can do science-intensive tasks as a data scientist, implement new technologies as an ML engineer, look for useful patterns for business as a data analyst, or collect and structure data if you choose to work as a data engineer. In addition, your choices can be based not only on your expertise but also on the problems you want to solve: perhaps you dream of moving science forward and creating technology that others will use.