There’s one sure thing you can say about data science: it’s a lot of things. Data science is not one single discipline, skillset or methodology. This is why it is often described as an ‘interdisciplinary branch’ of science, one that combines mathematics, human behavioral analysis and workflow studies, flexible use of logic systems and a core employment of algorithms.
This makes being a data scientist pretty hard work, as if algorithmic logic wasn’t already pretty tough.
More than just data analytics, more than just big data insight, more than just the ability to handle new streams of raw unstructured data and more than just knowing how to drive a database while blindfolded, data scientists have to understand business and be flexible super-performers. So what core attributes make a good data scientist?
Simon Asplen-Taylor is interim chief data officer (CDO) and founder at data analytics advisory company Datatick. He previously served at casino and online gaming company Rank Group, where he and his team made use of WhereScape’s data warehouse automation and big data software for data science-centric work.
1 – Ain’t no homogeneity
Asplen-Taylor suggests that the first thing we need to realize is that the data scientist’s role is never homogeneous. Different skills are required for different tasks in different roles in different ‘digital workflows’ in different industry verticals in different world markets.
2 – Data is a business thing
He advises that organizations that want to embrace data science competently need a data strategy that is aligned to the business goals – and, crucially, it needs to be written by a ‘business savvy’ chief data officer (CDO) who can align all the capabilities of data to the business: increasing revenues, reducing costs, reducing risk, and increasing customer and employee satisfaction.
3 – Data scientists are experimental
“The work of data scientists is, by definition, experimental. They need to be allowed to experiment and the outcomes may or may not be successful, but do enough experiments in the right areas… and you will find the value,” said Asplen-Taylor. “Considering problem solving experimentation further, data scientists need to follow, not to lead, i.e. they need to be given a problem to fix, which means they need business analysts to define the problem… and, after their experimentation phase, they need someone to test the outcome of their projects, validate the results (so they are not marking their own homework) and they need IT people who will put their models into a production environment… and to then document them (which is key from a data privacy perspective – ensuring that what they are doing is transparent) and support the models.”
4 – This is cowboy (person) country
Data scientists call the corralling process of bringing different data sets together ‘data wrangling’ in homage to the cattle corralling process that cowboys (now, in 2020, cowpersons, obviously) do out on the range. Asplen-Taylor explains that the reason he and his coworkers saddle up in this way is that if data sets are not engineered properly and ‘productionized’ so that they can be run every day, then they will fail.
“The data sets need to be built, automated and deployed to an environment where the data scientists can access them. The vast majority of companies’ data sources that are valuable for generating value are within their existing structured systems – so data scientists should first focus their attention on using this data. As the function matures then they can go after different more elusive data sets … but it’s not the starting point,” he said.
5 – The governator factor
It’s no surprise to see data governance and data quality also called out in this list of top-7 traits. This discipline sits as something of an adjunct to the data scientist’s role, i.e. an organization with a fully-fledged IT department should have a separately defined data quality team, but the data scientists should know who they are and how competently they will be able to act.
6 – Clear and present process
“There needs to be a clear process for data science so that people in the business know how the projects work. A good industry wide process exists – it’s called the CRISP-DM life cycle (Chapman, Clinton, Kerber, et al., 1999),” explained Asplen-Taylor. “It was first set up for data mining, one aspect of data science, but can be applied to all. In this way everyone knows the stages of the lifecycle and timescales and resources can be applied. Today people think it’s just magic, it isn’t.”
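For readers who have not met CRISP-DM, its six phases can be written down as a small state machine. The Python below is only an illustrative sketch: the phase names follow the published CRISP-DM model, but the `Project` class, the `advance` method and the `OWNERS` mapping are hypothetical names invented here, and the owners are an assumption loosely based on the roles Asplen-Taylor mentions, not something he specified.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Hypothetical phase owners (an assumption for illustration only).
OWNERS = {
    Phase.BUSINESS_UNDERSTANDING: "business analysts",
    Phase.DATA_UNDERSTANDING: "data scientists",
    Phase.DATA_PREPARATION: "data engineers",
    Phase.MODELING: "data scientists",
    Phase.EVALUATION: "independent testers",
    Phase.DEPLOYMENT: "IT operations",
}


@dataclass
class Project:
    """Tracks which CRISP-DM phase a data science project is in."""
    name: str
    phase: Phase = Phase.BUSINESS_UNDERSTANDING
    history: list = field(default_factory=list)

    def advance(self, evaluation_passed: bool = True) -> Phase:
        """Move to the next phase.

        A failed evaluation sends the project back to business
        understanding; a deployed project stays in deployment.
        """
        self.history.append(self.phase)
        if self.phase is Phase.EVALUATION and not evaluation_passed:
            self.phase = Phase.BUSINESS_UNDERSTANDING
        elif self.phase is not Phase.DEPLOYMENT:
            self.phase = Phase(self.phase + 1)
        return self.phase


if __name__ == "__main__":
    project = Project("customer churn")
    while project.phase is not Phase.DEPLOYMENT:
        print(f"{project.phase.name}: owned by {OWNERS[project.phase]}")
        project.advance()
    print(f"{project.phase.name}: owned by {OWNERS[project.phase]}")
```

The loop back from evaluation to business understanding reflects the experimental nature of the work: a model that fails validation is not deployed, and the team returns to the problem definition instead.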
7 – Company-wide mentality
As a final factor in this list, Asplen-Taylor says that data scientists need to work with a data architecture that is company-wide. If data scientists define their own architecture and it’s not wholly integrated across the business then they will duplicate much of what has been done already. That’s why the software engineering team (i.e. the programmer/developers) needs to build fast and automate, working closely with the data science team.
“If all of the above does not happen then the data science people will revert to what is easiest, i.e. they will compete with existing Business Intelligence (BI) teams, build their own reports and dashboards and do very little actual science. Companies already know how to do BI and reporting well, and it’s not something data scientists should get into.”
It’s early days, still
As a parting comment, Asplen-Taylor issues a small plea. He says that, for the most part, data scientists are inexperienced (compared to people in many other long-established IT roles) and have probably not been in the job that long, hence the view that they need to be managed carefully by the CIO, CTO or other C-suite ‘head suit’.
Your organization’s IT department could now be developing this role, so just remember… it’s not rocket science, it’s data science rocketing.