In the past few years we’ve seen an interesting trend: data scientists became the most sought-after of all data professionals, even though the market wasn’t sure what an ideal data scientist looked like. Many companies tried to hire “unicorns” who were good at computer programming, data cleansing, machine learning and statistics, and who also had domain knowledge of certain industries. The market now understands that it’s almost impossible to find one person who “has it all” (including a PhD in physics from a top US university). So the trend is to look for several people with complementary skills who can work together, in one team or across teams, to solve data-related challenges. For instance, data engineering was considered part of a unicorn data scientist’s skill set only a year or two ago. It is now becoming a field of its own, because companies have realized that without data pipelines robust enough to collect the most relevant data, their analytics projects could fail. In addition, data engineering requires a specialization in building and maintaining data pipelines and data lakes for different purposes using cloud technologies, and these skills are very different from those needed by the “behavioral scientists” who do quantitative analysis. The same dynamic could create growing demand for other data-related professionals as well.
I personally think that another field that is going to see a big boost in the next few years is metadata management, a role that could be called “data curator”. Nowadays, many organizations have data stored in many different locations and systems (aka “data silos”), and most employees don’t understand how the data was collected, where it originated, or what kind of information it contains. It’s not enough to dump everything into a data lake if there’s no data catalog or other source of information that explains all these things. A data curator could be the person who builds and maintains that data catalog, or in other words, curates the collection of data available to the entire organization. According to a definition I found by Tomer Shiran, CEO and co-founder of the analytics startup Dremio, a data curator sits between the data engineers and the data consumers (e.g. analysts and data scientists) and helps facilitate communication between them. I think an ideal data curator needs some technical knowledge (e.g. SQL and a working knowledge of databases) and the ability to work with metadata management tools. They also need good facilitation skills, since the role can involve working with many different stakeholders to define business terms and reach consensus.
The role can take many forms. It ranges from impact analysis, which helps developers understand which BI reports are affected by a change in the database, to data lineage reports that show business users where their data came from. At the far end, it includes building a business glossary together with business users to define specific elements of the data (e.g. what exactly is a “shipping date”?). Data curation might seem like a luxury to some organizations today, but given how many data sources we now deal with, it is likely to become a necessity in the future.
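To make impact analysis a little more concrete, here is a minimal Python sketch. It assumes lineage is recorded as a simple graph in which each asset (a source table, a warehouse table or a BI report) lists the assets built directly from it. The asset names, the `LINEAGE` dictionary and the "bi." prefix used to recognise reports are all hypothetical. Under those assumptions, impact analysis amounts to walking the graph downstream from the changed table.

```python
from collections import deque

# Hypothetical lineage graph: each key is a data asset, and each value lists
# the assets built directly from it (i.e. its downstream dependents).
LINEAGE = {
    "erp.orders": ["dwh.fact_orders"],
    "erp.customers": ["dwh.dim_customer"],
    "dwh.fact_orders": ["bi.sales_dashboard", "bi.shipping_report"],
    "dwh.dim_customer": ["bi.sales_dashboard"],
}


def impacted_assets(changed, lineage):
    """Return every asset downstream of `changed`, found by breadth-first search."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def impacted_reports(changed, lineage):
    """Keep only BI reports, assuming reports are named with a 'bi.' prefix."""
    return sorted(a for a in impacted_assets(changed, lineage) if a.startswith("bi."))


# A change to the source orders table affects both reports built on fact_orders.
print(impacted_reports("erp.orders", LINEAGE))
```

Reversing the edges and walking upstream from a report would give the data lineage view, showing where that report’s data came from. In practice this information usually lives in a metadata management tool rather than a hand-written dictionary.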