The rise of the citizen data scientist
- 07 April, 2016 22:00
When Mark Pickett was a captain in the Marines, he knew he couldn't be there to make every decision for his troops.
"You can't rehearse every scenario, and there will be times when you can't communicate," he explained. "You want to groom your Marines to be able to rely on themselves and their unit."
It's not so different in the business world in this era of big data.
Now senior director for online analytics and business intelligence at Sears, Pickett has been an early champion of the so-called citizen data scientist movement, in which employees across an organization are given the analytics tools and skills to get the answers they need from their data.
"The business understands the business more deeply than we ever could," he said. "We're trying to coach these people up and provide them with the data they need to craft their own reporting and do their own analyses."
In Sears' case, the motivation is particularly strong. Though a retail business overall, the company is in many ways a conglomeration of numerous vertical businesses, each focusing on different product types.
"We have a very multicategory sort of business, from lawn and garden to appliances to clothing and jewelry to mattresses," Pickett said. "My team is built to support all of them, but we'll never understand their businesses the way they do."
By curating the right tools -- in Sears' case, Platfora's big-data analytics platform for Hadoop -- Pickett's group aims to enable businesspeople to answer 80 percent of their data questions themselves. More than 300 trained citizen data scientists at the company are now using those tools to generate thousands of data-analysis reports each week without any assistance.
"The only reason we'd touch one is if someone had questions, or needed data added," Pickett said.
A new generation of tools
Sears may have a particularly pressing need by virtue of the diverse nature of its business, but companies of all kinds are feeling the acute shortage of trained data scientists today. Even for those lucky enough to snag such a professional, "janitorial" tasks such as data preparation are still taking up an inordinate proportion of those workers' time.
Empowering businesspeople to do much of the analysis themselves frees up highly trained data scientists to focus on the things that require their expertise -- or so the thinking goes.
"Companies have more and more data," said Lukas Biewald, CEO and founder at data-focused crowdsourcing site CrowdFlower.
"You wouldn't have just one department creating spreadsheets that no one can modify," Biewald added. "Modern businesses need to think about these data tools in the same way they think about Excel."
Gartner predicts that the market for self-service data-preparation tools will reach $1 billion by 2019.
"Large enterprises are moving to data lakes, so all the data is in one place," said Jason Zintak, Platfora's president and CEO.
Next, companies need to help their employees make the most of it. Platfora bills its Hadoop-focused platform as a way to let anyone within a company run analyses across the entire organization's data, including transactions, customer interactions and machine data.
'They can build their own reports'
In many ways, the citizen data scientist represents an evolution of the traditional business analyst role.
"When I think about the traditional business analyst, they'd have a good understanding of the business but were not necessarily conversant with regard to the data," Sears' Pickett said.
In other words, such professionals have often focused on gleaning insights from Excel or other reporting tools without necessarily working knee-deep in the data.
In today's data-rich era, there's better literacy in terms of what data-focused questions to ask, Pickett added, reflecting the move from relational databases and spreadsheets to data lakes and more sophisticated analytical tools.
"What I'm observing is that people who have a strong understanding of the business now have some capability in terms of the data," he explained. "They can build their own reports, they know what attributes go together and they know what questions to ask not just from a business perspective but from a data perspective."
Not everyone is sold on the citizen data scientist concept, however.
'A recipe for disaster'
"I don't like the 'citizen data scientist' term," said Gregory Piatetsky-Shapiro, president of KDnuggets, an analytics and data-science consultancy.
For one thing, "the term implies that people without much training can do the work of a data scientist," Piatetsky-Shapiro said.
In other words, it's all too easy to discount the importance of education, even as big data is in many ways making it more important than ever. With statistics at its core, data science often relies on understanding the assumptions underlying various statistical techniques -- factors that aren't always apparent to those who haven't formally learned about them.
"Would you trust your teeth to a 'citizen dentist' or fly in a plane piloted by a 'citizen pilot'?" Piatetsky-Shapiro asked. "Having untrained citizen data scientists analyze the data may be easy, but if they will be making decisions without proper training in data analysis and without an understanding of the business, it is a recipe for disaster."
Platfora's Zintak says built-in corporate governance structures can address that issue, for example by controlling security and access levels. At Sears, two weeks of training for the company's 300+ citizen data scientists has helped as well.
'Data is viral -- everybody wants it'
Sears finalized its migration from a DB2 relational database management system to a Hadoop data lake in 2015. It had already adopted Platfora for a small group of specialists, but it wasn't long before the need for broader availability became clear.
"Data is viral -- everybody wants it," Pickett said. "It quickly became apparent that we had to solve for the volume of data requested by people by enabling them to become self-sufficient."
Focusing on the 300 or so people who handled many of the reporting needs for their teams, Sears' in-house experts conducted the training to bring those users up to speed. Topics covered included nomenclature and data-set manipulation.
Today, those employees request data, not reports, Pickett said: "That's when we knew this was starting to take shape."
Now freed up from the bulk of the company's ad hoc reporting needs, Pickett's team can focus on higher-level tasks such as data curation, model building and governance.
'Start small and just do it'
Overall, Pickett touts decentralized decision-making as one of the chief benefits of the citizen data scientist model.
"It's not just about reducing reliance on us," he said. "It's empowering people to become more capable with their own data, and that's enabling them to think about their business in new ways."
If Pickett had to do it all over again, he'd make the transition to the citizen data scientist model sooner, he said.
"There's no easy way to make this paradigm shift, so rather than try to plan everything out, start small and just do it," he advised. "Once people understand what's available to them, the adoption becomes viral. They'll have tons of questions, and that's awesome."