Plotted on a line chart, America's per capita consumption of mozzarella cheese apparently correlates with the number of civil engineering doctorates awarded annually. Ditto the divorce rate in Maine and per capita consumption of margarine.
Of course, common sense balks at both conclusions. Neither of them is true.
For data scientists at Johnson & Johnson like Manoj Pandey, these humorous examples teach valuable lessons about what he calls the "pitfalls of data science." Here's an inside look at some of those lessons, why they're such a key part of our approach to data science, and how they're helping us change the future of health.
Interpretability and Visibility in Data Science
New diagnostic tools for patients with Alzheimer's disease, digital tools to detect and prevent dengue—we're driving health impact through data science in a number of bold and surprising ways, as the inaugural Data Science Showcase at Johnson & Johnson made clear.
But, Manoj points out, as these and other data science applications become more complex, interpretability and visibility must become priorities in turn.
Consider the following example: In the future, doctors might consult advanced machine learning algorithms in order to recommend the best course of treatment for patients. In that case, patients will surely want to know why those recommendations were made. And doctors will want to be able to validate their decisions, too.
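One simple way to picture a model that can answer "why" is a linear score whose per-feature contributions can be read off directly. The Python sketch below is purely illustrative: the feature names, weights and patient values are made up for this example, and it is neither a clinical model nor a description of any system in use at Johnson & Johnson.

```python
# Hypothetical, made-up weights for illustration only; not a real clinical model.
WEIGHTS = {"age_decades": 0.4, "prior_events": 1.2, "on_treatment": -0.8}
BIAS = -2.0


def score_with_explanation(features):
    """Return the linear score and each feature's contribution to it."""
    contributions = {
        name: WEIGHTS[name] * value for name, value in features.items()
    }
    score = BIAS + sum(contributions.values())
    return score, contributions


# A made-up patient record using the hypothetical feature names above.
patient = {"age_decades": 6.0, "prior_events": 1.0, "on_treatment": 1.0}
score, why = score_with_explanation(patient)

# Rank features by how strongly they moved the score, largest effect first.
ranked = sorted(why.items(), key=lambda item: abs(item[1]), reverse=True)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

Because every contribution is just a weight times an input, a doctor could check each term by hand. More complex models generally need dedicated explanation techniques to offer a comparable view.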
"We can't let these models become 'black boxes,'" Manoj argued, "because then we risk losing visibility into them."
So while we're doing groundbreaking work in data science at Johnson & Johnson, we're doing it with these key priorities very much in mind.
"We want to be able to not only extract valuable insights from machines, but also share why that decision has been made," Manoj said.
Causation in Data Science
Another priority raised by data science has to do with causation—the reason why a given thing happens.
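The cheese-and-doctorates pattern is easy to reproduce. As a rough Python sketch, using made-up numbers rather than the real figures, the two series below each rise over time for unrelated reasons. Their levels correlate strongly, while their year-over-year changes typically show a much weaker relationship. The series names, trend slopes and noise levels are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def diffs(xs):
    """Year-over-year changes of a series."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = range(30)

# Made-up data: each series is an upward trend plus its own independent noise.
cheese = [9.0 + 0.2 * t + rng.gauss(0, 0.3) for t in years]
doctorates = [480 + 15 * t + rng.gauss(0, 20) for t in years]

r_levels = pearson(cheese, doctorates)
r_changes = pearson(diffs(cheese), diffs(doctorates))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Comparing changes instead of levels is only a quick sanity check for a shared trend. It is not a test for causation.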
"While identifying a correlation is easy and generally useful in predictive models," Manoj explained, "understanding causation is much more difficult. It requires us to experiment and have robust assumptions from the start."
As data science becomes a key component of investigative studies, for example, it requires us to take into account ethical considerations around data collection, study design, data analysis, the dissemination and application of findings and more.
"Data science holds tremendous power to generate insights in these studies," Manoj said. "But causality is key, particularly in terms of the conclusions these studies draw. And in the spirit of Our Credo, it's critical to have transparency in terms of where the data comes from, how and under what circumstances it was generated and so on."
Bringing Critical Controls to Data Science
The community of data scientists at Johnson & Johnson is keenly aware that reaching the right conclusions requires starting with the right approach.
As Manoj framed it, "One of the important questions is whether you start by identifying the problem you want to solve and then back-track to your data, or vice versa. In the latter case, that can make it challenging to design your approach in a way that fits the problem."
So we're taking steps to correct for that, for example by rolling out new types of controls, which Manoj refers to as "gates," for data science projects.
"Gates help ensure we have sufficient data to solve the problem, and that we're always taking the right approach from the outset," he explained.
Ready to Join Our Community of Data Scientists?
We're using data science to drive extraordinary real-world impact, improve people's lives around the world and change the trajectory of human health. Check out all of the opportunities we have available in data science right now, as well as all of the ways you can join the Johnson & Johnson Family of Companies. Plus, if you want to stay in touch and get updates about jobs that might interest you going forward, be sure to sign up for our Global Talent Hub.