Recently a group of professionals asked me, “How would you define or explain data science to someone non-technical?” I think many people still have this question, and it is becoming more pressing with each passing day. Our world is increasingly technology-driven and we are collecting more and more data, but data on its own does little to help organizations. This is where the opportunity for Data Science lives.
The Brief Answer
Succinctly, Data Science is the science of learning from data; it studies the methods involved in the analysis and processing of data and proposes technology to improve methods in an evidence-based manner.
The Expanded Answer
In expanded form, Data Science has become a fourth approach to scientific discovery, in addition to experimentation, modeling, and computation.
This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and interdisciplinary applications.
The term “Data Scientist” refers to a professional who uses scientific methods to liberate and create meaning from raw data.
“Big Data” should be dismissed as a criterion for meaningfully distinguishing statistics from data science. In other words, equating data science with “big data” does not capture anything intrinsic to either Data Science or Statistics.
Data Science addresses essential questions of lasting significance and uses scientifically rigorous techniques to attack them. Data Science is the science of learning from data.
As early as 1962, John Tukey foresaw that something like today’s Data Science was coming. In “The Future of Data Analysis,” published in The Annals of Mathematical Statistics, Tukey made a startling statement:
For a long time, I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. … All in all I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.
The activities of a Data Scientist:
- Data Exploration and Preparation
  - Exploration: Data Scientists devote serious time and effort to exploring data, to sanity-check its most basic properties and to expose unexpected features
  - Preparation: Data Scientists clean the data, identifying and addressing the anomalies and artifacts that many datasets contain
- Data Representation and Transformation
  - Databases: Data Scientists need to know the structures, transformations, and algorithms involved in working with various data types
  - Mathematical representation: Data Scientists develop the ability to know how and when to use mathematical structures to represent special types of data
- Computing with Data
  - Data Scientists should know several programming languages, as many complete solutions require five or six languages working in concert
- Data Modeling
  - Generative modeling: Data Scientists infer how the data was created, proposing a model of the process that could have generated it
  - Predictive modeling: Data Scientists construct methods that predict well over a given dataset
- Data Visualization and Presentation
  - Data Scientists use conventional charts and plots
  - Data Scientists create dashboards for monitoring data processing pipelines that access streaming or widely distributed data
  - Data Scientists develop visualizations to present conclusions to a non-technical audience
- Science about Data Science
  - Data Scientists do science about data science when they identify commonly occurring analysis and processing workflows and study them with a continually evolving, evidence-based approach
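Of these activities, exploration and preparation are the easiest to picture concretely. The following is a minimal sketch in Python using only the standard library; the records, the field names, and the 120-year age cutoff are all hypothetical, chosen purely for illustration. It first sanity-checks each field's basic properties (missing values, range, median), which exposes an implausible age, and then applies one simple cleaning rule.

```python
import statistics

# Hypothetical raw records: None marks a missing value, and one age is implausible.
raw_records = [
    {"customer_id": 1, "age": 34, "monthly_spend": 120.0},
    {"customer_id": 2, "age": None, "monthly_spend": 85.5},
    {"customer_id": 3, "age": 29, "monthly_spend": None},
    {"customer_id": 4, "age": 212, "monthly_spend": 40.0},
    {"customer_id": 5, "age": 51, "monthly_spend": 310.25},
]


def explore(records, field):
    """Sanity-check one field: how many values are missing, and what range do the rest span?"""
    values = [r[field] for r in records if r[field] is not None]
    return {
        "count": len(records),
        "missing": len(records) - len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


def clean(records, max_age=120):
    """Keep only complete records with a plausible age (max_age is an assumed cutoff)."""
    return [
        r
        for r in records
        if r["age"] is not None
        and r["monthly_spend"] is not None
        and 0 <= r["age"] <= max_age
    ]


if __name__ == "__main__":
    # Exploration: the age summary shows a maximum of 212, an unexpected feature worth investigating.
    for field in ("age", "monthly_spend"):
        print(field, explore(raw_records, field))
    # Preparation: drop incomplete or implausible records.
    cleaned = clean(raw_records)
    print("kept", [r["customer_id"] for r in cleaned])
```

Dropping incomplete rows is only one possible cleaning rule; depending on the data, filling in missing values or correcting them at the source may be more appropriate.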
As a Data Scientist, understanding what lies behind the question “What is Data Science?” is the best way to build support for your role, and for you. The more you know, the better able you will be to help your organization build its Data Science capabilities and confidence. To put a William Arthur Ward quote in a Data Science context: the mediocre Data Scientist tells. The good Data Scientist explains. The superior Data Scientist demonstrates. The great Data Scientist inspires. Help your organization imagine the possibilities with Data Science.