28.04.2020 by Milena Riedl

From Big Data to Smart Data

In the first article of our Big Data series, we introduced the term Big Data and pointed out the benefits that data processing can bring to production and especially to thermal analysis. In this article, we look at the term Data Science in more detail and present some of its common methods.

By Michaela Lang & Jigyasa Sakhuja, Data Scientists at NETZSCH Analyzing & Testing

Definition of Data Science

As the name suggests, Data Science is the science of extracting valuable information from data. The goal is to use this information to improve the quality and efficiency of a specific process, or even to gain new insights from it. With the help of Data Science, it is possible to uncover correlations that are not easily recognized otherwise. The domain of Data Science comprises several different areas of expertise. Besides mathematics / statistics and computer science, specialist domain knowledge plays a very important role. Especially in thermal analysis, it is necessary to understand and interpret the chemical and physical processes correctly, both to avoid drawing wrong conclusions from measured data sets and to choose the right methods for analysis.

At NETZSCH Analyzing & Testing, all of these areas of expertise are available, which enables us to apply Data Science methods in the field of thermal analysis.

In the next section, we present some of the data analysis methods used in Data Science.

Data Analytic Techniques

With a large amount of high-quality data, a data scientist can start the main task: turning the data set into valuable information. After data preprocessing, the data analysis can begin. The following sections describe how to approach this challenge.

Data Exploration

The goal of Data Exploration is to gain a basic understanding of the data. The structure of the data is identified and the distribution of the values is examined. Data Exploration reveals first correlations within the data and helps to decide which method is best suited for the analysis.
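As a minimal sketch of what such a first look at the data could involve, the following Python example computes basic descriptive statistics and a Pearson correlation coefficient. The variable names (`heating_rate`, `peak_temperature`) and all values are made up for illustration and are not real measurements; the helper functions `summarize` and `pearson` are hypothetical names chosen for this example.

```python
from statistics import mean, median, stdev

# Made-up illustrative values, not real measurements.
heating_rate = [1.0, 2.0, 5.0, 10.0, 20.0]
peak_temperature = [150.2, 151.1, 153.9, 158.8, 169.1]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


print(summarize(peak_temperature))
print(round(pearson(heating_rate, peak_temperature), 3))
```

A correlation coefficient close to 1 or -1 hints at a linear relationship, which in turn suggests that a linear method may be a reasonable starting point for the analysis.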

Predictive Analysis

Predictive Analysis is a subset of Business Intelligence and Business Analytics. The data sets are evaluated for patterns in order to predict trends and future outcomes. Several methods can be used for Predictive Analysis. The following gives a short overview of some of them:

  • Machine Learning:

Machine Learning is an application of Artificial Intelligence that enables a system to gain knowledge automatically and to improve from experience over time without being explicitly programmed.

Machine Learning methods acquire information from existing data by extracting patterns from large data sets. In general, the system, called the Machine Learning model, recognizes and learns dependencies, so that it can make predictions about future events or outcomes even for new, previously unseen data.

  • Linear / Non-linear Regression:

Linear Regression is one of the most basic and most widely used algorithms for Predictive Analysis. Its main goal is to predict a target variable from one or more independent variables. Using existing data sets, Linear Regression identifies a linear relationship between the target variable and one or more predictor variables, so that a linear function can be generated to describe the dependency.

In contrast, with Non-linear Regression a non-linear function is defined to explain the relationship between the variables.

Once the relationship is known, predictions can be made for new data by evaluating the fitted function.

  • Classification:

Classification is the assignment of data to a specific category. It is a classical Machine Learning method. The criteria and patterns for assigning data to a certain category are learned from existing, already categorized data and can then be applied to classify new data.

  • Linear/Non-Linear Classification:

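A linear classifier separates the categories with a straight line or, in higher dimensions, a flat plane (hyperplane). Linear Classification is often used for data with a high number of features, whereas a non-linear classifier is used when the data is not linearly separable.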

  • Logistic Regression:

Logistic Regression is a classification technique used to predict the probability that a new observation belongs to a particular category. Typical applications are e-mail spam detection and the detection of fraudulent online transactions.
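Two of the methods listed above, Linear Regression and Logistic Regression, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: it uses a single predictor variable, made-up data, and hypothetical helper names (`fit_line`, `fit_logistic`). The logistic model is trained with simple gradient descent, which is only one of several possible fitting strategies.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, labels, learning_rate=0.1, epochs=5000):
    """Logistic regression with one feature, trained by gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for xi, yi in zip(x, labels):
            error = sigmoid(w * xi + b) - yi  # predicted minus actual
            grad_w += error * xi
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Regression: made-up data that roughly follows y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3, 4], [1.0, 3.1, 4.9, 7.2, 9.0])
print("prediction for x = 5:", slope * 5 + intercept)

# Classification: made-up data, category 1 for larger feature values.
features = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
categories = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(features, categories)
print("P(category 1 | x = 4.0):", sigmoid(w * 4.0 + b))
```

In both cases the model is first fitted to existing data and then evaluated on a new input, which mirrors the learn-then-predict pattern described for Machine Learning models in general.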

Prescriptive Analysis

The main focus of Prescriptive Analysis is to find the best solution for the current data scenario. Going beyond Predictive Analysis, Prescriptive Analysis provides recommendations on how to use the predicted information to influence the future. The goal is to use the predictions to determine which decisions must be made to achieve the predicted result or to prevent it.
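The step from prediction to recommendation can be illustrated with a small Python sketch. It assumes that a predictive model already exists; here a hypothetical linear function `predict` with made-up coefficients stands in for it. The function `recommend`, the candidate settings, and the target value are likewise assumptions chosen only for illustration.

```python
def predict(setting):
    """Hypothetical predictive model, assumed to have been fitted earlier."""
    slope, intercept = 2.0, 1.0  # made-up coefficients
    return slope * setting + intercept


def recommend(candidates, target):
    """Return the candidate whose predicted outcome is closest to the target."""
    return min(candidates, key=lambda s: abs(predict(s) - target))


candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
best = recommend(candidates, target=6.0)
print(best, predict(best))  # prints: 2.5 6.0
```

The sketch simply tries every candidate decision and keeps the one whose predicted outcome is closest to the desired result. Real prescriptive methods typically use more elaborate optimization, but the underlying idea of searching over decisions with a predictive model is the same.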

The best prerequisite for good data analysis is a close exchange between the data scientists and the specialist department from which the data to be analyzed originate. With years of experience and knowledge in thermal analysis, NETZSCH can apply Data Science methods in its field of expertise.