Dive into the World of Data Mining! Part 3: Optimization of the Model
Good, better, best.
Never let it rest.
'Til your good is better
And your better is best.
The target users of RapidMiner Studio are managers and staff in controlling. The user interface is therefore tailored to the needs of inexperienced users. Building workflows or data flows is not that difficult, and RapidMiner offers plenty of help with decisions, including a quick overview of what other users have chosen. However, because most of these users have no experience in programming or machine learning and no deep knowledge of mathematics or statistics, it is pretty hard for them to find the best combination of parameters for the elements in the workflow. So in this part we will learn how to evaluate and optimize the process in RapidMiner to find the best, most accurate model.
Neural Network vs. Support Vector Machine
At the core there are two tools, or project modules, that make use of supervised learning methods. The first is a Neural Network, the second is called a Support Vector Machine. Both are very popular for classification and pattern recognition problems, but each works a little differently from the other.
“SVMs have been developed in the reverse order to the development of neural networks (NNs). SVMs evolved from the sound theory to the implementation and experiments, while the NNs followed more heuristic path, from applications and extensive experimentation to the theory.”
To put it a bit more simply: the Support Vector Machine draws a line between two sets of data. It is a kernel-based algorithm that transforms the data into a high-dimensional space and constructs a hyperplane there that maximizes the distance to the nearest data point of any of the input classes. Although the SVM was originally designed to train binary classifiers, it can be extended to multiclass problems by converting the multiclass problem into several binary classification problems, using a one-versus-all or one-versus-one approach. [Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273–297.]
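To make the idea of "the line with the largest distance to the nearest point" more concrete, here is a minimal sketch in plain Python. It does not train an SVM. It only compares two hand-picked separating lines on made-up toy data and reports which one leaves the larger margin. All names and numbers are assumptions chosen for illustration.

```python
import math

def distance_to_line(point, w, b):
    """Distance from a 2-D point to the line w[0]*x + w[1]*y + b = 0."""
    x, y = point
    return abs(w[0] * x + w[1] * y + b) / math.hypot(w[0], w[1])

def margin(points, w, b):
    """Smallest distance from any data point to the line."""
    return min(distance_to_line(p, w, b) for p in points)

# Hypothetical toy data: two clearly separated classes.
class_a = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
class_b = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
points = class_a + class_b

# Two candidate lines (w, b); both separate class_a from class_b.
candidates = {
    "x + y = 5.5": ((1.0, 1.0), -5.5),
    "x = 3": ((1.0, 0.0), -3.0),
}

margins = {name: margin(points, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "-> widest margin:", best)
```

A real SVM searches over all possible lines (or hyperplanes) for the one with the widest margin instead of comparing just two candidates.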
Artificial Neural Networks are a collection of mathematical models that imitate properties of the human nervous system and the way biological systems learn adaptively. In other words, they try to build something like "human-like intelligence" into a computer system. A neural network models the non-linear relationship between input and output by adjusting its internal weight values.
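What "adjusting the weight values internally" means can be sketched with a single neuron in plain Python. This is a strong simplification and an assumption on my side: it is one linear neuron without an activation function, while a real neural network stacks many neurons with non-linear activations. The parameter names learning_rate and training_cycles are chosen to resemble the ones we will tune later, and the data is made up.

```python
def train_neuron(samples, learning_rate=0.1, training_cycles=200):
    """Fit y = w*x + b by nudging w and b against the prediction error."""
    w, b = 0.0, 0.0
    for _ in range(training_cycles):
        for x, target in samples:
            error = (w * x + b) - target   # how far off is the prediction?
            w -= learning_rate * error * x  # move the weight against the error
            b -= learning_rate * error      # move the bias against the error
    return w, b

# Hypothetical training data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = train_neuron(samples, learning_rate=0.05, training_cycles=500)
print(f"learned weight {w:.3f}, learned bias {b:.3f}")
```

With a learning rate that is too large the weights overshoot, and with too few training cycles they never get close, which is exactly why these parameters are worth optimizing.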
Optimization of our prediction model
As you saw in Part 2 (https://blog.novatec-gmbh.de/data-mining-p2-building-model/), the prediction error, especially the root mean squared error, was too high, so the results of our prediction are not
The higher this figure is, the more inaccurate our predicted data will be. But do not worry: the default settings do not suit every data set equally well. So in this part we will optimize the parameters of our neural network to increase the prediction accuracy and improve the performance values.

As you can imagine, the neural network algorithm is so complicated that RapidMiner displays it as a „black box“, so understanding how a change to a parameter affects the result is really hard and sometimes impossible. Moreover, it is not viable to try all combinations of parameters manually by brute force. But there is an operator that is able to help us here. You will find it under „Optimize Parameters“ [Modeling -> Optimization -> Parameters]. This operator finds the optimal values of the selected parameters of the operators in its sub-process. It executes the sub-process with all the combinations we selected and then delivers the optimal parameter values through the parameter port. The performance vector for the optimal values is delivered through the performance port. Any additional results of the sub-process are delivered through the result port.

This is not the only way to improve the performance; other parameter optimization schemes are also available. The „Optimize Parameters (Evolutionary)“ operator might be useful if the best ranges and/or dependencies are not known at all. Another operator that works similarly is the „Loop Parameters“ operator. In contrast to the optimization operators, it simply iterates through all parameter combinations, which can be especially useful for plotting purposes.

But for now, we focus on our first optimization. First of all, place the „Optimize Parameters (Grid)“ operator after the „Select Attributes“ operators (the one that provides the test and training data), as shown in Figure 1, and connect it with the „Apply Model“ operator.

Inside „Optimize Parameters (Grid)“ we open a new sub-process and paste in all operators and functions from the previous „Cross Validation“ operator. Be careful to get the connections and the order of the operators right.
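For readers who like to see the idea in code: the nesting described above (a grid optimization on the outside, a cross validation on the inside) can be sketched in plain Python as follows. This is not how RapidMiner implements it internally. The tiny one-neuron model, the made-up data and all function names are assumptions that only illustrate the principle.

```python
import itertools
import math

def train(samples, learning_rate, training_cycles):
    """Tiny stand-in model: fit y = w*x + b by gradient steps."""
    w, b = 0.0, 0.0
    for _ in range(training_cycles):
        for x, y in samples:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

def rmse(model, samples):
    """Root mean squared error of the model on the given samples."""
    w, b = model
    return math.sqrt(sum((w * x + b - y) ** 2 for x, y in samples) / len(samples))

def cross_validate(samples, k, **params):
    """Average test RMSE over k folds (the inner sub-process)."""
    scores = []
    for i in range(k):
        test = samples[i::k]
        training = [s for j, s in enumerate(samples) if j % k != i]
        scores.append(rmse(train(training, **params), test))
    return sum(scores) / k

def optimize_parameters(samples, grid, k=3):
    """Try every combination in the grid and keep the one with the lowest RMSE."""
    names = list(grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_validate(samples, k, **params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical data: y = 2x + 1 with a small alternating disturbance.
samples = [(x / 10, 2 * (x / 10) + 1 + (0.05 if x % 2 else -0.05)) for x in range(12)]
grid = {"learning_rate": [0.001, 0.01, 0.1], "training_cycles": [10, 100]}

best_params, best_score = optimize_parameters(samples, grid)
print(best_params, round(best_score, 4))
```

The two return values correspond roughly to what the operator delivers at its parameter port and its performance port.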
Fig. 1 Optimization process for NN or SVM
Fig. 2 Subprocess Cross Validation
Fig. 3 Subprocess Test and Train for NN
Once all operators in your process and sub-process are connected, we can select the parameters we want to optimize. Just click on „Optimize Parameters“, find the operators whose parameters you want to change and select the parameters with the blue arrows. In our case we want to find the best combination of parameters for the neural network. For the learning_rate, for example, you can choose a minimum and a maximum, the number of steps between them and the scale method. At the bottom of the window you can see the resulting number of combinations. But beware: the more combinations you select, the longer the optimization process will take to try them all.
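To get a feeling for how quickly the number of combinations grows, the following Python sketch builds the values for two parameters from a minimum, a maximum, a number of steps and a scale. It rests on two assumptions of mine: that a grid with n steps contains n + 1 values including both ends, and that a logarithmic scale spaces the values by a constant factor. The ranges and the function name grid_values are made up for this example.

```python
def grid_values(minimum, maximum, steps, scale="linear"):
    """Return steps + 1 values from minimum to maximum, both included."""
    if scale == "linear":
        return [minimum + (maximum - minimum) * i / steps for i in range(steps + 1)]
    if scale == "logarithmic":
        if minimum <= 0:
            raise ValueError("a logarithmic scale needs a positive minimum")
        ratio = maximum / minimum
        return [minimum * ratio ** (i / steps) for i in range(steps + 1)]
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical ranges for two neural network parameters.
learning_rates = grid_values(0.001, 1.0, 3, scale="logarithmic")
training_cycles = grid_values(100, 500, 4)

# Every value of one parameter is combined with every value of the other.
combinations = len(learning_rates) * len(training_cycles)
print(learning_rates, training_cycles, combinations)
```

Each additional parameter multiplies the number of combinations again, so three parameters with ten steps each already mean more than a thousand runs of the whole cross validation.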
Fig. 4 Selection of parameter
In the next figure you can see the optimized rates and the number of cycles for our neural network. I tried just 6 parameters with a few steps between the minimum and maximum of each, so it is still possible to find a better combination for an even more accurate prediction.
Fig. 5 Parameter & Performance set for NN
Visually, the prediction is also definitely better than in Part 2. Here we can see that the red line (our prediction) in most cases predicts slightly more accidents than actually happened. On the other hand, the prediction line follows even the small peaks and tendencies of the actual data.
Fig. 6 Prediction Visualization
Fig. 7 Prediction data
Optimization with Support Vector Machine
You can try the same optimization process with a Support Vector Machine. Just replace the „Neural Net“ operator with the „Support Vector Machine“ operator and choose the parameters to optimize. I tried just three arbitrarily chosen parameters, because optimizing more parameters can take a very long time and does not always lead to a much better prediction.
In the next figure you can see my optimization results.
Fig. 8 Parameter & Performance set for SVM
At the End…
Set up your own competition between the neural network and the support vector machine, or maybe deep learning as well, and find the best optimized model for your data.
I would be really happy if you left a comment with your optimization results or ideas 🙂