Channel: Search results matching tag 'data mining'

Data Mining Algorithms – Support Vector Machines


Support vector machines (SVMs) are learning models that can be used both in a supervised way, for classification and regression analysis, and in an unsupervised way, for anomaly detection. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model is a representation of the cases as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
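As a rough illustration of the training and prediction steps described above, here is a minimal pure-Python sketch. It assumes a linear SVM in two dimensions, trained by plain subgradient descent on the regularized hinge loss; real SVM libraries use more refined solvers. The data, the function names and the settings (lr, lam, epochs) are made up for this example.

```python
# Toy linear SVM trained by subgradient descent on the regularized hinge loss.
# Illustration only: data, names and hyper-parameters are assumptions.

def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=200):
    """Return (w, b) of a separating line w.x + b = 0; labels must be -1 or +1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = w[0] * x[0] + w[1] * x[1] + b
            # The regularization term shrinks w; a smaller w means a wider margin.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if y * score < 1.0:
                # The case is inside the margin or misclassified:
                # move the line away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Assign a category by the side of the line the case falls on."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1


# Two made-up categories, one on each side of the origin.
points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])            # training cases
print(predict(w, b, (3, 3)), predict(w, b, (-3, -3)))  # new cases
```

The predict function is the "which side of the gap" rule from the paragraph above: the sign of w.x + b decides the category.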

A support vector machine constructs a hyper-plane or set of hyper-planes in a high-dimensional space defined by the input variables, which can be used for classification, regression, or other tasks. In its basic form, an SVM is a non-probabilistic binary linear classifier. A good separation is achieved by the hyper-plane that has the largest distance to the nearest training data point of any class (the so-called margin). In general, the larger the margin, the lower the generalization error. Let me show you this on a graphical example. Of course, I am showing a two-dimensional space defined by only two input variables, and therefore my separating hyper-plane is just a line.

The first figure shows the two-dimensional space with cases and a possible one-dimensional hyper-plane (line). Of course, you can see that this line cannot be a separator at all, because it does not split the categories: cases of the same category lie on both sides of the line.

image

The next try is better. The line separates the cases in the space. However, this is not the best possible separation. Some cases are pretty close to the line, so the margin is narrow.

image

The third picture shows the best possible separation. The hyper-plane that separates the cases best, the one with the widest margin, is found, and the model is trained.

image
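The three tries can also be checked numerically. The sketch below is only an illustration: the cases and the three candidate lines are made up and are not the ones in the figures. For a line w1*x1 + w2*x2 + b = 0, the distance of a case from the line is |w1*x1 + w2*x2 + b| / sqrt(w1^2 + w2^2), and the margin of a separating line is the smallest of these distances.

```python
import math

# Hypothetical 2-D cases as (x1, x2, category); not the data behind the figures.
cases = [(1, 1, -1), (2, 1, -1), (1, 2, -1),
         (4, 4, 1), (5, 4, 1), (4, 5, 1)]


def evaluate(w1, w2, b):
    """Return (separates, margin) for the line w1*x1 + w2*x2 + b = 0."""
    norm = math.hypot(w1, w2)
    # Signed distance of each case; positive when the case lies on the
    # side of the line that belongs to its own category.
    signed = [y * (w1 * x1 + w2 * x2 + b) / norm for x1, x2, y in cases]
    separates = all(d > 0 for d in signed)
    margin = min(signed) if separates else None
    return separates, margin


candidates = {
    "first try": (1, 0, -1.5),   # vertical line x1 = 1.5, cuts through one category
    "second try": (1, 0, -2.5),  # vertical line x1 = 2.5, separates but runs close to a case
    "third try": (1, 1, -5.5),   # diagonal line x1 + x2 = 5.5
}
for name, (w1, w2, b) in candidates.items():
    print(name, evaluate(w1, w2, b))
```

With this toy data the first line does not separate the categories, the second separates them with a narrow margin, and the third has a clearly wider margin. An SVM training algorithm searches for the line with the widest margin instead of trying candidates by hand.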

Support Vector Machines are powerful for some specific classifications:

  • Text and hypertext categorization
  • Image classification
  • Classifications in medicine
  • Handwritten character recognition

One-class SVM can be used for anomaly detection, such as detecting dirty data or fraud. It learns a separating function from the input variables alone, without regard to a target variable. Cases that lie close to the separating hyper-plane, or beyond it, are the suspicious cases. The result is therefore dichotomous: 1 = regular case, 0 = outlier.
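To make the one-class idea more concrete, here is a small sketch that assumes Python with NumPy and scikit-learn installed; the article itself does not prescribe a tool. The data, the kernel and the gamma and nu values are made up for illustration. Note that scikit-learn's OneClassSVM reports +1 for regular cases and -1 for outliers, so the sketch recodes the result to the 1/0 convention used above.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Made-up data: a 5 x 5 grid of regular cases around the origin.
grid = np.linspace(-1.0, 1.0, 5)
regular = np.array([(a, b) for a in grid for b in grid])

# No target variable is passed to fit(). nu is an upper bound on the
# fraction of training cases that may be treated as outliers.
model = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.1).fit(regular)

new_cases = np.array([[0.0, 0.0],    # in the middle of the regular cases
                      [8.0, 8.0]])   # far away from all of them
raw = model.predict(new_cases)       # scikit-learn: +1 = inlier, -1 = outlier
flags = np.where(raw == 1, 1, 0)     # recode to 1 = regular case, 0 = outlier
print(flags)
```

The decision_function method of the same model returns the signed distance-like score behind these flags, which can help to rank cases by how suspicious they are.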

