SVM in Python

In this post:

1. About SVM

Because SVM (Support Vector Machine) is a well-documented algorithm, this post does not cover SVM theory. We start by briefly presenting the dataset we will use, and then apply SVM to it. For more details about the SVM algorithm, see here.

To illustrate classification with SVM in Python we will use the wine dataset from the scikit-learn (sklearn) package. The sklearn.datasets module provides loaders for many datasets that can be used in tests (iris, wine, files, digits, breast_cancer, diabetes, linnerud, sample_image, sample_images, svmlight_file, svmlight_files).
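
To check which bundled-dataset loaders a particular scikit-learn installation offers, one option is to list the names in sklearn.datasets that start with load_. This is only a small sketch: the variable name loader_names is illustrative, and the exact list depends on the installed version.

```python
from sklearn import datasets

# Collect the names of the dataset loaders (functions named load_*).
loader_names = sorted(name for name in dir(datasets) if name.startswith("load_"))

print(loader_names)
```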

2. Things about dataset used

a) To load the wine dataset we use the load_wine function:

# import section

from sklearn import datasets

#load wine data set

ds_wine=datasets.load_wine()


b) Let's look at the wine dataset. A scikit-learn dataset is described by its features and its targets (label names). For wine:

#check features of dataset

print("Features: ", ds_wine.feature_names)

# print wine type (i.e labels )

print("Labels: ", ds_wine.target_names)


#output:

Features:  ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 

'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

Labels:  ['class_0' 'class_1' 'class_2']

The wine features are 'alcohol', 'malic_acid', 'ash', etc. The wine labels (also called targets or types) are 'class_0', 'class_1' and 'class_2'.

The dataset has 178 samples. The SVM classifier in this example will assign each wine to one of these three classes.
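
To see how the 178 samples are spread over the three classes, the targets can be counted. The sketch below uses numpy's bincount for this; the variable name class_counts is only illustrative.

```python
import numpy as np
from sklearn import datasets

ds_wine = datasets.load_wine()

# Count how many samples carry each target value (0, 1, 2).
class_counts = np.bincount(ds_wine.target)

for name, count in zip(ds_wine.target_names, class_counts):
    print(name, count)
```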

c) Let's print first 2 records of our wine dataset:

print(ds_wine.data[0:2])


#output

[[1.423e+01 1.710e+00 2.430e+00 1.560e+01 1.270e+02 2.800e+00 3.060e+00

  2.800e-01 2.290e+00 5.640e+00 1.040e+00 3.920e+00 1.065e+03]

 [1.320e+01 1.780e+00 2.140e+00 1.120e+01 1.000e+02 2.650e+00 2.760e+00

  2.600e-01 1.280e+00 4.380e+00 1.050e+00 3.400e+00 1.050e+03]]

Comparing with the feature names printed in b), the first record has:

alcohol=1.423e+01

malic_acid=1.710e+00

ash=2.430e+00

etc
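
The same pairing of feature names and values can be done in code instead of by hand. This is a minimal sketch; the name first_record is illustrative.

```python
from sklearn import datasets

ds_wine = datasets.load_wine()

# Pair each feature name with the matching value in the first record.
first_record = dict(zip(ds_wine.feature_names, ds_wine.data[0]))

for feature, value in first_record.items():
    print(f"{feature} = {value}")
```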

In general, to find out how many records a dataset has and how many features each record has, we use the shape attribute:

#shape of the dataset

print(ds_wine.data.shape)

#output

(178, 13)

This means the ds_wine dataset has 178 records, each with 13 features.


3. Applying SVM classification to dataset

Now that we have seen what the wine dataset looks like, we will apply SVM classification to it.

As with any supervised machine learning algorithm, we need one part of the data for training the model and another part for testing it.

3.1. Train and test dataset

For the SVM model we split the initial ds_wine dataset into a train and a test dataset, using the train_test_split function.

# Import train_test_split function

from sklearn.model_selection import train_test_split

# split ds_wine

dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(ds_wine.data, ds_wine.target, test_size=0.15,random_state=109) # 85% training and 15% test

print(dsx_train[0:2])

print(dsy_train[0:2])

#output:

[[1.279e+01 2.670e+00 2.480e+00 2.200e+01 1.120e+02 1.480e+00 1.360e+00

  2.400e-01 1.260e+00 1.080e+01 4.800e-01 1.470e+00 4.800e+02]

 [1.438e+01 1.870e+00 2.380e+00 1.200e+01 1.020e+02 3.300e+00 3.640e+00

  2.900e-01 2.960e+00 7.500e+00 1.200e+00 3.000e+00 1.547e+03]]

[2 0]

Above we printed the first two records of the train dataset. From dsy_train we see that the target of the first record is 2, i.e. the 'class_2' label, and the target of the second record is 0, i.e. the 'class_0' label.
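
To confirm how the data was divided, the shapes of both parts can be printed. The sketch below repeats the split from this section and then shows a variant with the stratify argument of train_test_split. That variant is not used in this post and is included only as an assumption about what a reader might try next; the names sx_train, sx_test, sy_train and sy_test are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

ds_wine = datasets.load_wine()

# Same split as in the post: 15% of the 178 records go to the test set.
dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(
    ds_wine.data, ds_wine.target, test_size=0.15, random_state=109
)
print(dsx_train.shape, dsx_test.shape)

# Variant (not used in the post): stratify on the labels so that every
# class appears in similar proportions in the train and test parts.
sx_train, sx_test, sy_train, sy_test = train_test_split(
    ds_wine.data, ds_wine.target, test_size=0.15, random_state=109,
    stratify=ds_wine.target,
)
print(np.bincount(sy_train), np.bincount(sy_test))
```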


3.2. Create SVM classifier

To build the SVM model, we first create an SVM classifier object using the SVC class:

from sklearn import svm

myclassifier = svm.SVC(kernel='linear')

SVC stands for Support Vector Classification. The class accepts many parameters; here we set only kernel, choosing a linear kernel. Other accepted kernel values are 'poly', 'rbf' (the default), 'sigmoid' and 'precomputed'.
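
To get a feel for the kernel parameter, the sketch below fits one classifier per kernel on the same split and prints its test accuracy. Two things here are assumptions that go beyond this post: the features are standardized with StandardScaler before fitting (kernels such as 'rbf' are sensitive to feature scale), and the name kernel_scores is illustrative. Treat the numbers as a rough comparison on one small test set, not as a benchmark.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ds_wine = datasets.load_wine()
dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(
    ds_wine.data, ds_wine.target, test_size=0.15, random_state=109
)

kernel_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    # Assumption: scale the features first, then fit an SVC with this kernel.
    model = make_pipeline(StandardScaler(), svm.SVC(kernel=kernel))
    model.fit(dsx_train, dsy_train)
    # score() returns the mean accuracy on the given test data.
    kernel_scores[kernel] = model.score(dsx_test, dsy_test)
    print(kernel, round(kernel_scores[kernel], 3))
```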


3.3. Train and test SVM model

To train the model, call the fit method with the train dataset; then, to test the classification, call the predict method with the test dataset:

#train model with fit method using our dsx_train dataset

myclassifier.fit(dsx_train, dsy_train)

#make prediction for test dataset dsx_test

dsy_pred = myclassifier.predict(dsx_test)

print(dsy_pred)

#output: 

[0 0 1 2 0 1 0 0 1 0 2 1 2 2 0 1 1 0 0 1 2 1 0 2 0 0 1]


To make this more intuitive, we take a single record from the test dataset and predict its target. The first record of dsx_test looks like this:

print(dsx_test[0])

#output

[1.330e+01 1.720e+00 2.140e+00 1.700e+01 9.400e+01 2.400e+00 2.190e+00

 2.700e-01 1.350e+00 3.950e+00 1.020e+00 2.770e+00 1.285e+03]

and its actual target is dsy_test[0]:

print(dsy_test[0])

#output:

0

Predicted target for dsx_test[0] is:

dsy_pred_0=myclassifier.predict([dsx_test[0]])

print(dsy_pred_0)

#output:

[0]

For this record, the predicted value dsy_pred_0 matches the actual value dsy_test[0].
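
The predicted value is an integer index, so it can be turned back into a label name through ds_wine.target_names. The sketch below repeats the steps of this section so that it runs on its own; the names predicted_label and actual_label are illustrative.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

ds_wine = datasets.load_wine()
dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(
    ds_wine.data, ds_wine.target, test_size=0.15, random_state=109
)

myclassifier = svm.SVC(kernel="linear")
myclassifier.fit(dsx_train, dsy_train)

# predict expects a 2D input, so the single record is wrapped in a list.
dsy_pred_0 = myclassifier.predict([dsx_test[0]])

# Use the integer targets as indexes into target_names.
predicted_label = ds_wine.target_names[dsy_pred_0[0]]
actual_label = ds_wine.target_names[dsy_test[0]]
print("predicted:", predicted_label, "actual:", actual_label)
```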


4. SVM model accuracy

To evaluate accuracy over the entire test dataset we use the accuracy_score function:

from sklearn import metrics

print("SVM Accuracy:",metrics.accuracy_score(dsy_test, dsy_pred))

#output:

SVM Accuracy: 0.9259259259259259

Accuracy is good: about 92.6%, i.e. 25 of the 27 test records are classified correctly.
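
Accuracy is a single number, so it does not show which classes get confused with each other. As a possible next step, the sketch below prints a confusion matrix and a per-class report using confusion_matrix and classification_report from sklearn.metrics. These two calls are not part of the original walkthrough; the name cm is illustrative.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

ds_wine = datasets.load_wine()
dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(
    ds_wine.data, ds_wine.target, test_size=0.15, random_state=109
)

myclassifier = svm.SVC(kernel="linear")
myclassifier.fit(dsx_train, dsy_train)
dsy_pred = myclassifier.predict(dsx_test)

# Rows are actual classes, columns are predicted classes.
cm = metrics.confusion_matrix(dsy_test, dsy_pred, labels=[0, 1, 2])
print(cm)

# Per-class precision, recall and F1 score.
print(metrics.classification_report(
    dsy_test, dsy_pred, labels=[0, 1, 2], target_names=ds_wine.target_names
))
```

The diagonal of the matrix holds the correctly classified records, so its sum divided by the total equals the accuracy reported by accuracy_score.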


5. Full code sample 

#import section

from sklearn import datasets

from sklearn.model_selection import train_test_split

from sklearn import svm

from sklearn import metrics


#load wine data set

ds_wine=datasets.load_wine()

#check features of dataset

print("Features: ", ds_wine.feature_names)

# print wine type (i.e labels )

print("Labels: ", ds_wine.target_names)

#print first 2 records of our wine dataset

print(ds_wine.data[0:2])

#shape of the dataset

print(ds_wine.data.shape)

# split ds_wine

dsx_train, dsx_test, dsy_train, dsy_test = train_test_split(ds_wine.data, ds_wine.target, test_size=0.15,random_state=109) # 85% training and 15% test

print(dsx_train[0:2])

print(dsy_train[0:2])

# Generate SVM model, creating first SVM classifier object

myclassifier = svm.SVC(kernel='linear') # Linear Kernel

#train model with fit method using our dsx_train dataset

myclassifier.fit(dsx_train, dsy_train)

#make prediction for test dataset dsx_test

dsy_pred = myclassifier.predict(dsx_test)

print(dsy_pred)

print(dsx_test[0])

print(dsy_test[0])

dsy_pred_0=myclassifier.predict([dsx_test[0]])

print(dsy_pred_0)

print("SVM Accuracy:",metrics.accuracy_score(dsy_test, dsy_pred))