Classification using logistic regression

This demo shows how to compute show a separation hyperplane to classify objects. The hyperplane separates one class of objects from another.

Contents

See also external module loadModelData.m this file: demoDataGen.m data files: FakeData.mat, FakeData.csv.

Generate model data in 2D

Consider N + (alpha * N) objects of two classes. Let objects be random, distributed normally, the multivariate random variable of the 1st class could be described by the unit diagonal covariance matrix, and 2nd’s by the matrix diag(sig2).

The simplest way to get data

The simplest and dirty way to get model data is to type them by hands in the code.

X = [0,1; 1,0];
y = [  0;   1];

Load from a mat or csv file

% the data could be loaded from external mat-file
% load FakeData.mat % file must contain X and y (do not forget - convention)
% or from Comma-Separated-Values worksheet
% dummy = dlmread('FakeData.csv', ',');
% X = dummy(:, 1:end-1); % this indices are under convention
% y = dummy(:, end);
% clear dummy % no needs to keep a big storage

Generate data by a module

The data could be generated by external module of special purpose

N = 100; % 1st class contains N objects
alpha = 1.5; % 2st class contains alpha*N ones
sig2 = 1; % assume 2nd class has the same variance as the 1st
dist2 = 4;
%
% % later we move this piece of code in a separate file
% [X, y] = loadModelData(N, alpha, sig2, dist2);
N2 = floor(alpha * N); % calculate the size of the 2nd class
cls1X = randn(N, 2); % generate random objects of the 1st class
% generate a random distance from the center of the 1st class to the center
% of the 2nd
ShiftClass2 = repmat( ...
    dist2 * [sin(pi*rand)  cos(pi*rand)], ...
    [N2,1]); %
% generate random objects of the 2nd class
cls2X = sig2 * randn(N2, 2) + ShiftClass2;
% combine the objects
X = [cls1X; cls2X];
% assign class labels: 0s and 1s
y = [zeros(size(cls1X,1),1); ones(size(cls2X,1),1)];
%end % of module

The given data are X and y

X is a matrix of (N1+N2 by 2), and y is an (N1+N2)-column vector. Indices for objects are very useful to deal with during the plot.

X; y; % note that y contains only 1s and 0s
idx1 = find(y == 0); % object indices for the 1st class
idx2 = find(y == 1);
% no more variables are needed

Preliminary plot

Plot the given data to observe possible obstacles to classification.

h = figure; hold on
plot(X(idx1,1), X(idx1,2), 'r*');
plot(X(idx2,1), X(idx2,2), 'b*');
axis tight
xlabel('x_1');
ylabel('x_2');
% close(h);

Classification using logistic regression - Preliminary plot

Classification: fit the algorithm parameters

Use a logistic regression as a classification algorithm to classify objects

bHat = glmfit(X,y,'binomial');

Classification: use the model

We have the model y = 1/(1+exp(-Xb)) and its parameters bHat. Use the model to get the estimations of class labels.

yHat = glmval(bHat, X, 'logit'); % variant of classification
yHat = 1./(1+exp( -[ones( size(X,1),1 ), X] *bHat)); % variant of classification
% formed as an inline function
classify = inline( '1./(1+exp( -[ones( size(X,1),1 ), X] *b))', 'b', 'X');
% separation hyperplane, formed as a function (here it is a line)
separateXLim = inline( '(-b(1)- YLim*b(3))/b(2)', 'b','YLim');

% example of classification model usage
yHat = classify(bHat,X);

Plot the results in 2D

% the objects could be surrounded by circles
idx1 = find(yHat < 1/2); % object indices for the 1st class
idx2 = find(yHat >= 1/2);
plot(X(idx1,1), X(idx1,2), 'ro');
plot(X(idx2,1), X(idx2,2), 'bo');
% or separated by plane
plot(separateXLim(bHat,YLim), YLim, 'b-');

Classification using logistic regression - plot the results in 2D

Plot in 3D and save

To plot 3D surface of probability we should compute the probability function in each point of the X1 by X2 – plane.

% to do that create a grid
GRIDSIZE = 10;
linX1 = linspace(min(X(:,1)), max(X(:,1)), GRIDSIZE); % grid of feature values
linX2 = linspace(min(X(:,2)), max(X(:,2)), GRIDSIZE);
[grdX1 grdX2] = meshgrid(linX1, linX2); % cartesian product of two grid sets

grdX1 = grdX1(:); % vectorize the obtained matrices
grdX2 = grdX2(:);

Tri = delaunay(grdX1,grdX2); % make Delauney triangulation
trisurf(Tri,grdX1,grdX2,classify(bHat,[grdX1 grdX2])-1/2,...
'FaceColor','red','EdgeColor','none');
camlight left; lighting phong; % alpha(0.8);
% the plane was shifted down to 1/2

% to save the picture use the figure handle h
% saveas(h, 'demoDataGen_saved.png', 'png');
% saveas(h, 'demoDataGen.eps', 'psc2');
close(h);

Classification using logistic regression -  plot in 3D and save

Vadim Victor

Vadim V. Strijov, Data Analysis & Machine Learning professor at the FRCCSC of the RAS, Doctor of Physics and mathematics sciences

You may also like...