- Real Time Signals India
K M ALGORITHM
Updated: Jan 15, 2018
Cluster Analysis
Cluster Analysis groups data objects based only on information found in data
that describes the objects and their relationships. The objective of clustering is
to group similar objects within a group and be different from the objects in other
groups.
Types of Clustering :
Hierarchical Clustering: - A set of nested clusters organized as a hierarchical
tree
Partitioning Clustering: - A division data objects into non-overlapping
subsets (clusters) such that each data object is in exactly one subset
K-Means :
1. Partitional clustering approach
2. Each cluster is associated with a centroid.
3. Each point is assigned ot the cluster with the closest centroid.
4. The number of clusters K must be specified.
K-Means Algorithm :
Randomly initialize k cluster centroids µ 1 , µ 2 , µ 3 ….. µ k ∈ R n
Repeat
{
for i = 1 to m
c {j} := index(from 1 to k) of cluster centroid closest to x {i}
for k = 1 to m
µ k := average(mean) of points assigned to cluster k
}
Select k arbitarary data points as centres and then iteratively perform the
following steps :
Centers to clusters : Assign each data point to the cluster corresponding
to its nearest center.(ties are broken arbiterarily)
Clusters to centers : After the assignment of data points to k clusters,
compute the new centers as cluster’s center of gravity.
Note :
If a data point is assigned to a new center during the Centers to Clusters step : the squared error distrotion is reduced because the center must becloser to the point than the previous center was.
If the center is moved during the clusters to center step : the squared error distoriton is reduced since the center of gravity is the only point minimizing the distoriton.

% MATLAB CODE FOR K MEANS ALGORITHM
% AUTHOR : Real Time Signals
clc;
%filename = 'C:\Users\user\Desktop\RTS\test.xlsx'; % load the data into the excel file
a = linspace(0,3*pi,200);
b = cos(a) + rand(1,200);
x= [a.',b.'];
%x= xlsread(filename);
K = 5; %Number of clusters
scatter(x(:,1),x(:,2));
[M,dim]= size(x);
map = zeros(1,M); % Array tells to which cluster the data points are mapped
C= zeros(K,dim); % Centroid of the points mapped to that cluster; K vector points
%-----------Chose optimum J i.e cost function-------------%
no_simulations =100;
J_min =10^10;
map_min = zeros(1,M); % Array tells to which cluster the data points are mapped for min J
C_min = zeros(K,dim); % Centroid of the points mapped to that cluster; K vector points for min J
for s = 1: no_simulations
%------Initialization--------------%
% chose K random points and assign them as the cluster centroids
temp_index_array = zeros(1,K);
for j =1:K
index = randi(M); % choose a random index
% Check if that index is already assigned; If yes repeat till you find
% an available index
while(find(temp_index_array==index))
index = randi(M);
end
%assign the index
temp_index_array(j) = index;
C(j,:)= x(index,:);
end
%-----------K-Means Algorithm-------------%
Jpre=10^6;
J=10^5;
while((Jpre - J) > 0.001)
aggr = zeros(K,dim);
occ = zeros(1,K); % counts the occurance of x in that particular cluster
% Assign every element of x to a cluster
for i=1:M
min_dist = norm(x(i,:)-C(1,:));
% Search for the cluster midpoint whose distance is least from the
% considered element
map_index =1;
for j=2:K
diff = norm(x(i,:)-C(j,:));
if(min_dist >= diff)
min_dist = diff;
map_index =j;
end
end
map(i) = map_index; % Map x[i] to that particular cluster
aggr(map_index,:) = aggr(map_index,:) + x(i,:); % Sum up the points of that cluster
occ(map_index)= occ(map_index) + 1; % Increment the number of points mapped to that cluster
end
% Cost function calculation
Jpre = J;
J=0;
for i = 1:M
J = J + norm(x(i,:) - C(map(i),:)).^2;
end
% Update the centroid once all the points have been mapped
for j=1 :K
C(j,:) = aggr(j,:)/occ(j);
end
end
if(J < J_min)
J_min =J;
map_min = map;
C_min = C;
end
end
J_min
%%%% Assume K = 5 for plotting purposes
index_1 = find(map_min == 1);
x1 = x(index_1,:);
index_2 = find(map_min == 2);
x2 = x(index_2,:);
index_3 = find(map_min == 3);
x3 = x(index_3,:);
index_4 = find(map_min == 4);
x4 = x(index_4,:);
index_5 = find(map_min == 5);
x5 = x(index_5,:);
figure;
scatter(x1(:,1),x1(:,2),'r');
hold on;
scatter(x2(:,1),x2(:,2),'g');
hold on;
scatter(x3(:,1),x3(:,2),'b');
hold on;
scatter(x4(:,1),x4(:,2),'y');
hold on;
scatter(x5(:,1),x5(:,2),'o');
Results For:
Cluster Size = 1

Cluster Size = 3

Plot of cost function vs the no of clusters
