Search
• Real Time Signals India

# K M ALGORITHM

Updated: Jan 15, 2018

Cluster Analysis

Cluster Analysis groups data objects based only on information found in data

that describes the objects and their relationships. The objective of clustering is

to group similar objects within a group and be different from the objects in other

groups.

Types of Clustering :

Hierarchical Clustering: - A set of nested clusters organized as a hierarchical

tree

Partitioning Clustering: - A division data objects into non-overlapping

subsets (clusters) such that each data object is in exactly one subset

K-Means :

1. Partitional clustering approach

2. Each cluster is associated with a centroid.

3. Each point is assigned ot the cluster with the closest centroid.

4. The number of clusters K must be specified.

K-Means Algorithm :

Randomly initialize k cluster centroids µ 1 , µ 2 , µ 3 ….. µ k ∈ R n

Repeat

{

for i = 1 to m

c {j} := index(from 1 to k) of cluster centroid closest to x {i}

for k = 1 to m

µ k := average(mean) of points assigned to cluster k

}

Select k arbitarary data points as centres and then iteratively perform the

following steps :

Centers to clusters : Assign each data point to the cluster corresponding

to its nearest center.(ties are broken arbiterarily)

Clusters to centers : After the assignment of data points to k clusters,

compute the new centers as cluster’s center of gravity.

Note :

• If a data point is assigned to a new center during the Centers to Clusters step : the squared error distrotion is reduced because the center must becloser to the point than the previous center was.

• If the center is moved during the clusters to center step : the squared error distoriton is reduced since the center of gravity is the only point minimizing the distoriton.

% MATLAB CODE FOR K MEANS ALGORITHM

% AUTHOR : Real Time Signals

clc;

%filename = 'C:\Users\user\Desktop\RTS\test.xlsx'; % load the data into the excel file

a = linspace(0,3*pi,200);

b = cos(a) + rand(1,200);

x= [a.',b.'];

K = 5; %Number of clusters

scatter(x(:,1),x(:,2));

[M,dim]= size(x);

map = zeros(1,M); % Array tells to which cluster the data points are mapped

C= zeros(K,dim); % Centroid of the points mapped to that cluster; K vector points

%-----------Chose optimum J i.e cost function-------------%

no_simulations =100;

J_min =10^10;

map_min = zeros(1,M); % Array tells to which cluster the data points are mapped for min J

C_min = zeros(K,dim); % Centroid of the points mapped to that cluster; K vector points for min J

for s = 1: no_simulations

%------Initialization--------------%

% chose K random points and assign them as the cluster centroids

temp_index_array = zeros(1,K);

for j =1:K

index = randi(M); % choose a random index

% Check if that index is already assigned; If yes repeat till you find

% an available index

while(find(temp_index_array==index))

index = randi(M);

end

%assign the index

temp_index_array(j) = index;

C(j,:)= x(index,:);

end

%-----------K-Means Algorithm-------------%

Jpre=10^6;

J=10^5;

while((Jpre - J) > 0.001)

aggr = zeros(K,dim);

occ = zeros(1,K); % counts the occurance of x in that particular cluster

% Assign every element of x to a cluster

for i=1:M

min_dist = norm(x(i,:)-C(1,:));

% Search for the cluster midpoint whose distance is least from the

% considered element

map_index =1;

for j=2:K

diff = norm(x(i,:)-C(j,:));

if(min_dist >= diff)

min_dist = diff;

map_index =j;

end

end

map(i) = map_index; % Map x[i] to that particular cluster

aggr(map_index,:) = aggr(map_index,:) + x(i,:); % Sum up the points of that cluster

occ(map_index)= occ(map_index) + 1; % Increment the number of points mapped to that cluster

end

% Cost function calculation

Jpre = J;

J=0;

for i = 1:M

J = J + norm(x(i,:) - C(map(i),:)).^2;

end

% Update the centroid once all the points have been mapped

for j=1 :K

C(j,:) = aggr(j,:)/occ(j);

end

end

if(J < J_min)

J_min =J;

map_min = map;

C_min = C;

end

end

J_min

%%%% Assume K = 5 for plotting purposes

index_1 = find(map_min == 1);

x1 = x(index_1,:);

index_2 = find(map_min == 2);

x2 = x(index_2,:);

index_3 = find(map_min == 3);

x3 = x(index_3,:);

index_4 = find(map_min == 4);

x4 = x(index_4,:);

index_5 = find(map_min == 5);

x5 = x(index_5,:);

figure;

scatter(x1(:,1),x1(:,2),'r');

hold on;

scatter(x2(:,1),x2(:,2),'g');

hold on;

scatter(x3(:,1),x3(:,2),'b');

hold on;

scatter(x4(:,1),x4(:,2),'y');

hold on;

scatter(x5(:,1),x5(:,2),'o');

Results For:

Cluster Size = 1

Cluster Size = 3

Plot of cost function vs the no of clusters