- Real Time Signals India

# K M ALGORITHM

Updated: Jan 15, 2018

**Cluster Analysis**

Cluster Analysis groups data objects based only on information found in data

that describes the objects and their relationships. The objective of clustering is

to group similar objects within a group and be different from the objects in other

groups.

**Types of Clustering :**

**Hierarchical Clustering:** - A set of nested clusters organized as a hierarchical

tree

**Partitioning Clustering:** - A division data objects into non-overlapping

subsets (clusters) such that each data object is in exactly one subset

K-Means :

1. Partitional clustering approach

2. Each cluster is associated with a centroid.

3. Each point is assigned ot the cluster with the closest centroid.

4. The number of clusters K must be specified.

**K-Means Algorithm :**

** Randomly initialize k cluster centroids µ 1 , µ 2 , µ 3 ….. µ k ∈ R n**

** Repeat **

** {**

** for i = 1 to m**

** c {j} := index(from 1 to k) of cluster centroid closest to x {i}**

** for k = 1 to m**

** µ k := average(mean) of points assigned to cluster k**

** }**

Select k arbitarary data points as centres and then iteratively perform the

following steps :

**Centers to clusters :** Assign each data point to the cluster corresponding

to its nearest center.(ties are broken arbiterarily)

**Clusters to centers :** After the assignment of data points to k clusters,

compute the new centers as cluster’s center of gravity.

Note :

If a data point is assigned to a new center during the Centers to

**Clusters step :**the squared error distrotion is reduced because the center must becloser to the point than the previous center was.If the center is moved during the clusters to center step : the squared error distoriton is reduced since the center of gravity is the only point minimizing the distoriton.

**% MATLAB CODE FOR K MEANS ALGORITHM**

% AUTHOR : **Real Time Signals **

clc;

%filename = 'C:\Users\user\Desktop\RTS\test.xlsx'; % load the data into the excel file

a = linspace(0,3*pi,200);

b = cos(a) + rand(1,200);

x= [a.',b.'];

%x= xlsread(filename);

K = 5; %Number of clusters

scatter(x(:,1),x(:,2));

[M,dim]= size(x);

map = zeros(1,M); % Array tells to which cluster the data points are mapped

C= zeros(K,dim); % Centroid of the points mapped to that cluster; K vector points

%-----------Chose optimum J i.e cost function-------------%

no_simulations =100;

J_min =10^10;

map_min = zeros(1,M); % Array tells to which cluster the data points are mapped for min J

C_min = zeros(K,dim); % Centroid of the points mapped to that cluster; K vector points for min J

for s = 1: no_simulations

%------Initialization--------------%

% chose K random points and assign them as the cluster centroids

temp_index_array = zeros(1,K);

for j =1:K

index = randi(M); % choose a random index

% Check if that index is already assigned; If yes repeat till you find

% an available index

while(find(temp_index_array==index))

index = randi(M);

end

%assign the index

temp_index_array(j) = index;

C(j,:)= x(index,:);

end

%-----------K-Means Algorithm-------------%

Jpre=10^6;

J=10^5;

while((Jpre - J) > 0.001)

aggr = zeros(K,dim);

occ = zeros(1,K); % counts the occurance of x in that particular cluster

% Assign every element of x to a cluster

for i=1:M

min_dist = norm(x(i,:)-C(1,:));

% Search for the cluster midpoint whose distance is least from the

% considered element

map_index =1;

for j=2:K

diff = norm(x(i,:)-C(j,:));

if(min_dist >= diff)

min_dist = diff;

map_index =j;

end

end

map(i) = map_index; % Map x[i] to that particular cluster

aggr(map_index,:) = aggr(map_index,:) + x(i,:); % Sum up the points of that cluster

occ(map_index)= occ(map_index) + 1; % Increment the number of points mapped to that cluster

end

% Cost function calculation

Jpre = J;

J=0;

for i = 1:M

J = J + norm(x(i,:) - C(map(i),:)).^2;

end

% Update the centroid once all the points have been mapped

for j=1 :K

C(j,:) = aggr(j,:)/occ(j);

end

end

if(J < J_min)

J_min =J;

map_min = map;

C_min = C;

end

end

J_min

%%%% Assume K = 5 for plotting purposes

index_1 = find(map_min == 1);

x1 = x(index_1,:);

index_2 = find(map_min == 2);

x2 = x(index_2,:);

index_3 = find(map_min == 3);

x3 = x(index_3,:);

index_4 = find(map_min == 4);

x4 = x(index_4,:);

index_5 = find(map_min == 5);

x5 = x(index_5,:);

figure;

scatter(x1(:,1),x1(:,2),'r');

hold on;

scatter(x2(:,1),x2(:,2),'g');

hold on;

scatter(x3(:,1),x3(:,2),'b');

hold on;

scatter(x4(:,1),x4(:,2),'y');

hold on;

scatter(x5(:,1),x5(:,2),'o');

**Results For:**

Cluster Size = 1

Cluster Size = 3

Plot of cost function vs the no of clusters