# k - Nearest Neighbor Search

Join the DZone community and get the full member experience.

Join For FreeA k-nearest neighbor search identifies the top k nearest neighbors to a query. The problem is: given a dataset

Now, we will test this function on a random bidimensional dataset:

The red point is the query vector and the blue ones represent the data. The blue points surrounded by a black circle are the nearest neighbors.

*D*of vectors in a d-dimensional space and a query point*x*in the same space, find the closest point in*D*to*x*. The following function performs a k-nearest neighbor search using the euclidean distance:from numpy import random,argsort,sqrt from pylab import plot,show def knn_search(x, D, K): """ find K nearest neighbours of data among D """ ndata = D.shape[1] K = K if K < ndata else ndata # euclidean distances from the other points sqd = sqrt(((D - x[:,:ndata])**2).sum(axis=0)) idx = argsort(sqd) # sorting # return the indexes of K nearest neighbours return idx[:K]The function computes the euclidean distance between every point of

*D*and*x*then returns the indexes of the points for which the distance is smaller.Now, we will test this function on a random bidimensional dataset:

# knn_search test data = random.rand(2,200) # random dataset x = random.rand(2,1) # query point # performing the search neig_idx = knn_search(x,data,10) # plotting the data and the input point plot(data[0,:],data[1,:],'ob',x[0,0],x[1,0],'or') # highlighting the neighbours plot(data[0,neig_idx],data[1,neig_idx],'o', markerfacecolor='None',markersize=15,markeredgewidth=1) show()The result is as follows:

The red point is the query vector and the blue ones represent the data. The blue points surrounded by a black circle are the nearest neighbors.

Topics:

Published at DZone with permission of Giuseppe Vettigli, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Comments