Identification of level of resemblance between web based documents

Authors

Surbhi Kakar

Abstract

One of the biggest challenges on the web today is dealing with the “Big data” problem. Finding documents that are near duplicates of each other is a related challenge that Big data brings with it. In this paper, the author focuses on finding near-duplicate documents using a technique called shingling and presents the different types of shingling that can be used. Further, a measure called the Jaccard coefficient is discussed, which can be used to judge the degree of similarity between documents.
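As a rough illustration of the two ideas named in the abstract, the sketch below builds word-level shingles (contiguous runs of k tokens) for two short texts and compares the resulting sets with the Jaccard coefficient, |A ∩ B| / |A ∪ B|. It is not taken from the paper: the function names `word_shingles` and `jaccard`, the whitespace tokenization, the lowercasing, and the choice k = 3 are all assumptions made for the example, and the paper's own shingling variants may differ.

```python
def word_shingles(text, k=3):
    """Return the set of k-word shingles of a text.

    Assumption for this sketch: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets: size of intersection over size of union."""
    if not a and not b:
        # Convention chosen here: two empty sets are considered identical.
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "the quick brown fox jumps"
doc2 = "the quick brown fox sleeps"

s1 = word_shingles(doc1, k=3)
s2 = word_shingles(doc2, k=3)

# Each text yields 3 shingles; 2 are shared, and the union has 4.
similarity = jaccard(s1, s2)
print(similarity)  # 0.5
```

A value near 1 suggests the two documents are near duplicates, while a value near 0 suggests little overlap. The threshold at which a pair counts as a near duplicate is a design choice that this sketch leaves open.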

Article Details

Published

2013-11-30

Section

Articles

How to Cite

Identification of level of resemblance between web based documents. (2013). International Journal of Engineering and Computer Science, 2(11). http://www.ijecs.in/index.php/ijecs/article/view/2123