Identification of level of resemblance between web based documents
Abstract
One of the biggest challenges on the web today is dealing with the "Big data" problem. Finding documents that are near duplicates of each other is a related challenge, one that Big data itself brings about. This paper focuses on identifying near-duplicate documents using a technique called shingling, and presents the different types of shingling that can be used. It also discusses a measure called the Jaccard coefficient, which can be used to judge the degree of similarity between documents.
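As a rough illustration of the two ideas named in the abstract, the sketch below builds shingle sets for two short texts and compares them with the Jaccard coefficient, |A ∩ B| / |A ∪ B|. It is not taken from the paper: the function names `shingles` and `jaccard`, the choice of word-level shingles, the shingle size `k = 3`, and the sample sentences are all assumptions made for this example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word sequences) in text.

    Word-level shingles and k=3 are illustrative assumptions; character-level
    shingles or another k would work the same way.
    """
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets.

    Two empty sets are treated as identical (1.0) by convention here.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents, used only to exercise the functions.
doc1 = "a rose is a rose is a rose"
doc2 = "a rose is a flower which is a rose"

similarity = jaccard(shingles(doc1), shingles(doc2))
print(f"Jaccard similarity: {similarity:.3f}")
```

A value close to 1 suggests the two texts are near duplicates, while a value close to 0 suggests they share little content. The threshold that counts as "near duplicate" depends on the application and is not fixed by this sketch.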
Article Details
Published
2013-11-30
Section
Articles
How to Cite
Identification of level of resemblance between web based documents. (2013). International Journal of Engineering and Computer Science, 2(11). http://www.ijecs.in/index.php/ijecs/article/view/2123