On the Use of Side Information for Text Mining using Clustering and Classification Techniques-A Survey

Authors

Subamanikandan A, Arulmurugan R

Abstract

In many text mining applications, side information is available along with the text documents. Such side information may be of different kinds, such as links in the document, document provenance information, user-access behavior from web logs, or other non-textual attributes. Such attributes may contain a large amount of information that is useful for clustering. However, the relative importance of this information is difficult to estimate, especially when some of it is noisy. In such cases, it can be risky to incorporate side information into the mining process, because it can either improve the quality of the representation for the mining process or add noise to it. In this paper, we design an algorithm which combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We then show how to extend the approach to the classification problem.

Article Details

Published

2014-11-28

Section

Articles

How to Cite

On the Use of Side Information for Text Mining using Clustering and Classification Techniques-A Survey. (2014). International Journal of Engineering and Computer Science, 3(11). http://www.ijecs.in/index.php/ijecs/article/view/2305