Analysis and Experimental Evaluation of a Simple Algorithm for Collaborative Filtering in Planted Partition Models: Extended Abstract

Abstract

We introduce a random planted model of bi-categorical data to model the problem of collaborative filtering or categorical clustering. We adapt the ideas of an algorithm due to Condon and Karp [4] to develop a simple linear time algorithm to discover the underlying hidden structure of a graph generated according to the planted model with high probability. We also give applications to the probabilistic analysis of Latent Semantic Indexing (LSI) in the probabilistic corpus models introduced by Papadimitriou et al [12]. We carry out an experimental analysis that shows that the algorithm might work quite well in practice.

Topics

    1 Figures and Tables

    Download Full PDF Version (Non-Commercial Use)