We introduce a random planted model of bi-categorical data to model the problem of collaborative filtering or categorical clustering. We adapt the ideas of an algorithm due to Condon and Karp  to develop a simple linear time algorithm to discover the underlying hidden structure of a graph generated according to the planted model with high probability. We also give applications to the probabilistic analysis of Latent Semantic Indexing (LSI) in the probabilistic corpus models introduced by Papadimitriou et al . We carry out an experimental analysis that shows that the algorithm might work quite well in practice.
Download Full PDF Version (Non-Commercial Use)