Factorized similarity learning in networks
2014 IEEE International Conference on Data Mining, 2014 • ieeexplore.ieee.org
The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as topological structure, content, and user supervision. These aspects need to be combined effectively in order to create a holistic similarity function. In particular, while most similarity learning methods for networks, such as SimRank, utilize the topological structure, user supervision and content are rarely considered. In this paper, a Factorized Similarity Learning (FSL) method is proposed to integrate links, node content, and user supervision into a uniform framework. The model is learned by matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The proposed framework is further extended to a noise-tolerant version by adopting a hinge loss instead. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA datasets. The results show that FSL is robust, efficient, and outperforms state-of-the-art methods.
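
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of learning a similarity function through a low-rank factorization. It is not the FSL model from the paper. It assumes each node i gets a small vector U[i] and that the similarity of two nodes is the dot product of their vectors, fitted from supervised pairs with either a squared loss or a hinge loss. Node content and the parallel extension are left out, and all names (`factorize_similarity`, `similarity`, `pairs`, `rank`) are hypothetical.

```python
import random


def similarity(U, i, j):
    """Similarity of nodes i and j as the dot product of their factor vectors."""
    return sum(a * b for a, b in zip(U[i], U[j]))


def factorize_similarity(pairs, n_nodes, rank=2, lr=0.05, epochs=500,
                         loss="squared", seed=0):
    """Fit factors U so that similarity(U, i, j) matches the labels in `pairs`.

    `pairs` is a list of (i, j, y) with y = +1 (similar) or -1 (dissimilar).
    This is an illustrative assumption, not the objective used in the paper.
    """
    rng = random.Random(seed)
    # Small random initialisation of the n_nodes x rank factor matrix.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for i, j, y in pairs:
            pred = similarity(U, i, j)
            if loss == "squared":
                g = pred - y                      # derivative of 0.5 * (pred - y)^2
            else:
                g = -y if y * pred < 1 else 0.0   # subgradient of max(0, 1 - y * pred)
            if g == 0.0:
                continue
            ui, uj = U[i][:], U[j][:]             # copy before the in-place update
            for k in range(rank):
                U[i][k] -= lr * g * uj[k]
                U[j][k] -= lr * g * ui[k]
    return U


# Toy supervision: nodes {0, 1} form one group, nodes {2, 3} another.
pairs = [(0, 1, 1), (2, 3, 1), (0, 2, -1), (0, 3, -1), (1, 2, -1), (1, 3, -1)]
U_sq = factorize_similarity(pairs, n_nodes=4, loss="squared")
U_hinge = factorize_similarity(pairs, n_nodes=4, loss="hinge")
print(round(similarity(U_sq, 0, 1), 2), round(similarity(U_sq, 0, 2), 2))
```

In this sketch the hinge variant stops updating a pair once its margin is satisfied, which is one common reason a hinge loss is less sensitive to noisy labels than a squared loss.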