Accelerating pairwise SimRank estimation over static and dynamic graphs

Wang, Yue; Chen, Lei; Che, Yulin; Luo, Qiong

doi:10.1007/s00778-018-0521-x

Accelerating pairwise SimRank estimation over static and dynamic graphs

Regular Paper
Published: 21 September 2018

Volume 28, pages 99–122, (2019)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Yue Wang ORCID: orcid.org/0000-0002-8618-9806¹,
Lei Chen¹,
Yulin Che¹ &
…
Qiong Luo¹

1005 Accesses
19 Citations
Explore all metrics

Abstract

Measuring similarities among different vertices is a fundamental problem in graph analysis. Among different similarity measurements, SimRank is one of the most promising and popular. In reality, instead of computing the whole similarity matrix, people often issue SimRank queries in a pairwise manner, each of which needs to estimate an approximate SimRank value within a specified accuracy for a given pair of nodes. These pairwise SimRank queries are often processed on real-life graphs, which typically evolve over time, requiring efficient algorithms that can query pairwise SimRank under dynamic graph updates. However, current single-pair SimRank solutions are either static or inefficient in handling dynamic cases with good-quality results. Observing that the sample size is the major factor that determines the efficiency and the accuracy in Monte Carlo methods to estimate pairwise SimRank, in this paper, we propose three algorithms to query pairwise SimRank over static and dynamic graphs efficiently, by using different sample reduction strategies. The accuracy of our algorithms is guaranteed by the different invariants we propose for pairwise SimRank. We show that our algorithms outperform the state-of-the-art static and dynamic solutions for pairwise SimRank estimation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 8

Fig. 9

Fig. 10

Fig. 11

Dynamical SimRank search on time-varying networks

Article Open access 22 November 2017

Hierarchical All-Pairs SimRank Calculation

SimRank*: effective and scalable pairwise similarity search based on graph topology

Article Open access 11 January 2019

Notes

If $|In(a)|=0$, the SimRank score between a and any other node is 0.
Sometimes, the PSE problem is also referred as the single-pair SimRank problem [7, 14, 21].
$\delta $ is the failure probability.
d is the average in-degree of G.
The details of the derivation of the concentration inequality, which determines N, are shown in Sect. 3.
Since $N=\frac{c^{2}\ln {\frac{2}{\delta }}}{2\epsilon ^{2}}= 0.6^{2} \times \frac{\ln {\frac{2}{0.01}}}{2 * (0.01)^{2}} \approx 9537$.
We assume $\epsilon $ is a rationale number, instead of a real number; otherwise, $\epsilon $ is uncountable and the problem is undecidable.
Mathematically, we can view $\mathbf {p} = \overrightarrow{vec}(\mathbf {P})$ and $\mathbf {p} = \overrightarrow{vec}(\mathbf {R})$, where $\mathbf {p}, \mathbf {r} \in {\mathbb {R}}^{n^{2} \times 1}$ and $\mathbf {P},\mathbf {R} \in {\mathbb {R}}^{n \times n}$.
https://snap.stanford.edu/data/index.html.
http://konect.uni-koblenz.de/networks/.
If both algorithms A and B achieve the $\epsilon $-bound, and B has a smaller ME than A, we cannot conclude that B has higher accuracy than A, since $\epsilon $ is a predefined accuracy parameter. In contrast, this indicates that B may have unnecessary redundant computational cost.
Since the ground truth all-pairs SimRank scores of these four small data sets can be computed within several minutes, it would be unaffordable to run the algorithm which has a total running time longer than 1 h.

References

Abbassi, Z., Mirrokni, V.S.: A recommender system based on local random walks and spectral methods. In: WebKDD/SNA-KDD (2007). https://doi.org/10.1145/1348549.1348561
Andersen, R., Chung, F., Lang, K.: Local graph partitioning using pagerank vectors. In: FOCS (2006)
Antonellis, I., Garcia-Molina, H., Chang, C.: Simrank++: query rewriting through link analysis of the click graph. PVLDB 1(1), 408–421, (2008). http://www.vldb.org/pvldb/1/1453903.pdf
Fogaras, D., Rácz, B.: Scaling link-based similarity search. In: WWW (2005)
Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Onizuka, M.: Efficient search algorithm for SimRank. In: ICDE (2013)
He, G., Feng, H., Li, C., Chen, H.: Parallel SimRank computation on large graphs with iterative aggregation. In: KDD (2010). https://doi.org/10.1145/1835804.1835874
He, J., Liu, H., Yu, J.X., Li, P., He, W., Du, X.: Assessing single-pair similarity over graphs by aggregating first-meeting probabilities. Inf. Syst. 42, 107–122 (2014). https://doi.org/10.1016/j.is.2013.12.008
Article Google Scholar
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963). http://www.jstor.org/stable/2282952
Jeh, G., Widom, J.: Simrank: a measure of structural-context similarity. In: KDD (2002)
Jeh, G., Widom, J.: Scaling personalized web search. In: Proceedings of the Twelfth International World Wide Web Conference, WWW 2003, Budapest, Hungary, 20–24 May 2003, pp. 271–279 (2003). https://doi.org/10.1145/775152.775191
Jiang, M., Fu, A.W., Wong, R.C., Wang, K.: READS: a random walk approach for efficient and accurate dynamic SimRank. PVLDB 10(9), 937–948 (2017)
Google Scholar
Kusumoto, M., Maehara, T., Kawarabayashi, K: Scalable similarity search for SimRank. In: SIGMOD (2014). https://doi.org/10.1145/2588555.2610526
Li, C., Han, J., He, G., Jin, X., Sun, Y., Yu, Y., Wu, T.: Fast computation of SimRank for static and dynamic information networks. In: EDBT (2010a). https://doi.org/10.1145/1739041.1739098
Li, P., Liu, H., Yu, J.X., He, J., Du, X.: Fast single-pair SimRank computation. In: Proceedings of the 2010 SIAM International Conference on Data Mining, SIAM, pp. 571–582 (2010b)
Li, Z., Fang, Y., Liu, Q., Cheng, J., Cheng, R., Lui, J.C.S.: Walking in the cloud: parallel simrank at scale. PVLDB 9(1), 24–35 (2015)
Google Scholar
Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol. 58(7), 1019–1031 (2007)
Article Google Scholar
Liu, Y., Zheng, B., He, X., Wei, Z., Xiao, X., Zheng, K., Lu, J.: Probesim: scalable single-source and top-k SimRank computations on dynamic graphs. PVLDB 11(1), 14–26 (2017). http://www.vldb.org/pvldb/vol11/p14-liu.pdf
Lizorkin, D., Velikhov, P., Grinev, M., Turdakov, D.: Accuracy estimate and optimization techniques for SimRank computation. VLDB 1(1), 422–433 (2008)
Google Scholar
Lofgren, P., Banerjee, S., Goel, A.: Personalized PageRank estimation and search: a bidirectional approach. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, ACM, New York (WSDM ’16), pp. 163–172 (2016). https://doi.org/10.1145/2835776.2835823
Lu, J., Gong, Z., Lin, X.: A novel and fast SimRank algorithm. IEEE Trans. Knowl. Data Eng. (2017). https://doi.org/10.1109/TKDE.2016.2626282
Google Scholar
Maehara, T., Kusumoto, M., Kawarabayashi, K.: Efficient SimRank computation via linearization (2014). CoRR arXiv:1411.7228
Mislove, A., Koppula, H.S., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Growth of the flickr social network. In: Proceedings of the First Workshop on Online Social Networks (WOSN 2008), Seattle, 17–22 Aug 2008, pp. 25–30 (2008). https://doi.org/10.1145/1397735.1397742
Shao, Y., Cui, B., Chen, L., Liu, M., Xie, X.: An efficient similarity search framework for SimRank over large dynamic graphs. PVLDB 8(8), 838–849 (2015)
Google Scholar
Spirin, N., Han, J.: Survey on web spam detection: principles and algorithms. SIGKDD Explor. Newsl. 13(2), 50–64 (2012)
Article Google Scholar
Tao, W., Yu, M., Li, G.: Efficient top-k SimRank-based similarity join. PVLDB 8(3):317–328, (2014). http://www.vldb.org/pvldb/vol8/p317-tao.pdf
Tian, B., Xiao, X.: Sling: A near-optimal index structure for SimRank. SIGMOD (2016).https://doi.org/10.1145/2882903.2915243
Wang, Y., Lian, X., Chen, L.: Efficient SimRank tracking in dynamic graphs. In: ICDE (2018)
Yin, X., Han, J., Yu, P.S.: Linkclus: efficient clustering via heterogeneous semantic links. In: VLDB (2006)
Yoon, M., Jin, W., Kang, U.: Fast and accurate random walk with restart on dynamic graphs with guarantees (2017). CoRR arXiv:1712.00595
Yu, W., McCann, J.A.: Efficient partial-pairs SimRank search for large networks. PVLDB 8(5), 569–580 (2015)
Google Scholar
Yu, W., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for simrank computation. WWW 15(3) (2012). https://doi.org/10.1007/s11280-010-0100-6
Yu, W., Lin, X., Zhang, W.: Towards efficient SimRank computation on large networks. In: ICDE, pp. 601–612 (2013a). https://doi.org/10.1109/ICDE.2013.6544859
Yu, W., Lin, X., Zhang, W., Chang, L., Pei, J.: More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB 7(1), 13–24 (2013b)
Google Scholar
Yu, W., Lin, X., Zhang, W.: Fast incremental SimRank on link-evolving graphs. In: ICDE, pp. 304–315 (2014). https://doi.org/10.1109/ICDE.2014.6816660
Yu, W., Lin, X., Zhang, W., McCann, J.A.: Dynamical simrank search on time-varying networks. VLDB J. 27(1), 79–104 (2018). https://doi.org/10.1007/s00778-017-0488-z
Article Google Scholar
Zhao, P., Han, J., Sun, Y.: P-rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, ACM, pp. 553–562 (2009)
Zheng, W., Zou, L., Chen, L., Zhao, D.: Efficient simrank-based similarity join. ACM Trans. Database Syst. 42(3), 16:1–16:37 (2017). https://doi.org/10.1145/3083899
Article MathSciNet Google Scholar

Download references

Acknowledgements

Yue Wang and Lei Chen are supported in part by the Hong Kong RGC GRF Project 16214716, National Grand Fundamental Research 973 Program of China under Grant 2014CB340303, the National Science Foundation of China (NSFC) under Grant No. 61729201, Science and Technology Planning Project of Guangdong Province, China, No. 2015B010110006, Huawei Co.Ltd Collaboration Project, YBCB2009041-45, Hong Kong ITC ITF grants ITS/391/15FX and ITS/212/16FP, Microsoft Research Asia Collaborative Research Grant, and WeChat-HKUST Joint Lab on Artificial Intelligence Technology, RDC-17182280. Yulin Che and Qiong Luo are supported in part by grants 16206414 from the Hong Kong Research Grants Council and MRA11EG01 from Microsoft.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong SAR, China
Yue Wang, Lei Chen, Yulin Che & Qiong Luo

Authors

Yue Wang
View author publications
You can also search for this author in PubMed Google Scholar
Lei Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yulin Che
View author publications
You can also search for this author in PubMed Google Scholar
Qiong Luo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yue Wang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, Y., Chen, L., Che, Y. et al. Accelerating pairwise SimRank estimation over static and dynamic graphs. The VLDB Journal 28, 99–122 (2019). https://doi.org/10.1007/s00778-018-0521-x

Download citation

Received: 14 April 2018
Revised: 09 August 2018
Accepted: 12 September 2018
Published: 21 September 2018
Issue Date: 01 February 2019
DOI: https://doi.org/10.1007/s00778-018-0521-x

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Accelerating pairwise SimRank estimation over static and dynamic graphs

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Dynamical SimRank search on time-varying networks

Hierarchical All-Pairs SimRank Calculation

SimRank*: effective and scalable pairwise similarity search based on graph topology

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Accelerating pairwise SimRank estimation over static and dynamic graphs

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Dynamical SimRank search on time-varying networks

Hierarchical All-Pairs SimRank Calculation

SimRank*: effective and scalable pairwise similarity search based on graph topology

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation