iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://doi.org/10.1145/2939672.2939761
Keeping it Short and Simple | Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining skip to main content
10.1145/2939672.2939761acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns

Published: 13 August 2016 Publication History

Abstract

We study how to obtain concise descriptions of discrete multivariate sequential data. In particular, how to do so in terms of rich multivariate sequential patterns that can capture potentially highly interesting (cor)relations between sequences. To this end we allow our pattern language to span over the domains (alphabets) of all sequences, allow patterns to overlap temporally, as well as allow for gaps in their occurrences. We formalise our goal by the Minimum Description Length principle, by which our objective is to discover the set of patterns that provides the most succinct description of the data. To discover high-quality pattern sets directly from data, we introduce Ditto, a highly efficient algorithm that approximates the ideal result very well.
Experiments show that Ditto correctly discovers the patterns planted in synthetic data. Moreover, it scales favourably with the length of the data, the number of attributes, the alphabet sizes. On real data, ranging from sensor networks to annotated text, Ditto discovers easily interpretable summaries that provide clear insight in both the univariate and multivariate structure.

References

[1]
R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE, pages 3--14, Los Alamitos, CA, USA, 1995. IEEE Computer Society.
[2]
R. Bathoorn, A. Koopman, and A. Siebes. Reducing the frequent pattern set. In ICDM-Workshop, pages 1--5, 2006.
[3]
R. Bertens and A. Siebes. Characterising seismic data. In SDM, pages 884--892. SIAM, 2014.
[4]
K. Budhathoki and J. Vreeken. The difference and the norm -- characterising similarities and differences between databases. In ECML PKDD. Springer, 2015.
[5]
Y.-C. Chen, J.-C. Jiang, W.-C. Peng, and S.-Y. Lee. An efficient algorithm for mining time interval-based patterns in large database. In CIKM, pages 49--58. ACM, 2010.
[6]
B. Chiu, E. Keogh, and S. Lonardi. Probabilistic discovery of time series motifs. In KDD, pages 493--498. ACM, 2003.
[7]
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience New York, 2006.
[8]
P. Grünwald. The Minimum Description Length Principle. MIT Press, 2007.
[9]
Y. He, J. Wang, and L. Zhou. Efficient incremental mining of frequent sequence generators. In DASFAA, pages 168--182, 2011.
[10]
H. T. Lam, F. Mörchen, D. Fradkin, and T. Calders. Mining compressing sequential patterns. In SDM, 2012.
[11]
H. T. Lam, F. Mörchen, D. Fradkin, and T. Calders. Mining compressing sequential patterns. Statistical Analysis and Data Mining, 7(1):34--52, 2014.
[12]
J. Lin, E. Keogh, P. Patel, and S. Lonardi. Finding motifs in time series. In KDD-TDM, pages 53--68, 2002.
[13]
J. Lin, E. Keogh, L. Wei, and S. Lonardi. Experiencing sax: A novel symbolic representation of time series. Data Min. Knowl. Disc., 15(2):107--144, Oct. 2007.
[14]
H. Mannila and C. Meek. Global partial orders from sequential data. In KDD, pages 161--168, 2000.
[15]
H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. In KDD, pages 181--192, 1994.
[16]
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Min. Knowl. Disc., 1(3):259--289, 1997.
[17]
D. Minnen, C. Isbell, I. Essa, and T. Starner. Detecting subdimensional motifs: An efficient algorithm for generalized multivariate pattern discovery. In ICDM, pages 601--606. IEEE, 2007.
[18]
F. Mörchen and A. Ultsch. Efficient mining of understandable patterns from multivariate interval time series. Data Min. Knowl. Disc., 15(2):181--215, 2007.
[19]
T. Oates and P. R. Cohen. Searching for structure in multiple streams of data. In ICML, pages 346--354, 1996.
[20]
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In ICDT, pages 398--416. ACM, 1999.
[21]
J. Pei, J. Han, B. Mortazaviasl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE TKDE, 16(11):1424--1440, 2004.
[22]
A. Siebes, J. Vreeken, and M. van Leeuwen. Item sets that compress. In SDM, pages 393--404. SIAM, 2006.
[23]
K. Smets and J. Vreeken. The odd one out: Identifying and characterising anomalies. In SDM, pages 804--815. SIAM, 2011.
[24]
K. Smets and J. Vreeken. Slim: Directly mining descriptive patterns. In SDM, pages 236--247. SIAM, 2012.
[25]
Y. Tanaka, K. Iwamoto, and K. Uehara. Discovery of time-series motif from multi-dimensional data based on mdl principle. Mach. Learn., 58(2--3):269--300, 2005.
[26]
N. Tatti. Significance of episodes based on minimal windows. In ICDM, pages 513--522, 2009.
[27]
N. Tatti and B. Cule. Mining closed episodes with simultaneous events. In KDD, pages 1172--1180, 2011.
[28]
N. Tatti and J. Vreeken. The long and the short of it: Summarizing event sequences with serial episodes. In KDD, pages 462--470. ACM, 2012.
[29]
U. Vespier, A. J. Knobbe, S. Nijssen, and J. Vanschoren. MDL-based analysis of time series at multiple time-scales. In ECML PKDD, pages 371--386. Springer, 2012.
[30]
U. Vespier, S. Nijssen, and A. Knobbe. Mining characteristic multi-scale motifs in sensor-based time series. In CIKM, pages 2393--2398. ACM, 2013.
[31]
J. Vreeken. Causal inference by direction of information. In SDM. SIAM, 2015.
[32]
J. Vreeken, M. van Leeuwen, and A. Siebes. Krimp: Mining itemsets that compress. Data Min. Knowl. Disc., 23(1):169--214, 2011.
[33]
J. Wang and J. Han. Bide: Efficient mining of frequent closed sequences. In ICDE, page 79, 2004.
[34]
G. Webb and J. Vreeken. Efficient discovery of the most interesting associations. ACM TKDD, 8(3):1--31, 2014.
[35]
C.-W. Wu, Y.-F. Lin, P. S. Yu, and V. S. Tseng. Mining high utility episodes in complex event sequences. In KDD, pages 536--544. ACM, 2013.

Cited By

View all
  • (2024)Breadth-First Search Approach for Mining Serial Episodes with Simultaneous EventsProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632445(36-44)Online publication date: 4-Jan-2024
  • (2024)IMVis: Visual analytics for influence maximization algorithm evaluation in hypergraphsVisual Informatics10.1016/j.visinf.2024.04.0068:2(13-26)Online publication date: Jun-2024
  • (2023)Visual Analytics of Co-Occurrences to Discover Subspaces in Structured DataACM Transactions on Interactive Intelligent Systems10.1145/357903113:2(1-49)Online publication date: 19-Jun-2023
  • Show More Cited By

Index Terms

  1. Keeping it Short and Simple: Summarising Complex Event Sequences with Multivariate Patterns

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2016
    2176 pages
    ISBN:9781450342322
    DOI:10.1145/2939672
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 August 2016

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. MDL
    2. multivariate patterns
    3. multivariate sequences
    4. summarisation

    Qualifiers

    • Research-article

    Funding Sources

    • Dutch national program COMMIT
    • Cluster of Excellence 'Multimodal Computing and Interaction' within the Excellence Initiative of the German Federal Government

    Conference

    KDD '16
    Sponsor:

    Acceptance Rates

    KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)18
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 03 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Breadth-First Search Approach for Mining Serial Episodes with Simultaneous EventsProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632445(36-44)Online publication date: 4-Jan-2024
    • (2024)IMVis: Visual analytics for influence maximization algorithm evaluation in hypergraphsVisual Informatics10.1016/j.visinf.2024.04.0068:2(13-26)Online publication date: Jun-2024
    • (2023)Visual Analytics of Co-Occurrences to Discover Subspaces in Structured DataACM Transactions on Interactive Intelligent Systems10.1145/357903113:2(1-49)Online publication date: 19-Jun-2023
    • (2023)Efficient Depth-First Search Approach for Mining Injective General EpisodesProceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)10.1145/3570991.3571012(1-9)Online publication date: 4-Jan-2023
    • (2023)FlexEvent: going beyond Case‐Centric Exploration and Analysis of Multivariate Event SequencesComputer Graphics Forum10.1111/cgf.1482042:3(161-172)Online publication date: 27-Jun-2023
    • (2023)Interactive Transformations and Visual Assessment of Noisy Event Sequences: An Application in En-Route Air Traffic Control2023 IEEE 16th Pacific Visualization Symposium (PacificVis)10.1109/PacificVis56936.2023.00017(92-101)Online publication date: Apr-2023
    • (2022)TacticFlow: Visual Analytics of Ever-Changing Tactics in Racket SportsIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2021.311483228:1(835-845)Online publication date: 1-Jan-2022
    • (2022)Discovering Representative Attribute-stars via Minimum Description Length2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00010(68-80)Online publication date: May-2022
    • (2022)The minimum description length principle for pattern mining: a surveyData Mining and Knowledge Discovery10.1007/s10618-022-00846-z36:5(1679-1727)Online publication date: 4-Jul-2022
    • (2022)Omen: discovering sequential patterns with reliable prediction delaysKnowledge and Information Systems10.1007/s10115-022-01660-164:4(1013-1045)Online publication date: 5-Mar-2022
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media