• Title/Summary/Keyword: data pattern

Search Result 8,369, Processing Time 0.035 seconds

Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

  • Choi, Semok;Park, Cheong Hee
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.9
    • /
    • pp.1181-1190
    • /
    • 2020
  • Detection of an anomaly pattern deviating normal data distribution in streaming data is an important technique in many application areas. In this paper, a method for detection of an newly emerging pattern in text streaming data which is an ordered sequence of texts is proposed based on text embedding and anomaly pattern detection. Using text embedding methods such as BOW(Bag Of Words), Word2Vec, and BERT, the detection performance of the proposed method is compared. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85 and the F1 value of 1 in three cases among five test cases.

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.3
    • /
    • pp.55-75
    • /
    • 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database, and it is one of the essential data mining tasks widely used in various application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. In general sequential pattern mining, only the generation order of data element in a sequence is considered, so that it can easily find simple sequential patterns, but has a limit to find more interesting sequential patterns being widely used in real world applications. One of the essential research topics to compensate the limit is a topic of weighted sequential pattern mining. In weighted sequential pattern mining, not only the generation order of data element but also its weight is considered to get more interesting sequential patterns. In recent, data has been increasingly taking the form of continuous data streams rather than finite stored data sets in various application fields, the database research community has begun focusing its attention on processing over data streams. The data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once to analyze the data stream, and the memory usage for data stream analysis should be restricted finitely although new data elements are continuously generated in a data stream. Moreover, newly generated data elements should be processed as fast as possible to produce the up-to-date analysis result of a data stream, so that it can be instantly utilized upon request. To satisfy these requirements, data stream processing sacrifices the correctness of its analysis result by allowing some error. Considering the changes in the form of data generated in real world application fields, many researches have been actively performed to find various kinds of knowledge embedded in data streams. They mainly focus on efficient mining of frequent itemsets and sequential patterns over data streams, which have been proven to be useful in conventional data mining for a finite data set. In addition, mining algorithms have also been proposed to efficiently reflect the changes of data streams over time into their mining results. However, they have been targeting on finding naively interesting patterns such as frequent patterns and simple sequential patterns, which are found intuitively, taking no interest in mining novel interesting patterns that express the characteristics of target data streams better. Therefore, it can be a valuable research topic in the field of mining data streams to define novel interesting patterns and develop a mining method finding the novel patterns, which will be effectively used to analyze recent data streams. This paper proposes a gap-based weighting approach for a sequential pattern and amining method of weighted sequential patterns over sequence data streams via the weighting approach. A gap-based weight of a sequential pattern can be computed from the gaps of data elements in the sequential pattern without any pre-defined weight information. That is, in the approach, the gaps of data elements in each sequential pattern as well as their generation orders are used to get the weight of the sequential pattern, therefore it can help to get more interesting and useful sequential patterns. Recently most of computer application fields generate data as a form of data streams rather than a finite data set. Considering the change of data, the proposed method is mainly focus on sequence data streams.

Conceptual Pattern Matching of Time Series Data using Hidden Markov Model (은닉 마코프 모델을 이용한 시계열 데이터의 의미기반 패턴 매칭)

  • Cho, Young-Hee;Jeon, Jin-Ho;Lee, Gye-Sung
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.5
    • /
    • pp.44-51
    • /
    • 2008
  • Pattern matching and pattern searching in time series data have been active issues in a number of disciplines. This paper suggests a novel pattern matching technology which can be used in the field of stock market analysis as well as in forecasting stock market trend. First, we define conceptual patterns, and extract data forming each pattern from given time series, and then generate learning model using Hidden Markov Model. The results show that the context-based pattern matching makes the matching more accountable and the method would be effectively used in real world applications. This is because the pattern for new data sequence carries not only the matching itself but also a given context in which the data implies.

An Analysis on the Design Factors for Pants Pattern - Focused on Crotch Region - (팬츠패턴의 설계요인 분석 - 밑위관련 항목을 중심으로 -)

  • Moon, Jee-Hyun;Jeon, Eun-Kyung
    • Fashion & Textile Research Journal
    • /
    • v.13 no.3
    • /
    • pp.382-389
    • /
    • 2011
  • Pants is an item of clothes of which physical suitability for improvement of appearance and movement is greatly emphasized. The pattern design of pants is very important because the fitness of crotch region, which is difficult to measure, is structurally essential in pants. This study gives detailed pattern development data for the future pattern design by considering design factors related with crotch region of pants and by suggesting related data. For this study, we measured and analyzed the measuring size of each parts of pattern and the design methods related with crotch region selecting 20 textbooks used in universities and institutes of higher education. Through this study, crotch region of pants pattern related design data were obtained and this result could be a primary information in development and education of pants pattern.

Recent Technique Analysis, Infant Commodity Pattern Analysis Scenario and Performance Analysis of Incremental Weighted Maximal Representative Pattern Mining (점진적 가중화 맥시멀 대표 패턴 마이닝의 최신 기법 분석, 유아들의 물품 패턴 분석 시나리오 및 성능 분석)

  • Yun, Unil;Yun, Eunmi
    • Journal of Internet Computing and Services
    • /
    • v.21 no.2
    • /
    • pp.39-48
    • /
    • 2020
  • Data mining techniques have been suggested to find efficiently meaningful and useful information. Especially, in the big data environments, as data becomes accumulated in several applications, related pattern mining methods have been proposed. Recently, instead of analyzing not only static data stored already in files or databases, mining dynamic data incrementally generated in a real time is considered as more interesting research areas because these dynamic data can be only one time read. With this reason, researches of how these dynamic data are mined efficiently have been studied. Moreover, approaches of mining representative patterns such as maximal pattern mining have been proposed since a huge number of result patterns as mining results are generated. As another issue, to discover more meaningful patterns in real world, weights of items in weighted pattern mining have been used, In real situation, profits, costs, and so on of items can be utilized as weights. In this paper, we analyzed weighted maximal pattern mining approaches for data generated incrementally. Maximal representative pattern mining techniques, and incremental pattern mining methods. And then, the application scenarios for analyzing the required commodity patterns in infants are presented by applying weighting representative pattern mining. Furthermore, the performance of state-of-the-art algorithms have been evaluated. As a result, we show that incremental weighted maximal pattern mining technique has better performance than incremental weighted pattern mining and weighted maximal pattern mining.

Development of 2D Tight-fitting Pattern from 3D Scan Data (3D 스캔 데이터를 활용한 밀착 패턴원형 개발)

  • Jeong, Yeon-Hee;Hong, Kyung-Hi
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.30 no.1 s.149
    • /
    • pp.157-166
    • /
    • 2006
  • The human body, which is composed of concave and convex curvatures, makes it difficult to transfer into 2D patterns directly from 3D data. In previous studies. Jeong, et al.(2004) suggested the block method was fester and easier when dealing with the triangular patches of male's upper dress form. Although the block method is useful to make a pattern, the information(area, length, etc.) from a 2D pattern would be different depending on the direction of the block method. As a result horizontal and diagonal block methods were suggested as optimal methods for 2D tight-fitting patterns. These block methods were closer to the original area of the 3D scan data than the vertical block method. The total area of the 2D pattern obtained by the horizontal and diagonal block methods showed little differences. In case of the horizontal and diagonal block methods, the total error of the 2D pattern area ranged from $0.01\%\~0.25\%$. In comparing the length of the 2D pattern with that of the 3D scan data, the obtained 2D pattern was $0.1\~0.2cm$ shorter than the 3D scan data, which was within the acceptable range of errors in making clothes. 3D space distribution images between the body surface and the experimental clothing were also measured and $3\%$ enlargement of the original pattern was verified as the adequate adjustment.

Trend-based Sequential Pattern Discovery from Time-Series Data (시계열 데이터로부터의 경향성 기반 순차패턴 탐색)

  • 오용생;이동하;남도원;이전영
    • Journal of Intelligence and Information Systems
    • /
    • v.7 no.1
    • /
    • pp.27-45
    • /
    • 2001
  • Sequential discovery from time series data has mainly concerned about events or item sets. Recently, the research has stated to applied to the numerical data. An example is sensor information generated by checking a machine state. The numerical data hardly have the same valuers while making patterns. So, it is important to extract suitable number of pattern features, which can be transformed to events or item sets and be applied to sequential pattern mining tasks. The popular methods to extract the patterns are sliding window and clustering. The results of these methods are sensitive to window sine or clustering parameters; that makes users to apply data mining task repeatedly and to interpret the results. This paper suggests the method to retrieve pattern features making numerical data into vector of an angle and a magnitude. The retrieved pattern features using this method make the result easy to understand and sequential patterns finding fast. We define an inclusion relation among pattern features using angles and magnitudes of vectors. Using this relation, we can fad sequential patterns faster than other methods, which use all data by reducing the data size.

  • PDF

User Modeling Using User Preference and User Life Pattern Based on Personal Bio Data and SNS Data

  • Song, Hyejin;Lee, Kihoon;Moon, Nammee
    • Journal of Information Processing Systems
    • /
    • v.15 no.3
    • /
    • pp.645-654
    • /
    • 2019
  • The purpose of this study was to collect and analyze personal bio data and social network services (SNS) data, derive user preference and user life pattern, and propose intuitive and precise user modeling. This study not only tried to conduct eye tracking experiments using various smart devices to be the ground of the recommendation system considering the attribute of smart devices, but also derived classification preference by analyzing eye tracking data of collected bio data and SNS data. In addition, this study intended to combine and analyze preference of the common classification of the two types of data, derive final preference by each smart device, and based on user life pattern extracted from final preference and collected bio data (amount of activity, sleep), draw the similarity between users using Pearson correlation coefficient. Through derivation of preference considering the attribute of smart devices, it could be found that users would be influenced by smart devices. With user modeling using user behavior pattern, eye tracking, and user preference, this study tried to contribute to the research on the recommendation system that should precisely reflect user tendency.

Light-weight Preservation of Access Pattern Privacy in Un-trusted Storage

  • Yang, Ka;Zhang, Jinsheng;Zhang, Wensheng;Qiao, Daji
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.2 no.5
    • /
    • pp.282-296
    • /
    • 2013
  • With the emergence of cloud computing, more and more sensitive user data are outsourced to remote storage servers. The privacy of users' access pattern to the data should be protected to prevent un-trusted storage servers from inferring users' private information or launching stealthy attacks. Meanwhile, the privacy protection schemes should be efficient as cloud users often use thin client devices to access the data. In this paper, we propose a lightweight scheme to protect the privacy of data access pattern. Comparing with existing state-of-the-art solutions, our scheme incurs less communication and computational overhead, requires significantly less storage space at the user side, while consuming similar storage space at the server. Rigorous proofs and extensive evaluations have been conducted to show that the proposed scheme can hide the data access pattern effectively in the long run after a reasonable number of accesses have been made.

  • PDF

Overload Criteria of Distribution Transformers Considering the Electric Consumption Patterns of Customers (수용가 전력 소비 패턴을 고려한 배전용 변압기 과부하 판정기준)

  • 윤상윤;김재철
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.53 no.9
    • /
    • pp.513-520
    • /
    • 2004
  • In the paper, we summarize the result of the experimental research for the overload criteria of domestic distribution transformers considering the electric consumption patterns of customers. For the basic characteristic data of distribution transformer overload, the actual experiments are accomplished. The field data of loads are surveyed from sample transformers for analyzing the consumption pattern of customer load. The load data acquisition devices are equipped, and the algorithm of load pattern classification is applied. In addition to this efforts, various load pattern data. in past are gathered. Then the representative load pattern of each customer type in domestic is extracted. The final results of overload criterions are presented as tabular form through the results of experiments and survey are combined. The field test of the experiment results is peformed using the special manufactured transformers, which can measure both the load and top-oil temperature of transformer. Through this, we verify that the results of field test are similar to the laboratory one and the Proposed overload criteria can be effectively applied to the real system.