• Title/Summary/Keyword: Pattern mining

Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method (이동 평균 기반 동적 시간 와핑 기법을 이용한 시계열 키워드 데이터의 분류 성능 개선 방안)

  • Jeong, Do-Heon
    • Journal of the Korean Society for Information Management, v.36 no.4, pp.83-105, 2019
  • This study proposes an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. To this end, large-scale Web news data were collected and time series composed of 120 time segments were built. To create a training data set for testing the proposed model, 440 representative keywords were manually classified into 8 trend types. The study introduces Dynamic Time Warping (DTW), a method commonly used in time series analytics, and proposes an applied model, MA-DTW, based on the Moving Average (MA) method, which explains the tendency of a trend curve well. In automatic classification with the k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved the best micro-averaged F1 score of 74.3%. Across all of the experiments, the proposed model outperformed both ED and DTW.
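
The idea of smoothing each series with a moving average before measuring DTW distance, then classifying with kNN, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's MA-DTW implementation: the trailing window, the absolute-difference local cost, the toy series, and the function names (`moving_average`, `dtw_distance`, `ma_dtw_distance`, `knn_classify`) are all hypothetical choices that the abstract does not specify.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average; early points average over the values seen so far."""
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ma_dtw_distance(a: Sequence[float], b: Sequence[float], window: int = 3) -> float:
    """DTW distance computed on moving-average-smoothed series (assumed combination)."""
    return dtw_distance(moving_average(a, window), moving_average(b, window))


def knn_classify(query: Sequence[float],
                 training: List[Tuple[Sequence[float], str]],
                 k: int = 1, window: int = 3) -> str:
    """Majority vote among the k training series closest to the query."""
    ranked = sorted(training, key=lambda item: ma_dtw_distance(query, item[0], window))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy keyword-frequency series with made-up trend labels.
training = [
    ([1, 2, 3, 4, 5, 6], "rising"),
    ([6, 5, 4, 3, 2, 1], "falling"),
    ([1, 3, 6, 3, 1, 1], "peak"),
]
print(knn_classify([1, 1, 2, 3, 5, 6], training))
```

Smoothing first makes the distance respond to the overall shape of the trend curve rather than to point-wise noise, which is the motivation the abstract gives for combining MA with DTW.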

A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information (의미 정보를 이용한 다차원 데이터 시퀀스의 유사성 척도 연구)

  • Lee, Seok-Lyong;Lee, Ju-Hong;Chun, Seok-Ju
    • The KIPS Transactions:PartD, v.10D no.2, pp.283-292, 2003
  • One-dimensional time-series data have been studied in various database applications such as data mining and data warehousing. However, in the current complex business environment, multidimensional data sequences (MDSs) are becoming increasingly important alongside one-dimensional time-series data. For example, a video stream can be modeled as an MDS in a multidimensional space with respect to color and texture attributes. In this paper, we propose effective similarity measures on which similar-pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features. The similarity measures are defined on the basis of these segments. Using the measures, irrelevant segments are pruned from a database with respect to a given query. Both data sequences and query sequences are partitioned into segments, and query processing compares the features of data and query segments instead of scanning all data elements of entire sequences.
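
Segment-level pruning of this kind might look like the sketch below. The abstract does not say which geometric or semantic features are used, so the sketch assumes one simple geometric feature, the axis-aligned bounding box of each fixed-length segment, and prunes data segments whose box lies farther than a threshold from every query box. The fixed segment size, the threshold, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (per-dimension minimum, per-dimension maximum)


def partition(sequence: Sequence[Point], size: int) -> List[Sequence[Point]]:
    """Split a sequence of points into consecutive fixed-length segments."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def bounding_box(segment: Sequence[Point]) -> Box:
    """Axis-aligned bounding box that encloses every point of the segment."""
    dims = range(len(segment[0]))
    low = [min(p[d] for p in segment) for d in dims]
    high = [max(p[d] for p in segment) for d in dims]
    return low, high


def box_distance(a: Box, b: Box) -> float:
    """Minimum Euclidean distance between two boxes (0 when they overlap)."""
    total = 0.0
    for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]):
        gap = max(b_lo - a_hi, a_lo - b_hi, 0.0)
        total += gap * gap
    return math.sqrt(total)


def candidate_segments(data: Sequence[Point], query: Sequence[Point],
                       size: int, threshold: float) -> List[int]:
    """Indices of data segments that survive pruning against the query segments."""
    query_boxes = [bounding_box(s) for s in partition(query, size)]
    kept = []
    for index, segment in enumerate(partition(data, size)):
        box = bounding_box(segment)
        if any(box_distance(box, q) <= threshold for q in query_boxes):
            kept.append(index)
    return kept


data = [(0, 0), (1, 1), (10, 10), (11, 11)]
query = [(0.5, 0.5), (1.5, 1.5)]
print(candidate_segments(data, query, size=2, threshold=1.0))
```

Because the box distance never exceeds the distance between any two points inside the boxes, a segment discarded this way cannot contain a point within the threshold of the query, so only the surviving segments need an element-level comparison.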

An Exploratory Study on Smart-Phone and Service Convergence (스마트폰과 서비스 컨버전스에 대한 탐색적 연구)

  • Rho, Mi-Jung;Kim, Jin-Hwa;Lee, Jae-Beom
    • The Journal of Society for e-Business Studies, v.15 no.4, pp.59-77, 2010
  • The purpose of this study is to examine the relationship between smart-phone use and existing service convergence in order to identify the future direction of convergence patterns in e-business. Association rules were applied to analyze the data and derive the results. The findings are as follows. First, the usage patterns of smart-phones and of existing service convergence are very similar, which means that smart-phone convergence can be predicted from the usage patterns of existing users. Second, the analysis of the convergence patterns of smart-phone usage and existing services shows that linking smart-phones to home networking and office equipment can substantially meet users' requirements. This research is meaningful in that it takes a new approach to the future direction of e-business and the future convergence paradigm by analyzing the relationship between the usage patterns of smart-phone users and existing service convergence.
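
For readers unfamiliar with association rules, the standard measures used to judge a rule (support, confidence, and lift) can be computed as in the sketch below. The service names and usage records are invented for illustration and do not come from the study, and `rule_metrics` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Set


def rule_metrics(transactions: Iterable[Set[str]],
                 antecedent: Set[str], consequent: Set[str]) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    records = [set(t) for t in transactions]
    total = len(records)
    both = sum(1 for t in records if antecedent <= t and consequent <= t)
    has_antecedent = sum(1 for t in records if antecedent <= t)
    has_consequent = sum(1 for t in records if consequent <= t)

    support = both / total if total else 0.0
    confidence = both / has_antecedent if has_antecedent else 0.0
    consequent_rate = has_consequent / total if total else 0.0
    # Lift above 1 means the antecedent makes the consequent more likely than chance.
    lift = confidence / consequent_rate if consequent_rate else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical records: the set of services each user already uses.
usage = [
    {"phone", "camera", "mp3"},
    {"phone", "camera"},
    {"phone", "mp3"},
    {"camera"},
]
print(rule_metrics(usage, {"phone"}, {"camera"}))
```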

A Study on the CBR Pattern using Similarity and the Euclidean Calculation Pattern (유사도와 유클리디안 계산패턴을 이용한 CBR 패턴연구)

  • Yun, Jong-Chan;Kim, Hak-Chul;Kim, Jong-Jin;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering, v.14 no.4, pp.875-885, 2010
  • Case-Based Reasoning (CBR) is a technique for inferring the relationships between existing data and case data, and methods that calculate similarity and Euclidean distance are the most frequently used. However, because these methods compare all existing and case data, they have the drawback that data search and filtering take a long time. Various studies have been conducted to solve this problem. This paper proposes the SE (Speed Euclidean-distance) calculation method, which uses the patterns discovered in the existing process of computing similarity and Euclidean distance. Because SE calculation applies the patterns and weights found while new cases are input, and enables fast data extraction and short operation time, it can increase computing speed under temporal or spatial restrictions and eliminate unnecessary computation. The experiments show that the proposed method improves performance and processing rate across various computing environments compared with the existing methods that extract data using similarity or Euclidean distance.
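
The abstract does not describe how the SE calculation works, so the sketch below does not attempt to reproduce it. It shows instead a different, well-known way to skip unnecessary Euclidean computation during case retrieval: early abandoning, where a partial sum of squared differences is dropped as soon as it can no longer beat the best case found so far. The case vectors and function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def early_abandon_sq_distance(a: Sequence[float], b: Sequence[float],
                              best_so_far: float) -> Optional[float]:
    """Squared Euclidean distance, or None once it reaches best_so_far."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= best_so_far:
            return None  # cannot beat the current best, so stop early
    return total


def nearest_case(query: Sequence[float],
                 cases: List[Sequence[float]]) -> Tuple[Optional[int], float]:
    """Index of the stored case closest to the query, with its distance."""
    best_index, best_sq = None, math.inf
    for index, case in enumerate(cases):
        sq = early_abandon_sq_distance(query, case, best_sq)
        if sq is not None:
            best_index, best_sq = index, sq
    return best_index, math.sqrt(best_sq)


cases = [(0, 0, 0), (1, 1, 1), (5, 5, 5)]
print(nearest_case((1, 1, 2), cases))
```

In this toy run the third case is abandoned after its first coordinate, because that single squared difference already exceeds the best distance found so far.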

A Study on Learning-Path Individualization System for Improving Learning Effects in Web-based Education (웹 기반 교육에서 학습효과 향상을 위한 학습경로 개인화 시스템에 관한 연구)

  • Baek, Jang-hyeon;Kim, Yung-sik
    • The KIPS Transactions:PartA, v.11A no.2, pp.213-222, 2004
  • Today's Web-based teaching-learning is developing in a direction in which learners select and organize the contents, time, and order of learning by themselves. That is, it is evolving to provide a teaching-learning environment adapted to individual learners' characteristics (their level of knowledge, pattern of study, and areas of interest). Among the learner characteristics considered important in the Web-based teaching-learning process, this study analyzed learners' learning paths using the Apriori algorithm and grouped learners who had similar learning paths. Based on the result, the author designed and developed a learning-path individualization system in order to provide learners with learning paths, an interface, the progress of learning, etc. The proposed system is expected to provide an optimal learning environment suited to learners' patterns of study and to enhance individual learners' learning effects.
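
A minimal sketch of the level-wise Apriori search for frequent itemsets follows. It treats each learner's path as an unordered set of visited learning units, which is a simplifying assumption; the unit names, the support threshold, and the way the study encoded paths are not given in the abstract.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def apriori(transactions: Iterable[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    records = [frozenset(t) for t in transactions]
    total = len(records)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in records if itemset <= t) / total

    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in records for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent


# Hypothetical learning paths, each reduced to the set of units visited.
paths = [
    {"intro", "quiz", "video"},
    {"intro", "video"},
    {"intro", "quiz"},
    {"video", "forum"},
]
result = apriori(paths, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Learners could then be grouped by which frequent itemsets their own paths contain, although the abstract does not say how the study performed the grouping.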

Mining Search Keywords for Improving the Accuracy of Entity Search (엔터티 검색의 정확성을 높이기 위한 검색 키워드 마이닝)

  • Lee, Sun Ku;On, Byung-Won;Jung, Soo-Mok
    • KIPS Transactions on Software and Data Engineering, v.5 no.9, pp.451-464, 2016
  • Entity search services such as Google Product Search and Yahoo Pipes have recently been in the spotlight. Entity search engines are used to retrieve web pages relevant to a particular entity. However, if an entity (e.g., Chinatown movie) has various meanings (e.g., Chinatown movies, Chinatown restaurants, and Incheon Chinatown), the accuracy of the search results decreases significantly. To address this problem, we propose a novel method that quantifies the importance of search queries and then offers the best query for the entity search, based on a Frequent Pattern (FP)-Tree and considering the correlation between entity relevance and the frequency of web pages. According to the experimental results presented in this paper, the proposed method (59% average precision) improved accuracy five times compared with the traditional query terms (less than 10% average precision).
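
The FP-Tree structure the method builds on can be sketched as follows. This covers only generic tree construction from keyword sets: how the paper scores query importance from the tree is not described in the abstract and is not attempted here. The keyword sets, the `min_count` threshold, and the names `FPNode` and `build_fp_tree` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


class FPNode:
    """One node of the prefix tree: an item plus how many transactions pass through it."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"] = None) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(transactions: Iterable[List[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, int]]:
    """Build an FP-Tree over the items that occur in at least min_count transactions."""
    records = [set(t) for t in transactions]
    frequency = Counter(item for t in records for item in t)
    kept = {item: c for item, c in frequency.items() if c >= min_count}

    root = FPNode(None)
    for record in records:
        # Order items by descending frequency so common prefixes share nodes.
        ordered = sorted((i for i in record if i in kept), key=lambda i: (-kept[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, kept


# Hypothetical keyword sets taken from pages returned for an ambiguous entity.
pages = [
    ["chinatown", "movie", "1974"],
    ["chinatown", "movie"],
    ["chinatown", "restaurant"],
    ["chinatown", "incheon"],
]
root, kept = build_fp_tree(pages, min_count=2)
print(kept)
```

Frequent keyword combinations then appear as high-count paths from the root, which is what makes the tree a compact basis for comparing candidate queries.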

Shear behavior of non-persistent joints in concrete and gypsum specimens using combined experimental and numerical approaches

  • Haeri, Hadi;Sarfarazi, V.;Zhu, Zheming;Hokmabadi, N. Nohekhan;Moshrefifar, MR.;Hedayat, A.
    • Structural Engineering and Mechanics, v.69 no.2, pp.221-230, 2019
  • In this paper, the shear behavior of a non-persistent joint surrounded by concrete and gypsum layers is investigated using experimental tests and numerical simulation. Two types of mixture were prepared for this study. The first type consists of water and gypsum mixed at a water/gypsum ratio of 0.6. For the second type, water, sand, and cement were mixed at ratios of 27%, 33%, and 40% by weight, respectively. The shear behavior of a non-persistent joint embedded in these specimens is studied. Physical models were made consisting of two edge concrete layers with dimensions of 160 mm by 130 mm by 60 mm and one internal gypsum layer with dimensions of 16 mm by 13 mm by 6 mm. Two horizontal edge joints were embedded in the concrete beams and one angled joint was created in the gypsum layer. Several analyses were conducted with joint angles of $0^{\circ}$, $30^{\circ}$, and $60^{\circ}$. The central fault was placed in three different positions: along the edge joints, 1.5 cm vertically away from the edge joint face, and 3 cm vertically away from the edge joint face. All samples were tested in compression using a universal loading machine, and the shear load was induced by the specimen geometry. Concurrently with the experiments, the extended finite element method (XFEM) was employed to analyze the fracture processes occurring in a non-persistent joint embedded in concrete and gypsum layers using Abaqus, a finite element software platform. The failure pattern of non-persistent cracks (faults) was found to be affected mostly by the central crack and its configuration, and the shear strength was found to be related to the failure pattern. The experimental and corresponding numerical results showed good agreement. XFEM was found to be a capable tool for investigating the fracturing mechanism of rock specimens with non-persistent joints.

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Information Management, v.36 no.1, pp.191-213, 2019
  • Scientific knowledge is obtained through research. Researchers deal with the uncertainty of science and establish the certainty of scientific knowledge. In other words, dealing with uncertainty is an essential step in obtaining scientific knowledge. Existing studies have predominantly examined hedging through linguistic approaches and, in computational linguistics, manually constructed corpora of uncertainty words. They have only been able to identify the characteristics of uncertainty in a particular research field on the basis of simple frequency. Therefore, this study examines patterns of scientific knowledge over time based on uncertainty words in the biomedical literature, where biomedical claims in sentences play an important role. For this purpose, biomedical propositions are analyzed based on semantic predications provided by UMLS, and DMR topic modeling, a useful method for identifying patterns in disciplines, is applied to understand the trends of entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge shows a decreasing pattern.

In-memory Compression Scheme Based on Incremental Frequent Patterns for Graph Streams (그래프 스트림 처리를 위한 점진적 빈발 패턴 기반 인-메모리 압축 기법)

  • Lee, Hyeon-Byeong;Shin, Bo-Kyoung;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association, v.22 no.1, pp.35-46, 2022
  • Recently, with the development of network technologies and the active use of IoT and social network service applications, large amounts of graph stream data are being generated. In this paper, we propose an incremental frequent pattern-based compression scheme for graph streams that takes the stream graph environment into account by applying graph mining to existing compression techniques, which have focused on compression rate and runtime. Because the proposed scheme keeps only the latest reference patterns, it increases storage utilization and improves query processing time. To show the superiority of the proposed scheme, various performance evaluations are performed in terms of compression rate and processing time in comparison with the existing method. The proposed scheme is faster than the existing similar scheme when the amount of duplicated data is large.

Sentiment Classification considering Korean Features (한국어 특성을 고려한 감성 분류)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility, v.13 no.3, pp.449-458, 2010
  • As the need grows to obtain information efficiently from the many documents and reviews on the Internet across many fields, automatic classification of opinions or thoughts is required. This automatic classification is called sentiment classification, and it can be divided into three steps: subjective expression classification, which extracts subjective sentences from documents; sentiment classification, which classifies whether the polarity of documents is positive or negative; and strength classification, which classifies whether documents have weak or strong polarity. Recent studies in opinion mining have used N-gram words, lexical phrase patterns, syntactic phrase patterns, and so on as features, rather than single words. Patterns in particular have been used frequently as features because they are more flexible than N-gram words and more deterministic than single words. These studies are mainly concerned with English, and studies using patterns for Korean are still at an early stage. Although in Korean the meaning of a predicate differs slightly depending on changes in the endings ('Eomi' in Korean) of declinable words, earlier studies of Korean opinion classification removed endings from predicates and extracted only the stems. This study reviews the earlier studies and methods that use patterns for English, uses sentiment patterns extracted from Korean documents, and classifies the polarities of these documents. It also analyzes the influence of changes in endings on the performance of opinion classification.
