• Title/Summary/Keyword: Pattern Mining

Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN

  • Liu, Gaoyang;Niu, Yanbo;Zhao, Weijian;Duan, Yuanfeng;Shu, Jiangpeng
    • Smart Structures and Systems / v.29 no.1 / pp.53-62 / 2022
  • Advanced structural health monitoring (SHM) systems deployed in large-scale civil structures collect large amounts of data. These data may contain multiple types of anomalies (e.g., missing, minor, outlier) caused by harsh environments, sensor faults, transfer omissions and other factors, and such anomalies seriously affect the evaluation of structural performance. Therefore, the effective analysis and mining of SHM data is an extremely important task. Inspired by the deep learning paradigm, this study develops a novel data anomaly detection approach for SHM based on a generative adversarial network (GAN) and a convolutional neural network (CNN). The framework of the proposed approach includes three modules: (a) a three-channel input is established based on the fast Fourier transform (FFT) and the Gramian angular field (GAF) method; (b) a GANomaly is introduced and trained to extract features from normal samples alone, addressing the class-imbalance problem; (c) based on the output of GANomaly, a CNN is employed to distinguish the types of anomalies. In addition, a dataset-oriented method (i.e., multistage sampling) is adopted to obtain the optimal sampling ratios between the different sample types. The proposed approach is tested with acceleration data from the SHM system of a long-span bridge. The results show that the proposed approach achieves higher accuracy in detecting the multi-pattern anomalies of SHM data.
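
As a rough illustration of the GAF step in module (a), the sketch below encodes a short 1-D signal as a Gramian angular summation field. The abstract does not say whether the summation or the difference variant is used, so the summation form is an assumption, and the function name and the sample signal are hypothetical. The FFT channel is not shown.

```python
import math

def gramian_angular_field(series):
    """Return the Gramian angular summation field (GASF) of a 1-D series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp to guard against floating-point overshoot before arccos.
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), giving a symmetric image.
    return [[math.cos(a + b) for b in phi] for a in phi]

# Hypothetical acceleration samples, used only to exercise the function.
gasf = gramian_angular_field([0.0, 1.0, 2.0, 1.0, 0.0])
```

The result is a square matrix with one row and column per sample, which is what makes it usable as an image channel for a CNN.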

Evaluation of blasting vibration with center-cut methods for tunnel excavation

  • Lee, Seung-Joong;Kim, Byung-Ryeol;Choi, Sung-Oong;Kim, Nam-Soo
    • Geomechanics and Engineering / v.30 no.5 / pp.423-435 / 2022
  • Ground vibration generated repeatedly at tunnel blasting excavation sites is known to be one of the major hazards induced by blasting operations. Various theoretical and empirical studies have been conducted to minimize these hazards, using electronic detonators, the deck charge method, and the center-cut method, among others. Among these existing methods for controlling ground vibration, this study investigates the cut method. In particular, we analyzed and compared the V-cut method, which is commonly used in tunnel blasting, with the double-drilled parallel method, which has recently been introduced at tunnel excavation sites. To understand the rock fragmentation efficiency as well as the ground vibration controllability of the two methods, we performed in-situ field blasting tests with both cut methods at a tunnel excavation site. Additionally, numerical analysis with FLAC3D was carried out for a better understanding of the fracture propagation pattern and ground vibration generation of each cut method. Ground vibration levels, expressed as the peak particle velocities (PPVs) measured in the field blasting tests and the PPVs estimated in the numerical simulations, were lower for the double-drilled parallel method than for the V-cut method, although the exact values differ considerably between field measurement and numerical estimation.

Power Consumption Patterns Analysis Using Expectation-Maximization Clustering Algorithm and Emerging Pattern Mining (기대치-최대화 군집 알고리즘과 출현 패턴 마이닝을 이용한 전력 소비 패턴 분석)

  • Jin Hyoung Park;Heon Gyu Lee;Jin-Ho Shin;Keun Ho Ryu;Hiseok Kim
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.261-264 / 2008
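  • For the efficient operation of electric power companies and for competition in the electricity market, customers' power consumption patterns must be analyzed and accurately predicted. To this end, this paper proposes a mining technique that can accurately predict customers' power consumption patterns, using nationwide high-voltage customer data collected by an automatic meter reading system. First, nine feature vectors were extracted to accurately distinguish the load patterns suited to the customer characteristics of each domestic contract type, and the expectation-maximization clustering algorithm was used to generate 34 representative load profiles of customers. Finally, a classification model based on emerging patterns was constructed for each representative profile from the extracted feature vectors, and customers' power consumption patterns were classified. Experiments on data from a total of 3,895 high-voltage customers measured by the domestic automatic meter reading system showed a classification accuracy of about 91%.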
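
The expectation-maximization clustering step can be sketched on a toy scale as follows. This is a minimal 1-D Gaussian mixture with two components, not the nine-feature, 34-profile setup of the paper; the function name, the initial means, and the load values are all hypothetical.

```python
import math

def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing component
    return weights, means, variances, resp

# Hypothetical normalized peak loads for eight customers (two obvious groups).
loads = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
weights, means, variances, resp = em_gmm_1d(loads, means=[0.0, 6.0])
# Assign each customer to the component with the highest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

Each fitted component mean plays the role of a representative load level; in a multi-feature setting the means would be vectors, i.e. representative profiles.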

Natural frequency analysis of joined conical-cylindrical-conical shells made of graphene platelet reinforced composite resting on Winkler elastic foundation

  • Xiangling Wang;Xiaofeng Guo;Masoud Babaei;Rasoul Fili;Hossein Farahani
    • Advances in nano research / v.15 no.4 / pp.367-384 / 2023
  • The natural frequency behavior of graphene platelet reinforced composite (GPL-RC) joined truncated conical-cylindrical-conical shells resting on a Winkler-type elastic foundation is presented in this paper for the first time. The rule of mixture and the modified Halpin-Tsai approach are applied to obtain the mechanical properties of the structure. Four different graphene platelet distribution patterns along the thickness of the structure are considered: GPLA, GPLO, GPLX, and GPLUD. A finite element procedure based on the Rayleigh-Ritz formulation is used to solve the 2D axisymmetric elasticity equations. Unlike simple shell theories, 2D axisymmetric elasticity theory allows thickness stretching, which gives more accurate results, especially for thick shells. A parametric investigation is also presented to show the effects of various geometric variables, three different boundary conditions, the stiffness of the elastic foundation, and the dispersion pattern and weight fraction of the GPL nanofillers on the natural frequencies of the joined shell. Results show that GPLO and BC3 provide the greatest rigidity and therefore the highest natural frequencies among the BCs and GPL patterns considered. Also, increasing the weight fraction of nanofillers raises the natural frequencies by up to 200%.

Power Load Pattern Classification from AMR Data (AMR 데이터에서의 전력 부하 패턴 분류)

  • Piao, Minghao;Park, Jin-Hyung;Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2008.05a / pp.231-234 / 2008
  • This paper presents an automated methodology based on data mining techniques for the prediction of customer load patterns in load demand data. The main aim of our work is to predict customers' contract information from the capacity of their daily power consumption patterns. Based on the result, we evaluate the suitability of the contract information. Our proposed approach consists of three stages: (i) data preprocessing: noise and outliers are detected and removed; (ii) cluster analysis: SOM clustering is used to create load patterns and the representative load profiles; and (iii) classification: a k-NN classifier is applied to predict the customers' contract information based on power consumption patterns. In the proposed methodology, the power load measured by the AMR (automatic meter reading) system, as well as customer indexes, were used as inputs. The output was the classification of representative load profiles (or classes). Lastly, in order to evaluate the k-NN classification technique, the proposed methodology was applied to a set of high-voltage customers of the Korean power system, and the results of our experiments are presented.
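
Stage (iii) can be illustrated with a minimal k-NN classifier over load profiles. The Euclidean distance, the four-point profiles, and the contract labels below are assumptions made for the example; the abstract does not specify the distance measure or the feature layout, and the SOM clustering stage is not shown.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training profiles closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical normalized daily load profiles with contract-type labels.
train = [
    ([0.9, 0.9, 0.8, 0.9], "industrial"),
    ([0.8, 0.9, 0.9, 0.8], "industrial"),
    ([0.2, 0.8, 0.9, 0.3], "commercial"),
    ([0.1, 0.9, 0.8, 0.2], "commercial"),
    ([0.3, 0.7, 0.9, 0.2], "commercial"),
]

flat_profile = knn_predict(train, [0.85, 0.9, 0.85, 0.85])
peaked_profile = knn_predict(train, [0.2, 0.9, 0.9, 0.2])
```

A predicted contract type that differs from the registered one would be the kind of mismatch the suitability evaluation is concerned with.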

DATA MINING APPROACH TO PARAMETRIC COST ESTIMATE IN EARLY DESIGN STAGE AND ANALYTICAL CHARACTERIZATION ON OLAP (ON-LINE ANALYTICAL PROCESSING)

  • JaeHo Cho;HyunKyun Jung;JaeYoul Chun
    • International conference on construction engineering and project management / 2011.02a / pp.176-181 / 2011
  • The role of a cost modeler is to facilitate the design process through the systematic application of cost factors so as to maintain sensible and economic relationships between cost, quantity, utility and appearance. These relationships help to achieve the client's requirements within an agreed budget. The purpose of this study is to develop a parametric cost estimating model for the early design stage by using the multi-dimensional system of OLAP (On-line Analytical Processing), based on case quantity data related to architectural design features. Parametric cost estimating models have been adopted to support decision making in the early design stage. These models typically use a similar instance or a pattern of historical cases. To use this type of data model effectively, data classification and prediction methods must be established. One such method is to find the similar class according to an attribute selection measure in the multi-dimensional data model. Therefore, this research analyzes the relevant attributes influenced by architectural design features, using the case-based quantity data employed in the parametric cost estimating model. The relevant attributes can be analyzed by Analytical Characterization, which helps determine which attributes should be included in the OLAP multi-dimensional model.
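
One common attribute selection measure is information gain, sketched below as a way to rank candidate dimensions by relevance to a target class. The abstract does not name the measure it uses, so information gain is an assumption here, and the attribute names and case records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical historical cases: design features and a cost class.
cases = [
    {"structure": "RC", "basement": "yes", "cost_class": "high"},
    {"structure": "RC", "basement": "no", "cost_class": "high"},
    {"structure": "steel", "basement": "yes", "cost_class": "low"},
    {"structure": "steel", "basement": "no", "cost_class": "low"},
]

gain_structure = information_gain(cases, "structure", "cost_class")
gain_basement = information_gain(cases, "basement", "cost_class")
```

In this toy data the structure type fully determines the cost class while the basement flag carries no information, so only the former would be kept as a dimension.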

Effects of Preprocessing on Text Classification in Balanced and Imbalanced Datasets

  • Mehmet F. Karaca
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.591-609 / 2024
  • In this study, all combinations of preprocessing methods were examined in terms of their effects on reducing the number of words, shortening the processing time, and classification success, in balanced datasets and in datasets imbalanced at different ratios. The decreases in the number of words and in the processing time provided by preprocessing were interrelated. Classification was more successful with the Turkish datasets, and the English datasets were affected more by whether the dataset was balanced. In highly imbalanced datasets, incorrect classifications in classes with few documents resulted from assignment to a topically close class in the Turkish datasets and to a class with many documents in the English datasets. In terms of average scores, the highest classification success was obtained in the Turkish datasets by not applying lowercase conversion, applying stemming, and removing stop words, and in the English datasets by applying lowercase conversion and stemming and removing stop words. Stemming was the most important preprocessing method for increasing success in the Turkish datasets, whereas stop word removal was the most important in the English datasets. The maximum scores revealed that feature selection, feature size and classifier are more influential than preprocessing in classification success. It was concluded that preprocessing is necessary for text classification because it shortens the processing time and can achieve high classification success, that a preprocessing method does not have the same effect in all languages, and that different preprocessing methods are more successful for different languages.
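
The idea of testing every combination of preprocessing steps can be sketched with three on/off switches (lowercase conversion, stop word removal, stemming). The stop word list and the suffix-stripping stemmer below are deliberately tiny stand-ins, not the resources used in the study, and the sample sentence is hypothetical.

```python
import re
from itertools import product

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "in"}  # toy list (assumption)
SUFFIXES = ("ing", "ed", "s")  # naive stemmer (assumption), not a real algorithm

def strip_suffix(token):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text, lowercase=True, remove_stop_words=True, stem=True):
    """Tokenize text and apply the selected preprocessing steps."""
    if lowercase:
        text = text.lower()
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    if stem:
        tokens = [strip_suffix(t) for t in tokens]
    return tokens

sample = "The classifiers are processing the documents"
# Vocabulary size for each of the 2^3 = 8 on/off combinations.
vocab_sizes = {
    flags: len(set(preprocess(sample, *flags)))
    for flags in product([False, True], repeat=3)
}
```

Comparing `vocab_sizes` across the eight flag combinations mirrors, on a toy scale, the word-count reduction that the study measures per preprocessing combination.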

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative means for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industry because it can be used to extract various issues from text such as news and social network services (SNS) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. We then applied the proposed methodology to a real case of 53,739 news articles and derived an issue flow diagram from the articles. We propose the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, time of writing, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future work.
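
Linking issues across two consecutive periods by similarity can be sketched as below. Cosine similarity over keyword weights and the 0.4 threshold are assumptions, since the abstract does not state the similarity measure; the issue names and keyword weights are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def issue_links(prev_issues, next_issues, threshold=0.4):
    """Return (previous issue, next issue, similarity) edges above threshold."""
    links = []
    for p, pv in prev_issues.items():
        for n, nv in next_issues.items():
            sim = cosine(pv, nv)
            if sim >= threshold:
                links.append((p, n, round(sim, 3)))
    return links

# Hypothetical issues (topic keyword weights) in two consecutive periods.
period1 = {
    "A": {"nuclear": 0.8, "test": 0.6},
    "B": {"separated": 0.7, "families": 0.7},
}
period2 = {
    "C": {"nuclear": 0.6, "sanctions": 0.8},
    "D": {"families": 0.9, "reunion": 0.4},
}

links = issue_links(period1, period2)
```

An issue in the earlier period with no outgoing edge would correspond to one that appears and then promptly disappears, and several edges into one later issue would correspond to a merge.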

A Case Study of a Text Mining Method for Discovering Evolutionary Patterns of Mobile Phone in Korea (국내 휴대폰의 진화패턴 규명을 위한 텍스트 마이닝 방안 제안 및 사례 연구)

  • On, Byung-Won
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.29-45 / 2015
  • Systematic theories, concepts, and methodologies for biological evolution have been developed, and the patterns and principles of evolution have been actively studied over the past 200 years. They have also been applied to various fields such as evolutionary economics, evolutionary psychology, and evolutionary linguistics, leading to significant progress in research. In addition, existing studies have applied the main biological evolutionary models to artifacts, although such models do not fit them well. These models are also limited in generalizing the evolutionary patterns of artifacts because they are designed from the subjective point of view of experts who know the artifacts well. Because artifacts, unlike biological organisms, are likely to reflect human imagination and will, it is known that the theory of biological evolution cannot be directly applied to them. In this paper, the aim of our research is to go beyond individual subjectivity and present the evolutionary patterns of a given artifact based on the ideas of the public. To this end, we propose a text mining approach that provides a systematic framework for finding the evolutionary patterns of a given artifact and visualizing them effectively. In particular, we focus on a case study of the mobile phone, which has emerged as an icon of innovation in recent years. We collect and analyze review posts on mobile phones available in the domestic market over the past decade, and discuss the detailed results on the evolutionary patterns of the mobile phone. Conventionally, this kind of task is tedious and takes a long time, because a small number of experts must carry out an extensive literature survey and summarize a huge amount of material to finally draw a diagram of the evolutionary patterns of the mobile phone. In this work, however, we present a semi-automatic mining algorithm to minimize human effort; through this research we can understand how human creativity and imagination are realized. In addition, the approach is of great help in predicting future trends of mobile phones in business and industry.

Trend Analysis of Barrier-free Academic Research using Text Mining and CONCOR (텍스트 마이닝과 CONCOR을 활용한 배리어 프리 학술연구 동향 분석)

  • Jeong-Ki Lee;Ki-Hyok Youn
    • Journal of Internet of Things and Convergence / v.9 no.2 / pp.19-31 / 2023
  • The importance of barrier-free environments is being highlighted worldwide. This study attempted to identify barrier-free research trends using text mining, with the aim of supporting research and policies for creating a barrier-free environment. The data analyzed are 227 papers published in domestic academic journals from 1996, when barrier-free research began, to 2022. The researchers converted the title, keywords, and abstract of each paper into text, and then analyzed the patterns of the papers and the meaning of the data. The research results are summarized as follows. First, barrier-free research began to increase after 2009, with an annual average of 17.1 papers being published. This is related to the implementation guidelines for the barrier-free certification system that took effect on July 15, 2008. Second, the barrier-free text mining gave the following results. i) In the word frequency analysis of top keywords, the important keywords identified were barrier free, disabled, design, universal design, access, elderly, certification, improvement, evaluation, space, facility, and environment. ii) In the TF-IDF analysis, the main keywords were universal design, design, certification, house, access, elderly, installation, disabled, park, evaluation, architecture, and space. iii) In the N-gram analysis, barrier free+certification, barrier free+design, barrier free+barrier free, elderly+disabled, disabled+elderly, disabled+convenience facilities, the disabled+the elderly, society+the elderly, convenience facilities+installation, certification+evaluation index, physical+environment, life+quality, etc. appeared as related word pairs. Third, in the CONCOR analysis, cluster 1 was barrier-free issues and challenges, cluster 2 was universal design and space utilization, cluster 3 was improving accessibility for the disabled, and cluster 4 was barrier-free certification and evaluation. Based on the analysis results, this study presented policy implications for vitalizing barrier-free research and establishing a desirable barrier-free environment.
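
The TF-IDF and N-gram analyses can be sketched on a toy corpus as follows. The weighting shown (term frequency times the natural log of the inverse document frequency) is one common variant and is an assumption, as the abstract does not give the exact formula; the three tokenized documents are hypothetical.

```python
import math
from collections import Counter

def bigrams(tokens):
    """Adjacent token pairs (2-grams) of a token list."""
    return list(zip(tokens, tokens[1:]))

def tf_idf(docs):
    """Per-document TF-IDF scores: (count / doc length) * ln(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

# Hypothetical tokenized titles, used only to exercise the functions.
docs = [
    ["barrier", "free", "certification"],
    ["barrier", "free", "design"],
    ["universal", "design", "space"],
]

scores = tf_idf(docs)
top_term_doc0 = max(scores[0], key=scores[0].get)
bigram_counts = Counter(pair for doc in docs for pair in bigrams(doc))
top_bigram = bigram_counts.most_common(1)[0]
```

Terms that occur in every document score zero under this weighting, which is why frequent but uninformative words drop out of a TF-IDF keyword ranking even though they lead a raw frequency count.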