• Title/Summary/Keyword: Pattern mining

Identifying the Expression Patterns of Depression Based on the Random Forest (랜덤 포레스트 기반 우울증 발현 패턴 도출)

  • Jeon, Hyeon Jin;Jihn, Chang-Ho
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.44 no.4
    • /
    • pp.53-64
    • /
    • 2021
  • Depression is one of the most important psychiatric disorders worldwide. Most depression-related data mining and machine learning studies have aimed to predict the presence of depression or to identify individual risk factors. However, because depression arises from a combination of various factors, the complex relationships among these factors must be identified to establish effective measures for preventing and managing depression. In this study, we propose a methodology for identifying and interpreting expression patterns of depression by deriving random forest rules, where each rule consists of the condition under which a depressive pattern manifests and the predicted depression outcome when that condition is met. Because depressive patterns differ by gender and age, the analysis was carried out separately for four groups based on these variables. The depression rules derived by the proposed methodology were validated by comparison with the results of previous studies. In addition, an AUC comparison test showed that the depression-diagnosis performance of the derived rules did not differ from that of the existing PHQ-9 summing method. The significance of this study lies in enabling interpretation of the complex relationships among depressive factors, going beyond existing studies that focused on prediction and the identification of major factors.
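
The idea of "random forest rules" can be illustrated by treating every root-to-leaf path of every tree as a rule (a conjunction of threshold conditions plus a predicted class) and then keeping rules that cover enough samples and predict reliably. The sketch below is a minimal illustration under stated assumptions, not the authors' method: the nested-dict tree format, the feature names `stress` and `sleep`, the thresholds, and the support/confidence filters are all hypothetical, since the abstract does not describe how rules were selected.

```python
# Minimal sketch of rule extraction from a tree ensemble.
# ASSUMPTION: trees are hypothetical nested dicts. Internal nodes test
# "feature <= threshold" (left branch) vs "feature > threshold" (right);
# leaves carry a class label (1 = depressed, 0 = not depressed).

def extract_rules(node, conditions=()):
    """Return every root-to-leaf path as (list_of_conditions, prediction)."""
    if "prediction" in node:
        return [(list(conditions), node["prediction"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + ((f, "<=", t),))
            + extract_rules(node["right"], conditions + ((f, ">", t),)))

def matches(sample, conditions):
    """True if the sample satisfies every condition of the rule."""
    return all(sample[f] <= t if op == "<=" else sample[f] > t
               for f, op, t in conditions)

def rule_stats(rule, data, labels):
    """Support = share of samples covered; confidence = share of covered
    samples whose label equals the rule's prediction."""
    conditions, prediction = rule
    covered = [y for x, y in zip(data, labels) if matches(x, conditions)]
    if not covered:
        return 0.0, 0.0
    support = len(covered) / len(data)
    confidence = sum(1 for y in covered if y == prediction) / len(covered)
    return support, confidence

def forest_rules(trees, data, labels, min_support=0.1, min_confidence=0.8):
    """Pool rules from all trees, keep frequent and reliable ones,
    and rank them by confidence, then support (hypothetical criteria)."""
    scored = []
    for tree in trees:
        for rule in extract_rules(tree):
            s, c = rule_stats(rule, data, labels)
            if s >= min_support and c >= min_confidence:
                scored.append((rule, s, c))
    return sorted(scored, key=lambda r: (r[2], r[1]), reverse=True)

if __name__ == "__main__":
    tree1 = {"feature": "stress", "threshold": 5,
             "left": {"prediction": 0},
             "right": {"feature": "sleep", "threshold": 6,
                       "left": {"prediction": 1}, "right": {"prediction": 0}}}
    tree2 = {"feature": "sleep", "threshold": 5,
             "left": {"prediction": 1}, "right": {"prediction": 0}}
    data = [{"stress": 8, "sleep": 4}, {"stress": 7, "sleep": 5},
            {"stress": 2, "sleep": 8}, {"stress": 3, "sleep": 7},
            {"stress": 9, "sleep": 7}]
    labels = [1, 1, 0, 0, 0]
    for rule, s, c in forest_rules([tree1, tree2], data, labels):
        print(rule, f"support={s:.2f}", f"confidence={c:.2f}")
```

Each printed rule reads as "IF conditions THEN prediction", which is what makes this representation interpretable compared with a single risk-factor ranking.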

Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN

  • Liu, Gaoyang;Niu, Yanbo;Zhao, Weijian;Duan, Yuanfeng;Shu, Jiangpeng
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.53-62
    • /
    • 2022
  • The deployment of advanced structural health monitoring (SHM) systems in large-scale civil structures generates large amounts of data. These data may contain multiple types of anomalies (e.g., missing, minor, and outlier) caused by harsh environments, sensor faults, transmission omissions, and other factors. Such anomalies seriously affect the evaluation of structural performance, so the effective analysis and mining of SHM data is an extremely important task. Inspired by the deep learning paradigm, this study develops a novel data anomaly detection approach for SHM based on a generative adversarial network (GAN) and a convolutional neural network (CNN). The framework of the proposed approach includes three modules: (a) a three-channel input is constructed using the fast Fourier transform (FFT) and the Gramian angular field (GAF) method; (b) a GANomaly model is introduced and trained on normal samples alone to extract features, addressing the class-imbalance problem; and (c) based on the output of GANomaly, a CNN distinguishes the types of anomalies. In addition, a dataset-oriented method (multistage sampling) is adopted to obtain the optimal sampling ratios among the different sample types. The proposed approach is tested with acceleration data from the SHM system of a long-span bridge. The results show that it achieves higher accuracy in detecting multi-pattern anomalies in SHM data.
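
The input module combines FFT and GAF representations of each acceleration segment. A common GAF construction rescales the series to [-1, 1], maps each value to an angle phi = arccos(x), and forms either the summation field cos(phi_i + phi_j) (GASF) or the difference field sin(phi_i - phi_j) (GADF). The pure-Python sketch below shows those building blocks plus a naive DFT magnitude spectrum; which three channels the paper actually stacks is not stated in the abstract, so the choice here is an assumption, and a real pipeline would use an FFT library rather than the O(N^2) loop.

```python
import cmath
import math

def rescale(series):
    """Min-max rescale a series to [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled")
    return [2 * (x - lo) / (hi - lo) - 1 for x in series]

def gramian_angular_field(series, kind="summation"):
    """GASF: cos(phi_i + phi_j); GADF: sin(phi_i - phi_j), phi = arccos(x)."""
    # Clamp guards against rounding pushing values just outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in rescale(series)]
    if kind == "summation":
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("kind must be 'summation' or 'difference'")

def dft_magnitude(series):
    """Naive O(N^2) DFT magnitude spectrum (illustration only)."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series)))
            for k in range(n)]

if __name__ == "__main__":
    segment = [0.0, 0.8, 1.0, 0.3, -0.5, -1.0, -0.4, 0.2]  # hypothetical
    gasf = gramian_angular_field(segment, "summation")
    gadf = gramian_angular_field(segment, "difference")
    spectrum = dft_magnitude(segment)
    print(len(gasf), len(gadf), [round(m, 3) for m in spectrum])
```

Turning a 1D signal into 2D fields like these is what lets an image-oriented model such as a CNN consume time-series segments.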

Evaluation of blasting vibration with center-cut methods for tunnel excavation

  • Lee, Seung-Joong;Kim, Byung-Ryeol;Choi, Sung-Oong;Kim, Nam-Soo
    • Geomechanics and Engineering
    • /
    • v.30 no.5
    • /
    • pp.423-435
    • /
    • 2022
  • Ground vibration generated repeatedly at blasting tunnel excavation sites is known to be one of the major hazards induced by blasting operations. Various theoretical and empirical studies have been conducted to minimize these hazards, using methods such as electronic detonators, the deck charge method, and the center-cut method, among others. Among these existing methods for controlling ground vibration, this study investigated the cut method. In particular, we analyzed and compared the V-cut method, which is commonly used in tunnel blasting, with the double-drilled parallel method, which has recently been introduced at tunnel excavation sites. To understand both the rock fragmentation efficiency and the ground vibration controllability of the two methods, we performed in-situ field blasting tests with both cut methods at a tunnel excavation site. Additionally, numerical analyses were performed with FLAC3D to better understand the fracture propagation pattern and ground vibration generation of each cut method. Ground vibration levels, expressed as peak particle velocities (PPVs) measured in the field blasting tests and estimated in the numerical simulations, were lower for the double-drilled parallel method than for the V-cut method, although the absolute values differed considerably between field measurement and numerical estimation.
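
Comparing the two cut methods comes down to extracting a PPV from each velocity record and comparing the values. PPV is commonly reported either as the largest absolute value of a single component or as the peak vector sum of three synchronized components; the abstract does not say which convention the study used, so both are shown. The velocity samples below are hypothetical, not the study's measurements.

```python
import math

def component_ppv(velocity):
    """Peak particle velocity of one component: largest absolute sample."""
    return max(abs(v) for v in velocity)

def peak_vector_sum(vx, vy, vz):
    """Largest instantaneous resultant of three synchronized components."""
    return max(math.sqrt(x * x + y * y + z * z) for x, y, z in zip(vx, vy, vz))

def percent_reduction(reference, candidate):
    """How much lower `candidate` is than `reference`, in percent."""
    return 100.0 * (reference - candidate) / reference

if __name__ == "__main__":
    # Hypothetical triaxial records in mm/s for one blast of each method.
    v_cut = ([0.4, -1.2, 0.9], [0.1, 0.8, -0.6], [0.0, 0.5, 0.2])
    parallel = ([0.3, -0.7, 0.5], [0.1, 0.4, -0.3], [0.0, 0.3, 0.1])
    pvs_v = peak_vector_sum(*v_cut)
    pvs_p = peak_vector_sum(*parallel)
    print(f"V-cut PVS={pvs_v:.3f}, parallel PVS={pvs_p:.3f}, "
          f"reduction={percent_reduction(pvs_v, pvs_p):.1f}%")
```

Because the field and numerical PPVs differ in absolute terms, a relative measure such as the percent reduction is one way to compare the methods within each data source.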

Power Consumption Patterns Analysis Using Expectation-Maximization Clustering Algorithm and Emerging Pattern Mining (기대치-최대화 군집 알고리즘과 출현 패턴 마이닝을 이용한 전력 소비 패턴 분석)

  • Jin Hyoung Park;Heon Gyu Lee;Jin-Ho Shin;Keun Ho Ryu;Hiseok Kim
    • Annual Conference of KIPS
    • /
    • 2008.11a
    • /
    • pp.261-264
    • /
    • 2008
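  • For the efficient operation of power utilities and for competition in the electricity market, customers' power consumption patterns must be analyzed and accurately predicted. To this end, this paper proposes a mining technique that can accurately predict customers' power consumption patterns, targeting nationwide high-voltage customer data collected by a remote meter reading system. First, nine feature vectors were extracted to accurately distinguish load patterns suited to the characteristics of domestic customers by contract type, and 34 representative load profiles of customers were generated using the expectation-maximization (EM) clustering algorithm. Finally, an emerging-pattern-based classification model for each representative profile was constructed from the extracted feature vectors to classify customers' power consumption patterns. Experiments on data from a total of 3,895 high-voltage customers measured by the domestic remote meter reading system showed a classification accuracy of approximately 91%.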
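
An emerging pattern is an itemset whose support grows sharply from one class (the background) to another (the target); its growth rate is supp_target / supp_background, and itemsets with a growth rate of at least rho are called rho-emerging patterns. The sketch below mines such patterns from discretized features. The item names, the discretization, and rho = 2 are hypothetical, and the abstract does not say how the classifier aggregates patterns, so only the mining step is shown.

```python
from itertools import combinations
import math

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def growth_rate(itemset, background, target):
    """supp_target / supp_background, with 0/0 -> 0 and x/0 -> infinity."""
    s_bg = support(itemset, background)
    s_t = support(itemset, target)
    if s_bg == 0:
        return math.inf if s_t > 0 else 0.0
    return s_t / s_bg

def emerging_patterns(background, target, rho=2.0, max_len=2):
    """Enumerate itemsets up to `max_len` items whose growth rate >= rho."""
    items = sorted(set().union(*target))
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            gr = growth_rate(itemset, background, target)
            if gr >= rho:
                result[itemset] = gr
    return result

if __name__ == "__main__":
    # Hypothetical discretized features of customers in one load profile
    # (target) versus customers in all other profiles (background).
    target = [{"peak=day", "weekend=low"}, {"peak=day", "weekend=low"},
              {"peak=day", "weekend=high"}]
    background = [{"peak=night", "weekend=high"}, {"peak=day", "weekend=high"},
                  {"peak=night", "weekend=low"}]
    for pattern, gr in emerging_patterns(background, target).items():
        print(sorted(pattern), gr)
```

Patterns with an infinite growth rate (present in the target, absent from the background) are the most discriminative, which is why emerging patterns suit per-profile classification.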

Natural frequency analysis of joined conical-cylindrical-conical shells made of graphene platelet reinforced composite resting on Winkler elastic foundation

  • Xiangling Wang;Xiaofeng Guo;Masoud Babaei;Rasoul Fili;Hossein Farahani
    • Advances in nano research
    • /
    • v.15 no.4
    • /
    • pp.367-384
    • /
    • 2023
  • This paper presents, for the first time, the natural frequency behavior of joined truncated conical-cylindrical-conical shells made of graphene platelet reinforced composite (GPL-RC) resting on a Winkler-type elastic foundation. The rule of mixtures and the modified Halpin-Tsai approach are applied to obtain the mechanical properties of the structure. Four graphene platelet distribution patterns through the thickness are considered: GPLA, GPLO, GPLX, and GPLUD. A finite element procedure based on the Rayleigh-Ritz formulation is used to solve the 2D axisymmetric elasticity equations. Unlike simple shell theories, 2D axisymmetric elasticity theory accounts for thickness stretching, which gives more accurate results, especially for thick shells. An efficient parametric investigation is also presented to show the effects of various geometric variables, three different boundary conditions (BCs), elastic foundation stiffness, dispersion pattern, and GPL nanofiller weight fraction on the natural frequencies of the joined shell. The results show that, among the BCs and GPL patterns considered, GPLO and BC3 provide the greatest rigidity and hence the highest natural frequencies. In addition, increasing the weight fraction of nanofillers increases the natural frequencies by up to 200%.
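
The material model combines a volume-fraction profile through the thickness with the modified Halpin-Tsai estimate of Young's modulus and the rule of mixtures for the other properties. The sketch below uses forms that are common in the GPL literature but are assumptions here: the weight-to-volume fraction conversion, the continuous UD/O/X/A profiles (each averaging to the nominal fraction V*; the direction of the A profile is arbitrary), and the example material values are not taken from the paper.

```python
def gpl_volume_fraction(weight_fraction, rho_gpl, rho_matrix):
    """Convert a GPL weight fraction to a volume fraction (assumed relation)."""
    w = weight_fraction
    return w / (w + (rho_gpl / rho_matrix) * (1.0 - w))

def volume_fraction_at(z, h, v_star, pattern):
    """Hypothetical continuous GPL profiles for z in [-h/2, h/2]."""
    if pattern == "UD":
        return v_star                              # uniform
    if pattern == "O":
        return 2.0 * v_star * (1.0 - 2.0 * abs(z) / h)  # rich at mid-plane
    if pattern == "X":
        return 4.0 * v_star * abs(z) / h           # rich at both surfaces
    if pattern == "A":
        return v_star * (1.0 + 2.0 * z / h)        # rich at one surface
    raise ValueError(f"unknown pattern: {pattern}")

def halpin_tsai_modulus(v_gpl, e_gpl, e_m, length, width, thickness):
    """Modified Halpin-Tsai estimate of the composite Young's modulus."""
    xi_l = 2.0 * length / thickness
    xi_w = 2.0 * width / thickness
    ratio = e_gpl / e_m
    eta_l = (ratio - 1.0) / (ratio + xi_l)
    eta_w = (ratio - 1.0) / (ratio + xi_w)
    term_l = (1.0 + xi_l * eta_l * v_gpl) / (1.0 - eta_l * v_gpl)
    term_w = (1.0 + xi_w * eta_w * v_gpl) / (1.0 - eta_w * v_gpl)
    return e_m * (3.0 / 8.0 * term_l + 5.0 / 8.0 * term_w)

def rule_of_mixtures(v_gpl, prop_gpl, prop_matrix):
    """Poisson's ratio or density as a volume-weighted average."""
    return v_gpl * prop_gpl + (1.0 - v_gpl) * prop_matrix

if __name__ == "__main__":
    # Assumed example values (SI units): epoxy matrix, GPL 2.5 x 1.5 um x 1.5 nm.
    v_star = gpl_volume_fraction(0.01, rho_gpl=1062.5, rho_matrix=1200.0)
    for pattern in ("UD", "O", "X", "A"):
        v_top = volume_fraction_at(0.5, 1.0, v_star, pattern)
        e_top = halpin_tsai_modulus(v_top, 1.01e12, 3.0e9, 2.5e-6, 1.5e-6, 1.5e-9)
        print(pattern, f"V={v_top:.5f}", f"E_top={e_top / 1e9:.2f} GPa")
```

Because every profile has the same average GPL content, differences in natural frequency between patterns come purely from where the stiff material sits through the thickness.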

Power Load Pattern Classification from AMR Data (AMR 데이터에서의 전력 부하 패턴 분류)

  • Piao, Minghao;Park, Jin-Hyung;Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Annual Conference of KIPS
    • /
    • 2008.05a
    • /
    • pp.231-234
    • /
    • 2008
  • This paper presents an automated methodology based on data mining techniques for predicting customer load patterns from load demand data. The main aim of our work is to forecast customers' contract information from their daily power consumption patterns and, based on the results, to evaluate the suitability of that contract information. The proposed approach consists of three stages: (i) data preprocessing, in which noise and outliers are detected and removed; (ii) cluster analysis, in which self-organizing map (SOM) clustering is used to create load patterns and representative load profiles; and (iii) classification, in which a k-nearest neighbors (k-NN) classifier predicts customers' contract information based on their power consumption patterns. In the proposed methodology, power loads measured by an automatic meter reading (AMR) system, together with customer indexes, are used as inputs, and the output is the classification into representative load profiles (classes). Finally, to evaluate the k-NN classification technique, the proposed methodology was applied to a set of high-voltage customers of the Korean power system, and the experimental results are presented.
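
The classification stage assigns a customer to the class most common among the k most similar labeled load profiles. The sketch below is a minimal k-NN under assumptions: profiles are peak-normalized so that shape rather than magnitude drives the match, distance is Euclidean, and the four-point profiles and contract labels `general` and `industrial` are hypothetical stand-ins for the AMR data and contract types.

```python
import math
from collections import Counter

def normalize_peak(profile):
    """Scale a load profile so its peak equals 1 (compares shapes)."""
    peak = max(profile)
    return [v / peak for v in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_profiles, train_labels, query, k=3):
    """Majority vote among the k nearest training profiles.
    Ties go to the label encountered first, i.e. the nearer neighbor."""
    ranked = sorted(zip(train_profiles, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical representative profiles (already peak-normalized).
    profiles = [[0.2, 1.0, 0.9, 0.3], [0.3, 0.9, 1.0, 0.2],
                [0.9, 1.0, 0.95, 0.9], [1.0, 0.9, 0.9, 0.95]]
    labels = ["general", "general", "industrial", "industrial"]
    raw_kw = [50, 190, 190, 50]  # hypothetical daily readings in kW
    print(knn_predict(profiles, labels, normalize_peak(raw_kw)))
```

Comparing the predicted class with the customer's actual contract type is one way to flag contracts whose consumption pattern does not fit.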

DATA MINING APPROACH TO PARAMETRIC COST ESTIMATE IN EARLY DESIGN STAGE AND ANALYTICAL CHARACTERIZATION ON OLAP (ON-LINE ANALYTICAL PROCESSING)

  • JaeHo Cho;HyunKyun Jung;JaeYoul Chun
    • International conference on construction engineering and project management
    • /
    • 2011.02a
    • /
    • pp.176-181
    • /
    • 2011
  • The role of a cost modeler is to facilitate the design process through the systematic application of cost factors, so as to maintain sensible and economic relationships among cost, quantity, utility, and appearance. These relationships help achieve the client's requirements within an agreed budget. The purpose of this study is to develop a parametric cost estimating model for the early design stage using the multidimensional system of OLAP (On-line Analytical Processing), based on case-based quantity data related to architectural design features. Parametric cost estimating models have been adopted to support decision making in the early design stage, and they typically use a similar instance or a pattern from historical cases. To use this type of data model effectively, data classification and prediction methods must be established. One such method is to find similar classes according to an attribute selection measure in the multidimensional data model. Therefore, this research analyzes the relevant attributes influenced by architectural design features, using the case-based quantity data employed in the parametric cost estimating model. The relevant attributes can be analyzed by analytical characterization, which helps determine which attributes to include in the OLAP multidimensional model.
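
Attribute relevance analysis ranks candidate dimensions by how much they reduce uncertainty about the target class. The sketch below assumes information gain as the relevance measure (a measure commonly used in textbook treatments of analytical characterization); the attributes `floors` and `structure` and the cost classes are hypothetical, not the study's data.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting."""
    labels = [r[target] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, attributes, target):
    """Attributes sorted from most to least relevant."""
    scored = [(a, information_gain(rows, a, target)) for a in attributes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    cases = [  # hypothetical historical cases
        {"floors": "low", "structure": "RC", "cost": "low"},
        {"floors": "low", "structure": "steel", "cost": "low"},
        {"floors": "high", "structure": "RC", "cost": "high"},
        {"floors": "high", "structure": "steel", "cost": "high"},
    ]
    print(rank_attributes(cases, ["floors", "structure"], "cost"))
```

Attributes with near-zero gain are candidates to leave out of the OLAP cube, keeping the multidimensional model smaller.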

Effects of Preprocessing on Text Classification in Balanced and Imbalanced Datasets

  • Mehmet F. Karaca
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.3
    • /
    • pp.591-609
    • /
    • 2024
  • In this study, all combinations of preprocessing methods were examined in terms of their effects on reducing the number of words, shortening processing time, and classification success, using balanced datasets and datasets imbalanced at different ratios. The reductions in word count and processing time achieved by preprocessing were interrelated. Classification was more successful on the Turkish datasets, whereas the English datasets were more affected by whether the dataset was balanced. In highly imbalanced datasets, documents from classes with few documents were misclassified into a topically close class in the Turkish datasets, but into the class with many documents in the English datasets. In terms of average scores, the highest classification success was obtained in the Turkish datasets by applying stemming and removing stop words without lowercasing, and in the English datasets by applying lowercasing and stemming and removing stop words. Stemming was the preprocessing method that most increased success in the Turkish datasets, whereas stop-word removal did so in the English datasets. The maximum scores revealed that feature selection, feature size, and the choice of classifier affect classification success more than preprocessing does. It was concluded that preprocessing is necessary for text classification because it shortens processing time and can achieve high classification success; that a given preprocessing method does not have the same effect in all languages; and that different preprocessing methods are more successful for different languages.
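
Examining "all combinations" of three on/off preprocessing steps means evaluating 2^3 = 8 pipelines. The sketch below enumerates them and reports vocabulary size for each; the stop-word list and the suffix-stripping function are toy stand-ins, not a real stemmer, and the step order (lowercase, then stop-word removal, then stemming) is an assumption. Stop-word matching is case-sensitive, so without lowercasing a capitalized "The" survives, which shows how the steps interact.

```python
from itertools import product

STOP_WORDS = {"the", "and", "of", "a", "is"}  # hypothetical toy list
SUFFIXES = ("ing", "ed", "es", "s")           # toy suffixes, not a stemmer

def toy_stem(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def preprocess(tokens, lowercase, remove_stop, stem):
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]  # case-sensitive
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens

def vocabulary_sizes(docs):
    """Vocabulary size for each (lowercase, remove_stop, stem) combination."""
    sizes = {}
    for flags in product([False, True], repeat=3):
        vocab = set()
        for doc in docs:
            vocab.update(preprocess(doc.split(), *flags))
        sizes[flags] = len(vocab)
    return sizes

if __name__ == "__main__":
    docs = ["The cats and the dogs", "A cat is playing", "Dogs played"]
    for flags, size in vocabulary_sizes(docs).items():
        print(flags, size)
```

Language matters even for the simplest step: Python's locale-independent `str.lower()` maps "I" to "i", not to the Turkish dotless "ı", so a lowercasing step that is harmless in English can conflate distinct Turkish words.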

Refined nonlocal strain gradient theory for mechanical response of cosine FG-GRNC laminated nanoshells rested on elastic foundation

  • Mohamed A. Eltaher;A.A. Daikh;Amin Hamdi;Gamal S. Abdelhaffez;Azza M. Abdraboh
    • Advances in nano research
    • /
    • v.17 no.4
    • /
    • pp.335-350
    • /
    • 2024
  • This paper investigates the mechanical behavior of a new type of functionally graded graphene-reinforced nanocomposite (FG-GRNC) doubly curved laminated shell, referred to as cosine FG-GRNC. The study employs a refined higher-order shear deformation shell theory combined with a modified continuum nonlocal strain gradient theory. The effective Young's modulus of the GRNC shell through the thickness is determined using the modified Halpin-Tsai model, while Poisson's ratio and mass density are calculated using the rule of mixtures. The analysis includes two graphene reinforcement distribution patterns (FG-A CNRCs and FG-B CNRCs) along with uniform UD CNRCs. An enhanced Galerkin method is used to solve the governing equilibrium equations of the GRNC nanoshell, yielding closed-form solutions for bending deflections and critical buckling loads. The nanoshell rests on an orthotropic elastic foundation characterized by three parameters. A detailed parametric analysis evaluates how the length scale parameter, nonlocal parameter, distribution pattern, GPL weight fraction, shell thickness, and shell geometry influence deflections and critical buckling loads.
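
For orientation, the widely used differential form of nonlocal strain gradient theory relates stress to strain through a nonlocal parameter e_0 a and a strain gradient length scale parameter l, and the rule of mixtures gives Poisson's ratio and density from the GPL volume fraction V_GPL. These are the generic forms, written here as an assumption; the paper's "modified" and "refined" formulations may include terms not shown. Setting e_0 a = 0 and l = 0 recovers classical Hooke's law.

```latex
\[
\left[1-(e_0 a)^2\nabla^2\right]\sigma_{ij}
  = C_{ijkl}\left[1-l^2\nabla^2\right]\varepsilon_{kl}
\]
\[
\nu = V_{\mathrm{GPL}}\,\nu_{\mathrm{GPL}} + V_m\,\nu_m, \qquad
\rho = V_{\mathrm{GPL}}\,\rho_{\mathrm{GPL}} + V_m\,\rho_m, \qquad
V_m = 1 - V_{\mathrm{GPL}}
\]
```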

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.1-18
    • /
    • 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as images, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value by analyzing these unstructured data. Among these types of unstructured data, text is recognized as the most representative medium through which users express and share their opinions on the Web. In this sense, demand for new insights through text analysis is steadily increasing, and text mining is increasingly used for different purposes in various fields. In particular, issue tracking is widely studied in both academia and industry because it can be used to extract various issues from text such as news articles and social network service (SNS) posts and to analyze the trends of these issues. Conventionally, issue tracking identifies major issues sustained over a long period through topic modeling and analyzes the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues, which can be created, merged, divided, and deleted between periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Detailed keywords are preferable to general keywords because they can provide clues for actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period, generated an issue flow diagram based on the similarity of each issue between two consecutive periods, and analyzed the issue transition patterns among categories using the category information of each document. We then applied the proposed methodology to a real-world case of 53,739 news articles and derived an issue flow diagram from them. Based on the issue flow diagram presented in the experiment section, we propose the following application scenarios. First, we can identify an issue that appears actively during a certain period and promptly disappears in the next. Second, the preceding and following issues of a particular issue can easily be discovered from the issue flow diagram, which means our methodology can be used to discover associations between issues across periods. Finally, analyzing the transition patterns of issues through category analysis revealed an interesting pattern of one-way and two-way transitions: a pair of mutually similar categories induces two-way transitions, whereas one-way transitions can be regarded as an indicator that issues in one category tend to be influenced by issues in another. For practical application of the proposed methodology, high-quality word and stop-word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information, such as read counts, posting times, and comments, should be analyzed. A rigorous performance evaluation and validation of the proposed methodology should be performed in future work.
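
An issue flow diagram can be built by linking each issue in one period to sufficiently similar issues in the next, then reading merges, splits, disappearances, and new issues off the links. The abstract states only that consecutive-period issues are linked by similarity; representing issues as keyword-weight dictionaries, using cosine similarity, and the 0.3 threshold are assumptions in this sketch, and the issue IDs and weights are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def issue_flow_edges(periods, threshold=0.3):
    """periods: list of {issue_id: {keyword: weight}}.
    Returns (period_index, src_issue, dst_issue, similarity) edges."""
    edges = []
    for t in range(len(periods) - 1):
        for src, kw_a in periods[t].items():
            for dst, kw_b in periods[t + 1].items():
                sim = cosine(kw_a, kw_b)
                if sim >= threshold:
                    edges.append((t, src, dst, sim))
    return edges

def flow_events(periods, edges):
    """Per transition t -> t+1: splits, merges, vanished and new issues."""
    events = []
    for t in range(len(periods) - 1):
        out, inc = {}, {}
        for tt, src, dst, _ in edges:
            if tt == t:
                out.setdefault(src, []).append(dst)
                inc.setdefault(dst, []).append(src)
        events.append({
            "split": {s: d for s, d in out.items() if len(d) > 1},
            "merge": {d: s for d, s in inc.items() if len(s) > 1},
            "vanished": [i for i in periods[t] if i not in out],
            "new": [i for i in periods[t + 1] if i not in inc],
        })
    return events

if __name__ == "__main__":
    periods = [
        {"A": {"north korea": 0.6, "nuclear test": 0.4}, "B": {"election": 1.0}},
        {"C": {"nuclear test": 0.7, "sanctions": 0.3},
         "D": {"north korea": 0.5, "separated families": 0.5}},
    ]
    edges = issue_flow_edges(periods)
    print(edges)
    print(flow_events(periods, edges))
```

Running each period independently is what lets a short-lived issue such as "separated families" surface as its own node instead of being absorbed into a long-running general issue.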