• Title/Summary/Keyword: Data extraction

Development of Automatic Rule Extraction Method in Data Mining : An Approach based on Hierarchical Clustering Algorithm and Rough Set Theory (데이터마이닝의 자동 데이터 규칙 추출 방법론 개발 : 계층적 클러스터링 알고리듬과 러프 셋 이론을 중심으로)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.14 no.6 / pp.135-142 / 2009
  • Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for the analysis of large data sets. The major techniques used in data mining are association rule mining, classification, and clustering. Because these techniques are used individually, a methodology that integrates them for rule extraction is needed. Rule extraction techniques help humans analyze large data sets and turn the meaningful information they contain into successful decisions. This paper proposes an autonomous rule extraction method based on clustering and rough set theory. Experiments are carried out on data sets from the UCI KDD archive, and the decision rules produced by the proposed method are presented. These rules can be used successfully for decision making.
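
The abstract does not spell out how clustering and rough sets are combined. The following is a minimal sketch under two assumptions: numeric attributes are discretized by single-linkage hierarchical clustering, and a rule is kept only for an indiscernibility class that falls entirely inside one decision class (the rough-set lower approximation). The function names, the attributes `temp`/`wind`/`play`, and the toy table are hypothetical.

```python
from collections import defaultdict


def single_linkage_1d(values, k):
    """Cluster 1-D values into k groups by single-linkage agglomeration.

    On a line, single linkage always merges the closest neighbouring pair,
    so stopping at k clusters equals cutting the k-1 widest gaps between
    sorted values. Labels are numbered in increasing value order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [(values[order[j + 1]] - values[order[j]], j) for j in range(len(order) - 1)]
    cut_after = {j for _, j in sorted(gaps, reverse=True)[: k - 1]}
    labels = [0] * len(values)
    label = 0
    for pos, idx in enumerate(order):
        labels[idx] = label
        if pos in cut_after:
            label += 1
    return labels


def certain_rules(rows, conditions, decision):
    """Rough-set style rule extraction.

    Rows with identical condition values form an indiscernibility class.
    A class whose rows all share one decision lies in the lower approximation
    of that decision and yields a certain IF-THEN rule; mixed classes
    (the boundary region) are skipped.
    """
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[a] for a in conditions)].add(row[decision])
    rules = []
    for key, decisions in sorted(classes.items()):
        if len(decisions) == 1:
            rules.append((dict(zip(conditions, key)), next(iter(decisions))))
    return rules


# Hypothetical toy decision table.
table = [
    {"temp": 20, "wind": "weak", "play": "yes"},
    {"temp": 21, "wind": "strong", "play": "no"},
    {"temp": 22, "wind": "weak", "play": "yes"},
    {"temp": 35, "wind": "weak", "play": "no"},
    {"temp": 36, "wind": "strong", "play": "no"},
    {"temp": 22, "wind": "strong", "play": "yes"},
]
for row, cluster in zip(table, single_linkage_1d([r["temp"] for r in table], 2)):
    row["temp"] = cluster  # replace the raw value by its cluster label

for cond, dec in certain_rules(table, ["temp", "wind"], "play"):
    print("IF", cond, "THEN play =", dec)
```

Rows 2 and 6 share the same condition values (temperature cluster 0, strong wind) but disagree on `play`. Their class therefore sits in the boundary region and produces no certain rule.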

Multiscale features and information extraction of online strain for long-span bridges

  • Wu, Baijian;Li, Zhaoxia;Chan, Tommy H.T.;Wang, Ying
    • Smart Structures and Systems / v.14 no.4 / pp.679-697 / 2014
  • The strain data acquired from structural health monitoring (SHM) systems play an important role in the state monitoring and damage identification of bridges. Because civil structures operate in complex environments, a better understanding of the actual strain data will help fill the gap between theoretical/laboratory results and practical application. In this study, the multi-scale features of the strain response are first revealed through an extensive investigation of the measured data from two typical long-span bridges. The results show that the strain types at the three typical temporal scales of $10^5$, $10^2$ and $10^0$ s are caused by temperature change, trains, and heavy trucks, respectively, and have cut-off frequencies on the order of $10^{-2}$, $10^{-1}$ and $10^0$ Hz, respectively. Multi-resolution analysis and wavelet shrinkage are applied to separate and extract these strain types, and two methods for determining the thresholds are introduced in the process. The ability of the wavelet transform to analyze time and frequency simultaneously leads to effective information extraction. After extraction, the strain data can be compressed at an attractive ratio. This research may contribute to a better understanding of the actual strain data of long-span bridges, and the proposed extraction methodology is applicable to actual SHM systems.
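
The abstract names multi-resolution analysis and wavelet shrinkage but neither the wavelet nor the two threshold rules. The following is a hedged NumPy sketch with several assumptions. It uses a hand-written orthonormal Haar transform. The slow component is separated by keeping only the coarsest approximation band, which covers roughly frequencies below fs/2^(L+1) for L levels at sampling rate fs. Shrinkage is done by soft thresholding with the Donoho-Johnstone universal threshold σ√(2 ln N), with σ estimated from the finest detail band by the median absolute deviation. The Haar wavelet, the universal threshold, and the synthetic signal are assumptions, not the authors' choices.

```python
import numpy as np


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_inverse_step(a, d):
    """Exact inverse of haar_step."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x


def wavedec(x, levels):
    """Multi-resolution analysis: returns [a_L, d_L, ..., d_1].

    len(x) must be divisible by 2**levels.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return [a] + details[::-1]


def waverec(coeffs):
    """Rebuild a signal from [a_L, d_L, ..., d_1]."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = haar_inverse_step(a, d)
    return a


def split_scales(x, levels):
    """Slow part = coarsest approximation only (all detail bands zeroed);
    fast part = everything else."""
    coeffs = wavedec(x, levels)
    slow = waverec([coeffs[0]] + [np.zeros_like(d) for d in coeffs[1:]])
    return slow, np.asarray(x, dtype=float) - slow


def shrink(x, levels):
    """Wavelet shrinkage: soft-threshold every detail band with the universal
    threshold sigma * sqrt(2 ln N), sigma estimated by MAD of the finest band."""
    coeffs = wavedec(x, levels)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    kept = [coeffs[0]] + [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in coeffs[1:]]
    return waverec(kept), kept


ts = np.arange(1024) / 1024
clean = np.sin(2 * np.pi * 4 * ts)  # slow, smooth component
noisy = clean + np.random.default_rng(0).normal(0.0, 0.3, ts.size)
slow, fast = split_scales(noisy, levels=4)
denoised, kept = shrink(noisy, levels=4)
zeros = sum(int(np.count_nonzero(c == 0)) for c in kept)
print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
print("MSE denoised:", np.mean((denoised - clean) ** 2))
print("zeroed coefficients:", zeros, "of", ts.size)
```

The zeroed detail coefficients are what makes the compression mentioned in the abstract possible: only the surviving coefficients need to be stored.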

3D Line Segment Extraction Based on Line Fitting of Elevation Data

  • Woo, Dong-Min
    • Journal of IKEEE / v.13 no.2 / pp.181-185 / 2009
  • This paper addresses a 3D line segment extraction method based on an area-based stereo matching technique. The main idea is to fit a line to the elevation data sampled along the 2D line coordinates of the ortho-image. Both the elevation data and the ortho-image can be obtained with well-known area-based stereo matching techniques. For the elevations to be used in line fitting, they must themselves be reliable; to measure their reliability, we employ the concept of self-consistency. We test the effectiveness of the proposed method with a quantitative accuracy analysis using synthetic images generated from the Avenches data set of the Ascona aerial images. Experimental results indicate that our method generates 3D line segments almost 7.5 times more accurate than the raw elevations obtained by the area-based method.
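
The core step, fitting a line to elevations sampled along a 2D segment while respecting their reliability, can be sketched as a weighted least-squares fit. This is a minimal sketch, assuming the samples are evenly spaced from one 2D endpoint to the other. The reliability scores (for example from a self-consistency check) are assumed to be already computed. Samples below a hypothetical `min_reliability` cut are discarded. The function name and the threshold are illustrative, not taken from the paper.

```python
import numpy as np


def fit_3d_segment(p0, p1, elevations, reliability, min_reliability=0.5):
    """Fit z = a*s + b to elevations sampled along the 2-D segment p0 -> p1.

    Assumes the samples are evenly spaced from p0 (s = 0) to p1 (s = 1).
    Samples whose reliability is below `min_reliability` are discarded and
    the rest are weighted by reliability in a least-squares line fit.
    Returns the two 3-D endpoints (x, y, z).
    """
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    z = np.asarray(elevations, dtype=float)
    w = np.asarray(reliability, dtype=float)
    s = np.linspace(0.0, 1.0, len(z))
    keep = w >= min_reliability
    # np.polyfit weights multiply the residuals, so sqrt(w) puts weight w
    # on the squared residuals.
    a, b = np.polyfit(s[keep], z[keep], 1, w=np.sqrt(w[keep]))
    return np.append(p0, b), np.append(p1, a + b)


s = np.linspace(0.0, 1.0, 11)
z = 10.0 + 5.0 * s
z[5] = 100.0   # a gross matching error ...
rel = np.ones(11)
rel[5] = 0.1   # ... that the reliability measure would flag
start, end = fit_3d_segment((0.0, 0.0), (10.0, 0.0), z, rel)
print(start, end)
```

The hard cut removes gross outliers entirely, while the weights let moderately reliable samples contribute less. Either mechanism could be used alone.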

Extraction of similar XML data based on XML structure and processing unit

  • Park, Jong-Hyun
    • Journal of the Korea Society of Computer and Information / v.22 no.4 / pp.59-65 / 2017
  • XML has established itself as the standard format for data exchange on the Internet, and the volume of XML instances is large. Extracting similar information from XML instances is therefore an important research topic, but existing work on it is insufficient. In this paper, we extract similar information from various kinds of XML instances that serve the same goal. We use only the structural information of the XML instances for extraction, because some instances are written without a schema. To extract similar information efficiently, we propose a minimum processing unit and two approaches for finding it: a structure-based method, which uses only the structural information of the XML instance, and a measure-based method, which finds the unit with a numerical formula. Both approaches can be applied to any application that needs to extract similar information from XML data, and they can also be used for HTML instances.
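
The abstract gives neither the structure-based rule nor the measure-based formula. The following sketch shows one plausible structure-only heuristic, offered as an illustration rather than the author's method. An element that repeats as a sibling under the same parent is treated as the candidate minimum processing unit, and each occurrence's leaf children are flattened into a record. It uses only Python's standard `xml.etree.ElementTree` and needs no schema. The function names and the sample document are hypothetical, and HTML would first have to be well-formed (XHTML) for this parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def candidate_units(xml_text, min_repeats=2):
    """Count tag paths of elements that repeat as siblings under one parent."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(elem, path):
        sibling_tags = Counter(child.tag for child in elem)
        for child in elem:
            child_path = f"{path}/{child.tag}"
            if sibling_tags[child.tag] >= min_repeats:
                counts[child_path] += 1
            walk(child, child_path)

    walk(root, "/" + root.tag)
    return counts


def extract_units(xml_text, path):
    """Turn every element at `path` into a dict of its leaf children's text."""
    root = ET.fromstring(xml_text)
    relative = path.split("/")[2:]  # drop the leading "" and the root tag
    elems = root.findall("/".join(relative)) if relative else [root]
    return [{c.tag: (c.text or "").strip() for c in e if len(c) == 0} for e in elems]


doc = (
    "<catalog><book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>12</price></book>"
    "<info><owner>x</owner></info></catalog>"
)
units = candidate_units(doc)
best = units.most_common(1)[0][0]
print(best, extract_units(doc, best))
```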

Web Information Extraction using HTML Tag Pattern (HTML 태그페턴을 이용한 웹정보추출시스템)

  • Park, Byung-Kwon
    • Proceedings of the Korea Association of Information Systems Conference / 2005.05a / pp.79-92 / 2005
  • To query the vast number of web pages available on the Internet, the information encoded in those pages must be extracted and converted into structured data (e.g. relational data for SQL) or semistructured data (e.g. XML data for XQuery). In this paper, we propose a new web information extraction system, PIES, that converts web information into XML documents. PIES is based on a user-specified target schema and HTML tag pattern descriptions: web information is extracted by the pattern descriptions and validated against the target schema. We designed a new language to describe extraction rules and a new regular expression notation to describe HTML tag patterns. We implemented PIES and applied it to the US patent web site to evaluate its correctness. It successfully extracted thousands of US patent records and converted them into XML documents.
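
PIES uses its own extraction-rule language and tag-pattern regular expressions, which the abstract does not show. The following sketch substitutes Python's standard `re` module to illustrate the general pipeline. A single hypothetical pattern matches a table row, and its named groups map onto the fields of a hypothetical target schema. A crude validation step drops records with missing fields, and the result is serialized as XML. The element names, the pattern, and the sample HTML are made up.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical extraction rule: each record is a <tr> with two <td> cells;
# the named groups correspond to fields of the target schema.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<number>[^<]*)</td>\s*<td>(?P<title>[^<]*)</td>\s*</tr>",
    re.IGNORECASE,
)
TARGET_FIELDS = ("number", "title")  # hypothetical target schema


def extract_to_xml(html):
    """Apply the tag pattern, validate each match, and emit an XML document."""
    root = ET.Element("patents")
    for match in ROW_PATTERN.finditer(html):
        values = {f: match.group(f).strip() for f in TARGET_FIELDS}
        if not all(values.values()):  # validation: every schema field is required
            continue
        record = ET.SubElement(root, "patent")
        for field in TARGET_FIELDS:
            ET.SubElement(record, field).text = values[field]
    return ET.tostring(root, encoding="unicode")


html = (
    "<table><tr><td>P-1</td><td>Widget</td></tr>"
    "<tr><td>P-2</td><td>Gadget</td></tr>"
    "<tr><td>P-3</td><td></td></tr></table>"
)
out = extract_to_xml(html)
print(out)
```

Plain regular expressions over raw HTML are brittle when markup varies, which is one motivation for a dedicated tag-pattern notation such as the one PIES defines.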

Comparative Study of Knowledge Extraction on the Industrial Applications

  • Woo, Young-Kwang;Bae, Hyeon;Kim, Sung-Shin;Woo, Kwang-Bang
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference / 2003.10a / pp.1338-1343 / 2003
  • Data are linguistic or numerical values that express certain characteristics, and information is extracted from data for specific purposes. Knowledge is information used to construct rules that recognize patterns and make decisions. Today, knowledge extraction and its application are widely used in several industrial fields to improve the understanding and performance of systems. Knowledge extraction can be achieved through steps that include knowledge acquisition, expression, and implementation, and the extracted knowledge can be expressed as rules. Clustering (CL), input space partition (ISP), neuro-fuzzy (NF), neural network (NN), extension matrix (EM), and other methods are employed to express the knowledge as rules. In this paper, the various approaches to knowledge extraction are examined and categorized by the industrial fields in which they are applied. Several test data sets and their experimental results are also compared and analysed according to the applied techniques, including CL, ISP, NF, NN, and EM.
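
Of the techniques compared, input space partition (ISP) is the simplest to illustrate. The following minimal sketch assumes a fixed grid. Each input axis is split at given cut points, every occupied cell is labelled with its majority class, and each labelled cell reads as one IF-THEN rule. The cut points, class names, and data are hypothetical, and the surveyed methods may partition the space adaptively instead.

```python
from collections import Counter, defaultdict


def partition_rules(samples, labels, edges):
    """Grid-based input space partition.

    edges[k] lists the cut points of input k; a value's interval index is the
    number of cut points it reaches. Each occupied cell is labelled with the
    majority class of the samples inside it.
    """
    def cell_of(x):
        return tuple(sum(v >= e for e in cuts) for v, cuts in zip(x, edges))

    votes = defaultdict(Counter)
    for x, y in zip(samples, labels):
        votes[cell_of(x)][y] += 1
    rules = {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}
    return rules, cell_of


samples = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7), (0.7, 0.2)]
labels = ["low", "low", "high", "high", "mid"]
rules, cell_of = partition_rules(samples, labels, edges=[[0.5], [0.5]])
for cell, label in sorted(rules.items()):
    print(f"IF x1 in interval {cell[0]} AND x2 in interval {cell[1]} THEN {label}")
print(rules[cell_of((0.95, 0.95))])
```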

Creating Multiple Classifiers for the Classification of Hyperspectral Data: Feature Selection or Feature Extraction

  • Maghsoudi, Yasser;Rahimzadegan, Majid;Zoej, M.J.Valadan
    • Proceedings of the KSRS Conference / 2007.10a / pp.6-10 / 2007
  • Classification of hyperspectral images is challenging. A very high-dimensional input space requires an exponentially large amount of data to represent the classes in that space adequately and reliably. In other words, to obtain statistically reliable classification results, the number of necessary training samples increases exponentially as the number of spectral bands increases. In many situations, however, acquiring a large number of training samples for such high-dimensional data sets is not easy. This problem can be overcome by using multiple classifiers. In this paper, we compare the effectiveness of two approaches for creating multiple classifiers: feature selection and feature extraction. Both methods generate multiple feature subsets by running a feature selection or feature extraction algorithm several times, each time to discriminate one of the classes from the rest. A maximum likelihood classifier is applied to each of the resulting feature subsets, and a combination scheme is finally used to combine the outputs of the individual classifiers. Experimental results show the effectiveness of the feature extraction algorithm for generating multiple classifiers.
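
The one-class-versus-rest construction can be sketched for the feature-selection variant. This minimal NumPy sketch rests on three assumptions. Band relevance is scored with a per-band Fisher ratio. Each member is a Gaussian maximum likelihood classifier on its own band subset. Members are combined by summing their class log-likelihoods. The Fisher score, the combination rule, and the synthetic "pixels" are assumptions, not the paper's choices. A feature-extraction variant would replace the band selection with a per-class projection and is not shown.

```python
import numpy as np


def fisher_scores(X, y, cls):
    """Per-band Fisher ratio for separating class `cls` from all other classes."""
    a, b = X[y == cls], X[y != cls]
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0) + 1e-12)


class GaussianML:
    """Maximum likelihood classifier: one full-covariance Gaussian per class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.params = []
        for c in self.classes:
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params.append((Xc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1]))
        return self

    def log_likelihood(self, X):
        """Class log-likelihoods (up to a shared constant), shape (n, n_classes)."""
        cols = []
        for mean, inv_cov, logdet in self.params:
            diff = X - mean
            mahal = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
            cols.append(-0.5 * (mahal + logdet))
        return np.column_stack(cols)


def fit_multiple(X, y, k):
    """One ML classifier per class, each trained on the k bands that best
    separate that class from the rest (one-vs-rest feature selection)."""
    members = []
    for c in np.unique(y):
        bands = np.argsort(fisher_scores(X, y, c))[::-1][:k]
        members.append((bands, GaussianML().fit(X[:, bands], y)))
    return members


def predict(members, X):
    """Combine members by summing their class log-likelihoods."""
    total = sum(clf.log_likelihood(X[:, bands]) for bands, clf in members)
    return members[0][1].classes[np.argmax(total, axis=1)]


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))  # 300 synthetic "pixels", 10 "bands"
y = np.repeat([0, 1, 2], 100)
X[y == 0, 2] += 4.0             # each class stands out in one band
X[y == 1, 5] += 4.0
X[y == 2, 7] += 4.0
members = fit_multiple(X, y, k=2)
acc = np.mean(predict(members, X) == y)
print("selected bands:", [b.tolist() for b, _ in members], "training accuracy:", acc)
```

Each member works in only k dimensions, so it needs far fewer training samples per class to estimate its covariance than a single classifier on all bands. That is the sample-size argument the abstract makes.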

Pattern recognition of time series data based on the chaotic feature extraction (카오스 특징 추출에 의한 시계열 신호의 패턴인식)

  • 이호섭;공성곤
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1996.10a / pp.294-297 / 1996
  • This paper proposes a method for recognizing time series data based on chaotic feature extraction. Features are extracted from the time series data by chaotic time series analysis, and pattern recognition is performed with a neural network classifier. In the experiments, features are extracted from EEG (electroencephalograph) signals using the correlation dimension and Lyapunov exponents, and these features are classified by multilayer perceptron neural networks. The proposed chaotic feature extraction improves recognition results for chaotic time series data.
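
One of the chaotic features named above, the correlation dimension, can be estimated with the Grassberger-Procaccia procedure. The series is delay-embedded, the correlation sum C(r) (the fraction of point pairs closer than r) is computed, and the slope of log C(r) against log r is fitted. This is a minimal NumPy sketch. The embedding dimension, delay, and radius range are assumptions, and the Lyapunov exponents and the MLP classifier are not shown. As a sanity check, a sine wave embedded in 2D traces a closed curve, so its estimate should be close to 1.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])


def correlation_dimension(x, dim, tau, radii):
    """Grassberger-Procaccia estimate: slope of log C(r) against log r, where
    C(r) is the fraction of distinct point pairs closer than r."""
    pts = delay_embed(np.asarray(x, dtype=float), dim, tau)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(len(pts), k=1)]
    log_c = np.log([np.mean(dist < r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), log_c, 1)
    return slope


x = np.sin(0.1 * np.arange(1000))
radii = np.geomspace(0.05, 0.5, 8)
d2 = correlation_dimension(x, dim=2, tau=16, radii=radii)
print("estimated correlation dimension:", round(d2, 2))
```

For EEG, estimates like this one (computed per segment) would form part of the feature vector passed to the classifier.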

Feature Extraction of ECG Signal for Heart Diseases Diagnoses (심장질환진단을 위한 ECG파형의 특징추출)

  • Kim, Hyun-Dong;Min, Chul-Hong;Kim, Tae-Seon
    • Proceedings of the KIEE Conference / 2004.11c / pp.325-327 / 2004
  • The ECG limb lead II signal is widely used to diagnose heart diseases, and for diagnosis it is essential to detect the ECG events (the onsets, offsets, and peaks of the QRS complex, P wave, and T wave) and extract them from the ECG signal. However, it is very difficult to develop standardized feature extraction formulas, since ECG signals vary with patients and disease types. In this paper, a simple method for extracting features from normal and abnormal ECG signals is proposed. As signal features, the heart rate, PR interval, QRS interval, QT interval, interval between the S wave and the baseline, and T wave type are extracted. To show the validity of the proposed method, Right Bundle Branch Block (RBBB), Left Bundle Branch Block (LBBB), Sinus Bradycardia, and Sinus Tachycardia data from the MIT-BIH arrhythmia database were used for feature extraction, and the results showed higher extraction capability than the conventional formula-based extraction method.
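
The first feature in the list, heart rate, follows from the R-R intervals between detected R peaks. The conventional sinus-rate criteria (below 60 bpm for bradycardia, above 100 bpm for tachycardia) then separate two of the tested rhythm classes. This is a hedged sketch, not the paper's formulas. The detector is a deliberately naive fixed-threshold one, whereas real recordings need filtering and adaptive thresholds. The signal is a synthetic train of Gaussian pulses, not MIT-BIH data. Only the 360 Hz sampling rate mirrors the MIT-BIH records. Bundle branch blocks would need QRS-width features, which are not shown.

```python
import numpy as np


def detect_r_peaks(ecg, fs, threshold_ratio=0.6, refractory_s=0.2):
    """Naive R-peak detector: local maxima above a fixed fraction of the
    signal maximum, at least `refractory_s` seconds apart."""
    ecg = np.asarray(ecg, dtype=float)
    threshold = threshold_ratio * ecg.max()
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        is_local_max = ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]
        if ecg[i] >= threshold and is_local_max:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return np.array(peaks)


def heart_rate_bpm(peaks, fs):
    """Mean heart rate from the R-R intervals (seconds) between detected peaks."""
    rr = np.diff(peaks) / fs
    return 60.0 / rr.mean()


def rhythm_label(hr):
    """Conventional adult sinus-rate categories."""
    if hr < 60:
        return "sinus bradycardia"
    if hr > 100:
        return "sinus tachycardia"
    return "normal rate"


fs = 360  # sampling rate of the MIT-BIH Arrhythmia Database records
t = np.arange(10 * fs) / fs
beats = 0.5 + 0.75 * np.arange(12)  # synthetic R peaks every 0.75 s
ecg = sum(
    np.exp(-(((t - b) / 0.01) ** 2)) + 0.3 * np.exp(-(((t - b - 0.25) / 0.04) ** 2))
    for b in beats
)
peaks = detect_r_peaks(ecg, fs)
hr = heart_rate_bpm(peaks, fs)
print(len(peaks), "beats,", round(hr, 1), "bpm:", rhythm_label(hr))
```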

Pretreatment For The Problem Solution Of Contents-Based Music Retrieval (내용 기반 음악 검색의 문제점 해결을 위한 전처리)

  • Chung, Myoung-Beom;Sung, Bo-Kyung;Ko, Il-Ju
    • Journal of the Korea Society of Computer and Information / v.12 no.6 / pp.97-104 / 2007
  • This paper describes a problem with the feature extraction techniques that have been used for content-based analysis, classification, and retrieval of audio data, and proposes a preprocessing procedure for a new content-based retrieval method. Because the feature vector changes with the sampling values, existing audio analysis can judge the same piece of music to be a different piece. We therefore propose a method that extracts waveform information from PCM data so that audio data in various formats can be retrieved by content. With this method, audio data sampled in various formats can be recognized as the same data, and the method may be applied to content-based music retrieval systems. To verify its performance, we ran an experiment comparing feature extraction using the STFT with waveform information extraction using PCM data. The results show that the proposed method is more effective.
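
The abstract does not describe how the waveform information is taken from the PCM data. The following hedged sketch illustrates the general idea of a descriptor that does not depend on the recording format. Decoded PCM is resampled to a common rate, the amplitude is normalized to cancel bit depth and gain, and an RMS envelope is taken over fixed-duration frames. The function names, the 4 kHz target rate, the 50 ms frame, and the synthetic "songs" are hypothetical. A real system would also need time alignment and robustness to lossy compression.

```python
import numpy as np


def waveform_signature(pcm, sample_rate, target_rate=4000, frame_s=0.05):
    """Format-independent waveform summary (hypothetical): resample to a
    common rate, normalize the amplitude, then take the RMS envelope over
    fixed-duration frames."""
    pcm = np.asarray(pcm, dtype=float)
    duration = len(pcm) / sample_rate
    t_src = np.arange(len(pcm)) / sample_rate
    t_dst = np.arange(int(duration * target_rate)) / target_rate
    x = np.interp(t_dst, t_src, pcm)      # linear-interpolation resampling
    x = x / (np.max(np.abs(x)) + 1e-12)   # cancels bit depth and gain
    frame = int(frame_s * target_rate)
    n_frames = len(x) // frame
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    return np.sqrt((frames ** 2).mean(axis=1))


def similarity(a, b):
    """Pearson correlation of two signatures over their common length."""
    n = min(len(a), len(b))
    return float(np.corrcoef(a[:n], b[:n])[0, 1])


def render(envelope, rate, full_scale, seconds=2.0):
    """Synthetic 'recording': a 440 Hz tone under a slow loudness envelope,
    quantized to integers at the given sampling rate and full-scale value."""
    t = np.arange(int(seconds * rate)) / rate
    return np.round(full_scale * envelope(t) * np.sin(2 * np.pi * 440 * t))


def song_a(t):
    return 0.5 + 0.5 * np.sin(2 * np.pi * t)


def song_b(t):
    return 0.5 + 0.5 * np.cos(3 * np.pi * t)


sig_a_8k = waveform_signature(render(song_a, 8000, 32767), 8000)    # 16-bit, 8 kHz
sig_a_22k = waveform_signature(render(song_a, 22050, 127), 22050)   # 8-bit, 22.05 kHz
sig_b_8k = waveform_signature(render(song_b, 8000, 32767), 8000)
print("same song, different formats:", round(similarity(sig_a_8k, sig_a_22k), 3))
print("different songs:", round(similarity(sig_a_8k, sig_b_8k), 3))
```

Resampling first means both versions yield signatures of the same length and frame timing. A feature vector computed directly on each file's native sample grid would not have this property.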
