• Title/Summary/Keyword: Data Paper

A Feature Analysis of Industrial Accidents Using C4.5 Algorithm (C4.5 알고리즘을 이용한 산업 재해의 특성 분석)

  • Leem, Young-Moon;Kwag, Jun-Koo;Hwang, Young-Seob
    • Journal of the Korean Society of Safety
    • /
    • v.20 no.4 s.72
    • /
    • pp.130-137
    • /
    • 2005
  • The decision tree algorithm is a data mining technique that partitions a group of interest into several sub-groups, for grouping or prediction. It can characterize the features of each group and can be used to detect differences among types of industrial accidents. This paper uses the C4.5 algorithm for such a feature analysis. The data set consists of 24,887 records selected from a total of 25,159 collected over a two-year observation of industrial accidents in Korea. One target variable and eight independent variables are defined by type of industrial accident. The resulting tree has 222 nodes in total, of which 151 are leaf nodes. The paper reports accuracy (%) and error rate (%) to assess the quality of the generated trees. The objective is to analyze how effectively the C4.5 algorithm classifies industrial accident data and thereby to identify potential weak points in disaster risk grouping.
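A minimal sketch of the kind of tree grouping described above. scikit-learn's DecisionTreeClassifier with the entropy criterion stands in for C4.5 (scikit-learn implements CART, not C4.5), and the synthetic data stand in for the paper's eight accident variables:

```python
# C4.5-style classification sketch; CART with entropy as a C4.5 stand-in.
# The eight variables and the target are synthetic, not the paper's data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(1000, 8))     # eight categorical-style variables
y = (X[:, 0] + X[:, 3] > 5).astype(int)    # synthetic accident-type target

tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=20)
tree.fit(X, y)

print("leaf nodes:", tree.get_n_leaves())          # cf. the paper's 151 leaves
print("training accuracy (%):", 100 * tree.score(X, y))
print(export_text(tree, max_depth=2))              # top of the grouping tree
```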

A Study on Product Information Exchange between Heterogeneous Systems including Commercial PDM Systems (상용 PDM을 포함한 이기종 시스템 간의 제품정보 교환에 관한 연구)

  • Yang, Tae-Ho;Yoon, Tae-Hyuck;Choi, Sang-Su;Noh, Sang-Do
    • Korean Journal of Computational Design and Engineering
    • /
    • v.13 no.3
    • /
    • pp.175-186
    • /
    • 2008
  • For PLM to succeed in manufacturing industries, the creation, management, and coordination of all product-related information are essential, and the exchange of product information and data has become an important part of product development. In this paper, we define a neutral schema that refers to the PLM Services standard. Based on this neutral schema, we develop a PLM Integrator that exchanges product information and data among diverse heterogeneous systems, including PDM systems. We apply the PLM Integrator to commercial PDM systems such as SmarTeam and Teamcenter Engineering, and to MEMPHIS, a data-exchange middleware for VR applications. In these implementations, product information and data are exchanged without loss of information, and the PLM Integrator can also upload and download product information, data, and related files. The result not only reduces unnecessary effort in exchanging data between different information systems, including PDM/PLM systems, but also provides a collaborative environment for PLM.
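A hedged sketch of the neutral-schema idea: each system maps between its own records and one shared form, so connecting N systems needs 2N mappers rather than one per pair. All field names below are hypothetical illustrations, not the paper's PLM Services schema:

```python
# Exchange through a neutral schema. The TDM_* and Teamcenter-style field
# names are invented for illustration; they are not the real systems' APIs.
from dataclasses import dataclass

@dataclass
class NeutralItem:                 # system-independent product record
    item_id: str
    revision: str
    title: str

def from_smarteam(rec: dict) -> NeutralItem:       # SmarTeam -> neutral
    return NeutralItem(rec["TDM_ID"], rec["TDM_REVISION"], rec["TDM_NAME"])

def to_teamcenter(item: NeutralItem) -> dict:      # neutral -> Teamcenter
    return {"item_id": item.item_id, "rev_id": item.revision,
            "object_name": item.title}

native = {"TDM_ID": "P-1001", "TDM_REVISION": "A", "TDM_NAME": "Bracket"}
print(to_teamcenter(from_smarteam(native)))        # round trip, no field lost
```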

An Individual Information Management Method on a Distributed Geographic Information System

  • Ohsawa, Yutaka;Kim, Kyongwol
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 1998.06b
    • /
    • pp.105-110
    • /
    • 1998
  • This paper proposes a method to manage individual information on large-scale distributed geographic information systems, where ordinary users usually cannot alter the contents of the server. The proposed method makes it possible to alter, or add individual data to, such non-write-permitted data sets. We call the method GDSF, for 'geographic differential script file'. A client user creates a GDSF containing the private information to be added to the served data and keeps the file on a local disk. Later, when the user uses the data, the differential data sequence is applied to the downloaded data to restore the private information. The GDSF is a collection of drawing commands describing picture insertions, deletions, and modifications; it can also contain modifications of the attribute information of geographic entities. The method is likewise applicable to modifying data held on read-only media such as CD-ROM or DVD-ROM. This paper describes the method and experimental results.
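A toy version of the GDSF mechanism, assuming an entity model and command set of our own (the paper's script format is not reproduced): the served base data stay read-only, and the client replays its private script over a local copy.

```python
# Toy GDSF: read-only base data plus a client-side differential script of
# insert/modify/delete commands, replayed after download to restore the
# user's private view. Entities and commands are illustrative assumptions.
base = {1: {"type": "road", "name": "Route 1"},
        2: {"type": "river", "name": "Han"}}

gdsf = [("insert", 3, {"type": "shop", "name": "My Store"}),
        ("modify", 1, {"name": "Route 1 (closed)"}),
        ("delete", 2, None)]

def apply_gdsf(base, script):
    view = {k: dict(v) for k, v in base.items()}   # never touch the served copy
    for op, eid, payload in script:
        if op == "insert":
            view[eid] = payload
        elif op == "modify":
            view[eid].update(payload)              # attribute modification
        elif op == "delete":
            view.pop(eid, None)
    return view

print(apply_gdsf(base, gdsf))
```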

Application of Digital Photogrammetry for The Automatic Extraction of Road Information (도로정보의 자동추출을 위한 수치사진측량기법의 적용)

  • 유환희
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.12 no.1
    • /
    • pp.89-94
    • /
    • 1994
  • A number of recent research projects focus on the development of real-time mapping systems. Typically, these systems capture land-related information in digital form from airplanes or cars. The purpose of this paper is to automatically extract road information from digital images obtained with the so-called "GPS-Van" developed by the Center for Mapping at The Ohio State University, and to propose a method for the effective storage and management of the digital data. The edges of a road can be extracted from the digital imagery and their three-dimensional positions determined in real time by digital photogrammetry. The paper also proposes a three-level storage scheme consisting of raster, object-oriented, and vector data levels, together with a quadtree data structure for effective compression and search in data management.
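A minimal region quadtree of the sort proposed for compressing and searching the stored raster level; the tiny binary "road" raster and the collapse rule are illustrative assumptions:

```python
# Region quadtree: a uniform block collapses to a single value, so large
# uniform regions (background, road surface) compress well and can be
# searched without visiting every pixel. Illustrative sketch only.
import numpy as np

def quadtree(img):
    if img.min() == img.max():                     # uniform block -> leaf
        return int(img[0, 0])
    h, w = img.shape
    return (quadtree(img[:h//2, :w//2]), quadtree(img[:h//2, w//2:]),
            quadtree(img[h//2:, :w//2]), quadtree(img[h//2:, w//2:]))

img = np.zeros((8, 8), dtype=int)
img[:, 3:5] = 1                                    # a two-pixel-wide "road"
print(quadtree(img))
```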

An Alloy Specification Based Automated Test Data Generation Technique (Alloy 명세 기반 자동 테스트 데이터 생성 기법)

  • Chung, In-Sang
    • The KIPS Transactions: Part D
    • /
    • v.14D no.2
    • /
    • pp.191-202
    • /
    • 2007
  • In general, test data generation techniques require the specification of an entire program path for automated test data generation. This paper presents a new way of generating test data automatically, even without specifying a program path completely. To this end, it presents a technique for transforming a program under test into Alloy, a first-order relational logic, and then producing test data via the Alloy Analyzer. The proposed method reduces the burden of selecting a program path and also makes it easy to generate test data according to various test adequacy criteria. The method is illustrated through simple but instructive examples.
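The Alloy Analyzer produces test data by searching a bounded scope for models of a relational-logic formula. The sketch below imitates that bounded search directly in Python for a hypothetical program under test; it shows the idea only, not the paper's actual program-to-Alloy translation:

```python
# Bounded model finding in miniature: enumerate a small integer scope and
# keep the inputs that satisfy a branch condition of the program under test.
# The program `classify` and the scope are invented for illustration.
from itertools import product

def classify(a: int, b: int) -> str:               # program under test
    if a > b and a + b > 10:
        return "big"
    return "small"

# Alloy-style query: run {a, b : Int | a > b and a + b > 10} for a small scope
scope = range(-8, 8)
tests = [(a, b) for a, b in product(scope, scope) if a > b and a + b > 10]

print(tests[:5])                                   # generated test data
assert all(classify(a, b) == "big" for a, b in tests)
```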

Design and Implementation of Web Crawler utilizing Unstructured data

  • Tanvir, Ahmed Md.;Chung, Mokdong
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.3
    • /
    • pp.374-385
    • /
    • 2019
  • A web crawler is a program, commonly used by search engines, that discovers new content on the internet. The use of crawlers has made the web easier for users. In this paper, we structure unstructured data in order to collect data from web pages. Our system can choose words near a given keyword across multiple documents, and neighboring data for the keyword are collected through word2vec. Acquisition is filtered at the data-collection level and aimed at a large taxonomy. The main problem in text taxonomy is how to improve classification accuracy; to improve it, we propose a new TF-IDF weighting method, modifying the TF algorithm to handle unstructured data. Finally, we propose a competent web-page crawling algorithm, derived from TF-IDF and RL web search algorithms, to enhance the efficiency of retrieving relevant information. The paper also examines the nature and operation of crawlers and crawling algorithms in search engines for efficient information retrieval.
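For reference, the baseline TF-IDF weighting that the paper modifies (the modified TF formula itself is not reproduced here):

```python
# Plain TF-IDF: term frequency in the document times inverse document
# frequency across the collection. Toy documents for illustration.
import math
from collections import Counter

docs = [["web", "crawler", "search", "web"],
        ["keyword", "search", "engine"],
        ["web", "keyword", "taxonomy"]]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)
    df = sum(term in d for d in docs)              # documents containing term
    return tf * math.log(len(docs) / df)

print(round(tf_idf("web", docs[0], docs), 3))      # frequent, low idf
print(round(tf_idf("crawler", docs[0], docs), 3))  # rare, hence weighted higher
```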

Competitive Benchmarking in Large Data Bases Using Self-Organizing Maps

  • 이영찬
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 1999.10a
    • /
    • pp.303-311
    • /
    • 1999
  • The amount of financial information in today's sophisticated large databases is huge and makes comparisons of company performance difficult, or at least very time-consuming. The purpose of this paper is to investigate whether neural networks, in the form of self-organizing maps, can be used to manage the complexity of large databases. The paper structures and analyzes accounting numbers in a large database over several time periods. Using self-organizing maps avoids the problems of finding an appropriate underlying distribution and functional form for the data that are often encountered in such structuring tasks, for example when using cluster analysis, and the chosen method also offers a way of visualizing the results. The database in this study consists of the annual reports of more than 80 Korean companies, with data from the year 1998.
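A minimal self-organizing map training loop on synthetic stand-in vectors, to show how the map places similar companies on neighboring nodes; map size, schedules, and data are arbitrary choices, not the paper's settings:

```python
# SOM in brief: find the best-matching node for a sample, then pull that
# node and its grid neighborhood toward the sample, shrinking the learning
# rate and radius over time. Data are random stand-ins for financial ratios.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(80, 5))        # 80 companies x 5 accounting ratios
grid = rng.normal(size=(6, 6, 5))      # 6x6 map of weight vectors
ii, jj = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")

for t in range(2000):
    x = data[rng.integers(len(data))]
    d = np.linalg.norm(grid - x, axis=2)               # distance to each node
    bi, bj = np.unravel_index(d.argmin(), d.shape)     # best-matching unit
    lr = 0.5 * (1 - t / 2000)
    radius = 2.0 * (1 - t / 2000) + 0.5
    h = np.exp(-((ii - bi)**2 + (jj - bj)**2) / (2 * radius**2))
    grid += lr * h[:, :, None] * (x - grid)            # neighborhood update

# a company's position on the map: nearby nodes mean similar profiles
print(np.unravel_index(np.linalg.norm(grid - data[0], axis=2).argmin(), (6, 6)))
```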

A Study on the Effective Database Marketing using Data Mining Technique(CHAID) (데이터마이닝 기법(CHAID)을 이용한 효과적인 데이터베이스 마케팅에 관한 연구)

  • 김신곤
    • The Journal of Information Technology and Database
    • /
    • v.6 no.1
    • /
    • pp.89-101
    • /
    • 1999
  • An increasing number of companies recognize that understanding their customers and markets is indispensable for survival and business success, and they are rapidly increasing their investment in the customer databases that form the basis of database marketing activities. Database marketing is closely related to data mining: the non-trivial extraction of implicit, previously unknown, and potentially useful knowledge or patterns from large data. Applied to database marketing, data mining can contribute greatly to a company's competitiveness and sustainable competitive advantage. This paper develops a classification model that selects the most responsive customers from customer databases for a telemarketing system, and evaluates the model's performance using the LIFT measure. The model employs CHAID, a decision tree algorithm and one of the well-known data mining techniques. The paper also presents an effective database marketing strategy by applying the technique to a credit card company's telemarketing system.
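A short sketch of the LIFT measure used to evaluate such a model: the response rate among the top-scored customers divided by the overall response rate, computed here on synthetic scores and responses:

```python
# LIFT: how much better the model's top decile responds than the base rate.
# Scores and responses below are synthetic illustrations.
import numpy as np

rng = np.random.default_rng(2)
score = rng.random(10_000)                         # model score per customer
respond = (rng.random(10_000) < 0.02 + 0.10 * score).astype(int)

order = np.argsort(-score)                         # best prospects first
top = respond[order[:1_000]]                       # top decile by score
lift = top.mean() / respond.mean()
print(f"top-decile rate {top.mean():.3f}, base {respond.mean():.3f}, "
      f"lift {lift:.2f}")
```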

Predicting Nonstationary Time Series with Fuzzy Learning Based on Consecutive Data (연속된 데이터의 퍼지학습에 의한 비정상 시계열 예측)

  • Kim, In-Taek
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.50 no.5
    • /
    • pp.233-240
    • /
    • 2001
  • This paper presents a time series prediction method using a fuzzy rule-based system. Extracting fuzzy rules in a simple one-pass operation over the training data is attractive because the rules are easy to understand, verify, and extend. The simplest approach relates an estimate x(n+k) to past data x(n), x(n-1), ..., x(n-m), where k and m are fixed positive integers; the relation is represented by fuzzy if-then rules in which the past data form the premise part and the predicted value the consequence part. A serious problem with this approach, however, is that it cannot handle nonstationary data whose long-term mean is drifting. To cope with this, a new training method is proposed that uses the differences of consecutive data in the time series. The paper briefly surveys typical previous work on time series prediction, proposes the new method to overcome the difficulty of predicting nonstationary data, and illustrates the improved results with computer simulations on various time series.
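The central idea, sketched under simplifying assumptions: learn the relation on consecutive differences rather than raw values, so a drifting long-term mean does not invalidate the learned rules. A nearest-pattern lookup stands in here for the paper's fuzzy if-then rules:

```python
# Predict the next difference from the most similar past difference pattern,
# then add it back to the last raw value. Differencing removes the trend
# that breaks rules learned on raw values. Crude stand-in for fuzzy rules.
import numpy as np

t = np.arange(400)
x = 0.05 * t + np.sin(0.3 * t)          # nonstationary: trend + oscillation
d = np.diff(x)                          # consecutive differences, trend-free

m = 3                                   # premise length: d(n-2), d(n-1), d(n)
patterns = np.lib.stride_tricks.sliding_window_view(d[:-1], m)
targets = d[m:]                         # the difference that followed each

last = d[-m:]
nearest = np.linalg.norm(patterns - last, axis=1).argmin()
x_pred = x[-1] + targets[nearest]       # consequence: add difference back

print(round(x_pred, 3), "vs actual", round(0.05 * 400 + np.sin(0.3 * 400), 3))
```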

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval

  • Seo, Sang-Wook;Lee, Dong-Wook;Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.8 no.1
    • /
    • pp.31-36
    • /
    • 2008
  • This paper presents adaptive selection of learning data for evolvable neural networks (ENNs) applied to time series prediction of nonlinear dynamic systems. ENNs are a special class of neural networks that adopt the concept of biological evolution as their mechanism of adaptation or learning, and they can adapt to an environment as well as to changes in it. The ENNs used in this paper are based on L-systems and DNA coding, and they evolve network architecture and weights simultaneously using indirect encoding. In general, only the most recent data are used to train a predictor of future data, but the characteristics of the data and the appropriate amount of learning data are usually unknown. We therefore propose adaptively changing the size of the learning data to predict future data effectively. To verify the effectiveness of the scheme, we apply it to chaotic time series prediction with Mackey-Glass data.
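A sketch of the adaptive learning-data idea, with a trivial mean predictor standing in for the evolved network and a made-up adaptation rule (the paper's actual scheme is not reproduced): shrink the training window when recent errors grow, enlarge it when they fall.

```python
# Adaptive learning-data size: compare the last two batches of one-step
# errors and resize the training window accordingly. Predictor and rule
# are illustrative stand-ins, not the paper's ENN.
import numpy as np

def predict(window):                    # trivial stand-in for the ENN
    return window.mean()

rng = np.random.default_rng(3)
x = np.sin(0.09 * np.arange(600)) + 0.1 * rng.normal(size=600)

size, errors = 20, []
for n in range(100, 600):
    errors.append(abs(predict(x[n - size:n]) - x[n]))
    if len(errors) >= 10:               # compare recent vs. older errors
        recent, older = np.mean(errors[-5:]), np.mean(errors[-10:-5])
        size = max(10, size - 5) if recent > older else min(80, size + 5)

print("final window size:", size)
```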