• Title/Summary/Keyword: Data Classification Systems

Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.2 / pp.81-86 / 2016
  • Term weighting is a popular technique that weights term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform consistently well across different data domains. In this paper, we propose several methods for combining different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and ii) a mini-max classifier that aims for robustness across the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, where they significantly outperform existing term weighting methods.
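The two ideas named in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows what a convex combination of base term weight vectors looks like and what quantity a mini-max objective scores. The function names and the example weights are hypothetical, and the optimization that would actually choose the coefficients is omitted.

```python
from typing import List, Sequence


def convex_combination(base_vectors: Sequence[Sequence[float]],
                       alphas: Sequence[float]) -> List[float]:
    """Return sum_k alphas[k] * base_vectors[k].

    The coefficients must be non-negative and sum to 1, which keeps the
    result inside the convex hull of the base vectors.
    """
    if len(base_vectors) != len(alphas):
        raise ValueError("need one coefficient per base vector")
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    dim = len(base_vectors[0])
    return [sum(a * v[j] for a, v in zip(alphas, base_vectors))
            for j in range(dim)]


def worst_case_loss(losses: Sequence[float]) -> float:
    """Mini-max objective value: the loss of the worst-performing base weighting."""
    return max(losses)


# Hypothetical term weights for a three-term vocabulary (illustrative values only).
weights_a = [0.2, 1.0, 0.4]
weights_b = [0.6, 0.2, 0.8]
combined = convex_combination([weights_a, weights_b], [0.5, 0.5])
```

In a full method, the coefficients would be learned by minimizing a classification loss, and the mini-max variant would minimize `worst_case_loss` over the classifier parameters rather than merely evaluate it.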

A classification technique for J-lead solder joints using a neural network (신경 회로망을 이용한 J-리드 납땜 상태 분류)

  • Yu, Chang-Mok;Lee, Joong-Ho;Cha, Young-Yeup
    • Journal of Institute of Control, Robotics and Systems / v.5 no.8 / pp.995-1000 / 1999
  • This paper presents an optical system and a visual inspection algorithm for detecting solder joint defects in J-lead chips, which are more densely integrated and smaller than gull-wing chips, on printed circuit boards (PCBs). The visual inspection system is composed of three parts: a host PC, an imaging part, and a driving part. The host PC controls the inspection devices and executes the inspection algorithm. The imaging part acquires and processes image data. The driving part controls the XY-table for automatic inspection. The five most important features are extracted from the input images and fed to a back-propagation network that categorizes J-lead solder joint defects into four classes. Experiments with the proposed inspection algorithm confirm good classification accuracy and the effectiveness of the five chosen features.

A Study on Plagiarism Detection and Document Classification Using Association Analysis (연관분석을 이용한 효과적인 표절검사 및 문서분류에 관한 연구)

  • Hwang, Insoo
    • The Journal of Information Systems / v.23 no.3 / pp.127-142 / 2014
  • Plagiarism occurs when content is copied without permission or citation, and the problem has grown rapidly in the digital era because of the resources available on the World Wide Web. An important task in plagiarism detection is measuring and determining similar text portions between a given pair of documents. One of the main difficulties of this task is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. To handle this problem, this paper proposes using association analysis, a data mining technique, to detect plagiarism. The method is able to detect common actions performed by plagiarists, such as word deletion, insertion, and transposition, making it possible to obtain plausible portions of plagiarized text. Experimental results employing an unsupervised document classification strategy showed that the proposed method outperformed traditionally used approaches.

Feature Selection for Multi-Class Support Vector Machines Using an Impurity Measure of Classification Trees: An Application to the Credit Rating of S&P 500 Companies

  • Hong, Tae-Ho;Park, Ji-Young
    • Asia Pacific Journal of Information Systems / v.21 no.2 / pp.43-58 / 2011
  • Support vector machines (SVMs), a machine learning technique, have been applied not only to binary classification problems such as bankruptcy prediction but also to multi-class problems such as corporate credit ratings. However, depending on the selection of predictors, SVMs can easily perform worse than the best alternative model, even though they have successfully classified and predicted in many dichotomous and multi-class problems. To overcome this weakness, this study proposes an approach for selecting features for multi-class SVMs that utilizes the impurity measures of classification trees. For the selection of the input features, we employed the C4.5 and CART algorithms, as well as the stepwise method of discriminant analysis, which is a well-known method for selecting features. We built a multi-class SVM model for credit rating using the above method and present experimental results with data on S&P 500 companies.
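As a rough illustration of what a tree impurity measure is, the sketch below computes the Gini impurity used by CART and the impurity decrease produced by a single threshold split on one feature. It is an assumption-laden example, not the study's procedure: the function names and the toy data are hypothetical, and C4.5 uses an entropy-based gain ratio rather than Gini.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values: Sequence[float],
                      labels: Sequence[str],
                      threshold: float) -> float:
    """Reduction in Gini impurity when splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Toy data: one hypothetical financial ratio and two rating classes.
ratio = [1.0, 2.0, 3.0, 4.0]
rating = ["A", "A", "B", "B"]
decrease = impurity_decrease(ratio, rating, threshold=2.0)
```

Features whose best split yields a larger impurity decrease would be ranked as more informative candidates for the downstream classifier.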

Performance analysis in automatic modulation classification based on deep learning (딥러닝 기반 자동 변조 인식 성능 분석)

  • Kang, Jong-Jin;Kim, Jae-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.3 / pp.427-432 / 2021
  • In this paper, we analyze the performance of deep-neural-network-based automatic modulation classification, which identifies the modulation type of an unknown communication signal. The modulation classification performance was verified using three kinds of neural network input data: time-domain digital samples of the modulated signal, frequency-domain data obtained by applying the FFT, and mixed time- and frequency-domain data. For 11 types of analog and digitally modulated signals, the modulation classification performance was verified in various SNR environments ranging from -20 to 18 dB, and the reasons for false classification were analyzed. In addition, by checking the learning speed according to the type of neural network input data, we show that the proposed method is effective for constructing a practical automatic modulation recognition system, which requires a lot of time to learn.
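The three input representations described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses a naive O(n²) discrete Fourier transform from the standard library instead of an FFT, it keeps only magnitudes for the frequency-domain part, and the function names and the feature layout are hypothetical rather than taken from the paper.

```python
import cmath
from typing import List, Sequence


def dft_magnitudes(samples: Sequence[complex]) -> List[float]:
    """Magnitude spectrum via a naive DFT (an FFT gives the same values faster)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


def time_features(samples: Sequence[complex]) -> List[float]:
    """Time-domain input: interleaved real and imaginary parts of each sample."""
    return [part for x in samples for part in (x.real, x.imag)]


def mixed_features(samples: Sequence[complex]) -> List[float]:
    """Mixed input: time-domain features followed by the magnitude spectrum."""
    return time_features(samples) + dft_magnitudes(samples)


# A constant signal concentrates all of its energy in frequency bin 0.
signal = [1 + 0j] * 4
spectrum = dft_magnitudes(signal)
features = mixed_features(signal)
```

For four complex samples, the mixed vector has twelve entries: eight time-domain values and four spectral magnitudes.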

Recommendation System for Research Field of R&D Project Using Machine Learning (머신러닝을 이용한 R&D과제의 연구분야 추천 서비스)

  • Kim, Yunjeong;Shin, Donggu;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1809-1816 / 2021
  • To identify the latest research trends using data on national R&D projects and to produce and utilize meaningful information, the national R&D information service also needed automatic classification technology, so we conducted research on automatically classifying and recommending research fields. About 450,000 national R&D project records from 2013 to 2020 were collected and used for learning and evaluation. A model was selected after data pre-processing, analysis, and performance analysis of the valid data among the collected data. The performance of Word2vec, GloVe, and fastText was compared to derive the optimal model combination. In the experiment, the accuracy on the subcategories alone, which are used as essential items of task information, was 90.11%. This model is expected to be applicable to automatic classification for other classification systems with a hierarchical structure similar to that of the research fields in the national science and technology standard classification.

Resolving data imbalance through differentiated anomaly data processing based on verification data (검증데이터 기반의 차별화된 이상데이터 처리를 통한 데이터 불균형 해소 방법)

  • Hwang, Chulhyun
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.179-190 / 2022
  • Data imbalance refers to a phenomenon in which the number of samples in one category is much larger or much smaller than in another category. It has been identified as a major factor that degrades performance in machine learning that uses classification algorithms. To solve the data imbalance problem, various oversampling methods for amplifying minority-class data have been proposed. Among them, SMOTE is the most representative method. To maximize the amplification effect on minority-class data, various methods have emerged that remove noise included in the data (SMOTE-IPF) or enhance only borderline samples (Borderline-SMOTE). This paper proposes a method that improves classification performance by improving how anomaly data are processed in the traditional SMOTE method, which amplifies minority-class data. In experiments, the proposed method consistently showed relatively high classification performance compared to the existing methods.
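The core step of SMOTE-style oversampling, interpolating between a minority sample and one of its minority-class neighbors, can be sketched as below. This is a simplified illustration, not the method proposed in the paper: it uses only the single nearest neighbor instead of a random choice among k neighbors, performs no anomaly or noise handling, and all names and data are hypothetical.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def nearest_neighbor(x: Point, others: Sequence[Point]) -> Point:
    """Return the point in `others` closest to `x` (squared Euclidean distance)."""
    return min(others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))


def smote_like_sample(minority: Sequence[Point], rng: random.Random) -> List[float]:
    """Create one synthetic point on the segment between a randomly chosen
    minority sample and its nearest minority-class neighbor."""
    i = rng.randrange(len(minority))
    x = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    neighbor = nearest_neighbor(x, others)
    u = rng.random()  # interpolation factor in [0, 1)
    return [a + u * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical minority-class points lying on the line y = x.
minority_points = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
synthetic = smote_like_sample(minority_points, random.Random(0))
```

Because every synthetic point lies between two existing minority samples, an outlier such as `[10.0, 10.0]` can pull new samples toward itself, which is the kind of effect that noise-aware and anomaly-aware variants try to control.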

AUTOMATED INTEGRATION OF CONSTRUCTION IMAGES IN MODEL BASED SYSTEMS

  • Ioannis K. Brilakis;Lucio Soibelman
    • International Conference on Construction Engineering and Project Management / 2005.10a / pp.503-508 / 2005
  • In the modern, distributed, and dynamic construction environment, it is important to exchange information from different sources and in different data formats in order to improve the processes supported by these systems. Previous research has demonstrated that (i) a significant percentage of construction data is stored in semi-structured or unstructured data formats and (ii) locating and identifying the data needed for important decision-making processes is a very hard and time-consuming task. In this paper, an automated methodology for the classification and retrieval of construction images in AEC/FM model based systems is presented. Specifically, a combination of techniques from the areas of image processing, computer vision, and content-based image retrieval has been deployed to develop a method that can retrieve related construction site image data from components of a project model.

Content Analysis of Learning Classifications of Foodservice and Culinary Majors (외식조리전공의 학문분류에 대한 내용분석)

  • Han, Kyung-Soo;Shin, Sun-Hwa
    • Culinary Science and Hospitality Research / v.16 no.2 / pp.367-381 / 2010
  • The principal objective of this study was to compare domestic and foreign learning (science) classification systems for foodservice and culinary majors and to identify any problems with the domestic learning classification system. The study compared domestic and foreign scientific classification systems that address hospitality management. Content analysis, which proved to be a useful method for comparing secondary data, was used to evaluate the science classification systems of the Korea Research Foundation and the Korea Science and Engineering Foundation (Korea), the National Science Foundation and Oracle Corporation (America), the Natural Science and Engineering Research Council (Canada), and the Australian Bureau of Statistics (Australia). As a result, the Korean classification systems were identified as being based on a hierarchical stepwise system, whereas those of other countries were based on nominal classifications. The initial research conducted in this study lays the groundwork for effective learning classifications for foodservice and culinary majors in the future.

Development and testing of a composite system for bridge health monitoring utilising computer vision and deep learning

  • Lydon, Darragh;Taylor, S.E.;Lydon, Myra;Martinez del Rincon, Jesus;Hester, David
    • Smart Structures and Systems / v.24 no.6 / pp.723-732 / 2019
  • Globally, road transport networks are subjected to continuous levels of stress from increasing loading and environmental effects. As road transport is the most popular means of transport in the UK, the condition of this civil infrastructure is a key indicator of economic growth and productivity. Structural Health Monitoring (SHM) systems can provide valuable insight into the true condition of our aging infrastructure. In particular, monitoring the displacement of a bridge structure under live loading can provide an accurate descriptor of bridge condition. In the past, B-WIM systems have been used to collect traffic data and hence provide an indicator of bridge condition; however, the use of such systems can be restricted by bridge type, access issues, and cost limitations. This research provides a non-contact, low-cost, AI-based solution for vehicle classification and associated bridge displacement using computer vision methods. Convolutional neural networks (CNNs) have been adapted to develop the QUBYOLO method, which classifies vehicles from recorded traffic images. This vehicle classification was then accurately related to the corresponding bridge response obtained under live loading using non-contact methods. The successful identification of multiple vehicle types during field testing has shown that QUBYOLO is suitable for the fine-grained vehicle classification required to identify the load applied to a bridge structure. The process of displacement analysis and vehicle classification for load identification used in this research adds to the body of knowledge on the monitoring of existing bridge structures, particularly long-span bridges, and establishes the significant potential of computer vision and deep learning to provide dependable results on the real response of our infrastructure to existing and potentially increased loading.