• Title/Summary/Keyword: Term Classification

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms expressing the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by using Wikipedia to replace input terms that are absent from the training document set with the most similar term that does occur in the training documents. For the selection of classification features, experiments were performed in various settings combining three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35 to 1.85% in $F_1$ value in all the experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement is not as high as expected, several semantic as well as structural devices of Wikipedia could be used for selecting more effective classification features.
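
The replacement step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `replace_unseen_terms`, the toy similarity table, and the choice to drop terms that fall below the threshold are all assumptions made for illustration. The abstract only states that a non-training term is replaced by the training term with the highest similarity above a threshold.

```python
from typing import Callable, Iterable, List, Set


def replace_unseen_terms(
    doc_terms: Iterable[str],
    training_vocab: Set[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Replace terms absent from the training vocabulary with their most
    similar training term, keeping the replacement only above the threshold."""
    candidates = sorted(training_vocab)  # sorted for deterministic tie-breaking
    result: List[str] = []
    for term in doc_terms:
        if term in training_vocab:
            result.append(term)  # already usable as a feature
            continue
        if not candidates:
            continue
        best = max(candidates, key=lambda t: similarity(term, t))
        if similarity(term, best) > threshold:
            result.append(best)
        # Assumption: terms with no sufficiently similar training term are dropped.
    return result


# Hypothetical similarity scores standing in for a Wikipedia-based measure.
TOY_SIMILARITY = {
    ("automobile", "car"): 0.9,
    ("automobile", "engine"): 0.4,
    ("zebra", "car"): 0.05,
}


def toy_similarity(unseen: str, known: str) -> float:
    return TOY_SIMILARITY.get((unseen, known), 0.0)


features = replace_unseen_terms(
    ["car", "automobile", "zebra"], {"car", "engine"}, toy_similarity, threshold=0.5
)
print(features)  # ['car', 'car']
```

Here "automobile" is mapped to "car" because its similarity exceeds the threshold, while "zebra" has no close training term and contributes no feature.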

A Study on Improving the Performance of Document Classification Using the Context of Terms (용어의 문맥활용을 통한 문헌 자동 분류의 성능 향상에 관한 연구)

  • Song, Sung-Jeon;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.205-224 / 2012
  • One limitation of the bag-of-words (BOW) method is that each term is recognized only by its form, so the representation fails to capture the term's meaning or thematic background. To overcome this limitation, different profiles were defined for each term by thematic category, depending on its contextual characteristics. In this study, a specific term was used as a classification feature based on its meaning or thematic background by comparing the context in those profiles with the term's occurrences in an actual document. The experiment was conducted in three phases: term weighting, ensemble classifier implementation, and feature selection. Classification performance was enhanced in all phases, with the ensemble classifier showing the highest performance score. The outcome also showed that the proposed method was effective in reducing the performance bias caused by the total number of training documents.

Musical Genre Classification System based on Multiple-Octave Bands (다중 옥타브 밴드 기반 음악 장르 분류 시스템)

  • Byun, Karam;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.12 / pp.238-244 / 2013
  • For musical genre classification, various types of feature vectors are utilized. Mel-frequency cepstral coefficients (MFCC), decorrelated filter bank (DFB), and octave-based spectral contrast (OSC) are widely used as short-term features, and their long-term variations are also utilized. In this paper, OSC features are extracted not only in the single-octave band domain but also in the multiple-octave band domain, to capture the correlation between octave bands. As a baseline system, we select the genre classification system that placed fourth in the 2012 Music Information Retrieval Evaluation eXchange (MIREX) contest. By applying the OSC features based on multiple-octave bands, we improve classification accuracy by 0.40% and 3.15% for the GTZAN and Ballroom databases, respectively.

A Novel RGB Channel Assimilation for Hyperspectral Image Classification using 3D-Convolutional Neural Network with Bi-Long Short-Term Memory

  • M. Preethi;C. Velayutham;S. Arumugaperumal
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.177-186 / 2023
  • Hyperspectral imaging is one of the most efficient and fastest-growing technologies of recent years. A hyperspectral image (HSI) comprises contiguous spectral bands for every pixel, which allows objects to be detected with significant accuracy and detail. Because an HSI carries high-dimensional spectral information, classifying every pixel is not easy. To address this problem, we propose a novel RGB channel assimilation for classification methods. The color features are extracted using chromaticity computation. Additionally, this work discusses the classification of hyperspectral images based on the Domain Transform Interpolated Convolution Filter (DTICF) and a 3D-CNN with Bi-directional Long Short-Term Memory (Bi-LSTM). The proposed technique has three steps. First, the HSI data are converted to RGB images with spatial features. Before the DTICF is applied, the RGB images of the HSI are integrated with patches of the input image from the raw HSI. The paired spectral and spatial features are then extracted from the integrated HSI using the DTICF and fed into the designed 3D-CNN with Bi-LSTM framework. In the second step, the extracted color features are classified by a 2D-CNN, and the probabilistic classification maps of the 3D-CNN-Bi-LSTM and the 2D-CNN are fused. In the last step, a Markov Random Field (MRF) is used to refine the fused probabilistic classification map. Experimental results on two different hyperspectral images show that the proposed RGB channel assimilation with the DTICF-3D-CNN-Bi-LSTM approach is more effective and provides good classification results compared to other classification approaches.

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management / v.18 no.2 / pp.203-230 / 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, were built for the clustering experiment. Various feature reduction criteria as well as term weighting methods were applied to the term sets of the test collections, and the cosine and Jaccard coefficients were used as similarity measures. The performances of the complete linkage and K-means clustering algorithms were compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.
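
The two similarity measures named in this abstract can be sketched in a few lines of Python. This is a generic illustration, not the study's code: documents are assumed to be dictionaries of term weights for the cosine coefficient and plain term sets for the Jaccard coefficient, and the function names are hypothetical. The study may well have used a weighted (extended) Jaccard variant, which the abstract does not specify.

```python
import math
from typing import Dict, Set


def cosine_coefficient(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def jaccard_coefficient(a: Set[str], b: Set[str]) -> float:
    """Size of the shared term set divided by the size of the combined term set."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


doc1 = {"cluster": 1.0, "term": 1.0}
doc2 = {"cluster": 1.0}
print(cosine_coefficient(doc1, doc2))
print(jaccard_coefficient({"cluster", "term"}, {"term", "weight"}))
```

Either measure can feed a pairwise similarity matrix for complete linkage clustering, whereas K-means typically operates on the weight vectors directly.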

A Case Study on Network Status Classification based on Latency Stability

  • Kim, JunSeong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.11 / pp.4016-4027 / 2014
  • Understanding network latency is important for providing consistent and acceptable levels of service in network-based applications. However, because it is difficult both to estimate applications' network demands and to model network latency, the management of network resources has often been ignored. Since network latency repeats cycles of congested states, we expect that a systematic classification method for network status would help simplify issues in network resource management. This paper presents a simple empirical method to classify network status on a real operational network. By observing the oscillating behavior of end-to-end latency, we determine the network's status at run time. Five typical network statuses are defined based on long-term stability and short-term burstiness. By investigating the prediction accuracies of several simple numerical models, we show the effectiveness of the network status classification. Experimental results show a reduction of around 80% in prediction errors, depending on network status.

An Analysis of Defects Apartment Houses Occurring during the Term of Warranty Liability (하자담보책임기간에 발생하는 공동주택 하자 분석)

  • Yu, Byong-Jae;Bang, Hong-Soon;Kim, Ok-Kyue
    • Proceedings of the Korean Institute of Building Construction Conference / 2022.11a / pp.135-136 / 2022
  • Defects in apartment houses are subject to a term of warranty liability according to the enforcement ordinance of the Acts of the Management of Apartment Houses. When defects occur during this term, the construction company can provide defect maintenance free of charge. Yet conflicts arise between the construction company and residents over whether a defect has occurred. To help resolve these conflicts, this study analyzed the construction classifications and defect types that need managing, based on defects of apartment houses occurring during the term of warranty liability. The research analyzed 138,576 cases of data from five apartment house complexes. By construction classification, wooden flooring products accounted for the highest rate of defects, followed by paper hanging and wooden windows. By defect type, tearing/scratching accounted for the highest rate, followed by defects in fixing and operation. To visualize the defects occurring during the term of warranty liability, the word cloud method was used. Subsequent research will use the data presented here to pursue methods for defect maintenance during the term of warranty liability.

Development and Validation of the Classification of Home-based Long-term Care Activities (노인장기요양보험 재가서비스 분류 틀 개발 및 타당도 검증)

  • Song, Mi Sook;Song, Hyun Jong
    • 한국노년학 / v.34 no.2 / pp.369-386 / 2014
  • The purpose of this study was to develop a classification of home-based long-term care activities and to test its validity. The taxonomy of long-term care activities was structured according to service domain and process. Two expert groups participated in drafting the taxonomy, which was composed of 7 service domains, 22 care needs, 22 service objectives, and 114 activities. The reliability and validity of the taxonomy were tested in a sample of 152 elderly subjects who used home-based long-term care services. Factor analysis of the 114 activities extracted 21 factors. The internal consistency of the factors was high. Content validity was confirmed by the CVI. Long-term care insurance grade was used to assess criterion validity. Among 21 care needs, 12 care needs differed significantly according to grade. The classification of home-based long-term care activities demonstrated reliability and validity. In conclusion, the use of this classification is recommended when communicating with elderly subjects, service providers, and third-party payers.

A Design of Control Chart for Fraction Nonconforming Using Fuzzy Data (퍼지 데이터를 이용한 불량률(p) 관리도의 설계)

  • 김계완;서현수;윤덕균
    • Journal of Korean Society for Quality Management / v.32 no.2 / pp.191-200 / 2004
  • The p chart is not adequate when there is a large amount of data and it is difficult to divide products into conforming and nonconforming because the binary classification is ambiguous. A new control chart is therefore needed that represents such ambiguous situations efficiently. This study presents a method for performing arithmetic operations on fuzzy data represented as fuzzy sets by applying fuzzy set theory, and designs a new control chart that takes into account the concept of classification on the term set and the membership function associated with the term set.
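
For context, the classical (crisp) p chart that this paper sets out to extend can be sketched as follows. This is the standard textbook three-sigma construction, not the fuzzy design proposed in the paper. The function name `p_chart_limits`, the equal-sample-size assumption, and the example counts are all illustrative.

```python
import math
from typing import List, Tuple


def p_chart_limits(defectives: List[int], sample_size: int) -> Tuple[float, float, float]:
    """Return (LCL, center line, UCL) of a classical p chart.

    Assumes every subgroup has the same sample size and uses 3-sigma limits.
    """
    p_bar = sum(defectives) / (len(defectives) * sample_size)  # overall fraction nonconforming
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / sample_size)     # binomial standard error
    lcl = max(0.0, p_bar - 3.0 * sigma)  # a fraction cannot fall below 0
    ucl = min(1.0, p_bar + 3.0 * sigma)  # or rise above 1
    return lcl, p_bar, ucl


# Hypothetical data: four subgroups of 100 items, each with 5 nonconforming items.
lcl, center, ucl = p_chart_limits([5, 5, 5, 5], sample_size=100)
print(lcl, center, ucl)
```

The crisp chart requires each item to be counted as either conforming or nonconforming, which is exactly the binary judgment the abstract describes as problematic when the classification is ambiguous.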

Comparative Analysis of Terminology and Classification Related to Risk Management of Radiotherapy

  • Oh, Yoonjin;Kim, Dong Wook;Shin, Dong Oh;Koo, Jihye;Lee, Soon Sung;Choi, Sang Hyoun;Ahn, Sohyun;Park, Dong-wook
    • Progress in Medical Physics / v.27 no.3 / pp.131-138 / 2016
  • We analyzed the terminology and classification related to the risk management of radiation treatment overseas in order to establish a terminology and classification system for Korea. This study investigated the terminology and classification for radiotherapy risk management through overseas research materials from related organizations and associations, including the IAEA, WHO, British group, EC, and AAPM. Overseas risk management commonly uses the terms "near miss", "incident", and "adverse event", classified according to the degree of severity. However, several organizations use these terms ambiguously. They use the term "near miss" for events such as a near event, close call, and good catch; the term "incident" for an event; and the term "adverse event" for the likes of an accident and an event. In addition, different organizations use different classifications: a "near miss" is generally classified as an "incident" in most cases but is not classified as such in BIR et al. The inconsistent terminology and classification, together with the ambiguity of the definitions, may also cause confusion. Patient safety management of medical institutions in Korea uses the terms "near miss", "adverse event", and "sentinel event", which it classifies into eight levels according to the severity of risk to the patient. Therefore, a terminology and classification for radiotherapy risk management based on the patient safety management of medical institutions in Korea will help improve the safety and quality of radiotherapy.