Search | Korea Science

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
- Journal of Computing Science and Engineering / v.3 no.3 / pp.165-180 / 2009
The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories using a semi-hierarchical, balanced three-level classification scheme, in which each news story has only one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, to rearrange the classification scheme to be fully hierarchical but unbalanced, and to assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.
https://doi.org/10.5626/JCSE.2009.3.3.165
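
The k-NN categorization mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the authors' feature extraction, similarity measure, or value of k, so everything here is an assumption: whitespace tokenization (real Korean text would normally need morphological analysis), raw term-frequency vectors, cosine similarity, and similarity-weighted voting. The `TRAINING` data is a hypothetical toy set, not part of either HKIB collection.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_categorize(query, training, k=3):
    """Return the category with the largest summed similarity among the k nearest documents."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(doc)), label) for doc, label in training]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for similarity, label in nearest:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (document text, single category label).
TRAINING = [
    ("stock market shares fall", "economy"),
    ("central bank raises interest rates", "economy"),
    ("team wins league final match", "sports"),
    ("striker scores in final match", "sports"),
]
```

Calling `knn_categorize("bank shares and interest rates", TRAINING)` assigns the query to the category of its most similar neighbours. A multi-label variant, as HKIB-20000 would require, could instead return every category whose vote exceeds a threshold.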

Automatic Categorization of Islamic Jurisprudential Legal Questions using Hierarchical Deep Learning Text Classifier

AlSabban, Wesam H.;Alotaibi, Saud S.;Farag, Abdullah Tarek;Rakha, Omar Essam;Al Sallab, Ahmad A.;Alotaibi, Majid
- International Journal of Computer Science & Network Security / v.21 no.9 / pp.281-291 / 2021
The Islamic jurisprudential legal system represents an essential component of the Islamic religion that governs many aspects of Muslims' daily lives. This creates many questions that require interpretation by qualified specialists, or Muftis, according to the main sources of legislation in Islam. Islamic jurisprudence is usually classified into branches, according to which the questions can be categorized. Such categorization has many applications in automated question-answering systems, and in manual systems for routing questions to a Mufti who specializes in the relevant topic. In this work, we tackle the problem of automatic categorization of Islamic jurisprudential legal questions using deep learning techniques. We build a hierarchical deep learning model that first extracts the question text features at two levels, word and sentence representation, followed by a text classifier that acts upon the question representation. To evaluate our model, we build and release the largest publicly available dataset of Islamic questions and answers, along with their topics, for 52 topic categories. We evaluate different state-of-the-art deep learning models for both word and sentence embeddings, comparing recurrent and transformer-based techniques and performing extensive ablation studies to show the effect of each model choice. Our hierarchical model is based on pre-trained models, taking advantage of recent advances in transfer learning techniques focused on the Arabic language.
https://doi.org/10.22937/IJCSNS.2021.21.9.37 KSCI

Classification of Fall in Sick Times of Liver Cirrhosis using Magnetic Resonance Image (자기공명영상을 이용한 간경변 단계별 분류에 관한 연구)

Park, Byung-Rae;Jeon, Gye-Rok
- Journal of radiological science and technology / v.26 no.1 / pp.71-82 / 2003
In this paper, we propose a classifier of liver cirrhosis stages using T1-weighted MRI (magnetic resonance imaging) and a hierarchical neural network. The data sets for each class (normal, type 1, type 2, and type 3) were obtained at Pusan National University Hospital from June 2001 to December 2001, and comprised 46 cases. We extracted the liver region and the nodule region from T1-weighted MR liver images and then built an objective classifier of liver cirrhosis stages for these images. The classifier was implemented as a hierarchical neural network that uses gray-level analysis and texture feature descriptors to distinguish normal liver from three types of liver cirrhosis. The neural network classifier was trained with the error back-propagation algorithm. The classification results show recognition rates of 100% for normal, 82.3% for type 1, 86.7% for type 2, and 83.7% for type 3. The recognition rate was very high when the quantified results were compared with the doctors' decisions and the neural network classifier's outputs. If enough data is available and other parameters are considered, we expect that the neural network could perform as well as human experts and could be useful as a clinical decision support tool for liver cirrhosis patients.

A Hierarchical Text Rating System for Objectionable Documents

Jeong, Chi-Yoon;Han, Seung-Wan;Nam, Taek-Yong
- Journal of Information Processing Systems / v.1 no.1 s.1 / pp.22-26 / 2005
In this paper, we classify objectionable texts into four ratings according to their harmfulness and propose a hierarchical text rating system for objectionable documents. Since documents in the same category are similar in their vocabulary, expressions, and document structure, a text rating system that uses a single classification model has low accuracy. To solve this problem, we separate objectionable documents into several subsets according to their properties and then classify the subsets hierarchically. The proposed system consists of three layers. In each layer, we select features using the chi-square statistic, and the feature weights, calculated with the TF-IDF weighting scheme, are used as the input of a non-linear SVM classifier. By means of a hierarchical scheme that uses different features and a different number of features in each layer, we can characterize the objectionability of documents more effectively and expect to improve the performance of the rating system. We compared the performance of the proposed system with that of several text rating systems, and the experimental results show that the proposed system can achieve excellent classification performance.
https://doi.org/10.3745/JIPS.2005.1.1.022 KSCI
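
The per-layer feature pipeline described in the abstract above (chi-square selection followed by TF-IDF weighting) can be sketched as follows. This is only an illustration under stated assumptions: the chi-square score uses the standard 2x2 contingency table of term presence against class membership, TF-IDF uses raw term frequency times log(N/df), and the toy `DOCS` and `LABELS` are hypothetical. The non-linear SVM stage and the paper's actual three-layer split are not reproduced here.

```python
import math


def chi_square(docs, labels, term, target):
    """Chi-square statistic for the 2x2 table (term present?) x (label == target?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, in target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, in target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tf_idf(tokens, term, docs):
    """Raw term frequency times log inverse document frequency."""
    tf = tokens.count(term)
    df = sum(1 for other in docs if term in other)
    return tf * math.log(len(docs) / df) if df else 0.0


# Hypothetical toy corpus of tokenized documents with rating labels.
DOCS = [
    ["free", "adult", "content"],
    ["adult", "chat", "free"],
    ["weather", "report", "free"],
    ["school", "weather", "news"],
]
LABELS = ["objectionable", "objectionable", "clean", "clean"]
```

In a full system, the terms with the highest chi-square scores in each layer would be kept, and their TF-IDF weights would form the input vector of that layer's classifier.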

Hierarchical Multi-Classifier for the Mixed Character Code Set (홍용 문자 코드 집합을 위한 계층적 다중문자 인식기)

Kim, Do-Hyeon;Park, Jae-Hyeon;Kim, Cheol-Ki;Cha, Eui-Young
- Journal of the Korea Institute of Information and Communication Engineering / v.11 no.10 / pp.1977-1985 / 2007
Character recognition is one of the artificial intelligence techniques and has been widely applied in automated systems, robots, HCI (Human Computer Interaction), etc. This paper introduces the character set and the representative characters that can be used in recognizing the image ROI. The character codes in this ROI include digits, symbols, English letters, and Korean characters. We propose an efficient multi-classifier structure that combines small classifiers hierarchically. Moreover, we generated each small classifier with the delta-bar-delta learning algorithm. We tested the performance on various kinds of images and achieved an accuracy of 99%. The proposed multi-classifier showed efficiency and reliability for the mixed character code set.
https://doi.org/10.6109/jkiice.2007.11.10.1977 KSCI
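
The delta-bar-delta learning rule named in the abstract above keeps a separate learning rate for each weight: the rate grows additively while the current gradient agrees in sign with a running average of past gradients, and shrinks multiplicatively when the sign flips. The sketch below applies the rule to a plain gradient function rather than to the paper's classifiers, whose architecture and settings are not given in the abstract. The function name and all parameter values (`lr`, `kappa`, `phi`, `theta`, `steps`) are illustrative assumptions.

```python
def delta_bar_delta(grad, w, lr=0.05, kappa=0.01, phi=0.5, theta=0.7, steps=200):
    """Minimize a function from its gradient, using one adaptive learning rate per weight."""
    w = list(w)
    rates = [lr] * len(w)   # per-weight learning rates
    bar = [0.0] * len(w)    # exponential average of past gradients ("delta bar")
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            agreement = g[i] * bar[i]
            if agreement > 0:
                rates[i] += kappa        # consistent sign: additive increase
            elif agreement < 0:
                rates[i] *= (1.0 - phi)  # sign flip: multiplicative decrease
            w[i] -= rates[i] * g[i]
            bar[i] = (1.0 - theta) * g[i] + theta * bar[i]
    return w


def quadratic_grad(w):
    """Gradient of the toy objective (w0 - 3)^2 + 10 * (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]
```

Running `delta_bar_delta(quadratic_grad, [0.0, 0.0])` drives the weights toward the minimizer of the toy objective at (3, -1). In a neural network, `grad` would be the back-propagated error gradient.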

Karyotype Classification of Chromosome Using the Hierarchical Neu (계층형 신경회로망을 이용한 염색체 핵형 분류)

Chang, Yong-Hoon;Lee, Young-Jin;Lee, Kwon-Soon
- Proceedings of the KIEE Conference / 1998.07b / pp.555-559 / 1998
Human chromosome analysis is widely used to diagnose genetic diseases and various congenital anomalies. Much research on automated chromosome karyotype analysis has been carried out, some of which has produced commercial systems. However, there still remains much room for improving the accuracy of chromosome classification. In this paper, we propose an optimal pattern classifier based on a neural network to improve the accuracy of chromosome classification. The proposed pattern classifier is built from a two-step multi-layer neural network (TMANN). We reconstructed the chromosome images to improve classification accuracy and extracted four morphological feature parameters: centromeric index (C.I.), relative length ratio (R.L.), relative area ratio (R.A.), and chromosome length (C.L.). These parameters, obtained by preprocessing twenty human chromosome images, were used as inputs to the neural network. The experimental results showed that the chromosome classification error was much lower than that of the other classification methods.
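
As a rough illustration of the four morphological features listed above, the sketch below computes them from already-measured arm lengths and areas. The abstract does not define the features, so the formulas are assumptions based on common cytogenetic usage: the centromeric index is the short-arm length divided by the total chromosome length, and the relative length and area are taken against the sums over all chromosomes in the cell. The function and argument names are hypothetical.

```python
def morphological_features(short_arm, long_arm, area, all_lengths, all_areas):
    """Compute C.I., R.L., R.A., and C.L. for one chromosome (assumed definitions)."""
    length = short_arm + long_arm
    return {
        "CI": short_arm / length,         # centromeric index
        "RL": length / sum(all_lengths),  # relative length ratio
        "RA": area / sum(all_areas),      # relative area ratio
        "CL": length,                     # chromosome length
    }


# Hypothetical measurements for a three-chromosome toy cell.
features = morphological_features(
    short_arm=2.0,
    long_arm=6.0,
    area=12.0,
    all_lengths=[8.0, 12.0, 20.0],
    all_areas=[12.0, 18.0, 30.0],
)
```

The resulting four-element feature vector is the kind of input a neural network classifier could consume.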

High Performance Recognition System for Chinese Character (고성능 한자 인식 시스템)

An, Seong-Ok;Ju, Gi-Ho
- The Journal of Engineering Research / v.1 no.1 / pp.59-64 / 1997
More than 2,000 different Chinese characters are used daily in Korean newspapers and publications. This large repertoire of character patterns is the main difficulty in machine recognition of Chinese characters. The goal of this paper is to conceive, evaluate, and refine techniques for high-performance Chinese character recognition. A new character classifier was developed using a prototype creation method.

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

Kim, Young Soo;Lee, Byoung Yup
- The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
Recently, the volume of data has been growing exponentially with the rapid expansion of the Internet. Since users need only some of all this information, they carry a heavy workload in examining and discovering the necessary content. Therefore, information retrieval must provide hierarchical class information and an examination priority by evaluating the similarity between the query and documents. In this paper, we propose a clustering-based multi-class support vector machine model for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. Combining hierarchical document categorization with an SVM classifier gives high performance for the analytical classification of web documents, whose number increases exponentially as the document hierarchy expands. We expect that more information retrieval systems will adopt the proposed model and be able to provide an accurate and rapid information retrieval service.
https://doi.org/10.5392/JKCA.2017.17.11.600 KSCI

Real-time Classification of Internet Application Traffic using a Hierarchical Multi-class SVM

Yu, Jae-Hak;Lee, Han-Sung;Im, Young-Hee;Kim, Myung-Sup;Park, Dai-Hee
- KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.5 / pp.859-876 / 2010
In this paper, we propose a hierarchical application traffic classification system as an alternative means to overcome the limitations of the port number and payload based methodologies, which are the traditional approaches to traffic classification. The proposed system is a new classification model that hierarchically combines a binary classifier SVM and Support Vector Data Descriptions (SVDDs). The proposed system selects an optimal attribute subset from the bi-directional traffic flows generated by our traffic analysis system (KU-MON), which enables real-time collection and analysis of campus traffic. The system is composed of three layers. The first layer is a binary classifier SVM that performs rapid classification between P2P and non-P2P traffic. The second layer classifies P2P traffic into file-sharing, messenger, and TV, based on three SVDDs. The third layer performs specialized classification of all individual application traffic types. Since the proposed system enables both coarse- and fine-grained classification, it can guarantee efficient resource management, such as a stable network environment, seamless bandwidth guarantee, and appropriate QoS. Moreover, even when a new application emerges, the system can be easily adapted through incremental updating and scaling: only additional training for the new part of the application traffic is needed instead of retraining the entire system. The performance of the proposed system is validated via experiments, which confirm that its recall and precision measures are satisfactory.
https://doi.org/10.3837/tiis.2010.10.009 KSCI
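
The three-layer cascade described in the abstract above can be sketched as a routing function. This is a structural illustration only, not the authors' implementation: the first-layer SVM is replaced by a hypothetical threshold function `is_p2p`, each SVDD is approximated by a plain hypersphere (center, radius) in a two-dimensional feature space without kernels, and the third layer is assumed to use the same sphere test. All centers, radii, and application names are invented for the example.

```python
import math


def svdd_margin(x, center, radius):
    """Signed distance to a hypersphere boundary; non-negative means x lies inside."""
    return radius - math.dist(x, center)


def classify_flow(x, is_p2p, group_spheres, app_spheres):
    """Route one flow feature vector through the three layers of the cascade."""
    # Layer 1: binary gate between P2P and non-P2P traffic.
    if not is_p2p(x):
        return ("non-P2P", None, None)
    # Layer 2: pick the P2P group whose sphere contains x most deeply.
    group = max(group_spheres, key=lambda g: svdd_margin(x, *group_spheres[g]))
    if svdd_margin(x, *group_spheres[group]) < 0:
        return ("P2P", None, None)  # outside every group description
    # Layer 3: pick the individual application within that group.
    apps = app_spheres[group]
    app = max(apps, key=lambda a: svdd_margin(x, *apps[a]))
    return ("P2P", group, app)


# Hypothetical stand-ins for trained models.
def is_p2p(x):
    return x[0] > 0.5


GROUP_SPHERES = {
    "file-sharing": ((1.0, 1.0), 0.5),
    "messenger": ((1.0, 3.0), 0.5),
    "TV": ((3.0, 1.0), 0.5),
}
APP_SPHERES = {
    "file-sharing": {"app_a": ((0.9, 1.0), 0.3), "app_b": ((1.2, 1.1), 0.3)},
    "messenger": {"app_c": ((1.0, 3.0), 0.5)},
    "TV": {"app_d": ((3.0, 1.0), 0.5)},
}
```

Because each group and application has its own description, supporting a new application in this sketch only means adding one more entry to `APP_SPHERES`, which mirrors the incremental-update property the abstract highlights.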

A Vehicle Classification Method in Thermal Video Sequences using both Shape and Local Features (형태특징과 지역특징 융합기법을 활용한 열영상 기반의 차량 분류 방법)

Yang, Dong Won
- Journal of IKEEE / v.24 no.1 / pp.97-105 / 2020
A thermal imaging sensor receives the radiating energy from the target and the background, so it has been widely used for detection, tracking, and classification of targets at night for military purposes. When targets are recognized automatically from thermal images, using the correct object edges yields highly accurate classification results. However, since thermal images have lower spatial resolution and more blurred edges than color images, classification accuracy with thermal images can decrease. In this paper, to overcome this problem, we propose a new hierarchical classifier that uses both shape and local features based on segmentation reliabilities, together with a class/pose updating method for vehicle classification. The proposed classification method was validated using thermal video sequences of more than 20,000 images, which include four types of military vehicles: main battle tank, armored personnel carrier, military truck, and estate car. The experimental results showed that the proposed method outperformed state-of-the-art methods in classification accuracy.
https://doi.org/10.7471/ikeee.2020.24.1.97 KSCI
