• Title/Summary/Keyword: information classification

A Novel Thresholding for Prediction Analytics with Machine Learning Techniques

  • Shakir, Khan;Reemiah Muneer, Alotaibi
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.33-40 / 2023
  • Machine-learning techniques are proving effective for data analytics. Classification and regression support prediction on different kinds of data, and various classification techniques are used depending on the nature of the data. Threshold determination is essential to building a better model for unlabelled data. In this paper, threshold values are applied as ranges, based on the min-max normalization technique, to create labels, and multiclass classification is performed on rainfall data. Binary classification is applied to autism data, and classification techniques are applied to child abuse data. The performance of each technique is analysed with evaluation metrics.
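
The labelling idea in this abstract (min-max normalization followed by range thresholds) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the cut points, the label names, and the rainfall readings are all assumptions made up for the example.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_by_range(values, cut_points, labels):
    """Label each value by the range its normalized score falls into.

    cut_points are ascending thresholds inside (0, 1), and there must be
    exactly one more label than cut points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    result = []
    for score in min_max_normalize(values):
        # The number of thresholds reached by the score is the class index.
        index = sum(score >= c for c in cut_points)
        result.append(labels[index])
    return result


# Hypothetical daily rainfall readings in millimetres (not from the paper).
rainfall_mm = [0.0, 2.5, 10.0, 48.0, 120.0]
# Hypothetical cut points and class names.
rain_labels = label_by_range(rainfall_mm, [0.05, 0.5], ["low", "moderate", "heavy"])
print(rain_labels)
```

The labels produced this way could then serve as targets for a multiclass classifier; how the ranges were actually chosen is not stated in the abstract.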

A Study on the Developing Standard Classification of the National Knowledge and Information Resources (국가지식정보 자원 분류 체계 표준화 연구)

  • Ko Young-Man;Seo Tae-Sul;Cho Sun-Yeong
    • Journal of the Korean Society for Library and Information Science / v.40 no.3 / pp.151-173 / 2006
  • The purpose of this study is to draft a standard classification for the National Knowledge and Information Resources. As a result of the study, a standard classification system for national knowledge and information resources, named "Knowledge Classification" (KC), is proposed. KC consists of three classification systems: classification by subject, by type of resource, and by type of media. The classification by subject has 12 main classes, and each main class has divisions. Each main class consists of a major discipline or a group of related disciplines. The type of resource is classified into 10 types of content, numbered 0-9, and the media of knowledge are classified into 8 types, numbered 0-7. In practice, the notation always consists of two characters and two digits. The first character designates the main class and the second character designates the division. The first digit designates the type of resource and the second digit designates the type of media.
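
The four-position notation described in this abstract can be illustrated with a small parser. This is only a sketch: the abstract does not say which characters are used for classes and divisions, so the code assumes letters, and the sample code `AB34` is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KCNotation:
    main_class: str     # first character
    division: str       # second character
    resource_type: int  # first digit, 0-9
    media_type: int     # second digit, 0-7


def parse_kc(notation):
    """Split a KC notation into main class, division, resource type, media type."""
    if len(notation) != 4:
        raise ValueError("KC notation must be 2 characters followed by 2 digits")
    chars, digits = notation[:2], notation[2:]
    # Assumption: class and division characters are letters; the abstract
    # only says "characters".
    if not chars.isalpha():
        raise ValueError("first two positions must be class characters")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("last two positions must be digits")
    resource_type, media_type = int(digits[0]), int(digits[1])
    if media_type > 7:
        # The abstract lists 8 media types, numbered 0-7.
        raise ValueError("media type must be in the range 0-7")
    return KCNotation(chars[0], chars[1], resource_type, media_type)


# Hypothetical notation, not taken from the KC schedule itself.
print(parse_kc("AB34"))
```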

A Study on the Improvements of Food and Culture in Dewey Decimal Classification System (음식문화 분야의 DDC 분류체계 개선방안에 관한 연구)

  • Chung, Yeon-Kyoung;Choi, Yoon-Kyung
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.21 no.1 / pp.43-57 / 2010
  • The purposes of this study are to examine how food and culture and Korean foods are reflected in classification systems and to propose improvements to DDC for classifying various subjects related to materials on food and culture. For the study, six classification systems, namely DDC (Dewey Decimal Classification), UDC (Universal Decimal Classification), LCC (Library of Congress Classification), KDC (Korean Decimal Classification), NDC (Nippon Decimal Classification), and China Library Classification, were analyzed with respect to eating and drinking customs, eating etiquette, nutrition and diet, food and drink, meal and table service, beverage technology, and food technology. As a result, there were few headings about Korean food in the six classification systems, and DDC needs new headings for classifying Korean and Asian traditional foods and table services. Because classification systems rely on literary warrant, various Korean food recipes and publications need to be published and disseminated so that new headings or notes can be added to future classification systems.

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine-learning-based automatic classification of domestic journal articles in the field of LIS. In particular, with respect to the performance of automatically assigning class labels to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, and label assignment methods) through a variety of experiments. The results show that it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be treated as multi-label classification, which assigns more than one category to a given article. Therefore, I proposed an optimal classification model that takes this environment into account, using a simple and fast classification algorithm and a small training set.

Korean Document Classification using Characteristics of Word Information

  • Kim, Seok-Ki;Han, Kyung-Soo;Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.167-175 / 2003
  • In document classification, the target of analysis is not the document itself but the words that appear in it. Word information is therefore a significant factor in document classification. In this study, we address the classification of Korean documents based on words and feature vectors. First, we present the performance of document classification using nouns and keywords. Second, we compare the results across feature vector sizes.
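
To make the notion of word-based feature vectors of different sizes concrete, here is a minimal bag-of-words sketch. It is an illustration under assumptions, not the authors' method: the documents are already tokenized English stand-ins (real Korean text would first need a morphological analyzer to extract nouns), and the vocabulary is simply the most frequent words.

```python
from collections import Counter


def build_vocabulary(tokenized_docs, size):
    """Keep the `size` most frequent words across all documents."""
    counts = Counter(word for doc in tokenized_docs for word in doc)
    return [word for word, _ in counts.most_common(size)]


def to_feature_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical pre-tokenized documents (stand-ins for extracted nouns).
docs = [
    ["economy", "market", "stock", "market"],
    ["game", "team", "score", "team"],
    ["market", "team", "economy"],
]

# Compare two feature vector sizes, as the abstract describes doing.
for size in (2, 4):
    vocab = build_vocabulary(docs, size)
    vectors = [to_feature_vector(doc, vocab) for doc in docs]
    print(size, vocab, vectors)
```

A smaller vocabulary gives shorter vectors but discards rarer words, which is the trade-off a comparison across feature vector sizes would measure.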

Classifying Biomedical Literature Providing Protein Function Evidence

  • Lim, Joon-Ho;Lee, Kyu-Chul
    • ETRI Journal / v.37 no.4 / pp.813-823 / 2015
  • Because protein is a primary element responsible for biological or biochemical roles in living bodies, protein function is core, foundational information for biomedical studies. However, recent advances in biotechnology have caused an explosive increase in the amount of published literature; as a result, biomedical researchers have a hard time finding the protein function information they need. In this paper, a classification system for biomedical literature providing protein function evidence is proposed. Note that, despite our best efforts, we have been unable to find previous studies on the proposed issue. To classify papers based on protein function evidence, we should consider whether the main claim of a paper is to assert a protein function. We therefore propose two novel features: protein and assertion. Our experimental results show a classification performance with 71.89% precision, 90.0% recall, and a 79.94% F-measure. In addition, to verify the usefulness of the proposed classification system, two case study applications are investigated: information retrieval for protein function and automatic summarization of protein function text. It is shown that the proposed classification system can be successfully applied to these applications.
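
The three reported figures can be checked against each other. The sketch below assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state which weighting was used.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, already rounded by the authors.
reported_precision = 0.7189
reported_recall = 0.900

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure from rounded inputs: {f1:.4f}")
```

With the rounded inputs this gives roughly 0.799, which agrees with the reported 79.94% to within rounding of the precision and recall values.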

Text Classification on Social Network Platforms Based on Deep Learning Models

  • YA, Chen;Tan, Juan;Hoekyung, Jung
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.9-16 / 2023
  • The natural language on social network platforms has a certain front-to-back dependency in structure, and the direct conversion of Chinese text into a vector makes the dimensionality very high, thereby resulting in the low accuracy of existing text classification methods. To this end, this study establishes a deep learning model that combines a big data ultra-deep convolutional neural network (UDCNN) and long short-term memory network (LSTM). The deep structure of UDCNN is used to extract the features of text vector classification. The LSTM stores historical information to extract the context dependency of long texts, and word embedding is introduced to convert the text into low-dimensional vectors. Experiments are conducted on the social network platforms Sogou corpus and the University HowNet Chinese corpus. The research results show that compared with CNN + rand, LSTM, and other models, the neural network deep learning hybrid model can effectively improve the accuracy of text classification.

A Structure on Classification Service System of Internet Documents (인터넷 문서의 자동분류 서비스 시스템에 관한 구현)

  • Hwang Sung-Ha;Choi Kwang-Nam;Lee Dae-Kyu;Lee Sang-Ho
    • Proceedings of the Korea Contents Association Conference / 2005.11a / pp.66-71 / 2005
  • Using information on the internet can be easy or difficult. Efforts to obtain useful information have led to the development of various techniques for search as well as for information storage, classification, processing, and utilization. In particular, progress has been notable in agents for various uses and in classification and conversion among processing techniques. This study introduces a classification service system for internet documents that handles the process from the repository of internet information through to automatic classification and search services.

The Application of RS and GIS Technologies on Landslide Information Extraction of ALOS Images in Yanbian Area, China

  • Quan, He Chun;Lee, Byung Gul
    • Journal of Korean Society for Geospatial Information Science / v.23 no.3 / pp.85-93 / 2015
  • This paper introduces methods for extracting landslide information using ALOS (Advanced Land Observing Satellite) images and GIS (Geographical Information System) technology. In this study, we classified images using three different methods (unsupervised, supervised, and PCA (Principal Components Analysis)) to extract landslide information based on the characteristics of ALOS images. From the image classification results, we found that the quality of the classified image obtained with the PCA supervised method was superior to that of the images obtained with the other methods. However, the accuracy of the landslide information extracted from this image classification was still very low, because the pixel values of landslide and safe regions were very similar. This means that it is difficult to distinguish those areas with an image classification method alone, particularly in regions where landslide and other areas coexist. To solve this problem, we used an LSM (Landslide Susceptibility Map) created with ArcView software through the weighted overlay GIS method. The developed LSM was then applied to the image classification process using the ALOS images. The accuracy of the extracted landslide information improved after adopting the PCA and LSM methods. Finally, we found that the landslide region in the study area can be calculated, and that accuracy can be improved, with the LSM and PCA image classification methods using GIS tools.
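
The weighted overlay step mentioned in this abstract can be sketched generically as a weighted sum of factor grids. This is an assumption-laden illustration, not the ArcView workflow the authors used: the factor names, the 2x2 grids, and the weights are all hypothetical.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized factor grids into one susceptibility grid.

    layers maps a factor name to a 2D list of scores; weights maps the same
    names to their relative importance. The result is the weighted average
    of the factor scores in each cell.
    """
    if set(layers) != set(weights):
        raise ValueError("layers and weights must use the same factor names")
    names = list(layers)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    total = sum(weights.values())
    return [
        [sum(weights[n] * layers[n][r][c] for n in names) / total for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical factor grids, scored 1 (low hazard) to 5 (high hazard).
layers = {
    "slope": [[1, 3], [5, 2]],
    "rainfall": [[2, 2], [4, 1]],
}
# Hypothetical weights; a real study would derive these from the terrain.
weights = {"slope": 0.6, "rainfall": 0.4}

lsm = weighted_overlay(layers, weights)
print(lsm)
```

Cells with high combined scores mark areas more susceptible to landslides; how the authors combined this map with the classified ALOS image is not detailed in the abstract.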

Novel Image Classification Method Based on Few-Shot Learning in Monkey Species

  • Wang, Guangxing;Lee, Kwang-Chan;Shin, Seong-Yoon
    • Journal of information and communication convergence engineering / v.19 no.2 / pp.79-83 / 2021
  • This paper proposes a novel image classification method based on few-shot learning, which is mainly used to solve model overfitting and non-convergence in image classification tasks of small datasets and improve the accuracy of classification. This method uses model structure optimization to extend the basic convolutional neural network (CNN) model and extracts more image features by adding convolutional layers, thereby improving the classification accuracy. We incorporated certain measures to improve the performance of the model. First, we used general methods such as setting a lower learning rate and shuffling to promote the rapid convergence of the model. Second, we used the data expansion technology to preprocess small datasets to increase the number of training data sets and suppress over-fitting. We applied the model to 10 monkey species and achieved outstanding performances. Experiments indicated that our proposed method achieved an accuracy of 87.92%, which is 26.1% higher than that of the traditional CNN method and 1.1% higher than that of the deep convolutional neural network ResNet50.