• Title/Abstract/Keywords: Words classification

A Semantic Classification Model for Educational Resource Repositories (교육용 자원 저장소를 위한 의미적 분류 모델)

  • Choi, Myoung-Hoi;Jeong, Dong-Won
    • Journal of KIISE:Databases / v.34 no.1 / pp.35-45 / 2007
  • This paper proposes a classification model for the systematic management of resources in educational repositories. A classification scheme is needed to store and manage resources systematically, retrieve them precisely, and maximize their usability. However, little research exists on classification schemes and classification models for educational repository resources, which leads to inefficient management of educational resources, incorrect retrieval, and low usability. Moreover, educational resource information has characteristics that differ from information in previously studied fields. New research on a classification scheme and classification model for resources in educational repositories is therefore required. To make educational resources efficient and easy to use, the resources should be managed consistently according to a classification scheme that accommodates several views. This paper proposes a classification model that supports systematic management and increases the usability of educational resources. In other words, the proposed classification model can dynamically manage the classification scheme for resources in educational repositories according to various views. To this end, we first define a classification scheme for implementation resources based on the classification schemes of relevant science and technology fields. We then define a novel classification model that dynamically manages the defined classification scheme. The proposed classification scheme and classification model enable more precise and systematic management of implementation resources and also make them easier to use.

CCMS (Crop Classification Management System) Detecting Growth Environment Changes to Improve Crop Production Rate (작물 생산률 향상을 위한 생장 환경 변화 탐지 CCMS(Crop Classification Management System))

  • Choi, Hokil;Lee, Byungkwan;Son, Surak;Ahn, Heuihak
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.2 / pp.145-152 / 2020
  • In this paper, we propose the Crop Classification Management System (CCMS), which detects changes in the growth environment to improve the crop production rate. The CCMS consists of two modules. First, the Crop Classification Module (CCM) classifies crops using a CNN. Second, the Farm Anomaly Detection Module (FADM) detects abnormal crops by comparing accumulated farm data. The CCM recognizes the crops currently grown on a farm and sends them to the FADM. The FADM then retrieves the weather data of that farm from the past to the present day and applies the Nelson rules to it. Using the Nelson rules, the FADM identifies anomalous weather data and adjusts farm conditions through IoT devices. The performance analysis of the CCMS showed that the CCM had a crop classification accuracy of about 90% and that the FADM improved the estimated yield by up to about 30%. In other words, managing farms through the CCMS can help increase the yield of smart farms.
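
The abstract does not say which Nelson rules the FADM applies or how the baseline is chosen. As a rough illustration only, the sketch below implements two of the standard Nelson rules (rule 1: a point more than three standard deviations from the mean; rule 2: nine consecutive points on the same side of the mean). The function names, the `baseline` argument (assumed here to be historical readings for the farm), and the sample temperatures are hypothetical.

```python
from statistics import mean, pstdev


def nelson_rule_1(values, baseline):
    """Return indices of values more than 3 standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


def nelson_rule_2(values, baseline, run=9):
    """Return indices where `run` or more consecutive values lie on one side of the baseline mean."""
    mu = mean(baseline)
    hits, streak, side = [], 0, 0
    for i, v in enumerate(values):
        s = (v > mu) - (v < mu)  # +1 above the mean, -1 below, 0 exactly on it
        if s != 0 and s == side:
            streak += 1
        else:
            side = s
            streak = 1 if s != 0 else 0
        if streak >= run:
            hits.append(i)
    return hits


# Hypothetical daily temperatures (assumption: past readings form the baseline).
history = [20, 21, 19, 20, 22, 18, 20, 21, 19, 20]
print(nelson_rule_1([20, 21, 30, 19], history))
print(nelson_rule_2([21] * 9 + [19], history))
```

In a system like the one described, flagged indices would presumably trigger an adjustment through IoT devices; that control step is outside this sketch.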

Time-Series based Dataset Selection Method for Effective Text Classification (효율적인 문헌 분류를 위한 시계열 기반 데이터 집합 선정 기법)

  • Chae, Yeonghun;Jeong, Do-Heon
    • The Journal of the Korea Contents Association / v.17 no.1 / pp.39-49 / 2017
  • As Internet technology advances, the amount of data on the web is increasing sharply. Many studies have investigated incremental learning for classifying such growing data effectively. Web documents contain time-series data such as the publication date, and reflecting this information in classification can make classification more effective. In this study, we analyze the time-series variation of words and propose an efficient classification method that divides the dataset based on the analysis of time-series information. For the experiments, we collected 1 million online news articles that include time-series information. We divide the dataset and classify it using SVM and Naïve Bayes. Classification performance improved for both models, showing that reflecting time-series information can improve classification performance.

Performance Comparison of Korean Dialect Classification Models Based on Acoustic Features

  • Kim, Young Kook;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.37-43 / 2021
  • Using the acoustic features of speech, important social and linguistic information about the speaker can be obtained, and one of the key features is the dialect. A speaker's use of a dialect is a major barrier to interaction with a computer. Dialects can be distinguished at various levels such as phonemes, syllables, words, phrases, and sentences, but it is difficult to distinguish dialects by identifying them one by one. Therefore, in this paper, we propose a lightweight Korean dialect classification model using only MFCC among the features of speech data. We study the optimal method to utilize MFCC features through Korean conversational voice data, and compare the classification performance of five Korean dialects in Gyeonggi/Seoul, Gangwon, Chungcheong, Jeolla, and Gyeongsang in eight machine learning and deep learning classification models. The performance of most classification models was improved by normalizing the MFCC, and the accuracy was improved by 1.07% and F1-score by 2.04% compared to the best performance of the classification model before normalizing the MFCC.
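
The abstract reports that normalizing the MFCC improved most models but does not state which normalization was used. One common choice, assumed here purely for illustration, is per-coefficient mean and variance normalization across the frames of an utterance. The function name `normalize_mfcc` and the toy frame values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_mfcc(frames, eps=1e-8):
    """Z-score each MFCC coefficient across frames.

    frames: list of frames, each a list with one value per MFCC coefficient.
    eps avoids division by zero when a coefficient is constant.
    """
    columns = list(zip(*frames))  # one tuple per coefficient
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, mus, sigmas)]
        for frame in frames
    ]


# Two toy frames with two coefficients each (made-up values).
print(normalize_mfcc([[1.0, 10.0], [3.0, 30.0]]))
```

After this step every coefficient has roughly zero mean and unit variance, which puts coefficients of very different scales on a comparable footing for the downstream classifier.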

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.279-290 / 2013
  • In digital forensics, log files are stored as large data sets for the purpose of tracing users' past behavior. It is difficult for investigators to analyze such large log data manually without clues. In this paper, we propose a text mining technique that extracts intrusion logs from a large log set in order to recommend reliable evidence to investigators. In the training stage, after preprocessing, the proposed method extracts intrusion association words from a training log set using the Apriori algorithm and computes the probability of intrusion for each association word by combining support and confidence. Robinson's method of computing confidence for filtering spam mail is applied to extracting intrusion logs. As a result, an association word knowledge base is constructed that includes the intrusion-probability weights of the association words in order to improve accuracy. In the test stage, the probability of intrusion logs and the probability of normal logs in a test log set are each computed by Fisher's inverse chi-square classification algorithm based on the association word knowledge base, and intrusion logs are extracted by combining the results. The intrusion logs are then recommended to investigators. The proposed method uses a training method that clearly analyzes the meaning of data in large unstructured logs and thereby mitigates the reduction in accuracy caused by data ambiguity. In addition, because it recommends intrusion logs using Fisher's inverse chi-square classification algorithm, it reduces the false positive (FP) rate and the laborious effort of extracting evidence manually.
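
The combination step can be sketched as follows, using the Fisher inverse chi-square combining scheme as it is commonly implemented in Robinson-style spam filters. This is not the paper's implementation: the per-word probabilities are assumed to come from a knowledge base such as the one described, and the names `chi2_survival` and `fisher_score` as well as the sample probabilities are hypothetical.

```python
import math


def chi2_survival(x2, dof):
    """P(X > x2) for a chi-square variable with an even number of degrees of freedom."""
    if dof % 2 != 0:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(probs):
    """Combine per-word intrusion probabilities into one score in [0, 1].

    Values near 1 suggest an intrusion log, values near 0 a normal log,
    and values near 0.5 mean the evidence is inconclusive.
    """
    n = len(probs)
    clamped = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probs]  # keep log() finite
    intrusion = 1.0 - chi2_survival(-2.0 * sum(math.log(1.0 - p) for p in clamped), 2 * n)
    normal = 1.0 - chi2_survival(-2.0 * sum(math.log(p) for p in clamped), 2 * n)
    return (1.0 + intrusion - normal) / 2.0


# Made-up word probabilities for three association words found in one log line.
print(fisher_score([0.9, 0.95, 0.99]))
print(fisher_score([0.5, 0.5, 0.5]))
```

Computing an intrusion indicator and a normal indicator separately and then combining them mirrors the two probabilities the abstract mentions; a log line would be recommended to the investigator when the combined score exceeds a chosen threshold.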

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1400-1418 / 2020
  • In commercial promotion, chatbots are one of the significant applications of natural language processing (NLP). Conventional design methods use the bag-of-words (BOW) model alone, based on the Google database and other online corpora. On the one hand, the vectors in the bag-of-words model are unrelated to one another. Although this method suits discrete features, it does not help the machine understand continuous statements, because the connection between words is lost in the encoded word vector. On the other hand, existing methods are tested on state-of-the-art online corpora and are hard to apply to real applications such as telemarketing data. In this paper, we propose an improved chatbot design method that uses a hybrid of the bag-of-words model and the skip-gram model, based on real telemarketing data. Specifically, we first collect real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, the word representation adopts the hybrid bag-of-words and skip-gram model. The skip-gram model maps synonyms close together in the vector space, so the correlation between words is expressed and the word vector carries more information, making up for the shortcomings of using the bag-of-words model alone. Third, we use the term frequency-inverse document frequency (TF-IDF) weighting method to increase the weight of key words and then output the final word expression. Finally, the answer is produced using a hybrid of a retrieval model and a generative model. The retrieval model can accurately answer in-domain questions. The generative model supplements it by answering open-domain questions, with the final reply produced by long short-term memory (LSTM) training and prediction. Experimental results show that the hybrid word vector expression model improves the accuracy of the responses and that the whole system can communicate with humans.
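
The TF-IDF weighting step can be illustrated with a minimal sketch. It uses the plain textbook formulation (relative term frequency times the natural log of inverse document frequency); the paper may use a smoothed or otherwise different variant, and how the weights are merged with the skip-gram vectors is not shown here. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * ln(number of docs / docs containing term)
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


# Toy telemarketing-style utterances, already tokenized (made-up data).
docs = [["loan", "rate", "loan"], ["rate", "call"]]
for w in tf_idf(docs):
    print(w)
```

A term that appears in every document, such as "rate" here, receives weight zero, while a term concentrated in one document is boosted. That is the sense in which the weighting raises the weight of key words.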

A Study on the effect management of human resource in Hotel (호텔기업의 인적자원 관리에 관한 연구)

  • 류진순
    • Culinary science and hospitality research / v.6 no.2 / pp.199-225 / 2000
  • As a competitive strategy, the management of human resources is a necessity for the hotel industry to survive amid rapid change and continuous development. In other words, just as the hospitality industry operates on human relationships, enterprise management provides the foundation for forming them. Hence, the problem our hotels face is the need to find new human resources. Human resource management in hotels should not only satisfy hotels, employees, and guests but also improve personal ability; it is also important as a method of hotel operation. Therefore, hotel managers have to secure good human resources and, at the same time, develop their potential ability for the development of both the industry and the individual. This study of effective human resource management in hotels relates the organization of hotels to the factors of human resources. The research focuses on classifying the properties of each factor in human resource management. According to the classification, the relationships and effects found in common are grouped by property and named accordingly.

Keyword Selection for Visual Search based on Wikipedia (비주얼 검색을 위한 위키피디아 기반의 질의어 추출)

  • Kim, Jongwoo;Cho, Soosun
    • Journal of Korea Multimedia Society / v.21 no.8 / pp.960-968 / 2018
  • A mobile visual search service uses a query image to acquire linked information by searching a pre-constructed DB. For this purpose, it would be more useful to search a web-based keyword search system instead of a pre-built DB. In this paper, we propose a representative query extraction algorithm whose output can be used as a keyword on a web-based search system. To do this, we use image classification labels generated by a deep-learning-based CNN (Convolutional Neural Network), which performs remarkably well in image recognition. In the query extraction algorithm, words with dictionary meanings are extracted using Wikipedia, and hierarchical categories are constructed using WordNet. The performance of the proposed algorithm is evaluated by measuring the system response time.

Service Trade and its Patterns (서비스 무역(貿易)과 그 유형(類型))

  • Kim, Woo-Kyu
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.13 / pp.681-698 / 2000
  • As the volume of international trade grows, the importance of service businesses and service trade is also increasing. The growing importance of service trade in Korea reflects this international trend. This paper does not address the various issues in international trade that arise from service trade; it confines itself to the concept of service and the concept and patterns of service trade. The concept of service varies among scholars, and the same is true of the concept of service trade, which lends itself to various classifications. Among them, if the focus is on the tax standard in international trade, it can be classified as service transactions. In other words, a classification can be made into service trade that is separate from commodity transactions embodying services and service trade that accompanies commodity transactions. This paper is limited to an introduction to international service trade, but the issues arising internationally in relation to such trade are varied. For this reason, more study on these topics will be required in the future.

Term Frequency-Inverse Document Frequency (TF-IDF) Technique Using Principal Component Analysis (PCA) with Naive Bayes Classification

  • J.Uma;K.Prabha
    • International Journal of Computer Science & Network Security / v.24 no.4 / pp.113-118 / 2024
  • Performing sentiment analysis on Twitter is difficult, even though it is widely used for reviews. This is because tweets are extremely short and mostly contain slang, emoticons, hashtags, and other tweet-specific words. Feature extraction refers to any technique for building feature vectors from particular tweets. Each element of a feature vector is an integer that contributes to assigning a sentiment class to a tweet. The purpose of feature extraction is to select the right features in order to improve the accuracy of the classification models. In this manuscript, we propose combining the Term Frequency-Inverse Document Frequency (TF-IDF) method with Principal Component Analysis (PCA) and Naïve Bayes classifiers. In the classification process, the proposed work can produce different aspects from the widely valued features of a Twitter dataset.