• Title/Summary/Keyword: Words classification


A Study on the Improvement Directions of Data Classification Format for Efficient Information Management System (효율적인 정보화경영을 위한 데이터분류체계의 개선방안에 관한 연구)

  • Park, Jae-Yong
    • International Commerce and Information Review / v.6 no.3 / pp.41-61 / 2004
  • Today, most companies need to take an interest in e-business and information management systems. In particular, a data classification format system is important for supporting effective and efficient management decisions. Such a system should include a main entry consisting of the department, employee's name, title, and publication date. Companies currently use eleven different data classification methods: nine for general management documents and two for special report management documents. The aim of this study is to investigate the problems of the present data classification format system and the concerns that should be taken into account when modifying the system or changing to a new one. The study is based on a survey of managers at 35 companies throughout the nation. The survey indicates that the crucial concerns of the participating managers are ineffective management information sources and the duplication of data classification systems. This paper is a preliminary study on the introduction of data classification format systems to business companies in Korea, and it provides fundamental data for effective business process reengineering of management information activities.


Enhancing the Narrow-down Approach to Large-scale Hierarchical Text Classification with Category Path Information

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of Information Science Theory and Practice / v.5 no.3 / pp.31-47 / 2017
  • The narrow-down approach, separately composed of search and classification stages, is an effective way of dealing with large-scale hierarchical text classification. Recent approaches introduce methods of incorporating global, local, and path information extracted from web taxonomies in the classification stage. Meanwhile, in the case of utilizing path information, there have been few efforts to address existing limitations and develop more sophisticated methods. In this paper, we propose an expansion method to effectively exploit category path information based on the observation that the existing method is exposed to a term mismatch problem and low discrimination power due to insufficient path information. The key idea of our method is to utilize relevant information not presented on category paths by adding more useful words. We evaluate the effectiveness of our method on state-of-the-art narrow-down methods and report the results with in-depth analysis.

Texture Classification Using Local Neighbor Differences (지역 근처 차이를 이용한 텍스쳐 분류에 관한 연구)

  • Saipullah, Khairul Muzzammil;Peng, Shao-Hu;Park, Min-Wook;Kim, Deok-Hwan
    • Proceedings of the Korea Information Processing Society Conference / 2010.04a / pp.377-380 / 2010
  • This paper proposes a texture descriptor for texture classification called Local Neighbor Differences (LND). LND is a highly discriminative texture descriptor that is also robust to illumination changes. The proposed descriptor utilizes the sign of the differences between surrounding pixels in a local neighborhood. The differences of those pixels are thresholded to form an 8-bit binary codeword. The decimal values of these 8-bit codewords are computed and are called LND values. A histogram of the resulting LND values is created and used as a feature to describe the texture information of an image. Experiments on texture classification accuracy were performed using the OUTEX_TC_00001 test suite. The results show that LND outperforms the local binary patterns (LBP) method, with an average classification accuracy of 92.3% compared with 90.7% for LBP.
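
The descriptor outlined in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which pairs of surrounding pixels are compared, so the sketch assumes each of the 8 neighbours is compared with the next neighbour on the ring, and the function names `lnd_codes` and `lnd_histogram` are hypothetical.

```python
def lnd_codes(image):
    """Return one 8-bit code per interior pixel of a 2-D list of grey levels."""
    # Offsets of the 8 surrounding pixels, clockwise from the top-left corner.
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            neighbours = [image[r + dr][c + dc] for dr, dc in ring]
            code = 0
            for i in range(8):
                # Assumed rule: bit i is set when neighbour i is not smaller
                # than the next neighbour on the ring (sign of the difference).
                if neighbours[i] - neighbours[(i + 1) % 8] >= 0:
                    code |= 1 << i
            codes.append(code)
    return codes


def lnd_histogram(image):
    """Histogram of the 256 possible code values, usable as a feature vector."""
    hist = [0] * 256
    for code in lnd_codes(image):
        hist[code] += 1
    return hist
```

Because only the signs of differences are used, adding the same constant to every pixel leaves the codes unchanged, which is one simple way to see the robustness to illumination changes that the abstract mentions.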

An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing(LSI) (지도적 잠재의미색인(LSI)기법을 이용한 의견 문서 자동 분류에 관한 실험적 연구)

  • Lee, Ji-Hye;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.451-462 / 2009
  • The aim of this study is to apply latent semantic indexing (LSI) techniques to the efficient automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents such as reviews and news articles, with 500 labelled as positive and the remaining 500 as negative. Sets of content words and sentiment words were extracted using a POS tagger in order to identify the optimal feature set for opinion classification. The findings showed that LSI techniques were more effective than a term indexing method for sentiment classification. The best performance was achieved by a supervised LSI technique.

An Experimental Evaluation of Short Opinion Document Classification Using A Word Pattern Frequency (단어패턴 빈도를 이용한 단문 오피니언 문서 분류기법의 실험적 평가)

  • Chang, Jae-Young;Kim, Ilmin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.5 / pp.243-253 / 2012
  • Opinion mining, which developed from document classification in the area of data mining, has become a common interest in domestic as well as international industries. The core of opinion mining is to decide precisely whether an opinion document is positive or negative. Although many related approaches have been proposed, their classification accuracy has not been satisfactory enough for practical applications. The polarity of opinion documents written in Korean is not easy to determine automatically because they often include varied and ungrammatical words when expressing subjective opinions. This paper proposes a new approach to the classification of opinion documents that considers only the frequency of word patterns and excludes grammatical factors as much as possible. In the proposed method, we express a document as a bag of words, apply a learning algorithm using the frequency of word patterns, and finally decide the polarity of the document using a score function. We also present experimental results that evaluate the accuracy of the proposed method.
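
The pipeline described in this abstract (bag of words, pattern frequencies, score function) might look roughly like the sketch below. The abstract does not define the word patterns or the score function, so both are assumptions here: patterns are taken to be single words plus adjacent word pairs, and the score sums a normalised frequency difference. All function names are hypothetical.

```python
from collections import Counter


def patterns(text):
    """Assumed pattern set: single words plus adjacent word pairs."""
    words = text.lower().split()
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def train(labelled_docs):
    """Count how often each pattern occurs in positive and negative documents."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in labelled_docs:
        counts[label].update(patterns(text))
    return counts


def score(text, counts):
    """Assumed score: sum of (pos - neg) / (pos + neg) over known patterns."""
    total = 0.0
    for p in patterns(text):
        pos = counts["positive"][p]
        neg = counts["negative"][p]
        if pos + neg:
            total += (pos - neg) / (pos + neg)
    return total


def polarity(text, counts):
    """Decide the polarity from the sign of the score."""
    return "positive" if score(text, counts) > 0 else "negative"


model = train([
    ("great phone love it", "positive"),
    ("bad battery hate it", "negative"),
])
print(polarity("love this great phone", model))
```

No grammatical analysis is involved: the sketch relies only on how often each pattern was seen with each label, which matches the spirit of the abstract, though not necessarily its details.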

Short Text Classification for Job Placement Chatbot by T-EBOW (T-EBOW를 이용한 취업알선 챗봇용 단문 분류 연구)

  • Kim, Jeongrae;Kim, Han-joon;Jeong, Kyoung Hee
    • Journal of Internet Computing and Services / v.20 no.2 / pp.93-100 / 2019
  • Recently, companies in various business fields have been concentrating on providing chatbot services in various environments by adding artificial intelligence to existing messenger platforms. Organizations in the field of job placement also require chatbot services to improve the quality of employment counseling services and to solve the problem of agent management. A general text-based chatbot classifies input user sentences into learned sentences and provides appropriate answers to users. Recently, user sentences entered into chatbots have tended to be short texts because of the widespread use of social network services. Therefore, improving the performance of short text classification can contribute to improving chatbot service performance. In this paper, we propose T-EBOW (Translation-Extended Bag Of Words), a method that adds translation information as well as the concept information used in existing research in order to strengthen short text classification for an employment chatbot. In the performance evaluation, machine learning classification models using T-EBOW outperformed the conventional method.

A Korean Document Sentiment Classification System based on Semantic Properties of Sentiment Words (감정 단어의 의미적 특성을 반영한 한국어 문서 감정분류 시스템)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications / v.37 no.4 / pp.317-322 / 2010
  • This paper proposes a way to improve the performance of a Korean document sentiment classification system using semantic properties of sentiment words. A sentiment word is a word that carries sentiment, and sentiment features are defined as a set of sentiment words, which are an important lexical resource for sentiment classification. A sentiment feature can have different sentiment intensity in the general field and in a specific domain. In the general field, we can estimate the sentiment intensity using snippets from a search engine, while in a specific domain, training data can be used for this estimation. The estimated sentiment intensity of a sentiment feature is called its semantic orientation, and it is used to estimate the sentiment intensity of the sentences in text documents. After estimating the sentiment intensity of the sentences, we apply it to the weights of the sentiment features. We evaluate our system with a support vector machine in three cases: general, domain-specific, and combined general/domain-specific semantic orientation. Our experimental results show improved performance in all cases; in particular, with general/domain-specific semantic orientation, the proposed method performs 3.1% better than a baseline system indexed by content words only.

An Application of Canonical Correlation Analysis Technique to Land Cover Classification of LANDSAT Images

  • Lee, Jong-Hun;Park, Min-Ho;Kim, Yong-Il
    • ETRI Journal / v.21 no.4 / pp.41-51 / 1999
  • This research is an attempt to obtain more accurate land cover information from LANDSAT images. Canonical correlation analysis, which has not been widely used in the image classification community, was applied to the classification of LANDSAT images. It was found that training areas are easier to select for classification using canonical correlation analysis than for the maximum likelihood classifier of ERDAS® software. In other words, the selected positions of training areas hardly affect the classification results using canonical correlation analysis. When the same training areas are used, the mapping accuracy of the canonical correlation classification results compared with the ground truth data is not lower than that of the maximum likelihood classifier. The kappa analysis for the canonical correlation classifier and the maximum likelihood classifier showed that the two methods are alike in classification accuracy. However, the canonical correlation classifier has advantages over the maximum likelihood classifier in its classification characteristics. Therefore, the classification using canonical correlation analysis applied in this research is effective for the extraction of land cover information from LANDSAT images and can be put to practical use.


Efficient Management of Statistical Information of Keywords on E-Catalogs (전자 카탈로그에 대한 효율적인 색인어 통계 정보 관리 방법)

  • Lee, Dong-Joo;Hwang, In-Beom;Lee, Sang-Goo
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.1-17 / 2009
  • E-Catalogs, which describe products or services, are among the most important data for electronic commerce. E-Catalogs are created, updated, and removed in order to keep the information in an e-Catalog database up to date. However, as the number of catalogs increases, information integrity is violated for several reasons, such as catalog duplication and abnormal classification. Catalog search, duplication checking, and automatic classification are important functions for utilizing e-Catalogs and keeping the integrity of the e-Catalog database. To implement these functions, probabilistic models that use statistics of index words extracted from e-Catalogs have been suggested, and the feasibility of these methods has been shown in several papers. However, even though these functions are used together in an e-Catalog management system, there has not been enough consideration of how to share the common data used by each function and how to manage the statistics of index words effectively. In this paper, we suggest a method to implement these three functions using simple SQL supported by a relational database management system. In addition, we use materialized views to reduce the burden of implementing an application that manages statistics of index words. This makes managing the statistics efficient by letting the database management system optimize statistics updating. An empirical evaluation showed that our method is feasible for implementing the three functions and effective for managing statistics of index words.
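
The idea of keeping index-word statistics inside the database and sharing them between functions can be illustrated with a small sketch. It is not the paper's schema or its probabilistic model: the table `catalog_word`, the view `word_stats`, and the helper functions are hypothetical, and because SQLite has no materialized views, a plain view stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per (catalog, index word) pair; hypothetical schema.
CREATE TABLE catalog_word (catalog_id INTEGER, word TEXT);

-- Shared statistics: in how many catalogs each index word occurs.
-- A plain view is used as a stand-in for a materialized view.
CREATE VIEW word_stats AS
    SELECT word, COUNT(DISTINCT catalog_id) AS doc_freq
    FROM catalog_word
    GROUP BY word;
""")
conn.executemany(
    "INSERT INTO catalog_word VALUES (?, ?)",
    [(1, "laptop"), (1, "15inch"), (2, "laptop"), (2, "bag"), (3, "mouse")],
)


def doc_freq(word):
    """Read a statistic from the shared view; 0 if the word is unknown."""
    row = conn.execute(
        "SELECT doc_freq FROM word_stats WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row else 0


def search(words):
    """Rank catalogs by how many of the query words they contain."""
    marks = ",".join("?" * len(words))
    sql = (
        "SELECT catalog_id, COUNT(DISTINCT word) AS hits FROM catalog_word "
        f"WHERE word IN ({marks}) GROUP BY catalog_id "
        "ORDER BY hits DESC, catalog_id"
    )
    return conn.execute(sql, words).fetchall()


print(doc_freq("laptop"))
print(search(["laptop", "bag"]))
```

In a database system that supports materialized views, the statistics would be stored and refreshed by the system itself rather than recomputed on every query, which is the efficiency gain the abstract refers to.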


Detection of Character Emotional Type Based on Classification of Emotional Words at Story (스토리기반 저작물에서 감정어 분류에 기반한 등장인물의 감정 성향 판단)

  • Baek, Yeong Tae
    • Journal of the Korea Society of Computer and Information / v.18 no.9 / pp.131-138 / 2013
  • In this paper, I propose and evaluate a method that classifies the emotional type of characters from the emotional words they speak. Emotional types are classified into three types: positive, negative, and neutral. They are determined by classifying the emotional words that the characters speak. I propose a method that extracts emotional words based on WordNet and represents them as emotional vectors. WordNet is a thesaurus with a network structure connected by relations such as hypernym, hyponym, synonym, and antonym. An emotional word is extracted by calculating its emotional distance to each emotional category. There are 30 emotional categories, so an emotional vector has 30 components. When all emotional vectors of a character are accumulated, her or his emotion in a movie can be represented as one emotional vector. In addition, the thirty emotional categories can be grouped into the three elements of positive, negative, and neutral. As a result, the emotion of a character can be represented by the values of three elements. The proposed method was evaluated on 12 characters from four movies, and the evaluation showed an accuracy of 75%.
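
The accumulation and grouping steps in this abstract can be sketched as below. This is only an illustration under stated assumptions: the abstract does not list the 30 categories or say which ones count as positive, negative, or neutral, so the grouping by index range is hypothetical, as are the function names, and the WordNet distance computation is left out.

```python
NUM_CATEGORIES = 30


def group_of(index):
    """Hypothetical grouping: 0-9 positive, 10-19 negative, 20-29 neutral."""
    if index < 10:
        return "positive"
    if index < 20:
        return "negative"
    return "neutral"


def accumulate(vectors):
    """Sum the 30-component emotional vectors of one character."""
    total = [0.0] * NUM_CATEGORIES
    for vector in vectors:
        for i, value in enumerate(vector):
            total[i] += value
    return total


def to_three_elements(total):
    """Collapse the accumulated vector into positive, negative, neutral."""
    elements = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for i, value in enumerate(total):
        elements[group_of(i)] += value
    return elements


def emotional_type(vectors):
    """Pick the element with the largest accumulated value."""
    elements = to_three_elements(accumulate(vectors))
    return max(elements, key=elements.get)


def one_hot(index):
    """Toy emotional vector with all weight on a single category."""
    return [1.0 if i == index else 0.0 for i in range(NUM_CATEGORIES)]


print(emotional_type([one_hot(3), one_hot(3), one_hot(12)]))
```

In the method described by the abstract, each component would come from an emotional distance computed over WordNet rather than from a one-hot toy vector.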