• Title/Summary/Keyword: 내용 자질

Search Result 169, Processing Time 0.023 seconds

Automatic Classification of Blog Posts (블로그 포스트의 자동 분류 시스템)

  • Jho, Hee-Sun;Kim, Su-Ah;Lee, Hyun-Ah
    • Annual Conference on Human and Language Technology
    • /
    • 2013.10a
    • /
    • pp.160-162
    • /
    • 2013
  • 편리한 블로그 사용과 블로그에서의 정보 탐색을 위해서는 내용에 기반한 분류가 필요하다. 대부분의 블로그 사이트에서는 내용 기반 분류를 제공하고 있으나, 블로거들은 자신이 작성한 블로그에 대한 수동 분류를 입력하지 않는 경우가 많다. 본 논문에서는 분류가 제공되는 블로그 사이트에서 각 분류별 문서를 수집하고, 어휘빈도와 문서빈도, 분류별 빈도를 활용하여 문서 내 어휘의 자질 가중치를 부여하고, 다양한 학습기를 이용하여 분류 모델을 생성한 뒤 블로그의 특성에 적합한 자질 추출 알고리즘과 분류 알고리즘을 찾아낸다. 실험에서는 본 논문에서 고안한 CTF-IECDF와 나이브 베이즈 멀티노미얼로 조합한 분류 모델이 75.40%의 분류 정확률을 보였다.

  • PDF

Emotion Classification in Dialogues Using Embedding Features (임베딩 자질을 이용한 대화의 감정 분류)

  • Shin, Dong-Won;Lee, Yeon-Soo;Jang, Jung-Sun;Lim, Hae-Chang
    • Annual Conference on Human and Language Technology
    • /
    • 2015.10a
    • /
    • pp.109-114
    • /
    • 2015
  • 대화 시스템에서 사용자 발화에 대한 감정 분석은 적절한 시스템 응답과 서비스를 제공하는데 있어 매우 중요한 정보이다. 본 연구에서는 단순한 긍, 부정이 아닌 분노, 슬픔, 공포, 기쁨 등 Plutchick의 8 분류 체계에 해당하는 상세한 감정을 분석 하는 데 있어, 임베딩 모델을 사용하여 기존의 어휘 자질을 효과적으로 사용할 수 있는 새로운 방법을 제안한다. 또한 대화 속에서 발생한 감정의 지속성을 반영하기 위하여 문장 임베딩 벡터와 문맥 임베딩 벡터를 자질로서 이용하는 방법에 대해 제안한다. 실험 결과 제안하는 임베딩 자질은 특히 내용어에 대해 기존의 어휘 자질을 대체할 수 있으며, 데이터 부족 문제를 다소 해소하여 성능 향상에 도움이 되는 것으로 나타났다.

  • PDF

Construction of Research Fronts Using Factor Graph Model in the Biomedical Literature (팩터그래프 모델을 이용한 연구전선 구축: 생의학 분야 문헌을 기반으로)

  • Kim, Hea-Jin;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.34 no.1
    • /
    • pp.177-195
    • /
    • 2017
  • This study attempts to infer research fronts using factor graph model based on heterogeneous features. The model suggested by this study infers research fronts having documents with the potential to be cited multiple times in the future. To this end, the documents are represented by bibliographic, network, and content features. Bibliographic features contain bibliographic information such as the number of authors, the number of institutions to which the authors belong, proceedings, the number of keywords the authors provide, funds, the number of references, the number of pages, and the journal impact factor. Network features include degree centrality, betweenness, and closeness among the document network. Content features include keywords from the title and abstract using keyphrase extraction techniques. The model learns these features of a publication and infers whether the document would be an RF using sum-product algorithm and junction tree algorithm on a factor graph. We experimentally demonstrate that when predicting RFs, the FG predicted more densely connected documents than those predicted by RFs constructed using a traditional bibliometric approach. Our results also indicate that FG-predicted documents exhibit stronger degrees of centrality and betweenness among RFs.

Comparison of Product and Customer Feature Selection Methods for Content-based Recommendation in Internet Storefronts (인터넷 상점에서의 내용기반 추천을 위한 상품 및 고객의 자질 추출 성능 비교)

  • Ahn Hyung-Jun;Kim Jong-Woo
    • The KIPS Transactions:PartD
    • /
    • v.13D no.2 s.105
    • /
    • pp.279-286
    • /
    • 2006
  • One of the widely used methods for product recommendation in Internet storefronts is matching product features against target customer profiles. When using this method, it's very important to choose a suitable subset of features for recommendation efficiency and performance, which, however, has not been rigorously researched so far. In this paper, we utilize a dataset collected from a virtual shopping experiment in a Korean Internet book shopping mall to compare several popular methods from other disciplines for selecting features for product recommendation: the vector-space model, TFIDF(Term Frequency-Inverse Document Frequency), the mutual information method, and the singular value decomposition(SVD). The application of SVD showed the best performance in the analysis results.

A Study on the Musical Theme Clustering for Searching Note Sequences (음렬 탐색을 위한 주제소절 자동분류에 관한 연구)

  • 심지영;김태수
    • Journal of the Korean Society for information Management
    • /
    • v.19 no.3
    • /
    • pp.5-30
    • /
    • 2002
  • In this paper, classification feature is selected with focus of musical content, note sequences pattern, and measures similarity between note sequences followed by constructing clusters by similar note sequences, which is easier for users to search by showing the similar note sequences with the search result in the CBMR system. Experimental document was $\ulcorner$A Dictionary of Musical Themes$\lrcorner$, the index of theme bar focused on classical music and obtained kern-type file. Humdrum Toolkit version 1.0 was used as note sequences treat tool. The hierarchical clustering method is by stages focused on four-type similarity matrices by whether the note sequences segmentation or not and where the starting point is. For the measurement of the result, WACS standard is used in the case of being manual classification and in the case of the note sequences starling from any point in the note sequences, there is used common feature pattern distribution in the cluster obtained from the clustering result. According to the result, clustering with segmented feature unconnected with the starting point Is higher with distinct difference compared with clustering with non-segmented feature.

Text-Confidence Feature Based Quality Evaluation Model for Knowledge Q&A Documents (텍스트 신뢰도 자질 기반 지식 질의응답 문서 품질 평가 모델)

  • Lee, Jung-Tae;Song, Young-In;Park, So-Young;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.10
    • /
    • pp.608-615
    • /
    • 2008
  • In Knowledge Q&A services where information is created by unspecified users, document quality is an important factor of user satisfaction with search results. Previous work on quality prediction of Knowledge Q&A documents evaluate the quality of documents by using non-textual information, such as click counts and recommendation counts, and focus on enhancing retrieval performance by incorporating the quality measure into retrieval model. Although the non-textual information used in previous work was proven to be useful by experiments, data sparseness problem may occur when predicting the quality of newly created documents with such information. To solve data sparseness problem of non-textual features, this paper proposes new features for document quality prediction, namely text-confidence features, which indicate how trustworthy the content of a document is. The proposed features, extracted directly from the document content, are stable against data sparseness problem, compared to non-textual features that indirectly require participation of service users in order to be collected. Experiments conducted on real world Knowledge Q&A documents suggests that text-confidence features show performance comparable to the non-textual features. We believe the proposed features can be utilized as effective features for document quality prediction and improve the performance of Knowledge Q&A services in the future.

Automatic Text Categorization using the Importance of Sentences (문장 중요도를 이용한 자동 문서 범주화)

  • Ko, Young-Joong;Park, Jin-Woo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.6
    • /
    • pp.417-424
    • /
    • 2002
  • Automatic text categorization is a problem of assigning predefined categories to free text documents. In order to classify text documents, we have to extract good features from them. In previous researches, a text document is commonly represented by the frequency of each feature. But there is a difference between important and unimportant sentences in a text document. It has an effect on the importance of features in a text document. In this paper, we measure the importance of sentences in a text document using text summarizing techniques. A text document is represented by features with different weights according to the importance of each sentence. To verify the new method, we constructed Korean news group data set and experiment our method using it. We found that our new method gale a significant improvement over a basis system for our data sets.

A Document Sentiment Classification System Based on the Feature Weighting Method Improved by Measuring Sentence Sentiment Intensity (문장 감정 강도를 반영한 개선된 자질 가중치 기법 기반의 문서 감정 분류 시스템)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.6
    • /
    • pp.491-497
    • /
    • 2009
  • This paper proposes a new feature weighting method for document sentiment classification. The proposed method considers the difference of sentiment intensities among sentences in a document. Sentiment features consist of sentiment vocabulary words and the sentiment intensity scores of them are estimated by the chi-square statistics. Sentiment intensity of each sentence can be measured by using the obtained chi-square statistics value of each sentiment feature. The calculated intensity values of each sentence are finally applied to the TF-IDF weighting method for whole features in the document. In this paper, we evaluate the proposed method using support vector machine. Our experimental results show that the proposed method performs about 2.0% better than the baseline which doesn't consider the sentiment intensity of a sentence.

Study of Feature Extraction Algorithm for Harmful word Filtering (유해어 필터링을 위한 자질어 추출 알고리즘에 관한 연구)

  • Jeong Jung-Hoon;Lee Won-Hee;Lee Shin-Won;An Don-Gun;Chung Sung-Jong
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06b
    • /
    • pp.7-9
    • /
    • 2006
  • 유해 정보란 정보의 홍수 속에서 무차별적으로 제공되는 음란, 폭력 등의 내용을 담고 있는 정보를 말한다. 이러한 유해 정보들로부터 청소년 등 사회적으로 보호를 받아야 할 인터넷 이용자들을 보호하기 위한 장치가 필요하다. 현재 다양한 방법이 제안되고 연구되고 있다. 본 연구에서는 유해 문서의 필터링을 기법 중 키워드 필터링에서 사용되는 유해어 사전을 위한 자질어 추출 알고리즘에 대해서 비교/연구하였다. 키워드 필터링에서 자질어는 필터링의 성능에 많은 영향을 미친다. 따라서 필터링의 성능을 높이기 위한 자질어 추출 알고리즘 선택은 매우 중요하다. 이에 본 논문에서는 다양한 알고리즘을 비교 분석하여 정확하고 효율적인 자질어 추출 알고리즘 조합을 찾고자 하였다. 그 결과 CHI/TF-IDF 조합이 높은 성능을 보였으며 92%의 정확도를 얻을 수 있었다.

  • PDF

Multi-Document Summarization using Time Feature (시간자질을 이용한 다중 문서요약)

  • 임정민;강인수;배재학;이종혁
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.898-900
    • /
    • 2004
  • 시간에 중속적인 문서집합에서 사람이 만든 요약문은 시간에 따른 중요 내용의 분포를 보여준다. 본 논문은 다중 문서에 시간 자질을 이용한 문서의 분류와 시간별 문서집합에서 핵심문장과 부가문장을 선별하고, 문장간의 계층적인 클러스터링을 통해서 중요 문장을 선별하는 방법을 제안한다. 동일한 주제를 갖는 문서집합에서 사랑이 선택한 중요 문장에 대해서 제안한 방법은 50% 정확률을 나타냈다.

  • PDF