• Title/Summary/Keyword: 연관단어 (related words)


A Study on the Application of Topic Modeling for the Book Report Text (독후감 텍스트의 토픽모델링 적용에 관한 탐색적 연구)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society / v.47 no.4 / pp.1-18 / 2016
  • The purpose of this study is to explore the application of topic modeling to the topic analysis of book reports. Topic modeling can be understood as one method of topic analysis. The analysis was conducted on the texts of 23 book reports using the LDA function of the R package "topicmodels", and 16 topics were extracted. A topic network was constructed from the relations between topics and keywords, and a book report network was constructed from the relations between individual book reports and topics. Centrality analysis was then conducted on both networks. The results are as follows. First, the 16 topics form a network consisting of a single component; in other words, all 16 topics are interrelated. Second, the book reports fell into two groups: reports with high centrality and reports with low centrality. In terms of topics, the former group is similar to the other reports, whereas the latter group differs from them. Combined with network analysis, the results of topic modeling are useful for identifying the topics of book reports.
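The network step can be sketched in Python, assuming the LDA output (per-report topic probabilities and per-topic keywords) is already available, for example from `topicmodels::LDA` in R as in the study. The linking rule (two nodes are connected when they share a keyword, or share a topic with probability of at least 0.2), the use of degree centrality, and the toy data are all hypothetical; the abstract does not specify how the networks were built or which centrality measures were used.

```python
from itertools import combinations


def project(memberships):
    """One-mode projection of a two-mode relation: two nodes are linked
    when their attribute sets (keywords or salient topics) overlap.
    This linking rule is a hypothetical choice for illustration."""
    edges = {node: set() for node in memberships}
    for a, b in combinations(memberships, 2):
        if memberships[a] & memberships[b]:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def salient_topics(doc_topics, threshold=0.2):
    """Keep, for each report, the topics whose LDA probability >= threshold."""
    return {doc: {t for t, p in probs.items() if p >= threshold}
            for doc, probs in doc_topics.items()}


def degree_centrality(edges):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(edges)
    return {node: len(nbrs) / (n - 1) for node, nbrs in edges.items()}


def is_single_component(edges):
    """Depth-first search from one node; True if every node is reached."""
    start = next(iter(edges))
    seen, stack = {start}, [start]
    while stack:
        for nbr in edges[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(edges)


# Hypothetical toy LDA output: topic -> top keywords, report -> topic probabilities.
topic_keywords = {"T1": {"friend", "school"}, "T2": {"school", "war"}, "T3": {"war", "family"}}
doc_topics = {
    "R1": {"T1": 0.7, "T2": 0.2, "T3": 0.1},
    "R2": {"T1": 0.1, "T2": 0.6, "T3": 0.3},
    "R3": {"T1": 0.05, "T2": 0.05, "T3": 0.9},
}

topic_net = project(topic_keywords)
report_net = project(salient_topics(doc_topics))
print(is_single_component(topic_net))  # True
print(degree_centrality(report_net))   # {'R1': 0.5, 'R2': 1.0, 'R3': 0.5}
```

In this toy case R2 shares a salient topic with both other reports and receives the highest centrality, which is the kind of pattern the study reads as topical similarity to the rest of the collection.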

A Study on the Automatic Lexical Acquisition for Multi-lingustic Speech Recognition (다국어 음성 인식을 위한 자동 어휘모델의 생성에 대한 연구)

  • 지원우;윤춘덕;김우성;김석동
    • The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.434-442 / 2003
  • Software internationalization, the process of making software easier to localize for specific languages, has deep implications for speech technology, where the task is bound up with the very essence of a particular language. A great deal of work and fine-tuning has gone into language processing software built on ASCII or on a single language such as English, which makes porting it to other languages difficult. The inherent identity of a language manifests itself in its lexicon, which reveals its character set, phoneme set, and pronunciation rules. We propose decomposing the lexicon-building process into four discrete, sequential steps. As preprocessing for building a lexical model, text in a language-specific encoding is converted to Unicode. The steps are then: (1) transliterating code points from Unicode, (2) applying phonetic standardization rules, (3) implementing grapheme-to-phoneme rules, and (4) implementing phonological processes.
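For Korean, the preprocessing and a Korean-specific version of step 1 (turning Unicode code points into graphemes) can be sketched with Python's built-in codecs and the Hangul syllable arithmetic defined in the Unicode Standard. The EUC-KR source encoding is an assumption, and this is not the paper's implementation; grapheme-to-phoneme and phonological rules (steps 3 and 4) would then operate on the resulting jamo sequence.

```python
# Constants for the Hangul syllable block, as defined in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def to_unicode(raw: bytes, encoding: str = "euc-kr") -> str:
    """Preprocessing: decode text stored in a legacy encoding (assumed EUC-KR)."""
    return raw.decode(encoding)


def to_jamo(text: str) -> list[str]:
    """Split precomposed Hangul syllables into conjoining jamo
    (leading consonant, vowel, optional trailing consonant); other
    characters pass through unchanged."""
    out = []
    for ch in text:
        s = ord(ch) - S_BASE
        if 0 <= s < S_COUNT:
            out.append(chr(L_BASE + s // N_COUNT))
            out.append(chr(V_BASE + (s % N_COUNT) // T_COUNT))
            t = s % T_COUNT
            if t:  # t == 0 means the syllable has no final consonant
                out.append(chr(T_BASE + t))
        else:
            out.append(ch)
    return out


text = to_unicode("한국어".encode("euc-kr"))
print([hex(ord(j)) for j in to_jamo(text)])
```

Working on jamo rather than whole syllables keeps the later rule sets small, since pronunciation rules in Korean are mostly stated in terms of individual consonants and vowels.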

An Effective Incremental Text Clustering Method for the Large Document Database (대용량 문서 데이터베이스를 위한 효율적인 점진적 문서 클러스터링 기법)

  • Kang, Dong-Hyuk;Joo, Kil-Hong;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.10D no.1 / pp.57-66 / 2003
  • With the development of the Internet and computers, the amount of information available online is increasing rapidly, and much of it is managed in document form. Research into methods for managing large document collections effectively is therefore necessary. Document clustering groups documents by subject according to their mutual similarity, so it can support document exploration and search and can improve search accuracy. This paper proposes an efficient incremental clustering method for a document set that grows gradually. The incremental document clustering algorithm assigns new documents to existing clusters that have been identified in advance. In addition, to improve clustering accuracy, stop words are removed and word weights are calculated with the proposed TF×NIDF function.
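The incremental assignment step can be sketched as follows. The abstract does not define NIDF, so this sketch substitutes an ordinary smoothed IDF; the stop list, the similarity threshold of 0.2, and the rule of opening a new cluster when no existing cluster is similar enough are hypothetical choices, not the paper's algorithm.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to"}  # hypothetical stop list


def tokens(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf(text, idf, unseen_idf):
    """TF x IDF weights (standard IDF stands in for the paper's NIDF)."""
    tf = Counter(tokens(text))
    return {w: c * idf.get(w, unseen_idf) for w, c in tf.items()}


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class IncrementalClusterer:
    def __init__(self, clusters, threshold=0.2):
        """clusters: list of lists of document strings identified in advance."""
        docs = [d for c in clusters for d in c]
        n = len(docs)
        df = Counter(w for d in docs for w in set(tokens(d)))
        self.idf = {w: math.log(n / c) + 1.0 for w, c in df.items()}
        self.unseen_idf = math.log(n) + 1.0  # weight for words never seen before
        self.threshold = threshold
        # Centroids are sums of member vectors; cosine ignores their scale.
        self.centroids = []
        for c in clusters:
            centroid = Counter()
            for d in c:
                centroid.update(tfidf(d, self.idf, self.unseen_idf))
            self.centroids.append(centroid)

    def add(self, text):
        """Assign a new document to the most similar existing cluster,
        or open a new cluster if no similarity reaches the threshold."""
        vec = tfidf(text, self.idf, self.unseen_idf)
        scores = [cosine(vec, c) for c in self.centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= self.threshold:
            self.centroids[best].update(vec)
            return best
        self.centroids.append(Counter(vec))
        return len(self.centroids) - 1


clusterer = IncrementalClusterer([
    ["stock market prices rise", "market investors buy stock"],
    ["football team wins match", "the team scored a goal"],
])
print(clusterer.add("stock prices fall"), clusterer.add("team match tonight"))
```

Only the centroids need updating as documents arrive, so existing clusters are never rebuilt from scratch, which is the point of an incremental method for a growing collection.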

Subtopic Mining of Two-level Hierarchy Based on Hierarchical Search Intentions and Web Resources (계층적 검색 의도와 웹 자원을 활용한 2계층 구조의 서브토픽 마이닝)

  • Kim, Se-Jong;Lee, Jong-Hyeok
    • KIISE Transactions on Computing Practices / v.22 no.2 / pp.83-88 / 2016
  • Subtopic mining is the extraction and ranking of possible subtopics, which disambiguate and specify the search intentions of an input query in terms of relevance, popularity, and diversity. This paper describes the limitations of previous studies on the utilization of web resources, and proposes a subtopic mining method with a two-level hierarchy based on hierarchical search intentions and web resources, in order to overcome these limitations. Considering the characteristics of resources provided by the official subtopic mining task, we extract various second-level subtopics reflecting hierarchical search intentions from web documents, and expand and re-rank them using other provided resources. Terms in subtopics with wider search intentions are used to generate first-level subtopics. Our method performed better than state-of-the-art methods in almost every aspect.
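The idea of deriving first-level subtopics from terms shared by more specific second-level subtopics can be illustrated with a small frequency heuristic. This is only a hypothetical sketch: the paper's use of the official task resources, web documents, and its re-ranking are not reproduced, and the `top_k` parameter and the toy query are assumptions.

```python
from collections import Counter


def two_level_subtopics(query, second_level, top_k=2):
    """Form first-level subtopics from the most frequent non-query terms in
    the second-level subtopics, then file each second-level subtopic under
    the first head term it contains. Subtopics matching no head are left out.
    This frequency heuristic is hypothetical."""
    q_terms = set(query.lower().split())
    counts = Counter(
        t
        for s in second_level
        for t in dict.fromkeys(s.lower().split())  # dedupe, keep order
        if t not in q_terms
    )
    heads = [t for t, _ in counts.most_common(top_k)]
    tree = {f"{query} {h}": [] for h in heads}
    for s in second_level:
        words = set(s.lower().split())
        for h in heads:
            if h in words:
                tree[f"{query} {h}"].append(s)
                break
    return tree


print(two_level_subtopics("python", [
    "python snake habitat", "python snake venom",
    "python language tutorial", "python language install",
]))
```

Terms that recur across many specific subtopics tend to signal a broader intent ("snake" versus "language"), which is why they are used here as first-level headings.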

Speaker Adaptation for Voice Dialing (음성 다이얼링을 위한 화자적응)

  • ;Chin-Hui Lee
    • The Journal of the Acoustical Society of Korea / v.21 no.5 / pp.455-461 / 2002
  • This paper presents a method for improving the performance of a personal voice dialing system that uses speaker-independent phoneme HMMs. Because such a system stores only the phone transcription of each input sentence, storage requirements are greatly reduced. However, its performance is worse than that of a system using speaker-dependent models, because the speaker-independent models introduce phone recognition errors. To solve this problem, we present a new method that jointly estimates transformation vectors for speaker adaptation and transcriptions from training utterances. The biases and transcriptions are estimated iteratively from each user's training data, using a maximum likelihood approach to stochastic matching with speaker-independent phone models. Experimental results show that the proposed method outperforms the conventional method, which uses transcriptions only.
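The alternation between transcription and bias estimation can be illustrated with a deliberately simplified, hard-decision toy. Assumptions: a single global additive bias in the feature domain, one Gaussian mean per state with identity covariance, and nearest-mean frame labeling as a stand-in for phone recognition. This is not stochastic matching with HMMs as used in the paper, only the shape of the iteration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_bias(frames, means, iters=10):
    """Alternate between (1) labeling each bias-compensated frame with the
    nearest state mean, a crude stand-in for phone recognition, and
    (2) re-estimating one global bias vector as the mean residual, which is
    the ML estimate when every state is a Gaussian with identity covariance."""
    dim = len(frames[0])
    bias = [0.0] * dim
    labels = []
    for _ in range(iters):
        labels = [
            min(means, key=lambda s: sq_dist([x - b for x, b in zip(f, bias)], means[s]))
            for f in frames
        ]
        bias = [
            sum(f[d] - means[s][d] for f, s in zip(frames, labels)) / len(frames)
            for d in range(dim)
        ]
    return bias, labels


# Hypothetical speaker-independent state means and one user's frames,
# which are the means shifted by a bias of (1.0, -1.0).
means = {"a": [0.0, 0.0], "b": [10.0, 10.0]}
frames = [[1.0, -1.0], [11.0, 9.0], [1.0, -1.0], [11.0, 9.0]]
print(estimate_bias(frames, means))
```

Each pass uses the current bias to get better labels and the current labels to get a better bias, which mirrors, in miniature, the joint estimation of transformation vectors and transcriptions described above.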

SHRT : New Method of URL Shortening including Relative Word of Target URL (SHRT : 유사 단어를 활용한 URL 단축 기법)

  • Yoon, Soojin;Park, Jeongeun;Choi, Changkuk;Kim, Seungjoo
    • The Journal of Korean Institute of Communications and Information Sciences / v.38B no.6 / pp.473-484 / 2013
  • A URL shortening service replaces a long URL with a short one and redirects the short URL to the long one. As the number of microblog users has grown rapidly, shortened URLs have become common, because they are convenient to create and use under the post-length limits of microblogs. Because of their simplicity, shortened URLs are also widely used in e-mail, SMS, and books. However, most shortened URLs bear no relation to their target URLs, so users cannot anticipate the target. To address this problem, several approaches have been tried, such as changing the name of the shortening service, inserting information about the website into the shortened URL, and using short codes for physical addresses. However, each has limitations: difficulty of automation, relatively long addresses, and a narrow range of applicable targets. SHRT complements these approaches, drawing on an idea from the Arabic writing system: although Arabic script generally omits vowels, Arabic readers have no difficulty understanding it. This paper proposes SHRT, a new URL shortening method that lets users guess the target URL from a word related to the lowest domain of the target URL, written without vowels.
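The vowel-dropping idea can be sketched with the standard library's `urllib.parse.urlsplit`. Several parts are assumptions: "lowest domain" is taken to mean the label just left of the top-level domain, the first letter is kept even if it is a vowel, and SHRT's step of choosing a related word is not reproduced (the domain word itself is used).

```python
from urllib.parse import urlsplit

VOWELS = set("aeiou")


def domain_word(url: str) -> str:
    """Take the label just left of the top-level domain, e.g.
    'www.facebook.com' -> 'facebook'. This naive rule is an assumption
    and mishandles multi-part suffixes such as 'co.kr'."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def devowel(word: str) -> str:
    """Drop vowels, keeping the first letter so short words stay readable
    (keeping the first letter is a hypothetical design choice)."""
    return word[:1] + "".join(ch for ch in word[1:] if ch.lower() not in VOWELS)


def shrt_key(url: str) -> str:
    """Hypothetical short key built from the target's domain word without vowels."""
    return devowel(domain_word(url))


print(shrt_key("https://www.facebook.com/some/page"))  # fcbk
```

A key such as "fcbk" is still guessable in the same way that vowel-less Arabic text remains readable, while being shorter than the original word.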

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique used to identify individuals' subjective information or opinions in documents such as blogs, reviews, articles, and social network posts. Previous studies have considered only binary classification of ratings based on the texts of online reviews. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers, such as support vector machines and the proportional odds model, and their predictive performances are compared.
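The chi-square screening step can be sketched with the Pearson statistic on a word-presence by rating-class contingency table. Whitespace tokenization, binary presence features, and the toy reviews are assumptions; the abstract does not describe the paper's exact preprocessing.

```python
def chi_square(docs, labels, word):
    """Pearson chi-square statistic for the 2 x C table of
    (word present / absent) x (rating class)."""
    classes = sorted(set(labels))
    n = len(docs)
    observed = {(p, c): 0 for p in (True, False) for c in classes}
    for doc, c in zip(docs, labels):
        observed[(word in doc.split(), c)] += 1
    row = {p: sum(observed[(p, c)] for c in classes) for p in (True, False)}
    col = {c: observed[(True, c)] + observed[(False, c)] for c in classes}
    stat = 0.0
    for (p, c), o in observed.items():
        expected = row[p] * col[c] / n
        if expected > 0:  # skip empty rows, e.g. a word present in every document
            stat += (o - expected) ** 2 / expected
    return stat


def select_words(docs, labels, k):
    """Rank the vocabulary by chi-square statistic and keep the top k."""
    vocab = sorted({w for d in docs for w in d.split()})
    return sorted(vocab, key=lambda w: -chi_square(docs, labels, w))[:k]


# Hypothetical reviews with 1-, 3- and 5-star ratings.
docs = ["great movie", "great acting", "boring movie", "boring plot", "okay movie", "okay plot"]
labels = [5, 5, 1, 1, 3, 3]
print(select_words(docs, labels, 3))
```

"movie" occurs equally often in every class and scores 0, while "great", "boring", and "okay" each pick out a single rating and score highest; the selected words would then feed the multi-class classifiers.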

A Recognition Method for Main Characters Name in Korean Novels (한국어 소설에서 주요 인물명 인식 기법)

  • Kim, Seo-Hee;Park, Tae-Keun;Kim, Seung-Hoon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.9 no.1 / pp.75-81 / 2016
  • Main characters play leading roles in novels. Previous studies recognized the main characters of a novel mainly using dictionaries built beforehand. In English, names begin with uppercase letters and are used together with certain words. In this paper, we propose a method for recognizing the names of main characters in Korean novels using predicates, rules, and weights. We first recognize candidate character names using predicates and propose rules to exclude candidates that cannot be characters. We then assign an importance to each candidate, taking into account weights determined by the number of candidates appearing in a sentence. Finally, if a candidate's importance exceeds a threshold, the candidate is judged to be a main character. Experiments on 300 novels show an average accuracy of 85.97%. Main character names can be used to understand the relationships among characters and the characters' actions and tendencies.
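Assuming candidate names have already been found with predicate patterns and filtered by the exclusion rules, the importance and threshold steps might look like the sketch below. The 1/k weighting (k being the number of candidates in the sentence), normalization by the top score, and the threshold of 0.3 are hypothetical; the abstract does not give the actual weights.

```python
from collections import defaultdict


def main_characters(sentences, threshold=0.3):
    """sentences: list of lists of candidate names already found by
    predicate patterns and filtered by exclusion rules (not shown).
    Each occurrence contributes 1/k, where k is the number of candidates
    in its sentence (hypothetical weighting); scores are normalized by
    the top score and compared with the threshold."""
    score = defaultdict(float)
    for cands in sentences:
        if cands:
            w = 1.0 / len(cands)
            for name in cands:
                score[name] += w
    if not score:
        return []
    top = max(score.values())
    return sorted(n for n, s in score.items() if s / top >= threshold)


# Hypothetical candidates per sentence of a short novel excerpt.
sentences = [["Minsu"], ["Minsu", "Jiyoung"], ["Jiyoung"], ["Minsu"], ["Clerk", "Minsu", "Jiyoung"]]
print(main_characters(sentences))
```

Under this weighting a name that appears alone in a sentence counts more than one mentioned in a crowd, so a minor figure such as "Clerk" stays below the threshold.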

Building Concept Networks using a Wikipedia-based 3-dimensional Text Representation Model (위키피디아 기반의 3차원 텍스트 표현모델을 이용한 개념망 구축 기법)

  • Hong, Ki-Joo;Kim, Han-Joon;Lee, Seung-Yeon
    • KIISE Transactions on Computing Practices / v.21 no.9 / pp.596-603 / 2015
  • A concept network is an essential knowledge base for semantic search engines, personalized search systems, recommendation systems, and text mining. Many recent studies have extended concept representation using external ontologies. We therefore propose a new way of building concept networks based on a 3-dimensional text model, using Wikipedia as a world-knowledge-level ontology. It is desirable that 'concepts' derived from text documents be defined according to the theoretical framework of formal concept analysis, since relationships among concepts generally change over time. In this paper, the concept networks hidden in a given document collection are extracted more reasonably by representing each concept as a term-by-document matrix.
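If each concept is a term-by-document matrix, the collection forms a concept × term × document tensor, and one simple way to link concepts is the cosine of their matrices (the normalized Frobenius inner product). This similarity measure, the threshold of 0.3, and the toy Wikipedia-style concept names are assumptions for illustration, not the paper's construction.

```python
import math
from itertools import combinations


def slice_cosine(a, b):
    """Cosine similarity of two term-by-document matrices stored sparsely
    as {(term, doc): weight}, i.e. the normalized Frobenius inner product."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def concept_network(concepts, threshold=0.3):
    """concepts: {concept: {(term, doc): weight}}, one term-by-document
    slice per concept of a concept x term x document tensor.
    Returns the edges whose slice similarity reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(concepts), 2):
        s = slice_cosine(concepts[a], concepts[b])
        if s >= threshold:
            edges[(a, b)] = s
    return edges


# Hypothetical concept slices over three documents d1..d3.
concepts = {
    "Python_(language)": {("code", "d1"): 2, ("syntax", "d1"): 1, ("code", "d2"): 1},
    "Java": {("code", "d1"): 1, ("code", "d2"): 2, ("jvm", "d2"): 1},
    "Python_(snake)": {("snake", "d3"): 2, ("venom", "d3"): 1},
}
print(concept_network(concepts))
```

Because the comparison keeps the document dimension, two concepts are linked only when they use the same terms in the same documents, not merely the same vocabulary, which is what separates the two senses of "Python" here.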

Auto-tagging Method for Unlabeled Item Images with Hypernetworks for Article-related Item Recommender Systems (잡지기사 관련 상품 연계 추천 서비스를 위한 하이퍼네트워크 기반의 상품이미지 자동 태깅 기법)

  • Ha, Jung-Woo;Kim, Byoung-Hee;Lee, Ba-Do;Zhang, Byoung-Tak
    • Journal of KIISE:Computing Practices and Letters / v.16 no.10 / pp.1010-1014 / 2010
  • An article-related product recommender system is an emerging e-commerce service that recommends items based on contextual associations between items and articles. Current services recommend items based on the similarity between the tags of articles and items, an approach that suffers from both the high cost of manual tagging and low recommendation accuracy. As a component of a novel article-related item recommender system, we propose a new method for tagging item images with predefined categories. We suggest a hypernetwork-based algorithm for learning associations between images, represented by visual words, and product categories. The learned hypernetwork is used to assign multiple tags to unlabeled item images. We demonstrate our method on a product set from a real-world online shopping mall consisting of 1,251 product images in 10 categories. Experimental results show not only that the proposed method achieves tagging performance competitive with other classifiers but also that the proposed hypernetwork-based multi-tagging method improves tagging accuracy.
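A minimal hypernetwork-style tagger can be sketched as follows, assuming each image has already been reduced to a set of visual words. Hyperedges here are all order-2 subsets of an image's visual words, labeled with its category; the exhaustive enumeration, the vote-share rule, and `min_share=0.3` are hypothetical simplifications, and the paper's actual learning procedure is not reproduced.

```python
from collections import Counter
from itertools import combinations


def train_hypernetwork(images, order=2):
    """images: list of (set of visual words, category).
    Every order-k subset of an image's visual words becomes a hyperedge
    labeled with that image's category; repeated hyperedges gain weight.
    Exhaustive enumeration is used here for determinism; hypernetwork
    models often sample hyperedges at random instead."""
    edges = Counter()
    for words, category in images:
        for combo in combinations(sorted(words), order):
            edges[(combo, category)] += 1
    return edges


def tag(edges, words, min_share=0.3):
    """Let every hyperedge contained in the image vote for its category and
    return all categories holding at least min_share of the votes."""
    votes = Counter()
    for (combo, category), weight in edges.items():
        if set(combo) <= words:
            votes[category] += weight
    total = sum(votes.values())
    if total == 0:
        return []
    return sorted(c for c, v in votes.items() if v / total >= min_share)


# Hypothetical training images described by visual-word IDs.
images = [
    ({"v1", "v2", "v3"}, "shoes"),
    ({"v1", "v2", "v4"}, "shoes"),
    ({"v5", "v6", "v7"}, "bags"),
    ({"v2", "v5", "v6"}, "bags"),
]
edges = train_hypernetwork(images)
print(tag(edges, {"v1", "v2", "v5", "v6"}))  # ['bags', 'shoes']
```

Returning every category above a vote-share threshold, rather than only the top one, is what lets a single unlabeled image receive multiple tags.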