• Title/Summary/Keyword: document clustering

Resampling Feedback Documents Using Overlapping Clusters (중첩 클러스터를 이용한 피드백 문서의 재샘플링 기법)

  • Lee, Kyung-Soon
    • The KIPS Transactions: Part B / v.16B no.3 / pp.247-256 / 2009
  • Typical pseudo-relevance feedback methods assume that the top-retrieved documents are relevant and use these pseudo-relevant documents to expand query terms. However, the initial retrieval set can contain a great deal of noise. In this paper, we present a cluster-based resampling method that selects better pseudo-relevant documents on the basis of the relevance model. The main idea is to use document clusters to find dominant documents in the initial retrieval set and to feed these documents repeatedly so as to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. To justify the resampling approach, we examine the relevance density of the feedback documents. The resampling approach shows higher relevance density than the baseline relevance model on all collections, which results in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
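
The resampling idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: each top-retrieved document is assumed to seed an overlapping cluster with its `k` nearest neighbours (cosine similarity on term weights), clusters are ranked by the summed initial retrieval scores of their members, and a document is counted once for every top-ranked cluster it belongs to. The function names, `k`, and `top_clusters` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def resample_feedback(docs, scores, k=1, top_clusters=3):
    """Return a Counter: document index -> number of times it is fed back.

    Hypothetical rule: every document seeds an overlapping cluster made of
    itself plus its k most similar documents; clusters are ranked by the sum
    of their members' initial retrieval scores; members of the top clusters
    are counted once per cluster, so dominant documents are repeated.
    """
    clusters = []
    for i, d in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        neighbours = sorted(others, key=lambda j: cosine(d, docs[j]), reverse=True)[:k]
        members = [i] + neighbours
        clusters.append((sum(scores[m] for m in members), members))
    clusters.sort(key=lambda c: c[0], reverse=True)
    feedback = Counter()
    for _, members in clusters[:top_clusters]:
        feedback.update(members)
    return feedback


docs = [
    {"cluster": 1.0, "document": 1.0},
    {"cluster": 1.0, "document": 0.5},
    {"cluster": 0.8, "retrieval": 0.2},
    {"football": 1.0},  # off-topic noise in the initial retrieval set
]
scores = [4.0, 3.0, 2.5, 1.0]
print(resample_feedback(docs, scores))  # Counter({1: 3, 0: 2, 2: 1})
```

In this toy run, document 1 is fed back three times and the off-topic document 3 not at all, which is the emphasis effect the abstract describes. The paper's actual cluster construction and weighting may differ.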

Examining the Intellectual Structure of Reading Studies with Co-Word Analysis Based on the Importance of Journals and Sequence of Keywords (학술지 중요도와 키워드 순서를 고려한 단어동시출현 분석을 이용한 독서분야의 지적구조 분석)

  • Zhang, Ling Ling;Hong, Hyun Jin
    • Journal of the Korean BIBLIA Society for library and Information Science / v.25 no.1 / pp.295-318 / 2014
  • The purpose of this study is to analyze the intellectual structure of reading studies using co-word analysis with a mixed weight that combines the level of the academic journal and the position of each keyword. To this end, 838 academic articles on reading studies published from 2003 to 2012 were retrieved from KCI, and 56 keywords were extracted. The results of cluster analysis, multidimensional scaling (MDS), and network analysis show that the network based on the mixed weight performs better under all three methods, and that reading studies can be divided into 4 major divisions and 11 subdivisions. Finally, document analysis shows that the research tendency of reading studies has shifted from theoretical studies to empirical studies.
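
The abstract does not give the mixed-weight formula. The sketch below assumes one plausible form: each keyword pair in an article contributes the journal's importance score multiplied by a position weight of 1/rank for each of the two keywords. The journal names, the scores, and the 1/rank rule are hypothetical; the resulting weighted matrix is what cluster analysis, MDS, or network analysis would then take as input.

```python
from collections import defaultdict
from itertools import combinations


def mixed_cooccurrence(articles, journal_weight):
    """Weighted keyword co-occurrence counts.

    articles: list of (journal, [keywords in the order the authors listed them]).
    journal_weight: hypothetical importance score per journal (default 1.0).
    Each co-occurring pair adds journal_weight * pos(a) * pos(b), where
    pos(kw) = 1 / rank is an assumed position weight (first keyword = 1).
    """
    co = defaultdict(float)
    for journal, keywords in articles:
        jw = journal_weight.get(journal, 1.0)
        pos = {}
        for rank, kw in enumerate(keywords, start=1):
            pos.setdefault(kw, 1.0 / rank)  # keep a keyword's first position
        for a, b in combinations(sorted(pos), 2):
            co[(a, b)] += jw * pos[a] * pos[b]
    return dict(co)


articles = [
    ("Journal A", ["reading", "literacy", "education"]),
    ("Journal B", ["reading", "education"]),
]
matrix = mixed_cooccurrence(articles, {"Journal A": 2.0, "Journal B": 1.0})
print(matrix[("education", "reading")])  # 2 * (1/3) * 1 + 1 * (1/2) * 1
```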

Study on Designing and Implementing Online Customer Analysis System based on Relational and Multi-dimensional Model (관계형 다차원모델에 기반한 온라인 고객리뷰 분석시스템의 설계 및 구현)

  • Kim, Keun-Hyung;Song, Wang-Chul
    • The Journal of the Korea Contents Association / v.12 no.4 / pp.76-85 / 2012
  • Through opinion mining, we can analyze the degree of positive or negative sentiment that customers express about important entities or attributes in online customer reviews. However, opinion mining techniques are limited in that they provide only simple functions for analyzing reviews. In this paper, we propose a novel technique that analyzes online customer reviews multi-dimensionally by modifying existing OLAP techniques so that they can be applied to text data. This multi-dimensional analysis model consists of noun, adjective, and document axes, which are converted into four relational tables in a relational database. The model could serve as a new framework that integrates existing opinion mining, information summarization, and clustering algorithms. We implemented the multi-dimensional analysis model and its algorithms, and observed that the system would enable us to analyze online customer reviews in more complex ways.
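
The abstract states that the noun, adjective, and document axes become four relational tables but does not give the schema. A minimal sketch, assuming a star schema with three dimension tables and one fact table of (document, noun, adjective) triples, using Python's built-in `sqlite3`; the table and column names and the roll-up query are hypothetical.

```python
import sqlite3


def build_review_cube(rows):
    """rows: (doc_id, noun, adjective) triples extracted from reviews."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE document (doc_id INTEGER PRIMARY KEY);
        CREATE TABLE noun (noun_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE adjective (adj_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE fact (doc_id INTEGER, noun_id INTEGER, adj_id INTEGER);
    """)
    for doc_id, noun, adj in rows:
        # Dimension rows are created once; duplicates are ignored.
        con.execute("INSERT OR IGNORE INTO document VALUES (?)", (doc_id,))
        con.execute("INSERT OR IGNORE INTO noun(word) VALUES (?)", (noun,))
        con.execute("INSERT OR IGNORE INTO adjective(word) VALUES (?)", (adj,))
        # Fact row links the document to the surrogate keys of noun and adjective.
        con.execute(
            "INSERT INTO fact SELECT ?, noun_id, adj_id FROM noun, adjective "
            "WHERE noun.word = ? AND adjective.word = ?", (doc_id, noun, adj))
    return con


def roll_up(con, noun):
    """Slice on one noun and aggregate over documents: adjective -> count."""
    q = """SELECT a.word, COUNT(*) FROM fact f
           JOIN noun n ON n.noun_id = f.noun_id
           JOIN adjective a ON a.adj_id = f.adj_id
           WHERE n.word = ? GROUP BY a.word ORDER BY a.word"""
    return dict(con.execute(q, (noun,)).fetchall())


reviews = [(1, "battery", "long"), (1, "screen", "bright"),
           (2, "battery", "short"), (3, "battery", "long")]
con = build_review_cube(reviews)
print(roll_up(con, "battery"))  # {'long': 2, 'short': 1}
```

Keeping the words in dimension tables lets OLAP-style slicing and aggregation run as ordinary SQL joins and `GROUP BY` queries over the fact table.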

Adaptive Data Mining Model using Fuzzy Performance Measures (퍼지 성능 측정자를 이용한 적응 데이터 마이닝 모델)

  • Rhee, Hyun-Sook
    • The KIPS Transactions: Part B / v.13B no.5 s.108 / pp.541-546 / 2006
  • Data mining is the process of finding hidden patterns in a large data set. Cluster analysis is a popular data mining technique; it is a fundamental process of data analysis and has played an important role in solving many problems in pattern recognition and image processing. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental decision on the number of clusters in the data. This decision is related to the cluster validity problem, that is, how well a clustering has identified the structure present in the data. In this paper, we design an adaptive data mining model using fuzzy performance measures. The model discovers clusters through an unsupervised neural network model based on a fuzzy objective function and evaluates the clustering results with a fuzzy performance measure. We also present experimental results on newsgroup data, which show that the proposed model can be used as a document classifier.
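
The paper's model is an unsupervised neural network with a fuzzy objective function. As a stand-in, the sketch below uses the standard fuzzy c-means objective and Bezdek's partition coefficient as the fuzzy performance measure, picking the number of clusters with the highest coefficient. One-dimensional toy data and the evenly spaced seeding are simplifying assumptions.

```python
def fuzzy_c_means(xs, c, m=2.0, iters=50):
    """Fuzzy c-means on 1-D data; returns (centres, memberships u[i][j])."""
    s = sorted(xs)
    # Assumed seeding: initial centres spread evenly over the sorted data.
    centres = [s[round(j * (len(s) - 1) / max(c - 1, 1))] for j in range(c)]
    p = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for x in xs:
            d = [abs(x - v) for v in centres]
            if 0.0 in d:  # point sits on a centre: crisp membership there
                hit = d.index(0.0)
                u.append([1.0 if j == hit else 0.0 for j in range(c)])
            else:
                u.append([1.0 / sum((d[j] / d[k]) ** p for k in range(c))
                          for j in range(c)])
        # Centre step: average of the data weighted by u_ij ** m.
        centres = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            centres.append(sum(wi * x for wi, x in zip(w, xs)) / (sum(w) or 1e-12))
    return centres, u


def partition_coefficient(u):
    """Bezdek's partition coefficient (1/n) * sum u_ij**2; 1 means a crisp partition."""
    return sum(x * x for row in u for x in row) / len(u)


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
pc = {c: partition_coefficient(fuzzy_c_means(data, c)[1]) for c in (2, 3, 4)}
best_c = max(pc, key=pc.get)
print(best_c)  # 3
```

With three well-separated groups, c = 2 leaves the middle group split between two centres and c = 4 splits one group. Either way some memberships sit near 0.5, so the coefficient peaks at c = 3.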

Unsupervised Motion Learning for Abnormal Behavior Detection in Visual Surveillance (영상감시시스템에서 움직임의 비교사학습을 통한 비정상행동탐지)

  • Jeong, Ha-Wook;Chang, Hyung-Jin;Choi, Jin-Young
    • Journal of the Institute of Electronics Engineers of Korea SC / v.48 no.5 / pp.45-51 / 2011
  • In this paper, we propose an unsupervised learning method for effectively modeling motion trajectory patterns. In our approach, observations of an object along a trajectory are treated as words in a document for latent Dirichlet allocation (LDA), an algorithm used in natural language processing to cluster words by topic. This allows topics (e.g., go straight, turn left, turn right) to be clustered effectively in complex scenes such as crossroads. We then learn the patterns of word sequences in each cluster with the Baum-Welch algorithm, which estimates the unknown parameters of a hidden Markov model. Abnormality is evaluated with the forward algorithm by comparing the learned sequences with the input sequence. Experimental results show that modeling semantic regions is robust against noise in various scenes.
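
The scoring step can be illustrated with the standard scaled forward algorithm for a discrete HMM. It returns the log-likelihood of an observation sequence, and a sequence with very low likelihood under the learned model would be flagged as abnormal. The two-state model, the observation codes, and the use of a likelihood threshold are hypothetical. In the paper, the parameters would come from Baum-Welch training on each LDA cluster.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    = P(first state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][o]  = P(observing symbol o | state i)
    """
    n = len(start)
    log_lik = 0.0
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence cannot be produced by this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # normalise to avoid underflow
    return log_lik


# Hypothetical "go straight" model: state 0 = entering, state 1 = leaving;
# observation codes 0, 1, 2 stand for quantised positions along the road.
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
emit = [[0.8, 0.2, 0.0], [0.0, 0.2, 0.8]]

normal = forward_log_likelihood([0, 0, 1, 2, 2], start, trans, emit)
reverse = forward_log_likelihood([2, 2, 1, 0, 0], start, trans, emit)
print(math.exp(normal), reverse)
```

The forward-moving trajectory receives a finite likelihood. The reversed trajectory cannot be produced by this model and scores negative infinity, so any sensible threshold would mark it abnormal.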

Design and Implementation of Topic Map Generation System based Tag (태그 기반 토픽맵 생성 시스템의 설계 및 구현)

  • Lee, Si-Hwa;Lee, Man-Hyoung;Hwang, Dae-Hoon
    • Journal of Korea Multimedia Society / v.13 no.5 / pp.730-739 / 2010
  • Tagging is one of the core technologies of Web 2.0 and is widely applied to multimedia data such as blog documents, images, and videos. Contrary to the expectation that tags would be reused in information retrieval to maximize retrieval efficiency, however, retrieval results are often unsatisfactory because of the limitations of tags. In this paper, building on previous research on image retrieval through tag clustering, we design and implement a topic map generation system, which is a semantic knowledge system. Tag information in each cluster is automatically generated as topics of the topic map. The generated topics are given semantic relationships using WordNet, and they are also given occurrence information suitable for each topic pair, so that a topic map with a semantic knowledge system can be generated. As a result, the proposed topic map can support not only users' information retrieval needs through semantic navigation but also convenient and rich information services.

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation of intellectual property and infringement of intellectual property rights have been recognized as impediments to national industrial growth. To prevent the large losses caused by these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is very important for protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. It then compares other patent documents with this database to compute a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we found that patents similar to rejected patents have a strong possibility of rejection. We used U.S. patent documents on Bluetooth, solar battery, and display technologies as experimental data.
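
The pipeline described in the abstract can be sketched under several assumptions: whitespace tokenisation, cosine similarity between word-frequency vectors, and one-dimensional k-means with k = 2 on the similarity scores, with the midpoint between the two centres taken as the rejection threshold. The toy texts, the similarity measure, and the midpoint rule are assumptions, since the abstract does not specify them.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_1d(values, k=2, iters=100):
    """Plain k-means on scalars (k >= 2), seeded with evenly spaced sorted values."""
    s = sorted(values)
    centres = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if new == centres:
            break
        centres = new
    return sorted(centres)


rejected = ["bluetooth pairing method using radio link",
            "radio link pairing for bluetooth headset"]
# "Database" of word frequencies built from the rejected patents.
profile = Counter(w for text in rejected for w in text.split())

candidates = {
    "A": "bluetooth headset pairing over radio link",
    "B": "solar battery cell with thin film",
    "C": "pairing method for bluetooth radio",
    "D": "display panel with thin film transistor",
}
sims = {pid: cosine(Counter(text.split()), profile) for pid, text in candidates.items()}
low, high = kmeans_1d(list(sims.values()))
threshold = (low + high) / 2
likely_rejected = sorted(pid for pid, s in sims.items() if s >= threshold)
print(likely_rejected)  # ['A', 'C']
```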

The Expressive Characteristics of Fashion Installation in Henrik Vibskov Collection (헨릭 빕스코브 컬렉션에 나타난 패션 인스톨레이션의 표현 특성)

  • Ko, Hyunzin
    • Journal of the Korean Society of Costume / v.65 no.6 / pp.133-147 / 2015
  • The aim of this study is to review the creative fashion installations of Henrik Vibskov, a Danish designer, in order to contribute useful information for more innovative fashion presentation. As research methods, a literature review and case studies were performed, and his collections from 2004 F/W to 2016 S/S were analyzed. In a fashion installation, the designer places objects in meaningful spaces in order to convey a certain message, to create an integrated artwork, and to interact with spectators. Fashion installation has been used in fashion exhibitions as well as in the set design of fashion performances and fashion shows. The results were as follows. Henrik Vibskov's fashion installation has three features: 1) a conceptual thematic approach that communicates a twisted, metaphoric message through a poetic and interesting show title; 2) surrealistic scenography that plays with fragmentation of the human body, clusters of plastic and symbolic objects, innovative color transformations, and visual trickery between figures and the background; and 3) a setting for multisensory performance in which artistic objects and surroundings stimulate the five senses and invite spectators to interact. Henrik Vibskov's fashion installation can exist as an independent artwork rather than merely as a supporting piece for a fashion show. It has both artistic and fashion value, and it can be an effective form of fashion presentation for communicating his conceptual fashion themes.

Locating Text in Web Images Using Image Based Approaches (웹 이미지로부터 이미지기반 문자추출)

  • Chin, Seongah;Choo, Moonwon
    • Journal of Intelligence and Information Systems / v.8 no.1 / pp.27-39 / 2002
  • This paper presents a technique capable of locating and extracting text blocks in various Web images. Until now, this area has been ignored by researchers, even though such text may be meaningful to Internet users. The algorithms associated with the technique work without prior knowledge of text orientation, size, or font. Our text extraction algorithm applies edge detection followed by histogram analysis of the genuine characteristics of letters, defined by text-clustering regions, so that the text region is extracted independently of font style and size. A number of experiments showed acceptable results.
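
The edge-detection-plus-histogram step can be illustrated with a horizontal projection profile. The method marks strong horizontal intensity changes as edges, counts edges per row, and reports runs of rows whose count is high enough. The gradient test, both thresholds, and the toy image are assumptions, because the abstract does not specify the paper's edge detector or histogram features. A fuller system would also project columns within each band.

```python
def edge_map(img, thresh=50):
    """Binary horizontal-gradient edges: |I[y][x+1] - I[y][x]| > thresh."""
    return [[1 if abs(row[x + 1] - row[x]) > thresh else 0 for x in range(len(row) - 1)]
            for row in img]


def text_row_bands(img, thresh=50, min_edges=2):
    """Return (first_row, last_row) runs whose per-row edge count reaches min_edges."""
    counts = [sum(r) for r in edge_map(img, thresh)]  # row histogram of edges
    bands, start = [], None
    for y, c in enumerate(counts):
        if c >= min_edges and start is None:
            start = y
        elif c < min_edges and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(counts) - 1))
    return bands


# Toy 6x8 grayscale image: rows 2-3 contain dark vertical strokes on a light background.
img = [
    [200] * 8,
    [200] * 8,
    [200, 0, 200, 0, 200, 0, 200, 200],
    [200, 0, 200, 0, 200, 0, 200, 200],
    [200] * 8,
    [200] * 8,
]
print(text_row_bands(img))  # [(2, 3)]
```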

Generic Summarization Using Generic Important of Semantic Features (의미특징의 포괄적 중요도를 이용한 포괄적 문서 요약)

  • Park, Sun;Lee, Jong-Hoon
    • Journal of Advanced Navigation Technology / v.12 no.5 / pp.502-508 / 2008
  • With the growing use of the Internet and the tremendous amount of data it transfers, document summarization has become increasingly necessary. We propose a new method for automatic generic summarization that extracts sentences using the Non-negative Semantic Variable Matrix (NSVM) and the generic importance of semantic features obtained by Non-negative Matrix Factorization (NMF). The proposed method uses non-negativity constraints, which are closer to the human cognitive process. As a result, the proposed method selects more meaningful sentences for summarization than unsupervised methods based on Latent Semantic Analysis (LSA) or clustering. Experimental results show that the proposed method achieves better performance than other methods.
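
A minimal sketch of NMF-based generic summarization follows. It assumes a term-by-sentence frequency matrix and Lee and Seung's multiplicative updates for the Frobenius-norm objective. It also defines a hypothetical "generic importance" for each semantic feature as that feature's share of the total mass in the semantic variable matrix H. Sentences are scored by importance-weighted H values. The paper's exact definitions of NSVM and generic importance may differ.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def term_sentence_matrix(sentences):
    """Rows = vocabulary terms, columns = sentences, entries = term counts."""
    tokens = [s.lower().split() for s in sentences]
    vocab = sorted({t for toks in tokens for t in toks})
    return vocab, [[toks.count(t) for toks in tokens] for t in vocab]


def nmf(a, r, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimising ||A - WH||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(r)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(m)]
    return w, h


def summarize(sentences, r=2, k=1):
    """Pick k sentences (kept in original order) by importance-weighted H scores."""
    _, a = term_sentence_matrix(sentences)
    _, h = nmf(a, r)
    total = sum(map(sum, h)) or 1.0
    weight = [sum(row) / total for row in h]  # hypothetical generic importance
    scores = [sum(weight[f] * h[f][j] for f in range(r)) for j in range(len(sentences))]
    ranked = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)
    return [sentences[j] for j in sorted(ranked[:k])]


sentences = [
    "the battery lasts long",
    "battery life is long and the battery charges fast",
    "the screen is bright",
    "a bright screen with sharp colors",
]
summary = summarize(sentences, r=2, k=2)
print(summary)
```

The non-negativity of W and H is what lets each semantic feature be read as an additive "part" of the sentences, which is the property the abstract relates to human cognition.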
