• Title/Summary/Keyword: User Clustering

Query Expansion based on Word Sense Community (유사 단어 커뮤니티 기반의 질의 확장)

  • Kwak, Chang-Uk;Yoon, Hee-Geun;Park, Seong-Bae
    • Journal of KIISE / v.41 no.12 / pp.1058-1065 / 2014
  • To assist users who are executing a search, a query expansion method suggests keywords that are related to an input query. Recently, several studies have suggested keywords by finding domains with a clustering method applied to the retrieved documents. However, clustering is poorly suited to presenting diverse domains because the number of clusters must be fixed in advance. This paper proposes a method that suggests keywords by finding the various domains related to an input query with a community detection algorithm. The proposed method extracts words from the top 30 retrieved documents and builds communities from a word graph. Keywords representing each community are then derived and used for query expansion. To evaluate the proposed method, we compared our results with two baselines: searches performed by the Google search engine, and keyword recommendation using TF-IDF over the search results. The evaluation indicates that the proposed method outperforms the baselines with respect to diversity.
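The pipeline described in this abstract (word graph, communities, one representative keyword per community) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the community detection algorithm, so connected components of a thresholded co-occurrence graph are used here as a simple stand-in, and the function names, the `min_cooccur` threshold, and the toy documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs, query, min_cooccur=2):
    """Link two words when they co-occur in at least min_cooccur documents."""
    counts = defaultdict(int)
    for tokens in docs:
        # Drop the query term: it appears everywhere and would link every word.
        words = sorted(set(tokens) - {query})
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        if n >= min_cooccur:
            graph[a][b] = n
            graph[b][a] = n
    return graph


def find_communities(graph):
    """Stand-in for community detection: connected components of the graph."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(n for n in graph[word] if n not in component)
        seen |= component
        result.append(component)
    return result


def expand_query(query, docs, min_cooccur=2):
    """Suggest one expanded query per community (assumed selection rule)."""
    graph = build_word_graph(docs, query, min_cooccur)
    expansions = []
    for community in find_communities(graph):
        # Representative keyword: the word with the largest weighted degree.
        keyword = max(sorted(community), key=lambda w: sum(graph[w].values()))
        expansions.append(f"{query} {keyword}")
    return expansions


# Toy stand-in for tokenized top-ranked documents (hypothetical data).
docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "cat", "jungle", "prey"],
    ["jaguar", "cat", "prey"],
]
suggestions = expand_query("jaguar", docs)
```

Note that the number of communities is not a parameter here; it falls out of the graph structure, which is the property the abstract contrasts with fixed-size clustering.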

Design and Implementation of Load Balancing Method for Efficient Spatial Query Processing in Clustering Environment (클러스터링 환경에서 효율적인 공간 질의 처리를 위한 로드 밸런싱 기법의 설계 및 구현)

  • 김종훈;이찬구;정현민;정미영;배영호
    • Journal of Korea Multimedia Society / v.6 no.3 / pp.384-396 / 2003
  • The hybrid query processing method is used in Web GIS to prevent server overload caused by heavy user connections. In hybrid query processing, both the server and the client participate in spatial query processing. However, the method is limited in server scalability and cannot be a fundamental solution to server overload, so Web GIS needs to adopt web clustering techniques. In this paper, we propose a load-balancing method that uses the proximity of query regions. We create tile groups in which the tiles of each group are spatially close to one another and, taking the characteristics of spatial queries into account, forward each client request to the server that can achieve the highest buffer reuse rate. With our load-balancing method, the server buffer is optimized for traversing the spatial index tree and the buffer reuse rate increases, which reduces disk accesses and improves system performance.

Automatic e-mail Hierarchy Classification using Dynamic Category Hierarchy and Principal Component Analysis (PCA와 동적 분류체계를 사용한 자동 이메일 계층 분류)

  • Park, Sun
    • Journal of Advanced Navigation Technology / v.13 no.3 / pp.419-425 / 2009
  • The volume of incoming e-mail is increasing rapidly with the wide use of the Internet, so it is increasingly necessary to classify incoming e-mail efficiently and accurately. Current e-mail classification techniques focus on two-way classification, filtering spam from normal mail, based mainly on Bayesian and rule-based methods. Clustering has been used for multi-way classification of e-mail, but it suffers from low classification accuracy and produces no category labels. Classification methods, in turn, require the user to train the classifier and set the category labels. In this paper, we propose a novel multi-way e-mail hierarchy classification method that uses PCA for automatic category generation and a dynamic category hierarchy for high classification accuracy. It classifies a large volume of incoming e-mail automatically, efficiently, and accurately.

2D-THI: Two-Dimensional Type Hierarchy Index for XML Databases (2D-THI: XML 데이테베이스를 위한 이차원 타입상속 계층색인)

  • Lee Jong-Hak
    • Journal of Korea Multimedia Society / v.9 no.3 / pp.265-278 / 2006
  • This paper presents a two-dimensional type inheritance hierarchy index (2D-THI) for XML databases. XML Schema is one of the schema models for XML documents that supports type inheritance. Conventional indexing techniques for XML databases cannot support XML queries on type inheritance hierarchies. We construct a two-dimensional index structure using multidimensional file organizations to support type inheritance hierarchies in XML queries. This indexing technique addresses the problem of clustering index entries in a two-dimensional domain space, consisting of a key element domain and a type identifier domain, based on the user query pattern. The index enhances query performance by adjusting the degree of clustering between the two domains. For performance evaluation, we compared the proposed 2D-THI, through a cost model, with conventional class hierarchy indexing techniques from object-oriented databases such as CH-index and CG-tree. The evaluation verified that the proposed two-dimensional type inheritance indexing technique can efficiently support query processing in XML databases according to the query types.

Design and Implementation of Topic Map Generation System based Tag (태그 기반 토픽맵 생성 시스템의 설계 및 구현)

  • Lee, Si-Hwa;Lee, Man-Hyoung;Hwang, Dae-Hoon
    • Journal of Korea Multimedia Society / v.13 no.5 / pp.730-739 / 2010
  • One of the core technologies of Web 2.0 is tagging, which is widely applied to multimedia data such as blog documents, images, and videos. However, contrary to the expectation that tags would be reused in information retrieval and maximize retrieval efficiency, unacceptable retrieval results appear owing to the fundamental limitations of tags. In this paper, building on preceding research on image retrieval through tag clustering, we design and implement a topic map generation system, which is a semantic knowledge system. Tag information in each cluster is generated automatically as topics of the topic map. The generated topics are given semantic relationships using WordNet and are also given occurrence information suitable for each topic pair, so that a topic map with a semantic knowledge system can be generated. As a result, the topic map proposed in this paper can be used not only to meet users' information retrieval demands through semantic navigation but also to provide convenient and rich information services.

Anomaly Detection Analysis using Repository based on Inverted Index (역방향 인덱스 기반의 저장소를 이용한 이상 탐지 분석)

  • Park, Jumi;Cho, Weduke;Kim, Kangseok
    • Journal of KIISE / v.45 no.3 / pp.294-302 / 2018
  • With the emergence of new service industries driven by the development of information and communication technology, cyberspace risks such as personal information infringement and leakage of industrial secrets have diversified, and security has emerged as a critical issue. In this paper, we propose a behavior-based anomaly detection method that is suitable for real-time, large-volume data analysis. We show that the proposed detection method is superior to existing signature-based security countermeasures that rely on large-capacity user log data, with respect to in-company personal information abuse and internal information leakage. Because the proposed behavior-based anomaly detection method requires a technique for processing large amounts of data, Elasticsearch, a real-time search engine based on an inverted index, is used. In addition, statistics-based frequency analysis and preprocessing were performed for data analysis, and the DBSCAN algorithm, a density-based clustering method, was applied to classify abnormal data, with an example visualized for easy analysis. Unlike existing anomaly detection systems, the proposed behavior-based technique, which was developed from a statistical perspective, is promising because it enables anomaly detection analysis without the need to set a threshold value separately.
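As a rough illustration of how DBSCAN can separate abnormal records from dense groups of normal behavior, here is a minimal pure-Python sketch. It is not the authors' implementation: the two features, the sample values, and the `eps` and `min_pts` settings are all hypothetical, and a production system would more likely use an existing library implementation.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks candidate anomalies."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical per-user features, e.g. (logins per hour, records viewed per hour).
events = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),   # one usage pattern
    (8.0, 8.0), (8.1, 7.9), (7.9, 8.2), (8.2, 8.1),   # another usage pattern
    (20.0, 1.0),                                      # isolated record
]
labels = dbscan(events, eps=0.5, min_pts=3)
anomalies = [events[i] for i, label in enumerate(labels) if label == NOISE]
```

The isolated record ends up labeled as noise because no dense neighborhood contains it, so no per-feature cutoff has to be chosen, although `eps` and `min_pts` still have to be set.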

Implementation of a Layer-7 Web Clustering System on Linux with Performance Enhancements via Recognition of User Request Rate Variations (리눅스에서 레이어-7 웹 클러스터링 시스템의 구현 및 사용자 요청률 차이의 인식에 기반한 성능 개선)

  • Hong Il-gu;Noh Sam H.
    • Journal of KIISE:Information Networking / v.32 no.1 / pp.68-79 / 2005
  • The popularity of Web services is ever increasing. As the number of services and clients continues to increase, providing a system that scales with this increase is becoming more difficult. A costly and ineffective method is to buy a new, more powerful system every time the load becomes unbearable. A more cost-effective solution is to expand the system as the need arises. This is the approach taken in Web cluster systems. However, providing effective scalability in a Web cluster system is still an open issue. In this study, we implement a Web cluster system based on a Layer 7 switching technique on Linux. The implementation is based on a design proposed and implemented by Aron et al. on FreeBSD. Though the design is the same, the implementation presented in this paper is entirely new because of the vast differences between FreeBSD and Linux. We also propose the Dual Scheduling (DS) load distribution algorithm, which distributes requests to the system resources by observing variations in the request rate. We show through measurements on our implementation that the DS algorithm performs considerably better than previous algorithms.

A Real-time Service Recommendation System using Context Information in Pure P2P Environment (Pure P2P 환경에서 컨텍스트 정보를 이용한 실시간 서비스 추천 시스템)

  • Lee Se-Il;Lee Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.7 / pp.887-892 / 2005
  • In pure P2P environments, collaborative filtering must work with only a few service items, using real-time information and no accumulated data. However, when collaborative filtering relies on only a few locally collected service items, the quality of the recommended services becomes low. It is therefore necessary to study methods that improve recommendation quality by using users' context information. But because a great volume of user context information can be recognized in a moment, a scalability problem can arise, and there are limitations in supporting differentiated services according to fields and items. In this paper, we solve the scalability problem by clustering context information per service field and classifying it per user, using SOM. In addition, we recommend suitable services to users by measuring the context information of users who fall into a classification similar to that of the service requester and then applying collaborative filtering.

Design and Implementation of Spatial Characterization System using Density-Based Clustering (밀도 클러스터링을 이용한 공간 특성화 시스템 설계 및 구현)

  • You Jae-Hyun;Park Tae-Su;Ahn Chan-Min;Park Sang-Ho;Hong Jun-Sik;Lee Ju-Hong
    • Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.43-52 / 2006
  • Recently, with increasing interest in ubiquitous computing, knowledge discovery methods are needed that consider the efficiency and effectiveness of handling wide-ranging and varied forms of data. Spatial characterization, which extends earlier characterization methods by considering both spatial and non-spatial properties, makes it possible to find various forms of knowledge in a spatial region. Previous spatial characterization methods have the following problems. First, the discovered knowledge cannot support multiple spatial analyses. Second, useful knowledge cannot be reliably found because the search is limited to the spatial region specified by the user. This study therefore proposes a spatial characterization method that applies density-based clustering.

Highlight based Lyrics Search Considering the Characteristics of Query (사용자 질의어 특징을 반영한 하이라이트 기반 노래 가사 검색)

  • Kim, Kweon Yang
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.4 / pp.301-307 / 2016
  • This paper proposes a lyric search method that considers the characteristics of user queries. Based on the observation that queries for lyric search are derived from the highlight parts of the music, this paper uses hierarchical agglomerative clustering to find the highlight and proposes a Gaussian weighting that considers the neighborhood of the highlight as well as the highlight itself. By setting the mean of the Gaussian weighting at the highlight, the weighting function assigns higher weights near the highlight and lower weights far from it. This paper then constructs an index of lyrics with the Gaussian weighting. Experimental results on a data set obtained from 5 real users show that the proposed method is effective.
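The Gaussian weighting described in this abstract can be illustrated with a short sketch. It assumes, hypothetically, that lyrics are indexed line by line and that the highlight is identified by a line index; the function name and the `sigma` spread parameter are illustrative, and the abstract does not say how the spread is chosen.

```python
import math


def gaussian_weights(num_lines, highlight_index, sigma=2.0):
    """Weight each lyric line by its distance from the highlight line.

    The Gaussian is centered (mean) at the highlight, so the weight is 1.0
    there and decays symmetrically for lines farther away.
    """
    return [
        math.exp(-((i - highlight_index) ** 2) / (2.0 * sigma ** 2))
        for i in range(num_lines)
    ]


# Seven lyric lines with the highlight at line index 3 (hypothetical values).
weights = gaussian_weights(num_lines=7, highlight_index=3, sigma=1.5)
```

A term's index weight for a song could then be scaled by the weight of the line it appears in, so that query matches near the highlight rank higher than matches far from it.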