• Title/Summary/Keyword: Query Index


Indexing and Retrieval Mechanism using Variation Patterns of Theme Melodies in Content-based Music Information Retrievals (내용 기반 음악 정보 검색에서 주제 선율의 변화 패턴을 이용한 색인 및 검색 기법)

  • 구경이;신창환;김유성
    • Journal of KIISE:Databases / v.30 no.5 / pp.507-520 / 2003
  • This paper proposes a method for automatically constructing a theme melody index for a large music database, together with an associative content-based music retrieval mechanism that uses this index to shorten user response time. First, the system automatically extracts the theme melody from a music file using a graphical clustering algorithm based on the similarities between the motifs of the music. To place an extracted theme melody in the metric space of an M-tree, the average length variation and the average pitch variation of the theme melody are chosen as the major features. In addition, a pitch signature and a length signature, which summarize the pitch variation pattern and the length variation pattern of a theme melody, respectively, are added to increase the precision of retrieval results. The retrieval mechanism uses the k-nearest neighbor search and range search algorithms of the M-tree to select melodies similar to the user's query melody from the theme melody index. To improve user satisfaction, it also includes ranking and relevance feedback functions. The proposed mechanisms were implemented as the essential components of a content-based music retrieval system to verify their usefulness.
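
The two major features named in the abstract above can be illustrated with a minimal sketch. The abstract does not define them precisely, so this example assumes that "average pitch variation" and "average length variation" mean the mean absolute change between consecutive notes, and that melodies are compared by Euclidean distance in that two-dimensional feature space. The function names and the `(midi_pitch, duration)` note format are hypothetical.

```python
import math

def melody_features(notes):
    """Return (avg pitch variation, avg length variation) for a melody.

    notes: list of (midi_pitch, duration_in_beats) tuples (assumed format).
    Variation is taken as the mean absolute change between consecutive
    notes, which is an assumption, not the paper's stated definition.
    """
    if len(notes) < 2:
        return (0.0, 0.0)
    steps = len(notes) - 1
    pitch_var = sum(abs(b[0] - a[0]) for a, b in zip(notes, notes[1:])) / steps
    length_var = sum(abs(b[1] - a[1]) for a, b in zip(notes, notes[1:])) / steps
    return (pitch_var, length_var)

def feature_distance(f1, f2):
    """Euclidean distance between two feature points (a valid metric)."""
    return math.dist(f1, f2)

theme = [(60, 1.0), (62, 1.0), (64, 2.0)]
query = [(60, 1.0), (63, 1.0), (64, 1.0)]
print(melody_features(theme), melody_features(query))
print(feature_distance(melody_features(theme), melody_features(query)))
```

Any distance that satisfies the metric axioms could stand in for `feature_distance`; a metric index such as an M-tree relies on the triangle inequality to prune subtrees during range and nearest-neighbor searches.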

Content-Based Video Search Using Eigen Component Analysis and Intensity Component Flow (고유성분 분석과 휘도성분 흐름 특성을 이용한 내용기반 비디오 검색)

  • 전대홍;강대성
    • Journal of the Institute of Convergence Signal Processing / v.3 no.3 / pp.47-53 / 2002
  • This paper proposes a content-based video search method that uses the eigenvalues of key frames and their intensity components. The video stream is divided into shots, a key frame representing each shot is extracted, and the intensity distribution of the shot is obtained from the database generated using ECA (Eigen Component Analysis). The generated codebook, the index value of each key frame, and the intensity values are stored in the database. Given a query image, the video stream containing the most similar frame is found by applying the Euclidean distance measure to the codewords in the codebook. The experimental results showed that the proposed algorithm gives better search results than the other methods compared, since it makes use of eigenvalue and intensity elements, and that it reduces the processing time.


Cloaking Method supporting K-anonymity and L-diversity for Privacy Protection in Location-Based Services (위치기반 서비스에서 개인 정보 보호를 위한 K-anonymity 및 L-diversity를 지원하는 Cloaking 기법)

  • Kim, Ji-Hee;Lee, Ah-Reum;Kim, Yong-Ki;Um, Jung-Ho;Chang, Jae-Woo
    • Journal of Korea Spatial Information System Society / v.10 no.4 / pp.1-10 / 2008
  • In the wireless internet, the location information of the user is one of the important resources for many applications. One such application is Location-Based Services (LBSs), which are becoming popular. Because users in an LBS system issue location-based queries to LBS servers by sending their exact location, their location information can be misused by adversaries. A mechanism for protecting the privacy of users is therefore needed. In this paper, we propose a cloaking method that considers both K-anonymity and L-diversity. Our cloaking method creates a minimum cloaking region by first finding L buildings (L-diversity) and then finding K users (K-anonymity). To support this, we use an R*-tree based index structure and filtering methods designed for the minimum cloaking region. Finally, we show through a performance analysis that our method outperforms the existing grid-based cloaking method.


An Algorithm for generating Cloaking Region Using Grids for Privacy Protection in Location-Based Services (위치기반 서비스에서 개인 정보 보호를 위한 그리드를 이용한 Cloaking 영역 생성 알고리즘)

  • Um, Jung-Ho;Kim, Ji-Hee;Chang, Jae-Woo
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.151-161 / 2009
  • In Location-Based Services (LBSs), users issuing a location-based query send their exact location to a database server, so their location information can be misused by adversaries. A privacy protection method is therefore required for using LBSs safely. In this paper, we propose a new cloaking region generation algorithm using grids for privacy protection in LBSs. The proposed algorithm creates a minimum cloaking region by finding L buildings and then performs K-anonymity to search for K users. For this, we make use of not only a grid-based index structure but also efficient pruning techniques. Finally, we show through a performance analysis that our cloaking region generation algorithm outperforms the existing algorithm in terms of the size of the cloaking region.


Efficient Processing method of OLAP Range-Sum Queries in a dynamic warehouse environment (다이나믹 데이터 웨어하우스 환경에서 OLAP 영역-합 질의의 효율적인 처리 방법)

  • Chun, Seok-Ju;Lee, Ju-Hong
    • The KIPS Transactions:PartD / v.10D no.3 / pp.427-438 / 2003
  • In a data warehouse, users typically search for trends, patterns, or unusual data behaviors by issuing queries interactively. The OLAP range-sum query is widely used for finding trends and discovering relationships among attributes in the data warehouse. In recent enterprise environments, the data elements in a data cube change frequently. The problem is that the cost of updating a prefix sum cube is very high. In this paper, we propose a novel algorithm that reduces the update cost significantly by means of an index structure called the Δ-tree. We also propose a hybrid method that provides either approximate or precise results to reduce the overall cost of queries. This is highly beneficial for applications, such as decision support systems, that need quick approximate answers rather than time-consuming accurate ones. Extensive experiments show that our method performs very efficiently across diverse dimensionalities compared to other methods.
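
The update-cost problem mentioned in the abstract above can be made concrete with a small two-dimensional sketch of the plain prefix sum technique. This is not the Δ-tree, whose structure the abstract does not describe. It only shows why a prefix sum array answers a range-sum query with at most four lookups while a single cell update may touch every prefix cell. All function names are hypothetical.

```python
def build_prefix_sum(cube):
    """Return P where P[i][j] is the sum of cube[0..i][0..j] (inclusive)."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            P[i][j] = (cube[i][j]
                       + (P[i - 1][j] if i else 0)
                       + (P[i][j - 1] if j else 0)
                       - (P[i - 1][j - 1] if i and j else 0))
    return P

def range_sum(P, r1, c1, r2, c2):
    """Sum over the inclusive rectangle (r1, c1)-(r2, c2) via inclusion-exclusion."""
    total = P[r2][c2]
    if r1:
        total -= P[r1 - 1][c2]
    if c1:
        total -= P[r2][c1 - 1]
    if r1 and c1:
        total += P[r1 - 1][c1 - 1]
    return total

def update_cell(P, r, c, delta):
    """Add delta to cell (r, c); return how many prefix cells had to change."""
    touched = 0
    for i in range(r, len(P)):
        for j in range(c, len(P[0])):
            P[i][j] += delta
            touched += 1
    return touched

cube = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P = build_prefix_sum(cube)
print(range_sum(P, 1, 1, 2, 2))   # sum of 5, 6, 8, 9
print(update_cell(P, 0, 0, 10))   # number of prefix cells rewritten
```

Updating the cell at the origin rewrites the whole array, which is the worst case that update-efficient structures aim to avoid.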

GORank: Semantic Similarity Search for Gene Products using Gene Ontology (GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

  • Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.7 / pp.682-692 / 2006
  • Searching for gene products that have similar biological functions is crucial for bioinformatics. Modern biological databases provide functional descriptions of gene products using the Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search for gene products using GO annotation information. For this purpose, we define an information-theoretic measure of semantic similarity between gene products and propose an algorithm for semantic similarity search using this measure. We adapt Fagin's Threshold Algorithm to process the semantic similarity query as follows. First, we redefine the threshold for our measure, because our similarity function is not monotonic. Then we propose cluster-skipping and an access ordering of the inverted index lists to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.
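
For orientation, the classic form of Fagin's Threshold Algorithm that the abstract above adapts can be sketched as follows. This is the textbook version with a monotonic aggregate (a plain sum of non-negative scores), not GORank's redefined threshold for a non-monotonic measure, which the abstract does not spell out. The function name and the dictionary-based input format are assumptions made for illustration.

```python
import heapq

def threshold_topk(lists, k):
    """Classic Threshold Algorithm with sum as the monotonic aggregate.

    lists: one dict {object_id: non-negative score} per attribute.
    Objects missing from a list are treated as scoring 0 in that list.
    Returns up to k (object_id, total_score) pairs, best first.
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}
    max_depth = max(len(s) for s in sorted_lists)
    for depth in range(max_depth):
        last_scores = []
        for s in sorted_lists:
            if depth < len(s):
                obj, score = s[depth]
                last_scores.append(score)
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = sum(l.get(obj, 0.0) for l in lists)
            else:
                last_scores.append(0.0)  # exhausted list contributes nothing
        # No unseen object can score higher than this threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the top-k cannot change any more
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

lists = [{"a": 0.9, "b": 0.8, "c": 0.1}, {"a": 0.7, "b": 0.85, "c": 0.2}]
print(threshold_topk(lists, 1))
```

The early stop is only sound because the sum is monotonic: an object that is lower in every list cannot have a higher total. That is why a non-monotonic similarity function requires a different threshold.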

Integrated Information Retrieval with Metadata Interface for Heterogeneous Distributed XML Documents (메타정보 인터페이스를 이용한 이질 구조 분석 XML문서 통합 검색)

  • 류성준;황재문;김태훈;남영광
    • Journal of KIISE:Software and Applications / v.31 no.11 / pp.1505-1518 / 2004
  • We propose an extremely lightweight DDXMI approach for the semi-automated integration of structurally and semantically heterogeneous distributed XML documents. In the proposed prototype, a DDXMI (Distributed Documents XML Metadata Interface) is defined and a user interface generator is developed. The prototype takes the sources' DTDs as input and generates a friendly graphical user interface for application users. The user can easily describe the semantic mapping between the integrated virtual database DTD and the sources' DTDs by assigning index numbers and specifying associated function names, so that the DDXMI based on the mappings is generated automatically. Quilt is selected as the XML query language, which processes user queries according to the DDXMI. It is assumed that application users know what they want from the different sources; that is, they have their own integrated database schema in mind and know the semantics of the XML databases involved. A small global DTD and a mid-size global DTD are generated to verify the query generation and retrieval results with three XML document databases: Master's/Ph.D. theses, research reports, and journals. The system has been developed with JavaCC and Java Servlet.

Question Analysis and Expansion based on Semantics (의미 기반의 질의 분석 및 확장)

  • Shin, Seung-Eun;Park, Hee-Guen;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.7 no.7 / pp.50-59 / 2007
  • This paper describes semantics-based question analysis and expansion for efficient information retrieval. The results of information retrieval systems include many non-relevant documents because the index cannot naturally reflect the contents of documents and because the queries used in information retrieval systems cannot represent enough of the information in the user's question. To solve this problem, we analyze the user's question semantically, determine the answer type, and extract semantic features. We then expand the user's question using these features and the syntactic structures that are used to represent the answer. Our similarity measure ranks documents that contain the expanded queries in high positions. In particular, we found that efficient document retrieval is possible through semantics-based question analysis and expansion of natural language questions that are comparatively short but fully express the information demand of users.

Design of Lazy Classifier based on Fuzzy k-Nearest Neighbors and Reconstruction Error (퍼지 k-Nearest Neighbors 와 Reconstruction Error 기반 Lazy Classifier 설계)

  • Roh, Seok-Beom;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.1 / pp.101-108 / 2010
  • In this paper, we propose a new lazy classifier that combines a fuzzy k-nearest neighbors approach with feature selection based on reconstruction error. Reconstruction error is the performance index for locally linear reconstruction. When a new query point is given, the fuzzy k-nearest neighbors approach defines the local area in which the local classifier is valid and assigns weighting values to the data patterns that fall within that area. After the local area has been defined and the weighting values assigned, feature selection is carried out to reduce the dimension of the feature space. Once features have been selected according to the reconstruction error, the local classifier, which is a kind of polynomial, is built using weighted least squares estimation. In addition, the experiments include a comparative analysis against several commonly used methods such as standard neural networks, support vector machines, linear discriminant analysis, and C4.5 trees.
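
The neighbor-weighting step described in the abstract above might look roughly like the following sketch. The abstract does not give the weighting formula, so this example assumes the common inverse-distance weighting used in fuzzy k-nearest neighbors with crisp training labels and fuzzifier `m`. It does not cover the feature selection or the weighted least squares polynomial classifier. The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict

def fuzzy_knn_memberships(train, query, k=3, m=2.0, eps=1e-12):
    """Return {label: membership} for a query point.

    train: list of (feature_tuple, label) pairs with crisp labels.
    The k nearest patterns define the local area; each gets the weight
    1 / d^(2/(m-1)), an assumed (standard) fuzzy k-NN weighting.
    """
    # Local area: the k training patterns closest to the query.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = defaultdict(float)
    total = 0.0
    for features, label in neighbors:
        d = math.dist(features, query)
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)  # eps guards against d == 0
        weights[label] += w
        total += w
    # Normalize so the memberships over the labels sum to 1.
    return {label: w / total for label, w in weights.items()}

train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(fuzzy_knn_memberships(train, (0.5, 0.5), k=3))
```

In a lazy classifier nothing is fitted ahead of time: the neighborhood and its weights are recomputed for every query point.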

Measuring Hadoop Optimality by Lorenz Curve (로렌츠 커브를 이용한 하둡 플랫폼의 최적화 지수)

  • Kim, Woo-Cheol;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.27 no.2 / pp.249-261 / 2014
  • Ever-increasing "Big data" can only be processed effectively by parallel computing. Parallel computing refers to a high-performance computational method that achieves effectiveness by dividing a big query into smaller subtasks and aggregating the results of the subtasks to produce an output. However, it is well known that parallel computing does not automatically achieve scalability, meaning performance that improves linearly as more computers are added, because it requires very careful assignment of tasks to each node and timely collection of results. Hadoop is one of the most successful platforms for attaining scalability. In this paper, we propose a measure of Hadoop optimization based on a Lorenz curve, which serves as a proxy for the inequality of hardware resource usage. Our proposed index takes into account the intrinsic overhead of Hadoop systems in terms of CPU, disk I/O, and network. It therefore also indicates whether a given Hadoop system can be improved and to what extent. The proposed method is illustrated with experimental data and substantiated by Monte Carlo simulations.
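
To illustrate how a Lorenz curve captures inequality in resource usage, the following sketch computes the curve and the standard Gini coefficient for a list of per-node load values. The Gini coefficient is used here only as a familiar summary of the Lorenz curve. It is not claimed to be the optimality index proposed in the paper, and the per-node numbers are made up for illustration.

```python
def lorenz_curve(values):
    """Return Lorenz curve points (cumulative share of nodes, cumulative share of load).

    values: non-negative per-node usage figures with a positive total.
    """
    xs = sorted(values)
    total = sum(xs)
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(xs, start=1):
        running += v
        points.append((i / n, running / total))
    return points

def gini(values):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    pts = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return 1.0 - 2.0 * area

balanced = [10, 10, 10, 10]   # hypothetical CPU seconds per node
skewed = [0, 0, 0, 40]        # one node does all the work
print(gini(balanced), gini(skewed))
```

A perfectly balanced cluster lies on the diagonal of the Lorenz plot, and the further the curve sags below the diagonal, the more unevenly the load is spread across nodes.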