• Title/Summary/Keyword: Data Indexing

A Study on Automatic Indexing of Korean Texts based on Statistical Criteria (통계적기법에 의한 한글자동색인의 연구)

  • Woo, Dong-Chin
    • Journal of the Korean Society for Information Management / v.4 no.1 / pp.47-86 / 1987
  • The purpose of this study is to present an effective method for automatically indexing Korean texts based on statistical criteria. Titles and abstracts of 299 documents randomly selected from ETRI's DOCUMENT database are used as the experimental data. The data are divided into four word groups, and each group is analyzed and evaluated by applying three automatic indexing methods: Transition Phenomena of Word Occurrence, Inverse Document Frequency Weighting Technique, and Term Discrimination Weighting Technique.
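
Of the three methods named in this abstract, inverse document frequency (IDF) weighting is the simplest to illustrate. The following is a minimal sketch assuming the common textbook form idf = log(N / df); the exact formula used in the study is not given in the abstract, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return {term: idf}, using the textbook form idf = log(N / df).

    `documents` is a list of token lists. The study's exact weighting
    formula is not stated in the abstract, so this form is an assumption.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document, however often it occurs.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy collection of three tokenized documents.
docs = [
    ["index", "korean", "text"],
    ["index", "statistical", "weight"],
    ["index", "korean", "retrieval"],
]
idf = idf_weights(docs)
```

A term that occurs in every document ("index" here) gets weight 0 and is a poor index term, while rarer terms get higher weights.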

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.1886-1908 / 2020
  • With technical advances, the amount of big data is increasing day by day, to the point that traditional software tools struggle to handle it. Additionally, the presence of imbalanced data in big data is a major concern for researchers. To ensure effective management of big data and to deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, data clustering is done with the Sparse Fuzzy-c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and again performs data clustering with the Sparse FCM algorithm. Two-level query matching is performed to determine the requested data: the first level determines the cluster, and the second level accesses the requested data within it. The data are ranked using the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which is designed by combining Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21]. Here, the Parametric Enabled-Similarity Measure (PESM) is adapted for matching the similarities between two datasets. The proposed M-CWOA outperformed other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
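
The two-level query matching described in this abstract can be sketched as follows. This is only an illustration of the idea (pick a cluster first, then search inside it): plain Euclidean distance stands in for the paper's PESM, the clusters are given by hand rather than produced by Sparse FCM, and all names and data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as a stand-in similarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_level_match(query, clusters):
    """Return the record closest to `query`.

    `clusters` is a list of (centroid, records) pairs.
    Level 1 selects the cluster whose centroid is nearest to the query;
    level 2 searches only the records of that cluster.
    """
    _, members = min(clusters, key=lambda c: euclidean(query, c[0]))
    return min(members, key=lambda r: euclidean(query, r))


# Hypothetical clusters: (centroid, records).
clusters = [
    ((0.0, 0.0), [(0.1, 0.2), (-0.3, 0.1)]),
    ((5.0, 5.0), [(4.8, 5.1), (5.5, 4.9)]),
]
best = two_level_match((4.9, 5.0), clusters)
```

Only the records of one cluster are compared with the query at the second level, which is where the saving over a full scan comes from.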

A Distributed Indexing Scheme for Wireless Data Broadcasting of Health Information FHIR Resources (의료 정보 FHIR 리소스 무선 데이터 방송을 위한 분산 인덱싱 기법)

  • Im, Seokjin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.3 / pp.23-28 / 2017
  • FHIR, the next-generation standard for health information exchange, enables fast exchange of health information and the provision of various health services. In this paper, we propose an indexing scheme for FHIR resources that adapts them to wireless data broadcasting over a secure channel. The scheme keeps information about the users, so that they can download FHIR resources from the secure wireless broadcast channel, together with information about the resources. Using the proposed index, a massive number of users can download their desired FHIR resources in a short time and with less energy. Simulation studies show that the proposed indexing scheme outperforms other schemes for broadcasting FHIR resources.

THE EFFECT OF SOLDERING INDICES FOR THE DISTORTION OF SPLIT CAST (납착 인기재료가 분할 주조체의 변형에 미치는 영향)

  • Lee, Dong-Wook;Im, Jang-Seop;Jeong, Chang-Mo;Jeon, Yeong-Chan
    • The Journal of Korean Academy of Prosthodontics / v.38 no.1 / pp.26-37 / 2000
  • The purpose of this study was to investigate the effect of three resin indexing materials on the distortion of split casts during solder indexing and block fabrication. Each specimen had two cylinders and a connecting bar. The two cylinders, a reference cylinder and a test cylinder, were precisely machined and placed on a metal base. A total of 30 specimens were divided into 3 groups according to the resin indexing material: Acrylic Solder®, G-C Pattern Resin®, and Z-100®. The relative coordinates (X, Y, Z) of the centroids of both cylinders were measured using a 3-D coordinate measuring machine. The indexing distortion was obtained after application of the indexing material, the block distortion was obtained after fabrication of the soldering block, and the total distortion was the sum of the indexing distortion and the block distortion. The intercentroidal linear distortion $\left(\sqrt{X'^2+Y'^2+Z'^2}-\sqrt{X^2+Y^2+Z^2}\right)$ and the global distortion $\left(\sqrt{(X'-X)^2+(Y'-Y)^2+(Z'-Z)^2}\right)$ were calculated from the centroid coordinates at each measuring stage. The results of this study were as follows: 1. The intercentroidal distance between the split casts was reduced by indexing distortion and increased by block distortion. 2. The indexing global distortion between the split casts was smaller than the block global distortion. 3. The intercentroidal linear distortion and the global distortion showed no significant difference among the indexing materials.
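
The two distortion measures defined in this abstract translate directly into code. The sketch below assumes (X, Y, Z) are the relative centroid coordinates before a stage and (X', Y', Z') those after it; the function names and sample values are hypothetical and are not data from the study. It needs Python 3.8 or later for `math.dist` and the multi-argument `math.hypot`.

```python
import math


def linear_distortion(before, after):
    """Intercentroidal linear distortion: change in centroid distance.

    sqrt(X'^2 + Y'^2 + Z'^2) - sqrt(X^2 + Y^2 + Z^2).
    A negative value means the centroids moved closer together.
    """
    return math.hypot(*after) - math.hypot(*before)


def global_distortion(before, after):
    """Global distortion: displacement of the relative centroid position.

    sqrt((X'-X)^2 + (Y'-Y)^2 + (Z'-Z)^2); always non-negative.
    """
    return math.dist(before, after)


# Hypothetical coordinates chosen so the arithmetic is easy to check.
before = (3.0, 4.0, 0.0)   # distance 5
after = (3.0, 4.0, 12.0)   # distance 13
linear = linear_distortion(before, after)
total_shift = global_distortion(before, after)
```

The linear measure is signed, which is why the abstract can report the intercentroidal distance as "reduced" by one stage and "increased" by another, whereas the global measure only reports the size of the shift.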

A Sequential Indexing Method for Multidimensional Range Queries (다차원 범위 질의를 위한 순차 색인 기법)

  • Cha Guang-Ho
    • Journal of KIISE:Databases / v.32 no.3 / pp.254-262 / 2005
  • This paper presents a new sequential indexing method called segment-page indexing (SP-indexing) for multidimensional range queries. The design objectives of SP-indexing are twofold: (1) improving the range query performance of multidimensional indexing methods (MIMs) and (2) providing a compromise between optimal index clustering and the overhead of full index reorganization. Although more than ten years of database research has resulted in a great variety of MIMs, most efforts have focused on data-level clustering, and fewer attempts have been made to cluster indexes. As a result, most relevant index nodes are widely scattered on disk, and many random disk accesses are required during a search. SP-indexing avoids such scattering by storing the relevant nodes contiguously in a segment that contains a sequence of contiguous disk pages, and it improves performance by offering sequential access within a segment. Experimental results demonstrate that, in terms of total elapsed time, SP-indexing improves query performance by up to several times compared with traditional MIMs using small disk pages, and that it reduces the waste of disk bandwidth caused by the use of simple large pages.

Indexing Methods of Splitting XML Documents (XML 문서의 분할 인덱스 기법)

  • Kim, Jong-Myung;Jin, Min
    • Journal of Korea Multimedia Society / v.6 no.3 / pp.397-408 / 2003
  • Existing indexing mechanisms for XML data that use a numbering scheme have the drawback that the entire index structure must be rebuilt when insertions, deletions, or updates occur on the data. We propose a new indexing mechanism based on split blocks to cope with this problem. The XML data are split into blocks such that there is at most one relationship between any two blocks, and a numbering scheme is applied to each block. This mechanism reduces the overhead of rebuilding index structures when insertions, deletions, or updates occur on the data. We also propose two algorithms, the Parent-Child Block Merge Algorithm and the Ancestor-Descendent Algorithm, which retrieve the relationship between two entities in the XML hierarchy using this indexing mechanism. In addition, we propose a mechanism in which the identifier of a block carries information about its parent block to expedite retrieval of the ancestor-descendent relationship, together with versions of the two algorithms that use this mechanism.
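
To see why a numbering scheme makes relationship queries cheap but updates expensive, consider a generic interval (region) numbering. This is a common textbook scheme shown only for illustration; the abstract does not say which numbering scheme the paper applies inside each block, and the block splitting itself is not modeled here. All names are hypothetical.

```python
def number_tree(tree, root):
    """Assign a (start, end) interval to every node by depth-first traversal.

    `tree` maps a node name to the list of its children.
    """
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        intervals[node] = (start, counter)

    visit(root)
    return intervals


def is_ancestor(intervals, ancestor, descendant):
    """True if the ancestor's interval strictly contains the descendant's."""
    a_start, a_end = intervals[ancestor]
    d_start, d_end = intervals[descendant]
    return a_start < d_start and d_end < a_end


# Hypothetical four-element document.
tree = {"book": ["title", "chapter"], "chapter": ["section"]}
intervals = number_tree(tree, "book")
```

Inserting a new element under "title" would shift every number assigned after it, which is the rebuild cost the abstract refers to; numbering each block separately confines that renumbering to a single block.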

An Efficient Bitmap Indexing Method for Multimedia Data Reflecting the Characteristics of MPEG-7 Visual Descriptors (MPEG-7 시각 정보 기술자의 특성을 반영한 효율적인 멀티미디어 데이타 비트맵 인덱싱 방법)

  • Jeong Jinguk;Nang Jongho
    • Journal of KIISE:Computer Systems and Theory / v.32 no.1 / pp.9-20 / 2005
  • Recently, the MPEG-7 standard, a multimedia content description standard, has been widely used for content-based image/video retrieval systems. However, since the descriptors standardized in MPEG-7 are usually multidimensional and subject to the so-called 'curse of dimensionality', previously proposed indexing methods (for example, multidimensional indexing methods, dimensionality reduction methods, and filtering methods) cannot effectively index a multimedia database represented in MPEG-7. This paper proposes an efficient multimedia data indexing mechanism that reflects the characteristics of MPEG-7 visual descriptors. In the proposed mechanism, the descriptor is transformed into a histogram of some attributes. By representing the value of each bin as a binary number, the histogram itself, which is the visual descriptor of an object in the multimedia database, can be represented as a bit string. The bit strings of all objects in the database are collected to form an index file, the bitmap index. By XORing them with the descriptor of the query object, candidate solutions for a similarity search can be computed easily; the candidates are then checked against the query object to compute the similarity precisely with an exact metric such as the L1-norm. These indexing and searching mechanisms are efficient because the filtering is performed by simple bit operations and reduces the search space dramatically. In experiments with more than 100,000 real images, the proposed indexing and searching mechanisms were about 15 times faster than sequential searching, with more than 90% accuracy.
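
The filter-then-refine search described in this abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the bit width per bin, the bit-difference threshold, and the toy histograms are all assumptions.

```python
def to_bits(hist, bits_per_bin=2):
    """Pack quantized bin values into one integer used as a bit string.

    Each value must lie in [0, 2**bits_per_bin).
    """
    code = 0
    for value in hist:
        code = (code << bits_per_bin) | value
    return code


def l1(a, b):
    """Exact L1 distance between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search(query, database, max_diff_bits=2):
    """Filter by XOR bit difference, then rank survivors by exact L1."""
    q = to_bits(query)
    candidates = [
        h for h in database
        if bin(to_bits(h) ^ q).count("1") <= max_diff_bits
    ]
    return sorted(candidates, key=lambda h: l1(h, query))


# Hypothetical 4-bin histograms quantized to 2 bits per bin.
database = [(3, 0, 1, 2), (3, 1, 1, 2), (0, 3, 2, 0)]
results = search((3, 0, 1, 3), database)
```

The number of differing bits is only a rough proxy for numeric distance, so the XOR step may discard some true neighbors; that is consistent with the abstract reporting accuracy above 90% rather than exact results.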

A Separated Indexing Technique for Efficient Evaluation of Nested Queries (내포 질의의 효율적 평가를 위한 분리 색인 기법)

  • 권영무;박용진
    • Journal of the Korean Institute of Telematics and Electronics B / v.29B no.7 / pp.11-22 / 1992
  • In this paper, a new indexing technique is proposed for efficient evaluation of nested queries on an aggregation hierarchy in the object-oriented data model. As the index data structure, an extended B+-tree is introduced in which the instance identifiers to be searched and the path information used for updating index records are stored in leaf nodes and subleaf nodes, respectively. Retrieval and update algorithms for this index data structure are provided. Comparisons with current indexing techniques under a variety of conditions show improved performance in cost, i.e., the total number of pages accessed for retrieval and update.

Implementation of an Efficient Wavelet Based Audio Data Retrieval System (효율적인 웨이블렛 기반 오디오 데이터 검색 시스템 구현)

  • 이배호;조용춘;김광희
    • The Journal of the Acoustical Society of Korea / v.21 no.1 / pp.82-88 / 2002
  • In this paper, we propose an audio indexing method that uses the wavelet transform for audio data retrieval. Building an efficient index for audio data is difficult because of its particular properties, such as large storage requirements, real-time transfer, and wide bandwidth. An audio data index based on the wavelet transform makes indexing and retrieval possible by exploiting the particular properties of the wavelet transform. The proposed indexing method does not split the data into several blocks. Therefore, it uses both the high-pass and low-pass parts of the last-level wavelet coefficients. The index is built by applying a string matching algorithm to the high-pass part and a zero-crossing histogram to the low-pass part, and the results are converted into continuous strings. Using this method, we evaluate retrieval efficiency. Retrieval is performed by comparing the database index strings with the query string and selecting the data with the minimum value as the result. Our simulations determine a suitable comparison coefficient and show how retrieval efficiency changes with audio data length. The results show that the proposed method improves retrieval efficiency compared with the conventional method.

Data Aggregation Method using Shuffled Row Major Indexing on Wireless Mesh Sensor Network (무선 메쉬 센서 네트워크에서 셔플드 로우 메이져 인덱싱 기법을 활용한 데이터 수집 방법)

  • Moon, Chang-Joo;Choi, Mi-Young;Park, Jungkeun
    • Journal of Institute of Control, Robotics and Systems / v.22 no.11 / pp.984-990 / 2016
  • In wireless mesh sensor networks (WMSNs), sensor nodes are connected in the form of a mesh topology and transfer sensor data by multi-hop routing. A data aggregation method for WMSNs is required to minimize the number of routing hops and the energy consumption of each node with limited battery power. This paper presents a shortest path data aggregation method for WMSNs. The proposed method utilizes a simple hash function based on shuffled row major indexing for addressing sensor nodes. This allows sensor data to be aggregated without complex routing tables and calculation for deciding the next hop. The proposed data aggregation algorithms work in a fractal fashion with different mesh sizes. The method repeatedly performs gathering and moves sensor data to sink nodes in higher-level clusters. The proposed method was implemented and simulations were performed to confirm the accuracy of the proposed algorithms.
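
Shuffled row-major indexing, as it is usually defined for mesh-connected processors, interleaves the bits of a node's row and column numbers. The sketch below assumes that standard definition on a square mesh whose side is a power of two; the abstract does not give the paper's exact hash function, and the function name is hypothetical.

```python
def shuffled_row_major(row, col, bits):
    """Interleave the bits of `row` and `col` (row bit first).

    `bits` is the number of bits per coordinate, so the mesh side is
    2**bits. This follows the usual definition of shuffled row-major
    indexing and is an assumption about the paper's hash function.
    """
    index = 0
    for i in reversed(range(bits)):
        index = (index << 1) | ((row >> i) & 1)
        index = (index << 1) | ((col >> i) & 1)
    return index


# Addresses of all nodes in a 4x4 mesh (2 bits per coordinate).
grid = [[shuffled_row_major(r, c, 2) for c in range(4)] for r in range(4)]
for line in grid:
    print(line)
```

Dropping the two lowest bits of an address (`index >> 2`) gives the address of the 2x2 cluster containing the node, so the same rule applies again at each larger mesh size. This nesting is one way to read the abstract's statement that the algorithms work in a fractal fashion without routing tables.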