• Title/Summary/Keyword: Data Indexing

Search Result 489, Processing Time 0.036 seconds

F-Tree : Flash Memory based Indexing Scheme for Portable Information Devices (F-Tree : 휴대용 정보기기를 위한 플래시 메모리 기반 색인 기법)

  • Byun, Si-Woo
    • Journal of Information Technology Applications and Management
    • /
    • v.13 no.4
    • /
    • pp.257-271
    • /
    • 2006
  • Recently, flash memories are one of best media to support portable computer's storages in mobile computing environment. The features of non-volatility, low power consumption, and fast access time for read operations are sufficient grounds to support flash memory as major database storage components of portable computers. However, we need to improve traditional Indexing scheme such as B-Tree due to the relatively slow characteristics of flash operation as compared to RAM memory. In order to achieve this goal, we devise a new indexing scheme called F-Tree. F-Tree improves tree operation performance by compressing pointers and keys in tree nodes and rewriting the nodes without a slow erase operation in node insert/delete processes. Based on the results of the performance evaluation, we conclude that F-Tree indexing scheme outperforms the traditional indexing scheme.

  • PDF

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.779-788
    • /
    • 2002
  • There are many technically difficult issues in searching multimedia data such as image, video and audio because they are massive and more complex than simple text-based data. As a method of searching multimedia data, a similarity retrieval has been studied to retrieve automatically basic features of multimedia data and to make a search among data with retrieved features because exact match is not adaptable to a matrix of features of multimedia. In this paper, data clustering and its indexing are proposed as a speedy similarity-retrieval method of multimedia data. This approach clusters similar images on adjacent disk cylinders and then builds Indexes to access the clusters. To minimize the search cost, the hashing is adapted to index cluster. In addition, to reduce I/O time, the proposed searching takes just one I/O to look up the location of the cluster containing similar object and one sequential file I/O to read in this cluster. The proposed schema solves the problem of multi-dimension by using clustering and its indexing and has higher search efficiency than the content-based image retrieval that uses only clustering or indexing structure.

Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.150-160
    • /
    • 2011
  • Automatic text classification has a long history and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Even though much technical progress has been made in text classification, there is still room for improvement in text classification. In this paper, we will discuss remaining issues in improving text classification. In this paper, three improvement issues are presented including automatic training data generation, noisy data treatment and term weighting and indexing, and four actual studies and their empirical results for those issues are introduced. First, the semi-supervised learning technique is applied to text classification to efficiently create training data. For effective noisy data treatment, a noisy data reduction method and a robust text classifier from noisy data are developed as a solution. Finally, the term weighting and indexing technique is revised by reflecting the importance of sentences into term weight calculation using summarization techniques.

A PROPOSAL OF SEMI-AUTOMATIC INDEXING ALGORITHM FOR MULTI-MEDIA DATABASE WITH USERS' SENSIBILITY

  • Mitsuishi, Takashi;Sasaki, Jun;Funyu, Yutaka
    • Proceedings of the Korean Society for Emotion and Sensibility Conference
    • /
    • 2000.04a
    • /
    • pp.120-125
    • /
    • 2000
  • We propose a semi-automatic and dynamic indexing algorithm for multi-media database(e.g. movie files, audio files), which are difficult to create indexes expressing their emotional or abstract contents, according to user's sensitivity by using user's histories of access to database. In this algorithm, we simply categorize data at first, create a vector space of each user's interest(user model) from the history of which categories the data belong to, and create vector space of each data(title model) from the history of which users the data had been accessed from. By continuing the above method, we could create suitable indexes, which show emotional content of each data. In this paper, we define the recurrence formulas based on the proposed algorithm. We also show the effectiveness of the algorithm by simulation result.

  • PDF

Leveled Spatial Indexing Technique supporting Map Generalization (지도 일반화를 지원하는 계층화된 공간 색인 기법)

  • Lee, Ki-Jung;WhangBo, Taeg-Keun;Yang, Young-Kyu
    • Journal of Korea Spatial Information System Society
    • /
    • v.6 no.2 s.12
    • /
    • pp.15-22
    • /
    • 2004
  • Map services for cellular phone have problem for implementation, which are the limitation of a screen size. To effectively represent map data on screen of celluar phone, it need a process which translate a detailed map data into less detailed data using map generalization, and it should manipulate zoom in out quickly by leveling the generalized data. However, current spatial indexing methods supporting map generalization do not support all map generalization operations. In this paper, We propose a leveled spatial indexing method, LMG-tree, supporting map generalization and presents the results of performance evaluation.

  • PDF

Performance Analysis of Tree-based Indexing Scheme for Trajectories Processing of Moving Objects (이동객체의 궤적처리를 위한 트리기반 색인기법의 성능분석)

  • Shim, Choon-Bo;Shin, Yong-Won
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.7 no.4
    • /
    • pp.1-14
    • /
    • 2004
  • In this study, we propose Linktable based on extended TB-Tree(LTB-Tree) which can improve the performance of existing TB (Trajectory-Bundle)-tree proposed for indexing the trajectory of moving objects in GIS Applications. In addition, in order to evaluate proposed indexing scheme, we take into account as follows. At first, we select existing R*-tree, TB-tree, and LTB-tree as the subject of performance evaluation. Secondly, we make use of random data set and real data set as experimental data. Thirdly, we evaluate the performance with respect to the variation of size of memory buffer by considering the restriction of available memory of a given system. Fourth, we test them by using the experimental data set with a variation of data distribution. Finally, we think over insertion and retrieval performance of trajectory query and range query as experimental measures. The experimental results show that the proposed indexing scheme, LTB-tree, gains better performance than traditional other schemes with respect to the insertion and retrieval of trajectory query.

  • PDF

Design and Implementation of XML Indexing and Query Scheme Based on Database Concept Structure (데이터베이스의 개념구조에 기반한 XML 문서의 색인 및 질의 스키마의 설계 및 구현)

  • Choo Kyo-Nam;Woo Yo-Seob
    • The KIPS Transactions:PartD
    • /
    • v.13D no.3 s.106
    • /
    • pp.317-324
    • /
    • 2006
  • In this paper, we propose a new indexing technique to solve various queries which have a strong good point not only database indexing schema take advantage of converting from semi-structured data to structured data but also performance is more faster than before. We represent structure information of XML document between nodes of tree that additional numbering information which can be bit-stream without modified structure of XML tree. And, We add in indexing schema searching incidental structure information in the process. In Querying schema, we recover ancestor nodes through give information of node using indexing schema in complete path query expression as well as relative path query expression. Therefore, it takes advantage of making derivative query expression with given query. In this process, we recognize that indexing and querying schema can get searched result set faster and more accurate. Because response time is become shorter by bit operating, when query occur and it just needs information of record set earch node in database.

A Study on Automatic Indexing of Korean Texts based on Statistical Criteria (통계적기법에 의한 한글자동색인의 연구)

  • Woo, Dong-Chin
    • Journal of the Korean Society for information Management
    • /
    • v.4 no.1
    • /
    • pp.47-86
    • /
    • 1987
  • The purpose of this study is to present an effective automatic indexing method of Korean texts based on statistical criteria. Titles and abstracts of the 299 documents randomly selected from ETRI's DOCUMENT data base are used as the experimental data in this study the experimental data is divided into 4 word groups and these 4 word groups are respectively analyzed and evaluated by applying 3 automatic indexing methods including Transition Phenomena of Word Occurrence, Inverse Document Frequency Weighting Technique, and Term Discrimination Weighting Technique.

  • PDF

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.5
    • /
    • pp.1886-1908
    • /
    • 2020
  • With the technical advances, the amount of big data is increasing day-by-day such that the traditional software tools face a burden in handling them. Additionally, the presence of the imbalance data in big data is a massive concern to the research industry. In order to assure the effective management of big data and to deal with the imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In mappers, the data clustering is done based on the Sparse Fuzzy-c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mapper and again performs data clustering with the Sparse FCM algorithm. The two-level query matching is performed for determining the requested data. The first level query matching is performed for determining the cluster, and the second level query matching is done for accessing the requested data. The ranking of data is performed using the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which is designed by combining Monarch butterfly optimization (MBO) [22] and chaotic whale optimization algorithm (CWOA) [21]. Here, the Parametric Enabled-Similarity Measure (PESM) is adapted for matching the similarities between two datasets. The proposed M-CWOA outperformed other methods with maximal precision of 0.9237, recall of 0.9371, F1-score of 0.9223, respectively.

A Distributed Indexing Scheme for Wireless Data Broadcasting of Health Information FHIR Resources (의료 정보 FHIR 리소스 무선 데이터 방송을 위한 분산 인덱싱 기법)

  • Im, Seokjin
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.3
    • /
    • pp.23-28
    • /
    • 2017
  • FHIR, next-generation standard for health information exchange, allows to exchange health information fast and to provide various health services. In this paper, we propose an indexing scheme of FHIR resources for adopting the resources to wireless data broadcasting with a secure channel. That scheme keeps the information of users to support to download FHIR resources from the secure wireless broadcast channel and the information on the resources. Using the proposed index, massive users can download their desired FHIR resources with less energy in short time. With simulation studies, we show the proposed indexing scheme outperforms other scheme broadcasting FHIR resources.