• Title/Summary/Keyword: multidimensional text database (다차원 텍스트 데이터베이스)


Multi-Dimensional Keyword Search and Analysis of Hotel Review Data Using Multi-Dimensional Text Cubes (다차원 텍스트 큐브를 이용한 호텔 리뷰 데이터의 다차원 키워드 검색 및 분석)

  • Kim, Namsoo;Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.11 no.1 / pp.63-73 / 2014
  • With the advance of the WWW, unstructured data, including text, are attracting more and more user interest. Because such data are created by WWW users and express their subjective opinions, appropriate analysis can yield very useful information such as users' personal tastes or perspectives. In this paper, we provide various analyses of unstructured text documents efficiently by taking advantage of OLAP (On-Line Analytical Processing) multidimensional cube technology. OLAP cubes have been widely used for the multidimensional analysis of structured data such as simple alphabetic and numeric data, but they have not been used for unstructured data consisting of long texts. To provide multidimensional analysis of unstructured text data, the Text Cube model has recently been proposed. It incorporates term frequency and the inverted index, which play key roles in information retrieval, as measures for searching and analyzing text databases. The primary goal of this paper is to apply the text cube model to a real data set from an Internet site that shares hotel information and to provide multidimensional analysis of users' hotel reviews written as text. To achieve this goal, we first build text cubes for the hotel review data. Using the text cubes, we design and implement a system that provides multidimensional keyword search features for searching and analyzing review texts along various dimensions. This system can help users easily obtain valuable summary information reflecting guests' subjective views. Furthermore, this paper evaluates the proposed system through various experiments, which show the effectiveness of the system.
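
The text cube idea in the abstract above (cells over dimension values, with term frequency and an inverted index as measures) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sample reviews, the dimension names `city` and `stars`, and the helper names `build_text_cube` and `keyword_search` are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; the paper's real hotel-review data set is not reproduced here.
REVIEWS = [
    {"id": 1, "city": "Seoul", "stars": 5, "text": "clean room friendly staff"},
    {"id": 2, "city": "Seoul", "stars": 3, "text": "noisy room clean lobby"},
    {"id": 3, "city": "Busan", "stars": 5, "text": "friendly staff great view"},
]
DIMENSIONS = ("city", "stars")  # assumed dimensions


def build_text_cube(reviews, dims):
    """Materialize every cell; each cell stores term frequencies and an inverted index."""
    cube = defaultdict(lambda: {"tf": Counter(), "inv": defaultdict(set)})
    for review in reviews:
        terms = review["text"].lower().split()
        # Each dimension is either kept at its value or rolled up to "*" (all values).
        for mask in range(2 ** len(dims)):
            key = tuple(
                review[d] if (mask >> i) & 1 else "*" for i, d in enumerate(dims)
            )
            cell = cube[key]
            cell["tf"].update(terms)
            for term in terms:
                cell["inv"][term].add(review["id"])
    return cube


def keyword_search(cube, keyword, **fixed):
    """Return (term frequency, matching review ids) for one cell of the cube."""
    key = tuple(fixed.get(d, "*") for d in DIMENSIONS)
    cell = cube.get(key)
    if cell is None:
        return 0, []
    return cell["tf"][keyword], sorted(cell["inv"].get(keyword, ()))


cube = build_text_cube(REVIEWS, DIMENSIONS)
print(keyword_search(cube, "clean"))              # all reviews
print(keyword_search(cube, "friendly", stars=5))  # only 5-star reviews
```

Materializing every cell is affordable only for a toy example like this one; a real system would likely precompute only selected cuboids.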

Skyline Query Algorithm in the Categoric Data (범주형 데이터에 대한 스카이라인 질의 알고리즘)

  • Lee, Woo-Key;Choi, Jung-Ho;Song, Jong-Su
    • Journal of KIISE:Computing Practices and Letters / v.16 no.7 / pp.819-823 / 2010
  • The skyline query is an effective method for dealing with large, multi-dimensional data sets. By utilizing the concept of 'dominance', the skyline query can pinpoint the target data so that the dominated items, about 95% of the data, can be efficiently excluded as unnecessary. Most skyline query algorithms, however, have been developed for numerical data sets. This paper pioneers an entirely new domain, categorical data, and suggests corresponding ranking measures for skyline queries on it. In the experiments, the ACM Computing Classification System is used, and our methods are evaluated with respect to performance measures such as processing time and precision ratio.
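
The 'dominance' concept mentioned above can be sketched in a few lines of Python. This is the generic skyline computation on numeric tuples, assuming smaller values are better in every dimension; it is not the categorical-data algorithm proposed in the paper, and the sample `(price, distance)` points are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Simple nested-loop skyline: keep only points that no other point dominates."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current skyline point, so drop it
        # p survives; remove any current skyline points that p dominates
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


# Hypothetical (price, distance) pairs; lower is better for both.
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (70, 9), (55, 8)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

Here `(90, 6)` is dropped because `(60, 5)` is both cheaper and closer, while `(50, 8)`, `(80, 2)`, and `(60, 5)` each win on at least one dimension against every other point.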

Construction of the Digital Archive System from the Records of Westerners Who Stayed in Korea during the Enlightenment Period of Chosun (개화기 조선 체류 서양인 기록물의 디지털 아카이브 시스템 구축)

  • Chung, Heesun;Kim, Heesoon;Song, Hyun-Sook;Lee, Myeong-Hee
    • Journal of the Korean BIBLIA Society for library and Information Science / v.27 no.4 / pp.229-249 / 2016
  • This study was conducted to create a digital archive of local cultural content compiled from the records of Westerners who stayed in Korea during the Enlightenment Period of Chosun. The compiled information was gathered from 22 records, and 10 main subjects, 40 sub-subjects, and 239 mini-subjects were derived through the subject classification scheme. Item analysis was conducted using 38 metadata elements, and input data types were classified and stored as a database in Excel. Finally, a web-based digital archiving system was developed for searching and providing information through various access points. Suggestions for future research were to expand the archive's contents through continued discovery of Westerners' records, to build an integrated information system of Korean digital archives incorporating individual archive systems, to standardize classification schemes and develop a multidimensional classification system that considers facet structure in cultural heritage areas, to keep contents consistent through standardization of the metadata format, and to build an ontology using semantic search functions and data mining functions.