• Title/Summary/Keyword: large database

Search results: 1,454

An Effective Similarity Search Technique supporting Time Warping in Sequence Databases (시퀀스 데이타베이스에서 타임 워핑을 지원하는 효과적인 유사 검색 기법)

  • Kim, Sang-Wook;Park, Sang-Hyun
    • Journal of KIISE:Databases
    • /
    • v.28 no.4
    • /
    • pp.643-654
    • /
    • 2001
  • This paper discusses effective processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods fail to employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality; they have to scan the entire database and thus suffer from serious performance degradation on large databases. Another method that uses a suffix tree also shows poor performance due to the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance on large databases without false dismissal. To attain this goal, we devise a new distance function, $D_{tw-lb}$, which consistently underestimates the time warping distance and also satisfies the triangular inequality. $D_{tw-lb}$ uses a 4-tuple feature vector extracted from each sequence, is invariant to time warping, and is employed for efficient query processing. We prove that our method does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data. (A minimal sketch of the lower-bounding idea follows this entry.)

  • PDF
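
The abstract does not spell out the 4-tuple feature vector or the exact form of $D_{tw-lb}$. The sketch below is a minimal illustration under assumed choices: the features are the first, last, greatest, and smallest values of a sequence, and the lower bound is the maximum of their absolute differences. Both are assumptions for illustration, not the authors' definitions.

```python
import random

def dtw(q, s):
    """Classic O(len(q)*len(s)) time-warping distance with |x - y| as the base cost."""
    INF = float("inf")
    n, m = len(q), len(s)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - s[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def features(x):
    """Assumed 4-tuple feature vector: (first, last, greatest, smallest)."""
    return (x[0], x[-1], max(x), min(x))

def d_tw_lb(q, s):
    """Assumed lower bound: max of component-wise absolute feature differences.
    It satisfies the triangle inequality (an L-infinity distance on the features)."""
    fq, fs = features(q), features(s)
    return max(abs(a - b) for a, b in zip(fq, fs))

if __name__ == "__main__":
    random.seed(0)
    for _ in range(5):
        q = [random.gauss(0, 1) for _ in range(random.randint(20, 40))]
        s = [random.gauss(0, 1) for _ in range(random.randint(20, 40))]
        lb, d = d_tw_lb(q, s), dtw(q, s)
        assert lb <= d + 1e-9   # the bound filters candidates without false dismissal
        print(f"lower bound {lb:.3f} <= time-warping distance {d:.3f}")
```

Because the bound never exceeds the time-warping distance, any sequence whose bound already exceeds the query threshold can be discarded without computing the full distance, which is what makes index-based filtering without false dismissal possible.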

Retrieving Protein Domain Encoding DNA Sequences Automatically Through Database Cross-referencing

  • Choi, Yoon-Sup;Yang, Jae-Seong;Ryu, Sung-Ho;Kim, Sang-Uk
    • Bioinformatics and Biosystems
    • /
    • v.1 no.2
    • /
    • pp.95-98
    • /
    • 2006
  • Recent proteomic studies of protein domains require high-throughput and systematic approaches. Since most experiments using protein domains, the modules of protein-protein interactions, require gene cloning, the first experimental step is to retrieve the DNA sequences of domain-encoding regions from databases. For large-scale proteomic research, however, it is a laborious task to extract a large number of domain sequences manually from several inter-linked databases. We present a new methodology to retrieve DNA sequences of domain-encoding regions through automatic database cross-referencing. To extract protein domain-encoding regions, it traverses several inter-connected databases with a validation process. We applied this method to retrieve all the EGF domain-encoding DNA sequences of Homo sapiens. The algorithm was implemented using the Python library PAMIE, which enables automatic cross-referencing across distinct databases. (A minimal sketch of the extraction and validation step follows this entry.)

  • PDF
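
The cross-referencing itself is driven through web databases with PAMIE in the paper and is not reproduced here. The sketch below only illustrates the extraction and validation step on an assumed, simplified record: a protein entry carrying domain boundaries in amino-acid coordinates together with an already-fetched coding sequence (CDS). The record layout, field names, and the length/bounds check are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ProteinRecord:
    protein_id: str
    cds: str          # coding DNA sequence linked from a nucleotide database (assumed pre-fetched)
    domain_name: str
    aa_start: int     # domain start, 1-based amino-acid position
    aa_end: int       # domain end, 1-based amino-acid position (inclusive)

def domain_encoding_dna(rec: ProteinRecord) -> str:
    """Map amino-acid domain boundaries onto the CDS and cut out the encoding DNA."""
    nt_start = (rec.aa_start - 1) * 3          # 0-based nucleotide offset of the domain
    nt_end = rec.aa_end * 3                    # exclusive end offset
    if nt_end > len(rec.cds):
        raise ValueError(f"{rec.protein_id}: domain extends past the CDS; cross-reference is inconsistent")
    dna = rec.cds[nt_start:nt_end]
    # Simplified validation: the cut must encode exactly the domain's amino-acid span.
    expected_len = 3 * (rec.aa_end - rec.aa_start + 1)
    if len(dna) != expected_len:
        raise ValueError(f"{rec.protein_id}: expected {expected_len} nt, got {len(dna)}")
    return dna

if __name__ == "__main__":
    # Toy record standing in for data gathered by cross-referencing protein and nucleotide databases.
    rec = ProteinRecord("P_EXAMPLE", cds="ATG" + "GAA" * 40 + "TAA",
                        domain_name="EGF", aa_start=2, aa_end=11)
    print(domain_encoding_dna(rec))
```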

Classification of HTTP Automated Software Communication Behavior Using a NoSQL Database

  • Tran, Manh Cong;Nakamura, Yasuhiro
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.2
    • /
    • pp.94-99
    • /
    • 2016
  • Application layer attacks have for years posed an ever-serious threat to network security, since they always come after a technically legitimate connection has been established. In recent years, cyber criminals have turned to fully exploiting the web as a medium of communication to launch a variety of forbidden or illicit activities by spreading malicious automated software (auto-ware) such as adware, spyware, or bots. When this malicious auto-ware infects a network, it acts like a robot, mimics the normal behavior of web access, and bypasses the network firewall or intrusion detection system. Moreover, in a large private network, with huge Hypertext Transfer Protocol (HTTP) traffic generated each day, identifying and classifying the communication behavior of auto-ware is a challenge. In this paper, based on a previous analysis of auto-ware communication behavior and with the addition of new features, a method for classification of HTTP auto-ware communication is proposed. A Not Only Structured Query Language (NoSQL) database is applied to handle the large volumes of unstructured HTTP requests captured every day. The method is tested with real HTTP traffic data collected through a proxy server of a private network, providing good results in the classification and detection of suspicious auto-ware web access.
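
The abstract names a NoSQL store for large volumes of HTTP requests but gives no schema or queries. The sketch below assumes a local MongoDB instance reachable through pymongo; the document fields (client, host, timestamp) and the per-client, per-host request count used as a toy behavior feature are illustrative choices rather than the paper's feature set.

```python
from datetime import datetime, timedelta

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumes a MongoDB server running locally
http_requests = client["traffic"]["http_requests"]
http_requests.drop()

# Store captured HTTP requests as schema-less documents.
t0 = datetime(2016, 1, 1)
http_requests.insert_many(
    [{"client": "10.0.0.5", "host": "update.example.com", "method": "GET",
      "timestamp": t0 + timedelta(minutes=10 * i)}              # suspiciously regular interval
     for i in range(12)]
    + [{"client": "10.0.0.7", "host": "news.example.org", "method": "GET",
        "timestamp": t0 + timedelta(minutes=(7 * i * i) % 180)}  # irregular, human-like timing
       for i in range(8)]
)

# Toy behavior feature: request count per (client, host) pair, a starting point
# for separating automated software from normal browsing.
pipeline = [
    {"$group": {"_id": {"client": "$client", "host": "$host"}, "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
for row in http_requests.aggregate(pipeline):
    print(row["_id"]["client"], row["_id"]["host"], row["count"])
```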

A Data Generator for Database Benchmarks and its Performance Evaluation (데이터베이스 벤치마크를 위한 데이터 생성기와 성능 평가)

  • Ok, Eun-Taek;Jeong, Hoe-Jin;Lee, Sang-Ho
    • The KIPS Transactions: Part D
    • /
    • v.10D no.6
    • /
    • pp.907-916
    • /
    • 2003
  • Database benchmarks require efficient generation of large-scale data. This paper presents the system architecture, control flows, and characteristics of the data generator we have developed. The data generator features generation of large-scale data, column-by-column data generation, a variety of data distributions with verification, and real data generation. An extensive comparison with other data generators in terms of functionality is also presented. Finally, empirical performance experiments comparing RAID systems with a non-RAID one were conducted to alleviate the I/O bottleneck. The test results can serve as guidelines for configuring system architectures.
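
The abstract describes column-by-column generation from a set of data distributions without naming the distributions or the output format. The sketch below is an assumed, minimal version of that idea: each column is declared with its own value generator and rows are streamed to CSV; the column specification and distribution choices are purely illustrative.

```python
import csv
import itertools
import random
import sys

random.seed(42)

def uniform_col(low, high):
    while True:
        yield round(random.uniform(low, high), 2)

def normal_col(mean, stddev):
    while True:
        yield round(max(0.0, random.gauss(mean, stddev)), 2)

def categorical_col(values):
    while True:
        yield random.choice(values)

# Column-by-column specification: column name -> infinite generator of values.
COLUMNS = {
    "id": itertools.count(1),                             # sequential surrogate key
    "amount": uniform_col(1.0, 500.0),                    # uniform distribution
    "latency_ms": normal_col(20.0, 5.0),                  # normal distribution, clipped at 0
    "region": categorical_col(["KR", "US", "EU", "JP"]),  # categorical values
}

def generate(num_rows, out=sys.stdout):
    """Stream num_rows rows to CSV, drawing each column from its own distribution."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS.keys())
    for _ in range(num_rows):
        writer.writerow([next(gen) for gen in COLUMNS.values()])

if __name__ == "__main__":
    generate(10)   # scale num_rows up for benchmark-sized tables
```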

Design of Web Agents Module for Information Filtering Based on Rough Sets (러프셋에 기반한 정보필터링 웹에이전트 모듈 설계)

  • 김형수;이상부
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2004.05b
    • /
    • pp.552-556
    • /
    • 2004
  • This paper presents the design of adaptive information-filtering agents that retrieve useful information from a large-scale database. As information retrieval over the Internet becomes widespread, it is necessary to extract the information that satisfies the user's request conditions in order to reduce search time. The module is first designed with a rough-set reduct that generates a reduced, minimal knowledge database from a large-scale knowledge database while taking the user's natural-language query into account; it then performs soft computing through fuzzy composite processing to handle the uncertain values of the reduced schema domain. (A minimal sketch of the reduct step follows this entry.)

  • PDF
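
The abstract relies on a rough-set reduct to shrink the knowledge database but does not show the computation. The sketch below illustrates one standard way to obtain a (super)reduct: greedily dropping condition attributes as long as objects that agree on the remaining attributes still agree on the decision. The decision table and the greedy strategy are assumptions for illustration, and the fuzzy composite processing mentioned in the abstract is not covered.

```python
from collections import defaultdict

def is_consistent(rows, attrs, decision):
    """True if no two rows that agree on `attrs` have different `decision` values."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        seen[key].add(row[decision])
    return all(len(values) == 1 for values in seen.values())

def greedy_reduct(rows, condition_attrs, decision):
    """Drop attributes one by one while the table stays consistent; returns a superreduct."""
    reduct = list(condition_attrs)
    for attr in list(condition_attrs):
        trial = [a for a in reduct if a != attr]
        if trial and is_consistent(rows, trial, decision):
            reduct = trial
    return reduct

if __name__ == "__main__":
    # Toy decision table: which documents a user found relevant.
    table = [
        {"topic": "db", "length": "long", "lang": "en", "relevant": "yes"},
        {"topic": "db", "length": "short", "lang": "ko", "relevant": "yes"},
        {"topic": "ai", "length": "long", "lang": "en", "relevant": "no"},
        {"topic": "ai", "length": "short", "lang": "ko", "relevant": "no"},
    ]
    print(greedy_reduct(table, ["topic", "length", "lang"], "relevant"))  # ['topic']
```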

Association Rule Mining Scheme of Large-Scale Database for Socially Aware Computing (Socially aware computing을 위한 대규모 데이터베이스의 연관 규칙 감축 기법)

  • Jeong, Hwi-Woon;Park, Geon-Yong;Park, Jong-Chang;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2013.01a
    • /
    • pp.291-294
    • /
    • 2013
  • Association rule reduction is a critical issue in socially aware computing, which works with large-scale data. This paper proposes a new data reduction scheme in which the collected data are converted to binary form according to the criteria of each attribute, weights are assigned, and a logic-expression reduction method is then applied to derive rules whose reliability is guaranteed. Computer simulation shows better performance than existing schemes in terms of support, confidence, rule reduction ratio, and association-rule extraction time, which makes the proposed scheme well suited to socially aware computing, where large-scale data must be processed reliably within a short time. (A minimal sketch of deriving rules from binarized data follows this entry.)

  • PDF
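
The translated abstract outlines binarizing the collected data per attribute and then reducing the rule set while keeping only reliable rules, without giving formulas. The sketch below illustrates just the binarize-then-filter idea: records are turned into binary attribute flags and candidate rules are kept only if they meet assumed support and confidence thresholds. The thresholds and attribute criteria are invented, and the weighting and logic-expression reduction steps of the paper are not reproduced.

```python
from itertools import combinations

# Raw records binarized by per-attribute criteria (assumed criteria: age >= 30, purchases >= 5, ...).
raw = [
    {"age": 34, "purchases": 7, "premium": True},
    {"age": 28, "purchases": 6, "premium": True},
    {"age": 41, "purchases": 2, "premium": False},
    {"age": 36, "purchases": 9, "premium": True},
    {"age": 23, "purchases": 1, "premium": False},
]
transactions = [
    {name for name, flag in [("age>=30", r["age"] >= 30),
                             ("purchases>=5", r["purchases"] >= 5),
                             ("premium", r["premium"])] if flag}
    for r in raw
]

def support(itemset):
    """Fraction of binarized records containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.8   # assumed reliability thresholds
items = sorted(set().union(*transactions))

# Derive one-to-one rules A -> B and keep only those meeting the thresholds.
for a, b in combinations(items, 2):
    for lhs, rhs in ((a, b), (b, a)):
        sup = support({lhs, rhs})
        if sup >= MIN_SUPPORT and support({lhs}) > 0:
            conf = sup / support({lhs})
            if conf >= MIN_CONFIDENCE:
                print(f"{lhs} -> {rhs}  (support={sup:.2f}, confidence={conf:.2f})")
```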

Evaluating Join Performance on Relational Database Systems

  • Ordonez, Carlos;Garcia-Garcia, Javier
    • Journal of Computing Science and Engineering
    • /
    • v.4 no.4
    • /
    • pp.276-290
    • /
    • 2010
  • The join operator is fundamental in relational database systems. Evaluating join queries on large tables is challenging because records need to be efficiently matched based on a given key. In this work, we analyze join queries in SQL on large tables in which a foreign key may be null, invalid, or valid, given a referential integrity constraint. We conduct an extensive join performance evaluation on three DBMSs. Specifically, we study join queries varying table sizes, row size, and key probability distribution, inserting null, invalid, or valid foreign key values. We also benchmark three well-known query optimizations: view materialization, secondary indexes, and join reordering. Our experiments show that certain optimizations perform well across DBMSs, whereas others depend on the DBMS architecture.
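
As a concrete illustration of the kind of query studied above, the sketch below builds two small tables in an in-memory SQLite database, inserts valid, invalid, and null foreign key values (SQLite does not enforce referential integrity unless asked to, which is what makes the invalid keys possible), and times a LEFT OUTER JOIN before and after adding a secondary index. SQLite merely stands in for the three unnamed DBMSs, and the table shapes and sizes are arbitrary.

```python
import random
import sqlite3
import time

random.seed(1)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE dim (dim_id INTEGER PRIMARY KEY, label TEXT)")
cur.execute("CREATE TABLE fact (fact_id INTEGER PRIMARY KEY, dim_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO dim VALUES (?, ?)", [(i, f"label{i}") for i in range(1, 1001)])

def foreign_key():
    r = random.random()
    if r < 0.10:
        return None                          # null foreign key
    if r < 0.20:
        return random.randint(2000, 3000)    # invalid: no matching dim row
    return random.randint(1, 1000)           # valid reference

cur.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                [(i, foreign_key(), random.uniform(1, 100)) for i in range(100_000)])
con.commit()

QUERY = """SELECT d.label, SUM(f.amount)
           FROM fact f LEFT JOIN dim d ON f.dim_id = d.dim_id
           GROUP BY d.label"""

def timed(label):
    start = time.perf_counter()
    cur.execute(QUERY).fetchall()
    print(f"{label}: {time.perf_counter() - start:.4f} s")

timed("join without secondary index")
cur.execute("CREATE INDEX idx_fact_dim ON fact(dim_id)")
timed("join with secondary index on fact.dim_id")
```

Whether the secondary index actually pays off depends on the optimizer and the data, which is exactly the kind of cross-DBMS variation the paper measures.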

Concept Design for Measurement against Large Fire Spreading based on Building Database of a Folk Culture Village

  • Umegane, Takuji;Uchida, Daisuke;Mishima, Nobuo;Wakuya, Hiroshi;Okazaki, Yasuhisa;Hayashida, Yukuo;Kitagawa, Keiko;Park, Sun-gyu;Oh, Yong-sun
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2015.05a
    • /
    • pp.21-22
    • /
    • 2015
  • This study aims to develop a current building-condition database of an important folk culture village of South Korea, considering its fire-spread risk. We selected a folk culture village and conducted a field survey to determine the structure of its buildings, the materials of the building walls, and the roof styles, which reveal the village's current vulnerability to fire spread. As a result, we built a current building-condition database with a map, which showed that the village contains a mixture of reinforced-concrete and wooden buildings. In addition, we propose a conceptual idea for preventing large fire accidents in the village.

  • PDF

『Gogeumdoseojipseong』 and Medical Interchange between Korea and China ("고금도서집성(古今圖書集成)"과 한·중(韓·中) 의학교류(醫學交流))

  • Ahn, Sang-Woo
    • Korean Journal of Oriental Medicine
    • /
    • v.8 no.2 s.9
    • /
    • pp.1-16
    • /
    • 2002
  • 『Gogeumdoseojipseong』 was compiled by Chen Menglei (1651-1723). This large encyclopedia was published in 1725, during the Qing dynasty of China, in the reign of the Kangxi emperor. The medical part of this encyclopedia is commonly referred to by the title Yibuquanlu, but this is not correct. KIOM (Korea Institute of Oriental Medicine) investigated the compilation and publication of the original work and the process of its introduction to Chosun, together with its woodblock-printed edition, in order to build a database of this book. We also analyzed the structure of the book and assessed its historical significance for medical interchange between Korea and China. As a result, we found that a large part of this encyclopedia was quoted from 『Donguibogam』 of Chosun; the encyclopedia was then brought back to Chosun and contributed to medical books such as 『Imwonkyungjaeji』 and 『Uijongsonik』.

  • PDF

Development of the Design Methodology for Large-scale Data Warehouse based on MongoDB

  • Lee, Junho;Joo, Kyungsoo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.3
    • /
    • pp.49-54
    • /
    • 2018
  • A data warehouse is a system that collectively manages and integrates data of a company. And provides the basis for decision making for management strategy. Nowadays, analysis data volumes are reaching critical size challenging traditional data ware housing approaches. Current implemented solutions are mainly based on relational database that are no longer adapted to these data volume. NoSQL solutions allow us to consider new approaches for data warehousing, especially from the multidimensional data management point of view. In this paper, we extend the data warehouse design methodology based on relational database using star schema, and have developed a consistent design methodology from information requirement analysis to data warehouse construction for large scale data warehouse construction based on MongoDB, one of NoSQL.