• Title/Summary/Keyword: 중복도 (degree of duplication)


Indexing method with deduplication for efficient RDF data retrieving (효율적인 RDF 데이터 검색을 위한 중복 제거 색인 방법)

  • Jang, Hyeonggyu; Bang, Sungho; Oh, Sangyoon
    • Proceedings of the Korean Society of Computer Information Conference / 2020.01a / pp.61-62 / 2020
  • As the use of RDF has grown, many studies have examined how to store RDF data. When graph-structured RDF data is converted into tables, identical data is stored redundantly, causing unnecessary operations at retrieval time. To reduce redundant storage and unnecessary searches, this paper proposes a technique that builds subject (S) and object (O) indexes plus a separate index for the values duplicated between them; at query time, the duplicate values are checked so that only the necessary indexes are searched. Experiments confirm that the technique reduces unnecessary searches and thus lowers overall retrieval time.
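
A minimal sketch of the indexing idea described above (class and method names are my own, not the authors'): keep separate subject and object indexes plus a set of terms that occur as both, so a lookup touches only the indexes that can contain the term.

```python
# Hypothetical illustration of a dedup-aware S/O triple index.
from collections import defaultdict

class DedupTripleIndex:
    def __init__(self):
        self.s_index = defaultdict(list)   # subject -> triples
        self.o_index = defaultdict(list)   # object  -> triples
        self.dup = set()                   # terms seen as both S and O

    def add(self, s, p, o):
        self.s_index[s].append((s, p, o))
        self.o_index[o].append((s, p, o))
        if s in self.o_index:
            self.dup.add(s)
        if o in self.s_index:
            self.dup.add(o)

    def lookup(self, term):
        # If the term is not a duplicate value, only one index can
        # contain it, so the other index is skipped entirely.
        if term in self.dup:
            return self.s_index[term] + self.o_index[term]
        return self.s_index.get(term) or self.o_index.get(term, [])

idx = DedupTripleIndex()
idx.add("alice", "knows", "bob")
idx.add("bob", "knows", "carol")
print(len(idx.lookup("bob")))   # "bob" occurs as subject and object -> 2
```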

A Framework for Handling Duplicate Documents in a Blog Environment (블로그 환경에서의 중복문서 핸들링을 위한 프레임워크)

  • Lee, Soon-Haeng; Lee, Sang-Chul; Kim, Sang-Wook
    • Proceedings of the Korea Information Processing Society Conference / 2008.05a / pp.239-242 / 2008
  • Duplicate documents in a blog environment degrade the performance of blog search services. Unlike the conventional web-page environment, in a blog environment the creation time of a document is known, so the original document and its duplicates can be identified easily. Exploiting this property, this paper proposes an effective duplicate-document handling framework that determines whether a document is a duplicate at the moment it is stored, thereby fundamentally preventing duplicates from appearing in search results. A performance evaluation demonstrates the effectiveness of the proposed framework.

An Efficient Method for Detecting Duplicated Documents in a Blog Service System (블로그 서비스 시스템을 위한 효과적인 중복문서의 검출 기법)

  • Lee, Sang-Chul; Lee, Soon-Haeng; Kim, Sang-Wook
    • Journal of KIISE: Databases / v.37 no.1 / pp.50-55 / 2010
  • Duplicate documents in a blog service system are one of the causes that deteriorate both the quality and the performance of blog searches. Unlike the WWW environment, the creation of every document is reported to the blog service system, which makes it possible to identify the original document among its duplicates. Based on this observation, this paper proposes a novel method for detecting duplicate documents in a blog service system. The method determines whether a document is original at the time it is stored. As a result, it solves the problem of duplicate documents appearing in search results by keeping such documents out of the index of the blog search engine. This paper also proposes three indexing methods that preserve the accuracy of prior work, min-hashing. We identify the most effective indexing method via extensive experiments on real-life blog data.
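
Min-hashing, the prior work the paper builds on, can be sketched as follows (an illustration of the standard technique, not the authors' code): near-duplicate documents get similar signatures, so a post can be checked against stored signatures at write time, before it reaches the search index.

```python
# Standard min-hash signature over word shingles (illustrative parameters).
import hashlib

def shingles(text, k=3):
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(text, num_hashes=64):
    # One min value per seeded hash function; the signature approximates
    # the set of shingles in constant space.
    return [
        min(int(hashlib.sha1(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles(text))
        for seed in range(num_hashes)
    ]

def similarity(sig_a, sig_b):
    # Fraction of matching min-hash values estimates Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

original = "the quick brown fox jumps over the lazy dog today"
copy     = "the quick brown fox jumps over the lazy dog today again"
other    = "completely different text about databases and indexing methods"

sig = minhash_signature(original)
print(similarity(sig, minhash_signature(copy)) >
      similarity(sig, minhash_signature(other)))   # True
```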

A Study on the Performance Improvement with Subband Overlapping Variation for Overlapped Multicarrier DS-CDMA Systems (중복된 멀티캐리어 DS-CDMA 시스템의 서브밴드 중복율 변화에 따른 성능개선에 관한 연구)

  • O, Jeong-Heon; Park, Gwang-Cheol; Kim, Gi-Du
    • Journal of the Institute of Electronics Engineers of Korea TC / v.37 no.9 / pp.11-23 / 2000
  • Multicarrier DS-CDMA is an effective approach to realizing a wideband CDMA system in a multipath fading channel. In this paper, we propose a convolutionally coded overlapped multicarrier DS-CDMA system and analyze its performance as the subband overlap varies, in order to determine the overlapping percentage that gives the best performance. Given a total of M*R subcarriers, we show that the BER variation depends strongly on the rolloff factor $\beta$ of the raised-cosine chip wave-shaping filter, irrespective of the convolutional encoding rate 1/M and the repetition coding rate 1/R. We also analyze the possibility of reducing the total MUI by jointly varying the rolloff factor ($0 \le \beta \le 1$) and the subband overlapping factor ($0 \le A \le 2$), and show that the proposed system can outperform the multicarrier DS-CDMA systems of [1, 12].
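
The role of the rolloff factor can be seen from the textbook raised-cosine spectrum (standard formula; the connection to the paper's BER analysis is illustrative only): the occupied one-sided bandwidth (1 + beta) / (2T) grows with beta, which is why subband overlap depends so strongly on it.

```python
# Raised-cosine frequency response for chip shaping (symbol period T,
# rolloff factor beta); larger beta widens the band and increases
# overlap between neighboring subbands at a fixed subcarrier spacing.
import math

def raised_cosine_spectrum(f, T=1.0, beta=0.5):
    edge_lo = (1 - beta) / (2 * T)      # end of the flat region
    edge_hi = (1 + beta) / (2 * T)      # end of the rolloff region
    af = abs(f)
    if af <= edge_lo:
        return T
    if af <= edge_hi:
        return (T / 2) * (1 + math.cos(math.pi * T / beta * (af - edge_lo)))
    return 0.0

print(raised_cosine_spectrum(0.65, beta=0.2))      # 0.0: outside the band
print(raised_cosine_spectrum(0.65, beta=0.5) > 0)  # True: band has widened
```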

Redundancy Allocation in A Multi-Level Series System by Cuckoo Search (뻐꾸기 탐색 방법을 활용한 다계층 시스템의 중복 할당 최적화)

  • Chung, Il-Han
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.4 / pp.334-340 / 2017
  • Reliability is a particularly important design factor for systems in which a failure has critical consequences, such as trains, airplanes, and passenger ships. The reliability of a system can be improved in several ways, but for systems that demand very high reliability, adding redundant parts is an efficient approach. When parts are duplicated to improve reliability, the type of part and the number of duplicates must be determined subject to constraints on system reliability, part costs, and resources. This study examines the redundancy allocation of multi-level systems with a series structure. The paper defines the multi-level system and describes how to optimize the type of parts and the number of duplicates to maximize system reliability. The cuckoo search algorithm is applied to the optimization: the search procedure, the solution representation, and the generation of neighborhood solutions are proposed for the redundancy allocation of a multi-level system. Numerical experiments compare the cuckoo search algorithm with a genetic algorithm.
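
The objective being optimized is the standard series-system redundancy model (not taken from the paper): subsystem i with n_i parallel copies of a part of reliability r_i has reliability 1 - (1 - r_i)^n_i, and the series system is the product over subsystems. The search below is plain random hill climbing, a simple stand-in for the cuckoo search the paper applies.

```python
# Redundancy allocation for a series system under a cost budget
# (illustrative data; the search is NOT cuckoo search, just a stand-in).
import random

def system_reliability(r, n):
    prod = 1.0
    for ri, ni in zip(r, n):
        prod *= 1 - (1 - ri) ** ni      # parallel redundancy per subsystem
    return prod

def optimize(r, cost, budget, iters=2000, seed=0):
    rng = random.Random(seed)
    best = [1] * len(r)                 # start with one part per subsystem
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] = max(1, cand[i] + rng.choice([-1, 1]))
        if (sum(c * k for c, k in zip(cost, cand)) <= budget and
                system_reliability(r, cand) > system_reliability(r, best)):
            best = cand
    return best

r = [0.90, 0.95, 0.80]                  # per-part reliabilities (made up)
cost = [2, 3, 1]                        # per-part costs (made up)
alloc = optimize(r, cost, budget=15)
print(alloc, round(system_reliability(r, alloc), 4))
```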

An Efficient Data Migration/Replication Scheme in a Large Scale Multimedia Server (대규모 멀티미디어 서버에서 효율적인 데이터 이동/중복 기법)

  • Kim, Eun-Sam
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.37-44 / 2009
  • Recently, as the quality of multimedia data has increased, multimedia servers require larger storage capacity and higher I/O bandwidth. In these large-scale multimedia servers, the load imbalance among disks, caused by differences in access frequency to multimedia objects according to their popularity, significantly affects system performance. To address this problem, many data replication schemes have been proposed. In this paper, we propose a novel data migration/replication scheme that provides better storage efficiency and performance than dynamic data replication, the replication scheme typically employed in multimedia servers. The proposed scheme reduces the additional storage space required for replication, a major drawback of replication schemes, by decreasing the number of copies per object. It also increases the number of concurrent users by improving the caching effect, owing to the reduced intervals between requests for each object.

Primary Copy based Data Replication Scheme for Ensuring Data Consistency in Mobile Ad-hoc Networks (이동적응망에서 데이터 일관성 보장을 위한 주사본 기반 데이터 중복 기법)

  • Moon, Ae-Kyung
    • Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.334-336 / 2005
  • A mobile ad-hoc network (MANET) is a network of wireless nodes that requires no network infrastructure. This property raises the likelihood of network partitions, which lowers the data access rate of mobile nodes. To address this, mobile nodes hold replicas of the data. These replicas require a separate replica-management scheme to maintain data consistency. However, because the mobile nodes in a MANET generally have limited power and are prone to disconnection, guaranteeing the consistency of replicas is known to be a hard problem. Existing replica-management schemes for MANETs focus on raising the access rate by computing data access frequencies and, because of the difficulty of guaranteeing consistency for updated data, mostly consider read operations only. When update transactions are supported, most schemes forgo data consistency, citing high communication costs. Moreover, because a mobile node executes update operations through multiple servers, the communication overhead causes large power consumption. This paper proposes a data replication scheme that maintains data consistency by allowing updates only through the primary-copy node. Considering the energy characteristics of mobile nodes, the proposed scheme delegates update propagation and consistency maintenance to nodes with more energy, improving the energy efficiency of nodes with relatively little remaining energy.
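
The primary-copy rule can be sketched in a few lines (a toy illustration; node fields and names are mine, not the paper's): all updates go through a single primary replica, which serializes them and propagates the new value, so replicas never diverge. Electing the highest-energy node as primary follows the heuristic the abstract describes.

```python
# Toy primary-copy replication: one writer, serialized versions.
class Node:
    def __init__(self, name, energy):
        self.name, self.energy = name, energy
        self.value, self.version = None, 0

def elect_primary(nodes):
    # Heuristic from the abstract: the node with the most energy carries
    # the cost of update propagation and consistency maintenance.
    return max(nodes, key=lambda n: n.energy)

def update(primary, replicas, value):
    primary.version += 1
    primary.value = value
    for node in replicas:               # propagate from the single primary
        node.value, node.version = primary.value, primary.version

nodes = [Node("a", 30), Node("b", 80), Node("c", 55)]
primary = elect_primary(nodes)
update(primary, [n for n in nodes if n is not primary], "v1")
print(primary.name, all(n.value == "v1" for n in nodes))  # b True
```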

Isolation and Identification of Hyperparasites against Powdery Mildew Fungi in Korea (우리나라에서 흰가루병균(病菌)을 침해하는 중복기생균(重複寄生菌)의 분리(分離) 및 동정(同定))

  • Shin, Hyeon-Dong
    • The Korean Journal of Mycology / v.22 no.4 / pp.355-365 / 1994
  • An extensive survey was conducted on the occurrence of hyperparasites (HP) on powdery mildew species in Korea during the 1991-1994 seasons. In total, 1,070 samples infected with powdery mildew fungi were collected. Of these, 92 were infected with HP: 6 with unidentified HP and the remaining 86 with Ampelomyces quisqualis. This shows that infection of powdery mildew species with HP is a common phenomenon in nature and that A. quisqualis is the most common HP in Korea. To prove the hyperparasitism of A. quisqualis, 24 isolates from 32 collections made in 1992 were successfully cultured. All isolates tested were hyperparasitic to the cucumber powdery mildew, Sphaerotheca fusca.

Encrypted Data Deduplication Using Key Issuing Server (키 발급 서버를 이용한 암호데이터 중복제거 기술)

  • Kim, Hyun-il; Park, Cheolhee; Hong, Dowon; Seo, Changho
    • Journal of KIISE / v.43 no.2 / pp.143-151 / 2016
  • Data deduplication is an important technique for saving cloud storage. It is especially important for encrypted data, because deduplication over plaintext is inherently vulnerable with respect to data confidentiality. We examine encrypted-data deduplication with the aid of a key issuing server, comparing convergent encryption with a technique by Bellare et al. In addition, we implement the technique not only over Dropbox but also over an open cloud storage service, OpenStack Swift, and measure its performance on both. Our results verify that encrypted-data deduplication with the aid of a key issuing server is a feasible and versatile method.
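
Convergent encryption, the baseline compared above, can be shown in miniature (illustrative only; the paper's key-issuing-server variant additionally blinds the key derivation, and a real system would use AES rather than this toy keystream): the key is a hash of the plaintext, so identical files encrypt to identical ciphertexts and the storage server can deduplicate them without seeing the plaintext.

```python
# Toy convergent encryption: K = H(M), C = E_K(M), so duplicate
# plaintexts yield duplicate ciphertexts that the server can dedupe.
import hashlib

def derive_key(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()          # K = H(M)

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy counter-mode keystream from SHA-256 (stand-in for a real cipher).
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def encrypt(data: bytes):
    key = derive_key(data)
    return key, keystream_xor(key, data)

k1, c1 = encrypt(b"same file contents")
k2, c2 = encrypt(b"same file contents")
print(c1 == c2)          # True: duplicates are detectable ciphertext-side
print(keystream_xor(k1, c1) == b"same file contents")  # True: decrypts
```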

MetaSearch for Entry Page Finding Task (엔트리 페이지 검색을 위한 메타 검색)

  • Kang, In-Ho
    • The KIPS Transactions: Part B / v.12B no.2 s.98 / pp.215-222 / 2005
  • In this paper, a MetaSearch algorithm for navigational queries is presented. Previous MetaSearch algorithms focused on informational queries: they gave a high score to a document that overlapped across result lists. However, overemphasizing overlapped documents may degrade the performance of a MetaSearch algorithm on a navigational query. Instead, if many result documents come from a certain domain or directory, we can infer the importance of that domain or directory. Various experiments are conducted to show the effectiveness of the overlap of domain and directory names, using system results from TREC and from commercial search engines. In the experiments, document overlap performed better for informational queries, while the overlap of domain and directory names showed about $10\%$ higher performance for navigational queries.
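
The domain-overlap idea can be sketched as follows (the scoring is my own reading of the abstract, not the paper's exact formula): for a navigational query, count how often each domain appears across the merged result lists of several engines and rank domains by that overlap, rather than scoring overlapped documents.

```python
# Rank domains by cross-engine overlap (illustrative URLs and scoring).
from collections import Counter
from urllib.parse import urlparse

def domain_overlap_rank(result_lists):
    counts = Counter()
    for results in result_lists:        # one result list per search engine
        for url in results:
            counts[urlparse(url).netloc] += 1
    return [domain for domain, _ in counts.most_common()]

engine_a = ["http://trec.nist.gov/overview.html", "http://example.org/x"]
engine_b = ["http://trec.nist.gov/pubs.html", "http://other.net/y"]
print(domain_overlap_rank([engine_a, engine_b])[0])  # trec.nist.gov
```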