• Title/Summary/Keyword: Probabilistic Search

Search results: 98

Subgraph Searching Scheme Based on Path Queries in Distributed Environments (분산 환경에서 경로 질의 기반 서브 그래프 탐색 기법)

  • Kim, Minyoung;Choi, Dojin;Park, Jaeyeol;Kim, Yeondong;Lim, Jongtae;Bok, Kyoungsoo;Choi, Han Suk;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.19 no.1 / pp.141-151 / 2019
  • A network with a graph data structure is used in many applications to represent interactions between entities. As the networks to be processed grow with the development of big data technology, it becomes harder to handle them on a single server, so the need for distributed processing is increasing. In this paper, we propose a distributed processing system that stores graphs and performs subgraph searches efficiently. To reduce unnecessary searches, we use statistical information about the data to determine the search order through probabilistic scoring. Since the relationship between vertices and their degrees can show different characteristics depending on the type of data, the search order is determined by computing scores with scoring methods suited to graphs with different distribution characteristics, again to reduce unnecessary searches. The graph is then searched sequentially across the distributed servers in the determined order. To demonstrate the superiority of the proposed method, we compared its performance with an existing method; the search time improved by about 3~10%.
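
The scoring idea above, ordering query vertices by degree-based statistics so that the least probable vertices are matched first, can be sketched as follows. This is a minimal illustration under assumed per-label statistics; the function and variable names are hypothetical and not taken from the paper.

```python
from collections import Counter

def search_order(query_vertices, label_counts, avg_degree_by_label, total_vertices):
    """Order query vertices so the most selective (least probable) ones are
    matched first, reducing unnecessary exploration on the distributed servers."""
    def score(label):
        # Estimated probability that a random vertex carries this label,
        # weighted by how many neighbours matching it would force us to expand.
        frequency = label_counts.get(label, 0) / total_vertices
        return frequency * avg_degree_by_label.get(label, 1.0)
    return sorted(query_vertices, key=lambda v: score(v["label"]))

# Example: vertices with rarer labels and fewer neighbours are visited first.
stats = Counter({"person": 900, "city": 80, "company": 20})
avg_deg = {"person": 15.0, "city": 40.0, "company": 5.0}
query = [{"id": 0, "label": "person"}, {"id": 1, "label": "company"}, {"id": 2, "label": "city"}]
print([v["label"] for v in search_order(query, stats, avg_deg, total_vertices=1000)])
# -> ['company', 'city', 'person']
```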

Obscene Material Searching Method in WWW (WWW상에서 음란물 검색기법)

  • 노경택;김경우;이기영;김규호
    • Journal of the Korea Society of Computer and Information / v.4 no.2 / pp.1-7 / 1999
  • The World-Wide Web (WWW) extended the text-centered information exchange of earlier networks to the exchange of multimedia data. Because data are stored as hypertext, even a beginner can search for and access the data he or she wants. This ease of searching and accessing multimedia data on the WWW has helped obscene material become widespread and increasingly multimedia-based, and its commercialization has become a social problem; researchers have therefore actively studied ways to effectively block sites that provide obscene material. This paper presents and implements a blocking method that effectively searches for sites containing obscene material. The proposed model is based on link-based information retrieval and retrieved relevant documents more effectively than the probabilistic model, which is known to produce the most accurate results: the average recall and precision ratios improved by 12% and 8%, respectively. In particular, retrieval of relevant documents that contain non-text data and have few links improved greatly.
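
For reference, the recall and precision ratios reported above are the standard retrieval measures: $\text{recall} = |\text{relevant} \cap \text{retrieved}| / |\text{relevant}|$ and $\text{precision} = |\text{relevant} \cap \text{retrieved}| / |\text{retrieved}|$.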

Inversion of Acoustical Properties of Sedimentary Layers from Chirp Sonar Signals (Chirp 신호를 이용한 해저퇴적층의 음향학적 특성 역산)

  • 박철수;성우제
    • The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.32-41 / 1999
  • In this paper, an inversion method using chirp signals and two near-field receivers is proposed. The inversion problem is formulated as a probabilistic model composed of the signals, a forward model, and noise. The forward model used to simulate chirp signals is the source-wavelet-convolution plane-wave modeling method. The solution of the inversion problem is defined by the a posteriori pdf. A wavelet matching technique based on weighted least-squares fitting estimates the sediment sound speed and thickness, from which the ranges of the a priori uniform distribution are determined. A genetic algorithm is then applied as a global optimizer to find the maximum a posteriori solution over the determined a priori search space. Here the objective function is defined as the L₂ norm of the difference between the measured and modeled signals. The observed signals can be separated into two signals reflected from the upper and lower boundaries of a sediment layer; this separation and successive applications of the genetic algorithm optimization reduce the search space and therefore improve the inversion results. Not only the marginal pdf but also its statistics are calculated by numerical evaluation of integrals using the samples selected during the importance sampling process of the genetic algorithm. The examples presented here show that, for synthetic data with noise, inversion for sedimentary layers is possible using the proposed method.
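
The objective function mentioned above, an L₂ norm of the misfit between measured and modeled signals, can be sketched as below. The toy forward model stands in for the source-wavelet-convolution plane-wave model, and the Gaussian noise level sigma is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def misfit(model_params, observed, forward_model):
    """L2-norm objective: distance between measured and modeled signals."""
    return np.linalg.norm(observed - forward_model(model_params))

def posterior_weight(model_params, observed, forward_model, sigma=1.0):
    """Unnormalized a posteriori weight of a candidate model, as used when
    importance-sampling models during the genetic-algorithm search."""
    return np.exp(-misfit(model_params, observed, forward_model) ** 2 / (2 * sigma ** 2))

# Toy usage: a 'forward model' that scales and delays a reference wavelet.
wavelet = np.sin(np.linspace(0, 4 * np.pi, 200))
observed = 0.8 * np.roll(wavelet, 10)
fwd = lambda p: p[0] * np.roll(wavelet, int(p[1]))
print(misfit([0.8, 10], observed, fwd))   # ~0 at the true parameters
print(misfit([0.5, 0], observed, fwd))    # larger misfit elsewhere
```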

Improvements of pursuit performance using episodic parameter optimization in probabilistic games (에피소드 매개변수 최적화를 이용한 확률게임에서의 추적정책 성능 향상)

  • Kwak, Dong-Jun;Kim, H.-Jin
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.40 no.3 / pp.215-221 / 2012
  • In this paper, we introduce an optimization method to improve the pursuit performance of a pursuer in a pursuit-evasion game (PEG). Pursuers build a probability map and employ a hybrid pursuit policy, which combines the merits of local-max and global-max pursuit policies, to search for and capture evaders as quickly as possible in a 2-dimensional space. We propose an episodic parameter optimization (EPO) algorithm to learn good values for the weighting parameters of the hybrid pursuit policy. The EPO algorithm runs many episodes of the PEG repeatedly, accumulating the reward of each episode through reinforcement learning, and the candidate weighting parameter is selected so as to maximize the total averaged reward using the golden section search method. We found the best pursuit policy in various situations with different numbers of evaders and different sizes of space, and analyzed the results.
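
The golden section search step described above, selecting the weighting parameter that maximizes the averaged episode reward, is sketched below. The reward function is only a placeholder for running and averaging many PEG episodes.

```python
import math

def golden_section_maximize(f, lo, hi, tol=1e-3):
    """Golden section search for the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi ~ 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):      # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Placeholder for "average reward over many PEG episodes with weight w".
avg_reward = lambda w: -(w - 0.7) ** 2
print(round(golden_section_maximize(avg_reward, 0.0, 1.0), 3))  # ~0.7
```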

Ranked Web Service Retrieval by Keyword Search (키워드 질의를 이용한 순위화된 웹 서비스 검색 기법)

  • Lee, Kyong-Ha;Lee, Kyu-Chul;Kim, Kyong-Ok
    • The Journal of Society for e-Business Studies / v.13 no.2 / pp.213-223 / 2008
  • The efficient discovery of services from a large-scale collection of services has become an important issue [7, 24]. We studied a syntactic, rather than semantic, method for Web service discovery. We regarded service discovery as a retrieval problem over the proprietary XML formats of the service descriptions stored in a registry DB. We modeled services and queries as probabilistic values and devised similarity-based retrieval techniques. The benefits of our approach are as follows. First, our system supports ranked service retrieval by keyword search. Second, we consider both the UDDI data and the WSDL definitions of services at query evaluation time. Last, our technique can easily be implemented on an off-the-shelf DBMS and can exploit the maintenance features a DBMS provides.
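
A much-simplified sketch of keyword-based ranking over service descriptions is shown below, using a smoothed unigram model. This is not the authors' exact probabilistic model, and the service names and descriptions are made up for illustration.

```python
from collections import Counter

def rank_services(query_terms, services):
    """Rank service descriptions by a smoothed unigram query likelihood."""
    scored = []
    for name, text in services.items():
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        vocab = len(counts)
        score = 1.0
        for term in query_terms:
            # add-one smoothed probability of the term under this description
            score *= (counts[term] + 1) / (total + vocab)
        scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

services = {
    "WeatherService": "returns current weather forecast for a city",
    "StockQuoteService": "returns stock quote prices for a ticker symbol",
}
print(rank_services(["weather", "city"], services))
# -> ['WeatherService', 'StockQuoteService']
```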

Clustering of Web Objects with Similar Popularity Trends (유사한 인기도 추세를 갖는 웹 객체들의 클러스터링)

  • Loh, Woong-Kee
    • The KIPS Transactions: Part D / v.15D no.4 / pp.485-494 / 2008
  • Huge numbers of web items of various kinds, such as keywords, images, and web pages, are widely available on the Web. The popularities of such web items change continuously over time, and mining temporal patterns in the popularities of web items is an important problem useful for several web applications. For example, temporal patterns in the popularities of search keywords help web search enterprises predict future popular keywords, enabling them to set prices when marketing search keywords to advertisers. However, the presence of millions of web items makes it difficult to scale up previous techniques for this problem. This paper proposes an efficient method for mining temporal patterns in the popularities of web items. We treat the popularities of web items as time series and propose a gap measure to quantify the similarity between the popularities of two web items. To reduce the computational overhead of this measure, an efficient method using the Fast Fourier Transform (FFT) is presented. We do not assume that the popularities of web items follow any particular probability distribution or are periodic. To find clusters of web items with similar popularity trends, we propose to use a density-based clustering algorithm based on the gap measure. Our experiments using the popularity trends of search keywords obtained from the Google Trends web site illustrate the scalability and usefulness of the proposed approach in real-world applications.
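
In the spirit of the FFT-accelerated similarity described above (not the paper's exact gap measure), the sketch below compares two popularity series through their leading Fourier coefficients; distances of this kind could then feed a density-based clustering algorithm.

```python
import numpy as np

def fft_distance(x, y, k=8):
    """Approximate time-series distance using only the k leading Fourier
    coefficients of each (z-normalized) series, which is where the FFT
    saves computation for long popularity series."""
    def coeffs(s):
        s = (s - s.mean()) / (s.std() + 1e-12)   # remove scale/offset of popularity
        return np.fft.rfft(s)[:k]
    return np.linalg.norm(coeffs(x) - coeffs(y))

# Toy usage: two keywords with the same seasonal trend vs. an unrelated one.
t = np.arange(104)                               # e.g. weekly popularity over two years
a = 10 + 5 * np.sin(2 * np.pi * t / 52)
b = 20 + 9 * np.sin(2 * np.pi * t / 52)          # same trend, different scale
c = np.random.default_rng(0).random(104) * 10    # no shared trend
print(fft_distance(a, b) < fft_distance(a, c))   # True: a and b cluster together
```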

Development of a Probabilistic Model for the Estimation of Yearly Workable Wave Condition Period for Offshore Operations - Centering on the Sea off the Ulsan Harbor (해상작업 가능기간 산정을 위한 확률모형 개발 - 울산항 전면 해역을 중심으로)

  • Choi, Se Ho;Cho, Yong Jun
    • Journal of Korean Society of Coastal and Ocean Engineers / v.31 no.3 / pp.115-128 / 2019
  • In this study, a probabilistic model for estimating the yearly workable wave condition period for offshore operations is developed. To do so, we first hindcast the significant wave heights and peak periods off Ulsan every hour from 2003.1.1 to 2017.12.31 using SWAN and the meteorological data of the JMA (Japan Meteorological Agency) and NOAA (National Oceanic and Atmospheric Administration). We then derive the long-term significant wave height distribution from the simulated time series using a least-squares method. The simulated data agree more closely with the modified Glukhovskiy distribution than with the three-parameter Weibull distribution preferred in the literature. To develop a more comprehensive probabilistic model for estimating the yearly workable wave condition period, the wave height distribution over the 15 years, with the individual waves occurring within the unit simulation period (1 hour) fully taken into account, is also derived based on the Borgman convolution integral. The coefficients of the modified Glukhovskiy distribution are $A_p=15.92$, $H_p=4.374m$, ${\kappa}_p=1.824$, and the yearly workable wave condition period for offshore work is estimated to be 319 days when the threshold wave height for offshore work is $H_S=1.5m$. To validate the probabilistic model derived in this study, we also carry out a wave-by-wave analysis of the entire time series of numerically simulated significant wave heights over the 15 years and collect the durations during which the wave height surpasses the threshold height, reported to be $H_S=1.5m$ in field practice in South Korea. The average duration is 45.5 days over 2003 to 2017, which is very close to the 46 days obtained from the probabilistic model derived in this study.
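
The validation step above amounts to bookkeeping over an hourly $H_S$ series against the workability threshold. A minimal sketch with a synthetic series follows; only the 1.5 m threshold comes from the abstract, everything else is illustrative.

```python
import numpy as np

def yearly_workable_days(hs_hourly, threshold=1.5):
    """Average number of days per year with significant wave height below the
    operational threshold, given an hourly H_S series."""
    workable_hours = np.count_nonzero(hs_hourly < threshold)
    years = len(hs_hourly) / (24 * 365.25)
    return workable_hours / 24 / years

# Synthetic 15-year hourly series, for illustration only.
rng = np.random.default_rng(1)
hs = rng.gamma(shape=2.0, scale=0.6, size=int(24 * 365.25 * 15))
print(round(yearly_workable_days(hs), 1), "workable days per year")
```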

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central / v.4 no.4 / pp.14.1-14.6 / 2012
  • Splice site prediction in a DNA sequence is a basic search problem of finding exon/intron and intron/exon boundaries. Removing the introns and then joining the exons together forms the mRNA sequence, which is the input to the translation process; this is a necessary step in the central dogma of molecular biology. The main task of splice site prediction is to find candidate sequences ending in GT or AG and then to identify the true and false splice sites among those candidates. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). The basic differences between these works lie in the nucleotide encoding technique and the SVM kernel selection. Some methods encode the DNA sequence in a sparse way, whereas others encode it in a probabilistic manner. The encoded sequences serve as the input to the SVM, whose task is to classify them using its learned model. The accuracy of classification depends largely on selecting a proper kernel for sequence data as well as on the choice of kernel parameters. We examine each encoding technique, classify the techniques according to their similarity, and then discuss kernel and parameter selection. This survey provides a basic understanding of encoding approaches and of proper SVM kernel selection for splice site prediction.
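
The two encoding families contrasted in the survey, sparse (one-hot) versus probabilistic (position-frequency) encoding, can be illustrated as follows. The toy donor-site windows and helper names are assumptions for illustration only, not any surveyed paper's scheme.

```python
import numpy as np

# Sparse (one-hot) encoding: each nucleotide becomes a 4-dimensional indicator.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

def encode_sparse(seq):
    return np.array([ONE_HOT[base] for base in seq]).ravel()

def encode_probabilistic(seqs):
    """Probabilistic (position-frequency) encoding: each position is described by
    the frequency of each nucleotide at that position over the training windows."""
    counts = np.zeros((len(seqs[0]), 4))
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    for seq in seqs:
        for pos, base in enumerate(seq):
            counts[pos, index[base]] += 1
    return counts / len(seqs)

windows = ["AAGGTAAG", "CAGGTGAG", "AAGGTAAG"]    # toy windows around a GT donor site
print(encode_sparse(windows[0]).shape)            # (32,) feature vector for an SVM
print(encode_probabilistic(windows).round(2))     # per-position nucleotide frequencies
```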

A Probabilistic Context Sensitive Rewriting Method for Effective Transliteration Variants Generation (효과적인 외래어 이형태 생성을 위한 확률 문맥 의존 치환 방법)

  • Lee, Jae-Sung
    • The Journal of the Korea Contents Association / v.7 no.2 / pp.73-83 / 2007
  • An information retrieval system that uses exact matching needs preprocessing or query expansion to generate transliteration variants in order to find variant transliterations of foreign words in documents. This paper proposes an effective method for generating other transliteration variants from a given transliteration. Because simple rewriting of confusable characters produces too many false variants, the proposed method controls the generation priority by learning confusion patterns from real usage and calculating their probabilities. In particular, the left and right context of each pattern is considered, and local and global rewriting probabilities are calculated so that more probable variants are produced at earlier stages. Experiments on a set of transliteration variants collected from KT SET 2.0 showed that the method is very effective, achieving more than 80% recall within the top 20 generated variants.
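
A simplified sketch of probability-ordered variant generation is given below; it omits the left/right context handling and the local/global probability distinction of the paper, and the confusion patterns shown are hypothetical, not learned values.

```python
import heapq

def generate_variants(word, patterns, top_k=20):
    """Generate transliteration variants by applying character-rewriting patterns,
    emitting the highest-probability (least penalized) variants first.

    patterns: {(source_substring, replacement): probability} learned from real usage.
    """
    heap = [(-1.0, word)]   # max-heap on probability (negated for heapq)
    seen = {word}
    results = []
    while heap and len(results) < top_k:
        neg_prob, current = heapq.heappop(heap)
        results.append((current, -neg_prob))
        for (src, dst), p in patterns.items():
            start = current.find(src)
            while start != -1:
                variant = current[:start] + dst + current[start + len(src):]
                if variant not in seen:
                    seen.add(variant)
                    heapq.heappush(heap, (neg_prob * p, variant))
                start = current.find(src, start + 1)
    return results

patterns = {("전", "젼"): 0.35, ("텔레", "테레"): 0.3}  # hypothetical confusion patterns
for variant, prob in generate_variants("텔레비전", patterns, top_k=5):
    print(variant, round(prob, 3))
```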

Sector Based Scanning and Adaptive Active Tracking of Multiple Objects

  • Cho, Shung-Han;Nam, Yun-Young;Hong, Sang-Jin;Cho, We-Duke
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.6 / pp.1166-1191 / 2011
  • This paper presents an adaptive active tracking system with sector-based scanning for a single PTZ camera. Dividing the image into sectors reduces the search space and shortens the selection time, so the system can cover many targets. Upon selecting a target, the system estimates the target trajectory to predict the zooming location within the finite time required for camera movement. Advanced estimation techniques based on probabilistic reasoning suffer from unknown object dynamics, and the resulting inaccurate estimation forces the zooming level to be compromised to prevent tracking failure. The proposed system instead uses a simple piecewise estimation over a few frames to cope with fast-moving objects and/or slow camera movements. The target is tracked in multiple steps, and the zooming time for each step is determined by maximizing the zooming level within the expected variation of object velocity and detection. The number of zooming steps is determined adaptively according to the target speed. In addition, iterative estimation of the zooming location with the camera movement time compensates for the target prediction error caused by the difference between the speeds of the target and the camera. The effectiveness of the proposed method is validated by simulations and real-time experiments.
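
The iterative zoom-point estimation described above, re-extrapolating the target over the time the camera needs to move, can be sketched as follows; the linear velocity model and the timing constant are illustrative assumptions rather than the paper's exact formulation.

```python
def predict_zoom_point(positions, camera_aim, camera_time_per_unit, iterations=5):
    """Piecewise (linear) prediction of where to point the camera: the target is
    extrapolated over the time the camera itself needs to reach the new aim point,
    and the estimate is refined iteratively because aiming farther takes longer.

    positions: recent (x, y) target positions at unit frame intervals.
    camera_aim: where the camera currently points.
    camera_time_per_unit: camera movement time per unit of pan distance (assumed).
    """
    (x0, y0), (x1, y1) = positions[-2], positions[-1]
    vx, vy = x1 - x0, y1 - y0          # velocity estimated from the last two frames
    tx, ty = x1, y1                    # initial guess: current target position
    for _ in range(iterations):
        dist = ((tx - camera_aim[0]) ** 2 + (ty - camera_aim[1]) ** 2) ** 0.5
        travel_time = camera_time_per_unit * dist
        # Re-extrapolate the target over the camera's travel time.
        tx, ty = x1 + vx * travel_time, y1 + vy * travel_time
    return tx, ty

# Toy usage: target moving right; the camera needs 0.2 s per unit of movement.
print(predict_zoom_point([(0.0, 0.0), (1.0, 0.0)], camera_aim=(0.0, 0.0), camera_time_per_unit=0.2))
```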