• Title/Summary/Keyword: retrieval method

Determining the number of Clusters in On-Line Document Clustering Algorithm (온라인 문서 군집화에서 군집 수 결정 방법)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The KIPS Transactions:PartB / v.14B no.7 / pp.513-522 / 2007
  • Clustering divides given data into groups and automatically uncovers the hidden meaning in the data. It analyzes data that are difficult for people to inspect in detail and forms clusters of items with similar characteristics. An on-line document clustering system, which groups similar documents using search engine results, aims to make information retrieval more convenient. Document clustering is performed automatically without human intervention, so the number of clusters, which affects the clustering result, should also be determined automatically. In addition, an on-line system must guarantee a fast response time. This paper proposes a method that determines the number of clusters automatically from geometrical information. The proposed method consists of two stages. In the first stage, cluster centers are projected onto a low-dimensional plane; in the second stage, clusters are merged according to the distances between their centers in that plane. Experiments with real data showed that clustering performance improved and that the response time is suitable for an on-line setting.
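
The second stage described in this abstract (merging clusters whose projected centers are close) might be sketched as follows. This is only an illustration under assumptions: the centers are taken to be already projected to 2-D by some method the abstract does not specify, and the function name `merge_close_centers` and the fixed `threshold` are hypothetical rather than taken from the paper.

```python
import math


def merge_close_centers(centers_2d, threshold):
    """Group cluster indices whose projected centers lie within `threshold`.

    Assumption: `centers_2d` holds cluster centers already projected onto a
    plane; the merge rule (single linkage with a fixed threshold) is
    illustrative and is not the paper's exact criterion.
    """
    parent = list(range(len(centers_2d)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers_2d)):
        for j in range(i + 1, len(centers_2d)):
            if math.dist(centers_2d[i], centers_2d[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(centers_2d)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


centers = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
merged = merge_close_centers(centers, threshold=1.0)
print(len(merged), merged)
```

The number of groups returned plays the role of the automatically determined cluster count; the pairwise loop is quadratic in the number of centers, which stays cheap when only cluster centers (not documents) are compared.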

Vector Approximation Bitmap Indexing Method for High Dimensional Multimedia Database (고차원 멀티미디어 데이터 검색을 위한 벡터 근사 비트맵 색인 방법)

  • Park Joo-Hyoun;Son Dea-On;Nang Jong-Ho;Joo Bok-Gyu
    • The KIPS Transactions:PartD / v.13D no.4 s.107 / pp.455-462 / 2006
  • Filtering approaches based on vector approximation, such as the VA-file[1] and the LPC-file[2], have recently been proposed to support similarity search in high-dimensional data spaces. These approaches filter out many irrelevant vectors by computing an approximate distance from the query vector using compact approximations of the vectors in the database. The total elapsed time for similarity search falls because disk I/O time is reduced by reading the compact approximations instead of the original vectors. However, the search time of the VA-file or LPC-file is not much shorter than that of a brute-force search, because computing the approximate distances requires a great deal of computation. This paper proposes a new bitmap index structure that minimizes this computation time. To speed up the calculation, each object's value is stored as a bit pattern that represents the spatial position of its feature vector in the data space, and the distance between objects is computed with a bitwise XOR, which is much faster than arithmetic on the real-valued vectors. In the experiments, the proposed method reduced the total search time to about one fourth of the sequential search time and was up to twice as fast as the existing methods because it greatly shortens the computation time, although its data reading time is longer than that of the existing vector-approximation-based approaches. These results confirm that search performance can be improved further by shortening the filtering computation of the existing vector approximation methods when the database is fast enough.
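
To illustrate how an XOR over bit patterns can stand in for a distance computation, here is a minimal sketch. The abstract does not give the paper's actual encoding, so this example assumes a thermometer (unary) code per dimension, for which the popcount of the XOR equals the L1 distance between quantized coordinates. The names `encode`, `approx_distance`, and `filter_candidates` are hypothetical.

```python
def encode(vector, levels=8):
    """Thermometer-code each coordinate in [0, 1) into (levels - 1) bits.

    Assumption: this encoding is illustrative; the paper's bit layout is
    not described in the abstract.
    """
    bits = 0
    for x in vector:
        q = min(int(x * levels), levels - 1)      # quantization level
        bits = (bits << (levels - 1)) | ((1 << q) - 1)  # q low bits set
    return bits


def approx_distance(code_a, code_b):
    """Popcount of XOR = sum of per-dimension level differences."""
    return bin(code_a ^ code_b).count("1")


def filter_candidates(query, codes, radius, levels=8):
    """Return indices whose approximate distance to the query is <= radius."""
    q = encode(query, levels)
    return [i for i, c in enumerate(codes) if approx_distance(q, c) <= radius]


database = [[0.1, 0.1], [0.9, 0.9], [0.15, 0.2]]
codes = [encode(v) for v in database]   # precomputed once, as in an index
print(filter_candidates([0.1, 0.1], codes, radius=2))
```

Only the surviving candidates would then be compared against the original vectors, which is the filter-and-refine pattern the abstract describes.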

Cancellation of Motion Artifact in MRI (MRI에 있어서 체동 아티팩트의 제거)

  • Kim, Eung-Kyeu
    • Journal of the Institute of Electronics Engineers of Korea SP / v.37 no.3 / pp.70-78 / 2000
  • This study presents a new method for canceling MRI artifacts caused by translational motion of the image plane. Breathing often causes problems in clinical diagnosis. Assuming that the head moves up and down due to breathing, only rigid translational motion in the y (phase-encoding axis) direction is treated. Unlike the conventional iterative phase retrieval algorithm, this method is based on the MRI imaging process and an analysis of image properties. A new constraint condition is derived under which the motion component and the true image component of the MRI signal can be separated by a simple algebraic operation. After the Fourier transform of the MRI signal in the x (read-out) direction, the phase of the spectrum in the y (phase-encoding) direction is just an algebraic sum of the image component and the motion component. Meanwhile, because the density of the subcutaneous fat region is known to be almost uniform in head tomographs, the density distribution along a y-directional line through this fat region is regarded as symmetric. If the density function is symmetric, the phase of its spectrum changes linearly with position. Hence, the departure from the linear function can be separated out as the motion component. Based on this constraint condition, the new artifact cancellation method is presented. Finally, the effectiveness of the algorithm is shown using a phantom with simulated motions.
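
The linear-phase property this abstract relies on follows from the Fourier shift theorem. The short derivation below is a sketch under stated assumptions: the symbols (density $\rho$, center of symmetry $y_0$, spectrum $F$, phases $\phi$) are introduced here for illustration and are not the paper's notation.

```latex
% Assumption: \rho(y) is real and symmetric about y_0,
% i.e. \rho(y_0 + u) = \rho(y_0 - u), so R(k) below is real.
\begin{align}
F(k) &= \int \rho(y)\, e^{-i 2\pi k y}\, dy
      = e^{-i 2\pi k y_0} \int \rho(y_0 + u)\, e^{-i 2\pi k u}\, du
      = e^{-i 2\pi k y_0}\, R(k), \qquad R(k) \in \mathbb{R}, \\
\arg F(k) &= -2\pi k y_0 \pmod{2\pi} \quad \text{wherever } R(k) > 0, \\
\phi_{\mathrm{measured}}(k) &= -2\pi k y_0 + \phi_{\mathrm{motion}}(k)
  \;\Longrightarrow\;
  \phi_{\mathrm{motion}}(k) = \phi_{\mathrm{measured}}(k) + 2\pi k y_0 .
\end{align}
```

The last line restates the abstract's claim that the measured phase is an algebraic sum of an image term and a motion term: once the linear image term is known, the residual is attributed to motion.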

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
  • A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its corresponding POS tag; such corpora are widely used as training data for natural language processing. Training data are generally assumed to be error-free, but in reality they contain various types of errors, which degrade the performance of systems trained on them. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier and cross-validation as the evaluation technique. We first train a POS-tagging classifier on the POS-tagged corpus, which contains some errors, and then apply it to the same corpus through cross-validation; however, the classifier cannot detect errors directly because there is no training data for POS-tagging errors. We therefore detect errors by comparing the outputs of the classifier (the POS probabilities) while adjusting hyperparameters. The hyperparameters are estimated on a small error-tagged corpus, whose text is sampled from the POS-tagged corpus and whose POS errors are marked by experts. We use recall and precision, which are widely used in information retrieval, as evaluation metrics. Because not all detected errors can be checked, we show that the proposed method is valid by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency-tree-tagged corpus and a semantic-role-tagged corpus.
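
The cross-validation idea in this abstract can be illustrated with a small, dependency-free sketch. Note the substitution: a word-to-tag frequency model stands in for the XGBoost classifier, and the fixed `threshold` stands in for the hyperparameters the authors tune on an error-tagged corpus. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Count how often each word carries each tag (stand-in for XGBoost)."""
    counts = defaultdict(Counter)
    for word, tag in tokens:
        counts[word][tag] += 1
    return counts


def tag_probability(model, word, tag):
    """Estimated P(tag | word), or None when the word was never seen."""
    seen = model.get(word)
    if not seen:
        return None
    return seen[tag] / sum(seen.values())


def detect_errors(corpus, k=5, threshold=0.2):
    """Flag tokens whose annotated tag looks improbable on held-out folds."""
    suspects = []
    for fold in range(k):
        # Train on every token outside the current fold.
        model = train(t for i, t in enumerate(corpus) if i % k != fold)
        for i, (word, tag) in enumerate(corpus):
            if i % k != fold:
                continue
            p = tag_probability(model, word, tag)
            if p is not None and p < threshold:
                suspects.append(i)
    return sorted(suspects)


corpus = [("the", "DET")] * 10 + [("the", "NOUN")] + [("dog", "NOUN")] * 5
print(detect_errors(corpus))
```

Because each token is scored by a model that never saw it, a mislabeled token cannot vouch for itself; that is the reason for running the classifier through cross-validation rather than scoring the training data directly.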

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, this algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
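
To make the problem statement concrete, the sketch below defines maximal frequent contiguous sequences by brute force. It is not the fixed-length spanning tree method the paper proposes, nor prefixSpan or MacosVSpan; it simply enumerates substrings, and the function name and `max_len` cap are assumptions for illustration.

```python
from collections import Counter


def maximal_frequent_substrings(sequences, min_support, max_len):
    """Return contiguous substrings found in >= min_support sequences that
    are not contained in a longer frequent substring.

    Brute-force reference only: maximality is judged among substrings of
    length <= max_len.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # count each substring once per sequence
        for length in range(1, max_len + 1):
            for start in range(len(seq) - length + 1):
                seen.add(seq[start:start + length])
        support.update(seen)

    frequent = {s for s, n in support.items() if n >= min_support}
    return sorted(
        s for s in frequent
        if not any(s != t and s in t for t in frequent)
    )


dna = ["ACGTAC", "TACGTT", "GGACGT"]
print(maximal_frequent_substrings(dna, min_support=3, max_len=6))
```

The enumeration grows with the square of the sequence length, which is the kind of cost that motivates specialized structures for long biological sequences.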

An Interconnection Method for Streaming Framework and Multimedia Database (스트리밍 프레임워크와 멀티미디어 데이타베이스와의 연동기법)

  • Lee, Jae-Wook;Lee, Sung-Young;Lee, Jong-Won
    • Journal of KIISE:Software and Applications / v.29 no.7 / pp.436-449 / 2002
  • This paper describes our experience in developing the Database Connector, an interconnection method between a multimedia database and a streaming framework. If an interconnection method is provided between the streaming system and multimedia databases, diverse and mature multimedia database services, such as retrieval and join operations, can be supported during streaming. Currently available interconnection schemes, however, have mainly used file systems or relational databases in which the metadata, which deals with information about the multimedia content, is implemented separately from the streaming data, which deals with the multimedia data itself. Consequently, existing interconnection mechanisms cannot offer many of the benefits of multimedia database services during streaming. To resolve these drawbacks, we propose a novel scheme for interconnecting a streaming framework and a multimedia database, called the Inter-Process Communication (IPC) based Database Connector, under the assumption that the two systems are located on the same host. We define four transaction primitives (Read, Write, Find, and Play) as well as a plug-in based interface for the transactions, so that the scheme can be extended to multimedia databases that appear in the future. Our simulation study shows that the performance of the proposed IPC-based interconnection scheme is not far behind that of file systems.
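
A plug-in interface built around the four primitives named in this abstract could look roughly like the sketch below. Only the primitive names (Read, Write, Find, Play) come from the abstract; the method signatures, the class names, and the in-memory backend are hypothetical, and the IPC transport itself is not modeled.

```python
from abc import ABC, abstractmethod


class DatabaseConnector(ABC):
    """Hypothetical plug-in interface exposing the four primitives."""

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def write(self, key, data): ...

    @abstractmethod
    def find(self, predicate): ...

    @abstractmethod
    def play(self, key, chunk_size): ...


class InMemoryConnector(DatabaseConnector):
    """Toy backend; a real plug-in would talk to a multimedia database."""

    def __init__(self):
        self._store = {}

    def read(self, key):
        return self._store[key]

    def write(self, key, data):
        self._store[key] = bytes(data)

    def find(self, predicate):
        # Return the keys of all items accepted by predicate(key, data).
        return sorted(k for k, d in self._store.items() if predicate(k, d))

    def play(self, key, chunk_size):
        # Yield the stored object in fixed-size chunks, as a stream would.
        data = self._store[key]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]


conn = InMemoryConnector()
conn.write("clip1", b"abcdefgh")
print(list(conn.play("clip1", 3)))
```

Keeping the streaming side dependent only on the abstract class is what would let another database be supported by adding a new subclass rather than changing the framework.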

Study on the experience of battered women maintaining non-violent marriage relationship -based on battered women using formal protective system- (가정폭력피해여성의 비폭력적 결혼관계 유지 경험에 관한 연구 - 공식적 보호체계 이용경험이 있는 피해여성을 중심으로 -)

  • Kim, Ju-Hyun;Lee, Yun-Ho
    • Korean Journal of Family Social Work / no.23 / pp.5-42 / 2008
  • This research used grounded theory and Giorgi's phenomenological research method to analyze the experience of battered women who are maintaining a non-violent marriage relationship. Because of the low accessibility of subjects and the distinctiveness of the topic, four participants were selected and interviewed in depth. The data analysis yielded 5 superordinate components and 19 subordinate components of the experience of maintaining a non-violent marriage relationship, and these appeared in chronological order. The experience can thus be analyzed as follows: 'the vicious cycle stage of violence: caught in a trap,' 'stage of breakaway from vicious violence cycle: making the self-rescue measures,' 'entry to a new track: retrieval of autonomy,' 'stage of forming a non-violence track: preventing the recurrence of the violence,' 'stage of maintaining non-violent track: conversion to non-violent relationship.' These results will be useful in seeking an effective social welfare intervention plan for successful non-violent relationships, in order to help the 50% of women battered through family violence who wish to maintain their marriage.

Follicular Stimulation and Laparoscopic Ovum Pick-up (LOPU) in Repeatedly Superovulated Korean Black Goats (반복적인 과배란 처치 경험이 있는 한국흑염소에서 난포 자극 및 복강경을 이용한 난포란 채취(LOPU))

  • Lee, Yong-Boum;Lee, Doo-Soo;Cho, Sang-Cheol;Shin, Sang Tae
    • Journal of Embryo Transfer / v.29 no.1 / pp.35-41 / 2014
  • Laparoscopic ovum pick-up (LOPU) is a convenient method for collecting oocytes in small ruminants. LOPU has the advantage of being a less invasive means of oocyte collection, thereby allowing repeated use of the oocyte donor animals. A total of 25 Korean black goats were used in the winter season (December to February), and LOPU was applied to goats that had been treated for superovulation more than two times during the previous twelve months. Estrus was synchronized with an intravaginal insert containing 0.3 g progesterone for 10 to 12 days. Ovaries were hyperstimulated with a single injection of eCG 1,000 IU, a single injection of FSH with eCG (50 mg / 1,000 IU; 70 mg / 500 IU; 70 mg / 1,000 IU), or multiple injections of FSH with a single injection of eCG (20 mg × 6 / 300 IU), given intramuscularly 72 h prior to LOPU. For these groups, the numbers of follicles (mean ± SEM) observed that developed to larger than 2 mm in diameter were 1.6 ± 2.5, 4.3 ± 3.1, 5.5 ± 4.2, 6.6 ± 2.1 and 8.8 ± 7.8, respectively. Oocytes were aspirated using OPU needles and a vacuum pump. The overall oocyte retrieval rate was 41.4%. Oocytes were matured in TCM-199 supplemented with 10% (w/v) bovine serum albumin + 10 µg/ml FSH + 1 µg/ml 17β-estradiol for 27 h at 39°C in 5% CO2 in air. Oocytes were parthenogenetically activated by ionomycin combined with 6-dimethylaminopurine (6-DMAP). The total oocyte maturation and cleavage rates were 67.3% and 78.8%, respectively. In summary, LOPU is a useful oocyte collection method in Korean black goats that can provide immature oocytes for transgenesis or nuclear transfer.

Crepe Search System Design using Web Crawling (웹 크롤링 이용한 크레페 검색 시스템 설계)

  • Kim, Hyo-Jong;Han, Kun-Hee;Shin, Seung-Soo
    • Journal of Digital Convergence / v.15 no.11 / pp.261-269 / 2017
  • The purpose of this paper is to design a search system that accesses the web in real time, without a database server, in order to guarantee up-to-date information within a single network, rather than using a plurality of bots connected over a wide area network. The research method is to design and analyze a system that can search for persons and keywords quickly and accurately in the crepe system. In the crepe server, when a user registers information, the body tag matching conversion process stores all the information as it is, since various styles, such as font, font size, and color, are applied by each user. The crepe server itself does not cause a body tag matching problem. However, when the crepe retrieval system is executed, the styles and characteristics of users cannot be formalized. This problem can be solved by using the html_img_parser function and the Go language html parser package. By applying queues and multiple threads to a general-purpose web crawler, rather than designing a web crawler that targets a specific site, it is possible to utilize a multiplier that quickly and efficiently searches and collects various web sites in various applications.
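
The combination of a queue and multiple threads mentioned at the end of this abstract can be sketched as below. The paper works in Go; this illustration uses Python's standard `queue` and `threading` modules instead, and it injects a hypothetical `fetch_links` callable (here backed by a toy link graph) so that no real network access or HTML parsing is implied.

```python
import queue
import threading


def crawl(seed, fetch_links, workers=4, max_pages=100):
    """Breadth-style crawl: a shared frontier queue fed by worker threads.

    `fetch_links(url)` is an assumed callable returning the URLs found on
    a page; a real crawler would download and parse HTML there.
    """
    frontier = queue.Queue()
    visited = {seed}
    lock = threading.Lock()
    frontier.put(seed)

    def worker():
        while True:
            url = frontier.get()
            if url is None:          # sentinel: shut this worker down
                frontier.task_done()
                return
            try:
                for link in fetch_links(url):
                    with lock:
                        if link in visited or len(visited) >= max_pages:
                            continue
                        visited.add(link)
                    frontier.put(link)
            except Exception:
                pass                 # a failed page is skipped in this sketch
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    frontier.join()                  # wait until every queued URL is handled
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return visited


graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
print(sorted(crawl("a", lambda url: graph.get(url, []))))
```

New links are queued before the current URL is marked done, so `frontier.join()` cannot return while work is still pending; the `visited` set under a lock keeps two threads from queueing the same page.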

Retrieval of Relative Surface Temperature from Single-channel Middle-infrared (MIR) Images (단일밴드 중적외선 영상으로부터 표면온도 추정을 위한 상대온도추정알고리즘의 연구)

  • Wook, Park;Won, Joong-Sun;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing / v.29 no.1 / pp.95-104 / 2013
  • In this study, a novel method is proposed for retrieving relative surface temperature from single-channel middle-infrared (MIR, 3-5 µm) remotely sensed data. To retrieve absolute temperature from MIR data, it is necessary to account for at least atmospheric effects, surface emissivity, and reflected solar radiance. Instead of retrieving the kinetic temperature of each target, we propose retrieving the relative temperature between two targets. The core idea is to minimize atmospheric effects by assuming that the differential at-sensor radiance between two targets experiences the same atmospheric effects. To simplify the atmospheric parameters effectively, each atmospheric parameter was examined using MODTRAN and MIR emissivities derived from the ASTER spectral libraries. Simulation results met the required accuracy of 2 K for materials with a temperature of 300 K within 0.1 emissivity errors. The algorithm was tested using MODIS band 23 MIR daytime images for validation. The accuracy of the retrieved relative temperature was 0.485 ± 1.552 K. The results demonstrated that the proposed algorithm was able to produce relative temperature with the required accuracy from single-channel radiance data alone. However, the method has limitations when applied to materials having very low temperatures using daytime MIR images.
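
The core idea of this abstract, that shared atmospheric terms drop out of a radiance difference, can be illustrated with a deliberately simplified model. Everything beyond Planck's law is an assumption made for this sketch: the at-sensor radiance is modeled as transmittance × emissivity × Planck radiance plus an additive path radiance, reflected solar radiance is ignored, both targets share a known emissivity, and one target's temperature is known. The paper's actual algorithm is not given in the abstract.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K


def planck(wavelength_m, temp_k):
    """Blackbody spectral radiance, W m^-3 sr^-1."""
    a = 2 * H * C**2 / wavelength_m**5
    return a / math.expm1(H * C / (wavelength_m * K * temp_k))


def inverse_planck(wavelength_m, radiance):
    """Temperature whose blackbody radiance equals `radiance`."""
    a = 2 * H * C**2 / wavelength_m**5
    return H * C / (wavelength_m * K * math.log1p(a / radiance))


def at_sensor(wavelength_m, temp_k, emissivity, tau, path):
    """Simplified forward model (assumed): no reflected solar term."""
    return tau * emissivity * planck(wavelength_m, temp_k) + path


def relative_temperature(l1, l2, t1, wavelength_m, emissivity, tau):
    """T2 - T1 from two at-sensor radiances; path radiance cancels in l2 - l1."""
    b2 = planck(wavelength_m, t1) + (l2 - l1) / (tau * emissivity)
    return inverse_planck(wavelength_m, b2) - t1


wl = 4.0e-6  # 4 micrometres, inside the 3-5 micrometre MIR window
l1 = at_sensor(wl, 300.0, emissivity=0.95, tau=0.8, path=1.0e5)
l2 = at_sensor(wl, 305.0, emissivity=0.95, tau=0.8, path=1.0e5)
print(round(relative_temperature(l1, l2, 300.0, wl, 0.95, 0.8), 3))
```

The path radiance value never enters `relative_temperature`, which is the point of working with the difference; the transmittance and emissivity still do, so errors in those parameters carry through to the result.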