• Title/Summary/Keyword: sequence analyzer

Syllable-based POS Tagging without Korean Morphological Analysis (형태소 분석기 사용을 배제한 음절 단위의 한국어 품사 태깅)

  • Shim, Kwang-Seob
    • Korean Journal of Cognitive Science / v.22 no.3 / pp.327-345 / 2011
  • In this paper, a new approach to Korean part-of-speech (POS) tagging is proposed. In previous work, a Korean POS tagger was regarded as a post-processor of a morphological analyzer, used to determine the most likely morpheme/POS sequence from the morphological analysis results. In the proposed approach, however, the POS tagger generates the most likely sequence of morpheme/POS pairs directly from the given sentence. A POS-tagged corpus of 398,632 eojeols and test data of 33,467 eojeols are used for training and evaluation, respectively. The proposed approach achieves a POS tagging accuracy of 96.31%.

Robust Planar Shape Recognition Using Spectrum Analyzer and Fuzzy ARTMAP (스펙트럼 분석기와 퍼지 ARTMAP 신경회로망을 이용한 Robust Planar Shape 인식)

  • 한수환
    • Journal of the Korean Institute of Intelligent Systems / v.7 no.2 / pp.34-42 / 1997
  • This paper deals with the recognition of closed planar shapes using a three-dimensional spectral feature vector, derived from the FFT (Fast Fourier Transform) spectrum of the contour sequence, and a fuzzy ARTMAP neural network classifier. Contour sequences obtained from 2-D planar images represent the Euclidean distance between the centroid and all boundary pixels of the shape, and are related to the overall shape of the images. The Fourier transform of the contour sequence and a spectrum analyzer are used as a means of feature selection and data reduction. The three-dimensional spectral feature vectors are extracted by the spectrum analyzer from the FFT spectrum. These spectral feature vectors are invariant to translation, rotation, and scaling of the shape. The fuzzy ARTMAP neural network, which combines two fuzzy ART modules, is trained and tested with these feature vectors. Experiments on the recognition of four aircraft and four industrial parts are presented to illustrate the high performance of the proposed method in recognizing noisy shapes.

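The contour-sequence idea in the shape-recognition abstract above can be illustrated with a minimal sketch. It assumes the centroid is the mean of the boundary points and uses the first three DFT magnitudes, normalized by the DC term, as a stand-in feature vector. The abstract does not say how the paper's spectrum analyzer builds its three-dimensional vector, so `spectral_features` and its `count` parameter are hypothetical.

```python
import cmath
import math


def contour_sequence(boundary):
    """Distance from the centroid to each boundary point, in boundary order.

    Assumption: the centroid is the mean of the boundary points.
    """
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


def dft_magnitudes(seq):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(seq)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(seq)))
        for k in range(n)
    ]


def spectral_features(boundary, count=3):
    """Hypothetical feature vector: first `count` harmonics divided by the DC term.

    Centroid distances remove translation and rotation, DFT magnitudes remove
    the dependence on the starting boundary point, and dividing by the DC
    term removes scale.
    """
    mags = dft_magnitudes(contour_sequence(boundary))
    return [m / mags[0] for m in mags[1:count + 1]]
```

Under these assumptions, a translated, rotated, rescaled copy of a boundary whose point list starts at a different pixel yields the same features up to floating-point error.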

Transmitter Identification Signal Analyzer (송신기 식별 신호분석기)

  • Park, Sung-Ik;Lee, Jae-Young;Kim, Heung-Mook;Oh, Wang-Rok
    • Journal of Broadcast Engineering / v.13 no.3 / pp.350-364 / 2008
  • Single frequency network (SFN) design based on the Advanced Television Systems Committee (ATSC) specification, a terrestrial digital television (DTV) system, normally causes an interference problem among signals from multiple transmitters or repeaters. To solve this, the ATSC recommended practice (RP) introduces a transmitter identification (TxID) signal embedded in the signal from each transmitter or repeater. A TxID signal analyzer is then used to detect the TxID signal, and the SFN design can be adjusted according to the analysis results. This paper discusses the generation and use of the Kasami sequence, which is used as the TxID signal. A configuration of the TxID signal analyzer that efficiently detects the TxID signal is proposed, and the results of a theoretical performance analysis are provided. Moreover, computer simulation and laboratory test results are provided to evaluate the performance of the TxID signal analyzer and the theoretical performance analysis.
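The detection principle behind a TxID analyzer, finding a known low-level code buried in a stronger host signal, can be sketched with a plain cross-correlation. This is not the analyzer configuration proposed in the paper. As a simplification, a seeded pseudo-random ±1 sequence stands in for a real Kasami sequence, and the host signal is modeled as Gaussian noise. The names `make_code`, `embed`, and `detect_delay`, and the signal levels used, are all illustrative assumptions.

```python
import random


def make_code(length, seed):
    """Stand-in identification code: a seeded pseudo-random +/-1 sequence.

    Assumption: a real TxID system would use a Kasami sequence instead.
    """
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, code, delay, amplitude):
    """Add the scaled code to the host signal, starting at sample `delay`."""
    out = list(host)
    for i, c in enumerate(code):
        out[delay + i] += amplitude * c
    return out


def detect_delay(received, code, max_lag):
    """Correlate the received signal with the code at each lag in [0, max_lag).

    Returns the lag with the largest correlation and that correlation value.
    `received` must hold at least len(code) + max_lag - 1 samples.
    """
    scores = [
        sum(received[lag + i] * c for i, c in enumerate(code))
        for lag in range(max_lag)
    ]
    best = max(range(max_lag), key=lambda lag: scores[lag])
    return best, scores[best]
```

A usage sketch under the same assumptions: embed a 1023-chip code at half the host's standard deviation and recover its delay.

```python
rng = random.Random(1)
code = make_code(1023, seed=7)
host = [rng.gauss(0.0, 1.0) for _ in range(1023 + 200)]
received = embed(host, code, delay=57, amplitude=0.5)
lag, score = detect_delay(received, code, max_lag=200)
```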

Single Nucleotide Polymorphism Marker Discovery from Transcriptome Sequencing for Marker-assisted Backcrossing in Capsicum

  • Kang, Jin-Ho;Yang, Hee-Bum;Jeong, Hyeon-Seok;Choe, Phillip;Kwon, Jin-Kyung;Kang, Byoung-Cheorl
    • Horticultural Science & Technology / v.32 no.4 / pp.535-543 / 2014
  • Backcross breeding is the method most commonly used to introgress new traits into elite lines. Conventional backcross breeding requires at least 4-5 generations to recover the genomic background of the recurrent parent. Marker-assisted backcrossing (MABC) represents a new breeding approach that can substantially reduce breeding time and cost. For successful MABC, highly polymorphic markers with known positions in each chromosome are essential. Single nucleotide polymorphism (SNP) markers have many advantages over other marker systems for MABC due to their high abundance and amenability to genotyping automation. To facilitate MABC in hot pepper (Capsicum annuum), we utilized expressed sequence tags (ESTs) to develop SNP markers in this study. For SNP identification, we used Bukang $F_1$-hybrid pepper ESTs to prepare a reference sequence through de novo assembly. We performed large-scale transcriptome sequencing of eight accessions using the Illumina Genome Analyzer (IGA) IIx platform by Solexa, which generated small sequence fragments of about 90-100 bp. By aligning each contig to the reference sequence, 58,151 SNPs were identified. After filtering for polymorphism, segregation ratio, and lack of proximity to other SNPs or exon/intron boundaries, a total of 1,910 putative SNPs were chosen and positioned on a pepper linkage map. We further selected 412 SNPs evenly distributed on each chromosome, and primers were designed for high-throughput SNP assays and tested using a genetic diversity panel of 27 Capsicum accessions. The SNP markers clearly distinguished each accession. These results suggest that the SNP marker set developed in this study will be valuable for MABC, genetic mapping, and comparative genome analysis.

Eliminating Redundant Alarms of Buffer Overflow Analysis Using Context Refinements (분석 문맥 조절 기법을 이용한 버퍼 오버플로우 분석의 중복 경보 제거)

  • Kim, You-Il;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications / v.37 no.12 / pp.942-945 / 2010
  • To reduce the effort needed to inspect the alarms reported by a static buffer overflow analyzer, we present an effective method for filtering out redundant alarms. In static analysis, a sequence of multiple alarms is frequently raised by the same cause in the code. In such a case, it is sufficient and reasonable for programmers to examine the first alarm instead of all the alarms in the same sequence. Based on this observation, we devise a buffer overflow analysis that filters out redundant alarms with our context refinement technique. Our experiment with several open source programs shows that our method reduces the reported alarms by 23% on average.

A Study for Development of a Marine Diesel Engine from a 500Ps Commercial Vehicle Diesel Engine (500Ps급 상용차량 디젤엔진을 이용한 선박용 디젤엔진 개발 연구)

  • Sim, Han-Sub
    • Journal of the Korean Society of Manufacturing Process Engineers / v.12 no.6 / pp.125-131 / 2013
  • This study was carried out to develop a diesel engine for marine propulsion. The marine diesel engine was developed from a 500Ps vehicle diesel engine. Many main parts, such as the intercooler, radiator, and engine controller, were designed for the marine diesel engine. The intercooler was designed as a sea-water-cooled type, in which inlet air is cooled by sea water. Engine coolant is also cooled by sea water in the radiator. The water cooling heat exchanger has high cooling performance. In the cooling system, which consists of the intercooler and the radiator, the sea water passes through the intercooler and then the radiator, in sequence. This process is very effective compared to the reverse method, in which sea water passes through the radiator and then the intercooler. The control performance of the engine controller and the fuel injection rate were improved using an engine speed controller. The system was tested with an engine dynamometer and an exhaust gas analyzer using the marine diesel engine test method. Test results show that the 500Ps marine diesel engine satisfied the IMO Tier II NOx regulations.

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.131-150 / 2015
  • Omission of noun phrases for obligatory cases is a common phenomenon in Korean and Japanese sentences that is not observed in English. When an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is more easily omitted in encyclopedia texts. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent application systems such as information retrieval and question answering systems. However, omission of noun phrases degrades the quality of information extraction. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem that our system deals with is very similar to zero anaphora resolution, which is one of the important problems in natural language processing. A noun phrase in the text that can be used for restoration is called an antecedent. An antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage is in charge of detecting the zero anaphor. In the second stage, antecedent search is carried out by considering the candidates. If antecedent search fails, an attempt is made, in the third stage, to use the title as the antecedent. The main characteristic of our system is its use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of the zero anaphor comprise the search space. The main technique used in the methods proposed in previous research is to perform binary classification for all the noun phrases in the search space. The noun phrase classified as an antecedent with the highest confidence is selected as the antecedent. 
However, we propose in this paper that antecedent search be viewed as the problem of assigning antecedent indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed in antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we suggest using a structural SVM, which receives a sequence of noun phrases as input and returns the sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm, which exploits a subgradient descent methodology used for optimization problems. To train and test our system, we selected a set of Wikipedia texts and constructed an annotated corpus in which gold-standard answers, such as zero anaphors and their possible antecedents, are provided. Training examples are prepared using the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer, and omitted subject or object cases are identified. Thus the performance of our system depends on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor. This is based on binary classification using the regular SVM. The experiment showed that our system achieves F1 = 68.58%. This means that a state-of-the-art system can be developed with our technique. It is expected that future work that enables the system to utilize semantic information can lead to a significant performance improvement.

Null Allele in the D18S51 Locus Responsible for False Homozygosities and Discrepancies in Forensic STR Analysis

  • Eom, Yong-Bin
    • Biomedical Science Letters / v.17 no.2 / pp.151-155 / 2011
  • Short tandem repeat (STR) loci are the genetic markers used for forensic human identity testing. With multiplex polymerase chain reaction (PCR) assays, STRs are examined by measuring PCR product length relative to sequenced allelic ladders. The repeat region and the flanking region of commonly used STRs may contain DNA sequence variation. A mismatch due to sequence variation in the DNA template may cause allele drop-out (i.e., a "null" or "silent" allele) when it falls within PCR primer binding sites. The STR markers were co-amplified in a single reaction using the commercial PowerPlex® 16 system and AmpFlSTR® Identifiler® PCR amplification kits. Separation of the PCR products and fluorescence detection were performed by capillary electrophoresis on an ABI PRISM® 3100 Genetic Analyzer. The GeneMapper™ ID software was used for size calling and analysis of STR profiles. This study describes a forensic human identity test in which allelic drop-out occurred in the STR system D18S51. During the course of the human identity test, two samples with homozygous genotypes (16, 16 and 21, 21) at the D18S51 locus were discovered using the PowerPlex® 16 system. The loss of alleles was confirmed when the samples were amplified using the AmpFlSTR® Identifiler® PCR amplification kit, which yielded heterozygous genotypes (16, 20 and 20, 21, respectively) at this locus. These discrepant results suggest that appropriate measures should be taken for database comparisons and that the allele should be further investigated by sequence analysis and reported to the forensic community.

Genetic Diversity Analysis of Wood-cultivated Ginseng using Simple Sequence Repeat Markers (SSR 마커를 이용한 산양삼의 유전적 다양성 분석)

  • Gil, Jinsu;Um, Yurry;Byun, Jae Kyung;Chung, Jong Wook;Lee, Yi;Chung, Chan Moon
    • Korean Journal of Medicinal Crop Science / v.25 no.6 / pp.389-396 / 2017
  • Background: Wood-cultivated ginseng (WCG) in Korea is Panax ginseng C. A. Meyer grown by an artificial forest growth method. Various P. ginseng cultivars can be used to produce this type of ginseng. To obtain WCG similar to wild ginseng (WG), the method is usually carried out on a mountain using seeds or seedlings of cultivated ginseng (CG) and WG. Recently, the WCG industry has faced the problem that Panax notoginseng (Burk.) F. H. Chen or Panax quinquefolium L. is being sold as WCG in the Korean market; their morphological similarities have created confusion among customers. Methods and Results: WCG samples were collected from five areas in Korea. After polymerase chain reaction (PCR) amplification using primer pairs labeled with fluorescent dye (FAM, NED, PET, or VIC), fragment analysis was performed. PCR products were separated by capillary electrophoresis with an ABI 3730 DNA analyzer. The results showed that WCG cultivated in Korea has a very diverse genetic background. Conclusions: In this study, we tried to develop a method to discriminate between WCG, P. notoginseng, and P. quinquefolium using simple sequence repeat (SSR) markers. Furthermore, we analyzed the genetic diversity of WCG collected from five cultivation areas in Korea.

Fragment Combination From DNA Sequence Data Using Fuzzy Reasoning Method (퍼지 추론기법을 이용한 DNA 염기 서열의 단편결합)

  • Kim, Kwang-Baek;Park, Hyun-Jung
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2329-2334 / 2006
  • In this paper, we propose a method that addresses the failure to combine DNA fragments, a defect of conventional contig assembly programs. In the proposed method, very long DNA sequence data are divided into prototype fragments of about 700 bases, the length that an automatic sequence analyzer can analyze at one time, and a matching ratio is then calculated by comparing a standard prototype with 3 fragmented clones of about 700 bases generated by the PCR method. In this process, the time needed to calculate the matching ratio is reduced by the Compute Agreement algorithm. Two candidate combined fragments are extracted for every prototype according to the degree of overlap of the calculated fragment pairs, and the degree of combination is then decided using a fuzzy reasoning method that utilizes the matching ratio of each extracted fragment, the A, C, G, T membership degrees of each DNA sequence, and the previous frequencies of each of A, C, G, T. DNA sequence combination is completed by iterating the process of combining the selected optimal test fragments until no fragment remains. For the experiments, fragments of about 700 bases were generated from sequences of 10,000 bases and 100,000 bases extracted from 'PCC6803', complete protein genome. In experiments applying random notations to these fragments, the proposed method was faster than the FAP program, and combination failure, the defect of conventional contig assembly programs, did not occur.
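The notion of a matching ratio between overlapping fragments in the abstract above can be illustrated with a minimal sketch. It is neither the paper's Compute Agreement algorithm nor its fuzzy reasoning step, neither of which the abstract specifies. It only scores suffix/prefix overlaps by the fraction of agreeing bases and merges the best one. The function names and the `min_overlap` and `min_ratio` thresholds are hypothetical.

```python
def matching_ratio(a, b):
    """Fraction of positions at which two equal-length base strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_overlap(left, right, min_overlap=4, min_ratio=0.9):
    """Find the suffix of `left` that best matches a prefix of `right`.

    Overlap lengths are tried from longest to shortest, so the longest
    overlap wins among equal ratios. Returns (length, ratio), or None if
    no overlap reaches `min_ratio`.
    """
    best = None
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        ratio = matching_ratio(left[-k:], right[:k])
        if ratio >= min_ratio and (best is None or ratio > best[1]):
            best = (k, ratio)
    return best


def merge(left, right, overlap):
    """Join two fragments, keeping the overlapping bases only once."""
    return left + right[overlap:]
```

For instance, `best_overlap("ACGTTGCAAGGCT", "AAGGCTTTACGGA")` finds the six-base overlap `AAGGCT` with a ratio of 1.0, and `merge` with that overlap gives `ACGTTGCAAGGCTTTACGGA`.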