Search | Korea Science

Methodology for Overcoming the Problem of Position Embedding Length Limitation in Pre-training Models (사전 학습 모델의 위치 임베딩 길이 제한 문제를 극복하기 위한 방법론)

Minsu Jeong;Tak-Sung Heo;Juhwan Lee;Jisu Kim;Kyounguk Lee;Kyungsun Kim
- Annual Conference on Human and Language Technology
- /
- 2023.10a
- /
- pp.463-467
- /
- 2023
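When a pre-trained model is fine-tuned on specific data, the maximum sequence length must stay at the maximum-length parameter used during pre-training. This is a drawback for tasks that require processing relatively long sequences. This study presents a methodology for overcoming the performance degradation caused by the sequence length limit when a pre-trained model is used for question answering (QA), a task that requires processing relatively long sequences. Performance was verified with BERT and RoBERTa on four datasets obtained from KorQuAD v1.0 and AIHub, and the experiments confirmed that performance improves on data whose documents are longer on average.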
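The abstract above does not describe the proposed method itself. As a rough illustration of the length limit it refers to, the sketch below shows a common generic workaround: splitting a long token sequence into overlapping windows that each fit the pre-trained maximum length. The function name `split_into_windows` and the parameters `max_len` and `stride` are hypothetical, and this is not the paper's methodology.

```python
def split_into_windows(token_ids, max_len, stride):
    """Split a token sequence into overlapping windows of at most max_len.

    Assumption: generic sliding-window chunking, NOT the method of the
    paper listed above. Returns a list of (start_offset, window) pairs.
    """
    if max_len <= 0 or not 0 < stride <= max_len:
        raise ValueError("need max_len > 0 and 0 < stride <= max_len")
    windows = []
    start = 0
    while True:
        windows.append((start, token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return windows
```

Each window can then be scored independently by the model, with answer positions mapped back to the document through the stored start offset.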

Cryptographic synchronization signal generation method using maximal length sequence (최대길이 시퀀스를 이용한 암호동기신호 생성 기법)

Son, Young-ho;Bae, Keun-sung
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.21 no.7
- /
- pp.1401-1410
- /
- 2017
Cryptographic synchronization, which synchronizes the internal state of the cryptographic algorithm and the ciphertext stream between an encryptor and a decryptor, affects the quality of secure communication. If synchronization is lost between a transmitter and a receiver, the receiver's output is unintelligible until resynchronization is achieved. In particular, for secure communication over a wireless channel with a high bit error rate (BER), synchronization performance can dominate communication quality. In this paper, we propose a novel, noise-robust synchronization signal generation method together with its detection algorithm. We generate a synchronization signal in the form of a masking structure based on the maximal length sequence, and develop a detection algorithm that uses the correlation property of the maximal length sequence. Experimental results demonstrate that the proposed synchronization signal outperforms the conventional concatenated-type synchronization signal in a noisy environment.
https://doi.org/10.6109/jkiice.2017.21.7.1401
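The correlation property mentioned in the abstract above can be illustrated with a small sketch. It generates a length-31 maximal length sequence from a 5-stage LFSR and locates it in a bit stream by sliding correlation. The tap choice, function names, and the plain (unmasked) embedding are assumptions for illustration; the paper's masking structure is not reproduced here.

```python
def lfsr_m_sequence(taps, nbits):
    """One period (2**nbits - 1 bits) of a Fibonacci LFSR output.

    taps: 1-indexed register stages XORed to form the feedback bit.
    The result is maximal length only if the taps correspond to a
    primitive polynomial, e.g. taps=[5, 3] with nbits=5.
    """
    state = [1] * nbits  # any nonzero seed works
    seq = []
    for _ in range(2 ** nbits - 1):
        seq.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq


def to_bipolar(bits):
    """Map bits {0, 1} to values {-1, +1}."""
    return [1 if b else -1 for b in bits]


def periodic_autocorrelation(bits):
    """Cyclic autocorrelation of the bipolar sequence for every shift."""
    x = to_bipolar(bits)
    n = len(x)
    return [sum(x[i] * x[(i + s) % n] for i in range(n)) for s in range(n)]


def detect(stream_bits, sync_bits):
    """Return (offset, score) of the best sliding-correlation match."""
    s, ref = to_bipolar(stream_bits), to_bipolar(sync_bits)
    n = len(ref)
    scores = [sum(s[k + i] * ref[i] for i in range(n))
              for k in range(len(s) - n + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

For a maximal length sequence of period N, the periodic autocorrelation is N at zero shift and -1 at every other shift, which is what makes a correlation peak easy to threshold even when some bits are corrupted.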

Measurement of Travel Time Using Sequence Pattern of Vehicles (차종 시퀀스 패턴을 이용한 구간통행시간 계측)

Lim, Joong-Seon;Choi, Gyung-Hyun;Oh, Kyu-Sam;Park, Jong-Hun
- The Journal of The Korea Institute of Intelligent Transport Systems
- /
- v.7 no.5
- /
- pp.53-63
- /
- 2008
In this paper, we propose a regional travel time measurement algorithm that matches sequence patterns of vehicle types between the origin and the end of a region, which can overcome the limits of conventional methods such as the Probe Car Method or the AVI Method based on License Plate Recognition. The algorithm treats vehicles as a sequence group of fixed length and measures the regional travel time by searching for the origin sequence that is most similar to the end sequence. Three variants of the algorithm are proposed according to the assumed similarity cost function, and the algorithm can estimate the average travel time best suited to the information-providing period by eliminating abnormal values caused by the inflow and outflow of vehicles. In computer simulations over the region length, the number of passing cars, and the sequence length, the average maximum error rate was measured to be within 3.46%, which verifies the algorithm's performance.
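The matching idea in the abstract above can be sketched as follows. The sketch assumes a simple mismatch-count (Hamming) cost, which is only one possible similarity cost function; the paper's three cost variants and its outlier elimination are not specified in the abstract and are not reproduced. The function name and data layout are hypothetical.

```python
def estimate_travel_time(upstream, downstream_group):
    """Estimate travel time by matching vehicle-type sequences.

    upstream:         list of (timestamp, vehicle_type) seen at the origin
    downstream_group: list of (timestamp, vehicle_type) seen at the end,
                      treated as one fixed-length sequence group
    Assumption: similarity cost = number of mismatching vehicle types.
    Returns (mean_travel_time, mismatch_cost) for the best-matching window.
    """
    length = len(downstream_group)
    if length == 0 or len(upstream) < length:
        raise ValueError("upstream must be at least as long as the group")
    target = [vtype for _, vtype in downstream_group]

    best_start, best_cost = 0, None
    for start in range(len(upstream) - length + 1):
        cost = sum(1 for i in range(length)
                   if upstream[start + i][1] != target[i])
        if best_cost is None or cost < best_cost:
            best_start, best_cost = start, cost

    # Average the per-vehicle time differences over the matched window.
    diffs = [downstream_group[i][0] - upstream[best_start + i][0]
             for i in range(length)]
    return sum(diffs) / length, best_cost
```

With an upstream log of car, truck, car, bus, car, truck, truck and a downstream group of car, bus, car, the only zero-cost window is the third one, so the travel time is averaged over those three vehicles.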

A Single Index Approach for Subsequence Matching that Supports Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 단일 색인을 사용한 정규화 변환 지원 서브시퀀스 매칭)

Moon Yang-Sae;Kim Jin-Ho;Loh Woong-Kee
- The KIPS Transactions:PartD
- /
- v.13D no.4 s.107
- /
- pp.513-524
- /
- 2006
Normalization transform is very useful for finding the overall trend of time-series data since it enables finding sequences with similar fluctuation patterns. The previous subsequence matching method with normalization transform, however, incurs index overhead in both storage space and update maintenance, since it must build multiple indexes to support query sequences of arbitrary length. To solve this problem, we propose a single index approach for normalization-transformed subsequence matching that supports query sequences of arbitrary length. For the single index approach, we first introduce the notion of inclusion-normalization transform by generalizing the original definition of normalization transform. The inclusion-normalization transform normalizes a window by using the mean and the standard deviation of a subsequence that includes the window. Next, we formally prove the correctness of the proposed method, which uses the inclusion-normalization transform for normalization-transformed subsequence matching. We then propose subsequence matching and index building algorithms to implement the proposed method. Experimental results for real stock data show that our method improves performance by up to $2.5{\sim}2.8$ times over the previous method. Our approach has the additional advantage that it can be generalized to support many other transforms besides normalization transform. Therefore, we believe our work can be widely used in many kinds of transform-based subsequence matching methods.
https://doi.org/10.3745/KIPSTD.2006.13D.4.513
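The definition of the inclusion-normalization transform given in the abstract above can be written out as a short sketch. The function names, the index-based interface, and the use of the population standard deviation are assumptions; the paper's index and matching algorithms are not shown.

```python
from statistics import fmean, pstdev


def normalize(seq):
    """Ordinary normalization: subtract the mean, divide by the std."""
    mu, sigma = fmean(seq), pstdev(seq)
    if sigma == 0:
        raise ValueError("constant sequence cannot be normalized")
    return [(x - mu) / sigma for x in seq]


def inclusion_normalize(seq, win_start, win_len, sub_start, sub_len):
    """Normalize a window with the statistics of an enclosing subsequence.

    The window seq[win_start : win_start + win_len] must lie inside the
    subsequence seq[sub_start : sub_start + sub_len].
    Assumption: population standard deviation (pstdev) is used.
    """
    if not (sub_start <= win_start
            and win_start + win_len <= sub_start + sub_len):
        raise ValueError("subsequence must include the window")
    sub = seq[sub_start:sub_start + sub_len]
    mu, sigma = fmean(sub), pstdev(sub)
    if sigma == 0:
        raise ValueError("constant subsequence cannot be normalized")
    window = seq[win_start:win_start + win_len]
    return [(x - mu) / sigma for x in window]
```

When the window and the enclosing subsequence coincide, the result equals ordinary normalization, which is the sense in which the transform generalizes the original definition.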

Generalization of Window Construction for Subsequence Matching in Time-Series Databases (시계열 데이터베이스에서의 서브시퀀스 매칭을 위한 윈도우 구성의 일반화)

Moon, Yang-Sae;Han, Wook-Shin;Whang, Kyu-Young
- Journal of KIISE:Databases
- /
- v.28 no.3
- /
- pp.357-372
- /
- 2001
In this paper, we present the concept of generalization in constructing windows for subsequence matching and propose a new subsequence matching method, GeneralMatch, based on this generalization. The earlier work of Faloutsos et al. (FRM for short) causes many false alarms because it lacks the point-filtering effect. DualMatch, previously proposed by the authors, improves performance significantly over FRM by exploiting the point-filtering effect, but its maximum window size is smaller (half that of FRM) for a given minimum query length. GeneralMatch, an improvement of DualMatch, offers the advantages of both methods: it can use large windows like FRM and, at the same time, can exploit the point-filtering effect like DualMatch. GeneralMatch divides data sequences into J-sliding windows (generalized sliding windows) and the query sequence into J-disjoint windows (generalized disjoint windows). We formally prove that GeneralMatch is correct, i.e., it incurs no false dismissal. We also prove that, given the minimum query length, there is a maximum bound on the window size that guarantees the correctness of GeneralMatch. We then propose a method of determining the value of J that minimizes the number of page accesses. Experimental results for real stock data show that, for low selectivities ($10^{-6}{\sim}10^{-4}$), GeneralMatch improves performance by 114% over DualMatch and by 998% over FRM on average; for high selectivities ($10^{-6}{\sim}10^{-4}$), by 46% over DualMatch and by 65% over FRM on average.

PRML detection using the patterns of run-length limited codes (런-길이 제한 코드의 패턴을 이용한 PRML 검출 방법)

Lee Joo hyun;Lee Jae jin
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.30 no.3C
- /
- pp.77-82
- /
- 2005
Partial response maximum likelihood (PRML) detection using the Viterbi algorithm involves the calculation of likelihood metrics that determine the most likely sequence of decoded data. In general, it is assumed that the branches at each node in the trellis diagram have the same probability. If a modulation code with minimum and maximum run-length constraints is used, the occurrence ratio (Ro) of each particular pattern is different, and therefore the assumption does not hold. We present a scheme for calculating the likelihood metrics for PRML detection using the occurrence ratio. In simulation, we tested two (1,7) run-length-limited codes and calculated the occurrence ratios as the order of the PR target was changed. We find that PRML detection using the occurrence ratio provides a gain of more than about 0.5 dB over conventional PRML detection at a BER of $10^{-5}$ in high-density magnetic recording and optical recording channels.

Multipath Diversity Reception of Noncoherent FSK DS/SSMA Communications (Noncoherent FSK DS/SSMA 통신의 다중 경로 다이버시티 수신 특성)

안재영;이재경;황금찬
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.16 no.7
- /
- pp.663-679
- /
- 1991
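This paper evaluates the average error rate of multipath diversity reception noncoherent FSK DS/SSMA communication systems that adopt M-ary signaling and switched signaling to overcome the intersymbol interference that can arise in a multipath fading channel when the maximum multipath delay spread exceeds one bit duration. Using the Gaussian approximation, the average error rate of the system is expressed in terms of channel parameters and system parameters such as the length of the PN sequence, and the resulting expressions are used to numerically analyze the average error rate of the M-ary FSK system and of FSK systems with two types of switched receivers.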

Low Complexity Noise Predictive Maximum Likelihood Detection Method for High Density Perpendicular Magnetic Recording (고밀도 수직자기기록을 위한 저복잡도 잡음 예측 최대 유사도 검출 방법)

김성환;이주현;이재진
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.27 no.6A
- /
- pp.562-567
- /
- 2002
A noise predictive maximum likelihood (NPML) detector embeds a noise prediction/whitening process in the branch metric calculation of the Viterbi detector and improves the reliability of the branch metric computation. Therefore, a PRML detector with a noise predictor achieves some performance improvement and has the advantage of low complexity. This paper shows that an NP(1221)ML system using a noise-predicted PR-equalized signal has lower complexity and better performance than the higher-order PR(12321)ML system in high-density perpendicular magnetic recording. The simulation results are evaluated using (1) a random sequence and (2) a run-length-limited (1,7) sequence, applied to a linear channel and a nonlinear channel with normalized linear density $1.0{\leq}K_p{\leq}3.0$.

An X-masking Scheme for Logic Built-In Self-Test Using a Phase-Shifting Network (위상천이 네트워크를 사용한 X-마스크 기법)

Song, Dong-Sup;Kang, Sung-Ho
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.44 no.2
- /
- pp.127-138
- /
- 2007
In this paper, we propose a new X-masking scheme for logic built-in self-test. The new scheme exploits a phase-shifting network based on the shift-and-add property of maximum length pseudorandom binary sequences (m-sequences). The phase-shifting network generates mask patterns for multiple scan chains by appropriately shifting the m-sequence of an LFSR. The number of shifts required to generate each scan chain mask pattern can be dynamically reconfigured during a test session. An iterative simulation procedure to synthesize the phase-shifting network is proposed. Because the number of phase-shift candidates that can generate a scan chain mask pattern is very large, the proposed X-masking scheme reduces the hardware overhead efficiently. Experimental results demonstrate that the proposed X-masking technique requires less storage and hardware overhead than the conventional methods.
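The shift-and-add property that the abstract above relies on can be checked with a short sketch: XORing an m-sequence with a cyclically shifted copy of itself yields another cyclic shift of the same sequence. The LFSR taps and helper names are assumptions for illustration; the paper's phase-shifting network synthesis procedure is not reproduced.

```python
def lfsr_m_sequence(taps, nbits):
    """One period of a Fibonacci LFSR output (taps are 1-indexed stages).

    Assumption: taps=[5, 3] with nbits=5 is used as an example of a
    maximal length configuration (period 31).
    """
    state = [1] * nbits
    seq = []
    for _ in range(2 ** nbits - 1):
        seq.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return seq


def cyclic_shift(seq, k):
    """Rotate the sequence left by k positions."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def find_phase(seq, candidate):
    """Return k such that cyclic_shift(seq, k) == candidate, else None."""
    for k in range(len(seq)):
        if cyclic_shift(seq, k) == candidate:
            return k
    return None


def shift_and_add(seq, k):
    """XOR the sequence with its k-shifted copy and report the new phase."""
    mixed = [a ^ b for a, b in zip(seq, cyclic_shift(seq, k))]
    return find_phase(seq, mixed)
```

Because every nonzero shift maps to some other phase, a small XOR network on LFSR stages is enough to produce differently shifted copies of the sequence, one per scan chain.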

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

Kang, Tae-Ho;Yoo, Jae-Soo
- The KIPS Transactions:PartD
- /
- v.15D no.2
- /
- pp.155-162
- /
- 2008
Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed the efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands the sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed based on the idea of the prefixSpan algorithm to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
https://doi.org/10.3745/KIPSTD.2008.15-D.2.155
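To make the target of the abstract above concrete, the sketch below defines maximal frequent contiguous sequences with a naive enumeration. It is a brute-force baseline for small inputs only, not the fixed-length spanning tree method the paper proposes, and the function names and the support definition (number of input sequences containing the pattern) are assumptions.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Map each contiguous substring to its support if it is frequent.

    Assumption: support = number of input sequences that contain the
    substring at least once. Naive enumeration of all substrings.
    """
    containing = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                containing[seq[i:j]].add(idx)
    return {pattern: len(ids) for pattern, ids in containing.items()
            if len(ids) >= min_support}


def maximal_frequent_contiguous(sequences, min_support):
    """Keep frequent patterns not contained in a longer frequent pattern."""
    frequent = frequent_contiguous(sequences, min_support)
    return sorted(p for p in frequent
                  if not any(p != q and p in q for q in frequent))
```

For the inputs `["ACGTAC", "TACGTT", "GACGTA"]` with a minimum support of 3, the maximal frequent contiguous sequences are `ACGT` and `TA`: both occur in all three sequences, and no longer pattern containing either of them does.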
