Search | Korea Science

Comparing Byte Pair Encoding Methods for Korean (음절 단위 및 자모 단위의 Byte Pair Encoding 비교 연구)

Lee, Chanhee;Lee, Dongyub;Hur, YunA;Yang, Kisu;Lim, Heuiseok
- Annual Conference on Human and Language Technology / 2018.10a / pp.291-295 / 2018
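Korean is a strongly agglutinative language. Unlike non-agglutinative languages such as English, the number of eojeols (space-delimited word units) that can be formed grows very large with the number of morphemes, which makes eojeol-level processing very difficult. A preprocessing step that splits eojeols into smaller units is therefore required, and morphological analysis has mainly been used for this purpose. However, morphological analysis systems based on supervised learning require large amounts of training data, while those based on unsupervised learning show a large drop in performance. Byte Pair Encoding is a data compression algorithm; applied to natural language processing, it can split eojeols into smaller units in an unsupervised manner. This study proposes a method for quantitatively and qualitatively analyzing the performance and characteristics of two ways of applying Byte Pair Encoding to Korean: syllable-level processing and jamo-level processing. The method is also applied to the Sejong corpus: eojeol segmentation is run with each algorithm, and the results are compared and analyzed in terms of segmentation accuracy, bias, and deviation.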
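
The difference between syllable-level and jamo-level Byte Pair Encoding can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_jamo` and `learn_bpe` are hypothetical, the merge loop is the generic BPE procedure, and the jamo split relies only on the standard Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3).

```python
from collections import Counter

# Unicode layout of a precomposed Hangul syllable:
# index = (initial * 21 + medial) * 28 + final, with index = code point - 0xAC00
INITIALS = [chr(0x1100 + i) for i in range(19)]
MEDIALS = [chr(0x1161 + i) for i in range(21)]
FINALS = [""] + [chr(0x11A8 + i) for i in range(27)]


def to_jamo(word):
    """Split each precomposed Hangul syllable into conjoining jamo."""
    out = []
    for ch in word:
        idx = ord(ch) - 0xAC00
        if 0 <= idx < 11172:
            out.append(INITIALS[idx // 588])
            out.append(MEDIALS[(idx % 588) // 28])
            if idx % 28:  # final consonant is optional
                out.append(FINALS[idx % 28])
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return out


def learn_bpe(words, num_merges, unit="syllable"):
    """Learn BPE merges from a word list; unit is 'syllable' or 'jamo'."""
    split = to_jamo if unit == "jamo" else list
    vocab = Counter(tuple(split(w)) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the most frequent pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab
```

Calling `learn_bpe(words, n, unit="jamo")` starts from a much smaller symbol inventory than the syllable-level variant, so the first merges tend to rebuild frequent syllables before any multi-syllable unit appears.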

Improvement of multi layer perceptron performance using combination of gradient descent and harmony search for prediction of groundwater level (지하수위 예측을 위한 경사하강법과 화음탐색법의 결합을 이용한 다층퍼셉트론 성능향상)

Lee, Won Jin;Lee, Eui Hoon
- Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.186-186 / 2022
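Predicting the fluctuations in groundwater level caused by precipitation, infiltration, and similar processes is essential for the use and management of groundwater resources. Because these fluctuations directly affect not only the use and management of groundwater resources but also flood occurrence and the stress state of the ground, accurate prediction is needed. To improve groundwater level prediction with a Multi Layer Perceptron (MLP), a type of artificial neural network, this study modified the optimizer within the MLP structure. An MLP learns by repeating the operations of an optimizer, which finds the optimal correlation (weights and biases) between input and output data, and an activation function, which determines the output value. The optimizer in particular is the operator that finds the correlation minimizing the error between the network output and the observed values, so it directly affects the training and prediction performance of the MLP. Existing optimizers are based on Gradient Descent (GD). Because they find the correlation using derivatives, however, they mainly perform local search, and because they have no structure for storing previously generated correlations, they may converge to a local optimum. To overcome these drawbacks, this study used a metaheuristic optimization algorithm that can consider local and global search simultaneously and has a structure for storing existing solutions. Among metaheuristic optimization algorithms, Harmony Search (HS), which has a simple structure, was combined with GD, and the combined model (HS-GD) was used as the MLP optimizer. To evaluate the performance of the MLP with HS-GD, groundwater levels in Icheon were predicted, and the results were compared with those of an MLP using an existing optimizer and an MLP using HS.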
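
The abstract does not say how HS and GD are coupled, so the following is only one plausible hybrid, offered as a sketch: each new harmony is improvised from a stored memory of solutions (the global part) and then refined with a few gradient steps (the local part). The function name `hs_gd`, all parameter names and defaults, and the use of a generic `loss`/`grad` pair in place of an MLP training loss are assumptions.

```python
import random


def hs_gd(loss, grad, dim, hms=8, hmcr=0.9, par=0.3, bw=0.1,
          lr=0.05, gd_steps=5, iters=200, bound=5.0, seed=0):
    """Hypothetical Harmony Search + Gradient Descent hybrid (illustrative only)."""
    rng = random.Random(seed)
    # Harmony memory: a stored population of candidate weight vectors.
    memory = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(hms)]
    scores = [loss(w) for w in memory]
    for _ in range(iters):
        # Improvise a new harmony, one component at a time.
        new = []
        for d in range(dim):
            if rng.random() < hmcr:            # memory consideration
                v = rng.choice(memory)[d]
                if rng.random() < par:         # pitch adjustment
                    v += rng.uniform(-bw, bw)
            else:                              # random (global) selection
                v = rng.uniform(-bound, bound)
            new.append(v)
        # Local refinement with plain gradient descent.
        for _ in range(gd_steps):
            g = grad(new)
            new = [w - lr * gi for w, gi in zip(new, g)]
        # Replace the worst stored harmony if the new one is better.
        s = loss(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:
            memory[worst], scores[worst] = new, s
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]
```

For an MLP, `loss` would be the training error as a function of the flattened weights and biases, and `grad` its backpropagated gradient; a simple quadratic is enough to check that the loop converges.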

Performance Improvement of Adaptive Hierarchical Hexagon Search by Extending the Search Patterns (탐색 패턴 확장에 의한 적응형 계층 육각 탐색의 성능 개선)

Kwak, No-Yoon
- Journal of Digital Contents Society / v.9 no.2 / pp.305-315 / 2008
The previously proposed AHHS (Adaptive Hierarchical Hexagon Search) is a fast hierarchical block matching algorithm based on AHS (Adaptive Hexagon Search). It keeps the main merit of AHS, fast motion vector estimation, while adaptively reducing the local minima that often occur in video sequences with high spatio-temporal motion activity. This paper proposes a method that effectively extends the horizontal biased pattern and the vertical biased pattern of AHHS to improve its predictive image quality. Based on computer simulation results for multiple video sequences with different motion characteristics, the performance of the proposed method was analyzed and assessed in terms of predictive image quality and computational time. The simulation results indicated that the proposed method is suitable for both (quasi-)stationary and large-motion searches. Although extending the hexagon search patterns increased the computational load, the improvement in predictive image quality offset that increase.
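
For readers unfamiliar with hexagon-based block matching, here is a minimal sketch of the plain hexagon search that such methods build on. It does not implement AHHS, its hierarchy, or the extended biased patterns proposed in the paper. The names `hexagon_search` and `best_neighbour`, the `max_range` limit, and the idea of passing the matching error as a `cost(dx, dy)` callable are assumptions made for illustration.

```python
# Large hexagon used for the coarse search, small cross used for the final refinement.
LARGE_HEXAGON = [(2, 0), (-2, 0), (1, 2), (-1, 2), (1, -2), (-1, -2)]
SMALL_PATTERN = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def best_neighbour(cost, centre, best, pattern, max_range):
    """Return (point, cost) of the best in-range pattern point, or the centre itself."""
    cx, cy = centre
    winner = centre
    for dx, dy in pattern:
        x, y = cx + dx, cy + dy
        if max(abs(x), abs(y)) > max_range:
            continue  # stay inside the search window
        c = cost(x, y)
        if c < best:
            winner, best = (x, y), c
    return winner, best


def hexagon_search(cost, max_range=7):
    """Basic hexagon search: cost(dx, dy) is the block matching error for a displacement."""
    centre, best = (0, 0), cost(0, 0)
    # Move the large hexagon until its centre is the best point.
    while True:
        winner, best = best_neighbour(cost, centre, best, LARGE_HEXAGON, max_range)
        if winner == centre:
            break
        centre = winner
    # Refine once with the small pattern around the final centre.
    return best_neighbour(cost, centre, best, SMALL_PATTERN, max_range)
```

In practice `cost` would compute a sum of absolute differences between the current block and the displaced reference block; the search can stall in a local minimum when that error surface is not unimodal, which is the weakness adaptive variants try to reduce.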

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

Choi, Hyun-Hwa;Lee, Kyu-Chul
- The KIPS Transactions:PartD / v.19D no.2 / pp.151-160 / 2012
Most distributed high-dimensional indexing structures provide reasonable search performance, especially when the dataset is uniformly distributed. However, when the dataset is clustered or skewed, search performance gradually degrades compared with the uniformly distributed case. We propose a method for improving the k-nearest neighbor search performance of the distributed vector approximation-tree on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign different numbers of bits to them in order to assure the identification performance of the vector approximation. In other words, more bits are assigned to high-density clusters. We conducted experiments comparing search performance with the distributed hybrid spill-tree and the distributed vector approximation-tree on synthetic and real data sets. The experimental results show that the proposed scheme provides consistent results, with significant performance improvements over the distributed vector approximation-tree for strongly clustered or skewed datasets.
https://doi.org/10.3745/KIPSTD.2012.19D.2.151 KSCI

Multidimensional Ring-Delta Network: A High-Performance Fault-Tolerant Switching Networks (다차원 링-델타 망: 고성능 고장감내 스위칭 망)

Park, Jae-Hyun
- The Journal of Korean Institute of Communications and Information Sciences / v.35 no.1B / pp.1-7 / 2010
In this paper, we propose a high-performance fault-tolerant switching network using deflection self-routing. From an abstract algebraic analysis of the topological properties of the Delta network, which is a baseline switching network, we derive the Multidimensional Ring-Delta network: a multipath switching network using a deflection self-routing algorithm. All links, including the existing links of the Delta network, are used to provide alternate paths that detour around faulty or congested links. We ran a simulation analysis under traffic loads with the non-uniform address distributions that are common on the Internet. The throughput of the proposed $1024\;{\times}\;1024$ switching network is 13.3% better than that of the 2D ring-Banyan network when the input traffic load is 1.0 and the hot ratio is 0.9. The reliability of the proposed $64\;{\times}\;64$ switching network is 46.6% better than that of the 2D ring-Banyan network.
KSCI

A Tuning Algorithm for the Multidimensional Type Inheritance Index of XML Databases (XML 데이터베이스 다차원 타입상속 색인구조의 조율 알고리즘)

Lee, Jong-Hak
- Journal of Korea Multimedia Society / v.14 no.2 / pp.269-281 / 2011
For the MD-TIX (multidimensional type inheritance index), which supports query processing for the type inheritance concept in XML databases, this paper presents an index tuning algorithm that improves XML query processing performance according to the query pattern. The MD-TIX uses a multidimensional index structure to support complex XML queries involving both nested elements and type inheritance hierarchies. The tuning algorithm first determines the shape of the index page regions from information about the user's query pattern, and then constructs an optimal MD-TIX by applying a region splitting strategy that drives the page regions toward the predetermined shape. The performance evaluation results indicate that the proposed tuning algorithm builds an optimal MD-TIX for a given query pattern and that, for three-dimensional query regions for nested predicates of path length 2, performance improves substantially according to the degree of skew in the query region's shape.
https://doi.org/10.9717/kmms.2011.14.2.269 KSCI

A New Adaptive Window Size-based Three Step Search Scheme (적응형 윈도우 크기 기반 NTSS (New Three-Step Search Algorithm) 알고리즘)

Yu Jonghoon;Oh Seoung-Jun;Ahn Chang-bum;Park Ho-Chong
- Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.1 s.307 / pp.75-84 / 2006
By exploiting the center-biased characteristic of motion vectors, NTSS (New Three-Step Search Algorithm) can improve on the performance of TSS (Three-Step Search Algorithm), one of the most popular fast block matching algorithms (BMA) for finding motion vectors in a video sequence. Although NTSS generally gives better quality than TSS for small-motion sequences, it cannot be said to do so for large-motion sequences. Increasing the search window size with NTSS can even degrade quality. To address this drawback, this paper develops a new adaptive window size-based three-step search scheme, called AWTSS, which improves quality at various window sizes for both small- and large-motion video sequences. In this scheme, the search window size is changed dynamically according to the characteristics of the motion vectors in order to improve coding efficiency. AWTSS improves video quality by more than 0.5dB for large motion while keeping the same quality for small motion.
KSCI
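
As background for the abstract above, the classic three-step search (TSS) that NTSS and AWTSS refine can be sketched as follows. This is the textbook baseline only, not NTSS or the proposed AWTSS; the function names, the list-of-lists frame representation, and the default block size are assumptions. The `step` argument plays the role of the search window size that AWTSS is said to adapt (a start step of 4 covers displacements up to ±7).

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy); infinite if out of frame."""
    h, w = len(ref), len(ref[0])
    x0, y0 = bx + dx, by + dy
    if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
        return float("inf")
    return sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
               for j in range(n) for i in range(n))


def three_step_search(cur, ref, bx, by, n=4, step=4):
    """Classic TSS: test 9 points around the centre, move to the best, halve the step."""
    cx = cy = 0
    best = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cand = (cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                c = sad(cur, ref, bx, by, cx + dx, cy + dy, n)
                if c < best:
                    best, cand = c, (cx + dx, cy + dy)
        cx, cy = cand
        step //= 2
    return (cx, cy), best
```

Because the first step samples only points 4 pixels apart, TSS can miss small motions near the centre, which is the center-bias weakness NTSS targets.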

Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM (SMOTE와 Light GBM 기반의 불균형 데이터 개선 기법)

Young-Jin, Han;In-Whee, Joe
- KIPS Transactions on Computer and Communication Systems / v.11 no.12 / pp.445-452 / 2022
The class distribution of imbalanced data is an important issue in the digital world and a significant part of cybersecurity. Abnormal activity in imbalanced data must be detected and the resulting problems solved. Although a system capable of tracking patterns in all transactions is needed, machine learning on imbalanced data, which typically contains abnormal patterns, can ignore minority classes and perform poorly on them, and predictive models can be inaccurately biased. In this paper, as an approach to imbalanced datasets, we predict target variables and improve accuracy using a combination of the Synthetic Minority Oversampling Technique (SMOTE) and the Light GBM algorithm. Experimental results were compared with the logistic regression, decision tree, KNN, Random Forest, and XGBoost algorithms. Performance was similar in accuracy and recall, but in precision Random Forest scored 80.76% and Light GBM 97.16%, and in F1-score Random Forest scored 84.67% and Light GBM 91.96%. The experiment confirmed that Light GBM's performance was either on par with the five algorithms or up to 16% better.
https://doi.org/10.3745/KTCCS.2022.11.12.445 KSCI
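
The oversampling step named in the abstract can be sketched in a few lines. This is a simplified, dependency-free illustration of the SMOTE interpolation idea, not the authors' pipeline: the function name `smote`, its parameters, and the brute-force neighbour search are assumptions, and the Light GBM training stage is not shown.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two feature vectors (lists of floats)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Brute-force k nearest neighbours by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

The synthetic rows would be appended to the training set only (never the test set) before fitting a classifier; with the `lightgbm` package, that classifier would typically be `lightgbm.LGBMClassifier`.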

Automatic Text Categorization Using Hybrid Multiple Model Schemes (하이브리드 다중모델 학습기법을 이용한 자동 문서 분류)

명순희;김인철
- Journal of the Korean Society for information Management / v.19 no.4 / pp.35-51 / 2002
Inductive learning and classification techniques have been employed in a range of research and applications that organize textual data to solve the problem of information access. In this study, we develop hybrid model combination methods that incorporate the concepts and techniques of multiple modeling algorithms to improve the accuracy of text classification, and we conduct experiments to evaluate the performance of the proposed schemes. Boosted stacking, one of the extended stacking schemes proposed in this study, yields higher accuracy than the conventional model combination methods and single classifiers.
https://doi.org/10.3743/KOSIM.2002.19.4.035

A Design for Reduced-Order Observer Based Optimal Regulator in the Discrete System (이산형 시스템에서의 최소차수의 관측자를 이용한 최적 레귤레이터의 개발)

김한실
- Journal of the Korean Institute of Telematics and Electronics S / v.36S no.3 / pp.47-56 / 1999
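Much research has addressed the control problem of reaching a desired target using only limited outputs, that is, output values measured with error. Designing such a controller often leads to a Non Linear Two Point Boundary Value Problem, which is difficult to solve. In particular, the reduced-order estimator algorithm was developed to estimate not only the measured states but also the auxiliary states of a linear system affected by white noise. In designing the estimator, the state estimate is unbiased, and the estimator error is orthogonal to the subspace of all outputs for the states common to the estimator and the estimated states. The reduced-order filter is suboptimal relative to the full-order filter, but it has the advantage of less computation time and lower memory use when solving the corresponding Riccati equation. This paper uses Kronecker algebra and a selection matrix to transform the Non Linear Two Point Boundary Value Problem into a Linear Two Point Boundary Value Problem and derives the accompanying algebraic Riccati equation, which makes the problem easier to solve.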

