Search | Korea Science

Song, Yeongkil;Jeong, Seokwon;Kim, Harksoo
- Journal of KIISE
- /
- v.42 no.11
- /
- pp.1397-1403
- /
- 2015
A named entity(NE) dictionary is an important resource for the performance of NE recognition. However, it is not easy to construct a NE dictionary manually since human annotation is time consuming and labor-intensive. To save construction time and reduce human labor, we propose a semi-automatic system for the construction of a NE dictionary. The proposed system constructs a pseudo-document with Wiki-categories per NE class by using an active learning technique. Then, it calculates similarities between Wiki entries and pseudo-documents using the BM25 model, a well-known information retrieval model. Finally, it classifies each Wiki entry into NE classes based on similarities. In experiments with three different types of NE class sets, the proposed system showed high performance(macro-average F1-score of 0.9028 and micro-average F1-score 0.9554).
https://doi.org/10.5626/JOK.2015.42.11.1397 인용 KSCI

Jeong, Do-Heon
- Journal of the Korean Society for information Management
- /
- v.36 no.4
- /
- pp.83-105
- /
- 2019
This study aims to suggest an effective method for the automatic classification of keywords with similar patterns by calculating pattern similarity of temporal data. For this, large scale news on the Web were collected and time series data composed of 120 time segments were built. To make training data set for the performance test of the proposed model, 440 representative keywords were manually classified according to 8 types of trend. This study introduces a Dynamic Time Warping(DTW) method which have been commonly used in the field of time series analytics, and proposes an application model, MA-DTW based on a Moving Average(MA) method which gives a good explanation on a tendency of trend curve. As a result of the automatic classification by a k-Nearest Neighbor(kNN) algorithm, Euclidean Distance(ED) and DTW showed 48.2% and 66.6% of maximum micro-averaged F1 score respectively, whereas the proposed model represented 74.3% of the best micro-averaged F1 score. In all respect of the comprehensive experiments, the suggested model outperformed the methods of ED and DTW.
https://doi.org/10.3743/KOSIM.2019.36.4.083 인용 PDF KSCI

Lim, Geun-Young;Cho, Young-Bok
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.23 no.5
- /
- pp.533-539
- /
- 2019
This study proposes a malware classification model that can handle arbitrary length input data using the Microsoft Malware Classification Challenge dataset. We are based on imaging existing data from malware. The proposed model generates a lot of images when malware data is large, and generates a small image of small data. The generated image is learned as time series data by Dynamic RNN. The output value of the RNN is classified into malware by using only the highest weighted output by applying the Attention technique, and learning the RNN output value by Residual CNN again. Experiments on the proposed model showed a Micro-average F1 score of 92% in the validation data set. Experimental results show that the performance of a model capable of learning and classifying arbitrary length data can be verified without special feature extraction and dimension reduction.
https://doi.org/10.6109/jkiice.2019.23.5.533 인용 PDF KSCI HTML