• Title/Summary/Keyword: Dictionary Learning

How to Use Effective Dictionary Feature for Deep Learning based Named Entity Recognition (딥러닝 기반의 개체명 인식을 위한 효과적인 사전 자질 사용 방법)

  • Kim, Hong-Jin; Kim, Hark-Soo
    • Annual Conference on Human and Language Technology / 2019.10a / pp.293-296 / 2019
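  • Named entity recognition (NER) is the task of finding words with a distinct meaning in an input sentence, such as person names, place names, organization names, dates, and times, and attaching named-entity tags to them. Recent research on named entity recognizers has mainly used morpheme-level or syllable-level input. However, morpheme-level NER cannot handle out-of-vocabulary words, and syllable-level NER does not properly reflect word meaning. To address these problems, this paper proposes a syllable-level named entity recognizer that uses part-of-speech information. We also propose a way to use named-entity dictionary features, which strongly affect NER performance, more effectively; with this method, NER performance improved over the existing method (F1-score 0.8576).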
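
The abstract does not say how the named-entity dictionary feature is encoded. The sketch below assumes one common scheme: each syllable (character) receives a BIO tag from greedy longest-match lookup in the dictionary, and those tags could then be embedded alongside syllable and part-of-speech inputs. The function `dictionary_features` and the toy dictionary are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List


def dictionary_features(sentence: str, entity_dict: Dict[str, str]) -> List[str]:
    """Hypothetical: tag each syllable with a BIO label via greedy longest-match lookup."""
    max_len = max((len(k) for k in entity_dict), default=0)
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        match_len = 0
        # Try the longest candidate span starting at position i first.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in entity_dict:
                match_len = length
                break
        if match_len:
            label = entity_dict[sentence[i:i + match_len]]
            tags[i] = "B-" + label
            for j in range(i + 1, i + match_len):
                tags[j] = "I-" + label
            i += match_len  # skip past the matched entry
        else:
            i += 1  # no dictionary entry starts here
    return tags


toy_dict = {"김철수": "PER", "서울": "LOC"}  # hypothetical dictionary
print(dictionary_features("김철수는서울에산다", toy_dict))
```

Longest match is used so that a longer entry (for example a full organization name) wins over a shorter entry nested inside it.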

Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk; Leem, Choon Seong
    • The Journal of Bigdata / v.4 no.2 / pp.23-34 / 2019
  • Recently, crime that exploits digital platforms has been increasing continuously: about 140,000 cases occurred in 2015 and about 150,000 in 2016. There are therefore limits to handling these online crimes with conventional investigation techniques. The manual online searches and cognitive investigation methods widely used by investigators today are not sufficient to proactively cope with rapidly changing civil crimes. In addition, the characteristics of content posted to unspecified users of social media make investigations more difficult. Considering the characteristics of the online media where infringement crimes occur, this study proposes site-based collection and Open API collection among web content collection methods. Because illegal content is published and deleted quickly, and new words and variants appear quickly and in many forms, such content is hard to recognize promptly with dictionary-based morphological analysis that relies on manually registered entries. To solve this problem, we propose a tokenizing method that supplements existing dictionary-based morphological analysis with WPM (Word Piece Model), a data preprocessing method for quickly recognizing and responding to illegal content posted in online infringement crimes. For data analysis, a classification model based on supervised learning is applied to the investigation of illegal content, and the optimal precision is verified with a vote-based ensemble method. This study applies a classification algorithm model to illegal multilevel business cases in order to proactively recognize crimes that harm the public economy, and presents an empirical study on effectively handling social data collection and content investigation.
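
To show how a word piece model can split a newly coined word that a hand-built dictionary would miss, here is a minimal sketch of greedy longest-match-first WordPiece tokenization (the procedure used by BERT-style tokenizers). How the paper trains its vocabulary and combines it with the morphological analyzer is not shown; the function `wordpiece_tokenize` and the toy vocabulary (including the compound `다단계수익보장`) are hypothetical.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece; non-initial pieces carry a '##' prefix."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid segmentation: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"다단계", "##수익", "##보장", "un", "##aff", "##able"}  # hypothetical
print(wordpiece_tokenize("다단계수익보장", toy_vocab))
```

Because unseen compounds decompose into known sub-word pieces, a classifier downstream still receives informative tokens even when the full word was never registered.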

The Prediction of Cryptocurrency on Using Text Mining and Deep Learning Techniques : Comparison of Korean and USA Market (텍스트 마이닝과 딥러닝을 활용한 암호화폐 가격 예측 : 한국과 미국시장 비교)

  • Won, Jonggwan; Hong, Taeho
    • Knowledge Management Research / v.22 no.2 / pp.1-17 / 2021
  • In this study, we predicted the bitcoin prices of Bithumb and Coinbase, leading exchanges in Korea and the USA, using ARIMA and Recurrent Neural Networks (RNNs). We also used news articles from each country to propose a separate RNN model. The proposed model divides the training data into datasets according to changes in the price trend and then applies a time series prediction technique (RNNs) to create multiple models. We then used daily news data to create a term-based dictionary for each trend change point. We explored trend change points in the test data using the daily news keywords of the test set and the term-based dictionary, and applied a matching model to produce prediction results. With this approach we obtained higher accuracy than a model that predicted prices with a time series prediction technique alone. This study shows that the limitations of time series prediction techniques can be overcome by exploring trend change points with news data, and that further research could combine various time series prediction techniques with text mining to improve model performance.

A Study on Teaching and Learning Strategies to Enhance Information Utilization of North Korean Defectors (북한이탈주민의 정보 활용 강화를 위한 교수학습 전략 연구)

  • Lee, Sunhee; Byun, Hoseung
    • The Journal of Korean Association of Computer Education / v.23 no.2 / pp.73-82 / 2020
  • The main purpose of this study was to analyze the general informatization situation of North Korean defectors and to study the characteristics and needs of these learners in order to set directions for their information education. The results showed the following characteristics of North Korean defectors: they learn slowly because they fear new devices, they have difficulty learning because information-technology terminology and English are unfamiliar, and they are indifferent to situations not related to themselves. Based on these learner characteristics and needs, this study suggests strategies of step-by-step repetition, use of a North-South Korean dictionary of information terminology, and job-centered, communication-focused application, and proposes a four-element STEP model. Raising the level of informatization of North Korean defectors will help them settle successfully in South Korea, and will be a valuable foundation and stepping stone for the future unification of Korea.

Development of Multimedia Database for Earth Science Learning (지구과학 학습을 위한 멀티미디어 학습 자료 데이터베이스 개발)

  • Lee, Won-Kook; Kim, Yeo-Sang; Kim, Chil-Young; Kim, Jong-Hun; Kim, Hee-Soo
    • Journal of the Korean earth science society / v.21 no.2 / pp.116-127 / 2000
  • This study aims to develop a multimedia learning program for earth science in middle and high school. The program is written in HTML and includes a variety of texts, graphs, pictures, drawings, animations, and moving-image materials. It is composed of six database elements (learning context, terminology dictionary, practical science, inquiry activity, image material, and test items). Students and teachers who used the program responded positively. The program is offered on a website run by the Institute of Science Education of Kongju National University.

Design and Implementation of Educational Information Sharing Systems using Bookmark (즐겨찾기를 이용한 교육용 정보공유시스템의 설계 및 구현)

  • Han, Sun-Gwan
    • The Journal of Korean Association of Computer Education / v.7 no.6 / pp.77-84 / 2004
  • This study proposes an agent system for sharing educational information using bookmarks. To search and share educational information effectively, we designed bookmark information in DAML+OIL format. The proposed system uses a P2P structure based on the client-server model. We implemented a bookmark agent with intelligent characteristics: automatic categorization of peers and documents, autonomous communication between agents using DAML, and fine-grained information search using an ontology dictionary in a Semantic Web environment. This study is expected to help activate the sharing and searching of educational information, and the proposed system offers important technologies for SCORM-based e-learning environments.

Unsupervised Semantic Role Labeling for Korean Adverbial Case (비지도 학습을 기반으로 한 한국어 부사격의 의미역 결정)

  • Kim, Byoung-Soo; Lee, Yong-Hun; Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.34 no.2 / pp.112-122 / 2007
  • Training a statistical model for semantic role labeling requires a large manually tagged corpus. However, no such corpus exists for Korean, and constructing one from scratch is a very long and tedious job. This paper proposes a modified self-training algorithm, an unsupervised algorithm, which trains a semantic role labeling model from any raw corpus. For initial training, a small tagged corpus is automatically constructed from case frames in the Sejong Electronic Dictionary. Using this corpus, a probabilistic model is trained incrementally, achieving 83.00% accuracy on 4 selected adverbial cases.
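
The self-training loop itself can be sketched generically: train on a small seed set, label the raw pool, keep only confident predictions, and retrain until nothing new is added. The paper's probabilistic model and its modifications are not reproduced; the count-based classifier, the feature tuples such as `("n:학교", "v:가다")`, the role labels, and the `threshold` value are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Instance = Tuple[str, ...]  # hypothetical feature tuple, e.g. ("n:학교", "v:가다")
Labeled = Tuple[Instance, str]


def train(labeled: Iterable[Labeled]) -> Dict[str, Counter]:
    """Count how often each feature co-occurs with each role label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for feats, label in labeled:
        for f in feats:
            counts[f][label] += 1
    return counts


def predict(counts: Dict[str, Counter], feats: Instance) -> Tuple[Optional[str], float]:
    """Sum feature votes; confidence is the winning label's share of all votes."""
    votes: Counter = Counter()
    for f in feats:
        votes.update(counts.get(f, Counter()))
    if not votes:
        return None, 0.0
    label, top = votes.most_common(1)[0]
    return label, top / sum(votes.values())


def self_train(seed: List[Labeled], unlabeled: List[Instance],
               threshold: float = 0.8, max_rounds: int = 10):
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        newly, remaining = [], []
        for feats in pool:
            label, conf = predict(model, feats)
            if label is not None and conf >= threshold:
                newly.append((feats, label))  # confident: add as pseudo-label
            else:
                remaining.append(feats)       # not confident: keep for later rounds
        if not newly:
            break
        labeled.extend(newly)
        pool = remaining
    return train(labeled), labeled
```

The confidence threshold is the key design choice: too low and early mistakes are amplified in later rounds, too high and the labeled set barely grows.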

Neural-network-based Impulse Noise Removal Using Group-based Weighted Couple Sparse Representation

  • Lee, Yongwoo; Bui, Toan Duc; Shin, Jitae; Oh, Byung Tae
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.3873-3887 / 2018
  • In this paper, we propose a novel method to recover images corrupted by impulse noise. The proposed method has two stages: noise detection and filtering. In the first stage, we use pixel values, rank-ordered logarithmic difference values, and median values to train a neural-network-based impulse noise detector. After training, we apply the network to detect noisy pixels in images. In the second stage, we use group-based weighted couple sparse representation to filter the noisy pixels. In this stage, conventional methods generally use only clean pixels to recover corrupted pixels, which can make dictionary learning fail when the noise density is high and too few useful clean pixels remain. We therefore use reconstructed pixels to compensate for this deficiency. Experimental results show that the proposed noise detector performs better than conventional noise detectors. Also, given the locations of noisy pixels, the proposed impulse noise removal method outperforms conventional methods, producing recovered images of better quality.
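
The detector's three per-pixel inputs can be sketched as a feature extractor; the neural network that consumes them is not shown. The rank-ordered logarithmic difference (ROLD) below follows a commonly cited form: a truncated log difference to each 3x3 neighbour, summed over the m smallest values. The constants a = 2, b = 5, m = 4, intensities scaled to [0, 1], and using only in-bounds neighbours at the border are assumptions, not details taken from the paper.

```python
import math
import statistics
from typing import List, Tuple


def log_difference(u: float, v: float, a: float = 2.0, b: float = 5.0) -> float:
    """Truncated log difference mapped to [0, 1]; assumes intensities in [0, 1]."""
    diff = abs(u - v)
    log_val = -b if diff == 0 else max(math.log(diff, a), -b)
    return 1.0 + log_val / b


def pixel_features(img: List[List[float]], m: int = 4) -> List[List[Tuple[float, float, float]]]:
    """Return (pixel value, ROLD, 3x3 median) for every pixel."""
    h, w = len(img), len(img[0])
    feats = []
    for i in range(h):
        row = []
        for j in range(w):
            # In-bounds 3x3 neighbours, excluding the centre pixel.
            neighbours = [
                img[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
                if (y, x) != (i, j)
            ]
            diffs = sorted(log_difference(img[i][j], n) for n in neighbours)
            rold = sum(diffs[:m])  # sum of the m smallest differences
            median = statistics.median(neighbours + [img[i][j]])
            row.append((img[i][j], rold, median))
        feats.append(row)
    return feats
```

An isolated impulse differs from all its neighbours, so even its m smallest differences are large, while a clean pixel next to the impulse still has many near-zero differences and scores low.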

Destripe Hyperspectral Images with Spectral-spatial Adaptive Unidirectional Variation and Sparse Representation

  • Zhou, Dabiao; Wang, Dejiang; Huo, Lijun; Jia, Ping
    • Journal of the Optical Society of Korea / v.20 no.6 / pp.752-761 / 2016
  • Hyperspectral images are often contaminated with stripe noise, which severely degrades imaging quality and the precision of subsequent processing. In this paper, a variational model is proposed that employs spectral-spatial adaptive unidirectional variation and a sparse representation. Unlike traditional methods, we exploit spectral correlation and remove stripes in different bands and different regions adaptively, instead of selecting parameters band by band. The regularization strength adapts to the spectrally varying stripe intensities and the spatially varying texture information. Spectral correlation is exploited via dictionary learning in the sparse representation framework to prevent spectral distortion. The resulting minimization problem, which contains two nonsmooth and inseparable $l_1$-norm terms, is optimized with the split Bregman approach. Experimental results on datasets from several imaging systems demonstrate that the proposed method removes stripe noise effectively and adaptively while preserving original detail information.
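
The full destriping model (adaptive unidirectional variation plus a dictionary-based sparse term) is not reproduced here. As a minimal sketch of how the split Bregman approach handles a nonsmooth $l_1$ term, the block solves 1-D total-variation denoising, min over u of (mu/2)||u - f||^2 + ||Du||_1 with D the forward difference: the quadratic u-step is a tridiagonal solve, and the $l_1$ sub-problem has a closed-form soft-thresholding solution. The parameter values `mu`, `lam`, and `iters` are illustrative assumptions.

```python
from typing import List


def shrink(x: float, gamma: float) -> float:
    """Soft-thresholding: closed-form minimiser of the l1 sub-problem."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def solve_tridiagonal(sub: List[float], diag: List[float],
                      sup: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm; sub and sup have length n - 1."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i - 1] * c[i - 1]
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tv_denoise_split_bregman(f: List[float], mu: float = 10.0,
                             lam: float = 5.0, iters: int = 200) -> List[float]:
    n = len(f)
    if n < 2:
        return list(f)
    d = [0.0] * (n - 1)  # auxiliary variable, d ~ Du
    b = [0.0] * (n - 1)  # Bregman variable
    # mu*I + lam*D^T D is tridiagonal and fixed across iterations.
    diag = [mu + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = [-lam] * (n - 1)
    u = list(f)
    for _ in range(iters):
        v = [d[i] - b[i] for i in range(n - 1)]
        dtv = [0.0] * n  # D^T v
        for i in range(n - 1):
            dtv[i] -= v[i]
            dtv[i + 1] += v[i]
        u = solve_tridiagonal(off, diag, off, [mu * f[i] + lam * dtv[i] for i in range(n)])
        du = [u[i + 1] - u[i] for i in range(n - 1)]
        d = [shrink(du[i] + b[i], 1.0 / lam) for i in range(n - 1)]
        b = [b[i] + du[i] - d[i] for i in range(n - 1)]
    return u
```

Splitting d = Du is what makes the $l_1$ term tractable: each iteration alternates an easy smooth solve for u with an elementwise shrink for d, and the Bregman variable b enforces the constraint over time.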

Parting Lyrics Emotion Classification using Word2Vec and LSTM (Word2Vec과 LSTM을 활용한 이별 가사 감정 분류)

  • Lim, Myung Jin; Park, Won Ho; Shin, Ju Hyun
    • Smart Media Journal / v.9 no.3 / pp.90-97 / 2020
  • With the development of the Internet and smartphones, digital music has become easily accessible, and interest in music search and recommendation is growing accordingly. For music recommendation, research that classifies genres or emotions using melodic features such as pitch, tempo, and beat is being conducted. However, lyrics are becoming one of the means of expressing human emotion in music, so their role is increasing and emotion classification based on lyrics is needed. In this paper, we therefore analyze the emotions in parting (farewell) lyrics in order to subdivide parting emotions based on the lyrics. We construct an emotion dictionary by vectorizing the similarity between words that appear in parting lyrics through Word2Vec training, and then propose a method that classifies parting lyrics by similar emotions by training an LSTM on the lyrics.
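
The dictionary-construction step can be sketched as assigning each lyric word to the emotion whose seed word is most cosine-similar in the embedding space, keeping only matches above a threshold. Word2Vec training and the LSTM classifier are not shown; the toy 3-dimensional vectors stand in for trained Word2Vec embeddings, and the seed words, emotion names, `threshold`, and function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_emotion_dictionary(vectors: Dict[str, Vector],
                             seeds: Dict[str, List[str]],
                             threshold: float = 0.7) -> Dict[str, str]:
    """Map each word to the emotion of its most similar seed word, if similar enough."""
    lexicon: Dict[str, str] = {}
    for word, vec in vectors.items():
        best, best_sim = None, threshold
        for emotion, seed_words in seeds.items():
            for s in seed_words:
                if s in vectors:
                    sim = cosine(vec, vectors[s])
                    if sim >= best_sim:
                        best, best_sim = emotion, sim
        if best is not None:
            lexicon[word] = best
    return lexicon


toy_vectors = {  # hypothetical stand-ins for trained Word2Vec vectors
    "tears": [0.9, 0.1, 0.0], "cry": [0.8, 0.2, 0.1],
    "miss": [0.1, 0.9, 0.1], "memories": [0.2, 0.8, 0.3],
    "road": [0.1, 0.1, 0.9],
}
seeds = {"sadness": ["tears"], "longing": ["miss"]}
print(build_emotion_dictionary(toy_vectors, seeds))
```

Words with no sufficiently similar seed (here "road") are left out, so the resulting dictionary only covers emotionally loaded vocabulary.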