Search | Korea Science

Sentence model based subword embeddings for a dialog system

Chung, Euisok;Kim, Hyun Woo;Song, Hwa Jeon
- ETRI Journal / v.44 no.4 / pp.599-612 / 2022
This study focuses on improving a word embedding model to enhance the performance of downstream tasks, such as those of dialog systems. To improve traditional word embedding models, such as skip-gram, it is critical to refine the word features and expand the context model. In this paper, we approach the word model from the perspective of subword embedding and attempt to extend the context model by integrating various sentence models. Our proposed sentence model is a subword-based skip-thought model that integrates self-attention and relative position encoding techniques. We also propose a clustering-based dialog model for downstream task verification and evaluate its relationship with the sentence-model-based subword embedding technique. The proposed subword embedding method produces better results than previous methods in evaluating word and sentence similarity. In addition, the downstream task verification, a clustering-based dialog system, demonstrates an improvement of up to 4.86% over the results of FastText in previous research.
https://doi.org/10.4218/etrij.2020-0245

Captive Portal Recommendation System Based on Word Embedding Model (단어 임베딩 모델 기반 캡티브 포털 메뉴 추천 시스템)

Dong-Hun Yeo;Byung-Il Hwang;Dong-Ju Kim
- Proceedings of the Korean Society of Computer Information Conference / 2023.07a / pp.11-12 / 2023
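This paper proposes a system that recommends menu items a user is likely to prefer, based on order data collected through in-store captive portals. The system vectorizes menu names with a word embedding model trained on public food-related datasets and recommends the menu items whose vectors are most similar. Because data collected through a captive portal is de-identified with respect to users' personal information and carries only limited information about the selected items, this technique is advantageous compared with applying a conventional word embedding model to a recommendation system. Using validation data built from the purchase records of stores that actually run the same system, we show that the proposed recommendation system is meaningful for purchase prediction as measured by Precision@k (k=3).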
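
The recommendation step and the Precision@k (k=3) metric mentioned in this abstract can be illustrated with a minimal sketch. The menu names, the two-dimensional vectors, and the helper names `cosine_similarity`, `recommend`, and `precision_at_k` are all hypothetical; the abstract does not say which similarity measure the authors use, so cosine similarity is an assumption here.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(query, menu_vectors, k=3):
    """Return the k menu names whose vectors are most similar to the query item."""
    q = menu_vectors[query]
    candidates = [name for name in menu_vectors if name != query]
    candidates.sort(key=lambda name: cosine_similarity(q, menu_vectors[name]),
                    reverse=True)
    return candidates[:k]

def precision_at_k(recommended, purchased, k=3):
    """Fraction of the top-k recommendations that appear in the purchase set."""
    return sum(1 for item in recommended[:k] if item in purchased) / k

# Hypothetical toy embeddings; a real system would use a trained embedding model.
menu_vectors = {
    "latte": (1.0, 0.1),
    "cappuccino": (0.9, 0.2),
    "americano": (0.8, 0.0),
    "mocha": (0.7, 0.4),
    "salad": (0.0, 1.0),
}

top3 = recommend("latte", menu_vectors, k=3)
score = precision_at_k(top3, purchased={"americano", "mocha", "tea"}, k=3)
```

Precision@k divides by k rather than by the number of purchased items, so it rewards placing actually purchased items inside the short list shown to the user.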

Preliminary Studies on Embedding Qualitative Reasoning into Qualitative Analysis and Laboratory Simulation

Pang, Jen-Sen;Syed Mustapha, S.M.F.D;Mohd.Zain, Sharifuddin
- Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.230-236 / 2001
In this paper, we explore the possibilities of embedding Qualitative Reasoning techniques, specifically the Qualitative Process Theory (QPT), and its implementation in the field of inorganic chemistry. The target field of implementation is Qualitative Chemical Analysis and Laboratory Simulation. By embedding such techniques in this educational software, we aim to combine theory and practice into a single package. The system is able to generate reasoning and explanations based on chemical theories, helping students master basic chemistry knowledge as well as practical skills. We also review the suitability of embedding QPT techniques into chemistry in general by comparing some examples from both fields.

An Intelligence Embedding Quadruped Pet Robot with Sensor Fusion (센서 퓨전을 통한 인공지능 4족 보행 애완용 로봇)

Lee Lae-Kyoung;Park Soo-Min;Kim Hyung-Chul;Kwon Yong-Kwan;Kang Suk-Hee;Choi Byoung-Wook
- Journal of Institute of Control, Robotics and Systems / v.11 no.4 / pp.314-321 / 2005
In this paper, an intelligence-embedding quadruped pet robot is described. It has 15 degrees of freedom and carries various sensors, including a CMOS image sensor, voice recognition and sound localization, an inclinometer, a thermistor, a real-time clock, tactile touch sensors, PIR, and IR, to allow owners to interact with the pet robot according to human intention as well as the original features of pet animals. The architecture is flexible and adopts various embedded processors for handling sensors to provide a modular structure. The pet robot can also be used for additional purposes such as security, gaming, visual tracking, and as a research platform. It is possible to generate various actions and behaviors and to download voice or music files to maintain a close relationship with users. With cost-effective sensors, the pet robot is able to find its recharge station and recharge itself when its battery runs low. To facilitate programming of the robot, we support several development environments. Therefore, the developed system is a low-cost programmable entertainment robot platform.
https://doi.org/10.5302/J.ICROS.2005.11.4.314

Proper Noun Embedding Model for the Korean Dependency Parsing

Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
- Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recently, deep learning models have been used for dependency parsing based on word representations in a continuous vector space. However, this causes mislabeling of proper nouns that rarely appear in the training corpus, because it is difficult to express out-of-vocabulary (OOV) words in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods according to the embedding unit. Before representing words in a continuous vector space, we replace the proper nouns with a special token and train on their contextual features using a multi-layer bidirectional LSTM. Two models, one syllable-based and one morpheme-based, are proposed for proper noun embedding, and the ensemble model improves dependency parsing performance more than either the syllable or the morpheme embedding model alone. The experimental results show that our ensemble model improves UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which uses the same arc-eager approach.
https://doi.org/10.33851/JMIS.2022.9.2.93
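
For readers unfamiliar with the two metrics reported in this abstract, the sketch below computes them under their standard definitions: UAS (unlabeled attachment score) is the fraction of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. The function name `attachment_scores` and the toy sentence are hypothetical, and details such as punctuation handling in the paper's evaluation are not given in the abstract.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one sentence or a concatenation of sentences.

    Each argument is a list of (head_index, dependency_label) pairs,
    one pair per token, in the same token order.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    total = len(gold)
    # UAS: only the head index has to match.
    head_correct = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted)
                       if g_head == p_head)
    # LAS: both the head index and the label have to match.
    both_correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return head_correct / total, both_correct / total

# Hypothetical four-token example (head 0 denotes the root).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
```

A difference reported in "%p" (percentage points) is the plain difference between two such scores expressed as percentages.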

Gated Multi-channel Network Embedding for Large-scale Mobile App Clustering

Yeo-Chan Yoon;Soo Kyun Kim
- KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.6 / pp.1620-1634 / 2023
This paper studies the task of embedding nodes with multiple graphs representing multiple information channels, which is useful in a large volume of network clustering tasks. By learning a node using multiple graphs, various characteristics of the node can be represented and embedded stably. Existing studies using multi-channel networks have been conducted by integrating heterogeneous graphs or by constraining common nodes appearing in multiple graphs to have similar embeddings. Although these methods effectively represent nodes, they are limited by the assumption that all networks provide the same amount of information. This paper proposes a method to overcome these limitations. The proposed method assigns different weights according to the source graph when embedding nodes, so that the characteristics of the graph with more important information are reflected more strongly in the node. To this end, a novel method incorporating a multi-channel gate layer is proposed to weight more important channels and ignore unnecessary data when embedding a node with multiple graphs. Empirical experiments demonstrate the effectiveness of the proposed multi-channel-based embedding methods.
https://doi.org/10.3837/tiis.2023.06.005

An Exploratory Approach to Discovering Salary-Related Wording in Job Postings in Korea

Ha, Taehyun;Coh, Byoung-Youl;Lee, Mingook;Yun, Bitnari;Chun, Hong-Woo
- Journal of Information Science Theory and Practice / v.10 no.spc / pp.86-95 / 2022
Online recruitment websites discuss job demands in various fields, and job postings contain detailed job specifications. Analyzing this text can elucidate the features that determine job salaries. Text embedding models can learn the contextual information in a text, and explainable artificial intelligence frameworks can be used to examine in detail how text features contribute to the models' outputs. We collected 733,625 job postings using the WORKNET API and classified them into low, mid, and high-range salary groups. A text embedding model that predicts job salaries based on the text in job postings was trained with the collected data. Then, we applied the SHapley Additive exPlanations (SHAP) framework to the trained model and discovered the significant words that determine each salary class. Several limitations and remaining words are also discussed.
https://doi.org/10.1633/JISTaP.2022.10.S.9

Korean Phoneme Sequence based Word Embedding (한국어 음소열 기반 워드 임베딩 기술)

Chung, Euisok;Jeon, Hwa Jeon;Lee, Sung Joo;Park, Jeon-Gue
- Annual Conference on Human and Language Technology / 2017.10a / pp.225-227 / 2017
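This paper deals with subword-based word embedding for Korean. To apply to Korean a new word embedding technique that can replace existing techniques, which suffer from the out-of-vocabulary problem, we evaluate phoneme-sequence-based subword features. Conventional subword features use character n-grams. In Korean, the pronunciation of a given single syllable varies depending on the word. Here, phoneme-sequence n-grams have the advantage of making specific subword features more discriminative. We re-implement the subword embedding technique and achieve better performance than existing word embedding results in an English setting. In addition, experiments using Korean phoneme-sequence features show that semantically more similar words are placed closer together in the vector space.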
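
The contrast between character n-grams and finer-grained n-grams can be sketched as follows. This is only an illustration under stated assumptions: `char_ngrams` follows the common convention of wrapping a word in `<` and `>` boundary markers, and `jamo_ngrams` uses Unicode NFD normalization to split Hangul syllables into their letters (jamo). Jamo decomposition reflects spelling, not pronunciation, so it is a stand-in for the phoneme sequences studied in the paper, which would require a grapheme-to-phoneme step not shown here. Both function names are hypothetical.

```python
import unicodedata

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def jamo_ngrams(word, n_min=3, n_max=6):
    """N-grams over the jamo (letter) sequence of a Hangul word.

    NFD normalization decomposes each precomposed Hangul syllable into
    its leading consonant, vowel, and optional trailing consonant.
    """
    jamo = unicodedata.normalize("NFD", word)
    return char_ngrams(jamo, n_min, n_max)

english_trigrams = char_ngrams("where", 3, 3)
syllable_trigrams = char_ngrams("한국", 3, 3)   # n-grams over whole syllables
jamo_trigrams = jamo_ngrams("한국", 3, 3)       # n-grams over decomposed letters
```

Working below the syllable level yields more n-grams per word, so two words that share sounds but no whole syllables can still share subword features.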

Opera Clustering: K-means on librettos datasets

Jeong, Harim;Yoo, Joo Hun
- Journal of Internet Computing and Services / v.23 no.2 / pp.45-52 / 2022
With the development of artificial intelligence analysis methods, especially machine learning, various fields are widely expanding their application ranges. However, in the case of classical music, there still remain some difficulties in applying machine learning techniques. Genre classification and music recommendation systems built with deep learning algorithms are actively used for general music, but not for classical music. In this paper, we attempt to classify opera within classical music. To this end, an experiment was conducted to determine which criterion is most suitable among composer, period of composition, and emotional atmosphere, which are basic features of music. To generate emotional labels, we adopted zero-shot classification with four basic emotions: 'happiness', 'sadness', 'anger', and 'fear'. After embedding the opera librettos with the doc2vec model, the optimal number of clusters is determined with the elbow method. The resulting four centroids are then used in k-means clustering to group the unlabeled libretto datasets. We obtained the optimized clustering based on adjusted Rand index scores and compared the results with the annotated variables of the music. As a result, it was confirmed that the four clusters computed by the machine after training were most similar to the grouping by period. Additionally, we verified that emotional similarity with respect to composer and period did not appear significantly. Knowing that period is the right criterion, we hope that it becomes easier for music listeners to find music that suits their tastes.
https://doi.org/10.7472/jksii.2022.23.2.45
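
The elbow method and k-means step named in this abstract can be sketched in a few lines of dependency-free Python. Everything here is an assumption made for illustration: the two-dimensional toy points stand in for doc2vec libretto vectors, the function names are hypothetical, and centroids are initialized from the first k points for determinism, which is not necessarily how the authors initialized them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia).

    Inertia is the within-cluster sum of squared distances, the
    quantity plotted against k in the elbow method.
    """
    centroids = [tuple(p) for p in points[:k]]  # deterministic toy initialization
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(squared_distance(p, centroids[label])
                  for p, label in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical toy data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertia_by_k = {k: kmeans(points, k)[2] for k in (1, 2)}
```

In the elbow method, inertia is computed for a range of k values and the chosen k is the point after which adding clusters no longer reduces inertia substantially; on the toy data the drop from k=1 to k=2 is large because the data contains two groups.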
