• Title/Summary/Keyword: Tokenization


Korean Part-Of-Speech Tagging by using Head-Tail Tokenization (Head-Tail 토큰화 기법을 이용한 한국어 품사 태깅)

  • Suh, Hyun-Jae; Kim, Jung-Min; Kang, Seung-Shik
    • Smart Media Journal / v.11 no.5 / pp.17-25 / 2022
  • Korean part-of-speech taggers decompose a compound morpheme into unit morphemes and attach a part-of-speech tag to each. As a result, morphemes are over-classified into overly detailed tags, and complex word forms are generated depending on the purpose of the tagger. When a part-of-speech tagger is used for keyword extraction in deep-learning-based language processing, decomposing compound particles and verb endings is unnecessary. In this study, the part-of-speech tagging problem is simplified by a Head-Tail tokenization technique that divides each word into only two tokens, a lexical-morpheme head and a grammatical-morpheme tail, thereby avoiding excessive morpheme decomposition. Part-of-speech tagging was then performed on the Head-Tail tokenized corpus with both a statistical technique and a deep learning model, and the accuracy of each was evaluated: the statistics-based TnT tagger and the deep-learning-based Bi-LSTM tagger were each trained on the corpus and their tagging accuracy measured. The Bi-LSTM tagger achieved a high accuracy of 99.52%, compared to 97.00% for the TnT tagger.
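The Head-Tail idea described above can be sketched in a few lines: each analyzed word is collapsed into exactly two tokens rather than fully decomposed morphemes. The tag prefixes below (N/V/M/X lexical, J/E grammatical) follow the Sejong-style convention and are an illustrative assumption, not the paper's exact tag set.

```python
# Sketch of Head-Tail tokenization: an analyzed eojeol (a list of
# (morpheme, tag) pairs) is split into a lexical "head" token and a
# grammatical "tail" token instead of being fully decomposed.
LEXICAL_PREFIXES = ("N", "V", "M", "X")    # nouns, verbs, modifiers, roots
GRAMMATICAL_PREFIXES = ("J", "E")          # particles, verb endings

def head_tail_split(analyzed_eojeol):
    """Merge morphemes into a (head, tail) pair of surface tokens."""
    head, tail = [], []
    for morph, tag in analyzed_eojeol:
        # Once a grammatical morpheme starts, everything after it is tail.
        if not tail and tag.startswith(LEXICAL_PREFIXES):
            head.append(morph)
        else:
            tail.append(morph)
    return "".join(head), "".join(tail)

# "먹었다" analyzed as 먹/VV + 었/EP + 다/EF -> head "먹", tail "었다"
print(head_tail_split([("먹", "VV"), ("었", "EP"), ("다", "EF")]))
```

A tagger trained on such pairs only has to label two tokens per word, which is what simplifies the tagging problem in the study.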

A Study of Analysis and Response and Plan for National and International Security Practices using Fin-Tech Technologies (핀테크 금융 기술을 이용한 국내외 보안 사례 분석 및 대응 방안에 대한 연구)

  • Shin, Seung-Soo; Jeong, Yoon-Su; An, Yu-Jin
    • Journal of Convergence Society for SMB / v.5 no.3 / pp.1-7 / 2015
  • Recently, Fin-Tech financial technology has emerged while financial security incidents at home and abroad have increased, and the security technologies currently operated by financial institutions have been reported to be vulnerable to attack. In this paper, we propose a response plan for security incidents in Fin-Tech services that employ diverse authentication methods and biometrics. The proposed method provides convenient banking services to users by integrating IT with financial technology, such as personal asset management and crowdfunding, and provides security by applying technologies such as PCI-DSS, tokenization, FDS, and the blockchain. The proposed method also analyzes a number of Fin-Tech security cases to inform the response.


Policy-based performance comparison study of Real-time Simultaneous Translation (실시간 동시통번역의 정책기반 성능 비교 연구)

  • Lee, Jungseob; Moon, Hyeonseok; Park, Chanjun; Seo, Jaehyung; Eo, Sugyeong; Lee, Seungjun; Koo, Seonmin; Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.43-54 / 2022
  • Simultaneous translation decodes online, translating from only a partial sentence. The goal of simultaneous translation research is to improve translation quality under a delay constraint, so most studies examine the trade-off between quality and delay. We conducted experiments on fixed-policy simultaneous translation for Korean. Our experiments suggest that Korean tokenization produces many fragments, resulting in greater delay than in other languages. We suggest follow-up studies, such as n-gram tokenization, to address this problem.
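The fixed policies studied in work like this are typically of the "wait-k" family: read k source tokens first, then alternate one write per read. A minimal schedule sketch, with the final WRITE standing in for flushing the remaining target output (the actual incremental decoder is abstracted away):

```python
# Minimal sketch of a fixed "wait-k" simultaneous-translation policy:
# READ k source tokens, then alternate WRITE/READ until the source ends.
def wait_k_schedule(num_source_tokens, k):
    """Return the READ/WRITE action sequence for a wait-k policy."""
    actions = []
    read = 0
    while read < min(k, num_source_tokens):
        actions.append("READ")
        read += 1
    while read < num_source_tokens:
        actions.append("WRITE")
        actions.append("READ")
        read += 1
    actions.append("WRITE")  # flush the remaining target after source ends
    return actions

print(wait_k_schedule(5, k=3))
```

Under such a policy, a tokenizer that fragments Korean into many small pieces inflates `num_source_tokens`, which directly lengthens the schedule and hence the delay, matching the paper's observation.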

Design of MD Authentication and Privacy for Mobile Micro-payment based on NFC (NFC 기반 모바일 소액 결제를 위한 MD 인증과 프라이버시 설계)

  • Kim, Yong-Il; Kim, Dae-Gue; Cha, Byung-Rae
    • Journal of Advanced Navigation Technology / v.17 no.1 / pp.47-55 / 2013
  • In this paper, we propose an NFC-based micro-payment model together with authentication and privacy techniques to support micro-payments and help reinvigorate traditional markets. The model supports payment with NFC-enabled smartphones, while encryption and tokenization provide MD authentication, indirect authentication, and privacy for the user's payment.
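Payment tokenization of the kind mentioned here replaces the real card number (PAN) with a random surrogate that only a token vault can map back. A toy sketch under stated assumptions (the in-memory dict vault is purely illustrative; real systems use PCI-DSS-compliant storage):

```python
import secrets

# Sketch of payment-card tokenization: the real PAN is swapped for a
# random surrogate token with no mathematical relation to the PAN, so
# intercepting the token reveals nothing about the card number.
class TokenVault:
    def __init__(self):
        self._vault = {}

    def tokenize(self, pan: str) -> str:
        token = secrets.token_hex(8)  # 16 hex chars, random surrogate
        self._vault[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = TokenVault()
tok = vault.tokenize("4111111111111111")
print(vault.detokenize(tok))  # only the vault can recover the PAN
```

The design point is that the merchant-side payment flow handles only tokens; the PAN exists solely inside the vault boundary.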

A Methodology for Urdu Word Segmentation using Ligature and Word Probabilities

  • Khan, Yunus; Nagar, Chetan; Kaushal, Devendra S.
    • International Journal of Ocean System Engineering / v.2 no.1 / pp.24-31 / 2012
  • This paper introduces a word segmentation technique for handwritten recognition of Urdu script. Word segmentation, or word tokenization, is a primary step in understanding sentences written in Urdu. Several techniques are available for word segmentation in other languages, but little work has been done for Urdu Optical Character Recognition (OCR) systems. The proposed method finds word boundaries in a sequence of ligatures using probabilistic formulas that exploit knowledge of ligature and word collocations in the corpus. The word identification rate of this technique is 97.10%, with a 66.63% identification rate for unknown words.
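The boundary search described above can be sketched as dynamic programming over the ligature sequence, choosing the segmentation that maximizes the product of word probabilities. The tiny probability table below is illustrative, not taken from the paper's corpus:

```python
import math

# Sketch of probability-driven segmentation: pick the boundaries over a
# ligature sequence that maximize the product of word probabilities.
WORD_PROB = {"ab": 0.4, "c": 0.2, "abc": 0.1, "a": 0.05, "bc": 0.05}

def segment(ligatures, max_len=3):
    """Return the highest-probability segmentation of a ligature list."""
    n = len(ligatures)
    best = [(-math.inf, [])] * (n + 1)   # (log-prob, words) per prefix
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = "".join(ligatures[j:i])
            p = WORD_PROB.get(word)
            if p is None:
                continue
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, best[j][1] + [word])
    return best[n][1]

print(segment(["a", "b", "c"]))
```

The paper additionally conditions on ligature collocations; the sketch keeps only the word-probability term to show the shape of the search.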

KoRIBES : A Study on the Problems of RIBES in Automatic Evaluation English-Korean Patent Machine Translation (특허 기계 번역에 대한 RIBES 한국어 자동평가 문제에 대한 고찰)

  • Jang, Hyeon-Jin; Jang, Moon-Seok; Noh, Han-Sung
    • Annual Conference on Human and Language Technology / 2020.10a / pp.543-547 / 2020
  • Machine translation is among the most widely used and fastest-developing areas of natural language processing. Human evaluation of machine translation is the most accurate and important, but it takes considerable time and cost, so many automatic evaluation methods have been proposed and adopted; however, automatic metrics that properly reflect the characteristics of Korean have not been studied. Widely used metrics such as BLEU often fail to produce the desired evaluation because of differences between languages, and this occurs especially often for technical documents such as patents and papers. In this paper, using RIBES, in which word precision and word order affect the score, we propose an evaluation method that separates compound morphemes during tokenization so that automatic evaluation of English-to-Korean patent machine translation agrees more closely with human judgments.
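RIBES is sensitive to word order because its core is a rank correlation (Kendall's tau) over the positions at which reference words appear in the hypothesis. A sketch of just that ordering component (real RIBES also applies unigram-precision penalties, which are omitted here):

```python
# Sketch of the word-order component behind RIBES: normalized Kendall's
# tau over a rank sequence (1.0 = reference order fully preserved).
def kendall_tau(ranks):
    """Fraction of concordant pairs in a rank sequence."""
    n = len(ranks)
    if n < 2:
        return 1.0
    concordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if ranks[j] > ranks[i]
    )
    return concordant / (n * (n - 1) / 2)

# Hypothesis words aligned at reference positions 0, 2, 1, 3:
print(kendall_tau([0, 2, 1, 3]))  # 5/6 ≈ 0.833
```

Because the metric matches words before ranking them, how compound morphemes are tokenized changes which words align at all, which is why the paper focuses on the tokenization step.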


NFT Tokenization of Real Estate and Divisible FT Trading with Asset Portfolio Management (부동산 소유권 NFT 와 분할 판매 및 거래 시스템 설계)

  • Kim, Young-Gun; Kim, Seong-Whan; Song, Hyo Jung
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.258-260 / 2022
  • A non-fungible token (NFT) is unique and cannot be further divided. An NFT proves ownership of digital content, but currently 1) its utility beyond proof of ownership is unclear, 2) although it is a token, it has almost no liquidity, and 3) its price is unpredictable. Real estate in particular has very high prices, so the barrier to investment entry is very high. Fractionalizing an NFT can be expected to increase liquidity and, through improved accessibility, grow community volume. Using these properties, real estate that was previously difficult to invest in can be invested in easily through various technologies. In addition, we designed and implemented an algorithm that uses the Black-Litterman model to construct an optimal portfolio over multiple kinds of NFTs.

Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts (한의학 고문헌 텍스트 분석을 위한 비지도학습 기반 단어 추출 방법 비교)

  • Oh, Junho
    • Journal of Korean Medical classics / v.32 no.3 / pp.47-57 / 2019
  • Objectives : We aim to assist in choosing an appropriate method for word extraction when analyzing East Asian Traditional Medical texts based on unsupervised learning. Methods : In order to assign ranks to substrings, we conducted a test using one method (BE: branching entropy) for the exterior boundary value, three methods (CS: cohesion score, TS: t-score, SL: simple-ll) for the interior boundary value, and six methods from combining them (BE×SL, BE×TS, BE×CS, CS×TS, CS×SL, TS×SL). Results : When Miss Rate (MR) was used as the criterion, the error was minimal when TS and SL were used together, and maximal when CS was used alone. When the number of segmented texts was applied as a weight, the results were best with SL and worst with BE alone. Conclusions : Unsupervised-learning-based word extraction can be used to analyze texts without a prepared vocabulary. When using this method, SL, or the combination of SL and TS, should be considered first.
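Of the interior-boundary measures compared above, the cohesion score (CS) is the simplest to state: it measures how strongly the characters of a substring stick together, estimated purely from substring counts. A sketch under a common formulation (the counts dictionary is illustrative, not from the paper's corpus):

```python
# Sketch of the cohesion score (CS) for unsupervised word extraction:
# CS(w) = (count(w) / count(w[0])) ** (1 / (len(w) - 1)), i.e. the
# geometric mean of the per-character "stickiness" of the substring.
def cohesion_score(substring, counts):
    """Cohesion of a substring given raw occurrence counts."""
    n = len(substring)
    if n < 2:
        return 0.0
    first = counts.get(substring[0], 0)
    whole = counts.get(substring, 0)
    if first == 0 or whole == 0:
        return 0.0
    return (whole / first) ** (1 / (n - 1))

counts = {"黄": 100, "黄芩": 80, "黄芩湯": 60}
print(cohesion_score("黄芩", counts))    # 80/100 = 0.8
print(cohesion_score("黄芩湯", counts))  # (60/100) ** (1/2) ≈ 0.775
```

Branching entropy, by contrast, looks outward at the diversity of characters adjacent to the substring, which is why the paper classifies it as an exterior-boundary measure.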

Phrase-Chunk Level Hierarchical Attention Networks for Arabic Sentiment Analysis

  • Abdelmawgoud M. Meabed; Sherif Mahdy Abdou; Mervat Hassan Gheith
    • International Journal of Computer Science & Network Security / v.23 no.9 / pp.120-128 / 2023
  • In this work, we present ATSA, a hierarchical attention deep learning model for Arabic sentiment analysis. ATSA addresses several challenges and limitations that arise when classical models are applied to opinion mining in Arabic. Arabic-specific challenges, including morphological complexity and language sparsity, are addressed by modeling semantic composition at the level of Arabic morphological analysis after tokenization. ATSA performs phrase-chunk sentiment embedding to provide a broader set of features covering syntactic, semantic, and sentiment information. We used a phrase structure parser to generate syntactic parse trees that serve as a reference for ATSA. This allows semantic and sentiment composition to follow the natural order in which words and phrase chunks combine in a sentence. The proposed model was evaluated on three Arabic corpora covering different genres (newswire, online comments, and tweets) and different writing styles (MSA and dialectal Arabic). Experiments showed that each of the proposed contributions achieved a significant improvement, and the combination of all contributions, which makes up the complete ATSA model, improved classification accuracy by 3% and 2% on the Tweets and Hotel reviews datasets, respectively, compared to existing models.

Development and Evaluation of Information Extraction Module for Postal Address Information (우편주소정보 추출모듈 개발 및 평가)

  • Shin, Hyunkyung; Kim, Hyunseok
    • Journal of Creative Information Culture / v.5 no.2 / pp.145-156 / 2019
  • In this study, we developed and evaluated an information extraction module based on named entity recognition. The module was designed to extract postal address information from arbitrary documents without prior knowledge of the document layout. From the perspective of information extraction practice, our approach is a probabilistic n-gram (bi- or tri-gram) method, a generalization of uni-gram keyword matching. The main difference between our approach and conventional natural language processing methods is that sentence detection, tokenization, and POS tagging are applied recursively rather than sequentially. Test results on approximately two thousand documents are presented in this paper.
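The bi-gram generalization of keyword matching mentioned above can be sketched simply: instead of checking single keywords, score each adjacent token pair by how likely it is to occur inside an address. The probability table, token names, and threshold below are all illustrative assumptions:

```python
# Sketch of bi-gram scoring for address extraction: adjacent token pairs
# are scored against a table of address-context probabilities, so single
# common words do not fire but address-like pairs do.
BIGRAM_ADDR_PROB = {
    ("Seoul", "Gangnam-gu"): 0.9,
    ("Gangnam-gu", "Teheran-ro"): 0.8,
    ("Dear", "Sir"): 0.0,
}

def address_spans(tokens, threshold=0.5):
    """Return token pairs whose address probability exceeds a threshold."""
    spans = []
    for i in range(len(tokens) - 1):
        pair = (tokens[i], tokens[i + 1])
        if BIGRAM_ADDR_PROB.get(pair, 0.0) >= threshold:
            spans.append(pair)
    return spans

print(address_spans(["Dear", "Sir", "Seoul", "Gangnam-gu", "Teheran-ro"]))
```

A uni-gram matcher would have to decide on "Seoul" alone; the pair context is what lets the module work without knowing the document layout.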