Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
- KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.1-8 / 2021
News articles in the political domain tend to be polarized toward conservative or liberal positions, a characteristic known as political bias. We constructed a keyword-based dataset for classifying the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expect the number of unknown tokens to decrease when sentences are instead composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply it to an SVM and a feedforward neural network to classify political bias. Compared with the document embedding model based on morphological analysis, the document embedding model with subwords achieved the highest accuracy, 78.22%. We also confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model from the bias classification task, we extracted keywords associated with politicians. The bias of each keyword was verified by its average similarity to the vectors of politicians from each political tendency.
https://doi.org/10.3745/KTSDE.2021.10.1.1
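
The keyword verification step described in the abstract, comparing a keyword's average similarity to politician vectors from each political tendency, might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the function names (`cosine`, `average_similarity`, `keyword_bias`) and the two-dimensional toy vectors are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_similarity(keyword_vec, politician_vecs):
    """Mean similarity of one keyword vector to a group of politician vectors."""
    total = sum(cosine(keyword_vec, p) for p in politician_vecs)
    return total / len(politician_vecs)


def keyword_bias(keyword_vec, groups):
    """Return the tendency with the highest average similarity, plus all scores."""
    scores = {
        label: average_similarity(keyword_vec, vecs)
        for label, vecs in groups.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings, invented for illustration only.
groups = {
    "conservative": [[1.0, 0.1], [0.9, 0.2]],
    "liberal": [[0.1, 1.0], [0.2, 0.9]],
}
label, scores = keyword_bias([0.8, 0.3], groups)
print(label, scores)
```

In practice the vectors would come from the trained document embedding model rather than being written by hand, and the paper may aggregate or threshold the similarities differently.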