• Title/Summary/Keyword: words

Knowledge Graph-based Korean New Words Detection Mechanism for Spam Filtering (스팸 필터링을 위한 지식 그래프 기반의 신조어 감지 매커니즘)

  • Kim, Ji-hye;Jeong, Ok-ran
    • Journal of Internet Computing and Services / v.21 no.1 / pp.79-85 / 2020
  • Today, spam texts on smartphones are blocked by simple string comparison between text messages and spam keywords, or by blocking spam phone numbers. As a result, spam texts are sent in gradually changed forms to prevent them from being automatically blocked. In particular, words included in spam keywords are altered into abnormal words using special characters, Chinese characters, and whitespace so that simple string matching does not detect them. Traditional spam filtering methods are limited in that they cannot block these spam texts well. Therefore, new technologies are needed to respond to changing spam text messages. In this paper, we propose a knowledge graph-based new-word detection mechanism that can detect new words frequently used in spam texts and respond to changing spam texts. We also show experimental results on performance when the detected Korean new words are applied to the Naive Bayes algorithm.
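The obfuscation problem described in this abstract can be illustrated with a small sketch. It is not the paper's knowledge-graph mechanism; it only shows why a plain substring match misses keywords broken up by special characters and whitespace, and how a simple normalization step recovers some of them. The keyword list and function names are hypothetical.

```python
import unicodedata

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"loan", "casino"}


def normalize(text: str) -> str:
    """Fold compatibility characters and drop everything except letters and digits."""
    # NFKC maps full-width and other compatibility forms to their canonical form.
    folded = unicodedata.normalize("NFKC", text).lower()
    # Removing whitespace and symbols undoes simple "c.a.s.i.n.o" style obfuscation.
    return "".join(ch for ch in folded if ch.isalnum())


def contains_spam_keyword(text: str, keywords=SPAM_KEYWORDS) -> bool:
    """Return True if any keyword appears in the normalized text."""
    cleaned = normalize(text)
    return any(keyword in cleaned for keyword in keywords)
```

Because whitespace is removed, this sketch can also produce false matches across word boundaries, which is one reason a learned or graph-based approach may be preferable in practice.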

A study on the measuring health literacy in patients with diabetes in Korea (당뇨병 환자의 건강정보이해능력 측정을 위한 기초 연구)

  • Kang, Soo Jin;Sim, Kang Hee;Chang, Soo Jung;Lee, Mi Sook
    • Korean Journal of Health Education and Promotion / v.33 no.5 / pp.47-57 / 2016
  • Objectives: To develop and evaluate the applicability of a health literacy instrument for patients with diabetes by measuring their ability to understand diabetes-related words. Methods: Diabetes-related words were extracted from the Korean Diabetes Association's website and literature reviews. In the first phase, three nursing researchers evaluated 2,661 diabetes-related words based on graded lexical vocabularies and what patients need to know about self-care, and then narrowed them to 255 words. In the second phase, a content validity assessment was conducted by an expert panel. In the third phase, the remaining 25 words were administered to 200 patients with type 2 diabetes aged 40 years and older using a Gallup survey from March 3 to 17, 2016 in Seoul, Korea. Descriptive analysis and Rasch analysis were performed to test psychometric properties. Results: The mean score was 21.47 with a range of 0 to 25. Cronbach's $\alpha$ was .92. The health literacy instrument using diabetes-related words showed a ceiling effect. Conclusions: Diabetes-related words are useful and reliable items for testing the health literacy of diabetes patients. Further study is needed to develop and validate health literacy measures for diabetic patients.
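For readers unfamiliar with the reliability figure reported above, the following is a minimal sketch of how Cronbach's alpha is commonly computed from a respondents-by-items score matrix, using sample variances. The function name and the toy data in the usage note are assumptions for illustration, not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

For example, `cronbach_alpha([[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]])` works on four made-up respondents answering three right/wrong items.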

A Study on the Smart Tourism Awareness through Bigdata Analysis

  • LEE, Song-Yi;LEE, Hwan-Soo
    • The Journal of Industrial Distribution & Business / v.11 no.5 / pp.45-52 / 2020
  • Purpose: In the 4th industrial revolution, services that incorporate various smart technologies in the tourism sector have begun to gain popularity. Accordingly, academic discussions on smart tourism have also become active in various fields. Despite recent research, the definition of smart tourism is still ambiguous, and it is not easy to differentiate its scope or characteristics from traditional tourism concepts. Thus, this study aims to analyze the perception of smart tourism exposed online in order to identify the current state of smart tourism in Korea and present a research direction for conceptualizing smart tourism suitable for the domestic situation. Research design, data, and methodology: This study analyzes the perception of smart tourism exposed online based on 20,198 news items from portal sites over the past six years. Data on words used with smart tourism were collected from the leading portal sites Naver, Daum, and Google. Text mining techniques were applied to identify the status of social awareness of smart tourism. Network analysis was used to visualize the relations between words related to smart tourism, and CONCOR analysis was conducted to derive clusters formed by similar words. Results: Keyword analysis showed that the frequency of words related to the development and construction of smart tourism areas was high. The analysis of connection centrality between words showed that the frequency of keywords was similar, and that the words "smartphones" and "China" had relatively high connection centrality. The results of network analysis and CONCOR indicated that words formed eight groups: related technologies, promotion, globalization, service introduction, innovation, regional society, activation, and utilization guide. The overall results of data analysis showed that the development of smart tourism cities was a noticeable issue. Conclusions: This study is meaningful in that it clearly reflects the differences in the perception of smart tourism between online and research trends despite various efforts to develop smart tourism in Korea. In addition, this study highlights the need to understand smart tourism concepts and enhance academic discussions. It is expected that such academic discussions will contribute to improving the competitiveness of smart tourism research in Korea.

Japanese-Korean Machine Translation System Using Connection Forms of Neighboring Words (인접 단어들의 접속정보를 이용한 일한 기계번역 시스템)

  • Kim, Jung-In
    • Journal of Korea Multimedia Society / v.7 no.7 / pp.998-1008 / 2004
  • There are many syntactic similarities between the Japanese and Korean languages. Using these similarities, we can build a Japanese-Korean translation system without most syntactic and semantic analysis. To greatly improve translation rates, we have been developing such a Japanese-Korean translation system for several years. However, the system still has some problems, such as the translation of inflected words and the processing of words with multiple possible translations. In this paper, we suggest a new method of Japanese-Korean translation that uses the relations between two neighboring words. To solve these problems, we investigated the connection rules and priorities of auxiliary verbs. We then designed a translation table consisting of entry tables and connection forms tables. When a word has only one translation, we can translate it by direct matching using only the entry table; otherwise, we evaluate the connection value using the connection forms tables and then select the best translation word.

Can Similarities in Medical thought be Quantified? - Focusing on Donguibogam, Uihagibmun and Gyeongagjeonseo - (의학 사상의 유사성은 계량 분석 될 수 있는가 - 『동의보감』과 『의학입문』, 『경악전서』를 중심으로 -)

  • Oh, Junho
    • Journal of Korean Medical classics / v.31 no.2 / pp.71-82 / 2018
  • Objectives: The purpose of this study is to compare the similarities among Donguibogam (DO), Uihagibmun (UI), and Gyeongagjeonseo (GY) in order to examine whether the medical thoughts embedded in the texts can be compared quantitatively. Methods: Under the empirical assumption that medical thoughts can be reduced to the frequency of major key words within a text, we selected as key words fourteen words in four categories that are commonly used to describe physiology and pathology in Korean medicine. The frequency of these key words was measured and compared across the three important medical texts in Korea. Results: In a quantitative analysis based on the $\chi^2$ statistic, the key words were distributed most heterogeneously in DO and most homogeneously in UI. In comparison of the similarity analyzed by the same method, DO and UI were significantly more similar than those of DO and UI. The results of the word frequency pattern and the similarities of the book contents (CBDF) show that DO is influenced by UI, and the differences between standardized residuals and homogeneity tell us that the internal contexts of both books are constructed differently. Conclusions: These results support the results of traditional research by experts. With the above, we were able to confirm that medical thoughts can be reduced to the frequency of major key words within a text and compared through the frequency of such key words.
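As a rough illustration of the kind of $\chi^2$ comparison this abstract refers to, the sketch below computes Pearson's chi-square statistic for a table of key-word counts (rows are texts, columns are key words). The exact procedure used in the paper is not given in the abstract, so this is only the textbook statistic; the function name and any counts passed to it are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson's chi-square statistic for a contingency table of counts.

    Rows could be texts and columns key words; larger values indicate
    that the key-word distributions differ more between the rows.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

Two texts with proportionally identical key-word counts give a statistic of zero, while diverging frequency patterns give larger values.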

Sensibility Vocabulary for 3D Stereoscopic Image Ride Film (3D입체영상 라이드 필름의 감성어휘)

  • Song, Seung-Keun;Chae, Eel-Jin
    • The Journal of the Korea Contents Association / v.11 no.11 / pp.120-129 / 2011
  • This research aims to investigate the representative affective words, and the structure among them, in order to examine users' affect revealed in ride films based on three-dimensional stereoscopic images. Previous studies related to affect were reviewed, and affect words well suited to three-dimensional stereoscopic images were collected. The resulting two hundred six basic affect words were tested for suitability by sixty-two typical users and four experts. Seventy-seven candidate affect words were selected and, by excluding similar words among them, twenty-six words were finally extracted through the reduction process. Consequently, fifteen representative words and the network structure between the words were revealed using a free association test based on the twenty-six affect words. We propose affect research covering sensors, emotions, and affects related to moving images, rather than the still images on which most previous studies of affect focused. Future work includes the affect space and the affect effect for ride films based on three-dimensional stereoscopic images. This study can be applied in practice to the production of ride films and can serve as a basic design guideline.

Revealing Hidden Relations between Query-Words for an Efficient Inducing User's Intention of an Information Search (효율적 검색의도 파악을 위한 쿼리 단어 가시화에 관한 연구)

  • Kwon, Soon-Jin;Hong, Chul-Eui;Kim, Won-Il
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.2 / pp.44-52 / 2012
  • This paper proposes to increase the efficiency of information search by visualizing unseen query words using well-selected user intent structures. A search engine that identifies the user's intent in pursuing information would be an effective search engine. To do so, the relationships between query words need to be made visible after recovering the words lost during query formulation, and an intention structure and its elements need to be established. This paper reviews previous studies, then defines a simple structure of search intent and shows a process for expanding and generating query words appropriate to the intent structure, together with a method for visualizing the query words. In this process, examples and tests are needed in which one of the multiple layers of the intent structure is assigned to a range of query words. Increases and decreases in efficiency are then analyzed. Future research is needed on how to automate the process of extending the structural nodes of the user's intent.

The exploration of the effects of word frequency and word length on Korean word recognition (한국어 단어재인에 있어서 빈도와 길이 효과 탐색)

  • Lee, Changhwan;Lee, Yoonhyoung;Kim, Tae Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.1 / pp.54-61 / 2016
  • Because the word is the basic unit of language processing, studies of word recognition and of the variables that contribute to it are very important. Word frequency and word length are recognized as important factors in word recognition. This study examined the effects of these two variables on Korean word recognition. In Experiment 1, two types of Hangul words, pure Hangul words and Hangul words with Hanja counterparts, were used to explore frequency effects. A frequency effect was not observed for Hangul words with Hanja counterparts. In Experiment 2, word length was manipulated to determine whether a word length effect appears in Hangul words. Contrary to expectation, one-syllable words were processed more slowly than two-syllable words. Possible explanations for these results and future research directions are discussed.

Emotional analysis system for social media using sentiment dictionary with newly-created words

  • Shin, Pan-Seop
    • Journal of the Korea Society of Computer and Information / v.25 no.4 / pp.133-140 / 2020
  • Emotional analysis is an application of opinion mining that analyzes the opinions and tendencies of people appearing in unstructured text. Recently, emotional analysis of social media has attracted attention, but social media contains newly-created words and slang, so it is not easy to analyze with existing emotional analysis methods. In this study, I design a new emotional analysis system to solve these problems. The proposed system can analyze various emotions, as well as positive and negative polarity, in social media that includes newly-created words and slang. First, I collect newly-created words and slang related to emotions that appear in social media. Then, I expand the existing emotional model and use it to quantify the degree of sentiment in emotional words. A new sentiment dictionary is also constructed by reflecting the degree of sentiment. Finally, I design an emotional analysis system that applies a sentiment dictionary that includes newly-created words and an extended emotional model.
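The dictionary-based idea in this abstract can be sketched as a lookup from words (including slang) to an emotion label and a degree, summed per emotion over a message. The dictionary entries, emotion labels, and degrees below are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical sentiment dictionary: word -> (emotion label, degree).
# "lit" and "meh" stand in for newly-created words and slang.
SENTIMENT_DICTIONARY = {
    "happy": ("joy", 1.0),
    "lit": ("joy", 1.5),
    "meh": ("boredom", 0.5),
    "awful": ("anger", 1.0),
}


def emotion_profile(tokens, dictionary=SENTIMENT_DICTIONARY):
    """Sum the degree of each emotion over the tokens found in the dictionary."""
    profile = defaultdict(float)
    for token in tokens:
        if token in dictionary:
            emotion, degree = dictionary[token]
            profile[emotion] += degree
    return dict(profile)
```

Words missing from the dictionary are simply ignored here, which is exactly the weakness that adding newly-created words to the dictionary is meant to reduce.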

Deep Learning-based Target Masking Scheme for Understanding Meaning of Newly Coined Words

  • Nam, Gun-Min;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.157-165 / 2021
  • Recently, studies using deep learning to analyze large amounts of text have been actively conducted. In particular, pre-trained language models, which apply the results of learning from a large amount of text to the analysis of text in a specific domain, are attracting attention. Among the various pre-trained language models, BERT (Bidirectional Encoder Representations from Transformers)-based models are the most widely used. Recently, research has sought to improve analysis performance through further pre-training using BERT's MLM (Masked Language Model). However, the traditional MLM has difficulty clearly understanding the meaning of sentences containing new words such as newly coined words. Therefore, in this study, we propose NTM (Newly coined words Target Masking), which performs masking only on new words. Applying the proposed methodology to about 700,000 movie reviews from portal 'N' confirmed that the proposed NTM showed superior performance in terms of sentiment analysis accuracy compared to the existing random masking.
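The contrast between random masking and masking only new words can be sketched at the token level as follows. This is a simplified illustration, not the authors' implementation: real MLM pre-training works on subword IDs and also replaces some selected tokens with random or unchanged tokens, and the new-word set, function names, and masking rate here are assumptions.

```python
import random

MASK_TOKEN = "[MASK]"


def target_mask(tokens, new_words):
    """Mask only the tokens that appear in a given set of newly coined words."""
    return [MASK_TOKEN if token in new_words else token for token in tokens]


def random_mask(tokens, rate=0.15, rng=None):
    """Mask each token independently with probability `rate` (simplified baseline)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    return [MASK_TOKEN if rng.random() < rate else token for token in tokens]
```

With targeted masking, every training example that contains a new word forces the model to predict that word from context, whereas random masking selects it only occasionally.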