• Title/Summary/Keyword: word database


An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

  • Park, Chul-Soo
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.63-88 / 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis of how North Korea is represented in South Korean mass media. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and methods of analysis of the text mining techniques, as confirmed by analysis of the data. In this study, the R program, free software for statistical computing and graphics, was used to apply the text mining technique. Text mining methods make it possible to highlight the most frequently used keywords in a body of text. One can create a word cloud, also referred to as a text cloud or tag cloud. This study proposes a procedure to find meaningful tendencies based on a combination of word clouds and co-occurrence networks. This study aims to explore more objectively the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea in newspaper big data from November 1, 2016 to May 23, 2019. We divided the data into three periods reflecting recent inter-Korean relations. The period before January 1, 2018 was set as the Before Phase of Peace Building. The period from January 1, 2018 to February 24, 2019 was set as the Peace Building Phase; Kim Jong-un's New Year's message and the PyeongChang Olympics formed an atmosphere of peace on the Korean peninsula. The third period, after the Hanoi peace summit, was marked by silence in the relationship between North Korea and the United States and was therefore called the Depression Phase of Peace Building. This study analyzes news articles related to North Korea in the Korea Press Foundation database (www.bigkinds.or.kr) through text mining, to investigate characteristics of the Kim Jong-un regime's South Korea policy and unification discourse.
The main results of this study show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. In particular, the study examines changes in international circumstances, domestic conflicts, the living conditions of North Korea, the South's aid projects for the North, conflicts between the two Koreas, the North Korean nuclear issue, and the North Korean refugee problem through co-occurrence word analysis. It also offers an analysis of the South Korean mentality toward North Korea in terms of semantic prosody. In the Before Phase of Peace Building, the analysis yielded the order 'Missiles', 'North Korea Nuclear', 'Diplomacy', 'Unification', and 'South-North Korean'. The Peace Building Phase yielded the order 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. The Depression Phase of Peace Building yielded the order 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. There are 16 words adopted in all three periods. The order is as follows: 'missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyzing the trends of news content on North Korea associated with North Korea's provocations. Future research on North Korean trends will be conducted based on the results of this study. We will continue to study model development for North Korea risk measurement that can anticipate and respond to North Korea's behavior in advance. We expect that the text mining analysis method and scientific data analysis techniques will be applied to the field of North Korea and unification research.
Through these academic studies, I hope to see a lot of studies that make important contributions to the nation.
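
The keyword-frequency and co-occurrence counting that underlies word clouds and co-occurrence networks can be sketched as follows. This is only an illustrative sketch: the study itself used R, the tokenized articles below are hypothetical toy data, and the function names are assumptions made for this example rather than anything taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized articles (one keyword list per article).
articles = [
    ["missile", "nuclear", "sanctions"],
    ["nuclear", "diplomacy", "summit"],
    ["missile", "nuclear", "diplomacy"],
]

def keyword_frequencies(docs):
    """Count how often each keyword appears across all articles."""
    return Counter(word for doc in docs for word in doc)

def cooccurrence_counts(docs):
    """Count unordered keyword pairs that appear in the same article."""
    pairs = Counter()
    for doc in docs:
        # Sort the distinct keywords so each pair has one canonical order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

freq = keyword_frequencies(articles)   # input for a word cloud
cooc = cooccurrence_counts(articles)   # edge weights for a co-occurrence network
print(freq.most_common(3))
print(cooc.most_common(3))
```

Running the same counts separately on the articles of each period would give per-period keyword rankings that can then be compared.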

Content-based Recommendation Based on Social Network for Personalized News Services (개인화된 뉴스 서비스를 위한 소셜 네트워크 기반의 콘텐츠 추천기법)

  • Hong, Myung-Duk;Oh, Kyeong-Jin;Ga, Myung-Hyun;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.57-71 / 2013
  • Over a billion people in the world generate news minute by minute. People can forecast some news, but most news comes from unexpected events such as natural disasters, accidents, and crimes. People spend much time watching a huge amount of news delivered by many media because they want to understand what is happening now, to predict what might happen in the near future, and to share and discuss the news. People make better daily decisions by obtaining useful information from the news they watch. However, it is difficult for people to choose news suitable to them and obtain useful information from it, because there are so many news media, such as portal sites and broadcasters, and most news articles consist of gossip and breaking news. User interest changes over time, and many people have no interest in outdated news. A personalized news service therefore needs to reflect users' recent interests, which means that it should manage user profiles dynamically. In this paper, a content-based news recommendation system is proposed to provide a personalized news service. A personalized service requires the user's personal information, and a social network service is used to extract it. The proposed system constructs a dynamic user profile based on recent user information from Facebook, which is one such social network service. User information contains personal information, recent articles, and Facebook Page information. Facebook Pages are used by businesses, organizations, and brands to share their content and connect with people. Facebook users can add a Facebook Page to specify their interest in the Page. The proposed system uses this Page information to create the user profile and to match user preferences to news topics.
However, some Pages are not directly matched to a news topic because a Page deals with individual objects and does not provide topic information suitable for news. Freebase, a large collaborative database of well-known people, places, and things, is used to match Pages to news topics through the hierarchy information of its objects. By using recent Page information and articles of Facebook users, the proposed system can maintain a dynamic user profile. The generated user profile is used to measure user preferences for news. To generate a news profile, the news category predefined by the news media is used, and keywords of news articles are extracted after analysis of the news content, including title, category, and scripts. The TF-IDF technique, which reflects how important a word is to a document in a corpus, is used to identify the keywords of each news article. The same format is used for the user profile and the news profile so that the similarity between user preferences and news can be measured efficiently. The proposed system calculates all similarity values between user profiles and news profiles. Existing methods of similarity calculation in the vector space model do not cover synonyms, hypernyms, and hyponyms because they handle only the given words. The proposed system applies WordNet to the similarity calculation to overcome this limitation. The top-N news articles with the highest similarity values for a target user are recommended to that user. To evaluate the proposed news recommendation system, user profiles are generated from Facebook accounts with the participants' consent, and we implement a Web crawler to extract news information from PBS, a non-profit public broadcasting television network in the United States, and construct news profiles. We compare the performance of the proposed method with that of two benchmark algorithms. One is a traditional method based on TF-IDF. The other is the 6Sub-Vectors method, which divides the points used to get keywords into six parts.
Experimental results demonstrate that the proposed system provides useful news to users by applying the user's social network information and WordNet functions, in terms of the prediction error of recommended news.
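
A minimal sketch of the TF-IDF and vector-space part of such a recommender is shown below, assuming the user profile and the news profiles are both plain keyword lists. It covers only the TF-IDF baseline with cosine similarity and top-N selection; the Facebook, Freebase, and WordNet steps of the proposed system are not modeled, and all function names and sample keywords are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn keyword lists into sparse TF-IDF vectors (dicts of term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(user_vec, news_vecs, top_n):
    """Return indices of the top_n news vectors most similar to the user vector."""
    ranked = sorted(enumerate(news_vecs), key=lambda p: cosine(user_vec, p[1]), reverse=True)
    return [i for i, _ in ranked[:top_n]]

# Hypothetical keyword profiles in the same format for news and user.
news = [
    ["election", "senate", "vote"],
    ["storm", "flood", "rescue"],
    ["senate", "budget", "tax"],
]
user = ["senate", "vote", "election"]

vectors = tfidf_vectors(news + [user])
news_vecs, user_vec = vectors[:-1], vectors[-1]
top = recommend(user_vec, news_vecs, top_n=2)
print(top)
```

Because only exact keyword matches contribute to the dot product, this baseline cannot relate synonyms or hypernyms, which is the limitation the abstract says the WordNet step is meant to address.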

An Investigation of Local Naming Issue of Tamarix aphylla (에셀나무(Tamarix aphylla)의 명칭문제에 대한 고찰)

  • Kim, Young-Sook
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.37 no.1 / pp.56-67 / 2019
  • In order to investigate the issue with the proper name of eshel(Tamarix aphylla) mentioned in the Bible, this study carried out an analysis of the morphological taxonomy features of the plants, a study of the symbolism of the Tamarix genus, an analysis of examples in Korean and Chinese classics, and a study of the problems found in translations of Korean, Chinese and Japanese Bibles. The results are as follows. According to plant taxonomy, similar species of the Tamarix genus are differentiated by the leaf and flower, and because these are very small, about 2-4mm, it is difficult to differentiate them with the naked eye. However, T. aphylla found in the plains of Israel and T. chinensis of China and Korea have distinctive differences in the shape of the drooping branch and in the blooming period. The Tamarix genus is a very precious tree that was planted in royal courtyards of ancient Mesopotamia and the Han(漢) Dynasty of China, and in ancient Egypt, it was said to be a tree that gave life to the dead. In the Bible, it was used as a sign of the covenant that God was with Abraham, and it also symbolized the prophet Samuel and the court of Samuel. In the examples from Korean classics, the Tamarix genus was used as a common term in the Joseon Dynasty and it was often used as the medical term 'Chēngliǔ(檉柳)'. Meanwhile, the term 'wiseonglyu(渭城柳)' was used as a literary term. An examination of the period and names in the literature related to Chēngliǔ(檉柳) among Chinese medicinal herb books showed that a total of 16 terms were used, and the term Chuísīliǔ(垂絲柳) used in the Chinese Bible cannot be found among them. Nor was there a word 'wiseonglyu(渭城柳)' originating from the poem by Wang Wei(699-759) of the Tang(唐) Dynasty; in fact, the word in use was 'halyu(河柳)', which was related to Zhou(周) China.
However, in the academic terms currently used in China, the words Chuísīliǔ(垂絲柳) and Chēngliǔ(檉柳) are used equally, and therefore translating eshel in the Chinese Bible as either Chuísīliǔ(垂絲柳) or Chēngliǔ(檉柳) appears to be of no issue. Tamarix was mistranslated as 'やなぎ(willow)' in the Meiji Testaments(舊新約全書 1887), and it has been translated correctly as 'ぎょりゅう(檉柳)' since the Colloquial Japanese Bible(口語譯 聖書 1955). However, there are claims that 'gyoryu(ぎょりゅう 檉柳)' is not an indigenous species but an exotic species introduced in the Edo Period, so it is necessary to reconsider the terminology. As the analysis of examples from Korean classics shows, there is a high possibility that Korea's T. chinensis was grown on the Korean Peninsula for medicinal and gardening purposes. Therefore, the use of the medicinal term Chēngliǔ(檉柳) or the literary term 'wiseonglyu' in the Korean Bible may not be a big issue. However, the term 'wiseonglyu' is used very rarely even in China, and because it may be connected to the admiration of China and Chinese things by literary persons of the Joseon Dynasty, the use of this term should be reviewed carefully. Therefore, rather than using terms that may be of issue in the Bible, it is more feasible to transliterate the Hebrew word and call it eshel.

Establishment of the Korean Standard Vocal Sound into Character Conversion Rule (한국어 음가를 한글 표기로 변환하는 표준규칙 제정)

  • 이계영;임재걸
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.2 / pp.51-64 / 2004
  • The purpose of this paper is to establish the Standard Korean Vocal Sound into Character Conversion Rule (Standard VSCC Rule) by reversely applying the Korean Standard Pronunciation Rule, which regulates how written Hangeul sentences are read. The Standard VSCC Rule plays a crucially important role in Korean speech recognition. The general method of speech recognition is to find, among the standard voice patterns, the pattern most similar to the input voice pattern. Each of the standard voice patterns is an average of several sample voice patterns. If the unit of the standard voice pattern is a word, then the number of standard voice pattern entries will exceed a few million (taking inflection and postpositional particles into account). This many entries require a huge database and an impractically large number of comparisons in the process of finding the most similar pattern. Therefore, the unit of the standard voice pattern should be a syllable. In this case, we have to resolve the problem of the difference between Korean vocal sounds and written characters. The process of converting a sequence of Korean vocal sounds into a sequence of characters requires our Standard VSCC Rule. Making use of our Standard VSCC Rule, we have implemented a system that converts Korean vocal sounds into Hangeul characters. The Korean Standard Pronunciation Rule consists of 30 items. In order to show the soundness and completeness of our Standard VSCC Rule, we have tested the conversion system with various data sets reflecting all 30 items. The test results are presented in this paper.

Trends Analysis on Research Articles of the Sharing Economy through a Meta Study Based on Big Data Analytics (빅데이터 분석 기반의 메타스터디를 통해 본 공유경제에 대한 학술연구 동향 분석)

  • Kim, Ki-youn
    • Journal of Internet Computing and Services / v.21 no.4 / pp.97-107 / 2020
  • This study aims to conduct a comprehensive meta-study from the perspective of content analysis to explore trends in Korean academic research on the sharing economy by using big data analytics. A comprehensive meta-analysis methodology can examine the entire set of research results historically and as a whole to illuminate the tendencies or properties of the overall research trend. Academic research related to the sharing economy first appeared in 2008, the year in which Professor Lawrence Lessig introduced the concept of the sharing economy to the world, but research began in earnest in 2013. In particular, between 2006 and 2008, research improved dramatically. In order to grasp the overall flow of domestic academic research trends, 8 years of papers from 2013 to the present were selected as the papers to be analyzed, focusing on titles, keywords, and abstracts drawn from databases of electronic journals. Big data analysis was performed in the order of cleaning, analysis, and visualization of the collected data to derive research trends and insights by year and type of literature. We used Python3.7 and the Textom analysis tool for data preprocessing, text mining, and frequency analysis for key word extraction. N-gram charts, centrality and social network analysis, and CONCOR clustering visualization were produced with UCINET6/NetDraw and the Textom program, and the keywords, clustered into 8 groups, were used to derive the typology of each research trend. The outcomes of this study will provide useful theoretical insights and guidelines for future studies.

Analysis of Research Topics among Library, Archives and Museums using Topic Modeling (토픽 모델링을 활용한 도서관, 기록관, 박물관간의 연구 주제 분석)

  • Kim, Heesop;Kang, Bora
    • Journal of Korean Library and Information Science Society / v.50 no.4 / pp.339-358 / 2019
  • The purpose of this study is to understand the research topics relevant to establishing a cooperative platform between libraries, archives, and museums, which carry out the common task of providing knowledge information in a broad sense. To achieve this purpose, 637 bibliographic records on the three institutions were collected from the Web version of the Scopus database. From the collected bibliographic records, 5,218 words were extracted through NetMiner V.4 and analysed by topic modeling. The results are as follows: First, in the analysis of word frequency according to the tf-idf weight, 'Preservation' was the hottest topic. Second, the topic modeling analysis through the LDA(Latent Dirichlet Allocation) algorithm resulted in 13 topic areas. Third, when the 13 topic areas were expressed as a network, repository construction was the central topic, and research topics such as cooperation among institutions, conservation environment for collections, system and policy discovery, life cycle of collections, exhibition of information resources, and information retrieval were closely related to the central topic. Fourth, the yearly trend of the 13 topic areas shows that research up to 1998 was limited to specific subjects such as system and policy discovery, information retrieval, and life cycle of collections, while studies on the other topics were carried out after that year.

A Study on Spoken Digits Analysis and Recognition (숫자음 분석과 인식에 관한 연구)

  • 김득수;황철준
    • Journal of Korea Society of Industrial Information Systems / v.6 no.3 / pp.107-114 / 2001
  • This paper describes connected digit recognition in Korean that takes acoustic features into account. The recognition rate for connected digits is usually lower than that for word recognition. Therefore, speech feature parameters and acoustic features are employed to build a robust model for digits, and we confirmed the effect of considering acoustic features through recognition experiments. We used the KLE 4 connected digit data as the database and 19 continuous-distribution HMMs as PLUs(Phoneme Like Units) based on phonetic rules. For the recognition experiments, we tested two cases. In the first case, we used the usual method of constructing the phoneme model with Mel-Cepstrum and Regressive Coefficient. In the second case, we used expanded feature parameters and acoustic features to construct the phoneme model. In both cases, we employed OPDP(One Pass Dynamic Programming) and FSA(Finite State Automata) for the recognition tests. When applying FSN for recognition, we applied various acoustic features. As a result, we obtained a 55.4% recognition rate for Mel-Cepstrum, and 67.4% for Mel-Cepstrum and Regressive Coefficient. We also obtained a 74.3% recognition rate for the expanded feature parameters, and 75.4% when applying acoustic features. Since applying acoustic features gave better results than the former method, we could confirm that the suggested method is effective for connected digit recognition in Korean.


A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments (제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue;Chung, Woo-Keun
    • Journal of KIISE:Software and Applications / v.37 no.10 / pp.788-801 / 2010
  • Mobile devices are advancing remarkably, so research on mobile input devices is becoming more important. There are many input devices, such as keypads, QWERTY keypads, touch screens, and speech recognizers, but they are not as convenient as typical keyboard-based desktop input devices, so input strings usually contain many typing errors. These input errors cause little trouble in communication between people, but they are a critical problem when searching a database such as a dictionary or an address book, because correct results cannot be obtained. In particular, Hangeul has more than 10,000 different characters because one Hangeul character is made by combining consonants and vowels, so the frequency of errors is higher than in English. Generally, the suffix tree is the most widely used data structure for dealing with query errors, but it is not sufficient for the variety of errors. In this paper, we propose a fast approximate Korean word searching system that tolerates a variety of typing errors. This system includes several algorithms for applying general approximate string searching to Hangeul. We also present a profanity filter built with the proposed system. This system filters more than 90% of coined profanities.
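
The idea of matching Hangeul at the phoneme level can be illustrated with a short sketch. It is not the authors' system: it only decomposes precomposed Hangeul syllables into their consonant and vowel letters using the standard Unicode arithmetic (19 initial consonants x 21 vowels x 28 final-consonant slots = 11,172 syllables) and then applies a plain Levenshtein edit distance. The function names are assumptions made for this example.

```python
HANGUL_BASE = 0xAC00               # code point of the first precomposed syllable
N_VOWELS, N_TAILS = 21, 28         # vowels; final consonants including "none"
N_SYLLABLES = 19 * N_VOWELS * N_TAILS  # 11,172 precomposed syllables

def decompose(text):
    """Split each Hangeul syllable into initial, vowel, and optional final jamo."""
    jamo = []
    for ch in text:
        code = ord(ch) - HANGUL_BASE
        if 0 <= code < N_SYLLABLES:
            lead, rest = divmod(code, N_VOWELS * N_TAILS)
            vowel, tail = divmod(rest, N_TAILS)
            jamo.append(chr(0x1100 + lead))      # initial consonant
            jamo.append(chr(0x1161 + vowel))     # vowel
            if tail:                             # tail index 0 means no final consonant
                jamo.append(chr(0x11A7 + tail))
        else:
            jamo.append(ch)                      # leave non-Hangeul characters as they are
    return jamo

def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def phoneme_distance(s, t):
    """Edit distance measured on jamo rather than on whole syllables."""
    return edit_distance(decompose(s), decompose(t))

print(phoneme_distance("한글", "한굴"))
```

Measured on jamo, a single mistyped vowel changes one unit out of six in a two-syllable word, whereas at the syllable level the same error replaces one unit out of two, which is why a phoneme-level comparison is more forgiving of typing errors.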

Analysis of Articles Published in the Korean Journal of Oriental Medical Prescription (대한한의학방제학회지에 게재된 논문 동향 분석)

  • Kim, An-Na;Song, Mi-Young;Bae, Sun-Hee;Kim, Chul;Kim, Ha-Young;Kim, Young-Sik;Park, Kyoung-Bum;Kim, Hong-Jun
    • Herbal Formula Science / v.18 no.1 / pp.57-77 / 2010
  • Objective : This study reviews the recent trend of oriental medical prescription research. The data examined are the articles published in the Korean Journal of Oriental Medical Prescription from 1990 to 2009. Method : The data were retrieved through the internet database Oriental Medicine Advanced Searching Integrated System (OASIS) and the collection of the Korean Journal of Oriental Medical Prescription. The number of articles examined is 385, published in 25 volumes of the journal. This study examines the nature of the articles, research methods, subjects, and author information. Research subjects are sorted by the OASIS key words for the articles published before 1999, and by the key word indexes cited in the abstracts for the articles published thereafter. Results : Among the 385 articles collected, 206 are research articles, 143 philological articles, 35 case studies, and 1 special contribution. A majority of research articles are experimental studies (199 articles or 96.6%), while clinical reports (5 articles or 2.43%) and other studies (2 articles) occupy a small portion. Most of the experimental studies (183 articles or 91%) examine the effectiveness of certain prescriptions or treatments. Among the effectiveness studies, 114 articles (62.3%) employ an in vivo experiment design, 52 articles (28.42%) in vitro experiments, and 17 articles (9.29%) both in vivo and in vitro experiments. In terms of research subject, the most frequently indexed key words are hepatotoxicity among diseases (9 articles), Bojungikgitang (Bu-Zhong-Yi-Qi-Tang) among prescriptions (10 articles), Buja (Aconiti Tuber) among materia medica (4 articles), immunity and anti-oxidation among efficacy terminology (6 articles each), and Donguibogam(東醫寶鑑) among references in the key words (25 articles).
Universities are the main affiliation of authors (76.42%), followed by university hospitals (6.71%), non-academic research institutes (5.55%), local clinics (4.67%), academic research institutes (2.81%), hospitals (2.38%), and others (1.44%). The institute with which first and corresponding authors are most often affiliated is Wonkwang University. In terms of authorship, co-authorship outnumbers sole-authorship by 82.08% to 17.92%. The proportion of authors with a single article is 63.54%, which is near the author productivity distribution described by Lotka's law.
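
For comparison with the 63.54% figure, the share of single-article authors that Lotka's law predicts can be computed directly. The sketch below assumes the classic inverse-square form of the law, in which the number of authors with n articles is proportional to 1/n^2; the exponent, the function name, and the truncation point of the sum are assumptions for this example, not values reported in the abstract.

```python
import math

def lotka_single_author_share(exponent=2.0, max_n=1_000_000):
    """Share of authors with exactly one article if counts follow 1 / n**exponent."""
    # Normalizing constant: the sum of 1 / n**exponent over n = 1 .. max_n.
    total = sum(1.0 / n ** exponent for n in range(1, max_n + 1))
    return 1.0 / total

share = lotka_single_author_share()
print(f"{share:.4f}")                # about 0.608 for exponent 2
print(f"{6 / math.pi ** 2:.4f}")     # closed-form limit 6 / pi^2 for exponent 2
```

Under this assumption roughly 60.8% of authors are expected to have a single article, so the observed 63.54% is a few percentage points above the theoretical value.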

The Posttraumatic Stress Research Trends of Korean and Foreign Firefighters (국내외 소방대원의 외상 후 스트레스 연구경향)

  • Baek, Mi-Lye
    • The Korean Journal of Emergency Medical Services / v.13 no.2 / pp.61-72 / 2009
  • Purpose : This study aimed to analyze posttraumatic stress research trends for Korean and foreign firefighters. Method : A total of 63 published international articles were retrieved from the PubMed internet site and a total of 17 published Korean articles were retrieved from the Korean Medical Database internet site using the search term 'PTSD in firefighters'. These articles were analyzed by publication period, journal domain, research design, key words and research subjects. Result : 1) By publication period, there were 29 disaster-related studies(46.0%) and 34 job-related studies(54.0%) among the 63 international articles. However, there were 16 disaster-related studies(94.1%) and 1 job-related study(5.9%) among the 17 Korean articles. 2) By international research domain, 9 studies(14.3%) were published in The Journal of Nervous and Mental Disease. In the domestic research domain, 9 studies(52.9%) were theses, consisting of 6 master's and 3 doctoral theses. By major within the Korean domain, psychology had the highest portion with 4 studies(23.5%). 3) In terms of international research design, quantitative research methods were mainly used, in 23 disaster-related studies(36.5%) and 30 job-related studies(47.5%). In domestic research, quantitative research methods were mostly used, in 14 job-related studies(82.3%), and Q methodology was used in only 1 disaster-related study(5.9%). 4) As for research content trends according to the key words, the 9 studies(31.0%) on posttraumatic stress and coping formed the largest group, followed by posttraumatic stress symptoms. Among these studies, the key words PTSD(Posttraumatic Stress Disorder) and PTS(Posttraumatic Stress) were mostly used. Moreover, there was 1 domestic disaster-related study verifying the trends of posttraumatic stress with PTS as the key word.
In job-related research, studies on the relationship between posttraumatic stress and other factors were the most numerous, with ten studies(62.5%). Among these studies, PTSD was the most used key word, in 5 studies(31.3%). 5) According to the international research subjects, the Posttrau consist the most subjects with 16 cases each for disaster- and job-related stress; however, domestic research had 16 studies(94.1%) using only firefighters and 1 study(5.9%) with their families as subjects. Conclusion : Although studies of posttraumatic stress in Korean firefighters started later than those on foreign firefighters, research has developed in various fields of study, and further studies covering multiple topics and methods, like those done abroad, are needed.
