• Title/Summary/Keyword: Korean text classification

A Study on the Hyun-Mu Sutra(玄武經) of Jeungsan (증산계 『현무경』 연구)

  • Koo, Jung-hoe
    • Journal of the Daesoon Academy of Sciences / v.25_1 / pp.25-85 / 2015
  • In this study, source criticism (establishing the authentic text) is applied to the different editions of the Hyun-Mu Sutra (玄武經), and a new interpretation consistent with that text is attempted. The Hyun-Mu Sutra, a scripture written in 1909, became known to the world through the Jeungsanist religions. In particular, it is remarkable that the Hyun-Mu Sutra, once handed down alone and in secret, was incorporated into the Jeonkyung (典經), the canonical scripture of Daesoonjinrihoe, The Fellowship of Daesoon Truth (大巡眞理). However, although it was written in 1909 and more than 100 years have passed, its authentic text has still not been established. The Hyun-Mu Sutra is a 25-page scripture written by Jeungsan [Gang Il-sun 姜一淳 (1871~1909)]. Until now, a 33-page version of the Hyun-Mu Sutra has circulated as its authentic text. However, examination showed that the diagnostic scripture (病勢文) had been added by later generations, and a review of the authentic text led to the conclusion that the original Hyun-Mu Sutra contained no diagnostic scripture. Although the Hyun-Mu Sutra is a small booklet, its notation and expression are so unusual that its contents have remained difficult to read. Previous interpretations of the Hyun-Mu Sutra fall into two approaches: 1) approaches based on the I-ching, and 2) approaches based on the ten celestial stems and twelve earthly branches (10干12支). The I-ching approaches were sometimes supplemented with Buddhist classification methods. Nevertheless, these studies are limited because they fail to secure the authentic text of the Hyun-Mu Sutra. In this study, the contents of the Hyun-Mu Sutra were examined item by item, focusing on four points. 1) The icon of the Hyun-Mu Sutra (玄武經符) resembles ordinary talismans (符籍) but has its own features. 2) 'Reverse fonts' (反書體) [mirror images of the standard fonts (正書體)] are used, and the size and position of characters in the text are not uniform. 3) Characters in the scripture are marked with dots, and dots are stamped to the left of, above, and below the characters. 4) The "spiritual poem" (詠歌, the Korean traditional music with a view of elegance as an origin of eco) is related to music of the Five Sounds (宮 Gung, 商 Sang, 角 Gak, 徵 Chi, 羽 Wu). The content analysis yields four corresponding findings. 1) The icon of the Hyun-Mu Sutra (玄武經符) was primarily developed by Jeungsan. 2) The 'reverse fonts' (反書體) and reversed word order such as '宙宇' [the reverse of '宇宙'] represent a new world grounded in the forward and reverse I-ching (正易). 3) The dots and adjacent points form a symbolic map indicating the position of the Later World (後天) and the era of human nobility (人尊). 4) The spiritual poem is the entrance to achieving the Realization of Do (道通). These are the results of this study.

Facebook Spam Post Filtering based on Instagram-based Transfer Learning and Meta Information of Posts (인스타그램 기반의 전이학습과 게시글 메타 정보를 활용한 페이스북 스팸 게시글 판별)

  • Kim, Junhong;Seo, Deokseong;Kim, Haedong;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.43 no.3 / pp.192-202 / 2017
  • This study develops a text spam filtering system for Facebook based on two categories of variables: keywords learned from Instagram and meta-information of Facebook posts. Since there are no explicit labels for spam/ham posts, we use Instagram hashtags to train classification models. In addition, filtering accuracy is enhanced by considering the meta-information of Facebook posts. To verify the proposed filtering system, we conduct an empirical experiment on a total of 1,795,067 Facebook documents and 761,861 Instagram documents. With random forest as the base classification algorithm, the experimental results show that the proposed filtering system achieves 99% filtering accuracy and a 98% F1-measure. We expect that the proposed filtering scheme can be applied to other web services that suffer from massive spam posts but have no explicit spam labels.
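
A minimal sketch of hashtag-based weak labeling combined with post meta-information. It assumes certain hashtags signal spam or ham; the tag lists, the helpers `weak_label` and `features`, and the meta fields `num_links`/`num_likes` are hypothetical and not taken from the paper. The resulting labeled feature dictionaries could then be passed to a classifier such as random forest.

```python
from typing import Optional

# Hypothetical hashtag lists; the paper's actual labeling criteria are not given in the abstract.
SPAM_TAGS = {"#followforfollow", "#like4like", "#ad"}
HAM_TAGS = {"#dailylife", "#family"}


def weak_label(post: str) -> Optional[str]:
    """Label a post 'spam' or 'ham' from its hashtags, or None when unlabeled/ambiguous."""
    tags = {tok.lower() for tok in post.split() if tok.startswith("#")}
    spam_hit = bool(tags & SPAM_TAGS)
    ham_hit = bool(tags & HAM_TAGS)
    if spam_hit and not ham_hit:
        return "spam"
    if ham_hit and not spam_hit:
        return "ham"
    return None  # no indicative hashtag, or conflicting ones


def features(text: str, meta: dict) -> dict:
    """Keyword counts plus hypothetical meta-information fields.

    Hashtags are excluded from the keywords because they were used to create the labels.
    """
    feats: dict = {}
    for tok in text.lower().split():
        if not tok.startswith("#"):
            key = "w:" + tok
            feats[key] = feats.get(key, 0) + 1
    feats["meta:num_links"] = meta.get("num_links", 0)
    feats["meta:num_likes"] = meta.get("num_likes", 0)
    return feats


print(weak_label("Free followers now #like4like"))  # spam
print(features("Buy now #ad", {"num_links": 2}))
```

Keeping hashtags out of the feature set is a design choice meant to stop the classifier from simply relearning the labeling rule.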

Distinguishing Referential Expression 'Geot' Using Decision Tree (결정 트리를 이용한 지시 표현 '것'의 구별)

  • Jo, Eun-Kyoung;Kim, Hark-Soo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.880-888 / 2007
  • The referential expression 'geot' occurs frequently in Korean dialogues. However, previous research on reference resolution has not dealt with it properly, since 'geot' is not by itself a referential expression like pronouns and definite noun phrases, and it has never been distinguished from non-referring 'geot'. To resolve this problem, we establish a feature set based on the linguistic properties of 'geot' and the discourse properties of its text, and propose a method that uses a decision tree to distinguish referential 'geot' from non-referring 'geot'. In the experiment, our system achieved F-measures of 92.3% for non-referring 'geot' and 82.2% for referential 'geot', with an overall classification performance of 89.27%, outperforming a classification system based on pattern rules.
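
To illustrate how a decision tree chooses the feature that best separates referential from non-referring 'geot', here is a sketch of the information-gain split criterion (as used in ID3). The abstract does not state which split criterion was used, and the feature names below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Feature a tree node would split on under the information-gain criterion."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))


# Hypothetical features describing four occurrences of 'geot'.
rows = [
    {"after_demonstrative": True, "sentence_final": False},
    {"after_demonstrative": True, "sentence_final": True},
    {"after_demonstrative": False, "sentence_final": False},
    {"after_demonstrative": False, "sentence_final": True},
]
labels = ["referential", "referential", "non-referring", "non-referring"]
print(best_split(rows, labels))  # after_demonstrative
```

A full tree would apply `best_split` recursively to each resulting subset until the labels are pure or no features remain.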

A Study of Designing the Intelligent Information Retrieval System by Automatic Classification Algorithm (자동분류 알고리즘을 이용한 지능형 정보검색시스템 구축에 관한 연구)

  • Seo, Whee
    • Journal of Korean Library and Information Science Society / v.39 no.4 / pp.283-304 / 2008
  • This study develops an intelligent retrieval system that, through a learning function, automatically presents category terms for an initial query (association terms connected with the knowledge structure of the relevant terminology), automatically changes the search form, and runs the search with those association terms. To this end, this theoretical study of an intelligent automatic indexing system extracts experts' index terms through learning and clustering algorithms for automatic classification, text mining (categorization), and document category representation. The system also performs well in terms of cost, time, recall, and precision.

Clustering of Web Document Exploiting with the Co-link in Hypertext (동시링크를 이용한 웹 문서 클러스터링 실험)

  • 김영기;이원희;권혁철
    • Journal of Korean Library and Information Science Society / v.34 no.2 / pp.233-253 / 2003
  • Knowledge organization is the way we humans understand the world. Information retrieval studies two types of information organization mechanisms: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information resource systems extract keywords from the words that appear in web documents and build an inverted file. Term clustering, which groups related terms, has not proved very successful, however, and was mostly abandoned for documents written in different languages and for doorway pages composed only of anchor text. This study examines informetric analysis and the feasibility of clustering web documents based on the co-link topology of web pages.
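
Two pages are usually said to be co-linked when a third page links to both of them. The sketch below, assuming an out-link map as input, counts co-links and groups pages by single-link clustering over a co-link threshold; the threshold and the grouping rule are illustrative choices, not the study's procedure.

```python
from collections import defaultdict
from itertools import combinations


def colink_counts(outlinks):
    """outlinks maps each page to the set of pages it links to.

    Returns {(a, b): n} with a < b, where n is the number of pages linking to both a and b.
    """
    counts = defaultdict(int)
    for targets in outlinks.values():
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return dict(counts)


def cluster(pages, counts, threshold):
    """Single-link grouping: pages joined by a co-link count >= threshold share a cluster.

    Every page appearing in `counts` must also appear in `pages`.
    """
    parent = {p: p for p in pages}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), n in counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for p in pages:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


outlinks = {"s1": {"A", "B"}, "s2": {"A", "B", "C"}, "s3": {"D", "E"}, "s4": {"D", "E"}}
counts = colink_counts(outlinks)
print(cluster(["A", "B", "C", "D", "E"], counts, threshold=2))
# [['A', 'B'], ['C'], ['D', 'E']]
```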

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

  • Yeom, Ha-Neul;Hwang, Myunggwon;Hwang, Mi-Nyeong;Jung, Hanmin
    • Journal of Information Science Theory and Practice / v.2 no.3 / pp.29-39 / 2014
  • In recent years, several studies have proposed using the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on selecting an optimal machine-learning classifier and feature set to classify the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimal feature set, we construct a basic feature set as a baseline for performance comparison, against which further test feature sets can be evaluated. Experiments show that precision and F-measure are best when using a Naive Bayes Multinomial classifier with a test feature set defined by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech.
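
A sketch of part-of-speech-based feature selection followed by a multinomial Naive Bayes classifier with Laplace smoothing. It assumes tweets have already been morphologically analyzed; the coarse tags `SUBST`, `PRED`, `MOD`, `INTJ`, and `PARTICLE` and the English glosses are hypothetical stand-ins for real Korean morphemes and POS classes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical coarse tags for substantive, predicate, modifier, and interjection.
KEEP_POS = {"SUBST", "PRED", "MOD", "INTJ"}


def select_features(tagged):
    """Keep only morphemes whose POS tag is in KEEP_POS."""
    return [morph for morph, pos in tagged if pos in KEEP_POS]


class MultinomialNB:
    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
        self.vocab = {w for counter in self.word_counts.values() for w in counter}
        return self

    def log_posterior(self, doc, y):
        """Unnormalized log P(y | doc) with Laplace-smoothed word likelihoods."""
        n_docs = sum(self.class_counts.values())
        denom = sum(self.word_counts[y].values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[y] / n_docs)
        for w in doc:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[y][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda y: self.log_posterior(doc, y))


train = [
    [("poisoning", "SUBST"), ("ga", "PARTICLE"), ("outbreak", "SUBST"), ("occur", "PRED")],
    [("lunch", "SUBST"), ("eseo", "PARTICLE"), ("poisoning", "SUBST")],
    [("tasty", "PRED"), ("lunch", "SUBST")],
    [("today", "SUBST"), ("tasty", "PRED"), ("wow", "INTJ")],
]
labels = ["safety", "safety", "other", "other"]
model = MultinomialNB().fit([select_features(t) for t in train], labels)
print(model.predict(["poisoning", "lunch"]))  # safety
```

Dropping particles before training is what distinguishes one "test feature set" from another here; swapping the contents of `KEEP_POS` is how alternative sets could be compared.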

A Study for Research Area of Library and Information Science by Network Text Analysis (네트워크 텍스트 분석을 통한 문헌정보학 최근 연구 경향 분석)

  • Cho, Jane
    • Journal of the Korean Society for Information Management / v.28 no.4 / pp.65-83 / 2011
  • In this study, Network Text Analysis was performed on 1,752 articles published over the preceding seven years to derive the distribution of subject concepts, and the relations among them, in Library and Information Science research. Furthermore, to analyze more recent trends and changes, a secondary analysis was performed on 482 articles published in the preceding two years. The results show that "public library" and "academic library" were the most frequently studied concepts in the field, and "evaluation", "education", and "web" had the highest degree centrality over the seven years. In the analysis of the most recent two years, "web" and "classification" appeared with high frequency, and "user" and "public library" showed increased degree centrality.
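
Degree centrality in a concept network can be sketched as follows, assuming (for illustration only) that two concepts are linked when they co-occur in the same article. The normalized degree centrality of a node is its number of neighbors divided by n - 1, where n is the number of nodes.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected, unweighted graph linking concepts that appear in the same document.

    A concept that never co-occurs with another concept does not become a node.
    """
    adj = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Normalized degree centrality: neighbor count divided by (n - 1)."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


articles = [["web", "evaluation"], ["web", "education"], ["web", "user", "evaluation"]]
centrality = degree_centrality(cooccurrence_graph(articles))
print(max(centrality, key=centrality.get))  # web
```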

Analysis of the abstracts of research articles in food related to climate change using a text-mining algorithm (텍스트 마이닝 기법을 활용한 기후변화관련 식품분야 논문초록 분석)

  • Bae, Kyu Yong;Park, Ju-Hyun;Kim, Jeong Seon;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1429-1437 / 2013
  • Research articles on food related to climate change were analyzed with a text-mining algorithm, one of the unstructured-data analysis tools in big data analysis, focusing on the frequencies of terms appearing in the abstracts. First, a term-document matrix was built; then a hierarchical clustering algorithm, based on dissimilarities among the selected terms together with expertise in the field, was applied to classify the documents into a few labeled groups. Through this research, we identified important topics in the field of food related to climate change and their trends over past years. The results are expected to be useful for future research on systematic responses and adaptation to climate change.
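
A sketch of the pipeline described above: build a term-document matrix, measure term dissimilarity as 1 minus the cosine similarity of the terms' document vectors, and merge terms with average-linkage agglomerative clustering. The dissimilarity measure, the linkage, and the hypothetical `assign_documents` rule (label each document with the term cluster it mentions most) are assumptions; the abstract specifies none of them, and the expert-judgment step is not modeled.

```python
import math
from itertools import combinations


def term_document_matrix(docs, terms):
    """Map each selected term to its raw frequency in every document (one entry per doc)."""
    tokenized = [d.lower().split() for d in docs]
    return {t: [toks.count(t) for toks in tokenized] for t in terms}


def cosine_dissimilarity(u, v):
    """1 - cosine similarity; an all-zero vector is treated as maximally dissimilar."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def average_linkage(tdm, k):
    """Agglomerative clustering of terms: merge the closest pair until k clusters remain."""
    clusters = [[t] for t in tdm]

    def dist(c1, c2):
        total = sum(cosine_dissimilarity(tdm[a], tdm[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


def assign_documents(tdm, clusters):
    """Hypothetical rule: each document gets the index of the term cluster it mentions most."""
    n_docs = len(next(iter(tdm.values())))
    return [max(range(len(clusters)), key=lambda c: sum(tdm[t][d] for t in clusters[c]))
            for d in range(n_docs)]


abstracts = ["rice yield temperature", "rice yield drought", "fish temperature ocean", "fish ocean"]
tdm = term_document_matrix(abstracts, ["rice", "yield", "fish", "ocean"])
clusters = average_linkage(tdm, k=2)
print(clusters)                          # [['fish', 'ocean'], ['rice', 'yield']]
print(assign_documents(tdm, clusters))   # [1, 1, 0, 0]
```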

Analyzing Issues on Environment-Friendly Agriculture Using Topic Modeling and Network Analysis (토픽모델링과 네트워크분석을 활용한 친환경농업 이슈분석에 관한 연구)

  • Shin, Ye-Eun;Shin, Eun-Seo;Kim, Sang-Bum;Choi, Jin-Ah;Kim, Myunghyun;Han, Seokjun;An, Kyungjin
    • Journal of Korean Society of Rural Planning / v.29 no.4 / pp.35-53 / 2023
  • This study attempts to identify the flow of key topics and issues of research trends related to environment-friendly agriculture conducted around the 2000s in South Korea and compare them with the environment-friendly agriculture promotion plan to seek the level of consistency and the direction of future development of environment-friendly agriculture. For the analysis of environment-friendly agriculture research trends and policy consistency, 'topic modeling', which is suitable for subject classification of large amounts of unstructured data, and 'text network analysis', which visualizes the relationship between keywords as a network and interprets its characteristics, were utilized. Overall, active discussions were held on 'technical discussions for the production and cultivation of environment-friendly agricultural products' and 'food safety & consumer awareness', and keywords such as production, cultivation, consumption, and safety were consistently linked to other keywords regardless of time. In addition, it was found that the issue of environment-friendly agriculture was partially consistent with the policy direction of the period. Considering the fact that the ongoing '5th Environment-Friendly Agriculture Promotion Phase' emphasizes the strengthening of rural environment management and aims to ensure the continuous quantitative and qualitative development of environment-friendly agriculture, active discussions and research on its environmental contributions and management methods are needed.
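
Topic modeling of this kind is commonly done with latent Dirichlet allocation (LDA), although the abstract does not name the specific model. A minimal collapsed Gibbs sampler for LDA is sketched below; the hyperparameters `alpha`, `beta`, and `iters` and the toy keyword lists are illustrative assumptions, and the function returns each topic's word distribution.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word distribution (dict) per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    ndk = [[0] * k for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(k)]  # word counts per topic
    nk = [0] * k                                # total tokens per topic
    z = []                                      # current topic of every token

    def update(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):              # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            update(d, w, t, 1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)       # remove the token's current assignment
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                update(d, w, t, 1)

    return [{w: (nkw[t][w] + beta) / (nk[t] + v * beta) for w in vocab} for t in range(k)]


docs = [
    ["production", "cultivation", "organic", "cultivation"],
    ["consumer", "safety", "food", "consumer"],
    ["organic", "production", "cultivation"],
    ["food", "safety", "consumer"],
]
for topic in lda_gibbs(docs, k=2):
    print(sorted(topic, key=topic.get, reverse=True)[:3])
```

The fixed `seed` makes runs reproducible; real analyses typically use far more documents, tune the number of topics, and discard a burn-in period before reading off the distributions.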

Research Trend Analysis of the Prevalence of Complementary and Alternative Medicine in Korea (국내 보완대체의학 사용 실태조사 연구의 동향 분석)

  • Kim, Sul-Gi;Lee, Sang-Hun;Seo, Hyun-Ju;Baek, Seung-Min;Choi, Sun-Mi
    • The Journal of Korean Medicine / v.33 no.1 / pp.24-41 / 2012
  • Objectives: This study reviewed research trends in surveys of the prevalence of complementary and alternative medicine (CAM) use and suggests future research directions appropriate to medical circumstances in Korea. Methods: We searched 8 databases, including 6 Korean databases, Ovid MEDLINE, and the CINAHL electronic database, for surveys of CAM use. Three independent reviewers working in pairs screened titles and abstracts for eligibility; the full text was retrieved when they disagreed on eligibility. The main analysis targets were the survey researchers' affiliations, the terminology used in titles, study subjects, the definition of CAM, the classification of CAM modalities, and the division of areas between CAM and traditional Korean medicine (TKM). Results: 92 articles were included in the analysis. The largest group of researchers by affiliation was doctors (53%). Over the years, study subjects diversified to cover a wide range of diseases. Since 2003, the term "CAM" has been generally adopted. In practice, however, the most commonly used definition was a broad one such as "not generally considered part of major medicine" (55.4%), and the most common classification of CAM was based on the researchers' own criteria (61.9%). Regarding the division of areas between CAM and TKM, many therapies fall in a gray zone between the two. Conclusions: A standardized definition and classification criteria for CAM suited to the Korean healthcare system have not yet been developed. The traditional Korean medicine academic community should pay more attention to developing appropriate definitions and classification criteria.