• Title/Summary/Keyword: Comparative Text Analysis

A Comparative Study of Feature Extraction Methods for Authorship Attribution in the Text of Traditional East Asian Medicine with a Focus on Function Words (한의학 고문헌 텍스트에서의 저자 판별 - 기능어의 역할을 중심으로 -)

  • Oh, Junho
    • Journal of Korean Medical classics / v.33 no.2 / pp.51-59 / 2020
  • Objectives: This study examines which features are most appropriate for authorship attribution in texts of Traditional East Asian Medicine. Methods: Using the 'Variorum of the Nanjing' as an experimental corpus, the authorship attribution performance of a Support Vector Machine (SVM) was compared by cross-validation across three choices: function words versus content words, single words versus collocations, and whether or not IDF weights were applied. Results: The combination 'function words/uni-bigram/TF' performed best, with an accuracy of 0.732, and the combination 'content words/unigram/TFIDF' showed the lowest accuracy, 0.351. Conclusions: The results indicate the following about authorship attribution in texts of Traditional East Asian Medicine. First, function words play a more important role than content words. Second, collocations were relatively important among content words, whereas single words were more important among function words. Third, unlike in general text analysis, IDF weighting resulted in worse performance.
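
The feature choices compared in this abstract (function words, unigrams plus bigrams, TF versus TF-IDF) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the function-word list, the helper names, and the toy documents are all hypothetical, and the SVM classification and cross-validation steps are left out.

```python
import math
from collections import Counter

# Hypothetical function-word inventory; the study's actual list is not given in the abstract.
FUNCTION_WORDS = {"之", "也", "者", "而", "其", "以", "於", "則"}


def function_word_ngrams(tokens, function_words=FUNCTION_WORDS):
    """Keep only function words, then return their unigrams plus adjacent bigrams."""
    kept = [t for t in tokens if t in function_words]
    bigrams = [a + "_" + b for a, b in zip(kept, kept[1:])]
    return kept + bigrams


def tf_vectors(docs):
    """Relative term frequency of each feature within each document."""
    vectors = []
    for tokens in docs:
        counts = Counter(function_word_ngrams(tokens))
        total = sum(counts.values()) or 1  # avoid division by zero for empty documents
        vectors.append({feat: c / total for feat, c in counts.items()})
    return vectors


def tfidf_vectors(docs):
    """TF weighted by log(N / document frequency)."""
    tf = tf_vectors(docs)
    n_docs = len(docs)
    doc_freq = Counter(feat for vec in tf for feat in vec)
    return [
        {feat: w * math.log(n_docs / doc_freq[feat]) for feat, w in vec.items()}
        for vec in tf
    ]


if __name__ == "__main__":
    # Toy tokenized passages, invented for illustration only.
    docs = [["之", "病", "也"], ["之", "脈", "者"]]
    print(tf_vectors(docs)[0])
    print(tfidf_vectors(docs)[0])
```

In this sketch, a feature that occurs in every document receives an IDF weight of zero. The abstract does not say whether this effect explains the reported drop in accuracy with IDF weighting, so the sketch should be read only as an illustration of how the two weightings differ.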

A comparative study of Entity-Grid and LSA models on Korean sentence ordering (한국어 텍스트 문장정렬을 위한 개체격자 접근법과 LSA 기반 접근법의 활용연구)

  • Kim, Youngsam;Kim, Hong-Gee;Shin, Hyopil
    • Korean Journal of Cognitive Science / v.24 no.4 / pp.301-321 / 2013
  • For the task of sentence ordering, this paper applies the Entity-Grid model, a type of entity-based modeling approach, as well as Latent Semantic Analysis (LSA), which is based on vector space modeling. Sentence ordering is well known as one of the fundamental tasks used to measure text coherence and to enhance text generation. To implement the Entity-Grid model, we use the syntactic roles of nouns in Korean text for the ordering task and measure their impact on the result, since their contribution has been debated in previous research. Contrary to the case of German, the syntactic roles have a positive effect. To obtain the syntactic role information, we rely on the Korean case markers attached to the nouns. The results show that these cues can help measure text coherence. In addition, we compare the results with those of the LSA-based model and discuss the advantages and disadvantages of both models as well as options for future studies.
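
As a rough illustration of the entity-grid idea described in this abstract, the sketch below builds a grid of syntactic roles per entity and sentence, then computes role-transition probabilities. It is a minimal sketch under assumptions: the marker-to-role mapping, the function names, and the example sentences are hypothetical, and real Korean text would need morphological analysis that is not shown here.

```python
from collections import Counter
from itertools import product

ROLES = ("S", "O", "X", "-")  # subject, object, other, absent

# Hypothetical mapping from Korean case markers to grid roles.
# Topic markers (은/는) are ambiguous and fall through to "X" in this sketch.
MARKER_ROLES = {"이": "S", "가": "S", "을": "O", "를": "O"}


def role_from_marker(marker):
    """Map a case marker to a grid role; unknown markers count as 'other'."""
    return MARKER_ROLES.get(marker, "X")


def build_grid(sentences):
    """sentences: list of dicts mapping entity -> role in that sentence."""
    entities = sorted({e for s in sentences for e in s})
    return {e: [s.get(e, "-") for s in sentences] for e in entities}


def transition_probs(grid, length=2):
    """Probability of each role transition of the given length across the grid."""
    counts = Counter()
    for column in grid.values():
        for i in range(len(column) - length + 1):
            counts[tuple(column[i:i + length])] += 1
    total = sum(counts.values())
    return {
        t: (counts[t] / total if total else 0.0)
        for t in product(ROLES, repeat=length)
    }


if __name__ == "__main__":
    # Invented three-sentence text: each dict lists the entities and their roles.
    sentences = [
        {"철수": role_from_marker("가"), "책": role_from_marker("를")},
        {"철수": role_from_marker("가")},
        {"책": role_from_marker("이")},
    ]
    grid = build_grid(sentences)
    print(grid)
    print(transition_probs(grid)[("S", "S")])
```

The resulting transition probabilities form a feature vector for one candidate ordering; comparing such vectors across alternative orderings is the usual way an entity grid is used to score coherence.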

A Study on the International Research Trend in Education Development focused on Text Network Analysis(2002~2017) (교육개발협력에 관한 국제 학술지 연구 동향 고찰 : 텍스트 네트워크 분석을 중심으로(2002~2017))

  • Kim, Sang-Mi;Kim, Young-Hwan;Cho, Won-Gyeum
    • Korean Journal of Comparative Education / v.28 no.1 / pp.1-24 / 2018
  • The objective of this article is to identify the research trends and main traits reflected in the keywords of abstracts of research articles published in the "International Journal of Education Development" from 2002 to 2017. To do this, Text Network Analysis (TNA) was applied to 966 papers from the journal. The major findings are as follows. First, the frequency analysis showed that keywords such as Administration of education program, Schools and instruction, Regional public administration, Educational support service, Elementary education, and Elementary and secondary school occurred more than 100 times and were also high in degree centrality. Second, the analysis by development goal period showed that several new keywords, such as Elementary education, Elementary and secondary school, Education quality, Secondary education, and Educational planning, have appeared frequently since the SDGs, and these keywords were also high in centrality. Third, the analysis by education level showed that keywords such as Elementary education, Administration of education program, and School children were high in frequency and degree centrality at the elementary level. At the secondary level, Schools and instruction, Administration of education program, and Academic achievement were high, and at the higher education level, College and university was high.
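
The keyword frequency and degree centrality measures mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming that two keywords are linked whenever they appear in the same paper; the abstract does not describe how the authors built their network, and the function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def keyword_network(papers):
    """papers: list of keyword lists, one per paper.

    Returns (frequency per keyword, neighbor sets of the co-occurrence graph).
    """
    freq = Counter()
    neighbors = defaultdict(set)
    for keywords in papers:
        unique = set(keywords)
        freq.update(unique)  # count each keyword once per paper
        for a, b in combinations(sorted(unique), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return freq, neighbors


def degree_centrality(freq, neighbors):
    """Share of the other keywords that each keyword is directly linked to."""
    n = len(freq)
    if n < 2:
        return {k: 0.0 for k in freq}
    return {k: len(neighbors[k]) / (n - 1) for k in freq}


if __name__ == "__main__":
    # Invented keyword sets for two papers.
    papers = [
        ["Elementary education", "Education quality"],
        ["Elementary education", "Secondary education"],
    ]
    freq, neighbors = keyword_network(papers)
    print(freq.most_common())
    print(degree_centrality(freq, neighbors))
```

A keyword can be frequent without being central, or central without being frequent, which is why the abstract reports both measures side by side.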

A Comparative Study between Ubiquitous City Comprehensive Plan and Ubiquitous City Plan - Focusing on U-Service Plan (유비쿼터스도시종합계획과 유비쿼터스도시계획 비교 연구 -U-서비스 계획을 중심으로-)

  • Yoo, Ji Song;Jeong, Da Woon;Yi, Mi Sook;Min, Kyung Ju
    • Spatial Information Research / v.23 no.2 / pp.83-93 / 2015
  • U-services offered by local governments under their Ubiquitous City Plans focus only on facility and urban management services, and citizen-oriented U-services exist only at the planning stage. The purpose of this study is to draw implications for providing citizen-oriented U-services by comparing the U-service plans of the 'Ubiquitous City Comprehensive Plan' and the 'Ubiquitous City Plan' through network text analysis and word frequency analysis. Important keywords were extracted from the service plan contents of the 'Ubiquitous City Comprehensive Plan' and the 'Ubiquitous City Plans' of four local governments, and network text analysis and keyword frequency analysis were performed on the derived keywords. Based on the analysis results, citizens' awareness of the U-City can be expected to increase if the next Ubiquitous City Comprehensive Plan promotes the development of citizen-oriented U-services in a variety of sectors through additional services and financial support policies.

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Text data collected through web scraping for artificial intelligence and big data analysis are generally large and unstructured, so a refinement process is required before analysis. The data become structured and analyzable after a heuristic pre-processing refinement step and a machine post-processing refinement step. In this study, the machine post-processing step uses a Korean dictionary and a stopword dictionary to extract vocabulary for frequency analysis and word cloud analysis. "User-defined stopwords" are then used to efficiently remove stopwords that the dictionary did not catch. We propose a methodology that applies a "user-defined stopword thesaurus" to address the shortcomings of the existing "stopword dictionary" method, and we examine the pros and cons of the proposed refinement method through a case analysis using R's word cloud technique. Comparative verification is presented, and the results suggest that the proposed methodology is effective in practical application.
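
The post-processing idea in this abstract can be sketched as below. The study itself worked in R; this sketch uses Python and is only an illustration under assumptions. In particular, it reads "stopword thesaurus" as a table that groups variant forms under a head stopword, which the abstract does not spell out, and all word lists and function names are hypothetical.

```python
from collections import Counter

# Hypothetical base stopword dictionary.
BASE_STOPWORDS = {"the", "and", "of"}

# Hypothetical user-defined thesaurus: head stopword -> variant forms to drop as well.
STOPWORD_THESAURUS = {
    "click": {"clicks", "clicked"},
    "http": {"https", "www"},
}


def expand_stopwords(base, thesaurus):
    """Merge the base dictionary with every head word and variant in the thesaurus."""
    expanded = set(base)
    for head, variants in thesaurus.items():
        expanded.add(head)
        expanded.update(variants)
    return expanded


def word_frequencies(tokens, stopwords):
    """Lowercase the tokens, drop stopwords, and count what remains."""
    lowered = (tok.lower() for tok in tokens)
    return Counter(tok for tok in lowered if tok not in stopwords)


if __name__ == "__main__":
    tokens = ["The", "data", "clicked", "https", "data", "analysis"]
    stopwords = expand_stopwords(BASE_STOPWORDS, STOPWORD_THESAURUS)
    print(word_frequencies(tokens, stopwords).most_common())
```

The resulting counts are the kind of input a word cloud tool expects. Keeping variants in a thesaurus rather than a flat list means a new variant only needs to be added under its head word.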

A Comparative Study on the Types and its Importance of Trade Claims between China and the United States: Using Text Mining Techniques (중국과 미국의 무역클레임 유형과 중요도 비교 연구 : 텍스트 마이닝 기법을 활용하여)

  • Cheon Yu;Yun-Seop Hwang
    • Korea Trade Review / v.47 no.3 / pp.177-190 / 2022
  • This study is designed to identify differences in the types and importance of trade claims at the national level. The analysis data are abstracts of arbitration decisions and court judgments published on the website of the United Nations Commission on International Trade Law. The target countries are China and the United States, with 102 cases from China and 59 cases from the United States. Topic modeling techniques are applied to the collected decisions from China and the United States to categorize trade claims, and the importance of each type is identified using the network centrality index derived through semantic network analysis. The analysis results are as follows. First, the main types of trade claims were the same for both the United States and China: product nonconformity, delivery issues, and payments. However, the order of importance differed: in China it was product nonconformity > delivery issues > payments, whereas in the United States it was payments > product nonconformity > delivery issues. This study is significant in that it presents a strategic trade claim management plan using a quantitative methodology.

A Comparative Study of Intonation Phrase Boundary Tones of Korean Produced by Korean Speakers and Chinese Speakers in the Reading of Korean Text (중국인 학습자들의 한국어 억양구 경계톤 실현 양상)

  • Yune, Young-Sook
    • Phonetics and Speech Sciences / v.2 no.4 / pp.39-49 / 2010
  • The purpose of this paper is to examine how Chinese speakers realize Korean intonation phrase (IP) boundary tones in the reading of a Korean text. Korean IP boundary tones play various roles in speech communication. They indicate prosodic constituents' boundaries while simultaneously performing pragmatic and grammatical functions. In order to express and understand Korean utterances correctly, it is necessary to understand the Korean IP boundary tone system. To investigate the IP boundary tone produced by Chinese speakers, we have specifically examined the type of boundary tones, the degree of internal pitch modulation of boundary tones, and the pitch difference between penultimate syllables and boundary tones. The results of each analysis were compared to the IP boundary tones produced by Korean native speakers. The results show that IP boundary tones were realized higher than penultimate syllables.

A Comparative Study on the Social Awareness of Metaverse in Korea and China: Using Big Data Analysis (한국과 중국의 메타버스에 관한 사회적 인식의 비교연구: 빅데이터 분석의 활용)

  • Ki-youn Kim
    • Journal of Internet Computing and Services / v.24 no.1 / pp.71-86 / 2023
  • The purpose of this exploratory study is to compare public perceptions of the metaverse in Korean and Chinese society using big data analysis. Owing to the COVID-19 pandemic, technological progress, and the expansion of new consumer bases such as Generation Z and Generation Alpha, global interest in the metaverse has grown, and related academic studies have been in full swing since 2021. In particular, Korea and China have emerged as leading countries in the metaverse industry. Now that mentions of the metaverse have skyrocketed, it is timely to examine the differences in social awareness using the big data accumulated in both countries. The analysis identifies the importance of keywords by analyzing word frequency, N-grams, and TF-IDF on cleaned data through text mining, and analyzes the density and centrality of semantic networks to determine the strength of connection between words and their semantic relevance. Python 3.9, the Anaconda data science platform 3, and Textom 6 were used, and UCINET 6.759 was used for semantic network analysis, visualization, and structural CONCOR analysis. As a result, four blocks, each a group of similar words, were derived. These blocks represent different perspectives that reflect the types of social perceptions of the metaverse in both countries. Studies on the metaverse are increasing, but comparative studies between countries from a cross-cultural perspective have not yet been conducted. As an early study of this kind, this work can provide theoretical grounds and meaningful insights for future studies.
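
The semantic network density mentioned in this abstract can be illustrated with a short sketch. It assumes, hypothetically, that two words are linked when they appear as an adjacent pair (a bigram) in a cleaned document; the study used Textom and UCINET, whose exact procedures are not described in the abstract, so the function names and sample tokens here are illustrative only.

```python
def bigram_edges(documents):
    """Collect undirected edges between adjacent, distinct tokens."""
    edges = set()
    for tokens in documents:
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # skip self-loops from repeated words
                edges.add(frozenset((a, b)))
    return edges


def network_density(edges):
    """Density of an undirected graph: 2E / (N * (N - 1)).

    Only words that take part in at least one edge are counted as nodes.
    """
    nodes = set().union(*edges) if edges else set()
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))


if __name__ == "__main__":
    # Invented, already-cleaned token lists.
    documents = [
        ["metaverse", "platform", "game"],
        ["metaverse", "game"],
    ]
    edges = bigram_edges(documents)
    print(len(edges), network_density(edges))
```

A density near 1 means almost every pair of words is directly connected, while a low density points to a sparse network in which centrality measures do more to distinguish keywords.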