• Title/Abstract/Keywords: Text Analysis

Search results: 3,342 items; processing time 0.028 seconds

A Tensor Space Model based Deep Neural Network for Automated Text Classification

  • 임푸름;김한준
    • 데이타베이스연구회지:데이타베이스연구 / Vol. 34, No. 3 / pp.3-13 / 2018
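  • Text classification, one of the techniques of text mining, assigns a given text document to an appropriate category and is used in a wide range of applications such as spam mail detection, news classification, automatic response, sentiment analysis, and chatbots. Automated text classification systems generally use machine learning algorithms; among these, Naive Bayes and the Support Vector Machine, which are well suited to text data, are known to achieve reasonable performance. With recent advances in deep learning, studies that apply the Recurrent Neural Network and the Convolutional Neural Network to improve the performance of automated text classification systems have been introduced. However, these recent techniques still fall short of perfect document classification. This paper attributes the shortfall to the fact that text data are represented as vectors centered on the word dimension, which damages the semantic information inherent in the text. It proposes a deep neural network architecture based on the semantic tensor space model, whose effectiveness was verified in prior work, and shows that a document classifier using this architecture achieves substantially improved performance.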

Analysis of Articles Related STEAM Education using Network Text Analysis Method

  • 김방희;김진수
    • 한국초등과학교육학회지:초등과학교육 / Vol. 33, No. 4 / pp.674-682 / 2014
  • This study analyzes STEAM-related articles and examines research trends in order to present implications for future research directions. To this end, the researcher searched RISS with the keywords 'STEAM' and 'Convergence Education'. The subjects of analysis were the titles of 181 journal articles and conference papers published from 2011 through 2013. Keywords were selected through a frequency analysis of the words appearing in the titles, a co-occurrence matrix of the keywords was built, and a network text analysis was carried out using network maps, degree centrality, betweenness centrality, and structural equivalence. KrKwic, KrTitle, UCINET, and NetMiner were used for the analysis, and the results were as follows. In the text frequency analysis, the most frequent keywords were, in order, 'program', 'development', 'base', and 'application'. The network among the texts was built around core hubs such as 'program', 'development', 'elementary', and 'application'. In the degree centrality analysis, 'program', 'elementary', 'development', and 'science' had relatively high values and formed the pivot of the network. The structural equivalence analysis found similarity within four clusters: the development of a program (1), analysis of effects (2), and the establishment of a theoretical base (1).
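The co-occurrence and degree centrality steps named in this abstract can be illustrated with a minimal sketch. It is not the study's procedure (which used KrKwic, KrTitle, UCINET, and NetMiner); the keyword lists, function names, and the choice of normalized degree centrality are assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical article titles, already reduced to keywords (not the study's data).
titles = [
    ["program", "development", "elementary"],
    ["program", "application", "science"],
    ["development", "program", "science"],
]

def cooccurrence(docs):
    """Count how many titles each unordered keyword pair appears in together."""
    pairs = Counter()
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs

def degree_centrality(pairs):
    """Fraction of the other keywords that each keyword is directly linked to."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {word: len(linked) / (n - 1) for word, linked in neighbours.items()}

pairs = cooccurrence(titles)
centrality = degree_centrality(pairs)
```

In this toy input, a keyword that co-occurs with every other keyword receives a centrality of 1.0, which is the sense in which a word such as 'program' can act as a hub of the network.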

A study on narrative text analysis from the perspective of information processing - focusing on four computational methodologies

  • 권호창
    • 트랜스- / Vol. 13 / pp.141-169 / 2022
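  • The analysis of narrative texts has long been regarded as important both academically and practically, and it has been carried out from many perspectives and with many methods. This paper examines computational methodologies for narrative analysis from the perspective of information processing. From this perspective, the creation and reception of narrative is a bidirectional coding process mediated by the narrative text, and the narrative text can be regarded as a multi-layered structured code. The paper reviews, with examples, four methodologies that share this view: character network analysis, text mining and sentiment analysis, continuity analysis of event composition, and knowledge analysis of narrative agents. Through this review, it confirms the mechanisms and potential of computational methodologies in narrative analysis. The conclusion examines the significance and side effects of computational narrative analysis and discusses the need to design a human-computer collaboration model grounded in the consilience of the humanities and science and technology. On this basis, the paper argues that narratives that are aesthetically creative, ethically good, politically progressive, and cognitively sophisticated can be created more effectively.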

Pragmatic Strategies of Self (Other) Presentation in Literary Texts: A Computational Approach

  • Khafaga, Ayman Farid
    • International Journal of Computer Science & Network Security / Vol. 22, No. 2 / pp.223-231 / 2022
  • The application of computer software to the linguistic analysis of texts proves useful for arriving at concise and authentic results from large data texts. Based on this assumption, this paper employs Computer-Aided Text Analysis (CATA) and Critical Discourse Analysis (CDA) to explore the manipulative strategies of positive/negative presentation in Orwell's Animal Farm. More specifically, the paper explores the extent to which CATA software, represented by the three variables of Frequency Distribution Analysis (FDA), Content Analysis (CA), and Key Word in Context (KWIC), works together with CDA to decipher the manipulative purposes behind the positive presentation of selfness and the negative presentation of otherness in the selected corpus. The analysis covers several CDA strategies: justification, false statistics, and competency for positive self-presentation; and accusation, criticism, and the use of ambiguous words for negative other-presentation. With the application of CATA, selected words are analyzed by showing their frequency distribution as well as their contextual environment in the selected text, to expose the extent to which they are employed as strategies of positive/negative presentation in the text under investigation. Findings show that CATA software contributes significantly to the linguistic analysis of large data texts. The paper recommends the use of different CATA software in stylistic and corpus linguistics studies.
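Two of the CATA variables named in this abstract, frequency distribution and Key Word in Context (KWIC), can be sketched in a few lines. This is only an illustration under stated assumptions: the tokenizer, the function names, the window size, and the sample sentence are all invented here, and the paper's actual software is not specified in the abstract.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_distribution(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Placeholder sentence written for this sketch, not a quotation from the novel.
sample = "The leader praised the plan. The animals trusted the leader and the plan."
tokens = tokenize(sample)
freq = frequency_distribution(tokens)
lines = kwic(tokens, "leader", window=2)
```

The frequency table shows how often a word is used, while the KWIC lines show what surrounds it, which is the kind of contextual evidence a discourse analyst would then interpret by hand.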

A Study of 'Emotion Trigger' by Text Mining Techniques

  • 안주영;배정환;한남기;송민
    • 지능정보연구 / Vol. 21, No. 2 / pp.69-92 / 2015
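  • With the recent explosive growth in social media use, research on various text mining techniques for analyzing the vast amounts of user-generated data has been active. As a result, the accuracy and sophistication of algorithms for text analysis are also improving. In sentiment analysis in particular, however, these algorithms are limited in that they apply only the grammatical elements of language and fail to consider pragmatic and semantic elements. To address this limitation, this study applies the Word2Vec technique, which can take semantic features into account more broadly than existing algorithms. In addition, among Korean parts of speech, adjectives are classified as 'emotion words' that express emotion, and the nouns among the related words of each emotion word extracted by the Word2Vec model are defined as the factors that trigger that emotion; the whole process is named 'Emotion Trigger'. As a case study, this study selects specific incidents involving three occupational groups (professors, prosecutors, and doctors) that became social issues and analyzes public perception of those incidents. To consider both general public opinion and directly expressed personal opinions, news, blogs, and Twitter were chosen as data sources. The collected data are large enough to yield meaningful results and form time series data that permit a variety of follow-up studies. The significance of this study is that, in revealing the relationships between keywords, it combines semantic elements by applying Word2Vec to overcome the limitations of existing sentiment analysis. In the process, Emotion Triggers that evoke emotions could be identified, which may help in understanding the general public's response to social issues, finding the causes, and resolving social problems.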
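The core of the 'Emotion Trigger' idea in this abstract, taking an adjective as an emotion word and ranking nearby nouns in the embedding space, can be sketched as follows. Everything concrete here is an assumption for illustration: the three-dimensional vectors, the part-of-speech tags, and the function names are invented, and a real pipeline would obtain the vectors from a trained Word2Vec model and the tags from a Korean morphological analyzer.

```python
import math

# Toy vectors and part-of-speech tags invented for this sketch.
embeddings = {
    "angry": (0.9, 0.1, 0.0),
    "happy": (0.0, 0.2, 0.9),
    "scandal": (0.8, 0.2, 0.1),
    "lecture": (0.1, 0.9, 0.2),
    "award": (0.1, 0.1, 0.8),
}
pos_tags = {
    "angry": "ADJ",
    "happy": "ADJ",
    "scandal": "NOUN",
    "lecture": "NOUN",
    "award": "NOUN",
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def emotion_triggers(emotion_word, top_n=1):
    """Rank nouns by similarity to an adjective treated as an emotion word."""
    nouns = [w for w, tag in pos_tags.items() if tag == "NOUN"]
    ranked = sorted(
        nouns,
        key=lambda w: cosine(embeddings[emotion_word], embeddings[w]),
        reverse=True,
    )
    return ranked[:top_n]

triggers_for_angry = emotion_triggers("angry")
triggers_for_happy = emotion_triggers("happy")
```

Filtering the neighbours by part of speech is what separates this from a plain nearest-neighbour lookup: only nouns are allowed to count as candidate triggers.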

A Content Analysis for Website Usefulness Evaluation: Utilizing Text Mining Technique

  • Kwon, Do Young;Jeong, Seung Ryul
    • 인터넷정보학회논문지 / Vol. 16, No. 4 / pp.71-81 / 2015
  • With the increasing influence of online media, company websites have become important communication channels between companies and customers. Companies use their websites as a marketing tool for a variety of purposes, including enhancing their image and selling products or services. Many researchers have examined the criteria, methods, and tools for website evaluation, but most have focused on usability. Prior content analyses have focused not on text content but on website components, an approach likely to produce subjective evaluations. This study attempts to objectively evaluate company websites by utilizing text mining. We analyze the usefulness of company websites by presenting visualized outputs from a business perspective, allowing practitioners to easily understand the results of the website evaluation and use them in decision making. To demonstrate our method empirically, we selected a company with a number of affiliates in Korea and analyzed the text content of their websites to assess their usefulness using natural language processing and graphics packages in R. Practitioners can easily employ our objective evaluation method, and researchers can use it to gain a new perspective on website evaluation.

R&D Perspective Social Issue Packaging using Text Analysis

  • Wong, William Xiu Shun;Kim, Namgyu
    • 한국IT서비스학회지 / Vol. 15, No. 3 / pp.71-95 / 2016
  • In recent years, text mining has been used to extract meaningful insights from the large volume of unstructured text data sets of various domains. As one of the most representative text mining applications, topic modeling has been widely used to extract main topics in the form of a set of keywords extracted from a large collection of documents. In general, topic modeling is performed according to the weighted frequency of words in a document corpus. However, general topic modeling cannot discover the relation between documents if the documents share only a few terms, although the documents are in fact strongly related from a particular perspective. For instance, a document about "sexual offense" and another document about "silver industry for aged persons" might not be classified into the same topic because they may not share many key terms. However, these two documents can be strongly related from the R&D perspective because some technologies, such as "RF Tag," "CCTV," and "Heart Rate Sensor," are core components of both "sexual offense" and "silver industry." Thus, in this study, we attempted to discover the differences between the results of general topic modeling and R&D perspective topic modeling. Furthermore, we package social issues from the R&D perspective and present a prototype system, which provides a package of news articles for each R&D issue. Finally, we analyze the quality of R&D perspective topic modeling and provide the results of inter- and intra-topic analysis.
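The limitation described in this abstract, that documents sharing only a few terms look unrelated under frequency-based weighting, can be made concrete with a small sketch. It uses TF-IDF vectors and cosine similarity as a stand-in for "weighted frequency of words"; that choice, the smoothed IDF formula, the function names, and the toy documents are assumptions made here, not the paper's topic model.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one {term: tf * idf} dictionary per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical tokenized documents: the first two share only a technology term.
docs = [
    ["crime", "offense", "victim", "cctv"],
    ["elderly", "care", "industry", "cctv"],
    ["crime", "offense", "police", "victim"],
]
vecs = tfidf_vectors(docs)
sim_related_by_technology = cosine(vecs[0], vecs[1])
sim_shared_vocabulary = cosine(vecs[0], vecs[2])
```

The first pair scores low even though both documents depend on the same technology, which is the gap an R&D perspective grouping would aim to close.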

Corpus Annotation for the Linguistic Analysis of Reference Relations between Event and Spatial Expressions in Text

  • 정진우;이희진;박종철
    • 한국언어정보학회지:언어와정보 / Vol. 18, No. 2 / pp.141-168 / 2014
  • Recognizing spatial information associated with events expressed in natural language text is essential not only for the interpretation of such events but also for the understanding of the relations among them. However, spatial information is mentioned far less often than events, and the association between event and spatial expressions is also highly implicit in a text. This makes it difficult to automate the extraction of spatial information associated with events from the text. In this paper, we give a linguistic analysis of how spatial expressions are associated with event expressions in a text. We first present issues in annotating narrative texts with reference relations between event and spatial expressions, and then discuss surface-level linguistic characteristics of such relations based on the annotated corpus, to give helpful insight into developing an automated recognition method.

Analysis of Social Media Utilization based on Big Data-Focusing on the Chinese Government Weibo

  • Li, Xiang;Guo, Xiaoqin;Kim, Soo Kyun;Lee, Hyukku
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 8 / pp.2571-2586 / 2022
  • The rapid popularity of government social media has generated huge amounts of text data, and the analysis of these data has gradually become the focus of digital government research. This study uses Python to analyze the big data of the Chinese provincial government Weibo. First, this study uses a web crawler approach to collect and statistically describe over 360,000 data items from 31 provincial government microblogs in China, covering the period from January 2018 to April 2022. Second, a word segmentation engine is constructed and the text data are analyzed using word cloud word frequencies as well as semantic relationships. Finally, the text data are analyzed for sentiment using natural language processing methods, and the text topics are studied using the LDA algorithm. The results of this study show that, first, the number and scale of posts on the Chinese government Weibo have grown rapidly. Second, government Weibo has certain social attributes, and the epidemics, people's livelihood, and services have become the focus of government Weibo. Third, the contents of government Weibo account for more than 30% of negative sentiments. The classified topics show that the epidemics and epidemic prevention and control overshadowed the other topics, which inhibits the diversification of government Weibo.

A Computer-Aided Text Analysis to Explore Recruitment and Intellectual Polarization Strategies in ISIS Media

  • Khafaga, Ayman Farid
    • International Journal of Computer Science & Network Security / Vol. 22, No. 8 / pp.87-96 / 2022
  • This paper employs Computer-Aided Text Analysis (CATA) and Critical Discourse Analysis (CDA) to explore the strategies of recruitment and intellectual polarization in ISIS (Islamic State in Iraq and Syria) media. The paper's main objective is to shed light on the efficacy of employing computer software in the linguistic analysis of texts, and on the extent to which CATA software contributes to deciphering hidden meanings of texts as well as to arriving at concise and authentic results from these texts. More specifically, this paper attempts to demonstrate the contribution of CATA software, represented by the two variables of Frequency Distribution Analysis (FDA) and Content Analysis (CA), in decoding the strategies of recruitment and intellectual polarization in one of ISIS's digital publications: Rumiyah (a digital magazine published by ISIS). The analytical focus is on three strategies of recruitment and intellectual polarization: (i) lexicalization, (ii) intertextual religionisation, and (iii) justification. Two main findings are revealed in this study. First, the application of CATA software to the linguistic investigation of texts contributes effectively to the understanding of the thematic and ideological messages pertaining to the analyzed text. Second, the computational analysis yields more concise, credible, authentic, and ample results than an analysis conducted without computer software. The paper, therefore, recommends the integration of CATA software into the linguistic analysis of the various types of texts.