• Title/Summary/Keyword: 텍스트 데이터 분석

Search Result 1,104, Processing Time 0.032 seconds

The Study on Data Governance Research Trends Based on Text Mining: Based on the publication of Korean academic journals from 2009 to 2021 (텍스트 마이닝을 활용한 데이터 거버넌스 연구 동향 분석: 2009년~2021년 국내 학술지 논문을 중심으로)

  • Jeong, Sun-Kyeong
    • Journal of Digital Convergence
    • /
    • v.20 no.4
    • /
    • pp.133-145
    • /
    • 2022
  • As a result of the study, the poorest keywords were information, big data, management, policy, government, law, and smart. In addition, as a result of network analysis, related research was being conducted on topics such as data industry policy, data governance performance, defense, governance, and data public. The four topics derived through topic modeling were "DG policy," "DG platform," "DG in laws," and "DG implementation," of which research related to "DG platform" showed an increasing trend, and "DG implementation" tended to shrink. This study comprehensively summarized data governance-related studies. Data governance needs to expand research areas from various perspectives and related fields such as data management and data integration policies at the organizational level, and related technologies. In the future, we can expand the analysis targets for overseas data governance and expect follow-up studies on research directions and policy directions in industries that require data-based future industries such as Industry 4.0, artificial intelligence, and Metaverse.

Application Development for Text Mining: KoALA (텍스트 마이닝 통합 애플리케이션 개발: KoALA)

  • Byeong-Jin Jeon;Yoon-Jin Choi;Hee-Woong Kim
    • Information Systems Review
    • /
    • v.21 no.2
    • /
    • pp.117-137
    • /
    • 2019
  • In the Big Data era, data science has become popular with the production of numerous data in various domains, and the power of data has become a competitive power. There is a growing interest in unstructured data, which accounts for more than 80% of the world's data. Along with the everyday use of social media, most of the unstructured data is in the form of text data and plays an important role in various areas such as marketing, finance, and distribution. However, text mining using social media is difficult to access and difficult to use compared to data mining using numerical data. Thus, this study aims to develop Korean Natural Language Application (KoALA) as an integrated application for easy and handy social media text mining without relying on programming language or high-level hardware or solution. KoALA is a specialized application for social media text mining. It is an integrated application that can analyze both Korean and English. KoALA handles the entire process from data collection to preprocessing, analysis and visualization. This paper describes the process of designing, implementing, and applying KoALA applications using the design science methodology. Lastly, we will discuss practical use of KoALA through a block-chain business case. Through this paper, we hope to popularize social media text mining and utilize it for practical and academic use in various domains.

Global Text & Local Text Integration Method for Aspect-Based Sentiment Analysis (개체단위 감정분석을 위한 글로벌 텍스트&로컬 텍스트 통합 방법)

  • Lin, Te;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.414-416
    • /
    • 2022
  • 개체단위 감정분석(Aspect-Based Sentiment Analysis)는 자연어 처리에서 중요한 연구분야이다. 이는 입력 문장중에 존재하는 aspect term 의 감정 극성을 분석하는 것이 목적이다. 이 분야에서 현재 많이 사용되는 모델은 대부분 로컬 텍스트 또는 로컬 덱스트와 aspect term 사이의 관계에 주목하고 있다. 로켈 텍스트에 비해 글로벌 텍스트는 로컬 텍스트 뒤에 aspect term 내용을 추가해서 문장중에 있는 aspect term 내용을 더 깊게 학습할 수 있다고 생각한다. 본 논문에서는 새로운 masked attention 메커니즘을 사용하고 attention 메커니즘의 입력으로 글로벌 텍스트중에 있는 로컬 텍스트를 가로채어 전체 글로벌 텍스트의 내용과 융합한다. 이 방법은 semeval2014 데이터 셋에서 매우 좋은 결과를 얻었다.

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.471-492
    • /
    • 2017
  • The demand and interest in big data analytics are increasing rapidly. The concepts around big data include not only existing structured data, but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because it is the most representative method to describe and deliver information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word cloud, word network, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly increasing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields since topic modeling is able to extract the core topics from a huge amount of unstructured text documents and provide the document groups for each different topic. In this paper, we review the major techniques and research trends of text analysis. Further, we also introduce some cases of applications that solve the problems in various fields by using topic modeling.

A study of Corpus Annotation for Aspect Based Sentiment Analysis of Korean financial texts (한국어 경제 도메인 텍스트 속성 기반 감성 분석을 위한 말뭉치 주석 요소 연구)

  • Seoyoon Park;Yeonji Jang;Yejee Kang;Hyerin Kang;Hansaem Kim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.232-237
    • /
    • 2022
  • 본 논문에서는 미세 조정(fine-tuning) 및 비지도 학습 기법을 사용하여 경제 분야 텍스트인 금융 리포트에 대해 속성 기반 감성 분석(aspect-based sentiment analysis) 데이터셋을 반자동적으로 구축할 수 있는 방법론에 대한 연구를 수행하였다. 구축 시에는 속성기반 감성분석 주석 요소 중 극성, 속성 카테고리 정보를 부착하였으며, 미세조정과 비지도 학습 기법인 BERTopic을 통해 주석 요소를 자동적으로 부착하는 한편 이를 수동으로 검수하여 데이터셋의 완성도를 높이고자 하였다. 데이터셋에 대한 실험 결과, 극성 반자동 주석의 경우 기존에 구축된 데이터셋과 비슷한 수준의 성능을 보였다. 한편 정성적 분석을 통해 자동 구축을 동일하게 수행하였더라도 기술의 원리와 발달 정도에 따라 결과가 상이하게 달라짐을 관찰함으로써 경제 도메인의 ABSA 데이터셋 구축에 여전히 발전 여지가 있음을 확인할 수 있었다.

  • PDF

A Method of Constructing Large-Scale Train Set Based on Sentiment Lexicon for Improving the Accuracy of Deep Learning Model (딥러닝 모델의 정확도 향상을 위한 감성사전 기반 대용량 학습데이터 구축 방안)

  • Choi, Min-Seong;Park, Sang-Min;On, Byung-Won
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.106-111
    • /
    • 2018
  • 감성분석(Sentiment Analysis)은 텍스트에 나타난 감성을 분석하는 기술로 자연어 처리 분야 중 하나이다. 한국어 텍스트를 감성분석하기 위해 다양한 기계학습 기법이 많이 연구되어 왔으며 최근 딥러닝의 발달로 딥러닝 기법을 이용한 감성분석도 활발해지고 있다. 딥러닝을 이용해 감성분석을 수행할 경우 좋은 성능을 얻기 위해서는 충분한 양의 학습데이터가 필요하다. 하지만 감성분석에 적합한 학습데이터를 얻는 것은 쉽지 않다. 본 논문에서는 이와 같은 문제를 해결하기 위해 기존에 구축되어 있는 감성사전을 활용한 대용량 학습데이터 구축 방안을 제안한다.

  • PDF

100 K-Poison: Poisonous Texts Resistance Test Dataset For Korean Generative Models (100 K-Poison: 한국어 생성 모델을 위한 독성 텍스트 저항력 검증 데이터셋 )

  • Li Fei;Yejee Kang;Seoyoon Park;Yeonji Jang;Hansaem Kim
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.149-154
    • /
    • 2023
  • 본고는 한국어 생성 모델의 독성 텍스트 저항 능력을 검증하기 위해 'CVALUE' 데이터셋에서 추출한 고난도 독성 질문-대답 100쌍을 바탕으로 한국어 생성 모델을 위한 '100 K-Poison' 데이터셋을 시범적으로 구축했다. 이 데이터셋을 토대로 4가지 대표적인 한국어 생성 모델 'ZeroShot TextClassifcation'과 'Text Generation7 실험을 진행함으로써 현재 한국어 생성 모델의 독성 텍스트 식별 및 응답 능력을 종합적으로 고찰했고, 모델 간의 독성 텍스트 저항력 격차 현상을 분석했으며, 앞으로 한국어 생성 모델의 독성 텍스트 식별 및 웅대 성능을 한층 더 강화하기 위한 '이독공독(以毒攻毒)' 학습 전략을 새로 제안하였다.

  • PDF

Aspect-based Sentiment Analysis on Cosmetics Customer Reviews (감성 분석 화장품 사용자 리뷰에 대한 속성기반 감성분석)

  • Heewon Jeong;Young-Seob Jeong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.13-16
    • /
    • 2024
  • 온라인상에 인간의 감성을 담은 리뷰 데이터가 꾸준히 축적되어왔다. 이 텍스트 데이터를 분석하고 활용하는 일은 마케팅에 있어서 중요한 자산이 될 것이다. 이와 관련된 Aspect-Based Sentiment Analysis(ABSA) 연구는 한글에 있어서는 데이터 부족을 이유로 거의 선행연구가 없는 실정이다. 본 연구에서는 최근 공개된 데이터 셋을 바탕으로 하여 화장품 도메인에 대한 소비자들의 리뷰 텍스트와 사전 라벨링 된 속성, 감성 극성을 기반으로 ABSA를 진행한다. Klue RoBERTa base 모델을 활용하여 데이터를 학습시키고, Python Kiwipiepy 등으로 전처리한 결과를 대시보드로 시각화하여 분석하기 쉬운 환경을 마련하는 방법을 제시한다.

  • PDF

A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.39-58
    • /
    • 2014
  • As the importance of big data and related technologies continues to grow in the industry, it has become highlighted to visualize results of processing and analyzing big data. Visualization of data delivers people effectiveness and clarity for understanding the result of analyzing. By the way, visualization has a role as the GUI (Graphical User Interface) that supports communications between people and analysis systems. Usually to make development and maintenance easier, these GUI parts should be loosely coupled from the parts of processing and analyzing data. And also to implement a loosely coupled architecture, it is necessary to adopt design patterns such as MVC (Model-View-Controller) which is designed for minimizing coupling between UI part and data processing part. On the other hand, big data can be classified as structured data and unstructured data. The visualization of structured data is relatively easy to unstructured data. For all that, as it has been spread out that the people utilize and analyze unstructured data, they usually develop the visualization system only for each project to overcome the limitation traditional visualization system for structured data. Furthermore, for text data which covers a huge part of unstructured data, visualization of data is more difficult. It results from the complexity of technology for analyzing text data as like linguistic analysis, text mining, social network analysis, and so on. And also those technologies are not standardized. This situation makes it more difficult to reuse the visualization system of a project to other projects. We assume that the reason is lack of commonality design of visualization system considering to expanse it to other system. In our research, we suggest a common information model for visualizing text data and propose a comprehensive and reusable framework, TexVizu, for visualizing text data. At first, we survey representative researches in text visualization era. And also we identify common elements for text visualization and common patterns among various cases of its. And then we review and analyze elements and patterns with three different viewpoints as structural viewpoint, interactive viewpoint, and semantic viewpoint. And then we design an integrated model of text data which represent elements for visualization. The structural viewpoint is for identifying structural element from various text documents as like title, author, body, and so on. The interactive viewpoint is for identifying the types of relations and interactions between text documents as like post, comment, reply and so on. The semantic viewpoint is for identifying semantic elements which extracted from analyzing text data linguistically and are represented as tags for classifying types of entity as like people, place or location, time, event and so on. After then we extract and choose common requirements for visualizing text data. The requirements are categorized as four types which are structure information, content information, relation information, trend information. Each type of requirements comprised with required visualization techniques, data and goal (what to know). These requirements are common and key requirement for design a framework which keep that a visualization system are loosely coupled from data processing or analyzing system. Finally we designed a common text visualization framework, TexVizu which is reusable and expansible for various visualization projects by collaborating with various Text Data Loader and Analytical Text Data Visualizer via common interfaces as like ITextDataLoader and IATDProvider. And also TexVisu is comprised with Analytical Text Data Model, Analytical Text Data Storage and Analytical Text Data Controller. In this framework, external components are the specifications of required interfaces for collaborating with this framework. As an experiment, we also adopt this framework into two text visualization systems as like a social opinion mining system and an online news analysis system.

The Frequency Analysis of Teacher's Emotional Response in Mathematics Class (수학 담화에서 나타나는 교사의 감성적 언어 빈도 분석)

  • Son, Bok Eun;Ko, Ho Kyoung
    • Communications of Mathematical Education
    • /
    • v.32 no.4
    • /
    • pp.555-573
    • /
    • 2018
  • The purpose of this study is to identify the emotional language of math teachers in math class using text mining techniques. For this purpose, we collected the discourse data of the teachers in the class by using the excellent class video. The analysis of the extracted unstructured data proceeded to three stages: data collection, data preprocessing, and text mining analysis. According to text mining analysis, there was few emotional language in teacher's response in mathematics class. This result can infer the characteristics of mathematics class in the aspect of affective domain.