• Title/Summary/Keyword: Text Visualization

Keyword Network Visualization for Text Summarization and Comparative Analysis

  • 김경림;이다영;조환규
    • 정보과학회 논문지
    • /
    • Vol. 44, No. 2
    • /
    • pp.139-147
    • /
    • 2017
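  • Text accounts for the majority of the information circulating on the Internet. Quickly, and especially automatically, grasping the meaning of large volumes of documents is therefore one of the important research topics of the big data era. A representative line of work in this field is the automatic extraction and analysis of the main keywords that summarize a document's meaning. However, a mere set of individually extracted keywords is insufficient to represent the semantic structure of a document. In this paper, we developed a keyword visualization method that represents the relationships among the extracted keywords as a graph, so that the semantic structure of the target document can be displayed and abstracted in more diverse ways. To extract the relationships among keywords, we first proposed a per-keyword dominance-interval model and a word-distance model. Because the keyword connectivity extracted in this way, and the graph that visualizes it, capture the semantic structure of a document more compactly, they enable quick comprehension and summarization of the document, and comparing these visualization graphs also makes it possible to compare the semantic similarity of documents. Experiments showed that the proposed keyword visualization graph is more useful than ordinary summaries or plain keyword lists for understanding and comparing documents.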
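
The abstract above does not spell out its dominance-interval or word-distance models, so the following is only a minimal sketch of the general idea of linking keywords that occur close together in a text. The function name `keyword_graph`, the `max_distance` threshold, and the toy sentence are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import combinations

def keyword_graph(tokens, keywords, max_distance=3):
    """Count how often two different keywords occur within max_distance tokens."""
    # Keep only the positions at which a keyword appears.
    positions = [(i, t) for i, t in enumerate(tokens) if t in keywords]
    edges = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        # positions is ordered by index, so j > i and j - i is the word distance.
        if a != b and j - i <= max_distance:
            edges[tuple(sorted((a, b)))] += 1
    return edges

# Hypothetical toy input, already tokenized by whitespace.
tokens = "text graph shows keyword links and keyword graph structure".split()
edges = keyword_graph(tokens, {"text", "graph", "keyword"}, max_distance=2)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list can be handed to any graph-drawing tool; a larger `max_distance` yields a denser graph.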

Integration of the PubAnnotation ecosystem in the development of a web-based search tool for alternative methods

  • Neves, Mariana
    • Genomics & Informatics
    • /
    • Vol. 18, No. 2
    • /
    • pp.18.1-18.5
    • /
    • 2020
  • Finding publications that propose alternative methods to animal experiments is an important but time-consuming task, since researchers need to run various queries against literature databases and screen many articles to assess two important aspects: the relevance of the article to the research question, and whether the article's proposed approach qualifies as an alternative method. We are currently developing a Web application to support finding alternative methods to animal experiments. The current (under development) version of the application utilizes external tools and resources for document processing, and relies on the PubAnnotation ecosystem for annotation querying, annotation storage, dictionary-based tagging of cell lines, and annotation visualization. Currently, our two PubAnnotation repositories for discourse elements contain annotations for more than 110k PubMed documents. Further, we created an annotator for cell lines that contains more than 196k terms from Cellosaurus. Finally, we are experimenting with TextAE for annotation visualization and for user feedback.
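
As a rough illustration of what dictionary-based tagging means in general, the sketch below matches a small term list against a sentence and returns character spans. It does not use or imitate the PubAnnotation ecosystem's own services; the function `tag_terms`, the three-entry dictionary, and the example sentence are assumptions chosen for the example.

```python
import re

def tag_terms(text, dictionary):
    """Return sorted (start, end, term) spans for dictionary terms found in text."""
    spans = []
    # Try longer terms first so that "HeLa S3" wins over its prefix "HeLa".
    for term in sorted(dictionary, key=len, reverse=True):
        # The lookarounds keep a term from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            # Skip matches that overlap a span accepted earlier.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), term))
    return sorted(spans)

# Toy dictionary; a real one would hold far more entries.
cell_lines = {"HeLa", "HeLa S3", "HEK293"}
spans = tag_terms("HeLa S3 and HEK293 cells were used.", cell_lines)
print(spans)
```

Preferring the longest match is one common design choice for nested terms; exact, case-sensitive matching is used here only to keep the sketch short.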

Effective text visualization for biomedical information

  • 김탁은;박종철
    • 한국HCI학회:학술대회논문집
    • /
    • HCI Society of Korea 2007 Conference, Part 1
    • /
    • pp.399-405
    • /
    • 2007
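  • The amount of information in the biomedical field is growing very rapidly. Many studies have used text mining techniques to extract useful information from this vast body of information. However, even the extracted information is voluminous and, because it is in text form, hard to understand intuitively. An information visualization system is therefore essential for understanding such information more intuitively. Information visualization has been studied extensively in recent years, but the visualized information is itself so extensive that a method is needed to filter out the information the user requires. Visualization systems should also provide methods for knowledge discovery. In this paper, focusing on text visualization of biomedical information, we propose an effective representation of biomedical information and an intuitive interface for knowledge discovery.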

New Scheme for Intelligent Voyage Data Recorder Analysis System and Development of Visualization Module of the VDR

  • 황일규;이경호;한영수
    • 한국해양공학회지
    • /
    • Vol. 22, No. 5
    • /
    • pp.126-131
    • /
    • 2008
  • Voyage data is very important for the safety of ships, and the installation of a voyage data recorder (VDR) on ships is being made mandatory. The VDR is a black box that stores 14 kinds of voyage data as text. However, the data are hard to interpret when an accident happens because they are saved as complicated text. We developed a user interface (UI), analysis, and visualization system that helps gather information about the circumstances of an accident. Building on this VDR visualization system, it should become possible to develop onboard ship monitoring and voyage prediction systems in the near future.

Explaining the Translation Error Factors of Machine Translation Services Using Self-Attention Visualization

  • 장청롱;안현철
    • 한국IT서비스학회지
    • /
    • Vol. 21, No. 2
    • /
    • pp.85-95
    • /
    • 2022
  • This study analyzed the translation error factors of machine translation services such as Naver Papago and Google Translate through Self-Attention path visualization. Self-Attention is a key method of the Transformer and BERT NLP models and has recently been widely used in machine translation. We propose a method to explain the translation error factors of machine translation algorithms by comparing the Self-Attention paths of the ST (source text) and ST' (a transformed ST whose meaning is unchanged but whose translation output is more accurate). This method provides explainability for analyzing the internal process of a machine translation algorithm, which is otherwise invisible like a black box. In our experiment, it was possible to explore the factors that caused translation errors by analyzing the differences in the attention paths of key words. The study used the XLM-RoBERTa multilingual NLP model provided by exBERT for Self-Attention visualization, and it was applied to two examples of Korean-Chinese and Korean-English translations.
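
For readers unfamiliar with what a Self-Attention visualization actually plots, the sketch below computes scaled dot-product attention weights, the row-wise softmax of query-key dot products divided by the square root of the vector dimension. It is a simplified illustration only: a real Transformer derives queries and keys from learned projections, whereas here the same hypothetical toy vectors serve as both, and nothing in it reproduces exBERT or XLM-RoBERTa.

```python
import math

def attention_weights(queries, keys):
    """Row-wise softmax of scaled dot products between query and key vectors."""
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Subtract the maximum score before exponentiating for numerical stability.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical 2-dimensional token vectors, used as both queries and keys.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(vectors, vectors)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row shows how strongly one token attends to every token; attention visualizations typically draw these weights as lines or heat maps between tokens.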

Personalized Book Curation System based on Integrated Mining of Book Details and Body Texts

  • 안희정;김기원;김승훈
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 24, No. 1
    • /
    • pp.33-43
    • /
    • 2017
  • Content curation services based on big data analysis are receiving great attention in various content fields, such as film, games, music, and books. Such a service recommends personalized content to a user based on that user's preferences. Existing book curation systems recommend books to users by using bibliographic citations, user profiles, or user log data. However, these systems have difficulty recommending books related to character names or spatio-temporal information in the text. Therefore, in this paper, we propose a personalized book curation system based on integrated mining of a book. The proposed system consists of a mining system, a recommendation system, and a visualization system. The mining system analyzes book text, user information or profiles, and SNS data. The recommendation system recommends personalized books to users based on the data analysed by the mining system. This system can recommend related books based on book keywords even when there is no user information, as with a new customer. The visualization system visualizes book bibliographic information, mining data such as keywords, characters, and character relations, and book recommendation results. In addition, this paper includes the design and implementation of the proposed mining and recommendation modules. The proposed system is expected to broaden users' selection of books and encourage balanced consumption of book content.

Discovering Meaningful Trends in the Inaugural Addresses of North Korean Leader Via Text Mining

  • 박철수
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 26, No. 3
    • /
    • pp.43-59
    • /
    • 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis of North Korean new year addresses, one of the most important and authoritative documents publicly announced by the North Korean government. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and analysis methods of the text mining techniques, as confirmed by analysis of the data. We propose a procedure to find meaningful tendencies based on a combination of text mining, cluster analysis, and co-occurrence networks. To demonstrate the applicability and effectiveness of the proposed procedure, we analyzed the inaugural addresses of Kim Jong Un of North Korea from 2017 to 2019. The main results of this study show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. We found that the uncovered semantic structures of North Korean new year addresses closely follow major changes in the North Korean government's positions toward its own people as well as toward outside audiences such as the USA and South Korea.

A Study on Creative Fashion Design by Visualization of Knowledge -Focusing on Gucci Collection by ATTA Evaluation Items-

  • 김민지
    • 패션비즈니스
    • /
    • Vol. 21, No. 4
    • /
    • pp.90-104
    • /
    • 2017
  • In a rapidly changing fashion design world, creative ideas are always required. Knowledge has been created as an art, exhibiting a new imagination that surpasses reality, while being visualized from the past. The purpose of this study is to derive types of visualization of knowledge for the continuous creation of fashion design. The study consists of literature and empirical studies. ATTA, a creativity evaluation method developed by Torrance, was applied to analyze Gucci fashion design collections from 2016 to 2017. The creativity of the Gucci collection according to the ATTA evaluation items is as follows: first, vivid ideas were revealed through collection history, myths, and animals and plants; second, conceptual incongruity lies in the composition of the garment; third, provocative questions appear in the symbolism of the meaning of the text; fourth, different perspectives derive a new formative beauty through the synthesis of twisted elements; fifth, abstraction is a symbolic expression of animals and plants; sixth, synthesis is a mixture of various materials and techniques drawn from plural inspirations; seventh, context is developed as design accompanied by stories of history and myth; and eighth, fantasy takes the form of fictitious animals and detail. In addition, the formativeness of fashion design through this visualization of knowledge was characterized as contamination, symbolism, enjoyment, and fabrication. Visualization of knowledge is expected to be used as a strategy to attract ongoing ideas for creative fashion designs.

Visualization Techniques for Massive Source Code

  • 서동수
    • 컴퓨터교육학회논문지
    • /
    • Vol. 18, No. 4
    • /
    • pp.63-70
    • /
    • 2015
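  • Program source code is text-based information and, at the same time, a collection of complex statements that embody logical structure. In particular, when source code reaches tens of thousands of lines, its structural and logical complexity makes existing big data visualization techniques difficult to apply. This paper proposes the procedure needed to visualize the structural characteristics of source code. To this end, working on the abstract syntax tree produced by parsing, it defines data types for representing a program's structural features and represents the call relationships between functions. By visualizing control information in network form on the basis of this information, it presents a way to survey the structural characteristics of modules at a glance. The results of this study can serve as an effective means of understanding the structural characteristics of large-scale software or of managing changes.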
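
The abstract above does not name the language or parser it targets, so the following is merely a small sketch of extracting function call relationships from an abstract syntax tree, using Python's standard `ast` module as an assumed stand-in. The helper `call_graph` and the sample program are hypothetical.

```python
import ast
from collections import defaultdict

def call_graph(source):
    """Map each function definition to the plain function names called inside it."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for child in ast.walk(node):
                # Only direct calls such as f(x) are recorded; method calls
                # such as obj.f(x) have an ast.Attribute target and are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)

# Hypothetical program to analyze; it is parsed, never executed.
sample = '''
def load(path):
    return open(path).read()

def main():
    text = load("a.txt")
    print(len(text))
'''
graph = call_graph(sample)
for caller, callees in sorted(graph.items()):
    print(caller, "->", sorted(callees))
```

Each caller-callee pair is one edge of a call network. Calls made inside a nested function are attributed to the enclosing function as well, which a fuller tool would need to separate.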

Discovering Meaningful Trends in the Inaugural Addresses of United States Presidents Via Text Mining

  • 조수곤;조재희;김성범
    • 대한산업공학회지
    • /
    • Vol. 41, No. 5
    • /
    • pp.453-460
    • /
    • 2015
  • Identification of meaningful patterns and trends in large volumes of text data is an important task in various research areas. In the present study, we propose a procedure to find meaningful tendencies based on a combination of text mining, cluster analysis, and low-dimensional embedding. To demonstrate applicability and effectiveness of the proposed procedure, we analyzed the inaugural addresses of the presidents of the United States from 1789 to 2009. The main results of this study show that trends in the national policy agenda can be discovered based on clustering and visualization algorithms.