• Title/Summary/Keyword: 텍스트화

A Study on the Application of Machine Learning in Literary Texts - Focusing on Rule Selection for Speaker Directive Analysis - (문학 텍스트의 머신러닝 활용방안 연구 - 화자 지시어 분석을 위한 규칙 선별을 중심으로 -)

  • Kwon, Kyoungah;Ko, Ilju;Lee, Insung
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.313-323 / 2021
  • The purpose of this study is to propose rules that can identify the speaker referred to by a speaker directive in a text, toward realizing a machine learning-based virtual character from a literary text. Through previous studies, we found that when literary texts are applied to machine learning, the machine does not properly discriminate the speaker without specific rules for analyzing speaker directives such as alternative names, nicknames, and pronouns. To solve this problem, this study proposes 'nine rules for finding a speaker indicated by speaker directives (including pronouns)': location, distance, pronouns, preparatory subject/preparatory object, quotations, number of speakers, non-character directives, word compound form, and dispersion of speaker names. To use characters in a literary text as virtual ones, the learning text must be presented in a machine-comprehensible way. We expect the rules suggested in this study to reduce the trial and error that may occur when using literary texts for machine learning and to enable smooth learning that produces high-quality results.

Application Development for Text Mining: KoALA (텍스트 마이닝 통합 애플리케이션 개발: KoALA)

  • Byeong-Jin Jeon;Yoon-Jin Choi;Hee-Woong Kim
    • Information Systems Review / v.21 no.2 / pp.117-137 / 2019
  • In the Big Data era, data science has become popular as vast amounts of data are produced in various domains, and the power of data has become a source of competitiveness. There is growing interest in unstructured data, which accounts for more than 80% of the world's data. With the everyday use of social media, most unstructured data takes the form of text and plays an important role in areas such as marketing, finance, and distribution. However, text mining of social media is harder to access and harder to use than data mining of numerical data. This study therefore aims to develop the Korean Natural Language Application (KoALA), an integrated application for easy and convenient social media text mining that does not rely on programming languages, high-end hardware, or solutions. KoALA is specialized for social media text mining, can analyze both Korean and English, and handles the entire process from data collection through preprocessing and analysis to visualization. This paper describes the design, implementation, and application of KoALA using the design science methodology. Lastly, we discuss the practical use of KoALA through a blockchain business case. Through this paper, we hope to popularize social media text mining and promote its practical and academic use in various domains.

A Document Summary System based on Personalized Web Search Systems (개인화 웹 검색 시스템 기반의 문서 요약 시스템)

  • Kim, Dong-Wook;Kang, Soo-Yong;Kim, Han-Joon;Lee, Byung-Jeong;Chang, Jae-Young
    • Journal of Digital Contents Society / v.11 no.3 / pp.357-365 / 2010
  • A personalized web search engine provides personalized results to users through query expansion, re-ranking, or other methods that represent the user's intention. The personalized result page includes the URL, the page title, and a small text fragment of each web document, known as a snippet. The snippet is a summary of the document that includes the keywords issued by either the user or the search engine itself. Users can easily judge the relevance of the whole document using only the snippet. The document summary (snippet) is thus important information that lets users decide whether to click the link to the whole document. Hence, if a search engine generates personalized document summaries, it can provide more satisfactory search results to users. In this paper, we propose a personalized document summary system for personalized web search engines. The proposed system increases user satisfaction with marginal overhead.
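
The idea of a personalized snippet can be illustrated with a minimal sketch. This is not the system proposed in the paper: the function names, the sentence splitter, and the `profile_weight` parameter are all hypothetical, and the scoring rule (query-term overlap plus a down-weighted overlap with user-profile terms) is only one assumed way to bias a snippet toward a user's interests.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def personalized_snippet(document, query_terms, profile_terms, profile_weight=0.5):
    """Return the sentence that best matches the query and the user profile.

    Assumption: a sentence scores 1 per distinct query term it contains,
    plus `profile_weight` per distinct profile term it contains.
    """
    query = {t.lower() for t in query_terms}
    profile = {t.lower() for t in profile_terms}
    best, best_score = "", -1.0
    for sentence in split_sentences(document):
        words = set(re.findall(r"\w+", sentence.lower()))
        score = len(words & query) + profile_weight * len(words & profile)
        if score > best_score:  # ties keep the earlier sentence
            best, best_score = sentence, score
    return best


doc = ("Jaguar is a car maker. "
       "The jaguar is a big cat living in forests. "
       "Cats sleep a lot.")
# Same query, different (hypothetical) user profiles, different snippets.
print(personalized_snippet(doc, ["jaguar"], ["forests", "cat"]))
print(personalized_snippet(doc, ["jaguar"], ["car"]))
```

With the same query, a profile containing animal-related terms selects the second sentence, while a profile containing "car" selects the first, which is the behavior a personalized summary is meant to provide.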

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions:PartB / v.12B no.6 s.102 / pp.703-714 / 2005
  • Research in text categorization has been confined to whole-document-level classification, probably due to a lack of full-text test collections. However, the full-length documents available today in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. To reflect the sub-topic structure of a document, we propose a new passage-level, or passage-based, text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents range from tens to hundreds of kilobytes, we evaluated the proposed model, especially the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows are best for all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
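
The three steps named in this abstract (passage splitting, per-passage category assignment, category merging) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' model: the keyword lists stand in for a trained classifier, the fixed-size word window is only one possible passage type, and the location weighting (first and last passages count double) is an invented placeholder for whatever weight function the paper actually uses.

```python
from collections import defaultdict

# Hypothetical keyword lists standing in for a trained passage classifier.
CATEGORY_KEYWORDS = {
    "finance": {"bank", "stock", "market", "profit"},
    "sports": {"match", "goal", "team", "season"},
}


def split_into_windows(text, size=20):
    """Passage splitting: non-overlapping windows of `size` words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def score_passage(passage):
    """Per-passage categorization: count keyword hits for each category."""
    return {cat: sum(w.strip(".,;:!?") in kws for w in passage)
            for cat, kws in CATEGORY_KEYWORDS.items()}


def location_weight(index, total):
    """Assumed weighting: the first and last passages count double."""
    return 2.0 if index in (0, total - 1) else 1.0


def categorize(text, size=20):
    """Category merging: location-weighted sum of passage scores."""
    passages = split_into_windows(text, size)
    merged = defaultdict(float)
    for i, passage in enumerate(passages):
        weight = location_weight(i, len(passages))
        for cat, score in score_passage(passage).items():
            merged[cat] += weight * score
    if not merged or max(merged.values()) == 0:
        return None  # no category evidence found
    return max(merged, key=merged.get)


print(categorize("The bank reported profit as the stock market rose."))
```

Swapping `split_into_windows` for a paragraph-based or sentence-based splitter would correspond to trying a different passage type, and changing `location_weight` would correspond to trying a different merging scheme.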

Characteristics of Transmedia Contents Textuality and Usage (트랜스미디어 콘텐츠의 텍스트 및 이용 특징)

  • Jeon, Gyong-Ran
    • The Journal of the Korea Contents Association / v.10 no.9 / pp.243-250 / 2010
  • With the help of digital technology, the content production process has become flexible and transmedia production is on the rise. Transmedia content is a new type of content that spans various media platforms and establishes a consistent story world; it is a mode of content that has newly appeared in the age of digital convergence. Users who read and watch the several versions of a story in transmedia content can build a comprehensive content experience and deepen it. By developing its own aesthetics and meaning as media text, transmedia content is changing the textuality of digital content, the content production process, and the behaviors and contexts of content use.

Analysis of Term Ambiguity based on Genetic Algorithm (유전자 알고리즘 기반 용어 중의성 분석)

  • Kim, Jeong-Joon;Chung, Sung-Taek;Park, Jeong-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.5 / pp.131-136 / 2017
  • Recently, with the development of Internet media, the number of documents on the web has been increasing exponentially. These documents are described in text and are classified according to the information that text conveys. However, text often leaves room for ambiguous interpretation, so it must be examined from various angles to be interpreted correctly. Conventional classification methods classify documents based only on the surface form of the text. In this paper, we analyze term ambiguity using genetic algorithm and local preserving based techniques and implement a system that clusters the documents. Finally, the proposed approach is evaluated by comparing the implementation results with those of traditional methods.

The Principle of Dual Semiotic Process in Animation - Within Structuralism Semiotics - (애니메이션의 이중적 기호작용 원리 - 구조주의 기호학의 관점에서 -)

  • Joo Young-Sook;Kim Chee-Yong
    • Journal of Korea Multimedia Society / v.9 no.9 / pp.1196-1207 / 2006
  • This paper studies the organizing factors and working principle of signs in animation text within Roland Barthes's structuralist semiotic theory. This approach makes it possible to analyze the mechanism that effectively delivers the messages (or text) of animation, rather than performing the plain analysis of classical semiotics. It also makes it possible to scrutinize the blind spot of a pure aesthetics that does not consider social duty. Expressed in a single sentence from the viewpoint of Barthes's semiotic theory, the text of animation is 'one sign with a dual role'. Put another way, animation as mass media is a special process that produces concepts and signification. In other words, through the animation of mass media, the dominant order assimilates the mass public to itself as if it were a natural law.

A Study on the Designation in Korean Traditional Space design Text -Focusing on structural homology of Space Context- (한국 전통공간디자인 텍스트의 지시작용 해석에 관한 연구-컨텍스트의 구조적 유비성을 중심으로-)

  • Park, Kyung-Ae
    • Korean Institute of Interior Design Journal / v.16 no.4 / pp.31-38 / 2007
  • This study examines how the philological interpretation of a space text is patterned so as to give the text structural cohesion. A similar philological motivation incorporates some notions of generative grammar. Interpretation is the process of recovering the cultural meanings expressed in discourse by analysing linguistic structures in light of their interactional and wider social contexts. Viewed in this light, the study proceeds as follows. First, it sets out the basic concepts of signification of text and context, and theories of the spatial text and context of typological structure, in terms of Ricoeur's structural hermeneutics. Second, it concretizes the logic by which the traditional space context is inserted, through typological structure, into organized attributes such as emotion, spirit, and nature that characterize the contemporary space text. Finally, from the standpoint of designation theory within interpretive semantics, it shows through a case study of the I-Hotel space design that Korean contemporary space design incorporates, homologously, the typological structure of the spatial context of Korean traditional palaces. Through this process, the study suggests that a positivistic interpretation methodology based on the designation of text reflects the logical thinking of Korean traditional space design.

A SMIL 1.0 Contents Generating Tool Implemented by JAVA (JAVA로 구현한 SMIL 1.0 컨텐츠 생성도구)

  • Song, Jun-Hong;Kim, Se-Young;Lee, Jong-Youl;Kim, Hyun-Hee;Shin, Dong-Kyoo;Shin, Dong-Il
    • Proceedings of the Korea Information Processing Society Conference / 2001.04a / pp.565-568 / 2001
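  • With the development of high-speed Internet and multimedia-related technologies, the foundation for integrated multimedia services is being actively built, and the existing web service environment, which consisted only of simple images and text, is changing rapidly. However, authoring web pages that present fixed, static, text-oriented content can no longer accommodate users' rapidly growing demand for multimedia. Accordingly, in 1998 the W3C (World Wide Web Consortium) proposed SMIL (Synchronized Multimedia Integration Language), which can describe effective presentations for integrating and synchronizing time-based multimedia data. SMIL is a declarative markup language based on XML (eXtensible Markup Language) and can easily be authored with a text editor, but because it is a tag-based language, effective multimedia presentations can be produced only by authors who are familiar with the usage of its tags. To overcome this difficulty, this paper describes the design and implementation of a Java-based SMIL authoring tool that supports SMIL document templates and previews of multimedia sources.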

A Method for Detecting Event-location using Relevant Words Clustering in Tweet (트위터에서의 연관어 군집화를 이용한 이벤트 지역 탐지 기법)

  • Ha, Hyunsoo;Woo, Seungmin;Yim, Junyeob;Hwang, Byung-Yeon
    • Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.680-682 / 2015
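  • With the recent spread of smartphones, the number of users of social network services has surged. Among these services, Twitter can be used as a tool for detecting events that occur in the real world because of how quickly and widely it propagates information. Thus, if each Twitter user is regarded as a sensor and the tweet texts they write are analyzed, Twitter can serve as an event detection tool. Related studies use GPS coordinates to track where an event occurred, but this is a clear limitation given that Twitter users are reluctant to disclose their location information. This paper therefore presents a method that tracks location information from tweet text without using the location information provided by Twitter. Whether an event actually occurred is determined by considering the relationships between keywords in tweet text, and experiments showing faster detection than existing media demonstrate the need for the proposed system.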