Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • In response to the rapidly increasing demand for text data analysis, research on and investment in text mining are being actively pursued not only in academia but also across various industries. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are performed according to the purpose of the analysis. Until recently, text mining studies have focused on applications of the second step, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly subjected to a variety of operations and traditional analysis techniques, unstructured text must first undergo a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while preserving algebraic properties, so as to structure text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents from various perspectives. In particular, as the demand for document embedding has increased rapidly, many algorithms have been developed to support it. Among them, Doc2Vec, which extends Word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods, as represented by Doc2Vec, generate a vector for each document using all of the words included in that document.
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single corresponding vector, so it is difficult to accurately represent a complex document covering multiple subjects with a single vector. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. For documents without keywords, the method can be applied after extracting keywords through various analysis methods; however, since keyword extraction is not the core subject of the proposed method, we describe its application to documents whose keywords are predefined in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. All text in a document is tokenized, and each token is represented through word embedding as an N-dimensional real-valued vector. Next, to overcome the limitation of traditional document embedding, which is affected by miscellaneous words as well as core words, the vectors corresponding to each document's keywords are extracted to form a set of keyword vectors for that document. Clustering is then performed on each document's set of keyword vectors to identify the multiple subjects included in the document. Finally, multiple vectors are generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector.
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
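
The five steps above can be sketched roughly in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. Steps (1) and (2) are assumed to be done already, with a small hand-written dictionary `wv` standing in for trained word vectors. k-means with a fixed `k` is a swapped-in clustering algorithm, since the abstract does not name one. Each cluster's vector is assumed to be the mean of its keyword vectors. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors (empty input gives [])."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-first initialization.

    Assumption: k-means is a stand-in; the abstract does not name the clustering method.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next initial center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Recompute each center as the mean of its members; keep the old one if empty.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def single_vector_embedding(keywords: List[str], wv: Dict[str, Vector]) -> Vector:
    """Simplified single-vector baseline: the mean of all keyword vectors."""
    return mean([wv[w] for w in keywords if w in wv])


def multi_vector_embedding(
    keywords: List[str], wv: Dict[str, Vector], k: int
) -> List[Tuple[List[str], Vector]]:
    """Steps (3) to (5): keyword vectors -> keyword clusters -> one vector per cluster."""
    # (3) Keyword vector extraction; keywords missing from the vocabulary are skipped.
    found = [w for w in keywords if w in wv]
    if not found:
        return []
    points = [wv[w] for w in found]
    # (4) Keyword clustering into at most k subjects.
    labels = kmeans(points, min(k, len(points)))
    # (5) Multi-vector generation: here, the mean vector of each cluster (assumption).
    result = []
    for j in sorted(set(labels)):
        members = [w for w, lab in zip(found, labels) if lab == j]
        result.append((members, mean([wv[w] for w in members])))
    return result


if __name__ == "__main__":
    # Hypothetical 2-D "word vectors" standing in for steps (1) and (2).
    wv = {"embedding": [0.9, 0.1], "vector": [1.0, 0.0],
          "soundscape": [0.1, 0.9], "bell": [0.0, 1.0]}
    kws = ["embedding", "vector", "soundscape", "bell"]
    print(single_vector_embedding(kws, wv))
    for members, vec in multi_vector_embedding(kws, wv, k=2):
        print(members, vec)
```

With these toy vectors, the single averaged vector lands midway between the two subjects (about [0.5, 0.5]). The multi-vector version keeps one vector near each subject ([0.95, 0.05] and [0.05, 0.95]), which illustrates the inter-subject interference the abstract describes.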

Connections among Hohoche, Hoche, and Bongoae, and the Interpretation of Book of Changes (『주역』의 괘체와 해석 - 호호체(互互體)·호체(互體)·본괘(本卦)의 상관성과 『주역』 해석 -)

  • Choi, Yeen-young
    • The Journal of Korean Philosophical History / no.53 / pp.215-254 / 2017
  • In the study of divination, Hoche (互體) plays vital roles in the composition of Goaes (卦) and the interpretation of the classic, yet research on Hoche has been scarce, as has awareness of its uses. This study set out to investigate the connections among Bongoae (本卦), Hoche, and the Hoche of Hoche (Hohoche, 互互體), and thus to shed new light on the importance of Hoche in the composition and interpretation of the Yi. The 64 Goaes belong to 16 Hoches, which in turn belong to 4 Hoches: Jungcheongeon (重天乾), Jungjigon (重地坤), Suhwagije (水火旣濟), and Hwasumije (火水未濟). That is, extracting the Hoche from the six-line Hoikgoae (劃卦) formed by a Bongoae's Hoche yields another six-line Hoikgoae, and these converge on the 4 Goaes of Geon (乾), Gon (坤), Gije (旣濟), and Mije (未濟). The present study named the Hoche of Hoche "Hohoche" and argued that there should be consistent connections in the interpretation of the meanings of these 4 Hohoche Goaes and their respective Hoches and Bongoaes, focusing on discovering meanings shared between those Hoches and Bongoaes and the "Danjeon (彖傳)" and "Daesangjeon (大象傳)" of the Hohoche. The Book of Changes begins with Jungcheongeon and Jungjigon and ends with Suhwagije and Hwasumije. The Hohoches of the 64 Goaes converge on these 4 Goaes, which indicates that the 4 Goaes govern the beginning and the end, and that all the Goaes between them operate within the categories of these 4 Goaes. The content of the "Danjeon" and "Daesangjeon" of each Hohoche holds certain semantic connections with the interpretation of the Hoches and Bongoaes that reduce to it, and points to a direction for their interpretation. These findings open a path to investigating Hoche as a core principle of Goae formation in the Book of Changes and imply that Hoche holds important significance in interpreting the Book.
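
The 64 → 16 → 4 convergence described above can be checked mechanically. The following is a minimal Python sketch, assuming the conventional definition of Hoche (互體) as the hexagram built from the nuclear trigrams, lines 2 to 4 below and lines 3 to 5 above. The function name `hoche` and the 0/1 line encoding are illustrative choices, not taken from the study.

```python
import itertools
from typing import Tuple

# A hexagram (Goae) as six lines listed bottom to top: 1 = yang (solid), 0 = yin (broken).
Hexagram = Tuple[int, ...]


def hoche(h: Hexagram) -> Hexagram:
    """Hexagram formed by the nuclear trigrams: lines 2-4 below, lines 3-5 above."""
    return (h[1], h[2], h[3], h[2], h[3], h[4])


# Bottom-to-top line patterns of the four hexagrams named in the abstract.
NAMES = {
    (1, 1, 1, 1, 1, 1): "Jungcheongeon (重天乾)",
    (0, 0, 0, 0, 0, 0): "Jungjigon (重地坤)",
    (1, 0, 1, 0, 1, 0): "Suhwagije (水火旣濟)",  # fire (lines 1-3) below water (lines 4-6)
    (0, 1, 0, 1, 0, 1): "Hwasumije (火水未濟)",  # water (lines 1-3) below fire (lines 4-6)
}

bongoaes = list(itertools.product((0, 1), repeat=6))  # all 64 hexagrams
hoches = {hoche(h) for h in bongoaes}                  # distinct Hoches
hohoches = {hoche(hoche(h)) for h in bongoaes}         # distinct Hohoches

print(len(bongoaes), len(hoches), len(hohoches))  # 64 16 4
for h in sorted(hohoches):
    print(h, NAMES[h])
```

Applying `hoche` once keeps only lines 2 to 5 (16 patterns); applying it twice gives (line 3, line 4) repeated three times, so only lines 3 and 4 matter and exactly four hexagrams remain.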

A Semantic Study on the Soundscape of the Historic Downtown of Daejeon - Focusing on the Bells of Daeheung-dong Cathedral and Eunhang-dong Sungsimdang - (대전 원도심 소리풍경에 관한 의미론적 연구 - 대흥동 성당과 은행동 성심당 종소리를 중심으로 -)

  • Kim, Myeong-Shin
    • Journal of the Korean Institute of Landscape Architecture / v.50 no.2 / pp.64-75 / 2022
  • The purpose of this study is to illuminate the meaning of the soundscapes of two bells, those of Daeheung-dong Cathedral and of Sungsimdang in Eunhang-dong, which are landmarks and attractions in the historic downtown of Daejeon. The study was conducted through field research and recordings, as well as literature reviews of related documents and soundscape theory. The city of Daejeon developed along with Daejeon Railway Station during the Japanese colonial period in the early 20th century. After the Chungnam Provincial Office moved to Daejeon, Daeheung-dong and Eunhang-dong in Jung-gu, located near Daejeon Station, developed significantly and formed the city centre. As major administrative agencies moved to Seo-gu in the 1990s, downtown Daejeon went into decline, and the decline accelerated with the development of Sejong City. Meanwhile, Daeheung-dong Cathedral and Sungsimdang, the latter founded by refugees during the Korean War, firmly held their ground in the historic downtown area of Daejeon even as native residents left. Daeheung-dong Cathedral, established during the Japanese colonial period, is a local landmark that marked 100 years of history in 2019. Sungsimdang, founded against the backdrop of the Korean War, is also a historical and cultural asset with 60 years of history and a local landmark selected as the No. 1 tourist attraction in Daejeon. This research, which began with the sound of the Daeheung-dong Cathedral bells, audible even in neighboring residential areas, led to the discovery of the bells of Sungsimdang in Eunhang-dong, located across the street. This paper identifies the bells of Daeheung-dong Cathedral and Eunhang-dong Sungsimdang as having the characteristics of soundmarks according to R. Murray Schafer's categories of soundscape sounds. Furthermore, it attempts to analyze the meaning of the two bells according to the relatively recent EU definition of soundscape.
These two bells are signal sounds at the surface level but soundmarks of the historic downtown area of Daejeon at the deep level. Although they differ outwardly in size, scale, frequency, and fame, the two bells share meaning in terms of locality and positive influence, grounded in a special relationship of historicity and spatiality. The implication of this study is that, in the urban regeneration or development of Jung-gu, Daejeon, the two places should be preserved as local historical and cultural assets, not only as visual landmarks but also as soundmarks.