• Title/Summary/Keyword: Two-step cluster analysis


Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • As demand for text data analysis grows rapidly, research and investment in text mining are being actively pursued in both academia and industry. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies focused on the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when text data is represented as vectors. Unlike structured data, which can be used directly in a variety of operations and traditional analysis techniques, unstructured text must first undergo a structuring task that transforms the original document into a form the computer can process. Mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties, in order to structure text data, is called "embedding". Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as demand for document embedding has increased rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, traditional document embedding methods, represented by doc2Vec, generate a vector for each document using all of the words contained in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector. It is therefore difficult to accurately represent a complex document with multiple subjects as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since keyword extraction is not the core subject of the proposed method, we describe the process of applying it to documents with predefined keywords. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Next, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Then, clustering is conducted on the set of keyword vectors of each document to identify the multiple subjects it contains. Finally, a multi-vector is generated from the keyword vectors that constitute each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. With the proposed multi-vector method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
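
Steps (3) to (5) of the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or say how a cluster's keyword vectors are combined, so plain k-means and a per-cluster mean are assumptions here. The helper names `kmeans` and `multi_vector_embedding` and the toy 2-dimensional word vectors are hypothetical, and steps (1) and (2) are assumed to have already produced the `word_vectors` dictionary.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (assumed stand-in; the abstract does not name the algorithm)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def multi_vector_embedding(word_vectors, keywords, k):
    """Return up to k vectors for one document, one per keyword cluster."""
    # Step 3: keep only the vectors of the document's keywords.
    kw_vecs = np.array([word_vectors[w] for w in keywords if w in word_vectors])
    k = min(k, len(kw_vecs))
    # Step 4: cluster the keyword vectors to separate the document's subjects.
    labels, _ = kmeans(kw_vecs, k)
    # Step 5: one vector per cluster (mean of its keyword vectors, an assumption).
    return [kw_vecs[labels == j].mean(axis=0) for j in range(k) if np.any(labels == j)]

# Toy 2-D word vectors, made up for illustration only.
word_vectors = {
    "clustering": np.array([1.0, 0.0]),
    "classification": np.array([0.9, 0.1]),
    "wetland": np.array([0.0, 1.0]),
    "sediment": np.array([0.1, 0.9]),
}
doc_vectors = multi_vector_embedding(word_vectors, list(word_vectors), k=2)
for vec in doc_vectors:
    print(vec)
```

A document whose keywords fall into two distinct subjects ends up with two vectors, one per subject, rather than a single vector that averages both.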

The Relationship between Driving Behavior, Driving Anger, and Ambivalence Over Emotional Expressiveness in an Anonymous Situation (익명상황의 운전행동과 운전분노 및 정서표현갈등과의 관계)

  • Bo Young Yun; Soon Chul Lee
    • Korean Journal of Culture and Social Issue / v.17 no.3 / pp.321-341 / 2011
  • This study examines how anonymity between drivers affects aggressive driving and why, in an anonymous situation, some drivers drive aggressively and others do not. Two surveys were conducted. The first survey, of 200 participants, found that people are more likely to drive aggressively in an anonymous situation than in a face-to-face situation. The second survey, of 384 participants with a history of aggressive driving, found that these drivers could be classified into three groups using two-step cluster analysis. In the second survey, drivers who often drive aggressively in anonymous situations showed a high tendency towards driving anger and towards ambivalence over emotional expressiveness. Their scores on self-defensive ambivalence, one of the factors in the ambivalence over emotional expressiveness questionnaire, were also high. Individuals who tended to drive aggressively in an anonymous situation were found to be susceptible to driving anger, usually faced ambivalence over emotional expressiveness, and typically were indecisive. The results suggest that, rather than intensifying the enforcement of traffic regulations, a better remedy for those who drive recklessly would be to have them undertake some candid self-reflection.


Implication of the Ratio of Exchangeable Cations in Mountain Wetlands (산지습지 치환성 양이온 함량비의 특성과 함의)

  • Shin, Young Ho; Kim, Sung Hwan; Rhew, Hosahang
    • Journal of the Korean Geographical Society / v.49 no.2 / pp.221-244 / 2014
  • We examined the geochemical properties of sediments in the Simjeok, Jangdo, and Hwaeomneup mountain wetlands, which are natural preservation areas, and drew several implications. The geochemical properties of the sediments show that all three wetlands are of the fen type, but their distribution patterns differ from one another. Using two-step cluster analysis on the ratio of exchangeable cations, we classified the sediments into three sub-groups: Ca-dominated, Mg-dominated, and K-dominated types. The Simjeok wetland has Ca-dominated sediments, while the sediments of the Jangdo wetland show both Mg-dominated and Ca-dominated characteristics. The Hwaeomneup wetland is composed mainly of K-dominated sediment. Differences in the ratio are affected by various environmental factors such as geological, pedological, and vegetational settings. Because these geochemical properties will be affected by climate change and human impacts, they can serve as environmental indicators in mountain wetlands and be used in wetland management. This scheme can also be used to classify mountain wetlands. Therefore, we should study the geochemical properties of wetland sediments and classification schemes based on them, not only to widen understanding of the geomorphic system and ecosystem of mountain wetlands but also to conserve mountain wetlands properly.
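
To make "ratio of exchangeable cations" concrete, the sketch below converts raw cation amounts into per-sample shares and labels each sample by its largest share. This is only an illustration of the input and of the type names. It is not the two-step cluster analysis used in the study, the restriction to Ca, Mg, and K is an assumption, the function names are hypothetical, and the sample values are made up.

```python
import numpy as np

# Assumed set of cations; the abstract names only these three types.
CATIONS = ("Ca", "Mg", "K")

def cation_ratios(samples):
    """Convert raw cation amounts (rows = samples) to shares that sum to 1 per sample."""
    samples = np.asarray(samples, dtype=float)
    return samples / samples.sum(axis=1, keepdims=True)

def dominant_type(ratios):
    """Label each sample by the cation with the largest share."""
    return [f"{CATIONS[i]}-dominated" for i in np.argmax(ratios, axis=1)]

# Made-up illustrative values, not data from the study.
samples = [[6.0, 2.0, 2.0], [1.0, 3.0, 1.0], [0.5, 0.5, 1.0]]
ratios = cation_ratios(samples)
labels = dominant_type(ratios)
print(labels)
```

In a real analysis the ratio rows, rather than the hard labels, would be the input to the clustering procedure.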


A Study on the Job Productivity by the Smart Work Investment - Focused on the Organizational Change Resistance and the Communication - (스마트워크 투자에 따른 직무 생산성에 관한 연구 - 조직 변화저항과 의사소통을 중심으로-)

  • Jung, Byoung-Ho
    • Management & Information Systems Review / v.37 no.3 / pp.83-113 / 2018
  • The purpose of this study is to empirically examine smart work investment and job performance in relation to change resistance. First, it investigates the mediating role of communication between smart work investment and job performance. Second, it identifies differences in job productivity according to the level of organizational change resistance that reduces smart work investment. Smart work provides flexibility of time and location and is a way of working intended to improve the work productivity of organization members. Introducing smart work means adopting a new organizational culture, institutions, and technology, and it requires changing the customs and patterns of the existing organizational culture and institutions because the forms of communication and collaboration are transformed. The study adopts a structural equation model to test the mediating effect of communication and the moderating effect of the change resistance level. The model tests whether smart work investment has a positive impact on communication and organizational productivity. In addition, the level of change resistance to smart work is classified by cluster analysis, and the differences in the critical paths to job productivity between the groups are then checked. The results show that organizational IT, institutions, and culture in smart work investment were important influences on communication and also had a direct influence on individual performance. The three independent variables of smart work investment also had an indirect influence on individual and organizational performance through the mediating variable of communication. However, organizational IT and institutions did not have a direct influence on organizational performance. Nevertheless, these two independent variables had an indirect influence on organizational performance through the mediating variable of communication. Comparing the productivity of the three groups by organizational resistance showed differences in individual and organizational performance among the groups. The group with low organizational resistance showed higher performance coefficients than the other groups. The group analysis suggests that smart work investment should revise institutions first, build culture second, and advance technology last. The theoretical implication is that this study extends social science theory by relating socio-technical systems, institutions, culture, change resistance, and job performance in the context of smart work. The practical implication is that smart work succeeds through step-by-step investment that manages the level of change resistance, rather than through radical investment. Future research will analyze differences in organizational culture, institutions, technology, and performance in smart work between private and public firms.
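
The idea of a mediating effect of communication can be sketched with two ordinary regressions. This is a simplified stand-in, not the structural equation model used in the study: the variable names, the path coefficients (0.5, 0.4, 0.3), and the synthetic data are all assumptions made for illustration. The indirect effect is taken as the product of the path from investment to communication and the path from communication to performance.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted but dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Synthetic data with assumed path coefficients; not data from the study.
rng = np.random.default_rng(0)
n = 5000
investment = rng.normal(size=n)
communication = 0.5 * investment + rng.normal(size=n)
performance = 0.3 * investment + 0.4 * communication + rng.normal(size=n)

# Path a: investment -> communication (the mediator).
a = ols(communication, investment)[0]
# Direct effect and path b: performance regressed on investment and communication.
direct, b = ols(performance, investment, communication)
# Total effect: performance regressed on investment alone.
total = ols(performance, investment)[0]
indirect = a * b

print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

A variable with a near-zero direct effect can still matter through the mediator, which is the pattern the abstract reports for organizational IT and institutions with respect to organizational performance.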