• Title/Summary/Keyword: unstructured data


The Study of Docent System Improvement for Revitalization of Science Museum (과학관 활성화를 위한 도슨트 제도 개선 연구)

  • Park, Young-Shin;Lee, Jung-Hwa
    • Journal of the Korean earth science society / v.33 no.2 / pp.200-215 / 2012
  • The revitalization of science museums depends on the number of qualified docents who can meet museum visitors' educational needs. However, the current unstructured docent system is not sufficient to meet this goal. Forty-six docents currently working in science museums were surveyed about docent training programs, current working conditions, and professional development programs in order to propose a viable system that establishes docent work as a profession. Data were collected through surveys of the 46 docents, interviews with two experienced docents, and several artifacts from the science museum and selected docents. The survey consisted of 47 items covering personal background, docents' perceptions, the training programs they took, current working conditions, and supplementary professional programs. The conclusions of this study are as follows. First, science museums' recruiting and training systems must recognize docents as educators whose roles differ from those of general volunteers. Second, docents need training and supplementary professional courses that focus on observing and educating visitors in the field. Third, a docent management system that employs well-structured evaluation tools is needed. A well-established docent system will enhance science museum education and promote science popularization by providing visitors with quality educational services.

Long-term Effect of the 5-Day Stop-Smoking School (5일 금연학교의 장기적 효과에 관한 연구)

  • Kim Seon Ae
    • Journal of Korean Public Health Nursing / v.12 no.1 / pp.103-115 / 1998
  • As studies have established that smoking can be a major cause of various diseases, much follow-up research on the outcomes of stop-smoking education has been carried out. Although research on knowledge about smoking and on smoking status among teenagers has been prevalent, research on long-term outcomes has been lacking. Therefore, I followed up people who had completed the 5-Day Stop-Smoking School, which teaches through a combined structure of behavioral, intellectual, and psychological education. I investigated the success rate and the hardest point during their efforts to stop smoking in order to show the necessity of re-education. The subjects of this study were people who completed the program in 1990 and 1991. Of the 364 people who completed the training, 47 who could be reached by telephone were selected. The study was conducted from 27 Oct. to 7 Nov. 1997 through verbal interviews based on a questionnaire, which I developed with the assistance of my professor. Unstructured open-ended questions were used, and the data were analyzed using SPSS. The major results were as follows. 1) General characteristics of the subjects: 97.0% were male; 17% were under 40, 34% were in their 40s, and 48.9% were in their 50s or older. By religion, 34.0% were Christian, 19.1% were Buddhist, and 46.8% had no religion or gave another answer. By marital status, 93.6% were married and 6.4% were unmarried. Someone else in the family smoked for 36.2%, and no one else smoked for 63.8%. By occupation, 55.3% were salaried workers and 27.7% ran their own business. 2) The success rate was 42.6% (20/47), and the failure rate was 57.4%. 3) Responses to 'When was the hardest point in the process of stopping smoking?': among those who succeeded, 33.3% named the first week and 66.7% named a time after the first week.
Among those who failed, 55% named the first week and 45% named a time after the first week (statistics not precisely computed). The most effective sources of help at the hardest point were family (40%) and personal determination (30%). 4) The necessity of re-education: among those who succeeded, 55% said it was needed and 45% said it was not; among those who failed, 48.1% said it was needed and 51.9% said it was not (statistics not precisely computed). The preferred timing of re-education: among those who succeeded, 50% chose within six months and 50% chose irregular intervals; among those who failed, 36.4% chose within six months, 27.3% after six months, and 36.4% irregular intervals (statistics not precisely computed). Given the small number of subjects, these results cannot be generalized into a long-term effect of the stop-smoking program, but they show that 42.6% experienced success and that self-determination and family support are desirable, since both are strong motivations to stop smoking. Attention to the first week is also necessary. Because the need for re-education is rather high, this education should be planned as repeated, long-term education with close observation rather than as a short course.


Fintech Trends and Mobile Payment Service Analysis in Korea: Application of Text Mining Techniques (국내 핀테크 동향 및 모바일 결제 서비스 분석: 텍스트 마이닝 기법 활용)

  • An, JungKook;Lee, So-Hyun;An, Eun-Hee;Kim, Hee-Woong
    • Informatization Policy / v.23 no.3 / pp.26-42 / 2016
  • Recently, with the rapid growth of the O2O market, Fintech, which combines finance and ICT, has been drawing attention as an innovation that will lead the "O2O of finance", along with Fintech-based payment, authentication, and security technologies and related services. For new technology industries such as Fintech, technical sources and related systems and regulations are important, but previous studies on Fintech lack in-depth research on the systems and technological trends of the domestic Fintech industry. Therefore, this study aims to analyze domestic Fintech trends and to derive insights for the future direction of technology and systems in the domestic Fintech industry by comparing Kakao Pay and Samsung Pay, the two representative domestic mobile payment services. By conducting a complete enumeration survey of the tweets mentioning Fintech until June 2016, this study visualized the results of topic extraction, sentiment analysis, and keyword analysis. The analysis found that various topics emerged around technologies and systems between 2014 and 2016, and that different keywords and reactions were extracted for Samsung Pay, whose topics center on "devices" such as Galaxy, and for Kakao Pay, whose topics center on "service" such as KakaoTalk. This study contributes by analyzing unstructured social media data by period using social media mining and by quantifying consumers' expectations of and reactions to services through sentiment analysis. It is expected to serve as a foundation for Fintech industry development by presenting a strategic direction to Fintech practitioners.

The Effect of Engineering Design Based Ocean Clean Up Lesson on STEAM Attitude and Creative Engineering Problem Solving Propensity (공학설계기반 오션클린업(Ocean Clean-up) 수업이 STEAM태도와 창의공학적 문제해결성향에 미치는 효과)

  • DongYoung Lee;Hyojin Yi;Younkyeong Nam
    • Journal of the Korean earth science society / v.44 no.1 / pp.79-89 / 2023
  • The purpose of this study was to investigate the effects of engineering design-based ocean cleanup classes on STEAM attitudes and creative engineering problem-solving dispositions. In addition, we sought to identify the points that students found interesting in engineering design-based classes. For this study, a science class of six lessons based on engineering design was developed and reviewed by a professor who majored in engineering design, along with five engineering design experts with a master's degree or higher. The subject of the class was the design and implementation of scientific and engineering measures to reduce marine pollution, based on the method implemented in an actual Ocean Clean-up Project. The engineering design process followed the engineering design model presented by NGSS (2013) and was configured so that students experience redesign through an optimization process. To verify effectiveness, the STEAM attitude questionnaire developed by Park et al. (2019) and the creative engineering problem-solving propensity test tool developed by Kang and Nam (2016) were used. A pre- and post-test t-test was used for the statistical analysis of effectiveness. In addition, learners' descriptive responses about the points they found interesting were transcribed, then analyzed and visualized through degree centrality analysis. The results confirmed that engineering design in science classes had a positive effect on both STEAM attitude and creative engineering problem-solving disposition (p < .05). In addition, the unstructured data analysis showed that science and engineering knowledge, engineering experience, and cooperation and collaboration were the factors that interested learners, with engineering experience as the main factor.
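
The pre/post comparison described in this abstract can be illustrated with a paired t statistic. The sketch below is only an illustration of that general technique: the function name `paired_t` and the four score pairs are hypothetical, and the abstract does not state which software or exact test variant the authors used.

```python
import math
import statistics

def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for matched pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post pairs")
    # Work on the per-student change, not on the two groups separately.
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical scores for four students, not data from the study.
pre_scores = [3, 4, 2, 5]
post_scores = [4, 5, 4, 6]
t_stat, dof = paired_t(pre_scores, post_scores)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Turning the statistic into a p-value needs the t distribution with `dof` degrees of freedom, which the standard library does not provide; a statistics package would normally handle that step.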

Comparative analysis of information attributes in chemical accident response systems through Unstructured Data: Spotlighting on the OECD Guidelines for Chemical Accident Prevention, Preparedness, and Response (비정형 데이터를 이용한 화학물질 사고 대응 체계 정보속성 비교 분석 : 화학사고 예방, 대비 및 대응을 위한 OECD 지침서를 중심으로)

  • YongJin Kim;Chunghyun Do
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.91-110 / 2023
  • The importance of manuals is emphasized because chemical accidents require swift response and recovery and often result in environmental pollution and casualties. In this regard, the OECD revised its Guidelines for the Prevention, Preparedness, and Response to Chemical Accidents (referred to as the OECD Guidelines) in June 2023. While existing research primarily raises awareness about chemical accidents and highlights the need for a system-wide response including laws, regulations, and manuals, comparative research on the attributes of manuals is difficult to find. This paper therefore compares and analyzes the second and third editions of the OECD Guidelines in order to uncover the information attributes and implications of the revised version. Specifically, TF-IDF (Term Frequency-Inverse Document Frequency) was applied to understand which keywords have become more important, and Word2Vec was applied to identify keywords that were used similarly and those that were differentiated. Lastly, a 2×2 matrix was proposed, identifying the topics within each quadrant to provide a deeper comparison of the information attributes of the OECD Guidelines. This study offers a framework to help researchers understand information attributes. From a practical perspective, it should be valuable for the revision of standard manuals by domestic government agencies and chemistry-related corporations.
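
As a rough illustration of how TF-IDF can surface keywords that gained weight between two editions, the sketch below scores terms in two toy token lists. Everything in it is assumed for illustration: the token lists are invented, the smoothed IDF formula is one common variant, and the abstract does not say how the authors segmented or weighted the Guidelines.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights

# Hypothetical token lists standing in for the two editions.
second_edition = "accident response plan response authority".split()
third_edition = "accident response plan climate natech climate".split()

w_second, w_third = tf_idf([second_edition, third_edition])
top_term = max(w_third, key=w_third.get)
print(top_term, round(w_third[top_term], 3))
```

In this toy input, the term that appears only in the later list and is repeated there receives the highest weight, which is the kind of signal the abstract describes looking for.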

Sentiment Analysis of News Based on Generative AI and Real Estate Price Prediction: Application of LSTM and VAR Models (생성 AI기반 뉴스 감성 분석과 부동산 가격 예측: LSTM과 VAR모델의 적용)

  • Sua Kim;Mi Ju Kwon;Hyon Hee Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.5 / pp.209-216 / 2024
  • Real estate market prices are determined by various factors, including macroeconomic variables as well as a variety of unstructured text data such as news articles and social media. News articles are a crucial factor in predicting real estate transaction prices because they reflect the economic sentiment of the public. This study applies sentiment analysis to news articles to generate a News Sentiment Index (NSI) score, which is then integrated into a real estate price prediction model. To calculate the sentiment index, the content of the articles is first summarized. Then, using AI, the summaries are categorized into positive, negative, and neutral sentiments, and a total score is calculated. This score is then applied to the real estate price prediction model. The models used for real estate price prediction are the multi-head attention LSTM model and the vector autoregression (VAR) model. The LSTM prediction model without the NSI showed Root Mean Square Error (RMSE) values of 0.60, 0.872, and 1.117 for the 1-month, 2-month, and 3-month forecasts, respectively. With the NSI applied, the RMSE values were reduced to 0.40, 0.724, and 1.03 for the same forecast periods. Similarly, the VAR prediction model without the NSI showed RMSE values of 1.6484, 0.6254, and 0.9220 for the 1-month, 2-month, and 3-month forecasts, respectively, while applying the NSI led to RMSE values of 1.1315, 0.3413, and 1.6227 for these periods. These results demonstrate the effectiveness of the proposed model in predicting the apartment transaction price index and its ability to forecast real estate market price fluctuations that reflect socio-economic trends.
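
The two quantities this abstract relies on, a sentiment score aggregated from labeled summaries and RMSE, can be sketched as follows. The +1/0/-1 scoring and the averaging in `news_sentiment_index` are assumptions made for illustration; the abstract only says that a total score is calculated. The RMSE figures in the last lines are the 1-month LSTM values reported above.

```python
import math

# Assumed scoring scheme (hypothetical): the abstract does not specify one.
SCORES = {"positive": 1, "neutral": 0, "negative": -1}

def news_sentiment_index(labels):
    """Average sentiment score over a batch of labeled article summaries."""
    return sum(SCORES[label] for label in labels) / len(labels)

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

nsi = news_sentiment_index(["positive", "positive", "negative", "neutral"])
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # toy values, not study data
lstm_gain = relative_reduction(0.60, 0.40)      # reported 1-month LSTM RMSEs
print(nsi, round(error, 4), f"{lstm_gain:.1%}")
```

Applying `relative_reduction` to the reported VAR 3-month figures (0.9220 without and 1.6227 with the NSI) gives a negative value, that is, an increase in error for that horizon.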

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.79-104 / 2020
  • Recently, as deep learning has attracted attention, its use is being considered as a method for solving problems in various fields. In particular, deep learning is known to perform excellently when applied to unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interest in image captioning technology and its applications is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling both image comprehension and text generation simultaneously. Despite its high entry barrier, in that analysts must be able to process both image and text data, image captioning has established itself as one of the key fields in AI research owing to its wide applicability. In addition, much research has been conducted to improve the performance of image captioning in various respects. Recent studies attempt to create advanced captions that not only describe an image accurately but also convey the information contained in the image in a more sophisticated way. Despite these efforts, it is difficult to find research that interprets images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the person viewing it. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to recognize an image from a holistic and general perspective, that is, by identifying the image's constituent objects and their relationships.
In contrast, domain experts tend to recognize an image by focusing on the specific elements needed to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, so image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method to generate domain-specialized captions for an image by utilizing the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, the expertise of the field is transplanted through transfer learning with a small amount of expertise data. However, a naive application of transfer learning using expertise data may cause another type of problem. Simultaneous learning with captions of various characteristics may cause the so-called 'inter-observation interference' problem, which makes it difficult to learn each characteristic point of view cleanly. When learning from a vast amount of data, most of this interference resolves itself and has little impact on the learning results. In contrast, in fine-tuning, where learning is performed on a small amount of data, the impact of such interference can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each character. To confirm the feasibility of the proposed methodology, we performed experiments utilizing the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, following the advice of an art therapist, about 300 pairs of 'image / expertise captions' were created and used for the expertise transplantation experiments.
The experiments confirmed that the proposed methodology generates captions from the perspective of the implanted expertise, whereas captions generated through learning on general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation. To achieve this goal, we present a method that uses transfer learning to generate captions specialized in a specific domain. We expect that applying the proposed methodology to expertise transplantation in various fields will stimulate further research on solving the lack of expertise data and on improving the performance of image captioning.

Emoticon by Emotions: The Development of an Emoticon Recommendation System Based on Consumer Emotions (Emoticon by Emotions: 소비자 감성 기반 이모티콘 추천 시스템 개발)

  • Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.227-252 / 2018
  • The evolution of instant communication has mirrored the development of the Internet, and messenger applications are among the most representative manifestations of instant communication technologies. In messenger applications, senders use emoticons to supplement the emotions conveyed in the text of their messages. The fact that communication via messenger applications is not face-to-face makes it difficult for senders to communicate their emotions to message recipients. Emoticons have long been used as symbols that indicate the moods of speakers. However, at present, emoticon use is evolving into a means of conveying the psychological states of consumers who want to express individual characteristics and personality quirks while communicating their emotions to others. The fact that companies such as KakaoTalk, Line, and Apple have begun conducting emoticon business, and that sales of related content are expected to gradually increase, testifies to the significance of this phenomenon. Nevertheless, despite the development of emoticons themselves and the growth of the emoticon market, no suitable emoticon recommendation system has yet been developed. Even KakaoTalk, a messenger application that commands more than 90% of the domestic market share in South Korea, merely groups emoticons by popularity, most recent, or brief category. This means consumers face the inconvenience of constantly scrolling around to locate the emoticons they want. The creation of an emoticon recommendation system would improve consumer convenience and satisfaction and increase the sales revenue of companies that sell emoticons. To recommend appropriate emoticons, it is necessary to quantify the emoticons that consumers see and the emotions they feel. Such quantification will enable us to analyze the characteristics and emotions felt by consumers who used similar emoticons, which, in turn, will facilitate our emoticon recommendations for consumers. One way to quantify emoticon use is metadata-ization.
Metadata-ization is a means of structuring or organizing unstructured and semi-structured data to extract meaning. By structuring unstructured emoticon data through metadata-ization, we can easily classify emoticons based on the emotions consumers want to express. To determine emoticons' precise emotions, we had to consider sub-detail expressions: not only the seven common emotional adjectives but also the metaphorical expressions that appear only in Korean, as established by previous emotion studies focusing on emoticons' characteristics. We therefore collected the sub-detail expressions of emotion based on "Shape", "Color" and "Adumbration". Moreover, to design a highly accurate recommendation system, we considered both emoticon-technical indexes and emoticon-emotional indexes. We then identified 14 features of emoticon-technical indexes and selected 36 emotional adjectives. The 36 emotional adjectives consisted of contrasting pairs, which we reduced to 18, and we measured the 18 emotional adjectives using 40 emoticon sets randomly selected from the top-ranked emoticons in the KakaoTalk shop. We surveyed 277 consumers in their mid-twenties who had experience purchasing emoticons; we recruited them online and asked them to evaluate five different emoticon sets. After data acquisition, we conducted a factor analysis of emoticon-emotional factors. We extracted four factors that we named "Comic", "Softness", "Modernity" and "Transparency". We analyzed both the relationship between the indexes and consumer attitude and the relationship between emoticon-technical indexes and emoticon-emotional factors. Through this process, we confirmed that the emoticon-technical indexes did not directly affect consumer attitudes but had a mediating effect on consumer attitudes through emoticon-emotional factors.
The results of the analysis revealed the mechanism consumers use to evaluate emoticons; they also showed that emoticon-technical indexes affected emoticon-emotional factors and that the emoticon-emotional factors affected consumer satisfaction. We therefore designed the emoticon recommendation system using only the four emoticon-emotional factors; we created a recommendation method that calculates the Euclidean distance from each factor's emotion. In an attempt to increase the accuracy of the emoticon recommendation system, we compared the emotional patterns of selected emoticons with those of the recommended emoticons. The emotional patterns corresponded in principle. We verified the emoticon recommendation system by testing prediction accuracy; the predictions were 81.02% accurate in the first result, 76.64% accurate in the second, and 81.63% accurate in the third. This study developed a methodology that can be used in various fields, both academically and practically. We expect that the novel emoticon recommendation system we designed will increase emoticon sales for companies that conduct business in this domain and make consumer experiences more convenient. In addition, this study served as an important first step in the development of an intelligent emoticon recommendation system. The emotional factors proposed in this study could be collected in an emotional library that could serve as an emotion index for evaluation when new emoticons are released. Moreover, by combining the accumulated emotional library with company sales data, sales information, and consumer data, companies could develop hybrid recommendation systems that would bolster convenience for consumers and serve as intellectual assets that companies could strategically deploy.
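
A distance-based recommendation over the four emotional factors can be sketched as a nearest-neighbor lookup. This is only a minimal illustration of ranking by Euclidean distance: the catalog names, the factor scores, and the `recommend` function are hypothetical, and the abstract does not describe how the authors scaled or combined the factors.

```python
import math

# Factor order assumed for illustration: the four factors named in the abstract.
FACTORS = ("comic", "softness", "modernity", "transparency")

def recommend(target, catalog, k=2):
    """Return the k emoticon sets whose factor scores are closest to `target`."""
    def distance(name):
        # math.dist computes the Euclidean distance between two points.
        return math.dist(target, catalog[name])
    return sorted(catalog, key=distance)[:k]

# Hypothetical factor scores for three emoticon sets.
catalog = {
    "set_a": (4.0, 2.0, 3.0, 2.0),
    "set_b": (1.0, 5.0, 3.0, 1.0),
    "set_c": (5.0, 3.0, 3.0, 1.0),
}
wanted = (4.0, 2.0, 3.0, 1.0)  # the emotion profile a consumer wants to express
picks = recommend(wanted, catalog)
print(picks)
```

Ranking by plain Euclidean distance treats all four factors as equally important and equally scaled; a real system would likely need to check that assumption.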

A Diagnostic Study on High School Students' Health and Quality of Life - Based on the PRECEDE model - (고등학생의 건강 및 삶의 질에 대한 진단적 연구 - PRECEDE 모형을 근간으로 -)

  • Yoo Jae-Soon;Hong Yeo-Shin
    • The Journal of Korean Academic Society of Nursing Education / v.3 / pp.78-98 / 1997
  • Health education, as the most fundamental concept for national health promotion, aims to develop the self-care ability of the general public. The high school years are regarded as the period when the most important physical, mental, and social developments occur and most health-related behaviors are formed. School health education is one of the major learning resources influencing health potential in the home and community as well as for the individual student. High school health education in Korea has a fundamental systemic flaw in that health-related topics are divided and taught under various subject areas at school. In order to achieve the goal of school health education, it is essential to make a systematic assessment of the learner's concerns connected with his or her health and life, and of the factors affecting them. So far, most of the research projects carried out to improve high school health education have limited their concerns to a particular aspect of health. Even though some have been done in view of comprehensive school health education, they failed to include a health assessment of the learner. Therefore, in this study, high school students' concerns related to health and life were first investigated on the basis of the PRECEDE model, developed by Green and others, for the purpose of comprehensive diagnostic research on high school health education. This study was done in two steps: a basic study for developing the research instrument, and the main study. The former was conducted at five high schools in Seoul and Cheongju for two months beginning in March 1996. The students were asked to respond to unstructured, open-ended questions related to their health and lives.
On the basis of the analysis of the basic study, diagnostic instruments for quality of life, health problems, health behavior, and educational factors were constructed for collecting the data of the main study. An expert panel and a pilot study were used to improve the content validity and reliability of the instruments. The reliability of the instruments, measured by Cronbach's α, was between .7697 and .9611. The data for this study were collected from a sample consisting of the junior and senior classes of twenty general and vocational high schools in Seoul and Cheongju over a two-month period beginning in July 1996. In analyzing the data, both the t-test and the χ²-test were performed using the SAS-PC+ program to compare data between the sexes of the high school students and between the types of high school. A canonical correlation analysis was carried out to determine the relationships among the diagnostic variables, and a multivariate multiple regression analysis was conducted using LISREL 8.03 to ascertain the influences of the variables on the high school students' health and quality of life. The results were as follows: 1) The findings of the hypothesis tests: (1) The canonical correlation between the educational diagnosis variables and the behavioral, epidemiological, and social diagnosis variables was .7221, which was significant at the level of p<.001. (2) The canonical correlation between the educational diagnosis variables and the behavior variables was .6851, which was also significant (p<.001). (3) The canonical correlation between the behavioral diagnosis variables and the epidemiological variables was .4295, which was significant (p<.001). (4) The canonical correlation between the epidemiological diagnosis variables and the social variables was .6005, which was also significant (p<.001).
Therefore, the relationships among the diagnosis variables suggested by the PRECEDE model were empirically shown to be valid, supporting the conceptual framework of the study as appropriate for assessing the multi-dimensional factors affecting high school students' health and quality of life. Health behavior self-efficacy, the level of parents' interest in and knowledge of health, and the level of perception of school health education, all of which are educational diagnostic variables, were the most influential variables in students' health and quality of life. In particular, health behavior self-efficacy, a causative factor, was one of the main influential variables in their health and quality of life. Other diagnostic variables suggested in the steps of the PRECEDE model were found to have reciprocal relations rather than a unidirectional causative relationship. The significance of this research is that it has diagnosed the needs of high school health education through a learner-centered assessment of a variety of factors related to the health and lives of the students. These findings suggest that an integrated system of school health education should be devised to enhance the effectiveness of the education by strengthening influential factors such as self-efficacy, in order to improve the health and quality of life of high school students.
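
The instrument reliability reported in this abstract is Cronbach's α. A minimal sketch of the standard formula, α = k/(k-1) · (1 - Σ item variances / variance of total scores), is shown below. The function name and the response matrix are hypothetical, and the study itself reports using SAS-PC+ rather than code like this.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, aligned by respondent."""
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(statistics.variance(scores) for scores in items)
    # Each respondent's total score across all items.
    totals = [sum(row) for row in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / statistics.variance(totals))

# Hypothetical responses: 3 items answered by 4 respondents.
responses = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 4))
```

Items that move together perfectly, as in this toy matrix, give an α of 1; less consistent items push the value down, which is why figures such as the reported .7697 to .9611 are read as acceptable to high reliability.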


Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining studies focused on applications of the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text must first undergo a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining algebraic properties, in order to structure text data, is called "embedding". Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using all of the words in the document.
This has the limitation that the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector. Therefore, it is difficult to accurately represent a complex document with multiple subjects as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since this is not the core subject of the proposed method, we describe the process of applying it to documents with predefined keywords. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. After that, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for that document. Next, clustering is conducted on the set of keywords for each document to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector.
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
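
Steps (3) to (5) of the method above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the 2-D embeddings are invented, the clustering is a bare-bones k-means with a deterministic start, and averaging each cluster into one vector is an assumed way of "generating" the multi-vector, since the abstract does not specify the aggregation.

```python
import math

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))

def kmeans(vectors, k, iterations=10):
    """Tiny k-means; the first k vectors serve as deterministic initial centroids."""
    centroids = list(vectors[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for vector in vectors:
            nearest = min(range(k), key=lambda i: math.dist(vector, centroids[i]))
            clusters[nearest].append(vector)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters

def multi_vector(keywords, embeddings, k):
    """Keyword vector extraction, keyword clustering, then one vector per cluster."""
    vectors = [embeddings[word] for word in keywords if word in embeddings]
    return [mean(cluster) for cluster in kmeans(vectors, k) if cluster]

# Hypothetical 2-D word embeddings for a document covering two subjects.
embeddings = {
    "lstm": (1.0, 0.0), "rnn": (0.9, 0.1),
    "tax": (0.0, 1.0), "policy": (0.1, 0.9),
}
doc_vectors = multi_vector(["lstm", "tax", "rnn", "policy"], embeddings, k=2)
print(doc_vectors)
```

The document ends up with two vectors, one per subject, instead of a single vector lying between the two subjects. The number of clusters `k` is fixed by hand here; choosing it per document is a separate design question.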