• Title/Summary/Keyword: Academic Text

Investigation on the Effect of Multi-Vector Document Embedding for Interdisciplinary Knowledge Representation

  • Park, Jongin;Kim, Namgyu
    • Knowledge Management Research / v.21 no.1 / pp.99-116 / 2020
  • Text is the most widely used means of exchanging and expressing knowledge and information in the real world. Recently, research on structuring unstructured text data for text analysis has been actively pursued. One of the most representative document embedding methods (i.e., doc2Vec) generates a single vector for each document using all the words included in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding algorithms map each document to only one vector, so it is difficult to represent a complex document with interdisciplinary subjects properly as a single vector. In this paper, we introduce a multi-vector document embedding method to overcome these limitations of traditional document embedding methods. After introducing the previous study on multi-vector document embedding, we visually analyze the effects of the method. First, the new method vectorizes the document using only predefined keywords instead of all words. Second, it decomposes the various subjects included in the document and generates multiple vectors for each document. Experiments on about three thousand academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector. With the multi-vector method, we confirmed that the information and knowledge in complex documents can be represented more accurately by eliminating the interference among subjects.
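
The contrast between one vector per document and one vector per subject can be illustrated with a toy example. This is only a minimal sketch, not the authors' method: the 2-D keyword vectors, the `subject_of` mapping, and the function names are all hypothetical, and simple averaging stands in for whatever embedding procedure the paper actually uses.

```python
from collections import defaultdict

def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def single_vector(keywords, embedding):
    """One vector per document: average over every keyword."""
    return mean([embedding[k] for k in keywords])

def multi_vectors(keywords, embedding, subject_of):
    """One vector per subject: average keywords within each subject."""
    groups = defaultdict(list)
    for k in keywords:
        groups[subject_of[k]].append(embedding[k])
    return {subject: mean(vs) for subject, vs in groups.items()}

# Hypothetical keyword vectors and subject labels (assumptions, not paper data).
embedding = {
    "gene": [1.0, 0.0], "protein": [0.8, 0.2],
    "market": [0.0, 1.0], "price": [0.2, 0.8],
}
subject_of = {
    "gene": "biology", "protein": "biology",
    "market": "economics", "price": "economics",
}
doc = ["gene", "protein", "market", "price"]

single = single_vector(doc, embedding)
multi = multi_vectors(doc, embedding, subject_of)
```

In this toy case the single vector lands midway between the two subjects, while the per-subject vectors stay close to their own keywords, which is the kind of interference the abstract describes.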

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

  • Shin, Hyun-Kyung
    • Journal of Korea Multimedia Society / v.13 no.12 / pp.1786-1797 / 2010
  • Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been continual and varied efforts to improve the core parts of content retrieval technologies by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper, we proposed classification models for text lines in invoice documents, in which text lines were grouped into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine learning based models in terms of linear-discriminant-analysis (LDA) and non-LDA (logic based) approaches. In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM were used; in the non-LDA group, decision tree, random forest, and boosting were used. We described the details of feature vector construction and the processes for selecting the model and its parameters, including training and validation. We also presented experimental results comparing the training and classification error levels of the models employed.

An Analysis of Key Elements for FinTech Companies Based on Text Mining: From the User's Review (텍스트 마이닝 기반의 자산관리 핀테크 기업 핵심 요소 분석: 사용자 리뷰를 바탕으로)

  • Son, Aelin;Shin, Wangsoo;Lee, Zoonky
    • The Journal of Information Systems / v.29 no.4 / pp.137-151 / 2020
  • Purpose: Domestic asset management fintech companies are expected to grow by leaps and bounds with the implementation of the "Data bills." Despite the market fever, however, academic research is insufficient. Therefore, we analyze user reviews of asset management fintech companies that are expected to grow significantly in the future, in order to derive the strengths of, and points for improvement in, the services provided so far, and to identify the key elements of asset management fintech companies. Design/methodology/approach: To analyze large amounts of review text data, this study applied text mining techniques. Bank Salad and Toss, domestic asset management application services, were selected for the study. App reviews were crawled from the online app store and preprocessed using natural language processing techniques. Topic Modeling and Aspect-Sentiment Analysis were used as analysis methods. Findings: From the analysis results, this study derived the elements that asset management fintech companies should have. Topic Modeling derived 7 topics each for Bank Salad and Toss, covering function, usage, stability, and marketing. Sentiment Analysis showed that users responded positively to function-related topics but negatively to usage-related and stability topics. Through this, we were able to extract the key elements needed for asset management fintech companies.

A Study on a Landscape Color Analysis according to Regional Environment - Centering on Damyang County, Jeollanamdo - (지역 환경에 따른 경관 색채분석에 관한 연구 - 전라남도 담양군을 중심으로 -)

  • Choi, Seong-Kyung;Moon, Jung-Min
    • Korean Institute of Interior Design Journal / v.21 no.4 / pp.146-154 / 2012
  • As Damyang has preserved both beautiful natural environment and tradition very well, it needs colors which can coexist with Damyang while preserving it as it is rather than colorful and refined colors. However, the present Damyang deteriorates the quality of beautiful natural scenes by chaotic uses of colors. Therefore, colors which can represent symbolism based on the present colors of Damyang should be used so that everyone can be pleased with them. Finally, the basic colors decided were classified into main, supplement and highlight colors in consideration of characteristics of each scene and they were effectively arranged based on the colors decided. If such colors and color schemes are properly applied according to characteristics of scenes, ecological, historical, cultural and traditional scenes of Damyang can be preserved consistently.

A study on research trends for gestational diabetes mellitus and breastfeeding: Focusing on text network analysis and topic modeling (임신성 당뇨와 모유수유에 대한 연구 동향 분석: 텍스트네트워크 분석과 토픽모델링 중심)

  • Lee, Junglim;Kim, Youngji;Kwak, Eunju;Park, Seungmi
    • The Journal of Korean Academic Society of Nursing Education / v.27 no.2 / pp.175-185 / 2021
  • Purpose: The aim of this study was to identify core keywords and topic groups in the 'Gestational diabetes mellitus (GDM) and Breastfeeding' field of research for better understanding research trends in the past 20 years. Methods: This was a text-mining and topic modeling study composed of four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building a co-occurrence matrix, and 4) analyzing network features and clustering topic groups. Results: A total of 635 papers published between 2001 and 2020 were found in databases (Web of Science, CINAHL, RISS, DBPIA, RISS, KISS). Among them, 3,639 words extracted from 366 articles selected according to the conditions were analyzed by text network analysis and topic modeling. The most important keywords were 'exposure', 'fetus', 'hypoglycemia', 'prevention' and 'program'. Six topic groups were identified through topic modeling. The main topics of the study were 'cardiovascular disease' and 'obesity'. Through the topic modeling analysis, six themes were derived: 'cardiovascular disease', 'obesity', 'complication prevention strategy', 'support of breastfeeding', 'educational program' and 'management of GDM'. Conclusion: This study showed that over the past 20 years many studies have been conducted on complications such as cardiovascular diseases and obesity related to gestational diabetes and breastfeeding. In order to prevent complications of gestational diabetes and promote breastfeeding, various nursing interventions, including gestational diabetes management and educational programs for GDM pregnancies, should be developed in nursing fields.
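
Step 3 of the method (building a co-occurrence matrix) and the first part of step 4 (network features) can be sketched as follows. This is an illustrative assumption rather than the study's pipeline: the keyword lists are invented, co-occurrence is counted at the document level, and plain degree is used as the simplest possible network feature.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count how many documents each unordered keyword pair shares."""
    counts = Counter()
    for tokens in docs:
        # Use each keyword once per document; sorting makes (a, b) and (b, a) one key.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts

def degree(counts):
    """Number of distinct co-occurring neighbours for each keyword."""
    neighbours = {}
    for a, b in counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {word: len(ns) for word, ns in neighbours.items()}

# Hypothetical, already-cleaned keyword lists (not data from the study).
docs = [
    ["obesity", "prevention", "program"],
    ["obesity", "fetus", "hypoglycemia"],
    ["prevention", "program", "obesity"],
]
pairs = cooccurrence(docs)
degrees = degree(pairs)
```

Here `pairs` is a sparse form of the co-occurrence matrix (only non-zero cells are stored), and `degrees` shows which keywords connect to the most others.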

A Study on the Majinhwiseong (麻疹彙成), a Medical Text on Measles Written by Joseon physician Lee Wonpung (조선 의원 이원풍(李元豊)의 마진 의서, 『마진휘성(麻疹彙成)』연구)

  • OH, Chaekun
    • Journal of Korean Medical classics / v.35 no.3 / pp.41-58 / 2022
  • Objectives: This paper introduces the outline and overall content of the Majinhwiseong, a specialized medical text on measles written by Lee Wonpung, along with its meaning in academic history. Methods: The entire Majinhwiseong was analyzed according to content and form. In terms of form, its organization, construction, cited literature, etc., were studied, while in terms of content, the diagnosis of disease patterns and treatment formulas were studied. Then, based on the cited medical texts and the author's social position, the meaning of this book in academic history was discussed. Results: In the Majinhwiseong, Lee Wonpung strengthened the credibility of the text by not only providing medical knowledge on measles but also listing its sources and comparing and analyzing related contents. In the diagnosis part, Lee focused on the changes in symptom, shape, color, and pulse of measles, discussing its differential diagnostic methods in detail. In the treatment part, while listing numerous formulas suggested by Ming (明) masters, Lee did not leave out the treatment experiences of Joseon physicians. Meanwhile, the Majinhwiseong indicates that measles medicine in 18th century Joseon progressed in the private sector rather than the official one, and shows how the results of private sector medicine were being absorbed into the official realm through the Uiyakdongcham (議藥同參) system. Conclusions: The Majinhwiseong is a practical treatment manual written by the clinician Lee Wonpung to deal with measles, which was widespread at the time. The author organized existing medical knowledge on measles for clinicians while reflecting the outcomes and medical situation of Joseon physicians in this book. Based on these findings, we could verify that medicine in 18th century Joseon had been progressing actively around the private medical sector.

A study on research trends for pregnancy in adolescence: Focusing on text network analysis and topic modeling (청소년 임신에 대한 연구 동향 분석: 텍스트 네트워크 분석과 토픽 모델링)

  • Park, Seungmi;Kwak, Eunju;Park, Hye Ok;Hong, Jung Eun
    • The Journal of Korean Academic Society of Nursing Education / v.30 no.2 / pp.149-159 / 2024
  • Purpose: The aim of this study was to identify core keywords and topic groups in the "adolescent pregnancy" field of research for a better understanding of research trends in the past 10 years. Methods: Topics related to adolescent pregnancy were extracted from 3,819 articles that were published in journals between January 2013 and July 2023. Abstracts were retrieved from five databases (MEDLINE, CINAHL, Embase, RISS, and KISS). Keywords were extracted from the abstracts and cleaned using semantic morphemes. Text network analysis and topic modeling were performed using NetMiner 4.3.3. Results: The most important keywords were "health," "woman," "risk," "group," "girl," "school," "service," "family," "program," and "contraception." Five topic groups were identified through topic modeling. Through the topic modeling analysis, five themes were derived: "health service," "community program for school girls," "risks for adult women," "relationship risks," and "sexual contraceptive knowledge." Conclusion: This study utilized text network analysis and topic modeling to analyze keywords from abstracts of research conducted over the past decade on adolescent pregnancy. Given that adolescent pregnancy leads to physical, mental, social, and economic issues, it is imperative to provide integrated intervention programs, including prenatal/postnatal care, psychological services, proper contraception methods, and sex education, through school and community partnerships, as well as related research studies. Nurses can play a vital role by actively engaging in prevention efforts and directly supporting and educating socially disadvantaged adolescent mothers, which could significantly contribute to improving their quality of life.

A Rule-based Approach to Identifying Citation Text from Korean Academic Literature (한국어 학술 문헌의 본문 인용문 인식을 위한 규칙 기반 방법)

  • Kang, In-Su
    • Journal of the Korean Society for information Management / v.29 no.4 / pp.43-60 / 2012
  • Identifying citing sentences in article full text is a prerequisite for creating a variety of future academic information services such as citation-based automatic summarization, automatic generation of review articles, sentiment analysis of citing statements, and information retrieval based on citation contexts. However, finding citing sentences is not easy because of implicit citing sentences, which have no explicit citation markers. While several methods have been proposed to attack this problem for English, it is difficult to find such automatic methods for Korean academic literature. This article presents a rule-based approach to identifying Korean citing sentences. Experiments show that the proposed method could find 30% of the implicit citing sentences in our test data with nearly 70% precision.
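
The two reported figures correspond to recall (share of implicit citing sentences found) and precision (share of flagged sentences that are correct). The sketch below shows how both are computed. The counts are hypothetical, chosen only to land near the reported 30% and 70%; they are not the paper's test data.

```python
def precision_recall(true_positives, flagged, relevant):
    """Precision = correct / flagged; recall = correct / all relevant items."""
    precision = true_positives / flagged
    recall = true_positives / relevant
    return precision, recall

# Hypothetical counts: 100 implicit citing sentences exist,
# the rules flag 43 sentences, and 30 of those are correct.
p, r = precision_recall(true_positives=30, flagged=43, relevant=100)
```

With these counts, recall is exactly 0.30 and precision is just under 0.70, matching the shape of the reported result: the rules are fairly reliable when they fire but miss most implicit citations.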

A Study on the Smart Tourism Awareness through Bigdata Analysis

  • LEE, Song-Yi;LEE, Hwan-Soo
    • The Journal of Industrial Distribution & Business / v.11 no.5 / pp.45-52 / 2020
  • Purpose: In the 4th industrial revolution, services that incorporate various smart technologies in the tourism sector have begun to gain popularity. Accordingly, academic discussions on smart tourism have also started to become active in various fields. Despite recent research, the definition of smart tourism is still ambiguous, and it is not easy to differentiate its scope or characteristics from traditional tourism concepts. Thus, this study aims to analyze the perception of smart tourism exposed online to identify the current point of smart tourism in Korea and present the research direction for conceptualizing smart tourism suitable for the domestic situation. Research design, data, and methodology: This study analyzes the perception of smart tourism exposed online based on 20,198 news data from portal sites over the past six years. Data on words used with smart tourism were collected from the leading portal sites Naver, Daum, and Google. Text mining techniques were applied to identify the social awareness status of smart tourism. Network analysis was used to visualize the results between words related to smart tourism, and CONCOR analysis was conducted to derive clusters formed by words having similarity. Results: As a result of keyword analysis, the frequency of words related to the development and construction of smart tourism areas was high. The analysis of the centrality of the connection between words showed that the frequency of keywords was similar, and that the words "smartphones" and "China" had relatively high connection centrality. The results of network analysis and CONCOR indicated that words were formed into eight groups including related technologies, promotion, globalization, service introduction, innovation, regional society, activation, and utilization guide. The overall results of data analysis showed that the development of smart tourism cities was a noticeable issue. Conclusions: This study is meaningful in that it clearly reflects the differences in the perception of smart tourism between online and research trends despite various efforts to develop smart tourism in Korea. In addition, this study highlights the need to understand smart tourism concepts and enhance academic discussions. It is expected that such academic discussions will contribute to improving the competitiveness of smart tourism research in Korea.

Meta Analysis of Trade Insurance Using Text Mining (텍스트 마이닝을 활용한 무역보험분야의 메타분석)

  • Hyun-Hee Park;Sung-Je Cho
    • Korea Trade Review / v.45 no.6 / pp.157-179 / 2020
  • This study presents the results of a meta-analysis, using topic modeling, of papers published in the Journal of the International Trade Association, for the purpose of presenting academic research trends and future research directions in the field of trade insurance. Among the total of 2,010 papers included in the Journal of the Korea International Trade Association, the papers analyzed cover the subject of trade-related insurance. By detailed topic, there were 33 papers on marine insurance (42.31%), 16 on export insurance (20.51%), 11 on hull insurance (14.10%), and 18 others (23.08%), and 4 on other products liability insurance. According to the empirical analysis results, Topic 1 was classified as marine insurance, airworthiness, notice obligation, and collateral; Topic 2, representing export insurance, covered loading insurance, emergency risk, and immunity; Topic 3 covered vessel, sinking, and container in relation to ship insurance; and Topic 4 covered important terms such as manufacture and British marine insurance. Based on the analysis results, we selected the representative topics in trade insurance research and reviewed the status of major research. Trade insurance is an area that requires the development of more theoretical and practical research subjects as an optimal means of risk management in international trade transactions. To this end, first, support from the Korea International Trade Association is needed to establish a continuous system for sharing research subjects in the field of trade insurance. Second, academic journals must be continuously managed so that academic research papers can be submitted and published.