• Title/Summary/Keyword: Keywords Extraction

Search Result 139

Strategies for the Development of Watermelon Industry Using Unstructured Big Data Analysis

  • LEE, Seung-In;SON, Chansoo;SHIM, Joonyong;LEE, Hyerim;LEE, Hye-Jin;CHO, Yongbeen
    • The Journal of Industrial Distribution & Business / v.12 no.1 / pp.47-62 / 2021
  • Purpose: The purpose of this study was to examine strategies for developing the watermelon industry using unstructured big data analysis. Specifically, the study used big data and social network analysis to trace changes in issues and in consumers' perceptions of watermelon, and on that basis to explore ways to strengthen the competitiveness of the watermelon industry. Methodology: Data were collected from Naver (blogs, news) and Daum (blogs, news) with TEXTOM 4.5, and the analysis was divided into three periods (2015-2016, 2017-2018, and 2019-2020) to capture changes in issues and consumers' perceptions of watermelon and the watermelon industry. TEXTOM 4.5 was used for keyword frequency analysis, word cloud analysis, and extraction of metrics data. UCINET 6.0 and its NetDraw function were used to identify the connection structure of words, visualize the network relations, and cluster the words. Results: The main keywords extracted were 'the stalk end of a watermelon', 'E-mart', 'Haman', 'Gochang', and 'Lotte Mart' (news: 2015-2016); 'apple watermelon', 'Haman', 'E-mart', 'Gochang', and 'Mudeungsan watermelon' (news: 2017-2018); 'E-mart', 'apple watermelon', 'household', 'chobok' (the first of the midsummer 'dog days'), and 'donation' (news: 2019-2020); 'watermelon salad', 'taste', 'the heat', 'baby', and 'effect' (blog: 2015-2016); 'taste', 'watermelon juice', 'method', 'watermelon salad', and 'baby' (blog: 2017-2018); and 'taste', 'effect', 'watermelon juice', 'method', and 'apple watermelon' (blog: 2019-2020). The results of the frequency and TF-IDF analyses are presented, and the CONCOR analysis grouped the keywords of each dataset into four clusters. Conclusions: Based on the results, the authors discuss strategies and policies for boosting the watermelon industry, the limitations of this study, and directions for future research.
The results of this study will help prioritize strategies and policies for boosting watermelon consumption and contribute to improving the competitiveness of the watermelon industry in Korea. The study is also expected to serve as an important basis for future agricultural big data studies and to offer watermelon producers and policy-makers practical guidance for crafting tailor-made marketing strategies.
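The abstract reports frequency and TF-IDF rankings computed in TEXTOM 4.5 but does not give the weighting formula. As a rough illustration only, the sketch below ranks keywords per document with one common TF-IDF variant (raw term count multiplied by log(N / df)); TEXTOM's exact formula may differ, and the tokenized posts are hypothetical, not the study's Naver/Daum data.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each tokenized document as raw count * log(N / df).

    One common TF-IDF variant; the tool used in the study may weight differently.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


def top_keywords(doc_scores: dict[str, float], k: int = 3) -> list[str]:
    # Highest score first; ties broken alphabetically so the output is stable.
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical tokenized blog posts (not data from the study).
docs = [
    ["watermelon", "salad", "taste", "salad"],
    ["watermelon", "juice", "taste", "method"],
    ["watermelon", "apple", "apple", "effect"],
]
scores = tf_idf(docs)
for i, s in enumerate(scores):
    print(i, top_keywords(s))
```

Because 'watermelon' appears in every document, its IDF is log(1) = 0, which is why TF-IDF can surface period-specific terms that a plain frequency ranking would bury under the search keyword itself.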

A study on the scope of future oriented work of dental hygienists (치과위생사의 미래지향적 업무 범위에 대한 고찰)

  • Ahn, Eunsuk;Kim, Sun-Mi;Kim, Bo-Ra;Jeong, Soon-Jeong;Hwang, Soo-Jeong;Han, Ji-Hyoung
    • Journal of Korean Academy of Dental Administration / v.8 no.1 / pp.15-23 / 2020
  • The aim of this study is to identify future-oriented tasks for Korean dental hygienists through a literature review. A literature search was performed using 14 keywords related to tasks carried out by dental hygienists and covered articles published from 2000 to 2019 in the KISS, RISS, DBpia, NDSL, Papersearch, PubMed, and Google Scholar databases. Six reviewers assessed the titles and abstracts, and an article was selected if it was considered to cover future-oriented tasks for Korean dental hygienists. As a result, six domestic and two foreign studies were selected for review and data extraction. In total, 38 tasks were classified as future-oriented tasks of dental hygienists according to the following criteria: 1) tasks that were explicitly referred to as future-oriented, and 2) tasks that could be classified as future-oriented although no explicit reference was made. Of these, the most frequently mentioned tasks were measuring periodontal pocket depth, dental hygiene assessment, providing dietary advice, infiltration anesthesia, and root planing, which appeared in five of the eight studies, including both domestic and foreign studies. Dental hygiene planning, emergency, emergency management, and smoking cessation were the next most common, appearing in four studies. Although some of these future-oriented tasks are included in the dentistry curriculum and are currently performed in clinical practice by dental hygienists, their place within the legal scope of practice is unclear. The scope of dental hygienists' tasks should be reconsidered to reflect changes in domestic and foreign dental care delivery, thereby contributing to the oral health promotion of the public with safety guaranteed under legal protection.

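A Study on Follow-up Survey Methodology to Verify the Effectiveness of the <Insaeng Nanum Gyosil> (Life-Sharing Classroom) Project: Focusing on the 2017-2018 Video Tracking Survey (<인생나눔교실> 사업의 효과 검증을 위한 추적 조사 방법론 연구 - 2017~2018년도 영상추적조사를 중심으로 -)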

  • Lee, Dong Eun
    • Korean Association of Arts Management / no.53 / pp.207-247 / 2020
  • <Insaeng Nanum Gyosil> (Life-Sharing Classroom) is a project in which members of the senior generation with humanistic knowledge become mentors and, drawing on their diverse life experiences, communicate with mentees of the new generation to share wisdom and direction in life. The project has been expanding since 2015, following a pilot operation in 2014. Projects of this kind are generally evaluated by establishing effectiveness indicators, both to verify their effectiveness and to set project management and development strategies. However, most evaluations have been conducted, quantitatively and qualitatively, only over the short duration of the project. For continuous projects such as <Insaeng Nanum Gyosil>, especially in the field of culture and arts, where long-term verification of effectiveness is required, short-term evaluation makes it difficult to predict and judge the actual meaningful effects. In this regard, this study examined qualitative changes in the project's key participants through the 2017 and 2018 video tracking surveys. To this end, a qualitative research methodology consisting of interview video recording, field recording, and value coding was adopted as suited to the research subject. For the analysis, the interview videos were first transcribed and keywords were extracted; the value codes were then matched with human psychological values, and a theoretical method was used to identify changes and derive their meaning. Although this study was a follow-up survey, it is limited in that it analyzed changes over a relatively short period of two years. Nevertheless, it is significant in that it systematized the specific methodology researchers should follow in follow-up studies and presented a research flow at a time when there is hardly any model for follow-up research in culture and arts education projects, in Korea or abroad.
It is also highly significant in providing a detailed framework and a case of comparative analysis methodology through value coding.

Methodological Quality Evaluation of a Meta-Analysis Study of Rehabilitation Treatment Interventions for Stroke Patients in Korea Applying AMSTAR-2: Focusing on Upper Extremity Function and Recovery of Daily Life (AMSTAR-2를 적용한 국내 뇌졸중 환자의 재활치료 중재 메타분석 연구의 방법론적 질 평가: 상지기능과 일상생활회복을 중심으로)

  • Hwang, Ho-Sung;Ham, Min-Joo
    • The Journal of the Korea Contents Association / v.22 no.8 / pp.660-670 / 2022
  • This study applied AMSTAR-2, a methodological quality assessment tool, to evaluate the quality of Korean meta-analyses of rehabilitation interventions for stroke patients. Its purpose is to provide guidelines for improving evidence-based practice and meta-analysis research by analyzing the quality of the included studies. The literature search was conducted in the Research Information Sharing Service, the Korean Medical Database, and the Korean Studies Information Service System. Two authors searched, extracted, and reviewed the literature using the keywords 'stroke' and 'meta-analysis'. Of the 18 studies finally included in the AMSTAR-2 quality evaluation, 3 (16.67%) were rated 'Moderate', 8 (44.44%) 'Low', and 7 (38.89%) 'Critically Low'. Future meta-analyses should follow scientific and objective processes for study selection and data extraction. The analysis presented in this study is expected to support continued interest in and efforts toward improving the quality of meta-analysis research.
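AMSTAR-2 does not sum item scores; the overall confidence rating follows from how many weaknesses fall in critical versus non-critical items. The sketch below encodes that rating rule as published in the AMSTAR-2 guidance, assuming its default critical items (2, 4, 7, 9, 11, 13, 15), which appraisers may adapt; the function names are illustrative, and the abstract does not report the item-level judgments for the 18 studies. The last lines check the reported percentages.

```python
# Default critical items in the AMSTAR-2 guidance; appraisers may adapt this set.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})


def overall_confidence(weak_items: set[int],
                       critical_items: frozenset[int] = CRITICAL_ITEMS) -> str:
    """Map the AMSTAR-2 items (1-16) judged as weaknesses to an overall rating."""
    if any(not 1 <= item <= 16 for item in weak_items):
        raise ValueError("AMSTAR-2 items are numbered 1-16")
    critical_flaws = len(weak_items & critical_items)
    noncritical_weaknesses = len(weak_items - critical_items)
    if critical_flaws > 1:
        return "Critically Low"
    if critical_flaws == 1:
        return "Low"  # with or without non-critical weaknesses
    if noncritical_weaknesses > 1:
        return "Moderate"
    return "High"  # no or one non-critical weakness


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of studies per rating, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


print(overall_confidence({1, 3}))  # → Moderate
print(percentages({"Moderate": 3, "Low": 8, "Critically Low": 7}))
# → {'Moderate': 16.67, 'Low': 44.44, 'Critically Low': 38.89}
```

The guidance also leaves room for judgment (for example, many non-critical weaknesses may justify lowering a rating), so a rule like this is a starting point rather than a substitute for the appraisers' decisions.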

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the respiratory disease caused by severe acute respiratory syndrome coronavirus 2, SARS-CoV-2) were published. This rapid increase in COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, therefore, we propose a method for extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the keywords of interest were extracted from the COVID-19 papers, and their detailed topics were identified. We used the CORD-19 dataset on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research consists of two main parts. First, 41,062 articles were obtained by filtering and pre-processing the abstracts of 47,110 academic papers that include full text. For this purpose, the number of COVID-19 publications per year was analyzed through exploratory data analysis in Python, and the 10 most active journals were identified. The LDA and Word2vec algorithms were used to derive COVID-19 research topics, and similarity was measured after analyzing related words. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of collected papers, detailed topics were analyzed using LDA and Word2vec, and a clustering method based on PCA dimension reduction was applied, with the t-SNE algorithm used to visualize groups of papers with similar themes. A noteworthy finding is that topics not derived when modeling all COVID-19 papers together were derived when topic modeling was performed separately for each research topic. For example, topic modeling of the 'vaccine' papers yielded a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody protects cells from infection when a virus enters the body and is reported to play an important role in the development of therapeutic agents and vaccines. Likewise, topic extraction from the 'treatment' papers revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, instead of defending against the infection, attack normal cells. Hidden topics that could not be found across the entire corpus were thus classified by keyword, and topic modeling was performed to find detailed topics. In this study, we proposed a method that extracts topics from a large body of literature with the LDA algorithm and extracts similar words with Skip-gram, the Word2vec variant that predicts the surrounding context words from a central word. Combining LDA with Word2vec was intended to improve performance by capturing both the relationship between documents and LDA topics and the relationships among words learned by Word2vec. In addition, as a clustering method based on PCA dimension reduction, we presented a way to classify documents intuitively by using t-SNE to group documents with similar themes into a structured organization. At a time when the efforts of researchers working to overcome COVID-19 cannot keep pace with the rapid publication of related papers, we hope this method will save healthcare professionals and policy makers valuable time and effort and help them gain new insights rapidly. It is also expected to serve as basic data for researchers exploring new research directions.
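The abstract pairs LDA topics with Skip-gram word vectors and measures similarity between words without naming the measure. The sketch below shows only the data-preparation step that defines Skip-gram (center-to-context training pairs) and cosine similarity, which is assumed here to be the similarity measure; training the embeddings themselves would normally be left to a library, and the tokens and window size are hypothetical.

```python
import math


def skipgram_pairs(tokens: list[str], window: int = 2) -> list[tuple[str, str]]:
    """Return (center, context) pairs; Skip-gram learns to predict context from center.

    Uses a fixed window. The original word2vec tool also samples a smaller effective
    window per position and subsamples frequent words; both are omitted here.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # every neighbor within the window, but not the word itself
                pairs.append((center, tokens[j]))
    return pairs


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, commonly used to compare learned word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


tokens = ["neutralizing", "antibodies", "block", "viral", "entry"]
print(skipgram_pairs(tokens, window=1)[:3])
# → [('neutralizing', 'antibodies'), ('antibodies', 'neutralizing'), ('antibodies', 'block')]
```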

Text Mining-Based Emerging Trend Analysis for the Aviation Industry (항공산업 미래유망분야 선정을 위한 텍스트 마이닝 기반의 트렌드 분석)

  • Kim, Hyun-Jung;Jo, Nam-Ok;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.65-82 / 2015
  • Recently, there has been a surge of interest in finding core issues and analyzing emerging trends for the future. This reflects efforts to devise national strategies and policies based on the selection of promising areas that can create economic and social added value. Existing studies, including those dedicated to discovering promising future fields, have mostly depended on qualitative research methods such as literature review and expert judgment, and deriving results from large amounts of information with this approach is both costly and time-consuming. In various areas of academic research, efforts have been made to compensate for the weaknesses of the conventional qualitative approach to selecting key promising areas through the discovery of future core issues and emerging trend analysis. What is needed is a paradigm shift toward using qualitative research methods together with quantitative methods such as text mining in a mutually complementary manner, so as to obtain objective and practical emerging trend analysis results from large amounts of data. However, even such studies have had shortcomings because they depend on simple keywords for analysis, which makes it difficult to derive meaning from the data. Moreover, although such recent studies have been carried out in areas such as the steel industry, the information and communications technology industry, and the construction industry, no study has so far identified core issues and analyzed emerging trends in specialized domains such as the aviation industry. This study therefore focused on retrieving aviation-related core issues and emerging trends from aviation research papers through text mining, one of the big data analysis techniques.
In this way, promising future areas for the air transport industry are selected based on objective data from aviation-related research papers. To compensate for the difficulty of grasping the meaning of single words in keyword-level emerging trend analysis, this study adopts topic analysis, a technique for finding the general themes latent in a set of text documents. The extracted topics, each represented by a set of keywords, are used to discover core issues and conduct emerging trend analysis. Based on these issues, the study identified aviation-related research trends and selected promising areas for the future. Research on core issue retrieval and emerging trend analysis for the aviation industry based on big data analysis is still at an early stage, so the analysis in this study is restricted to aviation-related research papers. Nevertheless, the study is significant in that it provides a quantitative analysis model for continuously monitoring the derived core issues and suggesting directions for areas with good future prospects. In the future, the scope will be expanded to cover relevant domestic and international news articles and bidding information, thereby increasing the reliability of the analysis results. Core issues for the aviation industry will be determined from the topic analysis results, and emerging trend analysis will then be performed on these issues by year to identify how they change over time. Through these procedures, this study aims to establish a system for identifying key promising areas for the future aviation industry and for ensuring rapid response. Additionally, the promising areas selected on the basis of these results and of an analysis of relevant policy research reports will be compared with the areas in which actual government investments are made.
The results of this comparative analysis are expected to serve as useful reference material for future policy development and budgeting.
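The abstract describes year-by-year emerging trend analysis of extracted topics but does not specify the trend measure. One common heuristic, assumed here and not necessarily the authors' method, is to average each topic's per-document proportion (for example, from LDA) by year and fit a least-squares slope: a positive slope marks a rising topic. The function names and the two-topic data are hypothetical.

```python
from collections import defaultdict


def yearly_topic_share(doc_years: list[int],
                       doc_topics: list[list[float]]) -> dict[int, list[float]]:
    """Average each topic's proportion over the documents published in each year."""
    n_topics = len(doc_topics[0])
    sums: dict[int, list[float]] = defaultdict(lambda: [0.0] * n_topics)
    counts: dict[int, int] = defaultdict(int)
    for year, theta in zip(doc_years, doc_topics):
        for k, p in enumerate(theta):
            sums[year][k] += p
        counts[year] += 1
    # Years are returned in chronological order.
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(doc_years: list[int], doc_topics: list[list[float]]) -> list[float]:
    """Slope of each topic's yearly mean share; > 0 suggests an emerging topic."""
    share = yearly_topic_share(doc_years, doc_topics)
    years = list(share)
    n_topics = len(doc_topics[0])
    return [ols_slope(years, [share[y][k] for y in years]) for k in range(n_topics)]


# Hypothetical document-topic proportions for two topics (not the study's data).
years = [2010, 2010, 2011, 2012, 2012]
thetas = [[0.8, 0.2], [0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.1, 0.9]]
print(topic_trends(years, thetas))  # topic 0 falling, topic 1 rising
```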

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.179-200 / 2019
  • Word-of-mouth (WoM) is defined as consumer activity that shares information about consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. With the development of the Internet, the ways in which people exchange information through online news and online communities have expanded, and WoM has diversified into forms such as word of mouth, scores, ratings, and likes. Social media gives online users easy access to information, and online WoM is considered a key source of information. Although many studies on WoM have followed this phenomenon, no meta-analysis has comprehensively analyzed them. This study proposed a method that applies text mining techniques to extract major studies and grasp the main issues of WoM research using scholarly big data. To this end, a total of 4,389 documents published from 1941 to 2018 were collected from Scopus (www.scopus.com), a citation database, using the keyword 'Word-of-mouth', and the data were refined through preprocessing such as English morphological analysis, stopword removal, and noun extraction. We adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key studies, tracks the development trajectory of an academic field, and presents research trends from a macro perspective. For this, we constructed a citation network from the collected data, in which each node represents a document and each link a citation relation. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, a main path consisting of 30 documents was extracted from the citation network.
The main path revealed how the academic field developed over time, reflecting industrial changes across various industry groups. The MPA results showed that WoM research could be divided into five periods: (1) establishment of aspects and critical elements of WoM, (2) analysis of relationships between WoM variables, (3) the beginning of research on online WoM, (4) analysis of the relationship between WoM and purchase, and (5) broadening of topics. Changes within industry, such as online development and social media, were reflected in these results. The most recent studies showed that WoM-related topics and approaches were diversifying in response to changing circumstances. Nevertheless, even though WoM has been studied in diverse fields, the mainstream of WoM research from beginning to end has concerned marketing and identifying the factors that drive the spread of WoM. Word co-occurrence network analysis was applied to present research trends from a microscopic point of view: a word co-occurrence network was constructed to analyze the relationships between keywords, and social network analysis (SNA) was applied to it. We divided the data into three periods to investigate periodic changes and trends in the discussion of WoM. SNA showed that Period 1 (1941-2008) consisted of clusters regarding relationship, source, and consumers; Period 2 (2009-2013) contained clusters of satisfaction, community, social networks, review, and internet; and the clusters of Period 3 (2014-2018) involved satisfaction, medium, review, and interview. The periodic changes in clusters showed a transition from offline to online WoM, and WoM media have become an important factor in spreading the word. This study conducted a quantitative meta-analysis of WoM based on scholarly big data.
The main contribution of this study is that it provides both micro and macro perspectives on WoM research trends. A limitation is that the citation network constructed for MPA is based only on direct citation relations among the collected documents.
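The abstract applies SPC weights and a key-route search to a direct-citation network without spelling out the computation. The stdlib sketch below uses the standard SPC definition (an arc's weight is the number of source-to-sink paths passing through it, computed as N-(u) * N+(v)) and a simplified greedy key-route search that starts from the single highest-SPC arc. Arcs are assumed to point from the cited to the citing document, ties are broken by input order, and the toy network is hypothetical; dedicated tools such as Pajek offer fuller variants (for example, several key routes).

```python
from collections import defaultdict, deque


def topological_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm; a citation network should be acyclic."""
    indegree = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("citation network contains a cycle")
    return order


def spc_weights(edges: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Search Path Count: number of source-to-sink paths through each arc."""
    nodes = sorted({n for arc in edges for n in arc})
    pred, succ = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order = topological_order(nodes, edges)
    n_minus = {}  # paths from any source to n (a source counts as 1)
    for n in order:
        n_minus[n] = sum(n_minus[p] for p in pred[n]) or 1
    n_plus = {}  # paths from n to any sink (a sink counts as 1)
    for n in reversed(order):
        n_plus[n] = sum(n_plus[s] for s in succ[n]) or 1
    return {(u, v): n_minus[u] * n_plus[v] for u, v in edges}


def key_route_main_path(edges: list[tuple[str, str]]):
    """Greedy key-route search from the highest-SPC arc, extended both ways."""
    spc = spc_weights(edges)
    pred, succ = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    path = list(max(spc, key=spc.get))  # the key route
    while succ[path[-1]]:  # extend forward until a sink
        tail = path[-1]
        path.append(max(succ[tail], key=lambda w: spc[(tail, w)]))
    while pred[path[0]]:  # extend backward until a source
        head = path[0]
        path.insert(0, max(pred[head], key=lambda w: spc[(w, head)]))
    return path, spc


# Hypothetical direct-citation network (arcs from cited to citing paper).
edges = [("d1", "d3"), ("d2", "d3"), ("d3", "d4"), ("d3", "d5"),
         ("d4", "d6"), ("d5", "d6"), ("d2", "d5")]
path, spc = key_route_main_path(edges)
print(path)  # → ['d1', 'd3', 'd5', 'd6']
```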

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, with remarkable results in various fields such as classification, summarization, and generation. Among text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification, which assigns one label out of two classes; multi-class classification, which assigns one label out of several classes; and multi-label classification, which assigns multiple labels out of several classes. Because it assigns multiple labels, multi-label classification in particular requires a training method different from those of binary and multi-class classification. Moreover, as the numbers of labels and classes grow, the number of labels to be predicted increases, making prediction harder and performance improvement difficult. To overcome these limitations, research is actively being conducted on label embedding, which (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed labels, and (iii) restores the predicted labels to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress the labels by random transformation, they have difficulty capturing non-linear relationships between labels and thus cannot create a latent label space that sufficiently preserves the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding, most representatively label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding suffers a large information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that occurs during backpropagation. Skip connections were devised to solve this problem: by adding a layer's input to its output, they prevent gradient loss during backpropagation and allow efficient learning even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies applying them to autoencoders or to the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. We applied the proposed methodology to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from a paper's abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, multi-label classification based on the proposed methodology performed far better than traditional multi-label classification methods on all of the performance indicators used: accuracy, precision, recall, and F1 score.
This suggests that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance across domain characteristics and numbers of latent label space dimensions.
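The abstract evaluates multi-label classification with accuracy, precision, recall, and F1 but does not say how these are averaged over label sets. One common choice, assumed in the sketch below, is example-based averaging: each metric is computed per paper from its true and predicted keyword sets, then averaged over papers. Micro- or macro-averaged variants would give different numbers, and the keyword sets here are hypothetical.

```python
def example_based_metrics(y_true: list[set[str]],
                          y_pred: list[set[str]]) -> dict[str, float]:
    """Example-based multi-label metrics averaged over samples.

    For true set Y and predicted set Z of one sample, with inter = |Y & Z|:
      accuracy = inter / |Y | Z|    precision = inter / |Z|
      recall   = inter / |Y|        f1        = 2 * inter / (|Y| + |Z|)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally long, non-empty lists of label sets")
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for true, pred in zip(y_true, y_pred):
        if not true and not pred:
            # Both empty: count the sample as a perfect match.
            for key in totals:
                totals[key] += 1.0
            continue
        inter = len(true & pred)
        totals["accuracy"] += inter / len(true | pred)
        totals["precision"] += inter / len(pred) if pred else 0.0
        totals["recall"] += inter / len(true) if true else 0.0
        totals["f1"] += 2 * inter / (len(true) + len(pred))
    n = len(y_true)
    return {key: value / n for key, value in totals.items()}


# Hypothetical keyword labels for two papers.
y_true = [{"deep learning", "autoencoder"}, {"text mining"}]
y_pred = [{"deep learning"}, {"text mining", "topic model"}]
print(example_based_metrics(y_true, y_pred))
```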

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, thereby greatly influencing society. This is an unprecedented phenomenon, and we now live in the Age of Big Data. SNS data satisfy the defining conditions of Big Data: the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). Trends of issues discovered in SNS Big Data can be an important new source for creating value because this information covers society as a whole. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need to analyze SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the set of topic keywords corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) present the importance of a topic through a treemap based on the score system and frequency; and (4) visualize the daily time-series graph of a keyword searched by the user. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop word removal and noun extraction, to process the many unrefined forms of unstructured data. In addition, such analysis requires the latest big data technologies for rapidly processing large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale up from a single node to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no fixed schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS), and confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
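Function (4) plots a searched keyword's daily time series. The aggregation behind such a chart might look like the in-memory sketch below, which assumes tweets arrive as (ISO timestamp, text) pairs and uses whitespace tokenization as a stand-in for the Korean noun extraction the system actually needs; it does not model the Hadoop or MongoDB layers, and the function name and tweets are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_keyword_series(tweets: list[tuple[str, str]], keyword: str,
                         start: date, end: date) -> list[tuple[date, int]]:
    """Count tweets per day that contain `keyword`, reporting 0 for days without any."""
    counts = Counter()
    for timestamp, text in tweets:
        day = datetime.fromisoformat(timestamp).date()
        # Whitespace tokens stand in for the morphological analysis Korean text requires.
        if keyword in text.split() and start <= day <= end:
            counts[day] += 1
    series = []
    day = start
    while day <= end:  # emit every day in the range so the graph has no gaps
        series.append((day, counts[day]))
        day += timedelta(days=1)
    return series


# Hypothetical tweets (timestamp, text); not taken from the study's corpus.
tweets = [
    ("2013-03-01T09:00:00", "election debate tonight"),
    ("2013-03-01T21:30:00", "debate recap"),
    ("2013-03-02T12:00:00", "weather update"),
    ("2013-03-03T08:00:00", "debate again"),
]
for day, n in daily_keyword_series(tweets, "debate", date(2013, 3, 1), date(2013, 3, 3)):
    print(day.isoformat(), n)
```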


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.