• Title/Summary/Keyword: topic modeling analysis

Search Result 694

Exploring Dynamics of Information Systems Research Trend Using Text Mining Approach (텍스트 마이닝 기법을 이용한 정보시스템 분야 연구 동향 분석)

  • Jungkook An;Sodam Kim;Hee-Woong Kim
    • Information Systems Review / v.18 no.3 / pp.73-96 / 2016
  • Recent research on information and communication technology and the Internet of Things indicates that convergence and integration facilitate the development of various technologies. Similarly, related academic theories and technologies have also gained attention. This paradigm shift has facilitated the convergence and integration of academic disciplines. In particular, information systems have become initiators of change. However, only a limited number of studies have been conducted on information systems. To address this gap, this study explores the future direction of information systems based on core concepts and the results of a comparative analysis of research trends. We considered 48,102 records obtained from top international journals from 1980 to 2015. We analyzed journal titles, authors, abstracts, and keywords. We conducted a network analysis of existing collaborative studies and performed a comparative analysis to visualize the results. The results provide an in-depth understanding of information systems and directions for future research in this area.

A Comparative Analysis of Social Commerce and Open Market Using User Reviews in Korean Mobile Commerce (사용자 리뷰를 통한 소셜커머스와 오픈마켓의 이용경험 비교분석)

  • Chae, Seung Hoon;Lim, Jay Ick;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.53-77 / 2015
  • Mobile commerce provides a convenient shopping experience in which users can buy products without the constraints of time and space. Mobile commerce has already set off a mega trend in Korea. The market size is estimated at approximately 15 trillion won (KRW) for 2015 thus far. In the Korean market, social commerce and the open market are key components. Social commerce overwhelms the open market in terms of the number of users in the Korean mobile commerce market. From the industry's point of view, quick market entry and content curation are considered the major success factors, reflecting the rapid growth of social commerce in the market. However, empirical academic research and analysis to explain the success of social commerce is still insufficient. Social commerce and the open market are expected to compete intensively in Korean mobile commerce. It is therefore important to conduct an empirical analysis of the differences in user experience between social commerce and the open market. This paper is an exploratory study that presents a comparative analysis of social commerce and the open market regarding user experience, based on mobile users' reviews. First, this study collected approximately 10,000 user reviews of social commerce and open market applications listed on Google Play. The collected reviews were classified into topics, such as perceived usefulness and perceived ease of use, through LDA topic modeling. Then, sentiment analysis and co-occurrence analysis were conducted on the topics of perceived usefulness and perceived ease of use. The results demonstrated that social commerce users have a more positive experience in terms of service usefulness and convenience than open market users in the mobile commerce market. 
Social commerce has provided positive user experiences to mobile users in service areas such as 'delivery,' 'coupon,' and 'discount,' while the open market has faced user complaints about technical problems and inconveniences such as 'login error,' 'view details,' and 'stoppage.' This result shows that social commerce performs well in terms of user service experience, owing to aggressive marketing campaigns and investments in building logistics infrastructure. However, the open market still has mobile optimization problems, since it has not resolved user complaints and inconveniences arising from technical problems. This study presents an exploratory research method for analyzing user experience through an empirical approach to user reviews. In contrast to previous studies, which conducted surveys to analyze user experience, this study used empirical analysis of user reviews to reflect users' vivid and actual experiences. Specifically, by using an LDA topic model and TAM, this study presents a methodology that analyzes user reviews effectively by dividing them into service areas and technical areas from a new perspective. The methodology of this study has not only shown the differences in user experience between social commerce and the open market, but also provided a deep understanding of user experience in Korean mobile commerce. In addition, the results of this study have important implications for social commerce and the open market by showing that user insights can be utilized in establishing competitive and groundbreaking strategies in the market. The limitations and research directions for follow-up studies are as follows. A follow-up study will need to design a more elaborate text analysis technique. 
This study could not fully clean the user reviews, even though online reviews inherently contain typos and mistakes. Nevertheless, this study has shown that user reviews are an invaluable source for analyzing user experience. The methodology of this study can be expected to further expand comparative research on services using user reviews. Even at this moment, users around the world are posting reviews about their service experiences after using mobile game, commerce, and messenger applications.

Using GA based Input Selection Method for Artificial Neural Network Modeling Application to Bankruptcy Prediction (유전자 알고리즘을 활용한 인공신경망 모형 최적입력변수의 선정: 부도예측 모형을 중심으로)

  • 홍승현;신경식
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.227-249 / 2003
  • Prediction of corporate failure using past financial data is a well-documented topic. Early studies of bankruptcy prediction used statistical techniques such as multiple discriminant analysis, logit, and probit. Recently, however, numerous studies have demonstrated that artificial intelligence techniques such as neural networks can be an alternative methodology for classification problems to which traditional statistical methods have long been applied. In building a neural network model, the selection of independent and dependent variables should be approached with great care and should be treated as part of the model construction process. Irrespective of the efficiency of a learning procedure in terms of convergence, generalization, and stability, the ultimate performance of the estimator will depend on the relevance of the selected input variables and the quality of the data used. Approaches developed in statistical methods, such as correlation analysis and stepwise selection, are often very useful. These methods, however, may not be optimal for the development of a neural network model. In this paper, we propose a genetic algorithm approach to find optimal or near-optimal input variables for neural network modeling. The proposed approach is demonstrated by applications to bankruptcy prediction modeling. Our experimental results show that this approach significantly increases the overall classification accuracy rate.

Estimation of channel morphology using RGB orthomosaic images from drone - focusing on the Naesung stream - (드론 RGB 정사영상 기반 하도 지형 공간 추정 방법 - 내성천 중심으로 -)

  • Woo-Chul, KANG;Kyng-Su, LEE;Eun-Kyung, JANG
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.4 / pp.136-150 / 2022
  • In this study, a comparative review was conducted on how to use RGB images to obtain river topographic information, which is one of the most essential types of data for eco-friendly river management and flood level analysis. Within the river zone, obtaining topographic information for the flow section is one of the more difficult tasks; therefore, this study focused on estimating the river topographic information of the flow section from RGB images. For this study, river topography was surveyed directly using ADCP and RTK-GPS, and at the same time, an orthomosaic image was created from high-resolution images obtained by drone photography. Then, existing regression equations were applied to the results of the ADCP channel topography survey and the band values of the RGB images, and the channel bathymetry in the study area was estimated using the regression equation that showed the best predictability. In addition, CCHE2D flow modeling was performed for comparative verification of the topographic information. The modeling result with the image-based topographic information provided better water depth and current velocity simulation results than the directly measured topographic information, for which measurement of the sub-section was not performed. It is concluded that river topographic information can be obtained from RGB images and that, with additional research, this approach could serve as an efficient method of obtaining river topographic information for river management.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining-related studies have focused on applications of the second step, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve the quality of analysis results by preserving the meaning of words and documents in the process of representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text requires a structuring task that transforms the original document into a form that the computer can understand before analysis. Mapping arbitrary objects into a space of a specific dimension while maintaining algebraic properties, in order to structure text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the whole corpus included in the document. 
This leads to the limitation that the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document into a single corresponding vector. Therefore, it is difficult to accurately represent a complex document with multiple subjects as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. In the case of a document without keywords, this method can be applied after extracting keywords through various analysis methods. However, since this is not the core subject of the proposed method, we introduce the process of applying the proposed method to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. After that, to overcome the limitation of the traditional document embedding method, which is affected not only by core words but also by miscellaneous words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Next, clustering is conducted on the set of keywords for each document to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single vector-based approach cannot properly map complex documents because of interference among subjects in each vector. 
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
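
Steps (3) to (5) of the method described above can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used or how the keyword vectors in a cluster are combined, so this sketch assumes a simple k-means and a per-cluster mean. The function names, the 2-dimensional toy vectors, and the choice of k are all hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Tiny k-means (assumed here; the abstract names no clustering algorithm).

    Centroids start at the first k vectors so the result is deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster received no vectors.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters


def multi_vector_embedding(keyword_vectors, k):
    """Return one vector per keyword cluster (mean aggregation is an assumption)."""
    clusters = kmeans(list(keyword_vectors.values()), k)
    return [mean_vector(c) for c in clusters if c]


# Hypothetical keyword vectors for one document covering two subjects.
toy_keyword_vectors = {
    "topic model": [0.9, 0.1],
    "bankruptcy": [0.0, 1.0],
    "clustering": [1.0, 0.0],
    "credit risk": [0.1, 0.9],
}
document_vectors = multi_vector_embedding(toy_keyword_vectors, k=2)
print(document_vectors)
```

In this toy case the document is represented by two vectors, one per subject, instead of a single vector that averages both subjects together.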

Analysis on Dynamics of Korea Startup Ecosystems Based on Topic Modeling (토픽 모델링을 활용한 한국의 창업생태계 트렌드 변화 분석)

  • Heeyoung Son;Myungjong Lee;Youngjo Byun
    • Knowledge Management Research / v.23 no.4 / pp.315-338 / 2022
  • In 1986, Korea established legal systems to support small and medium-sized start-ups, which have become the main pillars of national development. The legal systems have stimulated start-up ecosystems to have more than 1 million new start-up companies founded every year during the past 30 years. To analyze the trend of Korea's start-up ecosystem, in this study, we collected 1.18 million news articles from 1991 to 2020. Then, we extracted news articles that have the keywords "start-up", "venture", and "start-up". We employed network analysis and topic modeling to analyze the collected news articles. Our analysis can contribute to analyzing the government policy direction shown in the history of start-up support policy. Specifically, our analysis identifies the dynamic characteristics of government influenced by external environmental factors (e.g., society, economy, and culture). The results of our analysis suggest that the start-up ecosystems in Korea have changed and developed mainly through government policies for corporate governance, industrial development planning, deregulation, and economic prosperity planning. Our keyword frequency analysis contributes to understanding entrepreneurial productivity attributed to activities among the networked components in industrial ecosystems. Our analyses and results provide practitioners and researchers with practical and academic implications that can help to establish dedicated support policies by forecasting the economic environment surrounding start-ups. Korean entrepreneurial productivity has been empowered by growing numbers of large companies in the mobile phone industry. The spectrum of large companies incorporates content start-ups, platform providers, online shopping malls, and youth-oriented start-ups. 
In addition, economic situational factors, which are related to the global expansion of the mobile industry and government efforts to foster start-ups, contribute to the growth of Korean entrepreneurial productivity. Our research also has methodological implications. We apply natural language processing to 30 years of media articles, which enables more rigorous analysis than existing studies that only observe changes in government and policy in a qualitative manner.

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and social network services (SNS) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. 
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.
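
The idea of linking issues between two consecutive periods by their similarity can be sketched as follows. The abstract does not state which similarity measure or cutoff is used, so this sketch assumes Jaccard similarity over issue keyword sets and an arbitrary threshold of 0.3. The issue labels and keywords are made-up toy data.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard similarity of two keyword collections (assumed measure)."""
    a, b = set(keywords_a), set(keywords_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def issue_flow_edges(period_issues, threshold=0.3):
    """Link issues in period t to issues in period t+1 when similar enough.

    period_issues: list of dicts, one per period, mapping an issue label
    to its keyword list. The threshold is a hypothetical cutoff.
    """
    edges = []
    for t in range(len(period_issues) - 1):
        for label_a, kw_a in period_issues[t].items():
            for label_b, kw_b in period_issues[t + 1].items():
                score = jaccard(kw_a, kw_b)
                if score >= threshold:
                    edges.append((t, label_a, label_b, round(score, 2)))
    return edges


# Toy issues for two consecutive periods (hypothetical data).
toy_periods = [
    {
        "A1": ["north korea", "nuclear test", "sanction"],
        "A2": ["election", "candidate", "poll"],
    },
    {
        "B1": ["north korea", "sanction", "united nations"],
        "B2": ["separated families", "reunion", "north korea"],
    },
]
edges = issue_flow_edges(toy_periods)
print(edges)
```

In such a diagram, an issue with no outgoing edge (A2 here) appears in one period and is gone in the next, and an issue with no incoming edge (B2) is newly created.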

Comparative Analysis of Consumer Needs for Products, Service, and Integrated Product Service : Focusing on Amazon Online Reviews (제품, 서비스, 융합제품서비스의 소비자 니즈 비교 분석 :아마존 온라인 리뷰를 중심으로)

  • Kim, Sungbum
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.316-330 / 2020
  • The study uses text mining to analyze reviews of hardware products, customer service products, and products that take the form of a convergence of hardware and cloud services in ICT. We derive the keywords of each review and identify the differences in the words used to derive topics. A cluster analysis is performed to categorize reviews into their respective clusters. Through this study, we observed which keywords are most often used for each product type and found topics that express the characteristics of products and services using topic modeling. In the reviews of service products, we derived keywords such as "professional" and "technician", which suggest the excellence of the service provider. Further, we identified adjectives with positive connotations such as "favorite", "fine", "fun", "nice", "smart", "unlimited", and "useful" in reviews of Amazon Echo, an integrated product and service. In the cluster analysis, all reviews were clustered into three groups, and the reviews of each of the three product types fell exclusively into a different cluster. The study analyzed how consumer needs are expressed differently in reviews depending on the type of product and suggested that, in practice, product planning and marketing promotion need to be differentiated according to the product type.

Detecting Spam Data for Securing the Reliability of Text Analysis (텍스트 분석의 신뢰성 확보를 위한 스팸 데이터 식별 방안)

  • Hyun, Yoonjin;Kim, Namgyu
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.493-504 / 2017
  • Recently, the tremendous amounts of unstructured text data distributed through news, blogs, and social media have gained much attention from researchers and practitioners because these data contain abundant information about various consumers' opinions. However, as the usefulness of text data increases, attempts to gain profits by distorting text data, maliciously or non-maliciously, are also increasing. This increase in spam text data not only burdens users who want to obtain useful information with a large amount of inappropriate information, but also damages the reliability of information and information providers. Therefore, efforts must be made to improve the reliability of information and the quality of analysis results by detecting and removing spam data in advance. For this purpose, many studies on spam detection have been actively conducted in areas such as opinion spam detection, spam e-mail detection, and web spam detection. In this study, we introduce the core concepts and current research trends of spam detection and propose a methodology to detect the spam tags of a blog as one of the challenging attempts to improve the reliability of blog information.

A Technology Landscape of Artificial Intelligence: Technological Structure and Firms' Competitive Advantages (인공지능 기술 랜드스케이프 : 기술 구조와 기업별 경쟁우위)

  • Lee, Wangjae;Lee, Hakyeon
    • Journal of Korea Technology Innovation Society / v.22 no.3 / pp.340-361 / 2019
  • This study analyzes the technological structure of artificial intelligence (AI) and the technological capabilities of AI companies based on patent information. A total of 2,589 AI patents registered with the USPTO from 2007 to 2017 were collected and analyzed using Latent Dirichlet Allocation (LDA) to derive 20 AI technology topics. Analysis of development trends by AI technology reveals that visual understanding, data analysis, motion control, and machine learning are growing, while language understanding and speech technology are sluggish. In addition, we investigated the leading companies in each sub-field of AI as well as the core competencies of global IT companies. The findings of this study are expected to be useful for the formulation and implementation of the technology strategies of AI companies.