• Title/Summary/Keyword: Text Mining Analysis


A Study on Image Recognition of Local Currency Consumers Using Big Data (빅데이터를 활용한 지역화폐 소비자 이미지 인식에 관한 연구)

  • Kim, Myung-hee;Ryu, Ki-hwan
    • The Journal of the Convergence on Culture Technology / v.8 no.4 / pp.11-17 / 2022
  • Currently, the income and funds of local economies are flowing out to the metropolitan area, and talented people, the driving force of regional development, are also gathering there, leaving local economies in a serious crisis. Local currency is issued by local governments and serves an auxiliary, complementary function, being usable only within the issuing region. As local governments have focused on introducing local currency to revitalize local economies, studies on its issuance and use are continuously being conducted. In this study, big data collected from sources such as portals and SNS was analyzed to identify consumers' image of the local currencies issued by local governments, with the aim of presenting implications for the issuance and operation of local currency based on the results. The results of this study are as follows. First, inducing local consumption through the policy-driven issuance of local currency increases the economic income of the region. Second, local governments are making efforts to revitalize the economy and establish a virtuous cycle in the local economy by issuing and distributing local currency. Third, the introduction of blockchain technology enables the stable operation of local currency. As its academic significance, this study used big data analysis to capture the changing appearance and effects of local currency as well as related policy directions.

Analysis of Municipal Ordinances for Smart Cities of Municipal Governments: Using Topic Modeling (지방자치단체의 스마트시티 조례 분석: 토픽모델링을 활용하여)

  • Hyungjun Seo
    • Informatization Policy / v.30 no.1 / pp.41-66 / 2023
  • This study aims to reveal the direction of municipal ordinances for smart cities, focusing on 74 municipal ordinances from 72 municipal governments through topic modeling. The most frequent keywords concern the establishment and operation of the Smart City Committee. Latent Dirichlet Allocation (LDA) topic modeling classifies the municipal ordinances into eight topics: Topic 1 (security for the smart city process), Topic 2 (promotion of the smart city industry), Topic 3 (composition of a smart city consultative body for local residents), Topic 4 (support system for smart cities), Topic 5 (management of personal information), Topic 6 (use of smart city data), Topic 7 (implementation of intelligent public administration), and Topic 8 (smart city promotion). As for topic distribution by region, Topics 5, 6, and 8, which mostly concern the practical operation of smart cities, account for a significant portion of the ordinances in the Seoul metropolitan area, while Topics 2, 3, and 4, which mostly concern the initial implementation of smart cities, account for a significant portion of the ordinances in provincial areas.
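
A minimal sketch of the kind of LDA pipeline such a study might use, shown here with scikit-learn on a toy corpus; the ordinance snippets and vectorizer settings are illustrative assumptions, with only the eight-topic count taken from the abstract:

```python
# Hypothetical sketch: LDA topic modeling over ordinance texts (not the authors' code).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; the study used 74 municipal ordinances.
ordinances = [
    "smart city committee establishment operation members term",
    "smart city industry promotion support enterprises funding",
    "personal information management protection data security",
]

# Bag-of-words representation of each ordinance.
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(ordinances)

# Fit LDA; the paper reports eight topics, hence n_components=8.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
doc_topics = lda.fit_transform(dtm)

# Top words per topic, mirroring how topics are labeled by inspection.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k + 1}: {', '.join(top)}")
```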

A Generation and Matching Method of Normal-Transient Dictionary for Realtime Topic Detection (실시간 이슈 탐지를 위한 일반-급상승 단어사전 생성 및 매칭 기법)

  • Choi, Bongjun;Lee, Hanjoo;Yong, Wooseok;Lee, Wonsuk
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.5 / pp.7-18 / 2017
  • Recently, the number of SNS users has rapidly increased due to the development of the smart device industry, and the amount of generated data is growing exponentially. On Twitter, the text data generated by users is a key research subject because it involves events, accidents, product reputations, and brand images. Twitter has become a channel for users to receive and exchange information, and an important characteristic of Twitter is its real-time nature. Events such as earthquakes, floods, and suicides should be analyzed rapidly so that responses can be applied immediately. Analyzing an event requires collecting the tweets related to it, but it is difficult to find all relevant tweets using ordinary keywords alone. To solve this problem, this paper proposes a generation and matching method of normal-transient dictionaries for real-time topic detection. A normal dictionary consists of general keywords related to an event (e.g., for a suicide event: death, die, hang oneself), whereas a transient dictionary consists of transient keywords related to the event (e.g., names and information of celebrities, information on social issues). Experimental results show that the matching method using the two dictionaries finds more tweets related to an event than a simple keyword search.
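
A minimal sketch of the two-dictionary matching idea described above; the keyword entries and the `matches_event` helper are hypothetical, and the paper's actual dictionary-generation step is not reproduced here:

```python
# Hypothetical sketch of two-dictionary matching for real-time topic detection
# (illustrative; not the authors' implementation).

# Normal dictionary: general, slowly changing keywords for an event type.
normal_dict = {"death", "die", "hang oneself"}

# Transient dictionary: short-lived keywords (e.g., a celebrity name in the news);
# in practice this would be regenerated continuously from the tweet stream.
transient_dict = {"celebrity_name"}  # placeholder entry

def matches_event(tweet: str) -> bool:
    """A tweet is relevant if it contains a keyword from either dictionary."""
    text = tweet.lower()
    return any(kw in text for kw in normal_dict | transient_dict)

tweets = [
    "breaking: celebrity_name reported dead at home",
    "what should I eat for lunch today?",
]
print([t for t in tweets if matches_event(t)])  # only the first tweet matches
```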

Investigating the Influence of ESG Information on Funding Success in Online Crowdfunding Platform by Using Text Mining Technique and Logistic Regression

  • Kyu Sung Kim;Min Gyeong Kim;Francis Joseph Costello;Kun Chang Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.7 / pp.155-164 / 2023
  • In this paper, we examine the influence of Environmental, Social, and Governance (ESG)-related content on the success of online crowdfunding proposals. With the increasing significance of ESG standards in business, investment proposals incorporating ESG concepts are now commonplace, and conventional wisdom holds that most such proposals will enjoy a higher rate of success. We analyze over 9,000 online business presentations from a Kickstarter dataset to determine which characteristics of these proposals led to increased investment. We first utilized lexicon-based measurement and feature engineering to determine the relationship between environment and society scores and financial indicators. Next, logistic regression is used to determine the effect of including environmental and social terms in a project's description on its ability to obtain funding. Contrary to popular belief, our research found that micro-entrepreneurs were less likely to succeed with proposals focused on ESG issues. By shedding new light on the environment of online micro-entrepreneurship, our research opens new opportunities for research in the disciplines of information science and crowdfunding.
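
A minimal sketch of the lexicon-plus-logistic-regression setup, assuming a toy ESG lexicon and made-up funding labels; the real study uses a Kickstarter dataset of over 9,000 proposals and validated lexicons:

```python
# Hypothetical sketch: lexicon-based ESG scoring plus logistic regression on
# funding outcomes (illustrative; lexicon and data are placeholders).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy ESG lexicon; real studies use validated dictionaries.
esg_lexicon = {"sustainable", "renewable", "community", "governance"}

def esg_score(description: str) -> float:
    """Share of tokens in the description that are ESG lexicon terms."""
    tokens = description.lower().split()
    return sum(t in esg_lexicon for t in tokens) / max(len(tokens), 1)

descriptions = [
    "a sustainable renewable energy gadget for the community",
    "a fun card game for parties",
    "governance tools for sustainable nonprofits",
    "handmade leather wallets",
]
funded = np.array([0, 1, 0, 1])  # 1 = funding goal reached (toy labels)

X = np.array([[esg_score(d)] for d in descriptions])
model = LogisticRegression().fit(X, funded)

# The sign of the coefficient indicates how ESG wording relates to success.
print("ESG-score coefficient:", model.coef_[0][0])
```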

A Trend Analysis of the U.S. Cybersecurity Strategy and Implications for Korea (미국 사이버안보 전략의 경향 분석과 한국에의 함의)

  • Sunha Bae;Minkyung Song;Dong Hee Kim
    • Convergence Security Journal / v.23 no.2 / pp.11-25 / 2023
  • Since President Biden's inauguration, several significant cyberattacks have occurred in the United States, and cybersecurity has been emphasized as a national priority. The U.S. is advancing efforts to strengthen cybersecurity both domestically and internationally, including with allies. In particular, the Biden administration announced the National Cybersecurity Strategy in March 2023. The National Cybersecurity Strategy is the top-level cybersecurity guideline and the foundation of other cybersecurity policies, and because it includes public-private as well as international policy directions, it is expected to affect the international order. Meanwhile, in Korea, a new administration was launched in 2022, making a revision of the National Cybersecurity Strategy necessary. In addition, cooperation between Korea and the U.S. has recently been strengthened, and cybersecurity is being treated as a key agenda item in that relationship. In this paper, we examine the cybersecurity strategies of the Trump and Biden administrations, analyze how the strategies have changed, and assess their characteristics and implications in qualitative and quantitative terms. We then derive the implications of these changes for Korea's cybersecurity policy.
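
As a rough illustration of what a quantitative comparison of two strategy documents can look like, the sketch below ranks terms by the change in relative frequency between two placeholder texts; the strategy excerpts and the metric are assumptions, not the authors' method:

```python
# Hypothetical sketch of a quantitative trend comparison between two strategy
# documents via relative term frequencies (illustrative texts, not the papers').
from collections import Counter

def relative_freq(text: str) -> Counter:
    """Map each token to its share of all tokens in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return Counter({w: c / total for w, c in counts.items()})

strategy_2018 = "deterrence offensive adversaries deterrence infrastructure"
strategy_2023 = "resilience partnership allies regulation resilience infrastructure"

f1, f2 = relative_freq(strategy_2018), relative_freq(strategy_2023)

# Terms whose relative frequency shifted the most between the two strategies.
terms = set(f1) | set(f2)
shifts = sorted(terms, key=lambda w: abs(f2[w] - f1[w]), reverse=True)
for w in shifts[:5]:
    print(f"{w}: {f1[w]:.2f} -> {f2[w]:.2f}")
```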

Analyzing TripAdvisor application reviews to enable smart tourism : focusing on topic modeling (스마트 관광 활성화를 위한 트립어드바이저 애플리케이션 리뷰 분석 : 토픽 모델링을 중심으로)

  • YuNa Lee;MuMoungCho Han;SeonYeong Yu;MeeQi Siow;Mijin Noh;YangSok Kim
    • Smart Media Journal / v.12 no.8 / pp.9-17 / 2023
  • The development of information and communication technology and the wider development and dissemination of smart devices have changed the form of tourism, and the concept of smart tourism has emerged. Research related to smart tourism has been conducted in various fields such as policy implementation and surveys, but there is a lack of research on application reviews. This study collects TripAdvisor application review data from the Google Play Store to identify application usage and user satisfaction through Latent Dirichlet Allocation (LDA) topic modeling. The analysis yields four topics, two positive and two negative. We found that users were satisfied with the application's recommendation system but were dissatisfied when the filters they set during search were not applied or when reviews were not published after application updates. We suggest that more categories could be added to the application to provide users with different experiences. In addition, user satisfaction could be improved by identifying problems within the application, including the filter function, and by checking the application environment and resolving errors that occur during use.
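
A minimal sketch of LDA over app reviews, this time with gensim and toy review strings; num_topics=4 mirrors the four topics reported, while everything else is an illustrative assumption:

```python
# Hypothetical sketch: LDA over app-store reviews with gensim
# (toy reviews; not the authors' data or code).
from gensim import corpora, models

reviews = [
    "great recommendation system found perfect hotel",
    "search filters not applied after update frustrating",
    "my review never published after the latest update",
    "love the restaurant recommendations very accurate",
]
texts = [r.split() for r in reviews]

# Map tokens to ids and build bag-of-words vectors.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# The study reports four topics, hence num_topics=4.
lda = models.LdaModel(corpus, num_topics=4, id2word=dictionary, random_state=0)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```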

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents, including unstructured data and text, have been created due to the rapid increase in the use of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, this categorization was performed manually; however, manual categorization not only fails to guarantee accuracy but also requires a large amount of time and huge costs. Many studies have been conducted on the automatic assignment of categories to overcome these limitations. Unfortunately, most such methods cannot be applied to complex documents with multiple topics, because they assume that each document belongs to exactly one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories, but they in turn require training on a multi-categorized document set and therefore cannot be applied unless such a training set is available. To remove this requirement of traditional multi-categorization algorithms, we propose a new methodology that extends the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. First, we find the relationship between documents and topics using the results of topic analysis on single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate matching scores from each document to multiple categories; a document is classified into a category if and only if its matching score exceeds a predefined threshold. For example, a document can be classified into the three categories whose matching scores exceed the threshold. The main contribution of our study is that the methodology improves the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the methodology. For performance evaluation, we performed intensive experiments with news articles, which are clearly categorized by theme and contain less vulgar language and slang than other typical text documents. We collected news articles from July 2012 to June 2013. The number of articles varies greatly across categories, both because readers have different levels of interest in each category and because events occur with different frequencies in each category. To minimize distortion caused by these differences, we extracted 3,000 articles from each of eight categories, for a total of 24,000 articles. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the collected articles, we calculated document/category correspondence scores from the topic/category and document/topic correspondence scores; the document/category correspondence score indicates the degree to which a document corresponds to a certain category. As a result, we could present two additional categories for each of 23,089 documents. Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top 1 predicted category was evaluated, and 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. Interestingly, precision, recall, and F-score varied considerably across the eight categories.
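
A minimal sketch of the matching-score step, assuming toy document/topic and topic/category correspondence matrices and a placeholder threshold; only the thresholding rule itself comes from the abstract:

```python
# Hypothetical sketch of the matching-score idea: combine document/topic and
# topic/category correspondence scores, then assign every category whose score
# clears a threshold (all numbers are made up for illustration).
import numpy as np

# Rows: documents; columns: topics (e.g., from topic analysis). Toy values.
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
])

# Rows: topics; columns: categories, built from single-categorized training docs.
topic_category = np.array([
    [0.8, 0.1, 0.1],  # topic 0 mostly maps to category 0
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
])

# Document/category correspondence scores.
doc_category = doc_topic @ topic_category

threshold = 0.3  # placeholder for the methodology's predefined threshold
for d, scores in enumerate(doc_category):
    cats = np.where(scores >= threshold)[0]
    print(f"document {d}: categories {cats.tolist()} (scores {scores.round(2).tolist()})")
```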

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • In line with the rapidly increasing demand for text data analysis, research on and investment in text mining are being actively pursued not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining studies have focused on applications in the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be fed directly into a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form the computer can understand. "Embedding" refers to mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents, and as demand for document embedding grows rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods such as doc2Vec generate a vector for each document from the whole corpus of words contained in it, so the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional schemes usually map each document to a single vector, which makes it difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after extracting keywords through various analysis techniques, but since this is not the core subject of the proposed method, we describe its application to documents with predefined keywords. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. Specifically, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to overcome the traditional methods' sensitivity to miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a set of keyword vectors per document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects it contains. Finally, a multi-vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects within each vector, whereas the proposed multi-vector method vectorizes complex documents more accurately by eliminating this interference.
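
A minimal sketch of the five-step pipeline on a toy corpus, using gensim's Word2Vec and scikit-learn's KMeans; the corpus, keyword list, dimensionality, and cluster count are all illustrative assumptions:

```python
# Hypothetical sketch of the five-step pipeline: word embedding, keyword vector
# extraction, keyword clustering, and one document vector per cluster
# (toy corpus and parameters; not the authors' implementation).
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# (1) Parsing: tokenized body text plus predefined keywords for one document.
sentences = [
    "deep learning improves image recognition accuracy".split(),
    "stock markets react to interest rate changes".split(),
]
keywords = ["learning", "recognition", "markets", "rate"]

# (2) Word embedding over the whole corpus.
w2v = Word2Vec(sentences, vector_size=20, min_count=1, seed=0)

# (3) Keyword vector extraction: keep only the keyword vectors.
kw_vectors = np.array([w2v.wv[k] for k in keywords])

# (4) Keyword clustering to identify the document's multiple subjects.
n_subjects = 2  # assumed number of subjects in this toy document
clusters = KMeans(n_clusters=n_subjects, n_init=10, random_state=0).fit_predict(kw_vectors)

# (5) Multi-vector generation: one centroid vector per subject cluster.
doc_vectors = [kw_vectors[clusters == c].mean(axis=0) for c in range(n_subjects)]
print(len(doc_vectors), "vectors for one document")
```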

Analysis Method of Patent Documents to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation and infringement of intellectual property rights have come to be recognized as impediments to national industrial growth. To prevent the huge losses caused by these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is very important for protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. The proposed method first builds a database from the word frequencies of rejected patent documents. Comparing other patent documents against this database then yields a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we concluded that patents similar to rejected patents have a strong possibility of rejection. We used U.S. patent documents on Bluetooth technology, solar battery technology, and display technology as experimental data.
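
A minimal sketch of the similarity step, assuming a toy word-frequency profile of rejected patents and cosine similarity; the paper's k-means-based criterion selection is noted in a comment but not reproduced:

```python
# Hypothetical sketch: similarity of new patent documents to a word-frequency
# profile built from rejected patents (toy texts; not the authors' code).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

rejected_patents = [
    "bluetooth pairing method for wireless headset devices",
    "wireless bluetooth communication pairing protocol",
]
new_patents = [
    "bluetooth pairing procedure for wireless audio devices",
    "solar cell electrode manufacturing process",
]

vectorizer = CountVectorizer()
rejected_dtm = vectorizer.fit_transform(rejected_patents)

# Word-frequency "database": aggregate counts over all rejected documents.
rejected_profile = np.asarray(rejected_dtm.sum(axis=0))

# Similarity of each new document to the rejected-patent profile; documents
# above some criterion value (chosen via k-means in the paper) would be
# flagged as likely rejections.
new_dtm = vectorizer.transform(new_patents)
sims = cosine_similarity(new_dtm, rejected_profile).ravel()
for doc, s in zip(new_patents, sims):
    print(f"{s:.2f}  {doc}")
```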

Stock Price Prediction Using Sentiment Analysis: from "Stock Discussion Room" in Naver (SNS감성 분석을 이용한 주가 방향성 예측: 네이버 주식토론방 데이터를 이용하여)

  • Kim, Myeongjin;Ryu, Jihye;Cha, Dongho;Sim, Min Kyu
    • The Journal of Society for e-Business Studies / v.25 no.4 / pp.61-75 / 2020
  • The scope of data for understanding or predicting stock prices has continuously widened from traditional structured data to unstructured data. This study investigates whether commentary data collected from SNS may affect future stock prices. From the "Stock Discussion Room" on Naver, we collect six months of commentary data for 20 stocks and test whether these data have predictive power with respect to the one-hour-ahead price direction and price range. Deep neural networks such as LSTM and CNN are employed to model the predictive relationship. Among the 20 stocks, we find that the future price direction can be predicted with accuracy higher than 50% for 13 stocks, and that the future price range can be predicted with accuracy higher than 50% for 16 stocks. This study validates that investor sentiment reflected in SNS communities such as Naver's "Stock Discussion Room" may affect the demand and supply of stocks, thus driving stock prices.
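
A minimal sketch of an LSTM direction classifier in PyTorch on random tensors; the feature construction from Naver comments and the authors' actual architecture are not specified in the abstract, so the shapes and layer sizes here are assumptions:

```python
# Hypothetical sketch: an LSTM that maps a sequence of hourly comment-sentiment
# features to next-hour price direction (random toy tensors, assumed shapes;
# not the authors' architecture).
import torch
import torch.nn as nn

class DirectionLSTM(nn.Module):
    def __init__(self, n_features: int = 8, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # two classes: up / down

    def forward(self, x):             # x: (batch, time, features)
        _, (h, _) = self.lstm(x)      # h: (num_layers, batch, hidden)
        return self.head(h[-1])       # logits: (batch, 2)

# Toy batch: 4 samples, 24 hourly time steps, 8 sentiment features each.
x = torch.randn(4, 24, 8)
y = torch.randint(0, 2, (4,))  # toy up/down labels

model = DirectionLSTM()
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
print("toy loss:", float(loss))
```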