• Title/Summary/Keyword: 빅데이터 마이닝

Search Result 458, Processing Time 0.024 seconds

A Study on the Landscape Cognition of Wind Power Plant in Social Media (소셜미디어에 나타난 풍력발전시설의 경관 인식 연구)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.50 no.5
    • /
    • pp.69-79
    • /
    • 2022
  • This study aims to assess the current understanding of the landscape of wind power facilities as renewable energy sources that supply sightseeing, tourism, and other opportunities. Therefore, social media data related to the landscape of wind power facilities experienced by visitors from different regions was analyzed. The analysis results showed that the common characteristics of the landscape of wind power facilities are based on the scale of wind power facilities, the distance between overlook points of wind power facilities, the visual openness of the wind power facilities from the overlook points, and the terrain where the wind power facilities are located. In addition, the preference for wind power facilities is higher in places where the shape of wind power facilities and the surrounding landscape can be clearly seen- flat ground or the sea are considered better landscapes. Negative keywords about the landscape appear on Gade Mountain in Taibai, Meifeng Mountain in Taibai, Taiqi Mountain, and Gyeongju Wind Power Generation Facilities on Gyeongshang Road in Gangwon. The keyword 'negation' occurs when looking at wind power facilities at close range. Because of the high angle of the view, viewers can feel overwhelmed seeing the size of the facility and the ridge simultaneously, feeling psychological pressure. On the contrary, positive landscape adjectives are obtained from wind power facilities on flat ground or the sea. Visitors think that the visual volume of the landscape is fully ensured on flat ground or the sea, and it is a symbolic element that can represent the site. This study analyzes landscape awareness based on the opinions of visitors who have experienced wind power facilities. However, wind power facilities are built in different areas. Therefore, landscape characteristics are different, and there are many variables, such as viewpoints and observers, so the research results are difficult to popularize and have limitations. In recent years, landscape damage due to the construction of wind power facilities has become a hot issue, and the domestic methods of landscape evaluation of wind power facilities are unsatisfactory. Therefore, when evaluating the landscape of wind power facilities, the scale of wind power facilities, the inherent natural characteristics of the area where wind power facilities are set up, and the distance between wind power facilities and overlook points are important elements to consider. In addition, wind power facilities are set in the natural environment, which needs to be protected. Therefore, from the landscape perspective, it is necessary to study the landscape of wind power facilities and the surrounding environment.

The Comparative Analysis of Outcomes on Patents and Papers of Railway Research Institutes in Korea, China and Japan (한국, 중국, 일본 철도연구기관 특허 및 논문실적 비교분석)

  • Baek, Sunghyun;Yi, Yoonju
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.6
    • /
    • pp.455-460
    • /
    • 2020
  • The governments of Korea, China, and Japan have operated comprehensive research institutes for railway technologies. Korea Railroad Research Institute (KRRI), China Academy of Railway Sciences Corporation Limited (CARS), and Railway Technical Research Institute (RTRI) are representatives of comprehensive railway research institutes in each country. KRRI was found to be the most advanced in the quantitative competitiveness of patents. In terms of qualitative competitiveness, KRRI has strength in civil engineering, whereas RTRI has strength in electricity. KRRI was found to have the greatest efforts in securing competitiveness in overseas property rights. By comparing the publication of papers, CARS published the most papers. On the other hand, from 2015, KRRI showed an upward trend and published the most papers. By examining the impact of the papers by the citation, KRRI was found to have higher competitiveness than the other two institutions. In the future, it will be necessary to perform big data analysis on patents and papers of the three organizations, derive the key research areas and promising technology areas for each institute, and establish a mid-to-long-term development plan for railway technology based on scientific evidence.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.25-41
    • /
    • 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to analyze the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media in order to identify social issues. The technique of "topic analysis" is employed in order to identify topics and themes from a large amount of text documents. As one of the most prevalent applications of topic analysis, the technique of issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, this traditional approach is difficult to apply in most applications that need to perform an analysis on the additional period. Second, the issue is not only generated and terminated constantly, but also one issue can sometimes be distributed into several issues or multiple issues can be integrated into one single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connection and effect relationship between these issues. The purpose of this study is to overcome the two limitations of the existing issue tracking method, one being the limitation regarding the analysis method and the other being the limitation involving the lack of consideration of the changeability of the issues. Let us assume that we perform multiple topic analysis for each multiple period. Then it is essential to map issues of different periods in order to trace trend of issues. However, it is not easy to discover connection between issues of different periods because the issues derived for each period mutually contain heterogeneity. In this study, to overcome these limitations without having to analyze the entire period's documents simultaneously, the analysis can be performed independently for each period. In addition, we performed issue mapping to link the identified issues of each period. An integrated approach on each details period was presented, and the issue flow of the entire integrated period was depicted in this study. Thus, as the entire process of the issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, the changeability of the issues was analyzed in this study. The proposed methodology is highly efficient in terms of time and cost, as it sufficiently considered the changeability of the issues. Further, the results of this study can be used to adapt the methodology to a practical situation. By applying the proposed methodology to actual Internet news, the potential practical applications of the proposed methodology are analyzed. Consequently, the proposed methodology was able to extend the period of the analysis and it could follow the course of progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.77-92
    • /
    • 2014
  • Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, in the case of manual categorization, not only can the accuracy of the categorization be not guaranteed but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the limitation of the requirement of a multi-categorized training set by traditional multi-categorization algorithms, we propose a new methodology that can extend a category of a single-categorized document to multiple categorizes by analyzing relationships among categories, topics, and documents. First, we attempt to find the relationship between documents and topics by using the result of topic analysis for single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate the matching scores for each document to multiple categories. The results imply that a document can be classified into a certain category if and only if the matching score is higher than the predefined threshold. For example, we can classify a certain document into three categories that have larger matching scores than the predefined threshold. The main contribution of our study is that our methodology can improve the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed intensive experiments with news articles. News articles are clearly categorized based on the theme, whereas the use of vulgar language and slang is smaller than other usual text document. We collected news articles from July 2012 to June 2013. The articles exhibit large variations in terms of the number of types of categories. This is because readers have different levels of interest in each category. Additionally, the result is also attributed to the differences in the frequency of the events in each category. In order to minimize the distortion of the result from the number of articles in different categories, we extracted 3,000 articles equally from each of the eight categories. Therefore, the total number of articles used in our experiments was 24,000. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." By using the news articles that we collected, we calculated the document/category correspondence scores by utilizing topic/category and document/topics correspondence scores. The document/category correspondence score can be said to indicate the degree of correspondence of each document to a certain category. As a result, we could present two additional categories for each of the 23,089 documents. Precision, recall, and F-score were revealed to be 0.605, 0.629, and 0.617 respectively when only the top 1 predicted category was evaluated, whereas they were revealed to be 0.838, 0.290, and 0.431 when the top 1 - 3 predicted categories were considered. It was very interesting to find a large variation between the scores of the eight categories on precision, recall, and F-score.

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting is an important issue both academically and practically, research in stock price prediction has been actively conducted. The stock price forecasting research is classified into using structured data and using unstructured data. With structured data such as historical stock price and financial statements, past studies usually used technical analysis approach and fundamental analysis. In the big data era, the amount of information has rapidly increased, and the artificial intelligence methodology that can find meaning by quantifying string information, which is an unstructured data that takes up a large amount of information, has developed rapidly. With these developments, many attempts with unstructured data are being made to predict stock prices through online news by applying text mining to stock price forecasts. The stock price prediction methodology adopted in many papers is to forecast stock prices with the news of the target companies to be forecasted. However, according to previous research, not only news of a target company affects its stock price, but news of companies that are related to the company can also affect the stock price. However, finding a highly relevant company is not easy because of the market-wide impact and random signs. Thus, existing studies have found highly relevant companies based primarily on pre-determined international industry classification standards. However, according to recent research, global industry classification standard has different homogeneity within the sectors, and it leads to a limitation that forecasting stock prices by taking them all together without considering only relevant companies can adversely affect predictive performance. To overcome the limitation, we first used random matrix theory with text mining for stock prediction. Wherever the dimension of data is large, the classical limit theorems are no longer suitable, because the statistical efficiency will be reduced. Therefore, a simple correlation analysis in the financial market does not mean the true correlation. To solve the issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find a true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Also, based on the clustering analysis, we used multiple kernel learning algorithm, which is an ensemble of support vector machine to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of financial news of the target firm and its relevant firms. The results of this study are as follows. The results of this paper are as follows. (1) Following the existing research flow, we confirmed that it is an effective way to forecast stock prices using news from relevant companies. (2) When looking for a relevant company, looking for it in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies if cluster analysis is performed based on the true correlation by removing market-wide effects and random signals. The contribution of this study is as follows. First, this study shows that random matrix theory, which is used mainly in economic physics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory. This extends the existing research that presented the methodology by integrating artificial intelligence with complex system theory through transfer entropy. Second, this study stressed that finding the right companies in the stock market is an important issue. This suggests that it is not only important to study artificial intelligence algorithms, but how to theoretically adjust the input values. Third, we confirmed that firms classified as Global Industrial Classification Standard (GICS) might have low relevance and suggested it is necessary to theoretically define the relevance rather than simply finding it in the GICS.

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • Selecting high-quality information that meets the interests and needs of users among the overflowing contents is becoming more important as the generation continues. In the flood of information, efforts to reflect the intention of the user in the search result better are being tried, rather than recognizing the information request as a simple string. Also, large IT companies such as Google and Microsoft focus on developing knowledge-based technologies including search engines which provide users with satisfaction and convenience. Especially, the finance is one of the fields expected to have the usefulness and potential of text data analysis because it's constantly generating new information, and the earlier the information is, the more valuable it is. Automatic knowledge extraction can be effective in areas where information flow is vast, such as financial sector, and new information continues to emerge. However, there are several practical difficulties faced by automatic knowledge extraction. First, there are difficulties in making corpus from different fields with same algorithm, and it is difficult to extract good quality triple. Second, it becomes more difficult to produce labeled text data by people if the extent and scope of knowledge increases and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, problem definition for automatic knowledge extraction is not easy because of ambiguous conceptual characteristics of knowledge. So, in order to overcome limits described above and improve the semantic performance of stock-related information searching, this study attempts to extract the knowledge entity by using neural tensor network and evaluate the performance of them. Different from other references, the purpose of this study is to extract knowledge entity which is related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous researches and to enhance the effectiveness of the model. From these processes, this study has the following three significances. First, A practical and simple automatic knowledge extraction method that can be applied. Second, the possibility of performance evaluation is presented through simple problem definition. Finally, the expressiveness of the knowledge increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and objective performance evaluation method are also presented. The empirical study to confirm the usefulness of the presented model, experts' reports about individual 30 stocks which are top 30 items based on frequency of publication from May 30, 2017 to May 21, 2018 are used. the total number of reports are 5,600, and 3,074 reports, which accounts about 55% of the total, is designated as a training set, and other 45% of reports are designated as a testing set. Before constructing the model, all reports of a training set are classified by stocks, and their entities are extracted using named entity recognition tool which is the KKMA. for each stocks, top 100 entities based on appearance frequency are selected, and become vectorized using one-hot encoding. After that, by using neural tensor network, the same number of score functions as stocks are trained. Thus, if a new entity from a testing set appears, we can try to calculate the score by putting it into every single score function, and the stock of the function with the highest score is predicted as the related item with the entity. To evaluate presented models, we confirm prediction power and determining whether the score functions are well constructed by calculating hit ratio for all reports of testing set. As a result of the empirical study, the presented model shows 69.3% hit accuracy for testing set which consists of 2,526 reports. this hit ratio is meaningfully high despite of some constraints for conducting research. Looking at the prediction performance of the model for each stocks, only 3 stocks, which are LG ELECTRONICS, KiaMtr, and Mando, show extremely low performance than average. this result maybe due to the interference effect with other similar items and generation of new knowledge. In this paper, we propose a methodology to find out key entities or their combinations which are necessary to search related information in accordance with the user's investment intention. Graph data is generated by using only the named entity recognition tool and applied to the neural tensor network without learning corpus or word vectors for the field. From the empirical test, we confirm the effectiveness of the presented model as described above. However, there also exist some limits and things to complement. Representatively, the phenomenon that the model performance is especially bad for only some stocks shows the need for further researches. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used for the purpose of matching the new text information semantically with the related stocks.

A Study on Industry-specific Sustainability Strategy: Analyzing ESG Reports and News Articles (산업별 지속가능경영 전략 고찰: ESG 보고서와 뉴스 기사를 중심으로)

  • WonHee Kim;YoungOk Kwon
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.287-316
    • /
    • 2023
  • As global energy crisis and the COVID-19 pandemic have emerged as social issues, there is a growing demand for companies to move away from profit-centric business models and embrace sustainable management that balances environmental, social, and governance (ESG) factors. ESG activities of companies vary across industries, and industry-specific weights are applied in ESG evaluations. Therefore, it is important to develop strategic management approaches that reflect the characteristics of each industry and the importance of each ESG factor. Additionally, with the stance of strengthened focus on ESG disclosures, specific guidelines are needed to identify and report on sustainable management activities of domestic companies. To understand corporate sustainability strategies, analyzing ESG reports and news articles by industry can help identify strategic characteristics in specific industries. However, each company has its own unique strategies and report structures, making it difficult to grasp detailed trends or action items. In our study, we analyzed ESG reports (2019-2021) and news articles (2019-2022) of six companies in the 'Finance,' 'Manufacturing,' and 'IT' sectors to examine the sustainability strategies of leading domestic ESG companies. Text mining techniques such as keyword frequency analysis and topic modeling were applied to identify industry-specific, ESG element-specific management strategies and issues. The analysis revealed that in the 'Finance' sector, customer-centric management strategies and efforts to promote an inclusive culture within and outside the company were prominent. Strategies addressing climate change, such as carbon neutrality and expanding green finance, were also emphasized. In the 'Manufacturing' sector, the focus was on creating sustainable communities through occupational health and safety issues, sustainable supply chain management, low-carbon technology development, and eco-friendly investments to achieve carbon neutrality. In the 'IT' sector, there was a tendency to focus on technological innovation and digital responsibility to enhance social value through technology. Furthermore, the key issues identified in the ESG factors were as follows: under the 'Environmental' element, issues such as greenhouse gas and carbon emission management, industry-specific eco-friendly activities, and green partnerships were identified. Under the 'Social' element, key issues included social contribution activities through stakeholder engagement, supporting the growth and coexistence of members and partner companies, and enhancing customer value through stable service provision. Under the 'Governance' element, key issues were identified as strengthening board independence through the appointment of outside directors, risk management and communication for sustainable growth, and establishing transparent governance structures. The exploration of the relationship between ESG disclosures in reports and ESG issues in news articles revealed that the sustainability strategies disclosed in reports were aligned with the issues related to ESG disclosed in news articles. However, there was a tendency to strengthen ESG activities for prevention and improvement after negative media coverage that could have a negative impact on corporate image. Additionally, environmental issues were mentioned more frequently in news articles compared to ESG reports, with environmental-related keywords being emphasized in the 'Finance' sector in the reports. Thus, ESG reports and news articles shared some similarities in content due to the sharing of information sources. However, the impact of media coverage influenced the emphasis on specific sustainability strategies, and the extent of mentioning environmental issues varied across documents. Based on our study, the following contributions were derived. From a practical perspective, companies need to consider their characteristics and establish sustainability strategies that align with their capabilities and situations. From an academic perspective, unlike previous studies on ESG strategies, we present a subdivided methodology through analysis considering the industry-specific characteristics of companies.

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

  • Byun, Sungho;Lee, Donghoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.23-43
    • /
    • 2016
  • As a result of the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous data for various purposes. Especially, in recent years, people tend to share their experiences related to their leisure activities while also reviewing others' inputs concerning their activities. Therefore, by referring to others' leisure activity-related experiences, they are able to gather information that might guarantee them better leisure activities in the future. This phenomenon has appeared throughout many aspects of leisure activities such as movies, traveling, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites provide information of each product in various formats depending on different purposes and perspectives. Generally, most of the websites provide the average ratings and detailed reviews of users who actually used products/services, and these ratings and reviews can actually support the decision of potential customers in purchasing the same products/services. However, the existing websites offering information on leisure activities only provide the rating and review based on one stage of a set of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion as well as the characteristics of specific elements comprising each criterion, users have to read a large number of reviews. In particular, as most of the users search for the characteristics of the detailed elements for one or more specific evaluation criteria based on their priorities, they must spend a great deal of time and effort to obtain the desired information by reading more reviews and understanding the contents of such reviews. Although some websites break down the evaluation criteria and direct the user to input their reviews according to different levels of criteria, there exist excessive amounts of input sections that make the whole process inconvenient for the users. Further, problems may arise if a user does not follow the instructions for the input sections or fill in the wrong input sections. Finally, treating the evaluation criteria breakdown as a realistic alternative is difficult, because identifying all the detailed criteria for each evaluation criterion is a challenging task. For example, if a review about a certain hotel has been written, people tend to only write one-stage reviews for various components such as accessibility, rooms, services, or food. These might be the reviews for most frequently asked questions, such as distance between the nearest subway station or condition of the bathroom, but they still lack detailed information for these questions. In addition, in case a breakdown of the evaluation criteria was provided along with various input sections, the user might only fill in the evaluation criterion for accessibility or fill in the wrong information such as information regarding rooms in the evaluation criteria for accessibility. Thus, the reliability of the segmented review will be greatly reduced. In this study, we propose an approach to overcome the limitations of the existing leisure activity information websites, namely, (1) the reliability of reviews for each evaluation criteria and (2) the difficulty of identifying the detailed contents that make up the evaluation criteria. In our proposed methodology, we first identify the review content and construct the lexicon for each evaluation criterion by using the terms that are frequently used for each criterion. Next, the sentences in the review documents containing the terms in the constructed lexicon are decomposed into review units, which are then reconstructed by using the evaluation criteria. Finally, the issues of the constructed review units by evaluation criteria are derived and the summary results are provided. Apart from the derived issues, the review units are also provided. Therefore, this approach aims to help users save on time and effort, because they will only be reading the relevant information they need for each evaluation criterion rather than go through the entire text of review. Our proposed methodology is based on the topic modeling, which is being actively used in text analysis. The review is decomposed into sentence units rather than considering the whole review as a document unit. After being decomposed into individual review units, the review units are reorganized according to each evaluation criterion and then used in the subsequent analysis. This work largely differs from the existing topic modeling-based studies. In this paper, we collected 423 reviews from hotel information websites and decomposed these reviews into 4,860 review units. We then reorganized the review units according to six different evaluation criteria. By applying these review units in our methodology, the analysis results can be introduced, and the utility of proposed methodology can be demonstrated.