• Title/Summary/Keyword: big data collection (빅데이터 수집)

Search Result 995, Processing Time 0.023 seconds

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.143-156 / 2012
  • People readily believe that news and stock indices are closely related. They think that obtaining news before anyone else can help them forecast stock prices and earn large profits, or at least capture an investment opportunity. However, it is no easy feat to determine to what extent the two are related, to reach an investment decision based on news, or to establish whether such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it becomes possible to extract information that can assist investment decisions. The reality, however, is that the world is inundated with a massive wave of news in real time, and news is unstructured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news and produces investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistical methods. The mining process that converts news into information for investment decision making consists of the following steps. First, news is received from a news provider that collects it in real time, and its information is indexed. Not only the content of the news but also information such as media outlet, time, and news type is collected and classified, and then reworked into variables from which investment decisions can be inferred. Second, the text of the news content is separated into morphemes to derive words whose polarity can be judged, and each word is tagged with positive or negative polarity by comparison with a sentiment dictionary. Third, the positive/negative polarity of the news is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria. 
For this study, the KOSPI index and its fluctuation range were collected from Korea Exchange for the 63 days on which the stock market was open during the three months from July to September 2011, and news data were collected by parsing the web pages of 766 articles from economic news media company M, among the articles carried under Stock Information > News > Main News on the portal site Naver.com. Over the three months, the stock price index rose on 33 days and fell on 30 days, and the news content comprised 197 articles published before the market opened, 385 during the session, and 184 after the market closed. Mining the collected news content and comparing the results with stock prices showed that the positive/negative opinion of news content had a significant relationship with the stock price, and that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple binary judgment of positive or negative. To check whether news affected stock price fluctuations, or at least preceded them, changes in the stock price were compared only with news published before the market opened, and this relationship was also statistically significant. In addition, because news covers various types and topics, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and market conditions, the influence on the stock market and the significance of the relationship were expected to differ by news type. Each type of news was therefore compared with stock price fluctuations, and the results showed that market condition, outlook, and overseas news were the most useful for explaining fluctuations of the stock price. 
By contrast, news about individual companies was not statistically significant, and its opinion mining value tended to move opposite to the stock price; a possible reason is the appearance of promotional and planned news intended to keep the stock price from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function based on the relationship between the positive/negative opinion of news and the stock price. The results showed that the regression equation using the variables for market condition, outlook, and overseas news published before the market opened was statistically significant, and the classification accuracy of the logistic regression was 70.0% for rises in the stock price, 78.8% for falls, and 74.6% on average. This study first analyzed the relationship between news and stock price by analyzing and quantifying the sentiment of unstructured news content using opinion mining, one of the big data analysis techniques, and then proposed and verified an intelligent investment decision-making model that can systematically carry out opinion mining and derive and support investment information. This shows that news can be used as a variable to predict the stock price index for investment, and the model is expected to be usable as a real investment support system if it is implemented as a system and verified in the future.
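
The polarity-tagging and daily-scoring steps described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the toy `SENTIMENT` dictionary, the whitespace tokenizer standing in for morpheme analysis, the 0.5 threshold, and the function names. The abstract does not specify the paper's actual dictionary or scoring rule.

```python
from collections import defaultdict

# Hypothetical toy sentiment dictionary (not the paper's dictionary).
SENTIMENT = {"surge": 1, "gain": 1, "rally": 1, "fall": -1, "loss": -1, "slump": -1}


def article_polarity(text):
    """Count positive and negative dictionary words in one article."""
    pos = neg = 0
    # Assumption: whitespace splitting stands in for morpheme analysis.
    for token in text.lower().split():
        score = SENTIMENT.get(token.strip(".,!?\"'"), 0)
        if score > 0:
            pos += 1
        elif score < 0:
            neg += 1
    return pos, neg


def daily_positive_ratio(articles):
    """articles: iterable of (date, text) pairs.

    Returns {date: positive / (positive + negative)}, or None for a day
    on which no dictionary word was found.
    """
    counts = defaultdict(lambda: [0, 0])
    for date, text in articles:
        pos, neg = article_polarity(text)
        counts[date][0] += pos
        counts[date][1] += neg
    ratios = {}
    for date, (pos, neg) in counts.items():
        total = pos + neg
        ratios[date] = pos / total if total else None
    return ratios


def signal(ratio, threshold=0.5):
    """Map a daily ratio to a coarse direction; the threshold is illustrative."""
    if ratio is None:
        return "hold"
    return "up" if ratio > threshold else "down"


articles = [
    ("2011-07-01", "Stocks rally on strong gain."),
    ("2011-07-01", "Exporters report loss"),
    ("2011-07-04", "Markets slump, fall continues"),
]
ratios = daily_positive_ratio(articles)
```

Using a ratio rather than a binary label follows the abstract's finding that a positive/negative ratio explained index changes better than a simple positive-or-negative judgment; the regression step that turns the ratio into a decision function is not reproduced here.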

Development of Machine Learning-Based Platform for Distillation Column (증류탑을 위한 머신러닝 기반 플랫폼 개발)

  • Oh, Kwang Cheol;Kwon, Hyukwon;Roh, Jiwon;Choi, Yeongryeol;Park, Hyundo;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research / v.58 no.4 / pp.565-572 / 2020
  • This study developed a software platform that uses machine learning to optimize the distillation column system. The distillation column is a representative, core process in the petrochemical industry. Process stabilization is difficult because of varied operating conditions and the characteristics of a continuous process, and process efficiency differs depending on operator skill. Process control based on theoretical simulation has been used to overcome this problem, but it is limited in that it cannot be applied to complex processes and real-time systems. This study aims to develop an empirical simulation model based on machine learning and to suggest an optimal process operation method. Developing the empirical simulation involves big data collection from the actual process, feature extraction through data mining, and a representative algorithm for the chemical process. Finally, the platform for the distillation column was developed and verified through the developed model and field tests. The developed platform can predict the operating parameters and provide optimal operating conditions to achieve efficient process control. This study is a basic study applying machine learning techniques to a chemical process. After application to a wide variety of processes, it can serve as a cornerstone of the smart factory in Industry 4.0.

Factors influencing metabolic syndrome perception and exercising behaviors in Korean adults: Data mining approach (대사증후군의 인지와 신체활동 실천에 영향을 미치는 요인: 데이터 마이닝 접근)

  • Lee, Soo-Kyoung;Moon, Mikyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.12 / pp.581-588 / 2017
  • This study was conducted to determine which factors predict metabolic syndrome (MetS) perception and exercise by applying a machine learning classifier, the Extreme Gradient Boosting algorithm (XGBoost), from July 2014 to December 2015. Data were obtained from the Korean Community Health Survey (KCHS), which represents community-dwelling Korean adults aged 19 years and older, from 2009 to 2013. The dataset includes 370,430 adults. Outcomes were categorized into the following stages based on the perception of MetS and physical activity (PA): Stage 1 (no perception, no PA), Stage 2 (perception, no PA), and Stage 3 (perception, PA). Features common to all questionnaires over the 5 years were selected for modeling. Overall, there were 161 features, all categorical except for age and the visual analogue scale (EQ-VAS). We used the Extreme Gradient Boosting algorithm in R to build a model to predict factors and achieved a prediction accuracy of 0.735. The top 10 predictive factors in Stage 3 were: age, education level, attempt to control weight, EQ mobility, nutrition label checks, private health insurance, EQ-5D usual activities, anti-smoking advertising, EQ-VAS, education in health centers for diabetes, and dental care. In conclusion, the results showed that XGBoost can be used to identify factors influencing disease prevention and management using healthcare big data.
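
The three-stage outcome coding in this abstract can be written as a small labeling function. This is only a sketch in Python (the study itself reports using R), and the function and argument names are hypothetical. The abstract does not define a stage for respondents who are physically active without perceiving MetS, so that case is returned as `None` rather than guessed.

```python
from collections import Counter


def mets_stage(perceives_mets, physically_active):
    """Return the outcome stage (1, 2, or 3) described in the abstract.

    The combination (no perception, PA) is not defined there, so it maps to None.
    """
    if not perceives_mets and not physically_active:
        return 1  # Stage 1: no perception, no PA
    if perceives_mets and not physically_active:
        return 2  # Stage 2: perception, no PA
    if perceives_mets and physically_active:
        return 3  # Stage 3: perception, PA
    return None  # no perception but PA: undefined in the abstract


# Illustrative respondents as (perceives_mets, physically_active) pairs.
respondents = [(False, False), (True, False), (True, True), (True, True), (False, True)]
stage_counts = Counter(mets_stage(p, a) for p, a in respondents)
```

The resulting stage label would serve as the target variable of the classifier, with the 161 survey features as inputs.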

Design and Implementation of Luo-kuan Recognition Application (낙관 인식을 위한 애플리케이션의 설계 및 구현)

  • Kim, Han-Syel;Seo, Kwi-Bin;Kang, Mingoo;Ryu, Gee Soo;Hong, Min
    • Journal of Internet Computing and Services / v.19 no.1 / pp.97-103 / 2018
  • In oriental paintings, a Luo-kuan expresses the artist's information compressed into a single image. A Luo-kuan includes various information such as the title of the work or the name of the artist. Therefore, information about the Luo-kuan is considered important by those who collect or enjoy oriental paintings. However, most of the characters in a Luo-kuan are difficult kanji, kanzai, or take various shapes, so they are difficult for ordinary people to interpret. In this paper, we developed a Luo-kuan search application to easily check the information in a Luo-kuan. The application uses a search algorithm that analyzes the captured Luo-kuan image and sends it to the server, which outputs information about the Luo-kuan candidates in the server's database that are most similar to the captured image. We also compared and analyzed the accuracy of the algorithm on 170 Luo-kuan data items to find the rank of the matching Luo-kuan among the candidates. The experimental results confirm that the accuracy of the application's search algorithm is about 90%, and it is anticipated that a platform that automatically analyzes and searches images in a big data environment can be developed by adding an optimization algorithm and a multi-threading algorithm.

Analyzing TripAdvisor application reviews to enable smart tourism : focusing on topic modeling (스마트 관광 활성화를 위한 트립어드바이저 애플리케이션 리뷰 분석 : 토픽 모델링을 중심으로)

  • YuNa Lee;MuMoungCho Han;SeonYeong Yu;MeeQi Siow;Mijin Noh;YangSok Kim
    • Smart Media Journal / v.12 no.8 / pp.9-17 / 2023
  • The development of information and communication technology and the advancing development and dissemination of smart devices have changed the form of tourism, and the concept of smart tourism has emerged. Research related to smart tourism has been conducted in various areas, such as policy implementation and surveys, but there is a lack of research on application reviews. This study collects TripAdvisor application review data from the Google Play Store to identify usage of the application and user satisfaction through Latent Dirichlet Allocation (LDA) topic modeling. The analysis yields four topics, two positive and two negative. We found that users were satisfied with the application's recommendation system but were dissatisfied when the filters they set during search were not applied or when reviews were not published after application updates. We suggest that more categories be added to the application to provide users with different experiences. In addition, user satisfaction is expected to improve if problems within the application, including the filter function, are identified, the application environment is checked, and errors occurring during application usage are resolved.

Development of Metrics to Measure Reusability Quality of AIaaS

  • Eun-Sook Cho
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.147-153 / 2023
  • As artificial intelligence technology spreads to all industries, AIaaS, equipped with artificial intelligence services, is emerging. In particular, non-IT companies suffer from a lack of software experts, difficulties in training big data models, and difficulties in collecting and analyzing various types of data. AIaaS makes it easier and more economical for users to build a system by providing, in the form of a service, the various IT resources necessary for artificial intelligence software development as well as the functions necessary for artificial intelligence software. Therefore, the supply of and demand for such cloud-based AIaaS will increase rapidly. As supply and demand increase, the quality of the services provided by AIaaS becomes an important factor. However, research on comprehensive and practical quality evaluation metrics to measure it is currently insufficient. Therefore, for reusability evaluation among the service quality measurement factors of AIaaS, this paper develops and proposes four metrics necessary for measuring reusability (usability, replacement, scalability, and publicity), based on implementation, convenience, efficiency, and accessibility, which are characteristics of AIaaS. The proposed metrics can be used as a tool to predict how far the services provided by AIaaS can be reused by potential users in the future.

Analysis of media trends related to spent nuclear fuel treatment technology using text mining techniques (텍스트마이닝 기법을 활용한 사용후핵연료 건식처리기술 관련 언론 동향 분석)

  • Jeong, Ji-Song;Kim, Ho-Dong
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.33-54 / 2021
  • With the fourth industrial revolution and the arrival of the New Normal era brought on by COVID-19, the importance of non-contact technologies such as artificial intelligence and big data research has been increasing. Convergent research is being conducted in earnest to keep up with these trends, but few studies have applied artificial intelligence and big data-related technologies such as natural language processing and text mining to nuclear research. This study was conducted to confirm the applicability of data science analysis techniques to the field of nuclear research. Furthermore, identifying trends in the perception of spent nuclear fuel is critical for setting the direction of nuclear industry policies and responding in advance to changes in industrial policy. For these reasons, this study conducted a media trend analysis of pyroprocessing, a spent nuclear fuel treatment technology. We objectively analyze changes in media perception of spent nuclear fuel dry treatment technology by applying text mining techniques. Text data from Naver's web news articles containing the keywords "Pyroprocessing" and "Sodium Cooled Reactor" were collected with Python code to identify changes in perception over time. The analysis period was set from 2007, when the first article was published, to 2020, and a detailed, multi-layered analysis of the text data was carried out using methods such as word clouds based on frequency analysis, TF-IDF, and degree centrality calculation. Keyword frequency analysis showed that media perception of spent nuclear fuel dry treatment technology changed in the mid-2010s, influenced by the Gyeongju earthquake in 2016 and the implementation of the new government's energy conversion policy in 2017. 
Therefore, trend analysis was conducted on the basis of that time period, and word frequency analysis, TF-IDF, degree centrality values, and semantic network graphs were derived. The results show that before the 2010s, media perception of spent nuclear fuel dry treatment technology was diplomatic and positive. Over time, however, the frequency of keywords such as "safety", "reexamination", "disposal", and "disassembly" increased, indicating that the sustainability of spent nuclear fuel dry treatment technology is being seriously considered. It was confirmed that social awareness also changed as spent nuclear fuel dry treatment technology, which had been recognized as a political and diplomatic technology, became ambiguous due to changes in domestic policy. This means that domestic policy changes, such as changes in nuclear power policy, have a greater impact on media perception than issues of "spent nuclear fuel processing technology" itself. This seems to be because nuclear policy is a more widely discussed and more publicly familiar topic than spent nuclear fuel. Therefore, to improve social awareness of spent nuclear fuel processing technology, it would be necessary to provide sufficient information about it, and linking it to nuclear policy issues would also be a good idea. In addition, the study highlighted the importance of social science research in nuclear power. It is necessary to apply the social sciences widely to nuclear engineering, and we could confirm that the nuclear industry would be sustainable if national policy changes are taken into account. However, this study is limited in that it applied big data analysis methods only to a narrow research area, namely "Pyroprocessing," a spent nuclear fuel dry processing technology. Furthermore, there was no clear basis for the cause of the change in social perception, and only news articles were analyzed to determine social perception. 
If a media trend analysis study on nuclear power is conducted that takes future comments into account, more reliable results are expected to be produced and used efficiently in the field of nuclear policy research. Recently, the development of non-contact technologies such as artificial intelligence and big data research has been accelerating with the arrival of the New Normal era caused by COVID-19. Convergence research is being conducted in earnest in various fields to follow these trends, but few studies in the nuclear field have used artificial intelligence and big data-related technologies such as natural language processing and text mining. The academic significance of this study is that it confirmed the applicability of data science analysis technology to nuclear research. Furthermore, because current government energy policies such as nuclear power plant reductions have prompted a re-evaluation of spent fuel treatment technology research, keyword analysis in this field can contribute to the orientation of future research. It is important to consider outside views, not just the safety technology and engineering integrity of nuclear power, and to reconsider whether it is appropriate to discuss nuclear engineering technology only internally. In addition, if multidisciplinary research on nuclear power is carried out, reasonable alternatives can be prepared to sustain the nuclear industry.
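
Two of the measures named in this abstract, TF-IDF and degree centrality, can be sketched in plain Python. The abstract does not give the exact weighting or the way the keyword network was built, so the sketch assumes a common textbook TF-IDF variant (relative term frequency times log of inverse document frequency) and a co-occurrence graph in which two terms are linked when they appear in the same article. The token lists are invented examples.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def degree_centrality(docs):
    """Normalized degree of each term in a document-level co-occurrence graph."""
    neighbors = {}
    for doc in docs:
        terms = set(doc)
        for t in terms:
            neighbors.setdefault(t, set())
        # Link every pair of distinct terms that share a document.
        for a, b in combinations(terms, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {t: (len(nb) / (n - 1) if n > 1 else 0.0) for t, nb in neighbors.items()}


# Invented, already-tokenized articles for illustration only.
docs = [
    ["pyroprocessing", "safety", "policy"],
    ["pyroprocessing", "disposal"],
    ["safety", "policy"],
]
weights = tf_idf(docs)
centrality = degree_centrality(docs)
```

With this weighting, a term that appears in every document gets weight zero, so TF-IDF highlights period-specific keywords, while degree centrality highlights terms that co-occur with many others.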

Crepe Search System Design using Web Crawling (웹 크롤링 이용한 크레페 검색 시스템 설계)

  • Kim, Hyo-Jong;Han, Kun-Hee;Shin, Seung-Soo
    • Journal of Digital Convergence / v.15 no.11 / pp.261-269 / 2017
  • The purpose of this paper is to design a search system that accesses the web in real time, without using a database server, in order to guarantee up-to-date information in a single network, rather than using a plurality of bots connected by a wide area network. The research method is to design and analyze a system that can search for persons and keywords quickly and accurately in the crepe system. In the crepe server, when a user registers information, the body tag matching conversion process stores all the information as it is, since various styles such as font, font size, and color are applied by each user. The crepe server itself does not have a body tag matching problem. However, when the crepe retrieval system is executed, users' styles and characteristics cannot be formalized. This problem can be solved by using the html_img_parser function and the Go language html parser package. By applying queues and multiple threads to a general-purpose web crawler, rather than designing a web crawler that targets a specific site, it is possible to utilize a multiplier that quickly and efficiently searches and collects various web sites in various applications.
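
The idea of applying a queue and multiple threads to a general-purpose crawler can be sketched as follows. This is not the paper's implementation: the paper refers to Go and an `html_img_parser` function, whereas this sketch uses Python's standard library for consistency with the other examples in this listing. The `crawl` function, its parameters, and the in-memory `SITE` mapping are hypothetical; a real `fetch` would issue HTTP requests.

```python
import queue
import threading
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, num_workers=4, max_pages=100):
    """Crawl from seed using a shared work queue and worker threads.

    fetch(url) -> HTML string is supplied by the caller (hypothetical here).
    At most max_pages distinct URLs are ever queued.
    """
    frontier = queue.Queue()
    seen = {seed}
    pages = {}
    lock = threading.Lock()
    frontier.put(seed)

    def worker():
        while True:
            url = frontier.get()
            try:
                if url is None:  # sentinel: no more work
                    return
                html = fetch(url)
                parser = LinkParser()
                parser.feed(html)
                with lock:
                    pages[url] = html
                    for link in parser.links:
                        if link not in seen and len(seen) < max_pages:
                            seen.add(link)
                            frontier.put(link)
            except Exception:
                pass  # skip pages that fail to fetch or parse
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    frontier.join()  # returns once every queued URL has been processed
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return pages


# In-memory stand-in for a web site, so the sketch runs without network access.
SITE = {
    "/": '<a href="/a">A</a><a href="/b">B</a>',
    "/a": '<a href="/b">B</a><a href="/">home</a>',
    "/b": "<p>leaf</p>",
}
pages = crawl("/", SITE.__getitem__, num_workers=3)
```

New links are queued before the current URL is marked done, so `frontier.join()` cannot return while work is still being discovered; the `seen` set, guarded by the lock, keeps each URL from being fetched twice.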

Textile material classification in clothing images using deep learning (딥러닝을 이용한 의류 이미지의 텍스타일 소재 분류)

  • So Young Lee;Hye Seon Jeong;Yoon Sung Choi;Choong Kwon Lee
    • Smart Media Journal / v.12 no.7 / pp.43-51 / 2023
  • As online transactions increase, the image of clothing has a great influence on consumer purchasing decisions. The importance of image information for clothing materials has been emphasized, and it is important for the fashion industry to analyze clothing images and grasp the materials used. Textile materials used for clothing are difficult to identify with the naked eye, and much time and cost are consumed in sorting. This study aims to classify the materials of textiles from clothing images based on deep learning algorithms. Classifying materials can help reduce clothing production costs, increase the efficiency of the manufacturing process, and contribute to the service of recommending products of specific materials to consumers. We used machine vision-based deep learning algorithms ResNet and Vision Transformer to classify clothing images. A total of 760,949 images were collected and preprocessed to detect abnormal images. Finally, a total of 167,299 clothing images, 19 textile labels and 20 fabric labels were used. We used ResNet and Vision Transformer to classify clothing materials and compared the performance of the algorithms with the Top-k Accuracy Score metric. As a result of comparing the performance, the Vision Transformer algorithm outperforms ResNet.
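
The Top-k Accuracy Score used to compare the two models counts a prediction as correct when the true label is among the k highest-scored classes. A minimal plain-Python sketch of that definition follows; the score matrix and labels are invented for illustration and are not the study's data.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k top-scored classes.

    scores: one list of per-class scores per sample.
    labels: the true class index for each sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Invented scores for three samples over three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy is a forgiving metric for tasks with many visually similar classes, such as fabric types, because a near-miss ranked second still counts when k is 2 or more.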

Analysis of Topic Changes in Metaverse Application Reviews Before and After the COVID-19 Pandemic Using Causal Impact Analysis Techniques (Causal Impact 분석 기법을 접목한 COVID-19 팬데믹 전·후 메타버스 애플리케이션 리뷰의 토픽 변화 분석)

  • Lee, Sowon;Mijin Noh;MuMoungCho Han;YangSok Kim
    • Smart Media Journal / v.13 no.1 / pp.36-44 / 2024
  • The metaverse is attracting attention with the development of virtual environment technology and the emergence of untact culture due to the COVID-19 pandemic. In this study, by analyzing user reviews of the "Zepeto" application, which has recently attracted attention as a metaverse service, we sought to identify changes in the requirements for the metaverse after the COVID-19 pandemic. To this end, 109,662 reviews of the "Zepeto" application written on the Google Play Store from September 2018 to March 2023 were collected, topics were extracted using the LDA topic modeling technique, and the Causal Impact technique was used to examine how the topics changed before and after March 11, 2020, when the COVID-19 pandemic was declared. The analysis extracted five topics: application functional problems (topic 1), security problems (topic 2), complaints about the in-application cryptocurrency (Zem) (topic 3), application performance (topic 4), and personal information-related problems (topic 5). Among them, security problems (topic 2) were confirmed to be the most affected by the COVID-19 pandemic.