• Title/Summary/Keyword: Unstructured data analysis


Analysis of Performance of Creative Education based on Twitter Big Data Analysis (트위터 빅데이터 분석을 통한 창의적 교육의 성과요인 분석)

  • Joo, Kilhong
    • Journal of Creative Information Culture / v.5 no.3 / pp.215-223 / 2019
  • As the information age accelerates, big data accumulates in many forms, such as large-capacity text, audio, and video, and fusion analysis solutions that can make use of this knowledge data are increasing. Falling data storage costs and the development of social network services (SNS) have expanded data in both quantity and quality. This situation makes it possible to use data that previously went unused, and the potential value and influence of data are growing. Research is actively under way to present future-oriented education systems by applying such fusion analysis systems to the improvement of the educational system. In this study, we conducted a big data analysis of Twitter, applying natural language analysis and word frequency analysis to the data to measure quantitatively how the problems and outcomes of domestic creative education appear in it.
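The word frequency analysis mentioned in this abstract can be illustrated with a minimal sketch. The sample posts, the stop-word list, and the function name `word_frequencies` are all hypothetical; the study's actual data and tooling are not described in the abstract, and real Korean text would additionally need a morphological analyzer, which this sketch omits.

```python
import re
from collections import Counter

# Hypothetical sample posts; the study's Twitter data is not reproduced here.
posts = [
    "Creative education needs more project time",
    "Project based classes make education creative",
    "More time for creative projects please",
]

# Illustrative stop-word list (an assumption, not taken from the study).
STOP_WORDS = {"for", "more", "needs", "make", "please"}


def word_frequencies(texts, stop_words=frozenset()):
    """Count lowercase word tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts


freq = word_frequencies(posts, STOP_WORDS)
print(freq.most_common(3))
```

Ranking terms by count in this way gives a simple quantitative view of which topics dominate a collection of posts.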

An Insight Study on Keyword of IoT Utilizing Big Data Analysis (빅데이터 분석을 활용한 사물인터넷 키워드에 관한 조망)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.146-147 / 2017
  • Big data analysis is a technique for effectively analyzing unstructured data from sources such as the Internet, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in databases. Most big data analysis techniques, including data mining, machine learning, natural language processing, and pattern recognition, were already in use in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011. Therefore, companies in most industries are making efforts to create new value through the application of big data. In this study, we used Social Matrics, a big data analysis tool of Daum communications. We analyzed public perceptions of the keyword "Internet of things" over the one-month period as of October 8, 2017. The results of the big data analysis are as follows. First, the top related search keyword for "Internet of things" was found to be technology (995). This study suggests theoretical implications based on the results.


A Study on the Data-Based Organizational Capabilities by Convergence Capabilities Level of Public Data (공공데이터 융합역량 수준에 따른 데이터 기반 조직 역량의 연구)

  • Jung, Byoungho;Joo, Hyungkun
    • Journal of Korea Society of Digital Industry and Information Management / v.18 no.4 / pp.97-110 / 2022
  • The purpose of this study is to analyze the level of public data convergence capabilities of administrative organizations and to explore the important variables in data-based organizational capabilities. The theoretical background covers public data and its use activation, joint use, convergence, administrative organizations, and convergence constraints, explained with reference to the Public Data Act, the Electronic Government Act, and the Data-Based Administrative Act. The research model examines the effect of data-based administrative capability, public data operation capabilities, and public data operation constraints on data-based organizational capabilities. It also tests whether data-based organizational operation capabilities differ by the level of data convergence capabilities. The analysis used hierarchical cluster analysis and multiple regression analysis. The results are as follows. First, hierarchical cluster analysis yielded three groups: a group that uses only public data and structured data, a group that uses public data in both structured and unstructured form, and a group that uses both public and private data. Second, the critical variables of data-based organizational operation capabilities were data-based administrative planning and administrative technology, the supervisory organizations and technical systems for public data convergence, and the data sharing and market transaction constraints. Finally, the essential independent variables for data-based organizational competencies differ by group. As a theoretical implication, this research updates the management information systems literature by explaining the Public Data Act, the Electronic Government Act, and the Data-Based Administrative Act. As a practical implication, activating the use of public data requires establishing data standardization and search convenience and eliminating lukewarm attitudes and selfish behavior toward data sharing.

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines a number of natural language processing methods, such as NER (named entity recognition), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web using a domain-specific data extraction agent. A framework for extracting information from unstructured data was developed using these natural language processing methods. We simulated the performance of our work in the extraction and analysis of texts for the detection of organizational structures. The simulation shows that our approach outperformed other NER classifiers such as MUC and CoNLL on information extraction.

Korean and English Sentiment Analysis Using the Deep Learning

  • Ramadhani, Adyan Marendra;Choi, Hyung Rim;Lim, Seong Bae
    • Journal of Korea Society of Industrial Information Systems / v.23 no.3 / pp.59-71 / 2018
  • Social media has immense popularity among all services today. Data from social network services (SNSs) can be used for various objectives, such as text prediction or sentiment analysis. There is a great deal of Korean and English data on social media that can be used for sentiment analysis, but handling such huge amounts of unstructured data is a difficult task, and machine learning is needed to handle it. This research focuses on predicting Korean and English sentiment using a deep feedforward neural network (DNN) with a deep learning architecture and compares it with other methods, such as LDA MLP and GENSIM, using logistic regression. The research findings indicate an approximately 75% accuracy rate when predicting sentiments using DNN, with a latent Dirichlet allocation (LDA) prediction accuracy rate of approximately 81%, with the corpus being approximately 64% accurate between English and Korean.

Hadoop Security Technologies and Vulnerability Analysis (하둡 보안 기술과 취약점 분석)

  • Kim, A-Yong;He, Yilun;Kim, Han-Kil;Park, Man-Seub;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.05a / pp.681-683 / 2013
  • With the spread of smartphones, the big data era has arrived, and social network services (SNS) such as Facebook and Twitter are used routinely in everyday life. Hadoop, developed under the Apache Foundation, makes it possible to extract, analyze, and make use of unstructured SNS data rather than discarding it. Hadoop is an open source framework that can handle large amounts of data. Hadoop has been introduced into domestic companies and commercial development, but it has been pointed out that its security has lagged behind its technical development. In this paper, we analyze Hadoop's security technologies and vulnerabilities and propose a method to enhance its security.


Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. This research is classified into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical analysis and fundamental analysis. In the big data era, the amount of information has rapidly increased, and artificial intelligence methodologies that find meaning by quantifying text, an unstructured data type that accounts for a large share of information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers forecasts a stock price using the news of the target company. However, according to previous research, not only news of a target company but also news of related companies can affect its stock price. Finding a highly relevant company is not easy because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies based primarily on pre-determined international industry classification standards. However, according to recent research, homogeneity within the sectors of the Global Industry Classification Standard varies, so forecasting stock prices by taking all companies in a sector together, rather than only the relevant ones, can adversely affect predictive performance. To overcome this limitation, we first used random matrix theory with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced. Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of the financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies when cluster analysis is performed on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt theory from physics. This extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important to study not only artificial intelligence algorithms but also how to adjust the input values theoretically. Third, we confirmed that firms classified together under the Global Industry Classification Standard (GICS) might have low relevance and suggested that relevance needs to be defined theoretically rather than simply taken from the GICS.
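One common way random matrix theory is applied to correlation matrices is to compare their eigenvalues against the Marchenko-Pastur band expected for purely random data. The sketch below illustrates that idea only; the abstract does not say which formulation the authors used, and the function names, the eigenvalues, and the 100-by-400 dimensions are hypothetical.

```python
import math


def marchenko_pastur_bounds(n_assets, n_observations, variance=1.0):
    """Return (lambda_min, lambda_max) of the Marchenko-Pastur law.

    Eigenvalues of a correlation matrix built from purely random,
    uncorrelated series are expected to fall inside this interval
    when both dimensions are large. This sketch only handles the
    case of more observations than assets.
    """
    if n_observations <= n_assets:
        raise ValueError("need more observations than assets")
    ratio = n_assets / n_observations
    lam_min = variance * (1 - math.sqrt(ratio)) ** 2
    lam_max = variance * (1 + math.sqrt(ratio)) ** 2
    return lam_min, lam_max


def split_eigenvalues(eigenvalues, n_assets, n_observations):
    """Separate eigenvalues into noise (up to the band's top) and signal (above it)."""
    _, lam_max = marchenko_pastur_bounds(n_assets, n_observations)
    noise = [lam for lam in eigenvalues if lam <= lam_max]
    signal = [lam for lam in eigenvalues if lam > lam_max]
    return noise, signal


# Hypothetical eigenvalues for 100 stocks observed over 400 days.
low, high = marchenko_pastur_bounds(100, 400)
noise, signal = split_eigenvalues([0.4, 1.1, 2.0, 3.5, 25.0], 100, 400)
print(low, high, noise, signal)
```

In applications of this kind, the single largest eigenvalue is often read as the market-wide mode, so a filtered correlation would typically be rebuilt from the remaining above-band eigenvalues; that step requires an eigendecomposition and is left out here.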

A Study on Recognition of Artificial Intelligence Utilizing Big Data Analysis (빅데이터 분석을 활용한 인공지능 인식에 관한 연구)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.129-130 / 2018
  • Big data analysis is a technique for effectively analyzing unstructured data from sources such as the Internet, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in databases. Most big data analysis techniques, including data mining, machine learning, natural language processing, and pattern recognition, were already in use in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011. Therefore, companies in most industries are making efforts to create new value through the application of big data. In this study, we used Social Matrics, a big data analysis tool of Daum communications. We analyzed public perceptions of the keyword "Artificial Intelligence" over the one-month period as of May 19, 2018. The results of the big data analysis are as follows. First, the top related search keyword for "Artificial Intelligence" was found to be technology (4,122). This study suggests theoretical implications based on the results.


A Method for Short Text Classification using SNS Feature Information based on Markov Logic Networks (SNS 특징정보를 활용한 마르코프 논리 네트워크 기반의 단문 텍스트 분류 방법)

  • Lee, Eunji;Kim, Pankoo
    • Journal of Korea Multimedia Society / v.20 no.7 / pp.1065-1072 / 2017
  • As smart devices and social network services (SNSs) become increasingly pervasive, individuals produce large amounts of data in real time. Accordingly, studies on unstructured data analysis are actively being conducted to solve the resultant problem of information overload and to facilitate effective data processing. Many such studies are conducted for filtering inappropriate information. In this paper, a feature-weighting method considering SNS-message features is proposed for the classification of short text messages generated on SNSs, using Markov logic networks for category inference. The performance of the proposed method is verified through a comparison with existing frequency-based classification methods.

An Application of Case-Based Reasoning in Forecasting a Successful Implementation of Enterprise Resource Planning Systems : Focus on Small and Medium sized Enterprises Implementing ERP (성공적인 ERP 시스템 구축 예측을 위한 사례기반추론 응용 : ERP 시스템을 구현한 중소기업을 중심으로)

  • Lim, Se-Hun
    • Journal of Information Technology Applications and Management / v.13 no.1 / pp.77-94 / 2006
  • Case-based Reasoning (CBR) is widely used in business and industry prediction. It is suitable for solving complex and unstructured business problems. Recently, the prediction accuracy of CBR has been enhanced not only by machine learning approaches such as genetic algorithms and relative weighting of Artificial Neural Network (ANN) input variables but also by data mining techniques such as feature selection, feature weighting, feature transformation, and instance selection. As a result, CBR is even more widely used in business today. In this study, we investigated the usefulness of the CBR method in forecasting success in implementing ERP systems. We used a CBR method based on the feature weighting technique to compare the performance of three different models: MDA (Multiple Discriminant Analysis), GECBR (GEneral CBR), and FWCBR (CBR with Feature Weighting supported by Analytic Hierarchy Process). The study suggests that the FWCBR approach is a promising method for forecasting successful ERP implementation in Small and Medium sized Enterprises.
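The general idea of case-based reasoning with feature weighting can be sketched as a weighted nearest-neighbor lookup over past cases. This is an illustration under stated assumptions, not the paper's FWCBR model: the case base, feature values, weights, and the function names `weighted_distance` and `predict` are hypothetical, and the weights are simply fixed here rather than derived with the Analytic Hierarchy Process.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict(case_base, query, weights, k=3):
    """Retrieve the k most similar past cases and return the majority outcome."""
    ranked = sorted(
        case_base,
        key=lambda case: weighted_distance(case[0], query, weights),
    )
    votes = Counter(outcome for _, outcome in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical case base: (features scaled to [0, 1], outcome).
# Feature values are illustrative, not taken from the paper.
case_base = [
    ((0.9, 0.8, 0.7), "success"),
    ((0.8, 0.9, 0.6), "success"),
    ((0.7, 0.7, 0.9), "success"),
    ((0.2, 0.3, 0.4), "failure"),
    ((0.3, 0.1, 0.5), "failure"),
]

# Hypothetical weights; in the paper they are supported by the
# Analytic Hierarchy Process, which is not reproduced here.
weights = (0.5, 0.3, 0.2)

print(predict(case_base, (0.85, 0.75, 0.5), weights))
```

Raising the weight of a feature makes differences in that feature count more when ranking past cases, which is how feature weighting changes which cases are retrieved.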
