• Title/Summary/Keyword: big data mining

Search Result 679, Processing Time 0.023 seconds

An Analysis on Media Trends in Public Agency for Social Service Applying Text Mining (텍스트 마이닝을 적용한 사회서비스원 언론보도기사 분석)

  • Park, Hae-Keung;Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.2
    • /
    • pp.41-48
    • /
    • 2022
  • This study tried to empirically explore which issues related to the social service agency for public(as below SSA), that is, social perceptions were formed, by using mess media related to the SSA. This study is meaningful in that it identifies the overall social perception and trend of SSA through public opinion. In order to extract media trend data, the search used the big data analysis system, Textom, to collect data from the representative portals Naver News and Daum News. The collected texts were 1,299 in 2020 and 1,410 in 2021, for a total of 2,709. As a result of the analysis, first, the most derived words in relation to the frequency of text appearance were 'SSA', 'establishment', and 'operation'. Second, as a result of the N-gram analysis, the pairs of words directly related to the SSA 'SSA and public', 'SSA and opening', 'SSA and launch', and 'SSA and Department Director', 'SSA and Staff', 'SSA and Caregiver' etc. Third, in the results of TF-IDF analysis and word network analysis, similar to the word occurrence frequency and N-gram results, 'establishment', 'operation', 'public', 'launch', 'provided', 'opened', ' 'Holding' and 'Care' were derived. Based on the above analysis results, it was suggested to strengthen the emergency care support group, to commercialize it in detail, and to stabilize jobs.

AI Technology Analysis using Partial Least Square Regression

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.109-115
    • /
    • 2020
  • In this paper, we propose an artificial intelligence(AI) technology analysis using partial least square(PLS) regression model. AI technology is now affecting most areas of our society. So, it is necessary to understand this technology. To analyze the AI technology, we collect the patent documents related to AI from the patent databases in the world. We extract AI technology keywords from the patent documents by text mining techniques. In addition, we analyze the AI keyword data by PLS regression model. This regression model is based on the technique of partial least squares used in the advanced analyses such as bioinformatics, social science, and engineering. To show the performance of our proposed method, we make experiments using AI patent documents, and we illustrate how our research can be applied to real problems. This paper is applicable not only to AI technology but also to other technological fields. This also contributes to understanding other various technologies by PLS regression analysis.

A Tensor Space Model based Semantic Search Technique (텐서공간모델 기반 시멘틱 검색 기법)

  • Hong, Kee-Joo;Kim, Han-Joon;Chang, Jae-Young;Chun, Jong-Hoon
    • The Journal of Society for e-Business Studies
    • /
    • v.21 no.4
    • /
    • pp.1-14
    • /
    • 2016
  • Semantic search is known as a series of activities and techniques to improve the search accuracy by clearly understanding users' search intent without big cognitive efforts. Usually, semantic search engines requires ontology and semantic metadata to analyze user queries. However, building a particular ontology and semantic metadata intended for large amounts of data is a very time-consuming and costly task. This is why commercialization practices of semantic search are insufficient. In order to resolve this problem, we propose a novel semantic search method which takes advantage of our previous semantic tensor space model. Since each term is represented as the 2nd-order 'document-by-concept' tensor (i.e., matrix), and each concept as the 2nd-order 'document-by-term' tensor in the model, our proposed semantic search method does not require to build ontology. Nevertheless, through extensive experiments using the OHSUMED document collection and SCOPUS journal abstract data, we show that our proposed method outperforms the vector space model-based search method.

In-memory Compression Scheme Based on Incremental Frequent Patterns for Graph Streams (그래프 스트림 처리를 위한 점진적 빈발 패턴 기반 인-메모리 압축 기법)

  • Lee, Hyeon-Byeong;Shin, Bo-Kyoung;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.1
    • /
    • pp.35-46
    • /
    • 2022
  • Recently, with the development of network technologies, as IoT and social network service applications have been actively used, a lot of graph stream data is being generated. In this paper, we propose a graph compression scheme that considers the stream graph environment by applying graph mining to the existing compression technique, which has been focused on compression rate and runtime. In this paper, we proposed Incremental frequent pattern based compression technique for graph streams. Since the proposed scheme keeps only the latest reference patterns, it increases the storage utilization and improves the query processing time. In order to show the superiority of the proposed scheme, various performance evaluations are performed in terms of compression rate and processing time compared to the existing method. The proposed scheme is faster than existing similar scheme when the number of duplicated data is large.

Pattern Analysis of Apartment Price Using Self-Organization Map (자기조직화지도를 통한 아파트 가격의 패턴 분석)

  • Lee, Jiyoung;Ryu, Jae Pil
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.11
    • /
    • pp.27-33
    • /
    • 2021
  • With increasing interest in key areas of the 4th industrial revolution such as artificial intelligence, deep learning and big data, scientific approaches have developed in order to overcome the limitations of traditional decision-making methodologies. These scientific techniques are mainly used to predict the direction of financial products. In this study, the factors of apartment prices, which are of high social interest, were analyzed through SOM. For this analysis, we extracted the real prices of the apartments and selected a total of 16 input variables that would affect these prices. The data period was set from 1986 to 2021. As a result of examining the characteristics of the variables during the rising and faltering periods of the apartment prices, it was found that the statistical tendencies of the input variables of the rising and the faltering periods were clearly distinguishable. I hope this study will help us analyze the status of the real estate market and study future predictions through image learning.

Comparative analysis of Lecture Evaluation using Decision Tree: Ways to Improve University Classes after COVID-19

  • Bok-Ju Jung;Sang-Chul Lee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.4
    • /
    • pp.197-208
    • /
    • 2023
  • In this study, we attempted to examine the changing ways of thinking about lecture evaluation before and after COVID-19. To this end, decision tree analysis(Decision Tree) was used among data mining techniques based on lecture evaluation data for liberal arts and major classes conducted before and after COVID-19 for A university. According to the results of the study, liberal arts changed from 'method' to 'content', and 'knowledge improvement' was an important factor both before and after majors. In particular, 'Assignment' was found to be an important factor after the COVID-19 in common in the evaluation of lectures in the liberal arts department, which means that in the future, professors will be provided with appropriate teaching methods during class, interaction with students, and feedback on assignments or test results, indicates the need for competence. Based on the results of this study, a plan to improve communication with students and activation of blended learning was suggested.

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has become a welcomed emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews from a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide on whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as datamining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques, such as the decision tree, neural networks, and support vector machines (SVMs). is used to determine the attitude, position, and sensibility of people who write articles about various topics that are published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing the opinions of customers through their reviews. Sentiment analysis helps with understanding what customers really want instantly through the help of automated text mining techniques. Sensitivity analysis utilizes text mining techniques on text on the Web to extract subjective information in the text for text analysis. Sensitivity analysis is utilized to determine the attitudes or positions of the person who wrote the article and presented their opinion about a particular topic. In this study, we developed a model that selects a hot topic from user posts at China's online stock forum by using the k-means algorithm and self-organizing map (SOM). In addition, we developed a detecting model to predict a hot topic by using machine learning techniques such as logit, the decision tree, and SVM. We employed sensitivity analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sensitivity analysis calculates a sentimental value from a document based on contrast and classification according to the polarity sentimental dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movement by analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to utilize sentiment analysis. One hundred forty-four topics were selected among 21 categories at online forums about stock. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. The interest index was defined to select the hot topics, and the k-means algorithm and SOM presented equivalent results with this data. We developed a decision tree model to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel function by a grid search to detect the hot topics. The detection of hot topics by using sentiment analysis provides the latest trends and hot topics in the stock forum for investors so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.

Development of Machine Learning-Based Platform for Distillation Column (증류탑을 위한 머신러닝 기반 플랫폼 개발)

  • Oh, Kwang Cheol;Kwon, Hyukwon;Roh, Jiwon;Choi, Yeongryeol;Park, Hyundo;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research
    • /
    • v.58 no.4
    • /
    • pp.565-572
    • /
    • 2020
  • This study developed a software platform using machine learning of artificial intelligence to optimize the distillation column system. The distillation column is representative and core process in the petrochemical industry. Process stabilization is difficult due to various operating conditions and continuous process characteristics, and differences in process efficiency occur depending on operator skill. The process control based on the theoretical simulation was used to overcome this problem, but it has a limitation which it can't apply to complex processes and real-time systems. This study aims to develop an empirical simulation model based on machine learning and to suggest an optimal process operation method. The development of empirical simulations involves collecting big data from the actual process, feature extraction through data mining, and representative algorithm for the chemical process. Finally, the platform for the distillation column was developed with verification through a developed model and field tests. Through the developed platform, it is possible to predict the operating parameters and provided optimal operating conditions to achieve efficient process control. This study is the basic study applying the artificial intelligence machine learning technique for the chemical process. After application on a wide variety of processes and it can be utilized to the cornerstone of the smart factory of the industry 4.0.

A Study on the Privacy Awareness through Bigdata Analysis (빅데이터 분석을 통한 프라이버시 인식에 관한 연구)

  • Lee, Song-Yi;Kim, Sung-Won;Lee, Hwan-Soo
    • Journal of Digital Convergence
    • /
    • v.17 no.10
    • /
    • pp.49-58
    • /
    • 2019
  • In the era of the 4th industrial revolution, the development of information technology brought various benefits, but it also increased social interest in privacy issues. As the possibility of personal privacy violation by big data increases, academic discussion about privacy management has begun to be active. While the traditional view of privacy has been defined at various levels as the basic human rights, most of the recent research trends are mainly concerned only with the information privacy of online privacy protection. This limited discussion can distort the theoretical concept and the actual perception, making the academic and social consensus of the concept of privacy more difficult. In this study, we analyze the privacy concept that is exposed on the internet based on 12,000 news data of the portal site for the past one year and compare the difference between the theoretical concept and the socially accepted concept. This empirical approach is expected to provide an understanding of the changing concept of privacy and a research direction for the conceptualization of privacy for current situations.

Does Artificial Intelligence Algorithm Discriminate Certain Groups of Humans? (인공지능 알고리즘은 사람을 차별하는가?)

  • Oh, Yoehan;Hong, Sungook
    • Journal of Science and Technology Studies
    • /
    • v.18 no.3
    • /
    • pp.153-216
    • /
    • 2018
  • The contemporary practices of Big-Data based automated decision making algorithms are widely deployed not just because we expect algorithmic decision making might distribute social resources in a more efficient way but also because we hope algorithms might make fairer decisions than the ones humans make with their prejudice, bias, and arbitrary judgment. However, there are increasingly more claims that algorithmic decision making does not do justice to those who are affected by the outcome. These unfair examples bring about new important questions such as how decision making was translated into processes and which factors should be considered to constitute to fair decision making. This paper attempts to delve into a bunch of research which addressed three areas of algorithmic application: criminal justice, law enforcement, and national security. By doing so, it will address some questions about whether artificial intelligence algorithm discriminates certain groups of humans and what are the criteria of a fair decision making process. Prior to the review, factors in each stage of data mining that could, either deliberately or unintentionally, lead to discriminatory results will be discussed. This paper will conclude with implications of this theoretical and practical analysis for the contemporary Korean society.