• Title/Summary/Keyword: big data mining

Search Result 686, Processing Time 0.026 seconds

Empowering Agriculture: Exploring User Sentiments and Suggestions for Plantix, a Smart Farming Application

  • Mee Qi Siow;Mu Moung Cho Han;Yu Na Lee;Seon Yeong Yu;Mi Jin Noh;Yang Sok Kim
    • Smart Media Journal
    • /
    • v.12 no.10
    • /
    • pp.38-46
    • /
    • 2023
  • Farming activities are transforming from traditional skill-based agriculture into knowledge-based and technology-driven digital agriculture. The use of intelligent information and communication technology introduces the idea of smart farming that enables farmers to collect weather data, monitor crop growth remotely and detect crop diseases easily. The introduction of Plantix, a pest and disease management tool in the form of a mobile application has allowed farmers to identify pests and diseases of the crop using their mobile devices. Hence, this study collected the reviews of Plantix to explore the response of the users on the Google Play Store towards the application through Latent Dirichlet Allocation (LDA) topic modeling. Results indicate four latent topics in the reviews: two positive evaluations (compliments, appreciation) and two suggestions (plant options, recommendations). We found the users suggested the application to additional plant options and additional features that might help the farmers with their difficulties. In addition, the application is expected to benefit the farmer more by having an early alert of diseases to farmers and providing various substitutes and a list of components for the remedial measures.

An Efficient Estimation of Place Brand Image Power Based on Text Mining Technology (텍스트마이닝 기반의 효율적인 장소 브랜드 이미지 강도 측정 방법)

  • Choi, Sukjae;Jeon, Jongshik;Subrata, Biswas;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.113-129
    • /
    • 2015
  • Location branding is a very important income making activity, by giving special meanings to a specific location while producing identity and communal value which are based around the understanding of a place's location branding concept methodology. Many other areas, such as marketing, architecture, and city construction, exert an influence creating an impressive brand image. A place brand which shows great recognition to both native people of S. Korea and foreigners creates significant economic effects. There has been research on creating a strategically and detailed place brand image, and the representative research has been carried out by Anholt who surveyed two million people from 50 different countries. However, the investigation, including survey research, required a great deal of effort from the workforce and required significant expense. As a result, there is a need to make more affordable, objective and effective research methods. The purpose of this paper is to find a way to measure the intensity of the image of the brand objective and at a low cost through text mining purposes. The proposed method extracts the keyword and the factors constructing the location brand image from the related web documents. In this way, we can measure the brand image intensity of the specific location. The performance of the proposed methodology was verified through comparison with Anholt's 50 city image consistency index ranking around the world. Four methods are applied to the test. First, RNADOM method artificially ranks the cities included in the experiment. HUMAN method firstly makes a questionnaire and selects 9 volunteers who are well acquainted with brand management and at the same time cities to evaluate. Then they are requested to rank the cities and compared with the Anholt's evaluation results. TM method applies the proposed method to evaluate the cities with all evaluation criteria. TM-LEARN, which is the extended method of TM, selects significant evaluation items from the items in every criterion. Then the method evaluates the cities with all selected evaluation criteria. RMSE is used to as a metric to compare the evaluation results. Experimental results suggested by this paper's methodology are as follows: Firstly, compared to the evaluation method that targets ordinary people, this method appeared to be more accurate. Secondly, compared to the traditional survey method, the time and the cost are much less because in this research we used automated means. Thirdly, this proposed methodology is very timely because it can be evaluated from time to time. Fourthly, compared to Anholt's method which evaluated only for an already specified city, this proposed methodology is applicable to any location. Finally, this proposed methodology has a relatively high objectivity because our research was conducted based on open source data. As a result, our city image evaluation text mining approach has found validity in terms of accuracy, cost-effectiveness, timeliness, scalability, and reliability. The proposed method provides managers with clear guidelines regarding brand management in public and private sectors. As public sectors such as local officers, the proposed method could be used to formulate strategies and enhance the image of their places in an efficient manner. Rather than conducting heavy questionnaires, the local officers could monitor the current place image very shortly a priori, than may make decisions to go over the formal place image test only if the evaluation results from the proposed method are not ordinary no matter what the results indicate opportunity or threat to the place. Moreover, with co-using the morphological analysis, extracting meaningful facets of place brand from text, sentiment analysis and more with the proposed method, marketing strategy planners or civil engineering professionals may obtain deeper and more abundant insights for better place rand images. In the future, a prototype system will be implemented to show the feasibility of the idea proposed in this paper.

A CF-based Health Functional Recommender System using Extended User Similarity Measure (확장된 사용자 유사도를 이용한 CF-기반 건강기능식품 추천 시스템)

  • Sein Hong;Euiju Jeong;Jaekyeong Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.1-17
    • /
    • 2023
  • With the recent rapid development of ICT(Information and Communication Technology) and the popularization of digital devices, the size of the online market continues to grow. As a result, we live in a flood of information. Thus, customers are facing information overload problems that require a lot of time and money to select products. Therefore, a personalized recommender system has become an essential methodology to address such issues. Collaborative Filtering(CF) is the most widely used recommender system. Traditional recommender systems mainly utilize quantitative data such as rating values, resulting in poor recommendation accuracy. Quantitative data cannot fully reflect the user's preference. To solve such a problem, studies that reflect qualitative data, such as review contents, are being actively conducted these days. To quantify user review contents, text mining was used in this study. The general CF consists of the following three steps: user-item matrix generation, Top-N neighborhood group search, and Top-K recommendation list generation. In this study, we propose a recommendation algorithm that applies an extended similarity measure, which utilize quantified review contents in addition to user rating values. After calculating review similarity by applying TF-IDF, Word2Vec, and Doc2Vec techniques to review content, extended similarity is created by combining user rating similarity and quantified review contents. To verify this, we used user ratings and review data from the e-commerce site Amazon's "Health and Personal Care". The proposed recommendation model using extended similarity measure showed superior performance to the traditional recommendation model using only user rating value-based similarity measure. In addition, among the various text mining techniques, the similarity obtained using the TF-IDF technique showed the best performance when used in the neighbor group search and recommendation list generation step.

The Analysis on the Relationship between Firms' Exposures to SNS and Stock Prices in Korea (기업의 SNS 노출과 주식 수익률간의 관계 분석)

  • Kim, Taehwan;Jung, Woo-Jin;Lee, Sang-Yong Tom
    • Asia pacific journal of information systems
    • /
    • v.24 no.2
    • /
    • pp.233-253
    • /
    • 2014
  • Can the stock market really be predicted? Stock market prediction has attracted much attention from many fields including business, economics, statistics, and mathematics. Early research on stock market prediction was based on random walk theory (RWT) and the efficient market hypothesis (EMH). According to the EMH, stock market are largely driven by new information rather than present and past prices. Since it is unpredictable, stock market will follow a random walk. Even though these theories, Schumaker [2010] asserted that people keep trying to predict the stock market by using artificial intelligence, statistical estimates, and mathematical models. Mathematical approaches include Percolation Methods, Log-Periodic Oscillations and Wavelet Transforms to model future prices. Examples of artificial intelligence approaches that deals with optimization and machine learning are Genetic Algorithms, Support Vector Machines (SVM) and Neural Networks. Statistical approaches typically predicts the future by using past stock market data. Recently, financial engineers have started to predict the stock prices movement pattern by using the SNS data. SNS is the place where peoples opinions and ideas are freely flow and affect others' beliefs on certain things. Through word-of-mouth in SNS, people share product usage experiences, subjective feelings, and commonly accompanying sentiment or mood with others. An increasing number of empirical analyses of sentiment and mood are based on textual collections of public user generated data on the web. The Opinion mining is one domain of the data mining fields extracting public opinions exposed in SNS by utilizing data mining. There have been many studies on the issues of opinion mining from Web sources such as product reviews, forum posts and blogs. In relation to this literatures, we are trying to understand the effects of SNS exposures of firms on stock prices in Korea. Similarly to Bollen et al. [2011], we empirically analyze the impact of SNS exposures on stock return rates. We use Social Metrics by Daum Soft, an SNS big data analysis company in Korea. Social Metrics provides trends and public opinions in Twitter and blogs by using natural language process and analysis tools. It collects the sentences circulated in the Twitter in real time, and breaks down these sentences into the word units and then extracts keywords. In this study, we classify firms' exposures in SNS into two groups: positive and negative. To test the correlation and causation relationship between SNS exposures and stock price returns, we first collect 252 firms' stock prices and KRX100 index in the Korea Stock Exchange (KRX) from May 25, 2012 to September 1, 2012. We also gather the public attitudes (positive, negative) about these firms from Social Metrics over the same period of time. We conduct regression analysis between stock prices and the number of SNS exposures. Having checked the correlation between the two variables, we perform Granger causality test to see the causation direction between the two variables. The research result is that the number of total SNS exposures is positively related with stock market returns. The number of positive mentions of has also positive relationship with stock market returns. Contrarily, the number of negative mentions has negative relationship with stock market returns, but this relationship is statistically not significant. This means that the impact of positive mentions is statistically bigger than the impact of negative mentions. We also investigate whether the impacts are moderated by industry type and firm's size. We find that the SNS exposures impacts are bigger for IT firms than for non-IT firms, and bigger for small sized firms than for large sized firms. The results of Granger causality test shows change of stock price return is caused by SNS exposures, while the causation of the other way round is not significant. Therefore the correlation relationship between SNS exposures and stock prices has uni-direction causality. The more a firm is exposed in SNS, the more is the stock price likely to increase, while stock price changes may not cause more SNS mentions.

Time Series Analysis of Park Use Behavior Utilizing Big Data - Targeting Olympic Park - (빅데이터를 활용한 공원 이용행태의 시계열분석 - 올림픽공원을 대상으로 -)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.46 no.2
    • /
    • pp.27-36
    • /
    • 2018
  • This study suggests the necessity of behavior analysis as changes to a park environment to reflect user desires can be implemented only by grasping the needs of park users. Online data (blog) were defined as the basic data of the study. After collecting data by 5 - year units, data mining was used to derive the characteristics of the time series behavior while the significance of the online data was verified through social network analysis. The results of the text mining analysis are as follows. First, primary results included 'walking', 'photography', 'riding bicycles'(inline, kickboard, etc.), and 'eating'. Second, in the early days of the collected data, active physical activity such as exercise was the main factor, but recent passive behavior such as eating, using a mobile phone, games, food and drinking coffee also appeared as a new behavior characteristic in parks. Third, the factors affecting the behavior of park users are the changes of various conditions of society such as internet development and a culture of expressing unique personalities and styles. Fourth, the special behaviors appearing at Olympic Park were derived from educational activities such as cultural activities including watching performances and history lessons. In conclusion, it has been shown that people's lifestyle changes and the behavior of a park are influenced by the changes of the various times rather than the original purpose that was intended during park planning and design. Therefore, it is necessary to create an environment tailored to users by considering the main behaviors and influencing factors of Olympic Park. Text mining used as an analytical method has the merit that past data can be collected. Therefore, it is possible to form analysis from a long-term viewpoint of behavior analysis as well as to measure new behavior and value with derived keywords. In addition, the validity of online data was verified through social network analysis to increase the legitimacy of research results. Research on more comprehensive behavior analysis should be carried out by diversifying the types of data collected later, and various methods for verifying the accuracy and reliability of large-volume data will be needed.

Research of Semantic Considered Tree Mining Method for an Intelligent Knowledge-Services Platform

  • Paik, Juryon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.5
    • /
    • pp.27-36
    • /
    • 2020
  • In this paper, we propose a method to derive valuable but hidden infromation from the data which is the core foundation in the 4th Industrial Revolution to pursue knowledge-based service fusion. The hyper-connected societies characterized by IoT inevitably produce big data, and with the data in order to derive optimal services for trouble situations it is first processed by discovering valuable information. A data-centric IoT platform is a platform to collect, store, manage, and integrate the data from variable devices, which is actually a type of middleware platforms. Its purpose is to provide suitable solutions for challenged problems after processing and analyzing the data, that depends on efficient and accurate algorithms performing the work of data analysis. To this end, we propose specially designed structures to store IoT data without losing the semantics and provide algorithms to discover the useful information with several definitions and proofs to show the soundness.

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.487-492
    • /
    • 2017
  • Clustering is a technique which is used to measure similarities between data in big data analysis and data mining field. Among various clustering methods, k-Modes algorithm is representatively used for categorical data. To increase the performance of iterative-centric tasks such as k-Modes, a distributed and concurrent framework Spark has been received great attention recently because it overcomes the limitation of Hadoop. Spark provides an environment that can process large amount of data in main memory using the concept of abstract objects called RDD. Spark provides Mllib, a dedicated library for machine learning, but Mllib only includes k-means that can process only continuous data, so there is a limitation that categorical data processing is impossible. In this paper, we design RDD for k-Modes algorithm for categorical data clustering in spark environment and implement an algorithm that can operate effectively. Experiments show that the proposed algorithm increases linearly in the spark environment.

A Study on the Perception of Pit and Fissure Sealant using Unstructured Big Data (비정형 빅데이터를 이용한 치면열구전색(치아홈메우기)에 대한 인식분석)

  • Han-A Cho
    • Journal of Korean Dental Hygiene Science
    • /
    • v.6 no.2
    • /
    • pp.101-114
    • /
    • 2023
  • Background: This study aimed to explore the overall perception of pit and fissure sealants and suggest methods to revitalize their current stagnation. Methods: To determine the social perception of the change in coverage policy for pit and fissure sealants, we categorized them into five time periods. The first period (December 1, 2009 to November 30, 2010), the second period (December 1, 2010 to September 30, 2012), the third period (October 1, 2012 to May 5, 2013), the fourth period (May 6, 2013 to September 30, 2017), and the fifth period (October 1, 2017 to December 31, 2022). We utilized text mining, an unstructured big data analysis method. Keywords were collected and analyzed using Textom, and the frequency analysis of the top 30 keywords, structural features of the semantic network, centrality analysis, QAP correlation analysis, and co-occurrence analysis were conducted. Results: The frequency analysis showed that the top keywords for each time period were 'Cavities', 'Treatment', and 'Children'. In the structural features of the semantic network of pit and fissure sealants by time period, the density index was found to be around 1.00 for all time periods. The QAP correlation analysis showed the highest correlation between the first and second periods and the fourth and fifth periods with a correlation coefficient of 0.834. The co-occurrence analysis showed that 'cavities' and 'prevention were the top two words across all time periods. Conclusion: This study showed that pit and fissure sealants are well accepted by the society as a preventive treatment for caries. However, the awareness of health education related to these sealants was found to be low. Efforts to revitalize stagnant pit and fissure sealants need to be strengthened with effective education.

Text Mining-Based Analysis for Research Trends in Vocational Studies (텍스트 마이닝을 활용한 직업학 연구동향 분석)

  • Yook, Dong-In
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.3
    • /
    • pp.586-599
    • /
    • 2017
  • This study attempts to understand the overall research trends in Vocational Studies using a text mining method, which is a means to analyze big data. The findings of the research show that Vocational Studies in Korea has been directly influenced by global economic crises, as evidenced by its exponential growth after the 1997 foreign exchange crisis that resulted in a bailout from the IMF. In addition, the topics of research have been shifting from such macro subjects as government policies and systems to such micro topics as individual career development. Moreover, the perspective of research is being moved from the socially vulnerable, including women and the disabled, to the economically marginalized, including retirees and the unemployed. As for the research targets, college students overwhelmingly outnumbered primary and secondary school students. However, few cases analyzed the clinical outcomes of career counseling or attempted to process job information and study the history of jobs. This research is limited in that it only analyzed journal abstracts. Nonetheless, it is meaningful because it used topic analysis, one of the text mining methods, to give a complete enumeration of all articles available for search, thereby crafting a framework of quantitative analysis methodology for Vocational Studies. It is also significant in that it is the first attempt to analyze themes in every stage of the development of Vocational Studies.

An exploratory study for the development of a education framework for supporting children's development in the convergence of "art activity" and "language activity": Focused on Text mining method ('미술'과 '언어' 활동 융합형의 아동 발달지원 교육 프레임워크 개발을 위한 탐색적 연구: 텍스트 마이닝을 중심으로)

  • Park, Yunmi;Kim, Sijeong
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.3
    • /
    • pp.297-304
    • /
    • 2021
  • This study aims not only to access the visual thought-oriented approach that has been implemented in established art therapy and education but also to integrate language education and therapeutic approach to support the development of school-age children. Thus, text mining technique was applied to search for areas where different areas of language and art can be integrated. This research was conducted in accordance with the procedure of basic research, preliminary DB construction, text screening, DB pre-processing and confirmation, stop-words removing, text mining analysis and the deduction about the convergent areas. These results demonstrated that this study draws convergence areas related to regional, communication, and learning functions, areas related to problem solving and sensory organs, areas related to art and intelligence, areas related to information and communication, areas related to home and disability, topics, conceptualization, peer-related areas, integration, reorganization, attitudes. In conclusion, this study is meaningful in that it established a framework for designing an activity-centered convergence program of art and language in the future and attempted a holistic approach to support child development.