• Title/Summary/Keyword: Big data processing

Search Result 1,057, Processing Time 0.024 seconds

Smart farm development strategy suitable for domestic situation -Focusing on ICT technical characteristics for the development of the industry6.0- (국내 실정에 적합한 스마트팜 개발 전략 -6차산업의 발전을 위한 ICT 기술적 특성을 중심으로-)

  • Han, Sang-Ho;Joo, Hyung-Kun
    • Journal of Digital Convergence
    • /
    • v.20 no.4
    • /
    • pp.147-157
    • /
    • 2022
  • This study tried to propose a smart farm technology strategy suitable for the domestic situation, focusing on the differentiation suitable for the domestic situation of ICT technology. In the case of advanced countries in the overseas agricultural industry, it was confirmed that they focused on the development of a specific stage that reflected the geographical characteristics of each country, the characteristics of the agricultural industry, and the characteristics of the people's demand. Confirmed that no enemy development is being performed. Therefore, in response to problems such as a rapid decrease in the domestic rural population, aging population, loss of agricultural price competitiveness, increase in fallow land, and decrease in use rate of arable land, this study aims to develop smart farm ICT technology in the future to create quality agricultural products and have price competitiveness. It was suggested that the smart farm should be promoted by paying attention to the excellent performance, ease of use due to the aging of the labor force, and economic feasibility suitable for a small business scale. First, in terms of economic feasibility, the ICT technology is configured by selecting only the functions necessary for the small farm household (primary) business environment, and the smooth communication system with these is applied to the ICT technology to gradually update the functions required by the actual farmhouse. suggested that it may contribute to the reduction. Second, in terms of performance, it is suggested that the operation accuracy can be increased if attention is paid to improving the communication function of ICT, such as adjusting the difficulty of big data suitable for the aging population in Korea, using a language suitable for them, and setting an algorithm that reflects their prediction tendencies. Third, the level of ease of use. Smart farms based on ICT technology for the development of the Industry6.0 (1.0(Agriculture, Forestry) + 2.0(Agricultural and Water & Water Processing) + 3.0 (Service, Rural Experience, SCM)) perform operations according to specific commands, finally suggested that ease of use can be promoted by presetting and standardizing devices based on big data configuration customized for each regional environment.

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods to handle big data in text mining. For dimensionality reduction, we should consider the density of data, which has a significant influence on the performance of sentence classification. It requires lots of computations for data of higher dimensions. Eventually, it can cause lots of computational cost and overfitting in the model. Thus, the dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed from only lessening the noise of data like misspelling or informal text to including semantic and syntactic information. On top of it, the expression and selection of the text features have impacts on the performance of the classifier for sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find latent space that is representative of raw data from observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, learning low-dimensional vector space representations of words, that can capture semantic and syntactic information from data are also utilized. For improving performance, recent studies have suggested methods that the word dictionary is modified according to the positive and negative score of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm selects the words that are not important, we thought the words that are similar to the selected words also have no impacts on sentence classification. This study proposes two ways to achieve more accurate classification that conduct selective word elimination under specific regulations and construct word embedding based on Word2Vec embedding. To select words having low importance from the text, we use information gain algorithm to measure the importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form word embedding. Second, we select words additionally that are similar to the words that have a low level of information gain values and make word embedding. In the end, these filtered text and word embedding apply to the deep learning models; Convolutional Neural Network and Attention-Based Bidirectional LSTM. This study uses customer reviews on Kindle in Amazon.com, IMDB, and Yelp as datasets, and classify each data using the deep learning models. The reviews got more than five helpful votes, and the ratio of helpful votes was over 70% classified as helpful reviews. Also, Yelp only shows the number of helpful votes. We extracted 100,000 reviews which got more than five helpful votes using a random sampling method among 750,000 reviews. The minimal preprocessing was executed to each dataset, such as removing numbers and special characters from text data. To evaluate the proposed methods, we compared the performances of Word2Vec and GloVe word embeddings, which used all the words. We showed that one of the proposed methods is better than the embeddings with all the words. By removing unimportant words, we can get better performance. However, if we removed too many words, it showed that the performance was lowered. For future research, it is required to consider diverse ways of preprocessing and the in-depth analysis for the co-occurrence of words to measure similarity values among words. Also, we only applied the proposed method with Word2Vec. Other embedding methods such as GloVe, fastText, ELMo can be applied with the proposed methods, and it is possible to identify the possible combinations between word embedding methods and elimination methods.

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning is getting attention recently. The deep learning technique which had been applied in competitions of the International Conference on Image Recognition Technology(ILSVR) and AlphaGo is Convolution Neural Network(CNN). CNN is characterized in that the input image is divided into small sections to recognize the partial features and combine them to recognize as a whole. Deep learning technologies are expected to bring a lot of changes in our lives, but until now, its applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still an early research stage. If their performance is proved, they can be applied to traditional business problems such as future marketing response prediction, fraud transaction detection, bankruptcy prediction, and so on. So, it is a very meaningful experiment to diagnose the possibility of solving business problems using deep learning technologies based on the case of online shopping companies which have big data, are relatively easy to identify customer behavior and has high utilization values. Especially, in online shopping companies, the competition environment is rapidly changing and becoming more intense. Therefore, analysis of customer behavior for maximizing profit is becoming more and more important for online shopping companies. In this study, we propose 'CNN model of Heterogeneous Information Integration' using CNN as a way to improve the predictive power of customer behavior in online shopping enterprises. In order to propose a model that optimizes the performance, which is a model that learns from the convolution neural network of the multi-layer perceptron structure by combining structured and unstructured information, this model uses 'heterogeneous information integration', 'unstructured information vector conversion', 'multi-layer perceptron design', and evaluate the performance of each architecture, and confirm the proposed model based on the results. In addition, the target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, high discount shopper. In order to verify the usefulness of the proposed model, we conducted experiments using actual data of domestic specific online shopping company. This experiment uses actual transactions, customers, and VOC data of specific online shopping company in Korea. Data extraction criteria are defined for 47,947 customers who registered at least one VOC in January 2011 (1 month). The customer profiles of these customers, as well as a total of 19 months of trading data from September 2010 to March 2012, and VOCs posted for a month are used. The experiment of this study is divided into two stages. In the first step, we evaluate three architectures that affect the performance of the proposed model and select optimal parameters. We evaluate the performance with the proposed model. Experimental results show that the proposed model, which combines both structured and unstructured information, is superior compared to NBC(Naïve Bayes classification), SVM(Support vector machine), and ANN(Artificial neural network). Therefore, it is significant that the use of unstructured information contributes to predict customer behavior, and that CNN can be applied to solve business problems as well as image recognition and natural language processing problems. It can be confirmed through experiments that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. And it is significant that the empirical research based on the actual data of the e-commerce company can extract very meaningful information from the VOC data written in the text format directly by the customer in the prediction of the customer behavior. Finally, through various experiments, it is possible to say that the proposed model provides useful information for the future research related to the parameter selection and its performance.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.121-142
    • /
    • 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspection at the customs clearance stage. However, a data-based safety management plan for imported food is needed due to time, cost, and limited resources. In this study, we tried to increase the efficiency of the on-site inspection by preparing a machine learning prediction model that pre-selects the companies that are expected to fail before the on-site inspection. Basic information of 303,272 foreign food facilities and processing businesses collected in the Integrated Food Safety Information Network and 1,689 cases of on-site inspection information data collected from 2019 to April 2022 were collected. After preprocessing the data of foreign food facilities, only the data subject to on-site inspection were extracted using the foreign food facility_code. As a result, it consisted of a total of 1,689 data and 103 variables. For 103 variables, variables that were '0' were removed based on the Theil-U index, and after reducing by applying Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We build eight different models and perform hyperparameter tuning through 5-fold cross validation. Then, the performance of the generated models are evaluated. The research purpose of selecting companies subject to on-site inspection is to maximize the recall, which is the probability of judging nonconforming companies as nonconforming. As a result of applying various algorithms of machine learning, the Random Forest model with the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy was evaluated as the best model. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the selection reason for nonconforming facilities of individual instances, and discuss applicability to the on-site inspection facility selection system. Based on the results of this study, it is expected that it will contribute to the efficient operation of limited resources such as manpower and budget by establishing an imported food management system through a data-based scientific risk management model.

Derivation of Inherent Optical Properties Based on Deep Neural Network (심층신경망 기반의 해수 고유광특성 도출)

  • Hyeong-Tak Lee;Hey-Min Choi;Min-Kyu Kim;Suk Yoon;Kwang-Seok Kim;Jeong-Eon Moon;Hee-Jeong Han;Young-Je Park
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_1
    • /
    • pp.695-713
    • /
    • 2023
  • In coastal waters, phytoplankton,suspended particulate matter, and dissolved organic matter intricately and nonlinearly alter the reflectivity of seawater. Neural network technology, which has been rapidly advancing recently, offers the advantage of effectively representing complex nonlinear relationships. In previous studies, a three-stage neural network was constructed to extract the inherent optical properties of each component. However, this study proposes an algorithm that directly employs a deep neural network. The dataset used in this study consists of synthetic data provided by the International Ocean Color Coordination Group, with the input data comprising above-surface remote-sensing reflectance at nine different wavelengths. We derived inherent optical properties using this dataset based on a deep neural network. To evaluate performance, we compared it with a quasi-analytical algorithm and analyzed the impact of log transformation on the performance of the deep neural network algorithm in relation to data distribution. As a result, we found that the deep neural network algorithm accurately estimated the inherent optical properties except for the absorption coefficient of suspended particulate matter (R2 greater than or equal to 0.9) and successfully separated the sum of the absorption coefficient of suspended particulate matter and dissolved organic matter into the absorption coefficient of suspended particulate matter and dissolved organic matter, respectively. We also observed that the algorithm, when directly applied without log transformation of the data, showed little difference in performance. To effectively apply the findings of this study to ocean color data processing, further research is needed to perform learning using field data and additional datasets from various marine regions, compare and analyze empirical and semi-analytical methods, and appropriately assess the strengths and weaknesses of each algorithm.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy compared to the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance and comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.

Ontology-based Context-aware Framework for Battlefield Surveillance Sensor Network System (전장감시 센서네트워크시스템을 위한 온톨로지 기반 상황인식 프레임워크)

  • Shon, Ho-Sun;Park, Seong-Seung;Jeon, Seo-In;Ryu, Keun-Ho
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.4
    • /
    • pp.9-20
    • /
    • 2011
  • Future warfare paradigm is changing to network-centric warfare and effects-based operations. In order to find first and strike the enemy in the battlefield, friendly unit requires real-time target acquisition, intelligence collection, accurate situation assessment, and timely decision. The rapid development in advanced sensor technology and wireless networks requires a significant change in operational concepts of the battlefield surveillance. In particular, the introduction of a battlefield surveillance sensor network system is a big challenge to the ground forces which have lack of automated information collection assets. Therefore this paper proposes an ontology-based context-aware framework for the battlefield surveillance sensor network system which is needed for early finding the enemy and visualizing the battlefield in the ground force operations. Compared with the performance of existing systems, the one of the proposed framework has shown highly positive results by applying the context systems evaluation method. The framework has also proven to be satisfactory by the structured evaluation method using device collaboration. Since the proposed ontology-based context-aware framework has a lot of advantages in terms of scalability and reusability, the ground force's reconnaissance and surveillance system can be widely applied to expand in the future. And, ontology-based model has some weak points such as ontology data size, processing time, and limitation of network bandwidth. However, these problems can be resolved by customizing properly to fit the mission and characteristics of the unit. Moreover, development of the next-generation communication infrastructure can expedite the intelligent surveillance and reconnaissance service and may be expected to contribute greatly to expanding the information capacity.

A study on legal service of AI

  • Park, Jong-Ryeol;Noe, Sang-Ouk
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.7
    • /
    • pp.105-111
    • /
    • 2018
  • Last March, the world Go competition between AlphaGo, AI Go program developed by Google Deep Mind and professional Go player Lee Sedol has shown us that the 4th industrial revolution using AI has come close. Especially, there ar many system combined with AI hae been developing including program for researching legal information, system for expecting jurisdiction, and processing big data, there is saying that even AI legal person is ready for its appearance. As legal field is mostly based on text-based document, such characteristic makes it easier to adopt artificial intelligence technology. When a legal person receives a case, the first thing to do is searching for legal information and judical precedent, which is the one of the strength of AI. It is very difficult for a human being to utilize a flow of legal knowledge and figures by analyzing them but for AI, this is nothing but a simple job. The ability of AI searching for regulation, precedent, and literature related to legal issue is way over our expectation. AI is evaluated to be able to review 1 billion pages of legal document per second and many people agree that lot of legal job will be replaced by AI. Along with development of AI service, legal service is becoming more advanced and if it devotes to ethical solving of legal issues, which is the final goal, not only the legal field but also it will help to gain nation's trust. If nations start to trust the legal service, it would never be completely replaced by AI. What is more, if it keeps offering advanced, ethical, and quick legal service, value of law devoting to the society will increase and finally, will make contribution to the nation. In this time where we have to compete with AI, we should try hard to increase value of traditional legal service provided by human. In the future, priority of good legal person will be his/her ability to use AI. The only field left to human will be understanding and recovering emotion of human caused by legal problem, which cannot be done by AI's controlling function. Then, what would be the attitude of legal people in this period? It would be to learn the new technology and applying in the field rather than going against it, this will be the way to survive in this new AI period.

Asynchronous Message Pushing Framework between Android Devices using Remote Intent (Remote Intent를 이용한 안드로이드 장치 간 비동기식 메시지 푸싱 프레임워크)

  • Baek, Jihun;Nam, Yongwoo;Park, Sangwon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.8
    • /
    • pp.517-526
    • /
    • 2013
  • When developing an android mobile application the androids intent is used as a mechanism to send messages between local equipment of androids application inner part and other applications. But the androids intent does not support sending messages via each android products intent. If there is a way to support each androids equipments to send messages, it will be easier to make non-stopping services. Non-stopping service is used when the user is using the android to do word or searching services and suddenly changes to a different android product but still maintains the progress what was currently being done without waiting the programs to be loaded. It is possible to send messages to each android products by using the socket, but the connection must be maintained stably which is the weak point. In this paper, I am suggesting a BRIF(Broadcasting Remote Intent Framework) framework to send messages to different android products. BRIF is a framework that uses the Googles C2DM service which services asynchronous transmissions to different android products. This is organized with the C2DM server, RemoteContext Api, web server and RISP(Remote Intent Service Provider) which is will be easy to be used for the developers since there are no big changes for coding compared to the intent code.

Security Requirements Analysis on IP Camera via Threat Modeling and Common Criteria (보안위협모델링과 국제공통평가기준을 이용한 IP Camera 보안요구사항 분석)

  • Park, Jisoo;Kim, Seungjoo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.3
    • /
    • pp.121-134
    • /
    • 2017
  • With rapid increasing the development and use of IoT Devices, requirements for safe IoT devices and services such as reliability, security are also increasing. In Security engineering, SDLC (Secure Development Life Cycle) is applied to make the trustworthy system. Secure Development Life Cycle has 4 big steps, Security requirements, Design, Implementation and Operation and each step has own goals and activities. Deriving security requirements, the first step of SDLC, must be accurate and objective because it affect the rest of the SDLC. For accurate and objective security requirements, Threat modeling is used. And the results of the threat modeling can satisfy the completeness of scope of analysis and the traceability of threats. In many countries, academic and IT company, a lot of researches about drawing security requirements systematically are being done. But in domestic, awareness and researches about deriving security requirements systematically are lacking. So in this paper, I described about method and process to drawing security requirements systematically by using threat modeling including DFD, STRIDE, Attack Library and Attack Tree. And also security requirements are described via Common Criteria for delivering objective meaning and broad use of them.