• Title/Summary/Keyword: Machine classification


Applying NIST AI Risk Management Framework: Case Study on NTIS Database Analysis Using MAP, MEASURE, MANAGE Approaches (NIST AI 위험 관리 프레임워크 적용: NTIS 데이터베이스 분석의 MAP, MEASURE, MANAGE 접근 사례 연구)

  • Jung Sun Lim;Seoung Hun Bae;Taehoon Kwon
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.47 no.2
    • /
    • pp.21-29
    • /
    • 2024
  • Fueled by international efforts toward AI standardization, including those by the European Commission, the United States, and international organizations, this study introduces an AI-driven framework for analyzing advancements in drone technology. Using project data retrieved from the NTIS DB via the "drone" keyword, the framework employs a diverse toolkit of supervised learning methods (Keras MLP, XGBoost, LightGBM, and CatBoost) complemented by BERTopic, a natural-language topic-modeling tool. This multifaceted approach supports both comprehensive data-quality evaluation and in-depth structural analysis of documents. Furthermore, a 6T-based classification method filters out non-applicable data for year-on-year AI analysis, demonstrably improving classification accuracy. Leveraging AI models including GPT-4, the research uncovers year-on-year trends in emerging keywords and uses them to generate detailed summaries, enabling efficient processing of large text datasets and offering an AI analysis system applicable to policy domains. Notably, this study not only advances methodologies aligned with AI Act standards but also lays the groundwork for responsible AI implementation through analysis of government research and development investments.
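
As a hedged illustration only (not the paper's pipeline), the sketch below compares several of the supervised learners named above on TF-IDF features of toy project abstracts; scikit-learn's MLPClassifier stands in for the Keras MLP, and all data, labels, and hyperparameters are assumptions:

```python
# Hedged sketch: compare supervised classifiers on TF-IDF text features.
# Toy data and all parameters are illustrative assumptions, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier   # stand-in for the Keras MLP
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

docs = ["drone swarm control project", "drone delivery logistics study",
        "unrelated agriculture survey", "unrelated bridge maintenance plan"] * 25
labels = [1, 1, 0, 0] * 25                          # 1 = applicable to drone analysis

X = TfidfVectorizer().fit_transform(docs).toarray()
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)

models = {
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=100),
    "LightGBM": LGBMClassifier(n_estimators=100),
    "CatBoost": CatBoostClassifier(iterations=100, verbose=False),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```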

A Comparative Study on Mapping and Filtering Radii of Local Climate Zone in Changwon city using WUDAPT Protocol (WUDAPT 절차를 활용한 창원시의 국지기후대 제작과 필터링 반경에 따른 비교 연구)

  • Tae-Gyeong KIM;Kyung-Hun PARK;Bong-Geun SONG;Seoung-Hyeon KIM;Da-Eun JEONG;Geon-Ung PARK
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.27 no.2
    • /
    • pp.78-95
    • /
    • 2024
  • For the establishment and comparison of environmental plans across various domains, in view of climate change and urban issues, it is crucial to build regional-scale spatial data classified with consistent criteria. This study mapped the Local Climate Zones (LCZ) of Changwon City, where active climate and environmental research is being conducted, using the protocol suggested by the World Urban Database and Access Portal Tools (WUDAPT). Additionally, to address the fragmentation issue in which some grid cells are assigned different climate classes despite lying in regions with homogeneous climate traits, a filtering technique was applied, and the LCZ classification characteristics were compared according to the filtering radius. Using satellite images, ground reference data, and the supervised machine learning classifier Random Forest, classification maps without filtering and with filtering radii of 1, 2, and 3 were produced, and their accuracies were compared. Furthermore, to compare LCZ classification characteristics according to building types in urban areas, an urban form index used in the GIS-based classification methodology was computed and compared with the ranges suggested in previous studies. As a result, the overall accuracy was highest with a filtering radius of 1. When the urban form indices were compared, the differences between LCZ types were minimal, and most satisfied the ranges from previous studies. However, the study identified a limitation in reflecting building height information, and adding complementary data is expected to yield results with higher accuracy. The findings can serve as reference material for creating fundamental spatial data for urban-climate environmental research in South Korea.
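
The post-classification filtering step can be pictured with a short, hedged sketch: a majority (mode) filter applied to a classified LCZ grid at different radii. This is an illustrative reconstruction, not the WUDAPT implementation; the toy grid and class count are assumptions:

```python
# Hedged sketch: majority filtering of a classified LCZ grid at a given radius.
# Illustrative only; the WUDAPT toolchain implements its own filtering.
import numpy as np
from scipy import ndimage

def majority_filter(lcz: np.ndarray, radius: int) -> np.ndarray:
    """Replace each cell with the most frequent class in its (2r+1)x(2r+1) window."""
    def mode_fn(window):
        vals, counts = np.unique(window, return_counts=True)
        return vals[np.argmax(counts)]
    return ndimage.generic_filter(lcz, mode_fn, size=2 * radius + 1)

lcz = np.random.default_rng(0).integers(1, 18, size=(50, 50))  # toy 17-class map
for r in (1, 2, 3):                                            # radii compared in the study
    filtered = majority_filter(lcz, r)
```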

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.2
    • /
    • pp.1-15
    • /
    • 2021
  • When a decision is hard to make, we ask friends or people around us for advice; when we buy products online, we read anonymous reviews before purchasing. With the advent of the data-driven era, developments in IT are producing vast amounts of data about both individuals and objects. Companies and individuals have accumulated, processed, and analyzed so much data that decisions once dependent on experts can now be made, or even executed, directly from data. Today, recommender systems play a vital role in determining users' preferences for purchasing goods and are used to induce clicks on web services such as Facebook, Amazon, Netflix, and YouTube. YouTube's recommender system, for example, used by a billion people worldwide every month, draws on videos users have liked and watched. Recommender-system research is deeply linked to practical business, so many researchers work on building better solutions. Recommender systems generate recommendations from information obtained about their users, because producing good recommendations requires information about the items a user is likely to prefer. Through recommender systems, we have come to trust patterns and rules derived from data rather than empirical intuition, and the growth in data capacity has carried machine learning into deep learning. Such systems are not a universal solution, however: they require sufficient data without scarcity, along with detailed information about individuals, and they work correctly only when these conditions hold. When the interaction log is insufficient, recommendation becomes a difficult problem for both consumers and sellers, since the seller must make recommendations at a personal level while the consumer wants appropriate recommendations backed by reliable data. To improve the accuracy of such "appropriate recommendations" to consumers, this paper proposes a recommender system combined with context-based deep learning. The research combines user-based data to create a hybrid recommender system; the hybrid approach developed here is not a purely collaborative recommender but a collaborative extension that integrates user data with deep learning. Customer review data were used as the data set. Consumers buy products in online shopping malls and then write product reviews; ratings and reviews from buyers who have already purchased give other users confidence before buying. However, recommender systems mainly use scores or ratings rather than reviews to suggest items purchased by many users, even though consumer reviews contain product opinions and user sentiment relevant to evaluation. By incorporating these elements, this paper aims to improve the recommender system. The proposed algorithm is intended for cases where individuals have difficulty selecting an item; consumer reviews and record patterns make it possible to rely on recommendations appropriately. The algorithm implements a recommendation system through collaborative filtering, and its predictive accuracy is measured by root mean squared error (RMSE) and mean absolute error (MAE).
Netflix has strategically used its recommender system, holding competitions that reduce RMSE each year and thereby making fair use of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches, deep learning, and related techniques for personalized recommendation has been increasing. Among NLP studies, sentiment analysis took shape in the mid-2000s as user review data grew; sentiment analysis is a text-classification task based on machine learning. Classical machine-learning sentiment analysis has the disadvantage that it struggles to capture the characteristics of text and therefore to identify the information a review expresses. In this study, we propose a deep learning recommender system that uses BERT-based sentiment analysis to minimize these disadvantages of classical machine learning. The comparison models were recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, and GRU (gated recurrent units). In the experiments, the BERT-based recommender system performed best.
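
As a hedged illustration of the BERT-sentiment component and the RMSE/MAE metrics named above (not the authors' implementation; the model choice, reviews, and the mapping from sentiment to pseudo-ratings are all assumptions):

```python
# Hedged sketch: BERT-family sentiment scores for reviews, plus the RMSE/MAE
# metrics used in the paper. Model, reviews, and ratings are assumptions.
import numpy as np
from transformers import pipeline  # Hugging Face Transformers

sentiment = pipeline("sentiment-analysis")          # loads a BERT-family model by default
reviews = ["Great product, works perfectly.", "Broke after two days."]
# map sentiment to a pseudo-rating in [0, 1] that a recommender could consume
pseudo = [r["score"] if r["label"] == "POSITIVE" else 1.0 - r["score"]
          for r in sentiment(reviews)]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

true_ratings = [0.9, 0.1]                           # hypothetical ground truth
print(rmse(true_ratings, pseudo), mae(true_ratings, pseudo))
```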

Performance of Investment Strategy using Investor-specific Transaction Information and Machine Learning (투자자별 거래정보와 머신러닝을 활용한 투자전략의 성과)

  • Kim, Kyung Mock;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.65-82
    • /
    • 2021
  • Stock market investors are generally split into foreign investors, institutional investors, and individual investors. Compared to individual investors, professional investor groups such as foreign investors have an advantage in information and financial power, and as a result foreign investors are known to show good investment performance among market participants. The purpose of this study is to propose an investment strategy that combines investor-specific transaction information with machine learning, and to analyze the portfolio performance of the proposed model using actual stock prices and investor-specific transaction data. The Korea Exchange provides securities firms with daily information on each investor group's purchase and sale volumes. We developed a data-collection program in C# using an API provided by Daishin Securities' CybosPlus, and collected daily opening prices, closing prices, and investor-specific net purchase data for 151 of the 200 KOSPI 200 constituent stocks from January 2, 2007 to July 31, 2017. The self-organizing map (SOM), introduced by Teuvo Kohonen in the 1980s, is an artificial neural network that performs clustering by unsupervised learning. Neurons on the map surface compete with one another, and all connections are non-recursive, running from bottom to top; the map can be expanded to multiple layers, although a single layer is commonly used. Linear functions serve as the activation functions of the artificial neurons, and the learning rule is the instar rule, as in general competitive learning. The backpropagation model, at its core, is an artificial neural network that performs classification by supervised learning. We grouped and transformed the investor-specific transaction volume data through the SOM and used the result to train backpropagation models. Based on the model's estimates for the verification data, the portfolios were rebalanced monthly. For performance analysis, a passive portfolio was designated, and KOSPI 200 and KOSPI index returns were obtained as proxies for market returns. Performance was evaluated using the equally weighted portfolio return, compound rate of return, annual return, maximum drawdown (MDD), standard deviation, and Sharpe ratio. Buy-and-hold returns of the top 10 stocks by market capitalization were designated as a benchmark, buy-and-hold being the best strategy under the efficient market hypothesis. The prediction rate of the backpropagation model on the learning data was significantly high at 96.61%, and the prediction rate on the verification data was also relatively high at 57.1%. The quality of the SOM grouping can be judged from the backpropagation results: had the grouping been poor, the backpropagation model would have learned poorly. In this respect, the machine learning in this study is judged to have learned better than in previous studies. Our portfolio doubled the benchmark return and outperformed the market returns of the KOSPI and KOSPI 200 indexes. Relative to the benchmark, the portfolio risk indicators MDD and standard deviation also showed better results, and the Sharpe ratio was higher than those of the benchmark and the stock market indexes.
Through this, we present a direction for portfolio-composition programs that use machine learning and investor-specific transaction information, and show that such programs can be developed for real stock investment. The reported return results from monthly portfolio composition with rebalancing to equal weights. Better outcomes are expected if the suggested stocks are held continuously and rebalanced rather than sold and re-bought each month; in that respect the strategy appears relevant to real transactions.
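
The SOM-grouping-then-backpropagation pipeline can be sketched as follows. This is a hedged reconstruction under stated assumptions, not the authors' code: the investor features, labels, map size, and the MiniSom/scikit-learn stand-ins are all illustrative choices:

```python
# Hedged sketch: cluster investor-specific net-purchase vectors with a SOM,
# then train a backpropagation classifier on the SOM encoding.
# All data, dimensions, and hyperparameters are assumptions.
import numpy as np
from minisom import MiniSom                      # pip install minisom
from sklearn.neural_network import MLPClassifier # backpropagation stand-in

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # toy [foreign, institutional, individual] net purchases
y = (X[:, 0] > 0).astype(int)        # hypothetical next-period up/down label

som = MiniSom(4, 4, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 1000)

# encode each sample by the coordinates of its winning SOM node (grouping step)
codes = np.array([som.winner(x) for x in X], dtype=float)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(codes, y)                    # supervised classification on grouped features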

A fundamental study on the automation of tunnel blasting design using a machine learning model (머신러닝을 이용한 터널발파설계 자동화를 위한 기초연구)

  • Kim, Yangkyun;Lee, Je-Kyum;Lee, Sean Seungwon
    • Journal of Korean Tunnelling and Underground Space Association
    • /
    • v.24 no.5
    • /
    • pp.431-449
    • /
    • 2022
  • As many tunnels have been constructed, a wide range of experience and techniques has accumulated for tunnel design as well as tunnel construction. Hence, in many routine tunnel design tasks it is sufficient to perform the design by modifying or supplementing previous similar design cases, unless the tunnel has a unique structure or unusual geological conditions. In particular, for tunnel blast design it is reasonable to refer to previous similar cases, because the blast design produced at the design stage is preliminary: additional blast design is generally carried out through test blasts before tunnel excavation begins. Meanwhile, entering the Industry 4.0 era, artificial intelligence (AI), whose availability is surging across all industrial sectors, is being broadly applied to tunnelling and blasting. For drill-and-blast tunnels, AI has mainly been applied to the estimation of blast vibration and to rock mass classification; there are few cases where it has been applied to blast pattern design. This study therefore attempts to automate tunnel blast design by means of machine learning, a branch of AI. Blast-design data were collected from 25 tunnel design reports for training and 2 additional reports for testing. From these, 4 design parameters (rock mass class, road type, and the cross-sectional areas of the upper and bench sections) were taken as input data, and 16 design elements (blast cut type, specific charge, number of drill holes, and the spacing and burden for each blast-hole group, etc.) as output data. On this design data, three machine learning models (XGBoost, ANN, and SVM) were tested; XGBoost was chosen as the best model, and its results show a generally similar trend to an actual design when assumed design parameters are input. The results of this study are not yet sufficient to perform a complete blast design, but additional studies are planned to bring the approach to practical use by collecting more blast-design data and refining the machine learning process.
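
The 4-input, 16-output mapping described above can be sketched as a multi-output XGBoost regression. This is a hedged illustration, not the paper's model: the toy data, the scikit-learn MultiOutputRegressor wrapper, and the treatment of categorical elements (e.g. cut type) as numeric codes are all assumptions:

```python
# Hedged sketch: 4 design parameters -> 16 blast-design elements with XGBoost.
# Toy data and parameters are assumptions; the paper's setup is not reproduced.
import numpy as np
from xgboost import XGBRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(25, 4))    # [rock mass class, road type, upper area, bench area]
Y = rng.uniform(size=(25, 16))   # [cut type code, specific charge, hole count, spacing, burden, ...]

model = MultiOutputRegressor(XGBRegressor(n_estimators=200, max_depth=3))
model.fit(X, Y)
pred = model.predict(X[:2])      # predicted design elements for assumed parameters
```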

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning has been getting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and in AlphaGo is the convolutional neural network (CNN). A CNN divides the input image into small sections, recognizes partial features, and combines them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have largely been limited to image recognition and natural language processing; the use of deep learning for business problems is still at an early research stage. If their performance is proved, these techniques can be applied to traditional business problems such as marketing response prediction, fraud detection, and bankruptcy prediction. It is therefore a meaningful experiment to probe the possibility of solving business problems with deep learning, using the case of online shopping companies, which have big data, relatively easily identifiable customer behavior, and high utilization value. In online shopping especially, the competitive environment is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is increasingly important. In this study, we propose a 'CNN model of heterogeneous information integration' as a way to improve the prediction of customer behavior in online shopping enterprises. The model learns from both structured and unstructured information through a convolutional network combined with a multi-layer perceptron; to optimize its performance, we examine 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', evaluate the performance of each architecture, and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high-amount shopper, and high-discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC (voice of customer) data from a specific online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC in January 2011 (one month); their customer profiles, 19 months of trading data from September 2010 to March 2012, and the VOCs posted during that month are used. The experiment has two stages: in the first, we evaluate the three architectural choices that affect the performance of the proposed model and select optimal parameters; in the second, we evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). It is therefore significant that unstructured information contributes to predicting customer behavior, and that CNNs can be applied to business problems as well as to image recognition and natural language processing.
The experiments also confirm that the CNN is effective at understanding and interpreting contextual meaning in the textual VOC data, and this empirical research on actual e-commerce data shows that very meaningful information can be extracted from VOCs written directly by customers when predicting customer behavior. Finally, the various experiments provide useful information for future research on parameter selection and performance.
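
A two-branch network of the kind described, combining a structured-feature branch with a convolutional text branch, might look like the hedged Keras sketch below. The dimensions, vocabulary size, and layer choices are assumptions, not the paper's architecture:

```python
# Hedged sketch: binary customer-behavior classifier combining structured
# features with a Conv1D branch over tokenized VOC text. Dimensions assumed.
import tensorflow as tf
from tensorflow.keras import layers, Model

struct_in = layers.Input(shape=(20,), name="structured")    # e.g. 20 profile/transaction features
text_in = layers.Input(shape=(100,), name="voc_tokens")     # e.g. 100-token VOC sequences

x_t = layers.Embedding(input_dim=5000, output_dim=64)(text_in)
x_t = layers.Conv1D(64, 5, activation="relu")(x_t)          # partial-feature extraction
x_t = layers.GlobalMaxPooling1D()(x_t)

x_s = layers.Dense(32, activation="relu")(struct_in)

merged = layers.concatenate([x_t, x_s])                     # heterogeneous integration
out = layers.Dense(1, activation="sigmoid")(merged)         # e.g. churn yes/no

model = Model([struct_in, text_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```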

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.23-46
    • /
    • 2021
  • Collaborative filtering, often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products based on their purchase histories. However, traditional collaborative filtering has difficulty calculating similarities for new customers or products, because similarities are computed from direct connections and common features among customers. For this reason, hybrid techniques that also use content-based filtering were designed. Separately, efforts have been made to solve these problems by applying the structural characteristics of social networks: similarities are calculated indirectly, through the similar customers placed between two customers. A customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether a target customer will accept a recommendation, and the centrality metrics of the network can be used to compute it. Different centrality metrics are important in that they may affect recommendation performance differently; furthermore, as this study examines, their effect may vary depending on the recommender algorithm. Recommendation techniques using network analysis can also be expected to raise recommendation performance not only for new customers or products but across all customers and products. By treating a customer's purchase of an item as a link between the customer and the item on the network, predicting user acceptance of a recommendation becomes predicting whether a new link will be created. Since classification models fit this binary link-or-no-link problem, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. The performance evaluation used order data collected from an online shopping mall over four years and two months: the first three years and eight months of records were organized into the social network, and the final four months of records were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that recommendation acceptance rates differ meaningfully across metrics for each algorithm. This work analyzed the four most commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality recorded the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality showed similar performance across all models. Degree centrality ranked in the middle across models, while betweenness centrality always ranked higher than degree centrality. Finally, closeness centrality showed distinct performance differences according to the model:
it ranked first, with numerically high performance, in logistic regression, artificial neural network, and decision tree, but recorded very low rankings, with low performance, in the support vector machine and k-nearest neighbors models. As the results reveal, network centrality metrics over the subnetwork connecting two nodes can effectively predict connectivity between the nodes in a social network, and each metric performs differently depending on the classification model. This implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and introducing closeness centrality could yield higher performance for certain models.
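
The four centrality metrics named above are standard graph measures; a hedged NetworkX sketch on a toy customer-item purchase graph (the data and node names are assumptions) shows how such per-node scores could feed a link-prediction classifier:

```python
# Hedged sketch: the four centrality metrics from the study, computed with
# NetworkX on a toy customer-item purchase graph. Data are assumptions.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("cust1", "itemA"), ("cust1", "itemB"),
                  ("cust2", "itemB"), ("cust2", "itemC"),
                  ("cust3", "itemA"), ("cust3", "itemC")])

features = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}
# per-node scores like these could serve as input features for the
# link-prediction classifiers (decision tree, KNN, logistic regression, ...)
```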

A Study on the Calculation of the Area for Behavior as an Element in Planning the Floor Space of the Elderly Housing (노인주택 면적계획을 위한 요소로서 행위면적 산출 연구)

  • Lee, Youn-Jae;Lee, Hyun-Soo
    • Journal of the Korean housing association
    • /
    • v.20 no.1
    • /
    • pp.59-70
    • /
    • 2009
  • The purpose of the study is to suggest the amount of space required for each behavior, according to a classification of behaviors in the home, in order to plan the optimal floor space of elderly housing. The method for calculating the space for a behavior begins with classifying and identifying behaviors, then photographing elderly models reproducing each behavior. Based on the photographs, the body positions required for each behavior are assembled and a formula for the behavioral space is created. The space for each behavior is produced considering the body dimensions of Korean elderly in their sixties as well as furniture sizes and the psychological distance between people; 3D modeling is used to verify the results. Human behaviors are classified into individual-related, housework-related, family-related, reception-related, and other behaviors, and these five categories are subdivided into more specific behaviors. The area for each specific behavior is calculated from anthropometric data of the elderly, preferred furniture dimensions, and psychological area. As a result, the required areas for specific behaviors are as follows: sleeping in a bed, 4.3 m²; changing clothes on a chair, 1.7 m²; watching TV on the floor, 1.3 m²; working and reading at a desk, 2.1 m²; exercising, 2.5 m²; showering on a chair, 1.3 m²; showering using a wheelchair, 1.9 m²; toileting using a wheelchair, 2.3 m²; washing up using a wheelchair, 1.9 m²; eating at a table for four persons, 4.4 m²; cooking and washing dishes, 0.9 m² per counter-top; washing clothes with a washing machine, 0.9 m²; ironing on the floor, 1.4 m²; reception of three persons on the floor, considering personal space, 4.0 m²; putting on and taking off shoes on a chair, 1.3 m². The results are intended as quantitative data for calculating the optimal floor space of elderly housing. In addition, qualitative data such as housing preference characteristics, spatial usage, and storage capacity are needed to produce floor space that provides a convenient and safe living environment.

Algorithm and Performance Evaluation of High-speed Distinction for Condition Recognition of Defective Nut (불량 너트의 상태인식을 위한 고속 판별 알고리즘 및 성능평가)

  • Park, Tae-Jin;Lee, Un-Seon;Lee, Sang-Hee;Park, Man-Gon
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.7
    • /
    • pp.895-904
    • /
    • 2011
  • In welding machines that perform conventional spot welding, system malfunctions often occur because of the mechanical motion involved in feeding objects such as nuts to be welded. In working environments exposed to such varied situations, with workers and related equipment moving about, it is difficult to distinguish exactly between good and defective nut conditions. Moreover, when a nut is welded defectively, evaluation and analysis through image processing are needed, because otherwise a worker must inspect every item manually. Therefore, in this paper, our purpose is to implement algorithms that reduce analysis time and recognize exactly whether the object is in a correctly stabilized state, improving the image-processing system. To assess through image analysis whether a nut's condition is good or defective, the proposed algorithms are presented and listed by group, and their effectiveness is shown through a series of experiments. As a result, the recognition rates for normal and erroneous conditions ranged from 40% to 94.6% and from 60% to 5.4%, respectively, from classification 1 of group 1 to classification 11 of group 5, while the minimum, maximum, and average estimation times ranged from 1.7 s to 0.08 s, 3.6 s to 1.2 s, and 2.5 s to 0.1 s, respectively.
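
The abstract does not detail the paper's specific algorithms, so the following is only a hedged, generic illustration of image-based good/defective discrimination with OpenCV; the thresholding approach, file name, and area band are all assumptions:

```python
# Hedged sketch: crude good/defective nut check via thresholding and contours.
# Purely illustrative; not the paper's algorithm. Parameters are assumptions.
import cv2

img = cv2.imread("nut.png", cv2.IMREAD_GRAYSCALE)   # hypothetical captured frame
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# crude rule: a correctly seated nut yields one dominant contour of expected size
areas = sorted((cv2.contourArea(c) for c in contours), reverse=True)
ok = bool(areas) and 5000 < areas[0] < 20000        # hypothetical pass band
print("normal" if ok else "defective")
```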

A Scheme for Identifying Malicious Applications Based on API Characteristics (API 특성 정보기반 악성 애플리케이션 식별 기법)

  • Cho, Taejoo;Kim, Hyunki;Lee, Junghwan;Jung, Moongyu;Yi, Jeong Hyun
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.1
    • /
    • pp.187-196
    • /
    • 2016
  • Android applications are inherently vulnerable to repackaging attacks, in which malicious code is easily inserted into an application that is then re-signed by the attacker, and leaks of private or personal information through such apps now occur frequently. In principle, every Android application is composed of user-defined methods and APIs: APIs provide access to platform resources and serve as the practical functional features, while user-defined methods build features on top of those APIs. In this paper we propose a scheme to analyze the sensitive APIs mostly used in malicious applications, in terms of how malicious applications operate and which APIs they use. Based on the characteristics of the target APIs, we accumulate knowledge about them using a machine learning scheme based on the Naive Bayes algorithm. From the learned results, we can provide a fine-grained numeric score for the degree of vulnerability of a mobile application. We expect the proposed scheme to help mobile application developers identify the security level of their applications in advance.
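
A Naive Bayes classifier over API-usage features, of the kind described, can be sketched as follows. This is a hedged illustration under stated assumptions: the API set, toy labels, and the use of the predicted probability as the score are not taken from the paper:

```python
# Hedged sketch: Bernoulli Naive Bayes over binary API-usage features,
# scoring apps by predicted maliciousness probability. Data are assumptions.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# each row marks whether an app calls a given sensitive API (hypothetical set):
# [sendTextMessage, getDeviceId, getSubscriberId, openConnection]
X = np.array([[1, 1, 1, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 1, 0, 0]])
y = np.array([1, 1, 0, 0])                  # 1 = known malicious, 0 = benign

clf = BernoulliNB().fit(X, y)
new_app = np.array([[1, 1, 0, 1]])
score = clf.predict_proba(new_app)[0, 1]    # fine-grained numeric risk score
print(f"vulnerability score: {score:.2f}")
```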