• Title/Abstract/Keywords: ACCURACY

Search Result 34,149, Processing Time 0.059 seconds

Detection of Phantom Transaction using Data Mining: The Case of Agricultural Product Wholesale Market (데이터마이닝을 이용한 허위거래 예측 모형: 농산물 도매시장 사례)

  • Lee, Seon Ah;Chang, Namsik
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.161-177 / 2015
  • With the rapid evolution of technology, the size, number, and variety of databases have increased, and data mining faces many challenging applications. One such application is the discovery of fraud patterns in agricultural product wholesale transactions. The agricultural product wholesale market in Korea is huge, and vast numbers of transactions are made every day. Demand for agricultural products continues to grow, and electronic auction systems raise the operational efficiency of the wholesale market. The number of unusual transactions is assumed to increase in proportion to trading volume, and an unusual transaction is often the first sign of fraud. However, such transactions and the corresponding fraud are very difficult to identify because fraud schemes have become more sophisticated than ever. Fraud can be detected by verifying all transaction records manually, but this requires significant human resources and is ultimately impractical. Fraud can also be revealed by a victim's report or complaint, but agricultural product wholesale fraud usually has no victim because it is committed through collusion between an auction company and an intermediary wholesaler. Nevertheless, transaction records must be monitored continuously and efforts made to prevent fraud, because fraud not only disturbs the fair trade order of the market but also rapidly erodes its credibility. Data mining is very useful in such an environment because it can discover previously unknown fraud patterns or features in a large volume of transaction data.
The objective of this research is to empirically investigate the factors necessary to detect fraudulent transactions in an agricultural product wholesale market by developing a data mining-based fraud detection model. One major type of fraud is the phantom transaction, in which the seller (auction company or forwarder) and the buyer (intermediary wholesaler) collude. They pretend to complete a transaction by recording false data in the online transaction processing system without actually selling products, and the seller receives money from the buyer. This leads to overstated sales performance and illegal money transfers, which reduce the credibility of the market. This paper reviews the wholesale market environment, including the types of transactions, the roles of market participants, and the various types and characteristics of fraud, and introduces the whole process of developing the phantom transaction detection model. The process consists of four modules: (1) data cleaning and standardization, (2) statistical data analysis such as distribution and correlation analysis, (3) construction of a classification model using a decision-tree induction approach, and (4) verification of the model in terms of hit ratio. We collected real data from 6 associations of agricultural producers in metropolitan markets. The final decision-tree model revealed that the monthly average trading price of items offered by forwarders is a key variable in detecting phantom transactions. The verification procedure also confirmed the suitability of the results. However, even though the performance is satisfactory, issues remain in improving classification accuracy and the conciseness of rules. One such issue is the robustness of the data mining model. Data mining is highly data-oriented, so data mining models tend to be very sensitive to changes in data or circumstances.
Thus, this lack of robustness requires continuous remodeling as data or circumstances change. We hope that this paper offers a valuable guideline to organizations and companies that consider introducing or constructing a fraud detection model in the future.
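
As a rough illustration of modules (3) and (4) in the abstract above, the sketch below fits a one-split decision stump on a single numeric feature and scores it by hit ratio. It is a minimal sketch and not the authors' model: the names `hit_ratio` and `fit_stump`, the toy prices, and the labels are all hypothetical, and a single split stands in for full decision-tree induction.

```python
def hit_ratio(actual, predicted):
    """Share of transactions whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def fit_stump(values, labels):
    """Pick the cut-off and direction on one feature that maximize hit ratio.

    Returns (best_hit_ratio, cut, above_is_positive). Ties keep the first
    candidate found, scanning cut-offs in ascending order.
    """
    best = None
    for cut in sorted(set(values)):
        for above_is_positive in (True, False):
            predicted = [(v > cut) == above_is_positive for v in values]
            score = hit_ratio(labels, predicted)
            if best is None or score > best[0]:
                best = (score, cut, above_is_positive)
    return best


# Hypothetical toy data: a monthly average trading price per transaction and
# a flag marking the transaction as phantom (True) or normal (False).
toy_prices = [10, 12, 11, 30, 28, 35]
toy_is_phantom = [False, False, False, True, True, True]
score, cut, above_is_positive = fit_stump(toy_prices, toy_is_phantom)
```

In practice the hit ratio would be measured on held-out transactions rather than on the data used to choose the split, which is presumably what the verification module refers to.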

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured to unstructured data analysis. In recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two groups: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict repurchases based on the discovered preferences. There are two important modules in this study. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope in order to compare the accuracy of the traditional purchase-based prediction model with that of the new model presented in the second module. This procedure extracts the purchase history of users.
The core part of our methodology is the second module, which extracts users' interests by analyzing the news articles they have read. It first constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. It then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these two steps, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. In this paper, we also provide experimental results from our performance evaluation. The data used in our experiments are as follows. We acquired web transaction data for 5,000 panel members from a company that specializes in analyzing the rankings of internet sites. We first extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and crawled the main content of the articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model using both users' interests and purchase history outperforms a prediction model using only purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network-based models were used.
Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree-based models were used, although the improvement was very small.
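
The merging step in the second module can be pictured as a matrix product: a users-by-articles matrix multiplied by an articles-by-topics matrix gives a users-by-topics matrix. The sketch below assumes that "merging" means this product, which the abstract does not state explicitly; the function name `merge_matrices` and the small matrices are hypothetical.

```python
def merge_matrices(user_article, article_topic):
    """Multiply a users x articles matrix by an articles x topics matrix.

    user_article[u][a] is 1 if user u read article a (else 0);
    article_topic[a][t] is the weight of topic t in article a.
    The result[u][t] accumulates topic weights over the articles u read.
    """
    n_articles = len(article_topic)
    n_topics = len(article_topic[0])
    merged = []
    for row in user_article:
        if len(row) != n_articles:
            raise ValueError("article dimensions do not match")
        merged.append([
            sum(row[a] * article_topic[a][t] for a in range(n_articles))
            for t in range(n_topics)
        ])
    return merged


# Hypothetical toy input: 2 users, 3 articles, 2 topics.
reads = [[1, 0, 1],
         [0, 1, 0]]
topic_weights = [[0.5, 0.5],
                 [1.0, 0.0],
                 [0.25, 0.75]]
user_topic = merge_matrices(reads, topic_weights)
```

Each row of `user_topic` could then serve as the structured interest profile fed to the repurchase prediction model, possibly after normalizing by the number of articles read.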

The Individual Discrimination Location Tracking Technology for Multimodal Interaction at the Exhibition (전시 공간에서 다중 인터랙션을 위한 개인식별 위치 측위 기술 연구)

  • Jung, Hyun-Chul;Kim, Nam-Jin;Choi, Lee-Kwon
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.19-28 / 2012
  • After the internet era, we are moving toward a ubiquitous society. People are now interested in multimodal interaction technology, which enables audiences to interact naturally with the computing environment at exhibitions such as galleries, museums, and parks. There are also attempts to provide additional services based on the audience's location information, or to improve and deploy interaction between subjects and audience by analyzing people's usage patterns. To provide a multimodal interaction service to the audience at an exhibition, it is important to distinguish individuals and trace their locations and routes. For outdoor location tracking, GPS is widely used. GPS can obtain the real-time location of fast-moving subjects, so it is one of the important technologies in fields requiring location tracking services. However, because GPS relies on satellites, it cannot be used indoors, where the satellite signal cannot be received. For this reason, indoor location tracking is being studied using very short range communication technologies such as ZigBee, UWB, and RFID, as well as mobile communication networks and wireless LAN. However, these technologies have shortcomings: the audience needs to carry an additional sensor device, and tracking becomes difficult and expensive as the density of the target area increases. In addition, the usual exhibition environment has many obstacles for the network, which degrades system performance. Above all, the biggest problem is that interaction methods based on these older technologies cannot provide a natural service to users. Moreover, because the system relies on sensor recognition, every user must be equipped with a device, which limits the number of users who can use the system simultaneously.
To make up for these shortcomings, in this study we suggest a technology that obtains exact user location information through location mapping using smartphone Wi-Fi and 3D cameras. We used the signal amplitude of wireless LAN access points (APs) to develop an indoor location tracking system at lower cost. An AP is cheaper than the devices used in other tracking techniques, and by installing software on the user's mobile device, the device itself can serve as the tracking device. We used the Microsoft Kinect sensor as the 3D camera. Kinect is equipped with functions for discriminating depth and human information inside the shooting area, so it is suitable for extracting the user's body, vector, and acceleration information at low cost. We confirm the location of the audience using the cell ID obtained from the Wi-Fi signal. By using smartphones as the basic device for the location service, we eliminate the need for an additional tagging device and provide an environment in which multiple users can receive the interaction service simultaneously. 3D cameras located in each cell area obtain the exact location and status information of the users. The 3D cameras are connected to the Camera Client, which calculates the mapping information aligned to each cell and obtains the exact information of the users together with the status and pattern information of the audience. The location mapping technique of the Camera Client decreases the error rate of the indoor location service, increases the accuracy of individual discrimination in the area through discrimination based on body information, and establishes a foundation for multimodal interaction technology at exhibitions. The calculated data and information enable users to receive the appropriate interaction service through the main server.

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • These days, malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, handling these attacks appropriately has become more important, and there is considerable interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal conditions, they cannot handle new or unknown attack patterns. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can respond proactively to unknown threats. For a long time, researchers have adopted and tested various artificial intelligence techniques such as artificial neural networks, decision trees, and support vector machines to detect intrusions on the network. However, most of them have applied these techniques individually, even though combining them may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model is designed to combine the prediction results of four different binary classification models that may be complementary to each other: logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). Genetic algorithms (GA) are used as a tool for finding the optimal combining weights. Our proposed model is built in two steps. In the first step, the optimal integration model, the one with the lowest prediction error (i.e., erroneous classification rate), is generated.
In the second step, the model explores the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. To calculate the total misclassification cost of an intrusion detection system, we need to understand its asymmetric error cost scheme. Generally, there are two common forms of error in intrusion detection. The first is the False-Positive Error (FPE), in which normal activity is wrongly judged to be an intrusion and which may result in unnecessary corrective action. The second is the False-Negative Error (FNE), which mainly misjudges malware as a normal program. Compared to FPE, FNE is more fatal. Thus, total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results of our model with those of the single techniques to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware tool for training SVM classifiers, was used. Empirical results showed that our proposed GA-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that it outperformed all the other comparative models in terms of total misclassification cost. Consequently, we expect that our study may contribute to building cost-effective intelligent intrusion detection systems.
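
The second step described above, choosing a classification threshold that minimizes total misclassification cost under asymmetric error costs, can be sketched as follows. This is only an illustration under stated assumptions: the GA weight search of the first step is omitted and the combining weights are taken as given, a simple grid search replaces whatever search the authors used, and the names `combine`, `total_cost`, `best_threshold`, the scores, and the cost values are hypothetical.

```python
def combine(scores, weights):
    """Weighted average of the per-model intrusion scores for one sample."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def total_cost(combined, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a threshold."""
    cost = 0.0
    for score, is_intrusion in zip(combined, labels):
        flagged = score >= threshold
        if flagged and not is_intrusion:
            cost += cost_fp      # normal traffic flagged as an intrusion
        elif not flagged and is_intrusion:
            cost += cost_fn      # intrusion passed as normal traffic
    return cost


def best_threshold(combined, labels, cost_fp, cost_fn, steps=100):
    """Grid-search thresholds in [0, 1]; ties keep the lowest threshold."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda t: total_cost(combined, labels, t, cost_fp, cost_fn))


# Hypothetical combined scores and labels; a missed intrusion is assumed
# to cost five times as much as a false alarm.
toy_scores = [0.2, 0.4, 0.6, 0.8]
toy_labels = [False, True, False, True]
threshold = best_threshold(toy_scores, toy_labels, cost_fp=1.0, cost_fn=5.0)
```

Because the false-negative cost is larger, the selected threshold tends to move downward, accepting more false alarms in exchange for fewer missed intrusions.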

A Topic Modeling-based Recommender System Considering Changes in User Preferences (고객 선호 변화를 고려한 토픽 모델링 기반 추천 시스템)

  • Kang, So Young;Kim, Jae Kyeong;Choi, Il Young;Kang, Chang Dong
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.43-56 / 2020
  • Recommender systems help users make the best choice among various options. They play especially important roles on internet sites, where countless pieces of digital information are generated every second. Many studies on recommender systems have focused on accurate recommendation. However, there are some problems to overcome for a recommender system to be commercially successful. First, recommender systems lack transparency: users cannot know why products are recommended. Second, recommender systems cannot immediately reflect changes in user preferences: although a user's product preferences change over time, the recommender system must rebuild its model to reflect them. Therefore, in this study, we propose a recommendation methodology that uses topic modeling and sequential association rule mining on review data to solve these problems. Product reviews provide useful information for recommendations because they include not only product ratings but also various content such as user experiences and emotional states. Reviews therefore imply user preferences for products, so topic modeling is useful for explaining why items are recommended to users. In addition, sequential association rule mining is useful for identifying changes in user preferences. The proposed methodology is divided into two phases. The first phase creates a user profile based on topic modeling: after topics are extracted from user reviews of products, a user profile over topics is created. The second phase recommends products using sequential rules that appear in users' buying behaviors over time. The buying behaviors are derived from changes in each user's topics.
A collaborative filtering-based recommendation system was developed as a benchmark, and we compared the performance of the proposed methodology with it using Amazon's review dataset. Accuracy, recall, precision, and F1 were used as evaluation metrics. For topic modeling, collapsed Gibbs sampling was conducted, and we extracted 15 topics. Among the main topics, topic 1, topic 3, topic 4, topic 7, topic 9, topic 13, and topic 14 are related to "comedy shows", "high-teen drama series", "crime investigation drama", "horror theme", "British drama", "medical drama", and "science fiction drama", respectively. In the comparative analysis, the proposed methodology outperformed the collaborative filtering-based recommendation system. From the results, we found that the time just prior to the recommendation was very important for inferring changes in user preference. Therefore, the proposed methodology can not only secure the transparency of the recommender system but also reflect user preferences that change over time. However, the proposed methodology has some limitations. It cannot make fine-grained recommendations if the number of products included in a topic is large. In addition, the number of sequential patterns is small because the number of topics is small. Future research needs to address these limitations.
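
The four evaluation metrics named in the abstract above can be computed from binary relevance labels as in the sketch below. It uses the standard textbook definitions; how the authors mapped recommendations to positive and negative cases is not stated, so the function name `classification_metrics` and the toy labels are assumptions.

```python
def classification_metrics(actual, predicted):
    """Return accuracy, precision, recall, and F1 for binary labels (1/0)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = the user actually bought the item (actual) or
# the item was recommended (predicted).
toy_actual = [1, 1, 1, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(toy_actual, toy_predicted)
```

Reporting precision, recall, and F1 alongside accuracy matters here because purchased items are usually a small minority of all candidate items, so accuracy alone can look high even for a poor recommender.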

A Dynamic Prefetch Filtering Schemes to Enhance Usefulness Of Cache Memory (캐시 메모리의 유용성을 높이는 동적 선인출 필터링 기법)

  • Chon Young-Suk;Lee Byung-Kwon;Lee Chun-Hee;Kim Suk-Il;Jeon Joong-Nam
    • The KIPS Transactions:PartA / v.13A no.2 s.99 / pp.123-136 / 2006
  • Prefetching is an effective way to reduce the latency caused by memory access. However, excessively aggressive prefetching not only causes cache pollution, canceling out the benefits of prefetching, but also increases bus traffic, degrading overall performance. In this thesis, a prefetch filtering scheme is proposed that dynamically decides whether to issue a prefetch by consulting a filtering table, in order to reduce the cache pollution caused by unnecessary prefetches. First, a prefetch hashing table 1bitSC filtering scheme (PHT1bSC) is presented to analyze the lock problem of the conventional scheme; like the conventional scheme it uses N:1 mapping, but each entry holds a 1-bit value with two states. A complete block address table filtering scheme (CBAT) is introduced as a reference for the comparative study. A prefetch block address lookup table scheme (PBALT), which exhibits the most accurate filtering performance, is proposed as the main idea of this paper. Its table has the same length as in the PHT1bSC scheme and each entry has the same fields as in the CBAT scheme, and a recent, never-referenced data block address is mapped 1:1 to an entry of the filter table. Simulations were run with commonly used prefetch schemes on general benchmarks and multimedia programs while varying cache parameters. Compared with no filtering, the PBALT scheme showed an improvement of up to 22%, and compared with the conventional PHT2bSC, the cache miss ratio decreased by 7.9% owing to enhanced filtering accuracy. The MADT of the proposed PBALT scheme decreased by 6.1% compared with conventional schemes, reducing the total execution time.

A Study on Mechanical Errors in Cone Beam Computed Tomography(CBCT) System (콘빔 전산화단층촬영(CBCT) 시스템에서 기계적 오류에 관한 연구)

  • Lee, Yi-Seong;Yoo, Eun-Jeong;Kim, Seung-Keun;Choi, Kyoung-Sik;Lee, Jeong-Woo;Suh, Tae-Suk;Kim, Joeng-Koo
    • Journal of radiological science and technology / v.36 no.2 / pp.123-129 / 2013
  • This study investigated the rate of setup variance caused by the rotating unbalance of the gantry in image-guided radiation therapy. The equipment used was a linear accelerator (Elekta Synergy TM, UK) and the three-dimensional volume imaging mode (3D Volume View) of a cone beam computed tomography (CBCT) system. 2D images obtained by rotating $360^{\circ}$ and $180^{\circ}$ were reconstructed into a 3D image. A Catphan503 phantom and a homogeneous phantom were used to measure the setup errors. A ball-bearing phantom was used to check the rotation axis of the CBCT. The CBCT volume images of the Catphan503 phantom and the homogeneous phantom were analyzed and compared with images from conventional CT in the six-dimensional view (X, Y, Z, Roll, Pitch, and Yaw). The setup errors were 0.6 mm in X, 0.5 mm in Y, and 0.5 mm in Z when the gantry rotated $360^{\circ}$ in the orthogonal coordinates, whereas when it rotated $180^{\circ}$, the errors were 0.9 mm, 0.2 mm, and 0.3 mm in X, Y, and Z, respectively. In the rotating coordinates, the average setup error increased as the rotating unbalance increased. The resolution of the CBCT images showed 2 levels of difference from the recommended table. CBCT showed good agreement with the recommended values for mechanical safety, geometric accuracy, and image quality. The rotating unbalance of the gantry hardly varied in the orthogonal coordinates. However, in the rotating coordinates it exceeded the recommended value of ${\pm}1^{\circ}$. Therefore, six-dimensional correction is needed when performing sophisticated radiation therapy.

Development and Analysis of COMS AMV Target Tracking Algorithm using Gaussian Cluster Analysis (가우시안 군집분석을 이용한 천리안 위성의 대기운동벡터 표적추적 알고리듬 개발 및 분석)

  • Oh, Yurim;Kim, Jae Hwan;Park, Hyungmin;Baek, Kanghyun
    • Korean Journal of Remote Sensing / v.31 no.6 / pp.531-548 / 2015
  • Atmospheric Motion Vectors (AMVs) derived from satellite images have shown a Slow Speed Bias (SSB) in comparison with rawinsonde observations. The SSB originates from tracking, selection, and height assignment errors, of which height assignment error is known to be the leading one. However, recent works have shown that height assignment error cannot fully explain the SSB. This paper attempts a new approach to examine the possibility of reducing the SSB of COMS AMVs by using a new target tracking algorithm. Tracking error can be caused by averaging various wind patterns within a target and by changes in cloud shape during the search process over time. To overcome this problem, a Gaussian Mixture Model (GMM) is adopted to extract the coldest cluster as the target, since the shape of such a target is less subject to transformation. Then, an image filtering scheme is applied to weight the selected coldest pixels more heavily than the others, which makes the target easier to track. When AMVs derived from our algorithm with the sum of squared distance method and the current COMS AMVs are compared with rawinsonde, our products show a noticeable improvement over the COMS products, with mean wind speed increased by $2.7\,m\,s^{-1}$ and SSB reduced by 29%. However, the bias statistics show a negative impact at mid/low levels with our algorithm, and the number of vectors is reduced by 40% relative to COMS. Therefore, further study is required to improve accuracy for mid/low-level winds and to increase the number of AMVs.

Spectrophotometric Determination of Soil Chemical Properties Using Soiltek® KA-P Spectrophotometer (Soiltek KA-P 분광광도계률 사용한 토양 화학적 성질의 분광학적 분석)

  • Hyun, Hae-Nam;Oh, Sang-Sil;Koo, Bon-Jun;Kang, Ho-Jun
    • Korean Journal of Soil Science and Fertilizer / v.33 no.2 / pp.127-138 / 2000
  • To enable rapid and convenient soil testing, new soil analytical methods that require only one instrument, a UV/Vis spectrophotometer, were developed and named "Soiltek KA-P spectrophotometric methods". The Soiltek® KA-P spectrophotometric method was compared with the standard method of RDA in its analytical capability for soil chemical properties. Using 78 soils collected from upland, paddy, orchard, and vinyl house fields, soil organic matter, exchangeable K, Ca, and Mg, CEC, available $SiO_2$, and nitrate were analyzed by the two methods. The color stability (ratio of the absorbance at elapsed time t to the absorbance at time t=0) of organic matter, Ca, Mg, and available $SiO_2$ decreased to about 2% within one hour. However, that of exchangeable K, CEC, and nitrate remained constant. The results obtained with the Soiltek® KA-P spectrophotometric method showed highly significant correlation with those measured by the standard method of RDA ($R^2 > 0.9501$), with slopes near unity ($1.0{\pm}0.05$). The standard deviations of organic matter, exchangeable K, Ca, and Mg, CEC, available $SiO_2$, and nitrate were lower than ${\pm}1.8\,g\,kg^{-1}$, ${\pm}0.05\,cmol^+\,kg^{-1}$, ${\pm}0.18\,cmol^+\,kg^{-1}$, ${\pm}0.13\,cmol^+\,kg^{-1}$, ${\pm}1.0\,cmol^+\,kg^{-1}$, ${\pm}5.0\,mg\,kg^{-1}$, and ${\pm}10.0\,mg\,kg^{-1}$, respectively. All the measurements showed coefficients of variation of less than 7~17% and were within the 95% confidence level, which means both methods are precise. Considering their relative simplicity, low cost, precision, and accuracy, the proposed Soiltek® KA-P spectrophotometric methods could be recommended as an alternative to the standard method.

Development of Devices for Improving the Reducibility of Patient Positioning on a Breast Board (Breast Board를 이용한 방사선치료에서 환자 위치 재현성 향상 방안에 대한 연구)

  • Huh Soon Nyung;Cho Woong;Park Yang Kyun;Ha Sung Whan
    • Radiation Oncology Journal / v.23 no.2 / pp.123-130 / 2005
  • Purpose: We wanted to improve the setup reproducibility of breast cancer patients when utilizing a commercially available breast board for radiation therapy. The breast board was modified by using a new head rest and 2 types of board fixation devices. Materials and Methods: A conventional head/neck rest was modified so that it could be positioned in various slots of the breast board, and it was fabricated 1 cm thinner to provide more comfort when the patient's neck was rotated. This rest reduces the uncertainty of the daily setup. Also, the sagging problems at the left and right sides became negligible with the two types of board fixation devices: (1) the stair type and (2) the arm type. The first device consists of an upper/lower holder with 4 stair-type grooves and 4 rectangular inserts. To cover the whole range of vertical setup of the breast board, 4 rectangular inserts were needed, each covering 10 steps. The arm-type fixation device was also fabricated and attached to the breast board; it had two aluminum bars that were fixed with a lock-type screw. These devices were evaluated with two volunteers to verify the improvement in setup accuracy. Results: The developed cranio-caudal fixation device reduced the cranio-caudal error by nearly $55\%$ compared to the old device. As for left-and-right inclination, the stair-type and arm-type fixation devices reduced the relative inclination by nearly $80\%$ and $90\%$, respectively, compared to the breast board without a fixation device. Conclusion: It was verified that the developed devices were effective for positioning patients and for avoiding inclination of the breast board.