• Title/Summary/Keyword: fraud detection

Usefulness of Data Mining in Criminal Investigation (데이터 마이닝의 범죄수사 적용 가능성)

  • Kim, Joon-Woo;Sohn, Joong-Kweon;Lee, Sang-Han
    • Journal of forensic and investigative science
    • /
    • v.1 no.2
    • /
    • pp.5-19
    • /
    • 2006
  • Data mining is an information extraction activity to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. Law enforcement agencies deal with massive data in investigating crime, and the volume is increasing as computerized data processing develops, so we now face the new challenge of discovering knowledge in those data. Data mining can be applied in criminal investigation to find offenders by analyzing complex, relational data structures and free texts such as criminal records or statements. This study aimed to evaluate the possible applications of data mining, and its limitations, in practical criminal investigation. Clustering of criminal cases is possible for habitual crimes such as fraud and burglary when data mining is used to identify crime patterns. Neural network modelling, one of the tools of data mining, can be applied to matching a suspect's photograph or handwriting against those of convicts, and to criminal profiling. A case study of insurance fraud showed that data mining was useful against organized crime such as gangs, terrorism, and money laundering. However, the products of data mining in criminal investigation should be evaluated cautiously, because data mining offers clues rather than conclusions. Legal regulation is needed to control abuse by law enforcement agencies and to protect personal privacy and human rights.
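
The clustering idea this abstract mentions for habitual crimes can be sketched with a toy k-means run over invented case features. The feature encoding (hour of offence, loss amount) and the data are hypothetical; the abstract only states that clustering is applicable, not how cases are encoded.

```python
# Hypothetical sketch: clustering habitual-crime cases by simple numeric
# features to surface crime patterns. Data and encoding are invented.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each case to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # move each center to the mean of its cluster
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

# Toy cases: (hour of offence, loss amount in millions) -- fabricated.
cases = [(2, 1.0), (3, 1.2), (2, 0.9), (14, 5.0), (15, 5.5), (13, 4.8)]
centers, clusters = kmeans(cases, 2)
print(sorted(len(c) for c in clusters))  # two groups of 3 cases each
```

On these toy features the night-time and daytime cases separate into two equal groups, which is the kind of pattern grouping the abstract envisions.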

Stock Market Forecasting : Comparison between Artificial Neural Networks and Arch Models

  • Merh, Nitin
    • Journal of Information Technology Applications and Management
    • /
    • v.19 no.1
    • /
    • pp.1-12
    • /
    • 2012
  • Data mining is the process of searching and analyzing large quantities of data to find meaningful patterns and rules. The Artificial Neural Network (ANN) is a data mining tool that is becoming very popular for forecasting future values; some of the areas where it is used are banking, medicine, retailing, and fraud detection. In finance, artificial neural networks are used in various disciplines, including stock market forecasting. In stock market time series, due to high volatility, it is very important to choose a model that reads volatility and forecasts future values with volatility as one of the major attributes. In this paper, an attempt is made to develop two models for forecasting stock market returns: one using a feed-forward back-propagation Artificial Neural Network and the other using the Autoregressive Conditional Heteroskedasticity (ARCH) technique. Parameters considered in designing the optimal ANN model include input and output data normalization, the transfer function, the number of neurons in the input, hidden, and output layers, the number of hidden layers, and the values of momentum, learning rate, and error tolerance. Simulations have been done using daily closing prices of the Sensex, with stock market returns as input data and the forecasted return as output, in MATLAB® 6.1.0.450 and EViews 4.1. Convergence and performance of the models have been evaluated on the basis of the simulation results, with performance measured by the errors between actual and predicted values.
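
For context, the variance recursion underlying the paper's second model, ARCH(1), can be sketched in a few lines: the conditional variance is a0 + a1 · r²(t-1). The coefficients and the return series below are illustrative, not estimates from the Sensex data.

```python
# Minimal sketch of the ARCH(1) recursion the abstract contrasts with the
# ANN model: sigma_t^2 = a0 + a1 * r_{t-1}^2. Parameters are illustrative.

def arch1_variances(returns, a0, a1, init_var):
    """One-step-ahead conditional variances under ARCH(1)."""
    variances = [init_var]
    for r in returns[:-1]:
        variances.append(a0 + a1 * r * r)
    return variances

returns = [0.01, -0.03, 0.02, -0.01]  # toy daily returns
vars_ = arch1_variances(returns, a0=1e-4, a1=0.3, init_var=2e-4)
# After the large (-3%) return, the next conditional variance jumps,
# which is how ARCH "reads" volatility clustering.
print(vars_)
```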

MEAT SPECIATION USING A HIERARCHICAL APPROACH AND LOGISTIC REGRESSION

  • Arnalds, Thosteinn;Fearn, Tom;Downey, Gerard
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference
    • /
    • 2001.06a
    • /
    • pp.1245-1245
    • /
    • 2001
  • Food adulteration is a serious consumer fraud and a matter of concern to food processors and regulatory agencies. A range of analytical methods have been investigated to facilitate the detection of adulterated or mis-labelled foods & food ingredients, but most of these require sophisticated equipment and highly-qualified staff and are time-consuming. Regulatory authorities and the food industry require a screening technique which will facilitate fast and relatively inexpensive monitoring of food products with a high level of accuracy. Near infrared spectroscopy has been investigated for its potential in a number of authenticity issues including meat speciation (McElhinney, Downey & Fearn (1999) JNIRS, 7(3), 145-154; Downey, McElhinney & Fearn (2000) Appl. Spectrosc. 54(6), 894-899). This report describes further analysis of these spectral sets using a hierarchical approach and binary decisions solved using logistic regression. The sample set comprised 230 homogenized meat samples, i.e. chicken (55), turkey (54), pork (55), beef (32) and lamb (34), purchased locally as whole cuts of meat over a 10-12 week period. NIR reflectance spectra were recorded over the wavelength range 400-2498 nm at 2 nm intervals on a NIR Systems 6500 scanning monochromator. The problem was defined as a series of binary decisions, i.e. is the meat red or white? Is the red meat beef or lamb? Is the white meat pork or poultry? etc. Each of these decisions was made using an individual binary logistic model based on scores derived from principal component or partial least squares (PLS1 and PLS2) analysis. The results obtained were equal to or better than previous reports using factorial discriminant analysis, K-nearest neighbours and PLS2 regression. This new approach, using a combination of exploratory and logistic analyses, also appears to have the advantages of transparency and use of the inherent structure in the spectral data. Additionally, it allows for the use of different data transforms and multivariate regression techniques at each decision step.
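
The hierarchical scheme described above can be sketched as a cascade of binary logistic models, each routing a sample to the next decision. A single scalar "score" here stands in for the paper's PCA/PLS score vectors, and all training data are fabricated for illustration.

```python
# Sketch of a hierarchy of binary logistic decisions (red/white, then
# beef/lamb). Scores and labels are invented; the real models use
# PCA/PLS scores from NIR spectra.
import math

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit w, b for P(y=1|x) = sigmoid(w*x + b) by gradient descent."""
    w = b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def predict(model, x):
    w, b = model
    return 1 / (1 + math.exp(-(w * x + b))) > 0.5

# Node 1: red (1) vs white (0) meat, from a toy spectral score.
red_vs_white = train_logistic([0.1, 0.2, 0.15, 0.8, 0.9, 0.85],
                              [0, 0, 0, 1, 1, 1])
# Node 2: beef (1) vs lamb (0), consulted only if node 1 says "red".
beef_vs_lamb = train_logistic([0.3, 0.35, 0.7, 0.75], [0, 0, 1, 1])

def classify(score_1, score_2):
    if predict(red_vs_white, score_1):
        return "beef" if predict(beef_vs_lamb, score_2) else "lamb"
    return "white meat (pork or poultry)"

print(classify(0.88, 0.72))  # routed: red -> beef
print(classify(0.12, 0.50))  # routed: white meat
```

Because each node is a separate model, a different data transform or regression technique could be swapped in at any single decision step, which is the flexibility the abstract highlights.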


A Study on User Authentication Model Using Device Fingerprint Based on Web Standard (표준 웹 환경 디바이스 핑거프린트를 활용한 이용자 인증모델 연구)

  • Park, Sohee;Jang, Jinhyeok;Choi, Daeseon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.4
    • /
    • pp.631-646
    • /
    • 2020
  • The government is pursuing a policy to remove plug-ins from public and private websites to create a convenient Internet environment for users. In general, the websites of financial institutions that provide financial services, such as banks and credit card companies, operate a fraud detection system (FDS) to enhance the stability of electronic financial transactions, and currently installed software is used to collect and analyze users' information. Therefore, there is a need for an alternative technology, and a policy, that can collect user information without installing software, in line with the no-plug-in policy. This paper introduces device fingerprinting that can be used in the standard web environment and suggests a guideline for selecting among various techniques. We also propose a user authentication model using device fingerprints based on machine learning. In addition, we actually collected device fingerprints from Chrome and Explorer users to create a machine-learning-based multi-class authentication model. As a result, the Chrome-based authentication model showed about 85%~89% performance, and the Explorer-based authentication model about 93%~97%.
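
The core idea can be hedged into a small sketch: treat a browser device fingerprint as a numeric feature vector and authenticate by assigning it to the nearest enrolled user's profile. The features and users below are invented, and the paper's actual model is a trained multi-class classifier, not this specific nearest-centroid rule.

```python
# Hypothetical sketch: multi-class user identification from device
# fingerprint feature vectors via a nearest-centroid rule.

def centroid(vectors):
    """Mean vector of a user's enrolled fingerprints."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_user(profiles, fingerprint):
    """profiles: user -> centroid of enrolled fingerprint vectors."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(profiles, key=lambda u: dist2(profiles[u], fingerprint))

# Toy enrolment: features might encode screen width, timezone offset,
# a normalized canvas-hash value... (all fabricated here).
enrolled = {
    "alice": centroid([[1920, 9, 0.20], [1920, 9, 0.25]]),
    "bob":   centroid([[1366, 1, 0.80], [1366, 1, 0.75]]),
}
print(nearest_user(enrolled, [1920, 9, 0.21]))  # -> alice
```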

A study on removal of unnecessary input variables using multiple external association rule (다중외적연관성규칙을 이용한 불필요한 입력변수 제거에 관한 연구)

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.5
    • /
    • pp.877-884
    • /
    • 2011
  • The decision tree is a representative algorithm of data mining and is used in many domains such as retail target marketing, fraud detection, data reduction, variable screening, and category merging. The method is most useful in classification problems and for making predictions about a target group after dividing it into several small groups. When we create a decision tree model with a large number of input variables, exploration and analysis of the model become difficult because the trees are complex. Moreover, input variables can appear associated with each other through external variables even when no intrinsic association exists. In this paper, we study a method for removing unnecessary input variables using multiple external association rules, and we apply the method to actual data to assess its efficiency.
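
A minimal sketch of the association-rule machinery the paper builds on: the support and confidence of a rule "A → B" over binary records. Whether such a rule is merely "external" (induced by a third variable) is the paper's question; this only shows the basic rule metrics, on fabricated data.

```python
# Support and confidence of the association rule A -> B over binary
# records. Records are invented for illustration.

def rule_metrics(records, a, b):
    n = len(records)
    n_a = sum(1 for r in records if r[a])
    n_ab = sum(1 for r in records if r[a] and r[b])
    support = n_ab / n                      # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0  # P(B | A)
    return support, confidence

records = [
    {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0},
    {"A": 0, "B": 1}, {"A": 0, "B": 0},
]
support, confidence = rule_metrics(records, "A", "B")
print(support, confidence)  # 0.4 and about 0.667
```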

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.681-690
    • /
    • 2017
  • Imbalanced binary classification problems arise in many settings, such as fraud detection in banking operations, spam mail detection, and prediction of defective products. When the proportion of one group is dominant, binary classifiers tend to have poor prediction performance; to overcome this problem, several sampling methods such as over-sampling, under-sampling, and SMOTE have been developed. In this study, we investigate the prediction performance of logistic regression, lasso, random forest, boosting, and support vector machines in combination with these sampling methods on imbalanced binary data. Four real data sets are analyzed to see whether there is a substantial improvement in prediction performance. We also emphasize some precautions to observe when the sampling methods are implemented.
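
The simplest of the sampling schemes the paper compares, random over-sampling, duplicates minority-class examples until the classes are balanced (SMOTE instead synthesizes new minority points by interpolation between neighbors). The data below are toy values, with no claim about the paper's exact setup.

```python
# Random over-sampling sketch: duplicate minority-class rows until the
# two classes are balanced. Data are fabricated.
import random

def random_oversample(X, y, minority_label, seed=0):
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    # draw (with replacement) enough extra minority indices to balance
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]          # heavily imbalanced: one "fraud" case
Xb, yb = random_oversample(X, y, minority_label=1)
print(yb.count(0), yb.count(1))  # 4 4
```

One of the usual precautions applies even to this sketch: over-sample only the training split, never before the train/test division, or the test set leaks duplicated points.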

The Base of Understanding for Interdisciplinary Studies on Cyber Crimes - Centering on Regulations in Criminal Law - (사이버범죄의 학제간 연구를 위한 이해의 기초 - 형법상 규제를 중심으로 -)

  • Lim, Byoung-Rak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.3
    • /
    • pp.237-242
    • /
    • 2008
  • This study aims to provide engineers with a theoretical base in criminal law, from the viewpoint of jurists, to encourage interdisciplinary studies on cyber crimes. Despite the seriousness of the discussion on the torrent of cyber crimes, good effects of the Internet, such as the sharing of information, have been emphasized while the evil influence of its side effects has been neglected. This study therefore suggests that we need to consider reinforcing cyber ethics and the legal mind of IT technicians, strict security by managers, active efforts by the managers of web-hard and P2P services to develop legitimate contents, and reinforced punishment of crimes committed by Internet users. The study also approaches new norms on computer and cyber crimes in the interpretive sense of criminal law, and provides a theoretical base in criminal law focusing on traditional theories, assumptions, and precedents involved in regulating the distribution of computer viruses.

Determination of the Authenticity of Dairy Products on the Basis of Fatty Acids and Triacylglycerols Content using GC Analysis

  • Park, Jung-Min;Kim, Na-Kyeong;Yang, Cheul-Young;Moon, Kyong-Whan;Kim, Jin-Man
    • Food Science of Animal Resources
    • /
    • v.34 no.3
    • /
    • pp.316-324
    • /
    • 2014
  • Milk fat is an important food component, and plays a significant role in the economics, functional nutrition, and chemical properties of dairy products. Dairy products also contain nutritional resources and essential fatty acids (FAs). Because of the increasing demand for dairy products, milk fat is a common target of economic fraud: specifically, milk fat is often replaced with cheaper or more readily available vegetable oils or animal fats. In this study, a method for the discrimination of milk fat was developed using FA profiles and triacylglycerol (TG) profiles. A total of 11 samples were evaluated: four milk fats (MK), four vegetable oils (VG), two pork lards (PL), and one beef tallow (BT). Gas chromatography analyses were performed to monitor the FA content and TG composition in MK, VG, PL, and BT. The results showed that qualitative determination of MK samples adulterated with different vegetable oils and animal fats was possible by visual comparison of the FA profiles, using C14:0, C16:0, C18:1n9c, C18:0, and C18:2n6c, and of the TG profiles, using C36, C38, C40, C50, C52, and C54. Overall, the objective of this study was to evaluate the potential of FAs and TGs for detecting adulterated milk fat, and accordingly to characterize the samples by the adulterant oil source and level of adulteration. Based on this preliminary investigation, the usefulness of this approach could be tested for other oils in the future.
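
The screening logic can be hedged into a small sketch: compare a sample's fatty-acid profile (relative % of marker FAs) against a reference milk-fat profile and flag large deviations as possible adulteration. The reference values and the threshold here are invented for illustration, not taken from the paper.

```python
# Hypothetical FA-profile screen: flag a sample whose marker fatty-acid
# percentages deviate strongly from a (made-up) reference milk-fat profile.

MARKERS = ["C14:0", "C16:0", "C18:0", "C18:1", "C18:2"]
REFERENCE_MILK = {"C14:0": 11.0, "C16:0": 30.0, "C18:0": 11.0,
                  "C18:1": 24.0, "C18:2": 2.5}   # illustrative % values

def deviation(profile, reference=REFERENCE_MILK):
    """Sum of absolute differences over the marker fatty acids."""
    return sum(abs(profile[m] - reference[m]) for m in MARKERS)

def looks_adulterated(profile, threshold=15.0):  # hypothetical cutoff
    return deviation(profile) > threshold

pure = {"C14:0": 10.8, "C16:0": 29.5, "C18:0": 11.2,
        "C18:1": 24.5, "C18:2": 2.4}
cut_with_vegetable_oil = {"C14:0": 5.0, "C16:0": 20.0, "C18:0": 6.0,
                          "C18:1": 40.0, "C18:2": 20.0}
print(looks_adulterated(pure))                    # False
print(looks_adulterated(cut_with_vegetable_oil))  # True
```

Vegetable oils are rich in unsaturated FAs (here C18:1 and C18:2), so blending them into milk fat shifts exactly the markers this toy screen watches.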

Molecular Identification of Korean Mountain Ginseng Using an Amplification Refractory Mutation System (ARMS)

  • In, Jun-Gyo;Kim, Min-Kyeoung;Lee, Ok-Ran;Kim, Yu-Jin;Lee, Beom-Soo;Kim, Se-Young;Kwon, Woo-Seang;Yang, Deok-Chun
    • Journal of Ginseng Research
    • /
    • v.34 no.1
    • /
    • pp.41-46
    • /
    • 2010
  • Expensive herbs such as ginseng are a perennial target for fraudulent labeling. New mountain ginseng strains have occasionally been found deep within mountain areas and commercially traded at exorbitant prices. Until now, however, no scientific basis has existed to distinguish such ginseng from commonly cultivated ginseng species other than its having been found deep within mountain areas. Polymerase chain reaction (PCR) analysis of the internal transcribed spacer has been shown to be an appropriate method for identifying the most popular species, Panax ginseng, within the genus Panax. A single nucleotide polymorphism (SNP) was identified between three newly found mountain ginseng strains (KGD4, KGD5, and KW1) and already established Panax species. Specific PCR primers were designed from this SNP site within the sequence data and used to detect the mountain ginseng strains via multiplex PCR. The established multiplex PCR method for the simultaneous detection of the newly found mountain ginseng strains, Korean ginseng, and foreign ginseng in a single reaction was shown to be effective. This study is the first report of the scientific discrimination of "mountain ginsengs" and describes an effective method of identification for fraud prevention and for uncovering the possible presence of other, cheaper ginseng species on the market.
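
The ARMS principle can be sketched as allele-specific primer matching: a primer whose 3'-terminal base sits on the SNP position only "binds" (and so amplifies) when that base matches the template. The sequences below are made up, and real ARMS design also involves deliberate internal mismatches and tuned PCR conditions.

```python
# Illustrative allele-specific (ARMS-style) primer check on toy sequences.
# Sequences and the primer are fabricated; only the principle is real.

def primer_matches_3prime(template, primer):
    """True if the primer aligns somewhere with an exact match,
    including its discriminating 3'-terminal base."""
    for i in range(len(template) - len(primer) + 1):
        if template[i:i + len(primer)] == primer:
            return True
    return False

# Toy ITS fragments differing at one SNP site (G vs A at the last base shown).
mountain_strain = "ATCGGTACGTG"
cultivated      = "ATCGGTACGTA"
snp_primer      = "TACGTG"   # 3' end targets the mountain-strain allele

print(primer_matches_3prime(mountain_strain, snp_primer))  # True
print(primer_matches_3prime(cultivated, snp_primer))       # False
```

In a multiplex reaction, several such allele-specific primer pairs run in one tube, so mountain strains, Korean ginseng, and foreign ginseng each produce (or fail to produce) their own diagnostic band.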