• Title/Summary/Keyword: estimation by learning

A Unicode based Deep Handwritten Character Recognition model for Telugu to English Language Translation

  • BV Subba Rao;J. Nageswara Rao;Bandi Vamsi;Venkata Nagaraju Thatha;Katta Subba Rao
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.101-112 / 2024
  • Telugu is considered the fourth most spoken language in India, especially in Andhra Pradesh, Telangana, and Karnataka, and it is a growing spoken language in other countries as well. The language comprises dependent and independent vowels, consonants, and digits. Nevertheless, Telugu Handwritten Character Recognition (HCR) has seen little development. HCR, a neural network technique, converts an image of a document into editable text that can be used in many other applications, saving the time and effort of retyping documents from scratch. In this work, a Unicode-based Handwritten Character Recognition (U-HCR) model is developed to translate handwritten Telugu characters into English. Using the Centre of Gravity (CG), the model divides a compound character into its individual characters with the help of their Unicode values. The model is trained on both online and offline Telugu character datasets. Features are extracted from the scanned image with a convolutional neural network (CNN), used together with machine learning classifiers such as Random Forest and Support Vector Machine. Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMS-P), and Adaptive Moment Estimation (ADAM) optimizers are used to improve the performance of U-HCR by reducing the value of the CNN's loss function. On both online and offline datasets, the proposed model showed promising results, achieving accuracies of 90.28% with SGD, 96.97% with RMS-P, and 93.57% with ADAM.
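
The three optimizers compared above differ only in how they turn a gradient into a weight update. The sketch below is illustrative, not the authors' training code: it assumes a toy one-parameter loss L(w) = (w - 3)^2 in place of the CNN loss, and common default hyperparameters rather than values reported in the paper.

```python
import math

def grad(w):
    # Gradient of the hypothetical toy loss L(w) = (w - 3)^2, standing in for the CNN loss.
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    # Plain SGD: step against the gradient with a fixed learning rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    # RMSProp: divide the gradient by a running root-mean-square of past gradients.
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(s) + eps)
    return w

def adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    # Adam: momentum (first moment) plus RMS scaling (second moment), both bias-corrected.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Starting each optimizer from `w = 0.0` moves it toward the minimum at 3; the adaptive methods settle into a small neighborhood whose size depends on the learning rate.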

Real-time private consumption prediction using big data (빅데이터를 이용한 실시간 민간소비 예측)

  • Seung Jun Shin;Beomseok Seo
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.13-38 / 2024
  • As economic uncertainty has recently increased due to COVID-19, there is a growing need to quickly grasp private consumption trends, which directly reflect the economic situation of private economic entities. This study proposes a method of estimating private consumption in real time by comprehensively utilizing big data as well as existing macroeconomic indicators. In particular, it aims to improve the accuracy of private consumption estimation by comparing and analyzing various machine learning methods capable of fitting ultra-high-dimensional big data. The empirical analysis shows that when the number of covariates, including big data, is large, selecting variables in advance and fitting the model on the selected variables improves private consumption prediction. In addition, including big data greatly improves the prediction of private consumption after COVID-19, showing that the benefit of big data, which reflects new information in a timely manner, grows when economic uncertainty is high.
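
The abstract does not say which rule is used to select variables in advance. As a hedged illustration only, the sketch below assumes a simple correlation screening step: rank every covariate by the absolute Pearson correlation with the target and keep the top k before fitting a model. The covariate names and values are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    # Sample Pearson correlation between two equal-length series.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear signal
    return sxy / (sxx * syy) ** 0.5

def screen_covariates(covariates, target, k):
    """Keep the k covariates (dict name -> series) most correlated with the target."""
    ranked = sorted(covariates,
                    key=lambda name: abs(pearson(covariates[name], target)),
                    reverse=True)
    return ranked[:k]

# Hypothetical example: "card_spend" tracks consumption, "noise" does not.
consumption = [1, 2, 3, 4, 5]
candidates = {
    "card_spend": [2, 4, 6, 8, 10],
    "search_index": [5, 3, 4, 1, 2],
    "noise": [1, 1, 2, 1, 1],
}
selected = screen_covariates(candidates, consumption, k=2)
```

Only the selected covariates would then be passed to whichever machine learning model is being compared; screening keeps the fit tractable when big data contributes thousands of candidate series.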

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertisements are distributed on SNS in large quantities. As a result, leaks of personal information and even monetary losses occur more frequently. In this study, we propose a method to determine which sentences and documents posted on SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of the conceptual characteristics of cybercriminality on SNS and of emergency management. We also suggest an emergency management process consisting of Pre-Cybercriminality (e.g., risk identification) and Post-Cybercriminality steps; this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two seed words, 'daechul' (loan) and 'sachae' (private loan), and collected data containing them from SNS such as Twitter. Two researchers then judged whether the collected data were related to cybercriminality, particularly financial fraud. Next, we selected as keywords those vocabulary items that were nouns or symbols. Using the selected keywords, we searched web sources such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and turned into training data. Preprocessing consists of three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. Morphological analysis splits complex sentences into morpheme units to enable automated analysis. Stop-word removal deletes non-lexical elements such as numbers, punctuation marks, and double spaces from the text.
In the part-of-speech selection step, only two kinds of tokens are kept: nouns and symbols. Because nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal a text is, the more frequently it uses symbols. To turn the selected data into training data, each item is labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a training set (70%) and a test set (30%). An SVM was used as the classifier. Its main parameters are gamma and cost; based on the optimal value function, gamma was set to 0.5 and cost to 10, which is higher than in typical settings. To show the feasibility of the proposed idea, we compared the proposed method with Maximum Likelihood Estimation (MLE), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. The proposed method achieved an overall accuracy of 92.41% for illegal loan advertisements and 77.75% for illegal door-to-door sales, outperforming Term Frequency, MLE, and the other methods. These results suggest that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study contributes to academia by identifying what to consider when applying SVM-like classification algorithms to text analysis. It should also help practitioners in brand management and opinion mining.
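
The data pipeline described above (token lists to a Document-Term Matrix, a random 70/30 split, an SVM with gamma = 0.5) can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the SVM uses a radial basis function (RBF) kernel, since gamma is an RBF kernel parameter, and the token lists are hypothetical. The cost value (10) enters the SVM's training objective and would be passed to an SVM library; training itself is not shown.

```python
import math
import random
from collections import Counter

def document_term_matrix(docs):
    """docs: list of token lists (nouns and symbols kept after preprocessing)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])  # raw term frequencies
    return vocab, rows

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2); assumed kernel, gamma as reported.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def train_test_split(items, train_fraction=0.7, seed=0):
    # Shuffle a copy, then cut it into training (70%) and test (30%) parts.
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical preprocessed documents.
vocab, dtm = document_term_matrix([["daechul", "!", "daechul"], ["news", "sachae"]])
```

Each DTM row becomes one SVM input vector, and the kernel measures how similar two documents' term profiles are.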

A Study of Prediction of Daily Water Supply Using ANFIS (ANFIS를 이용한 상수도 1일 급수량 예측에 관한 연구)

  • Rhee, Kyoung-Hoon;Moon, Byoung-Seok;Kang, Il-Hwan
    • Journal of Korea Water Resources Association / v.31 no.6 / pp.821-832 / 1998
  • This study investigates the prediction of daily water supply, which is necessary for the efficient management of water distribution systems. A fuzzy neuron, an artificial intelligence technique, is a neural network that takes fuzzy information as input and processes it. In this study, daily water supply was predicted with an adaptive learning method that adapts the membership functions and fuzzy rules to the prediction task. The prediction methods were investigated using data on the amount of water supplied to the city of Kwangju. To choose the variables, four analyses of the input data were conducted: correlation analysis, autocorrelation analysis, partial autocorrelation analysis, and cross-correlation analysis. The input variables were (a) the amount of water supplied, (b) the mean temperature, and (c) the population of the area supplied with water. These variables were combined in an integrated model. A model using only daily water supply data was also built and validated, for cases in which the meteorological office's weather forecasts are not reliable. The proposed models also cover accidental cases such as suspensions of water supply. The maximum error rate between the model's estimates and the actual measurements was 18.35%, and the average error was below 2.36%. The model is expected to support real-time estimation for the operational control of waterworks and water and drain pipes.
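
ANFIS implements a first-order Sugeno fuzzy system as a layered network: membership functions, rule firing strengths, normalization, linear rule consequents, and a final sum. A minimal sketch of the forward pass for two inputs (for example, the previous day's supply and the mean temperature, both assumed to be pre-scaled) is shown below. The Gaussian membership shape and all parameter values are assumptions; training, which in standard ANFIS adjusts membership and consequent parameters, is not shown.

```python
import math

def gaussian_mf(x, center, width):
    # Layer 1: degree to which x belongs to a fuzzy set (Gaussian shape assumed).
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def anfis_forward(x1, x2, mfs1, mfs2, consequents):
    """First-order Sugeno inference for two inputs.
    mfs1, mfs2: lists of (center, width) per fuzzy set;
    consequents: {(i, j): (p, q, r)} giving rule output p*x1 + q*x2 + r."""
    mu1 = [gaussian_mf(x1, c, s) for c, s in mfs1]
    mu2 = [gaussian_mf(x2, c, s) for c, s in mfs2]
    # Layer 2: firing strength of rule (i, j) is the product of memberships.
    weights = {(i, j): a * b for i, a in enumerate(mu1) for j, b in enumerate(mu2)}
    total = sum(weights.values())
    out = 0.0
    for rule, w in weights.items():
        p, q, r = consequents[rule]
        # Layers 3-5: normalize, weight the linear consequent, and sum.
        out += (w / total) * (p * x1 + q * x2 + r)
    return out

# Hypothetical fuzzy sets: "low"/"high" supply and "cool"/"warm" temperature (scaled to 0..1).
MFS_SUPPLY = [(0.2, 0.2), (0.8, 0.2)]
MFS_TEMP = [(0.3, 0.25), (0.9, 0.25)]
```

Because the normalized firing strengths sum to 1, the output is a weighted average of the rule consequents, which makes the network's predictions easy to interpret rule by rule.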

A Study on Risk Parity Asset Allocation Model with XGBoost (XGBoost를 활용한 리스크패리티 자산배분 모형에 관한 연구)

  • Kim, Younghoon;Choi, HeungSik;Kim, SunWoong
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.135-149 / 2020
  • Artificial intelligence is changing the world, and the financial market is no exception. Robo-advisors are being actively developed to compensate for the weaknesses of traditional asset allocation methods and to handle the parts that are difficult for them. A robo-advisor makes automated investment decisions with artificial intelligence algorithms and is used with various asset allocation models such as the mean-variance, Black-Litterman, and risk parity models. The risk parity model is a typical risk-based asset allocation model that focuses on the volatility of assets. Because it avoids investment risk structurally, it is stable for managing large funds and has been widely used in finance. XGBoost is a parallel tree-boosting method: an optimized gradient boosting model designed to be highly efficient and flexible. It can handle billions of examples in memory-limited environments and learns much faster than traditional boosting methods. It is frequently used across many fields of data analysis and has many advantages. In this study, we therefore propose a new asset allocation model that combines the risk parity model with the XGBoost machine learning model. The model uses XGBoost to predict the risk of assets and applies the predicted risk to covariance estimation. Because an optimized asset allocation model estimates investment proportions from historical data, estimation errors arise between the estimation period and the actual investment period, and these errors degrade the performance of the optimized portfolio. This study aims to improve the stability and portfolio performance of the model by predicting the volatility of the next investment period and thereby reducing the estimation errors of the optimized asset allocation model. In doing so, it narrows the gap between theory and practice and proposes a more advanced asset allocation model.
For the empirical test of the proposed model, we used Korean stock market price data covering a total of 17 years, from 2003 to 2019. The data sets comprise ten sectors: energy, finance, IT, industrials, materials, telecommunications, utilities, consumer, health care, and staples. Predictions were accumulated with a moving-window method using 1,000 in-sample and 20 out-of-sample observations, producing a total of 154 rebalancing back-test results. Portfolio performance was analyzed in terms of cumulative return, and the long test period yielded a large amount of sample data. Compared with the traditional risk parity model, the proposed model improved cumulative return and reduced estimation errors. The total cumulative return was 45.748%, about 5% higher than that of the risk parity model, and estimation errors were reduced in 9 of the 10 industry sectors. Reducing estimation errors increases the model's stability and makes it easier to apply in practical investment. The results show that portfolio performance improves when the estimation errors of the optimized asset allocation model are reduced. Many financial and asset allocation models are limited in practical investment by the fundamental question of whether the past characteristics of assets will persist in a changing financial market. This study, however, retains the advantages of traditional asset allocation models while supplementing their limitations and increasing stability by predicting asset risk with a recent algorithm. Whereas various studies have proposed parametric estimation methods to reduce estimation errors in portfolio optimization, we propose a new machine-learning-based method to reduce estimation errors in optimized asset allocation models.
This study is therefore meaningful in that it proposes an advanced artificial intelligence asset allocation model for fast-developing financial markets.
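
A minimal sketch of the two mechanical pieces described above: turning per-asset volatilities into equal-risk-contribution (risk parity) weights, and generating the 1,000-in-sample / 20-out-of-sample moving windows. In the paper the volatilities come from XGBoost forecasts; here they are hypothetical inputs, and combining them with a correlation matrix as Σ = D R D (D the diagonal matrix of volatilities) is an assumption about how the predicted risk enters covariance estimation. The cyclical coordinate-descent solver is one standard way to compute risk parity weights, not necessarily the authors' solver.

```python
import math

def covariance_from_vols(vols, corr):
    # Sigma_ij = vol_i * vol_j * rho_ij (predicted vols, historical correlations assumed).
    n = len(vols)
    return [[vols[i] * vols[j] * corr[i][j] for j in range(n)] for i in range(n)]

def risk_parity_weights(cov, iters=500):
    """Equal risk contribution via cyclical coordinate descent on
    0.5 * x' Sigma x - sum(b * log x_i) with b = 1/n; result normalized to sum to 1."""
    n = len(cov)
    b = 1.0 / n
    x = [1.0 / math.sqrt(cov[i][i]) for i in range(n)]  # inverse-volatility start
    for _ in range(iters):
        for i in range(n):
            c = sum(cov[i][j] * x[j] for j in range(n) if j != i)
            # Positive root of cov_ii * x_i^2 + c * x_i - b = 0.
            x[i] = (-c + math.sqrt(c * c + 4 * cov[i][i] * b)) / (2 * cov[i][i])
    s = sum(x)
    return [xi / s for xi in x]

def risk_contributions(w, cov):
    # RC_i = w_i * (Sigma w)_i; risk parity makes these equal.
    n = len(w)
    return [w[i] * sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]

def rolling_windows(n_obs, in_sample=1000, out_of_sample=20):
    """Yield (train_start, train_end, test_end) indices for a moving-window backtest."""
    start = 0
    while start + in_sample + out_of_sample <= n_obs:
        yield start, start + in_sample, start + in_sample + out_of_sample
        start += out_of_sample
```

With uncorrelated assets the weights reduce to inverse volatility (vols of 0.1 and 0.2 give weights of 2/3 and 1/3); with correlations the solver equalizes each asset's risk contribution. For reference, 1,000 + 154 × 20 = 4,080 observations is the smallest history that yields exactly 154 windows under this scheme.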

Improve the Performance of People Detection using Fisher Linear Discriminant Analysis in Surveillance (서베일런스에서 피셔의 선형 판별 분석을 이용한 사람 검출의 성능 향상)

  • Kang, Sung-Kwan;Lee, Jung-Hyun
    • Journal of Digital Convergence / v.11 no.12 / pp.295-302 / 2013
  • Many reported methods assume that the people in an image or image sequence have already been identified and localized. People detection strongly affects system performance, because it is a basic technology for detecting other objects, for human-computer interaction, and for motion recognition. In this paper, we present an efficient linear-discriminant approach to multi-view people detection, in which the training data are defined with Fisher's linear discriminant for efficient learning. People detection is difficult because it is affected by people's poses and by changes in illumination. The proposed approach handles the multi-view, multi-scale people detection problem quickly and efficiently, making it suitable for detecting people automatically. We extract people using Fisher linear discriminant-based hierarchical models that are invariant to pose and background, and we also estimate the pose of the detected people. The purpose of this paper is to classify people and non-people using Fisher's linear discriminant.
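
Fisher's linear discriminant finds the projection direction w = S_w^{-1}(m_1 - m_0) that best separates two classes, where m_1 and m_0 are the class means and S_w is the within-class scatter matrix. A minimal sketch for the people / non-people decision follows. The feature vectors are hypothetical (real detectors use image features of much higher dimension), and thresholding at the midpoint of the projected class means is an assumed, simple decision rule.

```python
def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def scatter(vectors, m):
    # Sum of outer products (v - m)(v - m)^T over one class.
    d = len(m)
    s = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [v[k] - m[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting for a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fisher_lda(pos, neg, ridge=1e-6):
    """Return (w, threshold); a small ridge keeps S_w invertible."""
    m1, m0 = mean_vec(pos), mean_vec(neg)
    s1, s0 = scatter(pos, m1), scatter(neg, m0)
    d = len(m1)
    sw = [[s1[i][j] + s0[i][j] + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    w = solve(sw, [m1[k] - m0[k] for k in range(d)])
    proj = lambda v: sum(wk * vk for wk, vk in zip(w, v))
    return w, 0.5 * (proj(m1) + proj(m0))

def is_person(w, threshold, x):
    # Positive class (people) projects above the midpoint threshold.
    return sum(wk * xk for wk, xk in zip(w, x)) > threshold

# Hypothetical 2-D feature vectors.
PEOPLE = [[2, 2], [3, 3], [2, 3], [3, 2]]
NON_PEOPLE = [[-2, -2], [-3, -3], [-2, -3], [-3, -2]]
```

Projecting onto a single direction makes classification a dot product and a comparison, which is why a linear discriminant is attractive for fast detection.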

LSTM-based Anomaly Detection on Big Data for Smart Factory Monitoring (스마트 팩토리 모니터링을 위한 빅 데이터의 LSTM 기반 이상 탐지)

  • Nguyen, Van Quan;Van Ma, Linh;Kim, Jinsul
    • Journal of Digital Contents Society / v.19 no.4 / pp.789-799 / 2018
  • This article presents a machine learning approach that analyzes big time-series data for anomaly detection in complex industrial systems. Long Short-Term Memory (LSTM) networks have been shown to improve on RNNs and have become useful for many tasks. The LSTM-based model learns higher-level temporal features as well as temporal patterns, and the resulting predictor is then used in the prediction stage to estimate future data. The prediction error is the difference between the predictor's output and the actual incoming values. An error-distribution model based on a Gaussian distribution is then built to compute an anomaly score for each observation. In this manner, we move from the concept of a single anomaly to the idea of a collective anomaly. This work can help monitor and manage smart factories, minimizing failures and improving manufacturing quality.
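
The scoring stage described above can be sketched independently of the LSTM: compute prediction errors, fit a Gaussian to the errors on normal data, score new errors by their negative log-likelihood, and flag runs of consecutive high scores as collective anomalies. In this sketch the predictions are taken as given (no LSTM is trained), and the three-sigma threshold and minimum run length are hypothetical choices.

```python
import math
from statistics import mean, stdev

def prediction_errors(actual, predicted):
    # Error = actual incoming value minus the predictor's output.
    return [a - p for a, p in zip(actual, predicted)]

def fit_gaussian(errors):
    # Error-distribution model fitted on errors from normal operation.
    return mean(errors), stdev(errors)

def anomaly_score(error, mu, sigma):
    # Negative log-likelihood under N(mu, sigma^2): higher means more anomalous.
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (error - mu) ** 2 / (2 * sigma ** 2)

def collective_anomalies(scores, threshold, min_run=2):
    """Return (start, end) index ranges where at least min_run consecutive
    scores exceed the threshold (end is exclusive)."""
    runs, start = [], None
    for i, s in enumerate(scores):
        if s > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_run:
        runs.append((start, len(scores)))
    return runs

# Hypothetical validation errors and new errors.
mu, sigma = fit_gaussian([-0.1, 0.0, 0.1, 0.05, -0.05])
threshold = anomaly_score(mu + 3 * sigma, mu, sigma)  # assumed 3-sigma cutoff
scores = [anomaly_score(e, mu, sigma) for e in [0.0, 0.02, 1.0, 1.2, 0.9, 0.01, 1.5]]
```

The isolated spike at the last position is not reported because it does not form a run, which is the distinction between a single anomaly and a collective one.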

A Study on the Push-based Distance Education System and Leveling Estimation Algorithm (Push 기반 원격교육 시스템과 수준별 문항평가 알고리즘에 관한 연구)

  • 김원영;김치수;김진수
    • Journal of Internet Computing and Services / v.2 no.3 / pp.19-25 / 2001
  • An educational system using computers was first conceptualized by Dr. Donald Bitzer at the University of Illinois in the late 1950s. Since the PLATO system was developed in 1961, extensive research has been conducted over the past 30 years. In particular, the development of the Internet and information technology has contributed to the advancement of distance education systems. These systems have greatly changed the existing educational paradigm, and as a result a new education system is being realized. This study proposes a distance education system based on the ‘push’ technique, a means of actively transmitting information, in which the ‘push’ technique is combined with an existing distance education system. Through this combination, learning content can be delivered to learners without their connecting to the database over the Internet. In addition, new information is obtained in real time. Item handling and a level-based item evaluation algorithm are also designed with the various levels of learners in mind, so that each learner can be evaluated with items appropriate to his or her level.

Estimation of 3D Rotation Information of Animation Character Face (애니메이션 캐릭터 얼굴의 3차원 회전정보 측정)

  • Jang, Seok-Woo;Weon, Sun-Hee;Choi, Hyung-Il
    • Journal of the Korea Society of Computer and Information / v.16 no.8 / pp.49-56 / 2011
  • Recently, animation content has become widely available with the development of the cultural industry. In this paper, we propose a method to analyze the face of an animation character and extract the 3D rotation information of the face. The method first generates a dominant color model of the face by learning from face images of the animation character. The system then detects the face and its components with this model and establishes two coordinate systems: a base coordinate system and a target coordinate system. It estimates the 3D rotation of the character's face from the geometric relationship between the two coordinate systems. Finally, to visualize the extracted 3D information, a 3D face model that reflects the estimated rotation is displayed. Experiments show that the method extracts the 3D rotation information of a character's face reasonably well.
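
If each coordinate system is represented as a 3×3 matrix whose columns are its orthonormal axes, the rotation taking the base frame to the target frame is R = T B^T, and R can be decomposed into yaw, pitch, and roll. The sketch below assumes both frames are already available as orthonormal axis matrices (the abstract does not detail how they are built from the detected face components) and uses the Z-Y-X (yaw-pitch-roll) convention, which is our choice rather than one stated in the paper.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def rot_x(t):
    return [[1, 0, 0], [0, math.cos(t), -math.sin(t)], [0, math.sin(t), math.cos(t)]]

def rot_y(t):
    return [[math.cos(t), 0, math.sin(t)], [0, 1, 0], [-math.sin(t), 0, math.cos(t)]]

def rot_z(t):
    return [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]

def relative_rotation(base, target):
    """base, target: columns are orthonormal frame axes; returns R with target = R @ base."""
    return matmul(target, transpose(base))  # orthonormal, so inverse = transpose

def euler_zyx(r):
    """Decompose R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in degrees."""
    yaw = math.atan2(r[1][0], r[0][0])
    pitch = math.asin(max(-1.0, min(1.0, -r[2][0])))  # clamp for rounding error
    roll = math.atan2(r[2][1], r[2][2])
    return tuple(math.degrees(a) for a in (yaw, pitch, roll))

# Hypothetical frames: the base frame is tilted 10 degrees, the face then turns.
BASE = rot_x(math.radians(10))
TRUE_R = matmul(rot_z(math.radians(30)), matmul(rot_y(math.radians(-15)), rot_x(math.radians(20))))
TARGET = matmul(TRUE_R, BASE)
```

Recovering (30, -15, 20) degrees from `relative_rotation(BASE, TARGET)` confirms that the decomposition inverts the composition; the same angles could drive the rotation of the displayed 3D face model.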

Design and Implementation of WBI System for Test and Diagnoses based on WWW (WWW기반에서 테스트 및 진단을 위한 WBI 시스템의 설계 및 구현)

  • Kim, Du-Gyu;Lee, Jae-Mu
    • Journal of KIISE:Software and Applications / v.28 no.12 / pp.938-946 / 2001
  • The web has gradually evolved into an open environment flexible enough to be applied in education, but the Web Based Instruction (WBI) systems built on it still have many limitations and problems as far as learning efficiency is concerned. In particular, existing web-based evaluation systems only tell learners whether their answers are 'correct' or 'incorrect' and report results as scores, which makes it difficult for learners to get detailed information about their shortcomings and errors. What learners need is a web-based instruction system that diagnoses their comprehension status and explains the causes of their errors: why did they make them? In this paper, we develop a web-based instruction system that learners can access with a browser at any time and from anywhere. The system analyzes learners' weak points and diagnoses the causes of their errors, giving advice and more detailed error information than existing systems. By accumulating records of user behavior, it can also provide relevant individualized information to each learner.
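
The abstract does not describe how error causes are diagnosed. One common approach, assumed here purely for illustration, is to tag each item with the concept it tests and each wrong option with the misconception that typically produces it; aggregating a learner's answers then yields per-concept error rates (weak points) and counts of likely error causes, rather than a single score. The item bank, tags, and 0.5 weak-point threshold below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical item bank: concept tested, correct option, and the error cause
# associated with each wrong option (distractor).
ITEMS = {
    "q1": {"concept": "fractions", "answer": "b",
           "causes": {"a": "added denominators", "c": "inverted the fraction"}},
    "q2": {"concept": "fractions", "answer": "c",
           "causes": {"a": "added denominators"}},
    "q3": {"concept": "decimals", "answer": "a",
           "causes": {"b": "misplaced decimal point"}},
}

def diagnose(responses, items):
    """responses: {item_id: chosen_option}.
    Returns per-concept error rates and per-concept counts of error causes."""
    attempts, errors = Counter(), Counter()
    causes = defaultdict(Counter)
    for item_id, choice in responses.items():
        item = items[item_id]
        concept = item["concept"]
        attempts[concept] += 1
        if choice != item["answer"]:
            errors[concept] += 1
            causes[concept][item["causes"].get(choice, "unclassified error")] += 1
    rates = {c: errors[c] / attempts[c] for c in attempts}
    return rates, causes

def weak_points(rates, threshold=0.5):
    # Concepts whose error rate reaches the (hypothetical) threshold.
    return sorted(c for c, r in rates.items() if r >= threshold)

rates, causes = diagnose({"q1": "a", "q2": "a", "q3": "a"}, ITEMS)
```

Here the learner answers both fraction items wrong in the same way, so the report can name a specific cause ("added denominators") instead of only a lower score.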