• Title/Summary/Keyword: Data Mining Process

Determinants of Accountants' Loyalty Underlying Investment Management: Evidence from FDI Firms in Thanglong Industrial Park

  • NGUYEN, Dang Huy;HA, Son Tung;TRAN, Manh Linh;NGUYEN, Duc Thang;NGUYEN, Thi Xuan Hong;NGUYEN, Dieu Linh;DO, Duc Tai
    • The Journal of Asian Finance, Economics and Business / v.7 no.4 / pp.287-297 / 2020
  • The research aims to investigate the impact of various determinants on the loyalty of accountants to FDI firms, in the context of investment management, in Thanglong Industrial Park in Hanoi, Vietnam. We designed a questionnaire consisting of 31 observed variables measured on a 5-point Likert scale, with independent variables rated from 1 ("without effect") to 5 ("strongly"). Data were collected through a survey of accountants in FDI firms doing business in Thanglong Industrial Park in Hanoi. After the responses were checked, 120 fully completed questionnaires were retained for data entry and analysis. This study employs Cronbach's alpha test and a regression model. The results show that seven determinants, namely working environment; job characteristics; training, promotion prospects and development; income; personal characteristics; teamwork; and leadership style, have positive relationships with the loyalty of accountants. Based on the findings, recommendations related to these determinants are given to improve the loyalty of accountants in FDI firms in general and in FDI firms in Thanglong Industrial Park in Hanoi in particular. In this way, those firms can enhance performance, reduce financial strain, save on the cost of recruiting new staff, and increase profitability to ensure sound investment management.
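The abstract names Cronbach's alpha as its reliability check for the Likert items. A minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), applied to a hypothetical matrix of responses; the function name and data layout are illustrative assumptions, not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a k-item scale.

    item_scores: list of respondents, each a list of k Likert scores (e.g. 1-5).
    Population variance is used throughout; the ratio is the same with sample
    variance as long as it is used consistently.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item (column) across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each respondent's summed scale score.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: three respondents answering two items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Values close to 1 indicate that the items move together and can be treated as one scale.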

XML Document Clustering Based on Sequential Pattern (순차패턴에 기반한 XML 문서 클러스터링)

  • Hwang, Jeong-Hee;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.10D no.7 / pp.1093-1102 / 2003
  • As Internet use grows, the amount of information is increasing rapidly, and XML, the standard for web data, offers flexible data representation. Web-based electronic document systems such as EDMS (Electronic Document Management System) and ebXML (Electronic Business using eXtensible Markup Language) have therefore adopted XML as their standard for document exchange. Research on methods that can manage and search structured XML documents effectively is consequently required. In this paper we propose a clustering method based on the structural similarity among XML documents, using typical structures extracted from each document by sequential pattern mining in a pre-clustering step. The proposed algorithm improves clustering accuracy by computing a cost that considers both cluster cohesion and inter-cluster similarity.
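The abstract does not give its similarity measure or cost function, so the following is only a simplified illustration of structure-based XML clustering. It substitutes a plain Jaccard similarity over root-to-node tag paths and a single-pass "leader" clustering with an assumed threshold; it is not the authors' sequential-pattern algorithm and does not implement their cohesion/inter-cluster cost.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths, e.g. 'book/title'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def structural_similarity(doc_a, doc_b):
    """Jaccard similarity of two documents' tag-path sets (0.0 to 1.0)."""
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    return len(a & b) / len(a | b)


def leader_cluster(docs, threshold=0.5):
    """Put each document in the first cluster whose leader is similar enough.

    Returns lists of document indices; the first index in each list is the
    cluster leader.
    """
    clusters = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if structural_similarity(docs[cluster[0]], doc) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "<book><title/><author/></book>",
    "<book><title/><price/></book>",
    "<order><id/><item/></order>",
]
print(leader_cluster(docs))
```

Path sets ignore element order; a sequential-pattern approach, as in the paper, would also capture ordering among repeated structures.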

Determining Factors of Intention to Actual Use of Charged Long-term Care Services for the Aged (유료노인장기요양보호서비스 이용의사 결정요인)

  • Yoo, Jin-Yeong;Chun, Jin-Ho
    • Journal of Preventive Medicine and Public Health / v.38 no.1 / pp.16-24 / 2005
  • Objectives: To help develop strategies to cope with the changes arising from rapid population aging by predicting the factors that determine the intention to actually use charged long-term care services for the elderly, as perceived by the middle-aged, who play the major supporting role. Methods: Subjects were parents in their 40s (177 men, 507 women) of students selected from a university in Busan. A questionnaire survey was conducted over 4 weeks in October 2003 on knowledge of long-term care services, the intention to actually use them, and preferences regarding the type of service supplier. Data analysis was performed with frequency analysis, chi-square tests, and t-tests using SPSS (ver. 10.0K), along with data mining using the decision tree of SAS Enterprise Miner V8.2. Results: About half of the subjects (53.7%) had actual experience supporting elderly people. Intentions to use the charged services were relatively high for home-visit nursing care (40.1%) and long-term care facility services (40.4%), and were influenced by prior knowledge of the services. The intentions were stronger among women, those with higher education, and those with higher income. Actual elderly support was mostly (80%) provided by women, and the perceived burden of support was greater among women and those of lower socioeconomic status. Desired charges were about 10,000 won for the bath service, about 20,000 won per day for the other services, and about 500,000 won per month for long-term care facility services. In the decision tree analysis, job professionalism was the most important factor determining the intention to actually use the services, with validation rates of 63% to 71%. Mixed health-and-welfare facilities were preferred, and the most important consideration was the level of professionalism. Conclusions: The intention to actually use the charged services was largely determined by time and cost. Policies that increase the number of service suppliers and reduce the burden perceived by actual caregivers are strongly recommended.

An Efficient Architecture Exploration for Embedded Core Design Exploiting Design Hierarchy (임베디드 코어 설계를 위해 설계 계층을 이용한 효율적인 아키텍처 탐색)

  • Kim, Sang-Woo;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.12B / pp.1758-1765 / 2010
  • This paper proposes an architecture exploration methodology for the design of embedded cores that exploits the design hierarchy. The proposed method performs systematic architecture exploration by taking different approaches to verifying designs and estimating performance depending on the hierarchy level in the design process. Performance estimation tools generate profiles containing performance data for the design modules of an embedded core. A profile analyzer performs data mining to acquire association rules between the design modules and performance parameters. An inference engine in the profile analyzer updates the association rules, which are then used to improve design performance in subsequent exploration steps. To show the efficiency of the proposed architecture exploration methodology, experiments were performed on JPEG encoder, Chen-DCT, and FFT application functions. The embedded cores designed with the proposed method show an average performance improvement of 60.8% in clock cycles compared with the initial MIPS R3000-based embedded core.
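The abstract says the profile analyzer mines association rules between design modules and performance parameters but does not describe the algorithm. A minimal sketch of the two standard rule measures, support and confidence, restricted to one-item rules X -> Y; the profile items (such as `mul=fast` or `cycles=low`) and the thresholds are hypothetical.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Mine single-item rules X -> Y from a list of item sets.

    support(X -> Y)    = fraction of transactions containing both X and Y
    confidence(X -> Y) = support({X, Y}) / support({X})
    Returns (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for x, y in combinations(items, 2):
        both = support({x, y})
        if both < min_support:
            continue
        # Check the rule in both directions.
        for ante, cons in ((x, y), (y, x)):
            conf = both / support({ante})
            if conf >= min_confidence:
                rules.append((ante, cons, both, conf))
    return rules


# Hypothetical profile runs: module choices plus a performance label.
profiles = [
    {"mul=fast", "cache=4KB", "cycles=low"},
    {"mul=fast", "cache=8KB", "cycles=low"},
    {"mul=slow", "cache=4KB", "cycles=high"},
    {"mul=fast", "cache=4KB", "cycles=low"},
]
for rule in association_rules(profiles):
    print(rule)
```

Here only the link between the fast multiplier and low cycle counts clears both thresholds; an inference engine could use such rules to steer the next exploration step.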

Performance Improvement of Freight Logistics Hub Selection in Thailand by Coordinated Simulation and AHP

  • Wanitwattanakosol, Jirapat;Holimchayachotikul, Pongsak;Nimsrikul, Phatchari;Sopadang, Apichat
    • Industrial Engineering and Management Systems / v.9 no.2 / pp.88-96 / 2010
  • This paper presents a two-phase quantitative framework to aid decision making in selecting an efficient freight logistics hub from eight alternatives on Thailand's North-South Economic Corridor. Phase 1 employs both multiple regression and Pearson feature selection to identify the criteria that most affect the logistics hub score and to reduce the number of criteria by eliminating the less important ones. Pearson feature selection indicated that only 5 of the 15 criteria affected the logistics hub score. In addition, a genetic algorithm (GA) was applied to the original 15-criterion data set to find the relationship between the logistics criteria and the freight logistics hub score. As a result, the statistical tools, the GA, and the data mining tool identified the same five important criteria affecting the logistics hub score. Phase 2 performs a fuzzy stochastic AHP analysis with these five criteria. This approach offers insight into how imprecision in judgment ratios may affect the ranking of alternatives and how the best alternative can be identified with a given level of confidence. The main objective of the paper is to find the best alternative for a freight logistics hub under appropriate criteria. The experimental results show that, under this approach, Chiang Mai province is the best location at a 95% confidence level.
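Pearson feature selection, as used in Phase 1, ranks each criterion by the strength of its linear correlation with the logistics hub score and keeps the strongest ones. A minimal sketch under stated assumptions: the criterion names and data are hypothetical, and ranking by the absolute value of r with a fixed top-k cut is an assumed selection rule, since the abstract does not give the threshold used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_top_k(criteria, score, k):
    """Rank criteria by |r| with the hub score and keep the k strongest."""
    ranked = sorted(
        criteria,
        key=lambda name: abs(pearson(criteria[name], score)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical scores for five candidate hubs on three criteria.
hub_score = [1, 2, 3, 4, 5]
criteria = {
    "road_access": [2, 4, 6, 8, 10],
    "port_distance": [5, 4, 3, 2, 1],
    "land_cost": [3, 1, 4, 1, 5],
}
print(select_top_k(criteria, hub_score, 2))
```

Using |r| keeps strongly negative criteria (such as distance) as well as positive ones; the retained criteria would then feed the AHP phase.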

A Basic Study on Real Time 3D Location-Tracking in Ground and Underground Using MEMS Sensor (MEMS 센서를 이용한 지상 및 지하에서의 실시간 3차원 위치추적 기술에 관한 기초적 연구)

  • Seol, Munhyung;Jang, Yonggu;Jeon, Heungsoo;Kang, Injoon
    • Journal of the Korean GEO-environmental Society / v.14 no.4 / pp.47-52 / 2013
  • In Korea, the number of mining operations is decreasing, but burial accidents are increasing every year. Safety management in the construction process, especially worker safety, is therefore important. Construction sites need integrated systems suited to their purpose, and this need is even greater at underground construction sites. The current element technologies for location tracking, sensors, and wireless communication can be used, but integrated systems are still difficult to deploy at construction sites because research on their commercialization and availability is incomplete. In this study, for real-time 3D management of ubiquitous construction sites above and below ground, data were measured using a MEMS sensor, EDM, and DGPS at two test sites, and the results were analyzed in MATLAB. The results verified an error of less than 3 m, small enough to distinguish with the naked eye, and the direction of further study was established based on these results.

AutoFe-Sel: A Meta-learning based methodology for Recommending Feature Subset Selection Algorithms

  • Irfan Khan;Xianchao Zhang;Ramesh Kumar Ayyasam;Rahman Ali
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1773-1793 / 2023
  • Automated machine learning, often referred to as "AutoML," is the process of automating the time-consuming and iterative procedures associated with building machine learning models. There have been significant contributions in this area across several stages of a data-mining task, including model selection, hyper-parameter optimization, and preprocessing method selection. Among them, preprocessing method selection is a relatively new and fast-growing research area. The current work focuses on recommending preprocessing methods, namely feature subset selection (FSS) algorithms. One limitation of existing studies on FSS algorithm recommendation is the use of a single learner for meta-modeling, which restricts the capability of the meta-model. Moreover, meta-modeling in existing studies is typically based on a single group of data characterization measures (DCMs). However, several complementary DCM groups exist, and combining them would leverage their diversity and improve meta-modeling. This study addresses these limitations by proposing AutoFE-Sel, an architecture for preprocessing method selection that uses ensemble learning for meta-modeling. To evaluate the proposed method, we performed an extensive experimental evaluation involving 8 FSS algorithms, 3 groups of DCMs, and 125 datasets. Results show that the proposed method achieves better performance than three baseline methods. The proposed architecture can also be easily extended to other preprocessing method selection tasks, such as noise-filter selection and imbalance-handling method selection.
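The abstract says AutoFE-Sel replaces a single meta-learner with an ensemble. A minimal sketch of that general idea, not the paper's architecture: each past dataset is described by a vector of data characterization measures (assumed here to be already concatenated from several DCM groups) and labelled with its best FSS algorithm; three simple meta-learners (1-NN, 3-NN, nearest centroid) each predict for a new dataset, and a majority vote gives the recommendation. The learners, labels, and numbers are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(meta_X, meta_y, query, k):
    """k-nearest-neighbour vote over past datasets' meta-feature vectors."""
    order = sorted(range(len(meta_X)), key=lambda i: dist(meta_X[i], query))
    votes = Counter(meta_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def centroid_predict(meta_X, meta_y, query):
    """Pick the label whose mean meta-feature vector is closest to the query."""
    groups = {}
    for x, y in zip(meta_X, meta_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}
    return min(centroids, key=lambda y: dist(centroids[y], query))


def recommend_fss(meta_X, meta_y, query):
    """Majority vote of three heterogeneous meta-learners."""
    preds = [
        knn_predict(meta_X, meta_y, query, k=1),
        knn_predict(meta_X, meta_y, query, k=3),
        centroid_predict(meta_X, meta_y, query),
    ]
    return Counter(preds).most_common(1)[0][0]


# Hypothetical meta-dataset: two DCM values per dataset, best FSS label.
meta_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9]]
meta_y = ["ReliefF", "ReliefF", "ReliefF", "CFS", "CFS"]
print(recommend_fss(meta_X, meta_y, [0.85, 0.85]))
```

Mixing learners with different biases is what lets the vote correct an individual learner's mistakes; the paper's actual ensemble members are not stated in the abstract.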

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.155-175 / 2017
  • As social data come into the spotlight, mainstream web search engines provide data indicating how many people searched for a specific keyword: web search traffic data. Web search traffic information is the aggregate of the crowd searching for a specific keyword. In various areas, web search traffic can serve as a useful variable representing common users' attention to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena such as epidemics, consumer patterns, product life cycles, and financial investment, and such data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among inbound tourists, Chinese tourists (Youke) continue to grow: Youke have been the largest group of inbound tourists to Korea for many years, both in number and in tourism revenue per tourist. Research into proper approaches for predicting Youke demand is therefore important in both the public and private sectors, since accurate tourism demand prediction enables efficient decision making with limited resources. This study suggests an improved model that reflects the latest social issues by capturing the attention of groups of individuals. A trip abroad is generally a high-involvement activity, so potential tourists are likely to search extensively for information about their trip. Web search traffic data capture tourists' attention while they prepare their journey, in an instantaneous and dynamic way. This study therefore attempted to select keywords that potential Chinese tourists are likely to search for on the Internet. Baidu, the largest Chinese web search engine with over 80% market share, provides users with access to web search traffic data.
Qualitative interviews with potential tourists helped us understand their information search behavior before a trip and identify the keywords for this study. The selected keywords are categorized into three levels according to how directly they relate to "Korean Tourism." This categorization helps determine which keywords explain Youke inbound demand, from the closest category to the most distant. Web search traffic data for each keyword were gathered by a web crawler developed to crawl search data from Baidu Index. Using the automatically gathered variables, a linear model is designed by multiple regression analysis, which suits operational decision and policy making because the relationships between variables are easy to explain. After the regression models were built, a model composed of traditional variables was compared, in terms of significance and R-squared, with a model that adds web search traffic variables to the traditional model, and the final model was composed based on this comparison. The final regression model offers improved explanatory power as well as real-time immediacy and convenience compared with the traditional model. Furthermore, this study demonstrates a system with intuitive visualization for general use: the Youke Mining solution provides several functions for tourism decision making, including the embedded final regression model, and combines an algorithm based on data science with a well-designed, simple interface. Finally, this research offers three significant contributions in theoretical, practical, and policy aspects. Theoretically, the Youke Mining system and the model in this research are a first step toward Youke inbound prediction using an interactive and instant variable: web search traffic information, which represents tourists' attention while they prepare their trip. Baidu accounts for more than 80% of the web search engine market.
Practically, Baidu data can represent, in real time, the attention of potential tourists preparing their own tours. Finally, in policy terms, the Chinese tourist demand prediction model based on web search traffic can support tourism decision making for efficient resource management and for making the most of opportunities for successful policy.
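The abstract compares a regression on traditional variables with one that adds web search traffic variables, using significance and R-squared. A minimal sketch of that comparison with ordinary least squares in plain Python; the variables and toy data are hypothetical. Adjusted R-squared is included because plain R-squared never decreases when regressors are added; the abstract does not say whether the authors used it.

```python
def ols_fit(X, y):
    """Least squares with intercept via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [intercept, b1, b2, ...]


def r_squared(X, y, beta):
    """Coefficient of determination for fitted coefficients beta."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    mean = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalise extra predictors; k = number of regressors excluding intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: x1 is a traditional variable, x2 a search-traffic index.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
arrivals = [9, 7, 17, 15, 25, 23]
X_base = [[a] for a in x1]
X_aug = list(zip(x1, x2))
r2_base = r_squared(X_base, arrivals, ols_fit(X_base, arrivals))
r2_aug = r_squared(X_aug, arrivals, ols_fit(X_aug, arrivals))
print(r2_base, r2_aug)
```

In practice one would also check the t-statistics of the added search-traffic coefficients, which this sketch omits.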

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia pacific journal of information systems / v.19 no.2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of their competitive advantages over rivals. Such venture companies have tended to give investors high returns, generally by making the best use of information technology. For this reason, many venture companies are keen to attract avid investors' attention. Investors generally make investment decisions by carefully examining the evaluation criteria of the alternatives. For them, credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as a company's stability, growth, and risk status. However, this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses, presenting recent empirical results based on financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper uses multi-class SVM to predict the DEA-based efficiency rating of venture businesses derived from the proposed method. Our approach sheds light on ways to locate efficient companies that generate high levels of profit. Above all, determining effective ways to evaluate a venture firm's efficiency requires understanding the major contributing factors of that efficiency. Therefore, this paper is built on two ideas for classifying which venture companies are more efficient: i) making a DEA-based multi-class rating for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming model. It is non-parametric because it requires no assumption about the shape or parameters of the underlying production function. DEA has already been widely applied to evaluating the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies, and DEA has also been applied to corporate credit ratings. In this study, we used DEA to sort venture companies into efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the DEA results. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is grounded in statistical learning theory. Thus far, the method has shown good performance, especially in its generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum margin hyperplane, the one giving the maximum separation between classes; the support vectors are the training points closest to this hyperplane. If the classes cannot be separated linearly, a kernel function can be used: for nonlinear class boundaries, the inputs are mapped from the original input space into a high-dimensional dot-product feature space. Many studies have applied SVM to bankruptcy prediction, financial time series forecasting, and credit rating estimation. In this study, we employed SVM to develop a data mining-based efficiency prediction model.
We used the Gaussian radial basis function as the kernel function of the SVM. For multi-class SVM, we adopted the one-against-one approach, which combines binary classifiers, and two all-together methods proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information on 154 companies listed on the KOSDAQ market in the Korea Exchange. We obtained the companies' 2005 financial information from KIS (Korea Information Service, Inc.). Using these data, we made a multi-class rating with DEA efficiency and built a data mining-based multi-class prediction model. Among the three multi-classification methods, the Weston and Watkins method achieved the best hit ratio on the test data set. In multi-class problems such as venture business efficiency ratings, when the exact class is difficult to determine in the actual market, it is very useful for investors to know the class within an error of one class. We therefore presented accuracy within a 1-class error, and the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach for venture businesses generates more information than the binary classification problem, regardless of the efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we see the need to improve the variable selection process, the parameter selection of the kernel function, generalization, and the sample size for the multi-class problem.
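The abstract uses a one-against-one scheme, which trains one binary classifier per pair of classes and combines them by voting, and it reports accuracy both exactly and within a one-class error. A minimal sketch of those two steps; the binary classifier below is a hypothetical stand-in (nearest class centre on a one-dimensional score), not a trained RBF-kernel SVM, and ratings are assumed to be ordered integers.

```python
from collections import Counter
from itertools import combinations


def one_against_one(classes, binary_predict, x):
    """Combine k(k-1)/2 pairwise classifiers by majority vote.

    binary_predict(a, b, x) stands in for a trained binary classifier that
    returns either class a or class b for input x.
    """
    votes = Counter(binary_predict(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


def hit_ratio(y_true, y_pred, tolerance=0):
    """Share of predictions within `tolerance` grades of the true rating."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Hypothetical stand-in: each efficiency grade has a centre on a 1-D score.
centres = {1: 0.2, 2: 0.5, 3: 0.8}


def nearest_centre(a, b, x):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b


print(one_against_one([1, 2, 3], nearest_centre, 0.55))
y_true = [1, 2, 3, 3, 2, 1, 3]
y_pred = [1, 3, 3, 1, 2, 2, 3]
print(hit_ratio(y_true, y_pred), hit_ratio(y_true, y_pred, tolerance=1))
```

The tolerance-based hit ratio only makes sense because efficiency grades are ordinal; for unordered classes a one-class error would have no meaning.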

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been used in various industries for practical applications. In business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the specific product level. However, this information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. We therefore propose a new methodology that can estimate the market sizes of product groups at a more detailed level than previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic, bottom-up market size estimation from individual companies' product information. The overall process is as follows. First, product information data are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products are summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training.
We performed parameter optimization for training and then used a vector dimension of 300 and a window size of 15 as the optimized parameters in further experiments. We employed index words of the Korean Standard Industrial Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market size of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be easily and efficiently adjusted to the intended use of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs conducted by government institutions, as well as in business strategy consulting and market analysis reports published by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be improved by giving a proper order to the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec.
Also, the product group clustering method could be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on follow-up studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
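The grouping and summation steps described above can be sketched as follows: product names whose embedding has cosine similarity to an index word at or above a threshold form one group, and their sales are summed. In the study the vectors would come from a trained Word2Vec model (300 dimensions); here the 2-D vectors, product names, and sales figures are hypothetical, chosen only to show how raising the threshold narrows the category.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def market_size(index_vec, product_vecs, sales, threshold):
    """Group products similar to the index word and sum their sales."""
    group = [name for name, vec in product_vecs.items()
             if cosine(index_vec, vec) >= threshold]
    return group, sum(sales[name] for name in group)


# Hypothetical embeddings (stand-ins for Word2Vec vectors) and sales.
index_vec = [1.0, 0.0]  # e.g. the vector of a KSIC index word
product_vecs = {
    "digital camera": [0.9, 0.1],
    "camera lens": [0.8, 0.3],
    "rice cooker": [0.1, 0.9],
}
sales = {"digital camera": 120, "camera lens": 30, "rice cooker": 500}

print(market_size(index_vec, product_vecs, sales, 0.9))
print(market_size(index_vec, product_vecs, sales, 0.95))
```

At 0.9 the lens joins the camera group; at 0.95 only the camera remains, which mirrors the abstract's point that the threshold sets the granularity of the market category.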