• Title/Summary/Keyword: Sampling set selection


Development of a Gangwon Province Forest Fire Prediction Model using Machine Learning and Sampling (머신러닝과 샘플링을 이용한 강원도 지역 산불발생예측모형 개발)

  • Chae, Kyoung-jae;Lee, Yu-Ri;Cho, Yong-ju;Park, Ji-Hyun
    • The Journal of Bigdata
    • /
    • v.3 no.2
    • /
    • pp.71-78
    • /
    • 2018
  • This study applies machine learning techniques to increase the accuracy of a forest fire prediction model. It used 14 years of data (2003 to 2016) from Gangwon-do, where forest fires were most frequent. To reduce weather data errors, Gangwon-do was divided into nine zones and weather data from each zone was used. However, dividing the forest fire prediction model into nine zones produces a large imbalance between days on which fires occurred and days on which they did not, and this imbalance can degrade model performance. To address it, several sampling methods were applied. To further increase accuracy, five indices from the Canadian Forest Fire Weather Index (FWI) system were used as derived variables. The modeling methods were logistic regression as a statistical method, and random forest and XGBoost as machine learning methods. The selection criteria for each zone's final model considered accuracy, sensitivity, and specificity; across the nine zones, the models predicted 80 of the 104 fire occurrences and 7,426 of the 9,758 non-occurrences, for an overall accuracy of 76.1%.
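The imbalance-handling step described in this abstract can be sketched as random under-sampling of the majority class followed by a plain logistic regression, evaluated by sensitivity and specificity. This is a minimal illustration on synthetic data, not the paper's actual weather features, zones, or pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the paper's class sizes: 104 fire days (minority)
# versus 9,758 non-fire days (majority), with three hypothetical features.
X_pos = rng.normal(1.0, 1.0, size=(104, 3))
X_neg = rng.normal(-1.0, 1.0, size=(9758, 3))

# Random under-sampling: shrink the majority class to the minority's size.
idx = rng.choice(len(X_neg), size=len(X_pos), replace=False)
X = np.vstack([X_pos, X_neg[idx]])
y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_pos))])

# Plain logistic regression fitted by gradient descent (illustrative only).
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

# Evaluate on the full, imbalanced data using the paper's selection
# criteria: sensitivity (fires caught) and specificity (non-fires caught).
X_all = np.vstack([X_pos, X_neg])
y_all = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
pred = (1.0 / (1.0 + np.exp(-(X_all @ w + b)))) >= 0.5
sensitivity = np.mean(pred[y_all == 1])
specificity = np.mean(~pred[y_all == 0])
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Without under-sampling, a classifier trained on such data tends to predict the majority class almost everywhere, which is exactly the degradation the abstract describes.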

Teens and College Students' Purchasing Decision Factors of Denim Jeans In the United States

  • Hwang Shin, Su-Jeong;Fowler, Deborah;Lee, Jinhee
    • Fashion & Textile Research Journal
    • /
    • v.15 no.6
    • /
    • pp.971-976
    • /
    • 2013
  • This study provides insight into current social media influences and the purchasing power of the young generation, in that the size of both demographic groups will impact apparel companies and the retail market for the foreseeable future. Denim apparel companies are aware of the discretionary spending power of Generations Y and Z. Current teens are similar to college-age individuals in that they have grown up with digital technology and prefer to communicate via social networking sites, and retailers have utilized these social media platforms to capture the attention of both generations. Traditionally, marketing campaigns have differentiated between teens and the college-age population; however, teens actually have larger spending power and more discretionary income. A survey consisting of 32 questions addressed Internet media influences, the influence of people, and decision factors related to purchase selection. A random sample of 163 females responded to the questionnaire. Teens, like college students, want to make their own decisions when they select and purchase denim jeans. Overall, 40% wanted to make their own purchase decisions; however, a significant number were influenced by friends' opinions (34%) and the opinions of family members (15%), while celebrities (10%) had the least influence on their decisions. Teens, like college students, make decisions based on the same factors: fit (63%), cost (23%), brand (10%), and color (2%). The most important factor in determining preference was fit.

Predicting the Accuracy of Breeding Values Using High Density Genome Scans

  • Lee, Deuk-Hwan;Vasco, Daniel A.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.24 no.2
    • /
    • pp.162-172
    • /
    • 2011
  • In this paper, simulation was used to determine the accuracy of genomic breeding values for polygenic traits associated with many thousands of markers obtained from high-density genome scans. The statistical approach stochastically simulated a pedigree with a specified base population and a specified set of population parameters, including effective and non-effective marker distances and generation time. For this population, marker and quantitative trait locus (QTL) genotypes were generated using either a single-linkage-group or a multiple-linkage-group model. Single nucleotide polymorphisms (SNPs) were simulated for an entire bovine genome (excluding the sex chromosome; n = 29), including linkage and recombination. Individuals drawn from the simulated population with specified marker and QTL genotypes were randomly mated for ten generations to establish appropriate levels of linkage disequilibrium. Phenotype and genomic SNP data sets were obtained from individuals starting after two generations. Genetic prediction was accomplished by statistically modeling the genomic relationship matrix with standard BLUP methods. The effect of the number of linkage groups was also investigated to determine its influence on the accuracy of breeding values for genomic selection. When using high-density scan data (0.08 cM marker distance), accuracies of breeding values on juveniles were 0.60 and 0.82 for a low-heritability trait (0.10) and a high-heritability trait (0.50), respectively, in the single-linkage-group model; estimates of 0.38 and 0.60 were obtained for the same cases in the multiple-linkage-group model. Unexpectedly, applying BLUP regression across many chromosomes was found to reduce the accuracy of breeding value estimation. The reasons remain a target for further research, but Mendelian sampling may play a fundamental role in producing this effect.
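The core prediction machinery this abstract refers to, BLUP on a genomic relationship matrix (GBLUP), can be sketched in a few lines. The sketch below uses VanRaden-style scaling of SNP genotypes and synthetic phenotypes; the marker density, pedigree simulation, and linkage model of the paper are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)
n_ind, n_snp = 200, 1000

# Hypothetical 0/1/2 SNP genotype counts at given allele frequencies.
freq = rng.uniform(0.1, 0.9, n_snp)
M = rng.binomial(2, freq, size=(n_ind, n_snp)).astype(float)

# Genomic relationship matrix, G = Z Z' / (2 * sum p(1-p)).
Z = M - 2 * freq
G = Z @ Z.T / (2 * np.sum(freq * (1 - freq)))
G += np.eye(n_ind) * 1e-3  # small ridge so G is invertible

# Simulate phenotypes: additive SNP effects scaled to heritability h2.
h2 = 0.5
u = Z @ rng.normal(0, 1, n_snp)
u = u / u.std() * np.sqrt(h2)
y = u + rng.normal(0, np.sqrt(1 - h2), n_ind)

# GBLUP prediction: u_hat = G (G + lambda I)^-1 (y - mean), lambda = (1-h2)/h2.
lam = (1 - h2) / h2
u_hat = G @ np.linalg.solve(G + lam * np.eye(n_ind), y - y.mean())

# Accuracy = correlation between true and estimated breeding values.
acc = np.corrcoef(u, u_hat)[0, 1]
print(f"accuracy={acc:.2f}")
```

The paper's reported accuracies concern unphenotyped juveniles; here the correlation is computed on phenotyped individuals, so it is higher than a true cross-generation accuracy would be.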

A Study on Bayesian Approach of Software Stochastic Reliability Superposition Model using General Order Statistics (일반 순서 통계량을 이용한 소프트웨어 신뢰확률 중첩모형에 관한 베이지안 접근에 관한 연구)

  • Lee, Byeong-Su;Kim, Hui-Cheol;Baek, Su-Gi;Jeong, Gwan-Hui;Yun, Ju-Yong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.8
    • /
    • pp.2060-2071
    • /
    • 1999
  • A complex software failure process is defined as the superposition of failure points from several component point processes. Because the likelihood function is difficult to compute, we consider a Gibbs sampler, an iterative sampling-based method. For each observed failure epoch, we introduce latent variables indicating which component of the superposition produced it. For model selection, we explore the posterior Bayesian criterion and the sum of relative errors to compare simple-pattern models with the superposition model. A numerical example is given with an NHPP-simulated data set generated by the thinning method proposed by Lewis and Shedler [25]; we consider the Goel-Okumoto model and a Weibull model with general order statistics (GOS), and inference of the parameters is studied. Using the posterior Bayesian criterion and the sum of relative errors, the superposition model, as expected, is the best model under diffuse priors.
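The Lewis-Shedler thinning method used to generate the abstract's simulated data set can be sketched directly: simulate a homogeneous Poisson process at a rate that dominates the target intensity, then keep each point with probability proportional to the intensity there. The Goel-Okumoto intensity and parameter values below are illustrative choices, not the paper's:

```python
import math
import random

def simulate_nhpp_thinning(intensity, lam_max, t_end, rng):
    """Lewis-Shedler thinning: draw candidate points from a homogeneous
    Poisson process at rate lam_max, keep t with prob intensity(t)/lam_max."""
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t > t_end:
            return events
        if rng.random() < intensity(t) / lam_max:
            events.append(t)

rng = random.Random(0)
# Goel-Okumoto NHPP intensity a*b*exp(-b*t); its maximum is a*b at t=0,
# so a*b is a valid dominating rate for thinning.
a, b = 50.0, 0.1
intensity = lambda t: a * b * math.exp(-b * t)
failures = simulate_nhpp_thinning(intensity, a * b, 30.0, rng)
print(len(failures), "failures; mean count is a*(1-exp(-b*t_end)) =",
      round(a * (1 - math.exp(-b * 30.0)), 1))
```

A superposed process, as in the abstract, is obtained by simulating each component (e.g. Goel-Okumoto and Weibull) separately with thinning and merging the sorted event times.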


Factor Analysis for Exploratory Research in the Distribution Science Field (유통과학분야에서 탐색적 연구를 위한 요인분석)

  • Yim, Myung-Seong
    • Journal of Distribution Science
    • /
    • v.13 no.9
    • /
    • pp.103-112
    • /
    • 2015
  • Purpose - This paper aims to provide a step-by-step approach to factor analytic procedures, such as principal component analysis (PCA) and exploratory factor analysis (EFA), and to offer guidelines for factor analysis. Some authors have argued that the results of PCA and EFA are substantially similar, and assert that PCA is the more appropriate technique because it produces easily interpreted results that are likely to be the basis of better decisions. For these reasons, many researchers have used PCA instead of EFA. However, the techniques are clearly different: PCA should be used for data reduction, whereas EFA is tailored to identify an underlying factor structure, a set of latent factors that cause the measured variables to covary. Thus, guidelines and procedures for factor analysis are needed; to date, however, the two techniques have been indiscriminately misused. Research design, data, and methodology - This research conducted a literature review, summarizing the meaningful and consistent arguments and drawing up guidelines and suggested procedures for rigorous EFA. Results - PCA can be used instead of common factor analysis when all measured variables have high communality; however, common factor analysis is recommended for EFA. First, researchers should evaluate the sample size and check sampling adequacy before conducting factor analysis; if these conditions are not satisfied, the subsequent steps cannot proceed. The sample size must be at least 100, with communality above 0.5, a subject-to-item ratio of at least 5:1, and a minimum of five items in the EFA. Next, Bartlett's sphericity test and the Kaiser-Meyer-Olkin (KMO) measure should be assessed for sampling adequacy: the chi-square value for Bartlett's test should be significant, and a KMO above 0.8 is recommended. The next step is to conduct the factor analysis itself.
The analysis is composed of three stages. The first stage selects a factor extraction technique; generally, maximum likelihood (ML) or principal axis factoring (PAF) gives the best results, and the choice between the two hinges on data normality: ML requires normally distributed data, whereas PAF does not. The second stage determines the number of factors to retain in the EFA; the best approach is to apply three methods together: eigenvalues greater than 1.0, the scree plot test, and the variance extracted. The last stage selects one of two rotation methods, orthogonal or oblique. If the research suggests that some factors are correlated with each other, the oblique method should be selected, because it allows factors to correlate; if not, orthogonal rotation can be used. Conclusions - Recommendations are offered for best factor analytic practice in empirical research.
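Two of the checks recommended above, the KMO measure of sampling adequacy and the Kaiser eigenvalue-greater-than-1 criterion, can be computed directly from the correlation matrix. The sketch below uses synthetic data with a planted two-factor structure; sample size, loadings, and item count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 6 items driven by two independent latent factors (n=300).
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    f1 + 0.4 * rng.normal(size=n), f1 + 0.4 * rng.normal(size=n),
    f1 + 0.4 * rng.normal(size=n), f2 + 0.4 * rng.normal(size=n),
    f2 + 0.4 * rng.normal(size=n), f2 + 0.4 * rng.normal(size=n),
])

R = np.corrcoef(X, rowvar=False)

# KMO: ratio of squared correlations to squared correlations plus squared
# partial correlations (partials obtained from the inverse of R).
S = np.linalg.inv(R)
A = -S / np.sqrt(np.outer(np.diag(S), np.diag(S)))
off = ~np.eye(len(R), dtype=bool)
kmo = np.sum(R[off] ** 2) / (np.sum(R[off] ** 2) + np.sum(A[off] ** 2))

# Kaiser criterion: retain factors whose eigenvalue exceeds 1.0.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
n_factors = int(np.sum(eigvals > 1.0))
print(f"KMO={kmo:.2f}, factors retained={n_factors}")
```

As the abstract recommends, the Kaiser rule should be cross-checked against the scree plot and the variance extracted rather than used alone.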

A Study of Precedence and Result Factors on Team Commitment on Distribution and Hotel Employees (유통·호텔 종사원의 팀에 대한 몰입의 선행요인과 결과요인에 관한 연구)

  • Ryu, Baek-Hyun;Lee, Seung-Il
    • Journal of Distribution Science
    • /
    • v.14 no.2
    • /
    • pp.113-121
    • /
    • 2016
  • Purpose - The purpose of this study is to identify how team commitment affects employees' innovative activities, and to identify the factors affecting team commitment, including leaders' empowerment and job enrichment. In other words, to explain the outcome variable of innovative activities, this study emphasizes the role of employees' attachment to their groups within a nomological network, and identifies the motives that encourage employees' innovative activities. The research purpose is significant given the current situation of the hotel industry; the importance of innovative activities follows from recent changes in the business environment. Also, unlike other studies on the antecedents of employees' innovative activities, this study classifies those antecedents into job and leader characteristics, and emphasizes the importance of team commitment as the process through which job and leader characteristics are connected to innovative activities. Research design, data and methodology - The survey for this study was conducted from October 6 to November 10, 2014, among employees working in 5-star hotels in Korea. Convenience sampling was used to select the employees, and the self-report method was used, judging that the employees' characteristics would be relatively homogeneous. In total, 311 questionnaires were distributed and 275 responses were collected; after excluding missing and unreliable responses, 245 questionnaires were used in the analysis, which was conducted with SPSS and AMOS. Results - First, empowering leadership had a positive effect on hotel employees' team commitment: hotel employees are more committed to their team when their leaders set examples, provide information, and involve employees in the decision-making process.
Second, in the analysis of the relationships among task diversity, task significance, task identity, and team commitment, task diversity and task significance had significant effects on team commitment, while task identity did not. This indicates that team commitment is enhanced when employees can conduct diverse types of jobs and have more opportunities to talk with guests, whereas repetitive work in hotel rooms and food and beverage preparation areas does not lead to team commitment, even when employees fulfill their duties to the end. Third, hotel employees' team commitment has a positive effect on their innovative activities: employees voluntarily engage in innovative activities when they are attached to their team and identify themselves with it. Conclusions - This study has theoretical and practical implications. Theoretically, it proposes a structural framework for team commitment and identifies the psychological mechanism of team commitment from a social exchange perspective, which led to the identification of its antecedents. In addition, this study opens new possibilities for research on team commitment by examining its effect when the importance of innovative activities is emphasized in the current business environment.

An Active Learning-based Method for Composing Training Document Set in Bayesian Text Classification Systems (베이지언 문서분류시스템을 위한 능동적 학습 기반의 학습문서집합 구성방법)

  • 김제욱;김한준;이상구
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.12
    • /
    • pp.966-978
    • /
    • 2002
  • There are two important problems in improving machine learning-based text classification systems. The first, called the "selection problem", is how to select a minimum number of informative documents from a given document collection. The second, called the "composition problem", is how to reorganize the selected training documents so that they fit the adopted learning method. The former is addressed by "active learning" algorithms, and the latter by "boosting" algorithms. This paper proposes a new learning method, called AdaBUS, which proactively solves both problems in the context of Naive Bayes classification systems. The proposed method constructs a more accurate classification hypothesis by increasing the variance among the "weak" hypotheses that determine the final classification hypothesis; the resulting perturbation effect makes the boosting algorithm work properly. Through empirical experiments on the Reuters-21578 document collection, we show that the AdaBUS algorithm improves the Naive Bayes-based classification system significantly more than other conventional learning methods.
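The "selection problem" the abstract names is the core of pool-based active learning: repeatedly label the document the current classifier is least certain about. The sketch below pairs this with a multinomial Naive Bayes classifier on toy bag-of-words data; it illustrates generic uncertainty sampling, not the AdaBUS algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy document pool: bag-of-words counts over a 20-word vocabulary,
# two classes with different word distributions (hypothetical data).
n_docs, vocab = 400, 20
theta = np.vstack([rng.dirichlet(np.ones(vocab)), rng.dirichlet(np.ones(vocab))])
labels = rng.integers(0, 2, n_docs)
docs = np.array([rng.multinomial(50, theta[c]) for c in labels])

def nb_posterior(counts, X, y):
    """Multinomial Naive Bayes with Laplace smoothing; returns P(class|doc)."""
    log_p = np.zeros((len(counts), 2))
    for c in (0, 1):
        word = X[y == c].sum(axis=0) + 1.0
        log_theta = np.log(word / word.sum())
        prior = np.log((np.sum(y == c) + 1.0) / (len(y) + 2.0))
        log_p[:, c] = counts @ log_theta + prior
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Active learning loop: start with 10 labeled documents, then repeatedly
# label the pool document whose posterior is closest to 0.5 (most uncertain).
labeled = list(range(10))
for _ in range(30):
    pool = [i for i in range(n_docs) if i not in labeled]
    post = nb_posterior(docs[pool], docs[labeled], labels[labeled])
    uncertainty = -np.abs(post[:, 0] - 0.5)
    labeled.append(pool[int(np.argmax(uncertainty))])

post_all = nb_posterior(docs, docs[labeled], labels[labeled])
acc = np.mean(post_all.argmax(axis=1) == labels)
print(f"labeled {len(labeled)} docs, accuracy={acc:.2f}")
```

AdaBUS additionally reorganizes the selected documents in a boosting framework (the "composition problem"); that step is not shown here.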

On the Study of Initializing Extended Depth of Focus Algorithm Parameters (Extended Depth of Focus 알고리듬 파라메타 초기설정에 관한 연구)

  • Yoo, Kyung-Moo;Joo, Hyo-Nam;Kim, Joon-Seek;Park, Duck-Chun;Choi, In-Ho
    • Journal of Broadcast Engineering
    • /
    • v.17 no.4
    • /
    • pp.625-633
    • /
    • 2012
  • Extended Depth of Focus (EDF) algorithms, which extract three-dimensional (3D) information from a set of optical image slices, have recently been studied by many researchers. Due to the limited depth of focus of the microscope, only a small portion of each image slice is in focus. Most EDF algorithms try to find the in-focus areas to generate a single focused image and a 3D depth image. As with most image processing algorithms, EDF algorithms need properly initialized parameters to perform successfully. In this paper, we select three popular transform-based EDF algorithms, based on the pyramid, the wavelet transform, and the complex wavelet transform, respectively, and study their performance according to the initialization of their parameters. The parameters considered include the number of levels used in the transform, the selection of the lowest-level image, the window size used in the high-frequency filter, and the noise reduction method. Through extensive simulation, we find a clear relationship between the initialization of the parameters and the properties of both the texture and the 3D ground-truth images. Typically, we find that proper initialization of the parameters improves algorithm performance by 3 dB to 19 dB over default initialization in recovering the 3D information.
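The basic EDF operation the abstract describes, finding the in-focus slice per pixel to produce a fused image and a depth map, can be sketched with a simple Laplacian-energy focus measure instead of the paper's pyramid or wavelet transforms. The stack, blur model, and focus measure below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def laplacian_energy(img):
    """Simple focus measure: squared response of a 5-point Laplacian."""
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    return lap ** 2

def blur(img, n):
    """Crude box blur applied n times, to mimic defocus in a slice."""
    for _ in range(n):
        img = (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)
               + np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 5.0
    return img

# Synthetic two-slice stack: each half of a textured scene is sharp in a
# different slice, as if the two halves sat at different depths.
scene = rng.random((64, 64))
stack = np.stack([scene.copy(), scene.copy()])
stack[0, 32:, :] = blur(stack[0, 32:, :], 5)  # slice 0: bottom out of focus
stack[1, :32, :] = blur(stack[1, :32, :], 5)  # slice 1: top out of focus

# EDF selection: per pixel, keep the slice with the highest focus measure.
focus = np.stack([laplacian_energy(s) for s in stack])
depth = focus.argmax(axis=0)                                # 3D depth map
fused = np.take_along_axis(stack, depth[None], axis=0)[0]   # focused image

print("top-half depth mode:", np.bincount(depth[:32].ravel()).argmax())
print("bottom-half depth mode:", np.bincount(depth[32:].ravel()).argmax())
```

The parameters the paper studies (number of transform levels, window size, noise reduction) correspond here to choices such as the focus-measure neighborhood; even in this toy version, changing the measure's support visibly changes the recovered depth map.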