• Title/Summary/Keyword: regression outlier

Search Result 116, Processing Time 0.02 seconds

Studies on Predicting Chemical Composition of Permanent Pastures in Hilly Grazing Area Using Near-Infrared Spectroscopy (근적외선 분광법을 이용한 산지방목지 목초시료 화학적 성분 분석에 관한 연구)

  • Park, Hyung-Soo;Lee, Hyo-Jin;Lee, Hyo-won;Ko, Han-Jong;Jeong, Jong-Sung
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.37 no.2
    • /
    • pp.154-160
    • /
    • 2017
  • This study was conducted to find out an alternative way of rapid and accurate analysis of chemical composition of permanent pastures in hilly grazing area. Near reflectance infrared spectroscopy (NIRS) was used to evaluate the potential for predicting proximate analysis of permanent pastures in a vegetative stage. 386 pasture samples obtained from hilly grazing area in 2015 and 2016 were scanned for their visible-NIR spectra from 400~2,400nm. 163 samples with different spectral characteristics were selected and analysed for moisture, crude protein (CP), crude ash (CA), acid detergent fiber (ADF) and neutral detergent fiber (NDF). Multiple linear regression was used with wet analysis data and spectra for developing the calibration and validation mode1. Wavelength of 400 to 2500nm and near infrared range with different critical T outlier value 2.5 and 1.5 were used for developing the most suitable equation. The important index in this experiment was SEC and SEP. The $R^2$ value for moisture, CP, CA, CF, Ash, ADF, NDF in calibration set was 0.86, 0.94, 0.91, 0.88, 0.48 and 0.93, respectively. The value in validation set was 0.66, 0.86, 0.83, 0.71, 0.35 and 0.88, respectively. The results of this experiment indicate that NIRS is a reliable analytical method to assess forage quality for CP, CF, NDF except ADF and moisture in permanent pastures when proper samples incorporated into the equation development.

Estimation of Consolidation Characteristics of Soft Ground in Major River Mouth (주요 강하구 연약지반의 압밀 특성 평가)

  • Lee, JunDae;Kwon, YoungChul;Bae, WooSeok
    • Journal of the Korean GEO-environmental Society
    • /
    • v.20 no.2
    • /
    • pp.69-79
    • /
    • 2019
  • The coastal area forms various sedimentary layers according to the environmental conditions such as the topography and geological features of the upper region of the river, ocean currents, and river mouth. Therefore, identifying the characteristics of the marine clay deposited in the coastal area plays a key role in the investigation of the formation of soft ground. In general, alluvial grounds are formed by a variety of factors such as changes in topography and natural environment, they have very diverse qualities depending on the deposited region or sedimentation conditions. The most important thing for the construction of social infrastructures in soft ground areas is economical and efficient treatment of soft ground. In this study, the author collected data from diverse laboratory and field tests on five areas in western and southern offshore with relatively high reliability, and then statistically analyzed them, thereby presenting standard constants for construction design. Correlation between design parameters such as over consolidation ratio, preconsolidation pressure was analyzed using linear and non-linear regression analyses. Also, proposed distribution characteristics of design parameters in consideration of each region's uncertainty through statistical analyses such as normality verification, outlier removal.

A Review of Statistical Methods in the Korean Journal of Orthodontics and the American Journal of Orthodontics and Dentofacial Orthopedics (대한치과교정학회지(KJO)와 미국교정학회지(AJODO)에서 사용된 통계기법의 비교분석 및 고찰(1999-2003))

  • Lim, Hoi-Jeong
    • The korean journal of orthodontics
    • /
    • v.34 no.5 s.106
    • /
    • pp.371-379
    • /
    • 2004
  • The purpose of this study was to investigate the changes and types of statistical methods used in the Korean Journal of Orthodontics (KJO) and the American Journal of Orthodontics and Dentofacial Orthopedics (AJODO) from )999 to 2003. The frequency of use, transitions, assumption check of statistical methods and types of advanced statistical methods were examined from each journal. The study consisted of 247 articles published in the KJO and randomly chosen 50 articles per year which were original articles and used statistical methods T-test, analysis of variance(ANOVA), correlation analysis, nonparametric analysis. regression analysis chi-square test. factor analysis, were the order of statistical methods most frequently used in the KJO, while t-test. ANOVA, nonparametric analysis, correlation analysis, regression analysis, chi-square test. factor analysis. were the order of statistical methods used in the AJODO The changes of statistical methods observed in the KJO were not significant $(X^2=17.4\;p=0.5881)$ but the changes observed in the AJODO was seen to be significant $(x^2=42.4,\;p=0.0397)$ Some of the studies examined had overlooked the assumptions of the statistical methods employed. Data investigation such as outlier should be performed before analysis and alternative statistical approaches are applied for a small sample size. Types of advanced statistical methods were factor analysis and discriminant analysis in the KJO and Intention-To-Treat (ITT) analysis in clinical trials through multi-center, survival analysis and Generalized Estimating Equations (GEE) in the AJODO. Appropriate analysis approaches and interpretations should be applied for the correlated and repeated measurements of the orthodontic data set.

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • Recent explosive increase of electronic commerce provides many advantageous purchase opportunities to customers. In this situation, customers who do not have enough knowledge about their purchases, may accept product recommendations. Product recommender systems automatically reflect user's preference and provide recommendation list to the users. Thus, product recommender system in online shopping store has been known as one of the most popular tools for one-to-one marketing. However, recommender systems which do not properly reflect user's preference cause user's disappointment and waste of time. In this study, we propose a novel recommender system which uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user's preference. The research data is collected from the real-world online shopping store, which deals products from famous art galleries and museums in Korea. The data initially contain 5759 transaction data, but finally remain 3167 transaction data after deletion of null data. In this study, we transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who have high likelihood to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. We perform above data mining techniques using SAS E-Miner software. In this study, we partition datasets into two sets as modeling and validation sets for the logistic regression and decision trees. We also partition datasets into three sets as training, test, and validation sets for the artificial neural network model. The validation dataset is equal for the all experiments. Then we composite the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. Bagging is the abbreviation of "Bootstrap Aggregation" and it composite outputs from several machine learning techniques for raising the performance and stability of prediction or classification. This technique is special form of the averaging method. Bumping is the abbreviation of "Bootstrap Umbrella of Model Parameter," and it only considers the model which has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for "Poster" product group. For the "Poster" product group, artificial neural network model performs better than the other models. In the second step, we use the market basket analysis to extract association rules for co-purchased products. We can extract thirty one association rules according to values of Lift, Support, and Confidence measure. We set the minimum transaction frequency to support associations as 5%, maximum number of items in an association as 4, and minimum confidence for rule generation as 10%. This study also excludes the extracted association rules below 1 of lift value. We finally get fifteen association rules by excluding duplicate rules. Among the fifteen association rules, eleven rules contain association between products in "Office Supplies" product group, one rules include the association between "Office Supplies" and "Fashion" product groups, and other three rules contain association between "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender systems provides list of recommendations to the proper customers. We test the usability of the proposed system by using prototype and real-world transaction and profile data. For this end, we construct the prototype system by using the ASP, Java Script and Microsoft Access. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The participants for the survey are 173 persons who use MSN Messenger, Daum Caf$\acute{e}$, and P2P services. We evaluate the user satisfaction using five-scale Likert measure. This study also performs "Paired Sample T-test" for the results of the survey. The results show that the proposed model outperforms the random selection model with 1% statistical significance level. It means that the users satisfied the recommended product list significantly. The results also show that the proposed system may be useful in real-world online shopping store.

A Study on the Observation of Soil Moisture Conditions and its Applied Possibility in Agriculture Using Land Surface Temperature and NDVI from Landsat-8 OLI/TIRS Satellite Image (Landsat-8 OLI/TIRS 위성영상의 지표온도와 식생지수를 이용한 토양의 수분 상태 관측 및 농업분야에의 응용 가능성 연구)

  • Chae, Sung-Ho;Park, Sung-Hwan;Lee, Moung-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.33 no.6_1
    • /
    • pp.931-946
    • /
    • 2017
  • The purpose of this study is to observe and analyze soil moisture conditions with high resolution and to evaluate its application feasibility to agriculture. For this purpose, we used three Landsat-8 OLI (Operational Land Imager)/TIRS (Thermal Infrared Sensor) optical and thermal infrared satellite images taken from May to June 2015, 2016, and 2017, including the rural areas of Jeollabuk-do, where 46% of agricultural areas are located. The soil moisture conditions at each date in the study area can be effectively obtained through the SPI (Standardized Precipitation Index)3 drought index, and each image has near normal, moderately wet, and moderately dry soil moisture conditions. The temperature vegetation dryness index (TVDI) was calculated to observe the soil moisture status from the Landsat-8 OLI/TIRS images with different soil moisture conditions and to compare and analyze the soil moisture conditions obtained from the SPI3 drought index. TVDI is estimated from the relationship between LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index) calculated from Landsat-8 OLI/TIRS satellite images. The maximum/minimum values of LST according to NDVI are extracted from the distribution of pixels in the feature space of LST-NDVI, and the Dry/Wet edges of LST according to NDVI can be determined by linear regression analysis. The TVDI value is obtained by calculating the ratio of the LST value between the two edges. We classified the relative soil moisture conditions from the TVDI values into five stages: very wet, wet, normal, dry, and very dry and compared to the soil moisture conditions obtained from SPI3. Due to the rice-planing season from May to June, 62% of the whole images were classified as wet and very wet due to paddy field areas which are the largest proportions in the image. Also, the pixels classified as normal were analyzed because of the influence of the field area in the image. The TVDI classification results for the whole image roughly corresponded to the SPI3 soil moisture condition, but they did not correspond to the subdivision results which are very dry, wet, and very wet. In addition, after extracting and classifying agricultural areas of paddy field and field, the paddy field area did not correspond to the SPI3 drought index in the very dry, normal and very wet classification results, and the field area did not correspond to the SPI3 drought index in the normal classification. This is considered to be a problem in Dry/Wet edge estimation due to outlier such as extremely dry bare soil and very wet paddy field area, water, cloud and mountain topography effects (shadow). However, in the agricultural area, especially the field area, in May to June, it was possible to effectively observe the soil moisture conditions as a subdivision. It is expected that the application of this method will be possible by observing the temporal and spatial changes of the soil moisture status in the agricultural area using the optical satellite with high spatial resolution and forecasting the agricultural production.

Estimation of Moisture Content in Cucumber and Watermelon Seedlings Using Hyperspectral Imagery (초분광영상 이용 오이 및 수박 묘의 수분함량 추정)

  • Kim, Seong-Heon;Kang, Jeong-Gyun;Ryu, Chan-Seok;Kang, Ye-Seong;Sarkar, Tapash Kumar;Kang, Dong Hyeon;Ku, Yang-Gyu;Kim, Dong-Eok
    • Journal of Bio-Environment Control
    • /
    • v.27 no.1
    • /
    • pp.34-39
    • /
    • 2018
  • This research was conducted to estimate moisture content in cucurbitaceae seedlings, such as cucumber and watermelon, using hyperspectral imagery. Using a hyperspectral image acquisition system, the reflectance of leaf area of cucumber and watermelon seedlings was calculated after providing water stress. Then, moisture content in each seedling was measured by using a dry oven. Finally, using reflectance and moisture content, the moisture content estimation models were developed by PLSR analysis. After developing the estimation models, performance of the cucumber showed 0.73 of $R^2$, 1.45% of RMSE, and 1.58% of RE. Performance of the watermelon showed 0.66 of $R^2$, 1.06% of RMSE, and 1.14% of RE. The model performed slightly better after removing one sample from cucumber seedlings as outlier and unnecessary. Hence, the performance of new model for cucumber seedlings showed 0.79 of $R^2$, 1.10% of RMSE, and 1.20% of RE. The model performance combined with all samples showed 0.67 of $R^2$, 1.26% of RMSE, and 1.36% of RE. The model of cucumber showed better performance than the model of watermelon. This is because variables of cucumber are consisted of widely distributed variation, and it affected the performance. Further, accuracy and precision of the cucumber model were increased when an insignificant sample was eliminated from the dataset. Finally, it is considered that both models can be significantly used to estimate moisture content, as gradients of trend line are almost same and intersected. It is considered that the accuracy and precision of the estimating models possibly can be improved, if the models are constructed by using variables with widely distributed variation. The improved models will be utilized as the basis for developing low-priced sensors.