• Title/Summary/Keyword: Outlier

Search Result 653, Processing Time 0.024 seconds

Processing and Quality Control of Flux Data at Gwangneung Forest (광릉 산림의 플럭스 자료 처리와 품질 관리)

  • Lim, Hee-Jeong;Lee, Young-Hee
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.10 no.3
    • /
    • pp.82-93
    • /
    • 2008
  • In order to ensure a standardized data analysis of the eddy covariance measurements, Hong and Kim's quality control program has been updated and used to process eddy covariance data measured at two levels on the main flux tower at Gwangneung site from January to May in 2005. The updated program was allowed to remove outliers automatically for $CO_2$ and latent heat fluxes. The flag system consists of four quality groups(G, D, B and M). During the study period, the missing data were about 25% of the total records. About 60% of the good quality data were obtained after the quality control. The number of record in G group was larger at 40m than at 20m. It is due that the level of 20m was within the roughness sublayer where the presence of the canopy influences directly on the character of the turbulence. About 60% of the bad data were due to low wind speed. Energy balance closure at this site was about 40% during the study period. Large imbalance is attributed partly to the combined effects of the neglected heat storage terms, inaccuracy of ground heat flux and advection due to local wind system near the surface. The analysis of wind direction indicates that the frequent occurrence of positive momentum flux was closely associated with mountain valley wind system at this site. The negative $CO_2$ flux at night was examined in terms of averaging time. The results show that when averaging time is larger than 10min, the magnitude of calculated $CO_2$ fluxes increases rapidly, suggesting that the 30min $CO_2$ flux is influenced severely by the mesoscale motion or nonstationarity. A proper choice of averaging time needs to be considered to get accurate turbulent fluxes during nighttime.

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • Recent explosive increase of electronic commerce provides many advantageous purchase opportunities to customers. In this situation, customers who do not have enough knowledge about their purchases, may accept product recommendations. Product recommender systems automatically reflect user's preference and provide recommendation list to the users. Thus, product recommender system in online shopping store has been known as one of the most popular tools for one-to-one marketing. However, recommender systems which do not properly reflect user's preference cause user's disappointment and waste of time. In this study, we propose a novel recommender system which uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user's preference. The research data is collected from the real-world online shopping store, which deals products from famous art galleries and museums in Korea. The data initially contain 5759 transaction data, but finally remain 3167 transaction data after deletion of null data. In this study, we transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who have high likelihood to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. We perform above data mining techniques using SAS E-Miner software. In this study, we partition datasets into two sets as modeling and validation sets for the logistic regression and decision trees. We also partition datasets into three sets as training, test, and validation sets for the artificial neural network model. The validation dataset is equal for the all experiments. Then we composite the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. Bagging is the abbreviation of "Bootstrap Aggregation" and it composite outputs from several machine learning techniques for raising the performance and stability of prediction or classification. This technique is special form of the averaging method. Bumping is the abbreviation of "Bootstrap Umbrella of Model Parameter," and it only considers the model which has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for "Poster" product group. For the "Poster" product group, artificial neural network model performs better than the other models. In the second step, we use the market basket analysis to extract association rules for co-purchased products. We can extract thirty one association rules according to values of Lift, Support, and Confidence measure. We set the minimum transaction frequency to support associations as 5%, maximum number of items in an association as 4, and minimum confidence for rule generation as 10%. This study also excludes the extracted association rules below 1 of lift value. We finally get fifteen association rules by excluding duplicate rules. Among the fifteen association rules, eleven rules contain association between products in "Office Supplies" product group, one rules include the association between "Office Supplies" and "Fashion" product groups, and other three rules contain association between "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender systems provides list of recommendations to the proper customers. We test the usability of the proposed system by using prototype and real-world transaction and profile data. For this end, we construct the prototype system by using the ASP, Java Script and Microsoft Access. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The participants for the survey are 173 persons who use MSN Messenger, Daum Caf$\acute{e}$, and P2P services. We evaluate the user satisfaction using five-scale Likert measure. This study also performs "Paired Sample T-test" for the results of the survey. The results show that the proposed model outperforms the random selection model with 1% statistical significance level. It means that the users satisfied the recommended product list significantly. The results also show that the proposed system may be useful in real-world online shopping store.

A Study on the Observation of Soil Moisture Conditions and its Applied Possibility in Agriculture Using Land Surface Temperature and NDVI from Landsat-8 OLI/TIRS Satellite Image (Landsat-8 OLI/TIRS 위성영상의 지표온도와 식생지수를 이용한 토양의 수분 상태 관측 및 농업분야에의 응용 가능성 연구)

  • Chae, Sung-Ho;Park, Sung-Hwan;Lee, Moung-Jin
    • Korean Journal of Remote Sensing
    • /
    • v.33 no.6_1
    • /
    • pp.931-946
    • /
    • 2017
  • The purpose of this study is to observe and analyze soil moisture conditions with high resolution and to evaluate its application feasibility to agriculture. For this purpose, we used three Landsat-8 OLI (Operational Land Imager)/TIRS (Thermal Infrared Sensor) optical and thermal infrared satellite images taken from May to June 2015, 2016, and 2017, including the rural areas of Jeollabuk-do, where 46% of agricultural areas are located. The soil moisture conditions at each date in the study area can be effectively obtained through the SPI (Standardized Precipitation Index)3 drought index, and each image has near normal, moderately wet, and moderately dry soil moisture conditions. The temperature vegetation dryness index (TVDI) was calculated to observe the soil moisture status from the Landsat-8 OLI/TIRS images with different soil moisture conditions and to compare and analyze the soil moisture conditions obtained from the SPI3 drought index. TVDI is estimated from the relationship between LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index) calculated from Landsat-8 OLI/TIRS satellite images. The maximum/minimum values of LST according to NDVI are extracted from the distribution of pixels in the feature space of LST-NDVI, and the Dry/Wet edges of LST according to NDVI can be determined by linear regression analysis. The TVDI value is obtained by calculating the ratio of the LST value between the two edges. We classified the relative soil moisture conditions from the TVDI values into five stages: very wet, wet, normal, dry, and very dry and compared to the soil moisture conditions obtained from SPI3. Due to the rice-planing season from May to June, 62% of the whole images were classified as wet and very wet due to paddy field areas which are the largest proportions in the image. Also, the pixels classified as normal were analyzed because of the influence of the field area in the image. The TVDI classification results for the whole image roughly corresponded to the SPI3 soil moisture condition, but they did not correspond to the subdivision results which are very dry, wet, and very wet. In addition, after extracting and classifying agricultural areas of paddy field and field, the paddy field area did not correspond to the SPI3 drought index in the very dry, normal and very wet classification results, and the field area did not correspond to the SPI3 drought index in the normal classification. This is considered to be a problem in Dry/Wet edge estimation due to outlier such as extremely dry bare soil and very wet paddy field area, water, cloud and mountain topography effects (shadow). However, in the agricultural area, especially the field area, in May to June, it was possible to effectively observe the soil moisture conditions as a subdivision. It is expected that the application of this method will be possible by observing the temporal and spatial changes of the soil moisture status in the agricultural area using the optical satellite with high spatial resolution and forecasting the agricultural production.