• Title/Summary/Keyword: Simple and multiple regression model


Multimodal Emotional State Estimation Model for Implementation of Intelligent Exhibition Services (지능형 전시 서비스 구현을 위한 멀티모달 감정 상태 추정 모형)

  • Lee, Kichun;Choi, So Yun;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems, v.20 no.1, pp.1-14, 2014
  • Both researchers and practitioners are showing an increased interest in interactive exhibition services. Interactive exhibition services are designed to respond directly to visitor responses in real time, so as to fully engage visitors' interest and enhance their satisfaction. In order to implement an effective interactive exhibition service, it is essential to adopt intelligent technologies that enable accurate estimation of a visitor's emotional state from responses to the exhibited stimuli. Studies undertaken so far have attempted to estimate the human emotional state, most of them by gauging either facial expressions or audio responses. However, recent research suggests that a multimodal approach that uses multiple responses simultaneously may lead to better estimation. Given this context, we propose a new multimodal emotional state estimation model that uses various responses, including facial expressions, gestures, and movements, measured by the Microsoft Kinect sensor. In order to handle a large amount of sensory data effectively, we propose stratified sampling-based MRA (multiple regression analysis) as our estimation method. To validate the usefulness of the proposed model, we collected 602,599 responses and emotional state data with 274 variables from 15 people. When we applied our model to the data set, we found that it estimated the levels of valence and arousal within a 10~15% error range. Since the proposed model is simple and stable, we expect that it will be applied not only in intelligent exhibition services but also in other areas such as e-learning and personalized advertising.
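The stratified sampling-based multiple regression analysis mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustrative outline, not the authors' implementation: the Kinect-style feature names, the stratum definition, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical sensor frame: Kinect-style features plus a stratum label
# (e.g., one stratum per visitor) and a valence target.
n = 100_000
df = pd.DataFrame({
    "stratum": rng.integers(0, 15, n),
    "head_yaw": rng.normal(size=n),
    "hand_speed": rng.gamma(2.0, 1.0, n),
    "torso_lean": rng.normal(size=n),
})
df["valence"] = 0.3 * df["head_yaw"] - 0.2 * df["hand_speed"] + rng.normal(0, 0.5, n)

# Stratified sampling: draw the same fraction from every stratum so the
# regression is not dominated by the most active visitors.
sample = df.groupby("stratum").sample(frac=0.05, random_state=0)

X = sample[["head_yaw", "hand_speed", "torso_lean"]]
y = sample["valence"]

model = LinearRegression().fit(X, y)
print("R^2 on the stratified sample:", model.score(X, y))
```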

Diagnosis of Nitrogen Content in the Leaves of Apple Tree Using Spectral Imagery (분광 영상을 이용한 사과나무 잎의 질소 영양 상태 진단)

  • Jang, Si Hyeong;Cho, Jung Gun;Han, Jeom Hwa;Jeong, Jae Hoon;Lee, Seul Ki;Lee, Dong Yong;Lee, Kwang Sik
    • Journal of Bio-Environment Control, v.31 no.4, pp.384-392, 2022
  • The objective of this study was to estimate nitrogen content and chlorophyll using RGB and hyperspectral sensors in order to diagnose the nitrogen nutrition of apple tree leaves. Spectral data were acquired through image processing after imaging two-year-old 'Hongro/M.9' apple trees with high-resolution RGB and hyperspectral sensors. Growth data (chlorophyll and leaf nitrogen content, LNC) were measured immediately after imaging. Growth models were developed by regression analysis (simple, multiple, and partial least squares) relating the growth data (chlorophyll, LNC) to the spectral data (SPAD meter, color vegetation indices, wavelengths). As a result, chlorophyll and LNC showed statistically significant differences according to nitrogen fertilizer level regardless of date. Leaf color became pale over time as the nutrients in the leaf were transferred to the fruit. The RGB sensor showed a statistically significant difference at the red wavelength regardless of date. The hyperspectral sensor showed spectral differences depending on nitrogen fertilizer level that were larger in the non-visible than in the visible wavelengths on June 10th and July 14th. For estimating chlorophyll and LNC, partial least squares regression using hyperspectral data performed better than simple and multiple linear regression using RGB data (chlorophyll R2: 81%, LNC R2: 81%). The reason is that the hyperspectral sensor has a narrow Full Width at Half Maximum (FWHM) and a broad wavelength range (400-1,000 nm), so spectral analysis of crop stress caused by nitrogen deficiency was possible. In future studies, diagnostic models of physiology and pests across all growth stages of the tree based on hyperspectral imagery are expected to contribute to the development of technology for high-quality, stable fruit production.
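A minimal sketch of the comparison described above, partial least squares regression on many collinear spectral bands versus multiple linear regression on a few RGB-derived features, using scikit-learn on synthetic stand-in data; the band counts and feature names are assumptions, not the authors' data.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in data: 200 leaves, 150 hyperspectral bands (400-1,000 nm)
# and 3 RGB-derived features; LNC is the target.
n_leaves, n_bands = 200, 150
hyper = rng.normal(size=(n_leaves, n_bands))
rgb = rng.normal(size=(n_leaves, 3))
lnc = 0.6 * hyper[:, 40] + 0.3 * hyper[:, 90] + rng.normal(0, 0.3, n_leaves)

Xh_tr, Xh_te, Xr_tr, Xr_te, y_tr, y_te = train_test_split(
    hyper, rgb, lnc, test_size=0.3, random_state=1)

# PLS compresses the collinear bands into a few latent components.
pls = PLSRegression(n_components=5).fit(Xh_tr, y_tr)
mlr = LinearRegression().fit(Xr_tr, y_tr)

print("PLS (hyperspectral) R^2:", r2_score(y_te, pls.predict(Xh_te).ravel()))
print("MLR (RGB)           R^2:", r2_score(y_te, mlr.predict(Xr_te)))
```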

An Alternative Model for Determining the Optimal Fertilizer Level (수도(水稻) 적정시비량(適正施肥量) 결정(決定)에 대한 대체모형(代替模型))

  • Chang, Suk-Hwan
    • Korean Journal of Soil Science and Fertilizer, v.13 no.1, pp.21-32, 1980
  • Linear models, with and without site variables, have been investigated in order to develop an alternative methodology for determining optimal fertilizer levels. The resultant models are: (1) Model I is an ordinary quadratic response function formed by combining the simple response functions estimated at each site in block diagonal form, with parameters $[\gamma^{(1)}_{m\ell}]$ for $m = 1, 2, \cdots, n$ sites and degrees of polynomial $\ell = 0, 1, 2$. (2) Model II is a multiple regression model with a set of site variables (including an intercept) repeated for each fertilizer level and the linear and quadratic terms of the fertilizer variables arranged in block diagonal form as in Model I. The parameters are $[\beta_h\;\gamma^{(2)}_{m\ell}]$ for $h = 0, 1, 2, \cdots, k$ site variables, $m = 1, 2, \cdots, n$, and $\ell = 1, 2$. (3) Model III is a classical response surface model, i.e., a common quadratic polynomial model for the fertilizer variables augmented with site variables and interactions between site variables and the linear fertilizer terms. The parameters are $[\beta_h\;\gamma_{\ell}\;\theta_{h'}]$ for $h = 0, 1, \cdots, k$, $\ell = 1, 2$, and $h' = 1, 2, \cdots, k$. (4) Model IV has the same basic structure as Model I, but the estimation procedure involves two stages. In stage 1, yields for each fertilizer level are regressed on the site variables, and the resulting predicted yields for each site are then regressed on the fertilizer variables in stage 2. Each model has been evaluated under the assumption that Model III is the postulated true response function. Under this assumption, Models I, II and IV give biased estimators of the linear fertilizer response parameter that depend on the interaction between site variables and applied fertilizer variables. When the interaction is significant, Model III is the most efficient for calculating the optimal fertilizer level. It has been found that Model IV is always more efficient than Models I and II, with efficiency depending on the magnitude of $\lambda_m$, the mth diagonal element of $X(X'X)^{-1}X'$, where $X$ is the site variable matrix. When the site variable by linear fertilizer interaction parameters are zero, or when the estimated interactions are not important, it is demonstrated that Model IV can be a reasonable alternative model for calculating the optimal fertilizer level. The efficiencies of the models are compared using data from 256 fertilizer trials on rice conducted in Korea. Although Model III is usually preferred, the empirical results from the data analysis support the feasibility of using Model IV in practice when the estimated interaction term between measured soil organic matter and applied nitrogen is not important.
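A rough illustration of the two-stage procedure of Model IV on synthetic data: stage 1 regresses yields at each fertilizer level on a site variable, and stage 2 regresses the stage-1 predictions on the linear and quadratic fertilizer terms. The variable names, response coefficients, and the price-free definition of the optimum are illustrative assumptions, not the paper's data or exact formulation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Synthetic trials: n sites x 4 fertilizer levels, one site variable
# (e.g., soil organic matter) and a quadratic response to applied N.
n_sites, levels = 30, np.array([0.0, 40.0, 80.0, 120.0])
om = rng.normal(2.5, 0.5, n_sites)                      # site variable
site_idx = np.repeat(np.arange(n_sites), len(levels))
N = np.tile(levels, n_sites)
yield_ = (3.0 + 0.8 * om[site_idx]
          + 0.04 * N - 0.00015 * N**2
          + rng.normal(0, 0.2, len(N)))

# Stage 1: for each fertilizer level, regress yield on the site variable
# and keep the predicted yields.
pred = np.empty_like(yield_)
for lvl in levels:
    mask = N == lvl
    sites = om[site_idx[mask]].reshape(-1, 1)
    pred[mask] = LinearRegression().fit(sites, yield_[mask]).predict(sites)

# Stage 2: regress the stage-1 predictions on the linear and quadratic
# fertilizer terms to obtain a smooth response curve in N.
X2 = np.column_stack([N, N**2])
stage2 = LinearRegression().fit(X2, pred)
b1, b2 = stage2.coef_
# Yield-maximizing N (dY/dN = b1 + 2*b2*N = 0), ignoring price ratios.
print("Estimated optimum N:", -b1 / (2 * b2))
```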


Assessment of statistical sampling methods and approximation models applied to aeroacoustic and vibroacoustic problems

  • Biedermann, Till M.;Reich, Marius;Kameier, Frank;Adam, Mario;Paschereit, C.O.
    • Advances in aircraft and spacecraft science, v.6 no.6, pp.529-550, 2019
  • The effect of multiple process parameters on a set of continuous response variables is, especially in experimental designs, difficult and intricate to determine. Due to the complexity of aeroacoustic and vibroacoustic studies, the often-performed simple one-factor-at-a-time method turns out to be the least effective approach. In contrast, statistical Design of Experiments is a technique used with the objective of maximizing the obtained information while keeping the experimental effort at a minimum. The presented work aims at giving insights into Design of Experiments applied to aeroacoustic and vibroacoustic problems while comparing different experimental designs and approximation models. For this purpose, an experimental rig of a ducted low-pressure fan is developed that allows gathering data of both aerodynamic and aeroacoustic nature while analysing three independent process parameters. The experimental designs used to sample the design space are a Central Composite design and a Box-Behnken design, both used to model a response surface regression, and Latin Hypercube sampling to model an Artificial Neural Network. The results indicate that Latin Hypercube sampling extracts information that is more diverse and, in combination with an Artificial Neural Network, outperforms the quadratic response surface regressions. It is shown that Latin Hypercube sampling, initially developed for computer-aided experiments, can also be used as an experimental design. To further increase the benefit of the presented approach, spectral information of every experimental test point is extracted, and Artificial Neural Networks are chosen for modelling the spectral information since they prove to be the most universal approximators.
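The Latin Hypercube plus neural network metamodelling route described above might look roughly like the following sketch, using scipy's quasi-Monte Carlo sampler and a small scikit-learn MLP on a synthetic response; the parameter ranges and response function are invented placeholders, not the rig's data.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Three process parameters scaled to physical ranges (hypothetical:
# flow coefficient, rotational speed in rpm, tip-gap in mm).
sampler = qmc.LatinHypercube(d=3, seed=3)
unit = sampler.random(n=60)                       # 60 LHS design points
lower, upper = [0.5, 1000.0, 0.1], [2.0, 3000.0, 2.0]
X = qmc.scale(unit, lower, upper)

# Stand-in "experiment": a smooth sound-pressure-level-like response plus noise.
y = (20 * np.log10(X[:, 1] / 1000.0)
     + 5 * X[:, 0] - 3 * X[:, 2] + rng.normal(0, 0.5, len(X)))

# ANN metamodel trained on the LHS design (inputs standardized first).
ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=3),
).fit(X, y)
print("ANN R^2 on the design points:", ann.score(X, y))
```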

The Factors Implicated When an Individual Starts to Smoke Again After a 6 Month Cessation (보건소 금연클리닉 6개월 금연성공자의 재흡연과 관련요인)

  • Son, Hyo-Kyung;Jung, Un-Young;Park, Ki-Soo;Kam, Sin;Park, Sun-Kyun;Lee, Won-Kee
    • Journal of Preventive Medicine and Public Health, v.42 no.1, pp.42-48, 2009
  • Objectives: This study was conducted to examine the factors implicated when people start smoking again after a 6-month cessation, and was carried out at the smoking cessation clinic of a public health center. Methods: The study subjects were 191 males who had attended the smoking cessation clinic of a public health center for 6 months in an attempt to quit smoking. Data were collected, by phone interview, regarding individual smoking habits, if any, over the 6-month study period. The factors which may have caused an individual to smoke again were examined. This study employed a health belief model as its theoretical basis. Results: Following a 6-month cessation, 24.1% of the study group began to smoke again during the 6-month test period. In a simple analysis, the factors related to individuals relapsing and smoking again included the barriers of stress reduction, body weight gain, and induction of smoking by surroundings, among the perceived-barriers factors of our health belief model (p<0.05). In a multiple logistic regression analysis for relapsed smoking, significant factors included the barriers of stress reduction and induction of smoking by surroundings (p<0.05). The most important reason for an individual to relapse into smoking was stress (60.9%), and the most likely place for a relapse to occur was a drinking establishment (39.1%). Conclusions: Our results indicate that both regular consultations and a follow-up management program are important considerations in a public health center program geared towards maintaining smoking cessation.
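A minimal sketch of the multiple logistic regression step on hypothetical health-belief-model variables, using statsmodels; the variable names and simulated data are assumptions, not the study's survey items.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Hypothetical survey frame: perceived-barrier items (0/1) and the
# relapse outcome for 191 subjects.
n = 191
df = pd.DataFrame({
    "barrier_stress": rng.integers(0, 2, n),
    "barrier_weight_gain": rng.integers(0, 2, n),
    "induced_by_surroundings": rng.integers(0, 2, n),
})
logit_p = -1.8 + 1.1 * df["barrier_stress"] + 0.9 * df["induced_by_surroundings"]
df["relapse"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(df[["barrier_stress", "barrier_weight_gain",
                        "induced_by_surroundings"]])
model = sm.Logit(df["relapse"], X).fit(disp=False)

# Odds ratios with 95% confidence intervals, as in a multiple analysis.
print(np.exp(model.params))
print(np.exp(model.conf_int()))
```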

Improving Estimation Ability of Software Development Effort Using Principle Component Analysis (주성분분석을 이용한 소프트웨어 개발노력 추정능력 향상)

  • Lee, Sang-Un
    • The KIPS Transactions:PartD, v.9D no.1, pp.75-80, 2002
  • Putnam develops the SLIM (Software LIfecycle Management) model based upon the assumption that manpower utilization during software project development follows a Rayleigh distribution. To obtain the manpower distribution, we have to estimate the total development effort and the difficulty ratio parameter. We need a way to estimate these parameters accurately early in the requirements and specification phase, before investment decisions have to be made. Statistical tests show that the system attributes are highly correlated (redundant), so Putnam discards one and obtains a parameter estimator from the other attributes. However, different statistical methods select different system attributes and present different performance. To select the principal system attributes, this paper uses principal component analysis (PCA) instead of Putnam's method. The PCA-based results improve performance by 9.85 percent over Putnam's result. This model is also simple and easy to realize.
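A short sketch of the idea of replacing two redundant system attributes with their first principal component before regressing development effort, using scikit-learn on synthetic project data; the attribute names and coefficients are illustrative assumptions, not the paper's data set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)

# Hypothetical project data: two highly correlated system attributes
# (e.g., size in KLOC and a complexity score) plus development effort.
n = 60
kloc = rng.gamma(4.0, 20.0, n)
complexity = 0.9 * kloc + rng.normal(0, 5.0, n)       # nearly redundant
effort = 2.5 * kloc + rng.normal(0, 30.0, n)          # person-months

X = StandardScaler().fit_transform(np.column_stack([kloc, complexity]))

# Replace the redundant attributes with their first principal component
# instead of simply discarding one of them.
pc1 = PCA(n_components=1).fit_transform(X)
model = LinearRegression().fit(pc1, effort)
print("R^2 using the first principal component:", model.score(pc1, effort))
```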

Fertility Evaluation of Upland Fields by Combination of Landscape and Soil Survey Data with Chemical Properties in Soil (토양 화학성과 지형 및 토양 조사자료를 활용한 밭 토양의 비옥도 평가)

  • Hong, Soon-Dal;Kim, Jai-Joung;Min, Kyong-Beum;Kang, Bo-Goo;Kim, Hyun-Ju
    • Korean Journal of Soil Science and Fertilizer, v.33 no.4, pp.221-233, 2000
  • An evaluation method for soil fertility based on a geographic information system (GIS), including landscape characteristics and soil map data, was investigated using the productivities of red pepper and tobacco grown on fields with no fertilization. A total of 131 field experiments, 64 for red pepper and 67 for tobacco, were conducted: 22 and 23 fields for red pepper and tobacco, respectively, at Cheangweon and Eumseong counties in 1996; 20 and 25 fields at Boeun and Goesan counties in 1997; and 22 and 19 fields at Jincheon and Chungju counties in 1998. All the experimental sites were selected to cover a wide range of landscape and soil attributes. Dry weights and nutrient (N, P and K) uptakes by red pepper plants and tobacco leaves were taken as the basic fertility of the soil (BFS). The BFS was estimated from twenty-five independent variables, comprising 13 chemical properties and 12 GIS attributes. The twenty-five independent variables were classified into two groups, 15 quantitative variables and 10 qualitative variables, and were analyzed by multiple linear regression (MLR) using the REG and GLM procedures of SAS. The dry weight of red pepper (DWRP) and the dry weight of tobacco leaves (DWTL) varied each year by as much as five-fold between the plots with minimum and maximum yields, indicating diverse soil fertility among the experimental fields. Evaluation of the BFS by MLR with the independent variables was better than that by simple regression, showing gradual improvement as chemical properties, quantitative GIS variables, and qualitative GIS variables were added. However, the MLR evaluation of the BFS gave better results for tobacco than for red pepper. For example, the variability in DWTL explained by MLR was 34.2% with chemical properties only, 35.0% after adding the quantitative variables, and 72.5% after adding both the quantitative and qualitative GIS variables, compared with 21.7% by simple regression on the $NO_3-N$ content in soil. Consequently, the MLR approach including both the quantitative and qualitative variables is considered suitable as an evaluation model of soil fertility for upland fields.
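The contrast between simple regression on one chemical property and an MLR that adds quantitative and qualitative (categorical) GIS variables can be sketched as follows with the statsmodels formula API; the field attributes and simulated values are hypothetical, not the survey data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)

# Hypothetical field data: soil chemistry (quantitative) plus categorical
# GIS attributes such as landscape position and drainage class.
n = 131
df = pd.DataFrame({
    "no3_n": rng.gamma(3.0, 5.0, n),
    "om": rng.normal(2.5, 0.6, n),
    "landscape": rng.choice(["alluvial", "footslope", "hillside"], n),
    "drainage": rng.choice(["well", "moderate", "poor"], n),
})
df["dwtl"] = (50 + 2.0 * df["no3_n"] + 8.0 * df["om"]
              + (df["landscape"] == "alluvial") * 15
              + rng.normal(0, 10, n))

# Simple regression on one chemical property vs. MLR that also includes
# the qualitative GIS variables (C() marks them as categorical).
simple = smf.ols("dwtl ~ no3_n", data=df).fit()
full = smf.ols("dwtl ~ no3_n + om + C(landscape) + C(drainage)", data=df).fit()
print("Simple regression R^2:", round(simple.rsquared, 3))
print("Full MLR R^2:         ", round(full.rsquared, 3))
```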


Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems, v.24 no.4, pp.1-32, 2018
  • In addition to stakeholders such as managers, employees, creditors, and investors of bankrupt companies, corporate defaults have a ripple effect on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing various corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even after that, the analysis of past corporate defaults remained focused on specific variables, and when the government restructured companies immediately after the global financial crisis, it concentrated on only a few main variables such as the debt ratio. A multifaceted study of corporate default prediction models is essential to reflect diverse interests and to avoid situations like the 'Lehman Brothers case' of the global financial crisis, where everything collapses in a single moment. The key variables used in corporate default prediction vary over time. This is confirmed by the analyses of Beaver (1967, 1968) and Altman (1968), and Deakin's (1972) study shows that the major factors affecting corporate failure have changed. In Grice's (2001) study, the changing importance of predictive variables was also found through Zmijewski's (1984) and Ohlson's (1980) models. However, the studies carried out in the past use static models, and most of them do not consider the changes that occur over time. Therefore, in order to construct consistent prediction models, it is necessary to compensate for the time-dependent bias by means of a time series analysis algorithm reflecting dynamic change. Based on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets of 7, 2, and 1 years, respectively. In order to construct a bankruptcy model that is consistent over time, we first train a time series deep learning model using the data before the financial crisis (2000~2006). Parameter tuning of the existing models and of the deep learning time series algorithm is conducted with validation data that include the financial crisis period (2007~2008). As a result, we construct a model that shows a pattern similar to the results on the training data and exhibits excellent prediction power. After that, each bankruptcy prediction model is retrained on the combined training and validation data (2000~2008), applying the optimal parameters found in the previous validation. Finally, each corporate default prediction model is evaluated and compared using the test data (2009), based on the models trained over nine years. In this way, the usefulness of the corporate default prediction model based on the deep learning time series algorithm is demonstrated. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis, logit model), it is shown that the deep learning time series model based on the three bundles of variables is useful for robust corporate default prediction. The definition of bankruptcy used is the same as that of Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and the deep learning time series algorithms are compared. In the case of corporate data, there are limitations of nonlinear variables, multicollinearity among variables, and lack of data. The logit model handles the nonlinearity, the Lasso regression model addresses the multicollinearity problem, and the deep learning time series algorithm, using a variable data generation method, compensates for the lack of data. Big data technology is moving from simple human analysis to automated AI analysis and, eventually, towards intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis at corporate default prediction modeling and is more effective in terms of prediction power. In the context of the Fourth Industrial Revolution, the Korean government and governments overseas are working hard to integrate such systems into the everyday life of their nations and societies, yet research on deep learning time series methods for the financial industry is still insufficient. This is an initial study on deep learning time series analysis of corporate defaults, and it is hoped that it will serve as comparative reference material for non-specialists who begin studies combining financial data with deep learning time series algorithms.
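A rough sketch of one slice of such a pipeline, Lasso-based selection of financial ratios followed by a small LSTM classifier over yearly sequences, using scikit-learn and PyTorch on synthetic panel data; the ratio counts, network size, and training setup are assumptions, not the authors' configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)

# Synthetic panel: 500 firms x 7 years x 20 financial ratios, with a
# default label per firm.
n_firms, n_years, n_ratios = 500, 7, 20
X = rng.normal(size=(n_firms, n_years, n_ratios)).astype("float32")
logit = 1.5 * X[:, -1, 0] - 1.0 * X[:, -1, 3] + rng.normal(0, 0.5, n_firms)
y = (logit > 0).astype("float32")

# Step 1: Lasso on the last-year cross-section to pick a sparse bundle of
# ratios (a simple stand-in for the variable-selection stage).
lasso = LassoCV(cv=5).fit(X[:, -1, :], y)
selected = np.flatnonzero(lasso.coef_ != 0)
print("Selected ratio indices:", selected)

# Step 2: small LSTM over the yearly sequences of the selected ratios.
class DefaultLSTM(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.lstm = nn.LSTM(n_features, 16, batch_first=True)
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        _, (h, _) = self.lstm(x)          # final hidden state per firm
        return self.head(h[-1]).squeeze(-1)

model = DefaultLSTM(len(selected))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
xb = torch.from_numpy(X[:, :, selected])
yb = torch.from_numpy(y)

for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
print("Final training loss:", float(loss))
```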

Availability of Diagnosis of Yin-deficiency in Elderly People with Xerostomia and Factors Influencing Subjective Oral Dryness: A Prospective Cross-sectional Study (노인 구강건조증에 대한 음허 진단의 유용성 및 주관적 구강건조감의 영향요인 : 전향적 단면 조사 연구)

  • Kim, Juyeon;Kim, Jinsung;Park, Jaewoo;Ryu, Bongha
    • The Journal of Korean Medicine, v.34 no.3, pp.13-24, 2013
  • Objectives: The aims of this study were to investigate the availability of the diagnosis of Yin-deficiency in the elderly with xerostomia and the factors influencing subjective oral dryness. Methods: We surveyed 50 patients recruited for the clinical trial 'Efficacy of Yukmijihwang-tang on Xerostomia in the Elderly: A Randomized, Double-blind, Placebo-controlled, Two-center Trial'. The subjects were assessed for subjective oral dryness using the Dry Mouth Symptom Questionnaire (DMSQ). Their salivary functions were measured by Unstimulated Salivary Flow Rate (USFR) measurements. In addition, the subjects were evaluated for Qi-stagnation and Yin-deficiency conditions using the Qi-stagnation questionnaire and the Yin-deficiency questionnaire. Results: There were statistically significant correlations between three variables (USFR, DMSQ score and Qi-stagnation score) and the Yin-deficiency score. In the multiple regression analysis, the regression model was statistically significant (F = 10.273, p < .001). The factor most strongly influencing subjective oral dryness was USFR ($\beta$ = -0.386). Yin-deficiency had the next strongest impact on subjective oral dryness ($\beta$ = 0.371), while Qi-stagnation affected it only weakly ($\beta$ = 0.075). In the simple regression analysis, Yin-deficiency had a statistically significant effect on each of the six subscales of the DMSQ (p < .01). Among the six subscales, DMSQ-1 ('Oral dryness at night or on awakening') was the one most strongly influenced by Yin-deficiency. Conclusions: The results of this study show that the diagnosis of Yin-deficiency in the elderly with xerostomia was applicable and that Yin-deficiency was an important factor influencing subjective oral dryness. Therefore, consideration of Yin-deficiency is important for diagnosis and treatment in the elderly with xerostomia.
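A minimal sketch of how standardized beta coefficients like those reported above can be obtained, by z-scoring all variables before an OLS fit with statsmodels; the variable names and simulated data are hypothetical, not the trial data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)

# Hypothetical data for 50 subjects: salivary flow rate, Yin-deficiency
# score, Qi-stagnation score, and subjective dryness (DMSQ total).
n = 50
df = pd.DataFrame({
    "usfr": rng.gamma(2.0, 0.15, n),
    "yin_deficiency": rng.normal(50, 10, n),
    "qi_stagnation": rng.normal(40, 8, n),
})
df["dmsq"] = (30 - 20 * df["usfr"] + 0.4 * df["yin_deficiency"]
              + rng.normal(0, 3, n))

# Standardizing every variable makes the OLS coefficients comparable
# standardized betas.
z = (df - df.mean()) / df.std()
X = sm.add_constant(z[["usfr", "yin_deficiency", "qi_stagnation"]])
fit = sm.OLS(z["dmsq"], X).fit()
print(fit.params.round(3))     # standardized beta coefficients
```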

Derivation of Probability Plot Correlation Coefficient Test Statistics and Regression Equation for the GEV Model based on L-moments (L-모멘트 법 기반의 GEV 모형을 위한 확률도시 상관계수 검정 통계량 유도 및 회귀식 산정)

  • Ahn, Hyunjun;Jeong, Changsam;Heo, Jun-Haeng
    • Journal of Korean Society of Disaster and Security, v.13 no.1, pp.1-11, 2020
  • One of the important problems in statistical hydrology is to estimate the appropriate probability distribution for given sample data. For this problem, a goodness-of-fit test is conducted based on the similarity between the estimated probability distribution and an assumed theoretical probability distribution. The probability plot correlation coefficient (PPCC) test is one such goodness-of-fit method; it has high rejection power and is simple to apply. In this study, PPCC test statistics were derived for generalized extreme value (GEV) models based on L-moments, and multiple and nonlinear regression equations for these statistics were proposed for ease of use. To review the rejection power of the newly proposed method, a Monte Carlo simulation was performed against other goodness-of-fit tests, including the existing PPCC test. The results showed that the PPCC-A test proposed in this study demonstrated better rejection power than the other methods, including the existing PPCC test. It is expected that the new method will help in estimating an appropriate probability distribution model.
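An illustrative PPCC computation for a GEV sample is sketched below: order the data, compute plotting positions, correlate the ordered observations with fitted GEV quantiles, and obtain a Monte Carlo critical value. scipy's maximum likelihood fit is used as a stand-in for the L-moment fit described in the abstract, and the Gringorten-type plotting-position formula is an assumed choice.

```python
import numpy as np
from scipy import stats

def ppcc_gev(sample, plotting_a=0.44):
    """PPCC: correlation between ordered data and fitted GEV quantiles.

    scipy's MLE fit stands in for the L-moment fit used in the paper;
    the plotting position (i - a) / (n + 1 - 2a) with a = 0.44 is an
    illustrative (Gringorten-type) choice.
    """
    x = np.sort(sample)
    n = len(x)
    p = (np.arange(1, n + 1) - plotting_a) / (n + 1 - 2 * plotting_a)
    c, loc, scale = stats.genextreme.fit(x)
    q = stats.genextreme.ppf(p, c, loc=loc, scale=scale)
    return np.corrcoef(x, q)[0, 1]

# Observed annual maxima (synthetic here) and their PPCC statistic.
obs = stats.genextreme.rvs(-0.1, loc=100, scale=25, size=40, random_state=9)
r_obs = ppcc_gev(obs)

# Monte Carlo critical value at the 5% level: simulate GEV samples of the
# same size and take the 5th percentile of their PPCC statistics.
sims = [ppcc_gev(stats.genextreme.rvs(-0.1, loc=100, scale=25, size=40,
                                      random_state=i)) for i in range(500)]
crit = np.quantile(sims, 0.05)
print("PPCC of the sample:", round(r_obs, 4))
print("5% critical value: ", round(crit, 4))
print("Reject GEV?        ", r_obs < crit)
```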