• Title/Summary/Keyword: 편향각

Search results: 174

A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein;Song, Juwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.4
    • /
    • pp.543-559
    • /
    • 2019
  • Data often include missing values due to various reasons. If the missing data mechanism is not MCAR, analysis based only on fully observed cases may cause estimation bias and decrease the precision of the estimates, since partially observed cases are excluded. Missing values cause especially serious problems when data include many variables. Many imputation techniques have been suggested to overcome this difficulty. However, imputation methods based on parametric models may not fit real data that do not satisfy the model assumptions. In this study, we review imputation methods using nonlinear models such as kernel, resampling, and spline methods, which are robust to model assumptions. In addition, we suggest utilizing imputation classes to improve imputation accuracy, and adding random errors to correctly estimate the variance of the estimates in nonlinear imputation models. The performances of imputation methods using nonlinear models are compared under various simulated data settings. Simulation results indicate that performance differs as the data settings change; however, imputation based on kernel regression or the penalized spline performs better in most situations. Utilizing imputation classes or adding random errors further improves the performance of imputation methods using nonlinear models.
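
As a concrete illustration of the nonlinear imputation idea described in the abstract above, the following is a minimal sketch of Nadaraya-Watson kernel-regression imputation with an added random residual; the data, bandwidth, and function names are illustrative and not taken from the paper.

```python
import numpy as np

def kernel_impute(x, y, bandwidth=0.5, add_noise=True, rng=None):
    """Impute missing y values by Nadaraya-Watson kernel regression on x.

    y may contain np.nan; observed pairs are used as donors. Optionally adds
    a donor residual so imputed values keep the variability needed for
    variance estimation.
    """
    rng = np.random.default_rng(rng)
    y = y.astype(float).copy()
    obs = ~np.isnan(y)
    x_obs, y_obs = x[obs], y[obs]

    def nw_predict(x0):
        w = np.exp(-0.5 * ((x0 - x_obs) / bandwidth) ** 2)  # Gaussian kernel weights
        return np.sum(w * y_obs) / np.sum(w)

    fitted_obs = np.array([nw_predict(x0) for x0 in x_obs])
    residuals = y_obs - fitted_obs

    for i in np.where(~obs)[0]:
        y[i] = nw_predict(x[i])
        if add_noise:
            y[i] += rng.choice(residuals)  # random residual restores natural spread
    return y

# toy usage: nonlinear relation with roughly 30% of y missing
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 200)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=200)
y[rng.random(200) < 0.3] = np.nan
y_imputed = kernel_impute(x, y, bandwidth=0.3, rng=1)
```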

Automatic Classification and Vocabulary Analysis of Political Bias in News Articles by Using Subword Tokenization (부분 단어 토큰화 기법을 이용한 뉴스 기사 정치적 편향성 자동 분류 및 어휘 분석)

  • Cho, Dan Bi;Lee, Hyun Young;Jung, Won Sup;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.10 no.1
    • /
    • pp.1-8
    • /
    • 2021
  • In the political coverage of news articles, there are polarized and biased characteristics such as conservative and liberal stances, referred to as political bias. We constructed a keyword-based dataset to classify the bias of news articles. Most embedding research represents a sentence as a sequence of morphemes. In our work, we expect that the number of unknown tokens will be reduced if sentences are composed of subwords segmented by a language model. We propose a document embedding model with subword tokenization and apply this model to SVM and feedforward neural network classifiers to identify political bias. Compared with a document embedding model based on morphological analysis, the subword-based document embedding model showed the highest accuracy at 78.22%, and it was confirmed that subword tokenization reduced the number of unknown tokens. Using the best-performing embedding model in our bias classification task, we extracted keywords associated with politicians. The bias of these keywords was verified by their average similarity to the vectors of politicians from each political tendency.
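
For readers unfamiliar with the pipeline sketched in this abstract, the snippet below is a minimal stand-in: character n-grams approximate subword units and a linear SVM performs the bias classification. The paper itself uses language-model-based subword tokenization and a learned document embedding, which this sketch does not reproduce; the corpus and labels are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# placeholder corpus and labels; real inputs would be news article texts
docs = [
    "placeholder conservative article text about policy",
    "another placeholder conservative article text",
    "placeholder liberal article text about policy",
    "another placeholder liberal article text",
]
labels = ["conservative", "conservative", "liberal", "liberal"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # subword-like character n-grams
    LinearSVC(),  # linear SVM classifier for political bias
)
clf.fit(docs, labels)
print(clf.predict(["a new unlabeled article text"]))
```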

Differences in Environmental Behavior Practice Experience according to the Level of Environmental Literacy Factors (환경소양 요인별 수준에 따른 환경행동 실천 경험의 차이)

  • Yoonkyung Kim;Jihoon Kang;Dongyoung Lee
    • Journal of the Korean Society of Earth Science Education
    • /
    • v.16 no.1
    • /
    • pp.153-165
    • /
    • 2023
  • This study investigates learners' environmental literacy, classifies the results by the factors of environmental literacy, and then examines the differences in students' environmental behavior practice experiences according to this classification. The study was conducted with 47 sixth-grade students from D elementary school located in P metropolitan city as the subjects of the final analysis, and environmental literacy questionnaires and environmental behavior practice experience questionnaires were used as the main data. As a result, the learners were classified into three groups according to the factors of environmental literacy, named the "high environmental literacy group", the "low environmental literacy group", and the "low function and affective group". A word network was formed from the descriptions of environmental behavior practice experiences for each cluster, and a degree centrality analysis was performed for visualization and analysis. The analysis confirmed that the "high environmental literacy group" 1) recognized the subjects of environmental action practice as individuals and families, 2) described their experiences of environmental action practice in relation to all elements of environmental literacy, and had a relatively pessimistic view. For the "low environmental literacy group" and the "low function and affective group", it was confirmed that 1) they perceived the subject of environmental behavior practice as a relatively social problem, 2) their descriptions of environmental behavior practice experiences were relatively biased toward specific factors, with the "low function and affective group" particularly focused on the knowledge element, and 3) they were aware of climate change from a relatively optimistic perspective. Based on these conclusions, suggestions were made from the perspective of environmental education.
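
A minimal sketch of the degree centrality analysis mentioned above, applied to a toy word co-occurrence network built from tokenized responses; the words and responses are invented for illustration and do not come from the study's data.

```python
import itertools
import networkx as nx

# toy "responses": each list is one student's description, already tokenized
responses = [
    ["recycle", "family", "save", "energy"],
    ["recycle", "save", "water"],
    ["family", "save", "energy", "climate"],
]

G = nx.Graph()
for tokens in responses:
    # connect every pair of words that co-occur in the same response
    G.add_edges_from(itertools.combinations(set(tokens), 2))

# degree centrality: fraction of other words each word is connected to
centrality = nx.degree_centrality(G)
for word, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```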

The Effects of Self-Defense Categories, Rate of Self-Defense recognition in News Article, and the Individual Characteristics of Mock Jurors on the Self-Defense Judgment (정당방위 유형, 신문기사의 정당방위 인정비율, 판단자 개인 특성이 정당방위 판단에 미치는 영향)

  • Kim, Yong ae;Kim, Min Chi
    • Korean Journal of Forensic Psychology
    • /
    • v.12 no.2
    • /
    • pp.171-197
    • /
    • 2021
  • The purpose of this study is to examine empirically how lay people judge self-defense and what factors affect that judgment. A total of 651 participants aged 20 years and over completed questionnaires on attitude toward interpersonal violence and legal attitude, divided by the type of self-defense. Participants were assigned one of three types of situations in which self-defense was claimed, and were given news articles and scenarios related to each type of self-defense before making their self-defense judgments. Legal attitude and attitude toward interpersonal violence were measured as personal factors, and their impact on the self-defense judgment was analyzed. The results showed that the rate of recognition of self-defense was highest for self-defense on one's own behalf, whereas the rate of denial of self-defense against state agencies was much higher, indicating the opposite pattern. Furthermore, negative articles on self-defense were found to affect the judgment of self-defense. In addition, participants' levels of attitude toward interpersonal violence and legal attitude could affect the judgment of self-defense. The general public's judgment process and the factors that affect self-defense judgments may be considered to prevent biased judgment in actual jury trials. Finally, the implications and limitations of this study and suggestions for subsequent research are also discussed.

Study on Risk Priority for TBM Tunnel Collapse based on Bayes Theorem through Case Study (사례분석을 통한 베이즈 정리 기반 TBM 터널 붕괴 리스크 우선순위 도출 연구)

  • Kwon, Kibeom;Kang, Minkyu;Hwang, Byeonghyun;Choi, Hangseok
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.43 no.6
    • /
    • pp.785-791
    • /
    • 2023
  • Risk management is essential for preventing accidents arising from uncertainties in TBM tunnel projects, especially concerning managing the risk of TBM tunnel collapse, which can cause extensive damage from the tunnel face to the ground surface. In addition, prioritizing risks is necessary to allocate resources efficiently within time and cost constraints. Therefore, this study aimed to establish a TBM risk database through case studies of TBM accidents and determine a risk priority for TBM tunnel collapse using the Bayes theorem. The database consisted of 87 cases, dealing with three accidents and five geological sources. Applying the Bayes theorem to the database, it was found that fault zones and weak ground significantly increased the probability of tunnel collapse, while the other sources showed low correlations with collapse. Therefore, the risk priority for TBM tunnel collapse, considering geological sources, is as follows: 1) Fault zone, 2) Weak ground, 3) Mixed ground, 4) High in-situ stress, and 5) Expansive ground. In practice, the derived risk priority can serve as a valuable reference for risk management, enhancing the safety and efficiency of TBM construction. It provides guidance for developing appropriate countermeasure plans and allocating resources effectively to mitigate the risk of TBM tunnel collapse.
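
The Bayes-theorem step described above can be illustrated with a small sketch: given hypothetical case records (not the paper's 87-case database), it computes P(collapse | geological source) from the overall collapse rate and the source frequencies.

```python
from collections import Counter

# hypothetical case records: (geological source, accident outcome);
# the paper's actual database is not reproduced here
cases = [
    ("fault zone", "collapse"), ("fault zone", "collapse"), ("fault zone", "other"),
    ("weak ground", "collapse"), ("weak ground", "collapse"), ("weak ground", "other"),
    ("mixed ground", "other"), ("high in-situ stress", "other"), ("expansive ground", "other"),
]

n = len(cases)
p_collapse = sum(1 for _, outcome in cases if outcome == "collapse") / n
source_counts = Counter(source for source, _ in cases)
collapse_counts = Counter(source for source, outcome in cases if outcome == "collapse")
total_collapses = max(sum(collapse_counts.values()), 1)

# Bayes' theorem: P(collapse | source) = P(source | collapse) * P(collapse) / P(source)
posterior = {
    s: (collapse_counts[s] / total_collapses) * p_collapse / (source_counts[s] / n)
    for s in source_counts
}
for source, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"P(collapse | {source}) = {p:.2f}")
```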

Comparison of Deep Learning Frameworks: About Theano, Tensorflow, and Cognitive Toolkit (딥러닝 프레임워크의 비교: 티아노, 텐서플로, CNTK를 중심으로)

  • Chung, Yeojin;Ahn, SungMahn;Yang, Jiheon;Lee, Jaejoon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.1-17
    • /
    • 2017
  • The deep learning framework is software designed to help develop deep learning models. Some of its important functions include "automatic differentiation" and "utilization of GPUs". The list of popular deep learning frameworks includes Caffe (BVLC) and Theano (University of Montreal). Recently, Microsoft's deep learning framework, Microsoft Cognitive Toolkit, was released under an open-source license, following Google's Tensorflow a year earlier. The early deep learning frameworks were developed mainly for research at universities. Beginning with the release of Tensorflow, however, companies such as Microsoft and Facebook have joined the competition in framework development. Given this trend, Google and other companies are expected to continue investing in deep learning frameworks to take the initiative in the artificial intelligence business. From this point of view, we think it is a good time to compare deep learning frameworks, so we compare three frameworks that can be used as a Python library: Google's Tensorflow, Microsoft's CNTK, and Theano, which is a predecessor of the other two. The most common and important function of deep learning frameworks is the ability to perform automatic differentiation. Basically, all the mathematical expressions of deep learning models can be represented as computational graphs, which consist of nodes and edges. Partial derivatives on each edge of a computational graph can then be obtained, and with these partial derivatives the software can compute the derivative of any node with respect to any variable by utilizing the chain rule of calculus. First of all, the convenience of coding is in the order of CNTK, Tensorflow, and Theano. The criterion is simply based on the length of the code; the learning curve and the ease of coding are not the main concern. According to this criterion, Theano was the most difficult to implement with, and CNTK and Tensorflow were somewhat easier. With Tensorflow, we need to define weight variables and biases explicitly. The reason that CNTK and Tensorflow are easier to implement with is that those frameworks provide more abstraction than Theano. We should mention, however, that low-level coding is not always bad: it gives us flexibility. With low-level coding such as in Theano, we can implement and test any new deep learning models or any new search methods that we can think of. The assessment of the execution speed of each framework is that there is no meaningful difference. According to the experiment, the execution speeds of Theano and Tensorflow are very similar, although the experiment was limited to a CNN model. In the case of CNTK, the experimental environment was not kept the same: the CNTK code had to be run in a PC environment without a GPU, where code executes as much as 50 times slower than with a GPU. Nevertheless, we concluded that the difference in execution speed was within the range of variation caused by the different hardware setup. In this study, we compared three deep learning frameworks: Theano, Tensorflow, and CNTK. According to Wikipedia, there are 12 available deep learning frameworks, and 15 different attributes differentiate each framework. Some of the important attributes include the interface language (Python, C++, Java, etc.) and the availability of libraries for various deep learning models such as CNN, RNN, and DBN. If a user implements a large-scale deep learning model, support for multiple GPUs or multiple servers will also be important. Likewise, for those learning about deep learning models, the availability of sufficient examples and references also matters.
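
As a rough illustration of the automatic differentiation that all three frameworks provide, the sketch below implements a tiny reverse-mode autodiff on a computational graph in plain Python; it is a didactic toy, not how Theano, Tensorflow, or CNTK are actually implemented.

```python
# Each Node records its parents and the local partial derivative on the
# connecting edge; backward() applies the chain rule over the graph.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_node, local_partial)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Node(self.value * other.value, [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)  # chain rule along each edge

# f(x, w, b) = x * w + b
x, w, b = Node(2.0), Node(3.0), Node(1.0)
y = x * w + b
y.backward()
print(w.grad, x.grad, b.grad)  # 2.0 3.0 1.0, i.e. df/dw = x, df/dx = w, df/db = 1
```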

A Study on Sample Allocation for Stratified Sampling (층화표본에서의 표본 배분에 대한 연구)

  • Lee, Ingue;Park, Mingue
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.6
    • /
    • pp.1047-1061
    • /
    • 2015
  • Stratified random sampling is a powerful sampling strategy that reduces the variance of estimators by incorporating useful auxiliary information to stratify the population. Sample allocation is one of the important decisions in selecting a stratified random sample. There are two common methods, proportional allocation and Neyman allocation, when the data collection cost is assumed equal across observation units. Theoretically, Neyman allocation, which considers the size and standard deviation of each stratum, is known to be more effective than proportional allocation, which incorporates only stratum size information. However, if the information on the standard deviations is inaccurate, the performance of Neyman allocation is in doubt. It has also been pointed out that Neyman allocation is not suitable for multi-purpose sample surveys that require the estimation of several characteristics. In addition to sampling error, non-response error is another factor affecting the statistical precision of the estimator that should be considered when evaluating a sampling strategy. We propose new sample allocation methods that use the information on stratum response rates available at the design stage to improve stratified random sampling. The proposed methods are efficient when response rates differ considerably among strata. In particular, the method using population sizes and response rates improves on Neyman allocation in multi-purpose sample surveys.
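
For reference, proportional allocation assigns n_h in proportion to N_h, while Neyman allocation assigns n_h in proportion to N_h S_h. The sketch below computes both for toy strata and adds one plausible response-rate adjustment; the adjustment shown is only an illustration, not the allocation formula proposed in the paper.

```python
# N: stratum sizes, S: stratum standard deviations, r: anticipated response rates
N = {"A": 5000, "B": 3000, "C": 2000}
S = {"A": 12.0, "B": 4.0, "C": 20.0}
r = {"A": 0.9, "B": 0.6, "C": 0.8}
n = 500  # total sample size

def allocate(weights, n):
    """Distribute n proportionally to the given stratum weights."""
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}

proportional = allocate(N, n)                              # n_h proportional to N_h
neyman = allocate({h: N[h] * S[h] for h in N}, n)          # n_h proportional to N_h * S_h
# illustrative response-rate adjustment: inflate each stratum by 1/r_h so the
# expected number of respondents roughly matches the Neyman targets
adjusted = allocate({h: N[h] * S[h] / r[h] for h in N}, n)

print(proportional, neyman, adjusted, sep="\n")
```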

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.105-129
    • /
    • 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risk. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event itself as the basis for learning about default risk, this study calculated default risk using the market capitalization and stock price volatility of each company based on the Merton model. This made it possible to address the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, as well as the problem of reflecting differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock price information can be appropriately derived. This makes it possible to provide stable default risk assessment services to unlisted companies, such as small and medium-sized companies and startups, whose default risk is difficult to determine with traditional credit rating models. Although there have been many recent studies predicting corporate default risk using machine learning, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for the calculation of default risk, given that a company's default risk information is very widely used in the market and sensitivity to differences in default risk is high; strict standards are also required for the calculation methods. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience on credit ratings and changes in future market conditions. This study reduced the bias of individual models by utilizing stacking ensemble techniques that combine various machine learning models. This allows us to capture complex nonlinear relationships between default risk and various corporate information and to maximize the advantages of machine learning-based default risk prediction models, which take less time to calculate. To produce the sub-model forecasts used as input data for the stacking ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare predictive power, Random Forest, MLP, and CNN models were trained with the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the stacking ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the stacking ensemble model and each individual model, pairs of forecasts between the stacking ensemble model and each individual model were constructed. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, we used the nonparametric Wilcoxon rank sum test to check whether the two sets of forecasts constituting each pair showed statistically significant differences. The analysis showed that the forecasts of the stacking ensemble model differed significantly from those of the MLP model and the CNN model. In addition, this study provides a methodology that allows existing credit rating agencies to adopt machine learning-based default risk prediction, given that traditional credit rating models can also be incorporated as sub-models in calculating the final default probability. The stacking ensemble techniques proposed in this study can also help designs meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical adoption by overcoming and improving the limitations of existing machine learning-based models.
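
A minimal sketch of a stacking ensemble in the spirit described above, using scikit-learn's StackingClassifier with out-of-fold sub-model forecasts (cv=7, analogous to the seven-way split); the synthetic data and the particular choice of sub-models are illustrative, not the paper's setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# synthetic stand-in for financial-ratio features and default-risk labels
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combines sub-model forecasts
    cv=7,  # out-of-fold forecasts from a seven-way split of the training data
)
stack.fit(X_train, y_train)
print("held-out accuracy:", stack.score(X_test, y_test))
```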

Relationships between Nailfold Plexus Visibility, and Clinical Variables and Neuropsychological Functions in Schizophrenic Patients (정신분열병 환자에서 손톱 주름 총 시도(叢 視度) (Nailfold Plexus Visibility)와 임상양상, 신경심리 기능과의 관계)

  • Kang, Dae-Yeob;Jang, Hye-Ryeon
    • Korean Journal of Biological Psychiatry
    • /
    • v.9 no.1
    • /
    • pp.50-61
    • /
    • 2002
  • Objectives: High nailfold plexus visibility can indirectly reflect central nervous system defects as an etiologic factor of schizophrenia. Previous studies suggest that this visibility is particularly related to the negative symptoms of schizophrenia and to frontal lobe deficits. In this study, we examined the relationships between nailfold plexus visibility and various clinical variables and neuropsychological functions in schizophrenic patients. Methods: Forty patients (21 males, 19 females) satisfying the DSM-IV criteria for schizophrenia and thirty-eight normal controls (20 males, 18 females) were measured for the Plexus Visualization Score (PVS) by capillary microscopic examination. For the assessment of psychopathology, process-reactivity, premorbid adjustment, and neuropsychological functions, we used the Positive and Negative Syndrome Scale (PANSS), the Ullmann-Giovannoni Process-Reactive Questionnaire (PRQ), the Phillips Premorbid Adjustment Scale (PAS), the Korean Wechsler Adult Intelligence Scale (KWIS), the Continuous Performance Test (CPT), the Wisconsin Card Sort Test (WCST), and the Word Fluency Test. We also collected data on clinical variables. Results: PVS was negatively correlated with the PANSS positive symptom score and the composite score. There were no correlations between PVS and the PRQ score, the PAS score, or the neuropsychological variables. Conclusions: This study showed that nailfold plexus visibility is a characteristic feature in some schizophrenic patients and that higher plexus visibility is associated with the negative symptoms of schizophrenia. There was no association between plexus visibility and neuropsychological functions.


Rectal Balloon for the Immobilization of the Prostate Internal Motion (전립선암의 방사선치료 시 직장풍선의 유용성 평가)

  • Lee Sang-Kyu;Beak Jong-Geal;Kim Joo-Ho;Jeon Byong-Chul;Cho Jeong-Hee;Kim Dong-Wook;Na Soo-Kyong;Song Tae-Soo;Cho Jae-Ho
    • The Journal of Korean Society for Radiation Therapy
    • /
    • v.17 no.2
    • /
    • pp.113-124
    • /
    • 2005
  • Purpose: The use of an endo-rectal balloon has been proposed as an optimal method to minimize prostate motion and the dose to the rectal wall volume in prostate cancer patients, so we made a customized rectal balloon device. In this study, we analyzed the efficacy of the self-customized rectal balloon in terms of its reproducibility. Materials and Methods: For treatment planning, CT slice images were acquired for each of 5 patients with and without the rectal balloon. CT scanning was also repeated three times during radiation treatment (IMRT). In each case, we analyzed the deviation of the rectal balloon position and verified the isodose distribution of the rectal wall close to the prostate. Results: Using the rectal balloon, we minimized the planning target volume (PTV) by decreasing the internal motion of the prostate, and overcame the dose limit of radiation therapy in prostate cancer by increasing the gap between the rectal wall and the high-dose region. Conclusion: Although patients were reluctant to be treated with the rectal balloon, from the viewpoint of immobilizing internal prostate motion and escalating the dose to the gross tumor volume (GTV), its use is considered highly beneficial for prostate cancer patients.
