• Title/Summary/Keyword: DISTRIBUTION

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Channels such as social media and social networking services (SNS) now generate enormous amounts of data, and the share of unstructured text data within it has grown geometrically. Because it is impractical to read all of this text, it is important to access it quickly and grasp its key points. To meet this need, many text summarization studies have been proposed for handling and exploiting very large text collections. In particular, many recent methods, collectively called "automatic summarization", use machine learning and artificial intelligence algorithms to generate summaries objectively and effectively. However, almost all summarization methods proposed to date build the summary around the frequency of content in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary covers only the major subjects, bias arises and information is lost, making it hard to identify every subject the documents contain. To avoid this bias, one can summarize with a view to balance among the topics in a document so that every subject can be identified, but an imbalance in the distribution of those subjects still remains. To retain subject balance in the summary, it is necessary to consider the proportion of every subject in the original documents and also to allocate the portions of the subjects evenly, so that sentences on minor subjects are sufficiently represented in the summary. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries, we use two summary evaluation metrics, "completeness" and "succinctness". 
Completeness means that the summary should fully cover the content of the original documents, and succinctness means that the summary should contain minimal duplication within itself. The proposed method summarizes in three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From these weights, the terms most related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms". However, the seed terms are too few to describe each subject adequately, so a well-constructed subject dictionary needs enough additional terms similar to them. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and the similarity between any two terms can be derived from those vectors using cosine similarity; the higher the cosine similarity between two terms, the stronger the relationship between them is taken to be. Terms with high similarity to the seed terms of each subject are therefore selected, and after these expanded terms are filtered, the subject dictionary is finally constructed. The next phase allocates subjects to every sentence in the original documents. To grasp the content of each sentence, frequency analysis is first conducted on the terms that make up the subject dictionaries. The TF-IDF weight of each subject is then calculated, which shows how much each sentence says about each subject. However, TF-IDF weights have no upper bound, so the weights of every subject in each sentence are normalized to values between 0 and 1. 
Each sentence is then allocated to the subject with its maximum TF-IDF weight, which yields a sentence group for each subject. The last phase generates the summary. Sen2Vec is used to measure the similarity between subject sentences, from which a similarity matrix is formed. By selecting sentences iteratively, it is possible to generate a summary that fully covers the content of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better retains the balance of all the subjects that the documents originally contain.
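
The cosine-similarity step and the subject-allocation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the TF-IDF weights are normalized, so min-max scaling per subject is assumed here, and the function names, subject labels, and toy weights are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize_and_allocate(tfidf):
    """Scale per-subject TF-IDF weights to [0, 1] and group sentences by subject.

    tfidf: list with one dict per sentence, mapping subject -> raw TF-IDF weight.
    Min-max scaling is an assumption; the abstract only says values end up in 0..1.
    """
    subjects = sorted({s for row in tfidf for s in row})
    scaled = [dict() for _ in tfidf]
    for subj in subjects:
        values = [row.get(subj, 0.0) for row in tfidf]
        lo, hi = min(values), max(values)
        span = hi - lo
        for i, v in enumerate(values):
            scaled[i][subj] = (v - lo) / span if span else 0.0
    # Each sentence goes to the subject with its largest normalized weight.
    groups = {subj: [] for subj in subjects}
    for i, row in enumerate(scaled):
        best = max(subjects, key=lambda s: row[s])
        groups[best].append(i)
    return scaled, groups


# Hypothetical raw weights for three review sentences and two subjects.
weights = [
    {"room": 4.0, "food": 1.0},
    {"room": 0.0, "food": 3.0},
    {"room": 1.0, "food": 2.5},
]
scaled, groups = normalize_and_allocate(weights)
print(groups)
```

With these toy weights, the first sentence lands in the "room" group and the other two in the "food" group; ties, if any, go to the alphabetically first subject in this sketch.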

The Study on Conservation and Management of Natural Habitat of Spleenworts on Samdo Island (Asplenium antiquum Makino), Jeju (Natural Monument No. 18) (천연기념물 제주 삼도 파초일엽 자생지 생육 및 관리 현황 연구)

  • Shin, Jin-Ho;Kim, Han;Lee, Na-Ra;Son, Ji-Won
    • Korean Journal of Environment and Ecology / v.33 no.3 / pp.280-291 / 2019
  • A. antiquum, first observed on Samdo Island, Jeju, in 1949, was designated Natural Monument No. 18 in December 1962 in recognition of its academic value. In Korea, it grows naturally only on Samdo Island in Jeju. Although its natural habitat was greatly damaged and almost destroyed by firewood collection, theft, and other causes, it has been maintained since the emancipation through transplantation and restoration. The site observed in this study has been managed as a restricted area since 2011. Because about 20 years have passed since the native site was restored in the 2000s, it is necessary to check the official management history records, such as the origin of the transplanted and restored individuals, to monitor changes in growth status and to manage the habitat. As a result of this study, we secured records of the cultural property management history, such as the identification of native species and the transplantation and restoration records. We also examined the changes in the growth and development of A. antiquum 20 years after the restoration. There are no official records of the individuals transplanted to the restored natural habitat of A. antiquum in the 1970s and 1980s, and there was controversy about the nativeness of the individuals restored and transplanted in 1974 because they were Japanese individuals. There are two sites in the natural habitat on Samdo Island. A total of 65 individuals grow in three layers on three stone walls at one site, while 29 individuals grow in two columns at the other site. A. 
antiquum grows in an evergreen broad-leaved forest dominated by Neolitsea sericea, and we did not find any other individuals of naturally growing A. antiquum outside the investigated site. This study checked the distribution of A. antiquum seedlings observed initially after the restoration. There were more than 300 seedling individuals, and we selected three densely populated sites for monitoring. There were 23 A. antiquum seedlings with 4 - 17 leaves per individual and the leaf length of 0.5 - 20 cm in monitoring site 1. There were 88 individuals with 5 - 6 leaves per individual and the leaf length of 1.3 - 10.4 cm in monitoring site 2 while there were 22 individuals with 5 - 9 leaves per individual and the leaf length of 4.5 - 12.1 cm in monitoring site 3. Although the natural habitat of A. antiquum was designated as a restricted public area in 2011, there is a high possibility that the habitat can be damaged because some activities, such as fishing and scuba diving are allowed. Therefore, it is necessary to enforce the law strictly, to provide sufficient education for the preservation of natural treasures, and to present accurate information about cultural assets.

Conservation Status, Construction Type and Stability Considerations for Fortress Wall in Hongjuupseong (Town Wall) of Hongseong, Korea (홍성 홍주읍성 성벽의 보존상태 및 축성유형과 안정성 고찰)

  • Park, Junhyoung;Lee, Chanhee
    • Korean Journal of Heritage: History & Science / v.51 no.3 / pp.4-31 / 2018
  • It is difficult to ascertain exactly when Hongjuupseong (Town Wall) was first constructed, because it has undergone several rounds of repair and maintenance since it was newly built in 1415, the first year of the reign of King Munjong (the 5th King of the Joseon Dynasty). Parts of its walls were demolished during the Japanese occupation, leaving the wall as it is today. The Hongseong region is also susceptible to earthquakes for geological reasons. Records include the earthquakes of 1978 and 1979, with magnitudes of 5.0 and 4.0, respectively, which caused part of the walls to collapse. In 2010, heavy rainfall destroyed another part of the wall. The fortress walls of Hongjuupseong vary by section in rock type, facing type, building method, and filling material. Moreover, the remaining wall parts were reused in repair works, so the characteristics of each period are reflected vertically in the wall. Based on this vertical distribution, the walls of Hongjuupseong were divided into type I, type II, and type III according to building type. The walls consist mainly of coarse-grained granite, but clearly different rock types were used for the different wall types. The bottom of the wall shows a mix of rock types and of natural and split stones, whereas the center is made up mostly of coarse-grained granite. Pink feldspar granite was used for repairs, but it differs from the rock used for Suguji and Joyangmun Gate. Deterioration of the wall can be categorized into bulging, protrusion of stones, missing stones at the basement, separation of framework, fissure and fragmentation, basement instability, and structural deformation. Manual and light-wave measurements were used to check the amount and direction of behavior of the fortress walls. The manual measurement revealed the sections that were undergoing structural deformation. 
Comparison with the light-wave measurement results showed that the two monitoring methods are correlated. The two methods can therefore be used complementarily for the long-term conservation and management of the wall. Additionally, the measurement system must be maintained, managed, and improved for the stability of Hongjuupseong. The measurements at Nammunji indicated continuing changes in behavior due to collapse and rainfall. It is presumed that changes accumulated over a long period reached a threshold because of concentrated rainfall and subsequent behavioral irregularities, leading to the collapse of the walls. Based on these findings, a six-grade management system, from grade 0 to grade 5, is suggested to manage Hongjuupseong more effectively. Applying the suggested grade system, 501.9 m (64.10%) of the wall was assessed as grade 1, 29.5 m (3.77%) as grade 2, 10.4 m (1.33%) as grade 3, and 241.2 m (30.80%) as grade 4. The grade 4 sections are concentrated to the west of Honghwamun Gate and to the east of the battlement and must be monitored regularly in preparation for a potential emergency. The six-grade management system is cyclical: after repair and maintenance works following a comprehensive stability review, a section returns to grade 0. It is necessary to monitor thoroughly and to evaluate grades on a regular basis.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintenance and prevention of failure through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data. When we deal with multidimensional time series data, we have difficulty in considering both characteristics of multidimensional data and characteristics of time series data. When dealing with multidimensional data, correlation between variables should be considered. Existing methods such as probability and linear base, distance base, etc. are degraded due to limitations called the curse of dimensions. In addition, time series data is preprocessed by applying sliding window technique and time series decomposition for self-correlation analysis. These techniques are the cause of increasing the dimension of data, so it is necessary to supplement them. The anomaly detection field is an old research field, and statistical methods and regression analysis were used in the early days. Currently, there are active studies to apply machine learning and artificial neural network technology to this field. Statistically based methods are difficult to apply when data is non-homogeneous, and do not detect local outliers well. The regression analysis method compares the predictive value and the actual value after learning the regression formula based on the parametric statistics and it detects abnormality. Anomaly detection using regression analysis has the disadvantage that the performance is lowered when the model is not solid and the noise or outliers of the data are included. There is a restriction that learning data with noise or outliers should be used. The autoencoder using artificial neural networks is learned to output as similar as possible to input data. It has many advantages compared to existing probability and linear model, cluster analysis, and map learning. It can be applied to data that does not satisfy probability distribution or linear assumption. 
In addition, it is possible to learn non-mapping without label data for teaching. However, there is a limitation of local outlier identification of multidimensional data in anomaly detection, and there is a problem that the dimension of data is greatly increased due to the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances the performance of anomaly detection by considering local outliers and time series characteristics. First, we applied Multimodal Autoencoder (MAE) to improve the limitations of local outlier identification of multidimensional data. Multimodals are commonly used to learn different types of inputs, such as voice and image. The different modal shares the bottleneck effect of Autoencoder and it learns correlation. In addition, CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of data. In general, conditional input mainly uses category variables, but in this study, time was used as a condition to learn periodicity. The CMAE model proposed in this paper was verified by comparing with the Unimodal Autoencoder (UAE) and Multi-modal Autoencoder (MAE). The restoration performance of Autoencoder for 41 variables was confirmed in the proposed model and the comparison model. The restoration performance is different by variables, and the restoration is normally well operated because the loss value is small for Memory, Disk, and Network modals in all three Autoencoder models. The process modal did not show a significant difference in all three models, and the CPU modal showed excellent performance in CMAE. ROC curve was prepared for the evaluation of anomaly detection performance in the proposed model and the comparison model, and AUC, accuracy, precision, recall, and F1-score were compared. In all indicators, the performance was shown in the order of CMAE, MAE, and AE. 
In particular, recall was 0.9828 for CMAE, which confirms that it detects almost all of the abnormalities. The accuracy of the model also improved, to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow down computation during inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
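
The evaluation metrics compared in this abstract (accuracy, precision, recall, and F1-score) can be computed as in the following Python sketch. It assumes, as is common for autoencoder-based detectors but not stated in the abstract, that a sample is flagged as anomalous when its reconstruction error exceeds a fixed threshold; the function name, the threshold, and the toy data are hypothetical.

```python
def detection_metrics(errors, labels, threshold):
    """Score a reconstruction-error detector against ground-truth labels.

    errors: per-sample reconstruction errors from an autoencoder.
    labels: 1 for anomaly, 0 for normal.
    threshold: errors above this value are flagged (assumed decision rule).
    """
    preds = [1 if e > threshold else 0 for e in errors]
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(labels)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical reconstruction errors and labels for six samples.
metrics = detection_metrics(
    errors=[0.1, 0.9, 0.8, 0.2, 0.7, 0.05],
    labels=[0, 1, 1, 0, 0, 1],
    threshold=0.5,
)
print(metrics)
```

A high recall with a lower precision, the pattern the reported figures suggest, means the detector misses few anomalies at the cost of some false alarms.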

Types and Characteristics of Traditional Music Performance of the 1920s - Focused on the mixed performances type in the western-style genre - (1920년대 전통음악공연의 형태와 특징 - 서양식 장르와의 혼성공연형태를 중심으로 -)

  • Keum, Yong-woong
    • (The) Research of the performance art and culture / no.35 / pp.61-92 / 2017
  • During the Japanese colonial era, traditional music performances were gradually diminishing and weakening in the particular condition of colonization. Meanwhile, from the time of enlightenment, Western genre performances were becoming vitalized with the influence of Western civilization that began to be spread steadily throughout the society. In that situation, traditional music performances tended to be mixed performances accompanied by Western ones, not independent performances. Mostly, they were accompanied by Western music, and also, they were performed along with other genres like plays, lectures, movies, dances, or magic, too. Such form of mixed performances accompanied by Western genres became even more vitalized in the 1920's and came to be positioned as a form of traditional music performances. Therefore, research on the forms of mixed performances between Western genres and traditional music is meaningful in examining the forms of traditional music performances that have not been studied in the history of Korean modern music and understanding the trends of traditional music performances which were generally found in the Japanese colonial era. However, such research has hardly been conducted concretely yet. Accordingly, concerning the forms of mixed performances between Western genres and traditional music in the 1920's, this author considered the background of vitalizing mixed performances between Western genres and traditional music mainly with newspaper articles of the time and their formal characteristics. Regarding the background of vitalizing the forms of mixed performances between Western genres and traditional music, from the 1920's, the forms of mixed performances between Western genres and traditional music became more vitalized than before. The causes of that may include the increase of groups hosting or sponsoring such performances from the 1920's and also the dramatic increase of such performances in general. 
Moreover, the increased performances were conducted in the forms of mixed performances mainly in order to satisfy the people's needs becoming diversified with the distribution of Western civilization. Concerning the formal characteristics of mixed performances between Western genres and traditional music, this researcher classified western genres performed with traditional music and examined what characteristics were found in such mixed performances of tradition music by the types of Western genres respectively. First, in the mixed performances type of western-type genre and traditional music, the number of programs for the western music had significant portion in general, and there were certain ensemble of the western music and traditional musical instrument that was rare at this period of time, and it also had the characteristics of classifying two genres to perform for each title or date. Second, in the mixed performances type of the drama and traditional music, the traditional music is directly participated in the drama with the similar type to the theater, or performed independently from the drama with the role of interlude performance for the stage conversion of the drama to have the characteristics of performing in audience publicity or entertainment. Third, in the mixed performances type of the lecture and traditional music, the traditional music is played before or after the lecture to play the role to set the atmosphere and entertainment for the lecture as displaying the feature to perform for the audience attraction. And, fourth, in the mixed performances type of the movie and traditional music, the traditional music sometimes directly participated in the movie or had the features of independent performance, and there was a characteristic to perform for the entertainment after showing a movie.

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su;Hong, Seung Woo
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
  • Most industries have recently become aware of the importance of customer lifetime value as they are exposed to a competitive environment. As a result, preventing customers from churn is becoming a more important business issue than securing new customers. This is because maintaining churn customers is far more economical than securing new customers, and in fact, the acquisition cost of new customers is known to be five to six times higher than the maintenance cost of churn customers. Also, Companies that effectively prevent customer churn and improve customer retention rates are known to have a positive effect on not only increasing the company's profitability but also improving its brand image by improving customer satisfaction. Predicting customer churn, which had been conducted as a sub-research area for CRM, has recently become more important as a big data-based performance marketing theme due to the development of business machine learning technology. Until now, research on customer churn prediction has been carried out actively in such sectors as the mobile telecommunication industry, the financial industry, the distribution industry, and the game industry, which are highly competitive and urgent to manage churn. In addition, These churn prediction studies were focused on improving the performance of the churn prediction model itself, such as simply comparing the performance of various models, exploring features that are effective in forecasting departures, or developing new ensemble techniques, and were limited in terms of practical utilization because most studies considered the entire customer group as a group and developed a predictive model. As such, the main purpose of the existing related research was to improve the performance of the predictive model itself, and there was a relatively lack of research to improve the overall customer churn prediction process. 
In fact, customers in the business have different behavior characteristics due to heterogeneous transaction patterns, and the resulting churn rate is different, so it is unreasonable to assume the entire customer as a single customer group. Therefore, it is desirable to segment customers according to customer classification criteria, such as loyalty, and to operate an appropriate churn prediction model individually, in order to carry out effective customer churn predictions in heterogeneous industries. Of course, in some studies, there are studies in which customers are subdivided using clustering techniques and applied a churn prediction model for individual customer groups. Although this process of predicting churn can produce better predictions than a single predict model for the entire customer population, there is still room for improvement in that clustering is a mechanical, exploratory grouping technique that calculates distances based on inputs and does not reflect the strategic intent of an entity such as loyalties. This study proposes a segment-based customer departure prediction process (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation) based on two-dimensional customer loyalty, assuming that successful customer churn management can be better done through improvements in the overall process than through the performance of the model itself. CCP/2DL is a series of churn prediction processes that segment two-way, quantitative and qualitative loyalty-based customer, conduct secondary grouping of customer segments according to churn patterns, and then independently apply heterogeneous churn prediction models for each churn pattern group. Performance comparisons were performed with the most commonly applied the General churn prediction process and the Clustering-based churn prediction process to assess the relative excellence of the proposed churn prediction process. 
The General churn prediction process used in this study refers to the most commonly used approach, in which a single machine learning model predicts churn for the entire customer group to be predicted. The Clustering-based churn prediction process first uses clustering techniques to segment customers and then builds a churn prediction model for each group. In an evaluation conducted in cooperation with a global NGO, the proposed CCP/2DL outperformed the other churn prediction methodologies. This churn prediction process is not only effective in predicting churn but can also serve as a strategic basis for obtaining a variety of customer insights and carrying out other related performance marketing activities.

Performance Evaluation of Radiochromic Films and Dosimetry Check™ for Patient-specific QA in Helical Tomotherapy (나선형 토모테라피 방사선치료의 환자별 품질관리를 위한 라디오크로믹 필름 및 Dosimetry Check™의 성능평가)

  • Park, Su Yeon;Chae, Moon Ki;Lim, Jun Teak;Kwon, Dong Yeol;Kim, Hak Joon;Chung, Eun Ah;Kim, Jong Sik
    • The Journal of Korean Society for Radiation Therapy / v.32 / pp.93-109 / 2020
  • Purpose: The radiochromic film (Gafchromic EBT3, Ashland Advanced Materials, USA) and the 3-dimensional analysis system Dosimetry Check™ (DC, MathResolutions, USA) were evaluated for patient-specific quality assurance (QA) of helical tomotherapy. Materials and Methods: Depending on tumor position, three types of targets, an abdominal tumor (130.6㎤), a retroperitoneal tumor (849.0㎤), and a whole abdominal metastasis tumor (3131.0㎤), were applied to the humanoid phantom (Anderson Rando Phantom, USA). We established a total of 12 comparative treatment plans using four geometric conditions of beam irradiation, combining field widths (FW) of 2.5 cm and 5.0 cm with pitches of 0.287 and 0.43. Ionization measurements (1D) together with EBT3 film inserted in the cheese phantom (2D) were compared with DC measurements, which reconstruct the 3D dose on CT images from beam fluence log information. For the clinical feasibility evaluation of the DC, dose reconstruction was performed using the same cheese phantom as in the EBT3 method. The recalculated dose distributions revealed the dose errors during the actual irradiation, compared quantitatively with the treatment plan on the same CT images. The thread effect, which can appear in helical tomotherapy, was analyzed by ripple amplitude (%). We also performed gamma index analysis (DD: 3%/DTA: 3 mm, pass threshold limit: 95%) to check the pattern of the dose distribution. Results: Ripple amplitude was highest, averaging 23.1%, in the peritoneum tumor. In the radiochromic film analysis, the absolute dose deviated by 0.9±0.4% on average and the gamma index passing rate averaged 96.4±2.2% (passing rate: >95%), although the film analysis could be limited for large target sizes such as the whole abdominal metastasis tumor. In the DC analysis with the humanoid phantom for an FW of 5.0 cm, the average over the three regions was 91.8±6.4% in the 2D and 3D plans. 
The three planes (axial, coronal, and sagittal) and the dose profile could be analyzed for the entire peritoneum tumor and the whole abdominal metastasis target against the planned dose distributions. The dose errors based on the dose-volume histogram in the DC evaluations increased depending on FW and pitch. Conclusion: The DC method can perform dose error analysis on 3D patient image data using only the measured beam fluence log information, without any dosimetry tools, for patient-specific quality assurance. In addition, it may not be limited by tumor location or size; the DC could therefore be useful for patient-specific QA during helical tomotherapy of large and irregular tumors.
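
The gamma index analysis mentioned in this abstract can be illustrated on a one-dimensional dose profile. The Python sketch below is a simplified illustration, not the software used in the study: it assumes a dose-difference criterion of 3% of the maximum reference dose (global normalization) and a distance-to-agreement of 3 mm, searches only the measured sample points without interpolation, and uses hypothetical function names and profile values.

```python
import math


def gamma_pass_rate(positions_mm, ref_dose, eval_dose, dd=0.03, dta_mm=3.0):
    """Return (pass rate in %, gamma values) for two 1D dose profiles.

    positions_mm: sample positions shared by both profiles.
    ref_dose: planned dose at each position.
    eval_dose: measured or reconstructed dose at each position.
    dd: dose-difference criterion as a fraction of the maximum reference dose.
    dta_mm: distance-to-agreement criterion in millimetres.
    """
    dose_tol = dd * max(ref_dose)  # global normalization (assumption)
    gammas = []
    for r_ref, d_ref in zip(positions_mm, ref_dose):
        # Gamma is the smallest combined distance/dose deviation to any
        # evaluated point; a point passes when gamma <= 1.
        gamma = min(
            math.sqrt(((r_ev - r_ref) / dta_mm) ** 2
                      + ((d_ev - d_ref) / dose_tol) ** 2)
            for r_ev, d_ev in zip(positions_mm, eval_dose)
        )
        gammas.append(gamma)
    passed = sum(1 for g in gammas if g <= 1.0)
    return 100.0 * passed / len(gammas), gammas


# Hypothetical profiles sampled every 1 mm.
positions = [0.0, 1.0, 2.0, 3.0]
planned = [1.0, 2.0, 2.0, 1.0]
rate_same, _ = gamma_pass_rate(positions, planned, planned)
rate_off, _ = gamma_pass_rate(positions, planned, [1.5, 2.5, 2.5, 1.5])
print(rate_same, rate_off)
```

A plan would then be accepted when the pass rate meets the chosen threshold, 95% in the abstract above.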

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.83-102 / 2021
  • The government recently announced various policies for developing big-data and artificial intelligence fields to provide a great opportunity to the public with respect to disclosure of high-quality data within public institutions. KSURE(Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea, and thus the company is strongly committed to backing export companies with various systems. Nevertheless, there are still fewer cases of realized business model based on big-data analyses. In this situation, this paper aims to develop a new business model which can be applied to an ex-ante prediction for the likelihood of the insurance accident of credit guarantee. We utilize internal data from KSURE which supports export companies in Korea and apply machine learning models. Then, we conduct performance comparison among the predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN(Deep Neural Network). For decades, many researchers have tried to find better models which can help to predict bankruptcy since the ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The development of the prediction for financial distress or bankruptcy was originated from Smith(1930), Fitzpatrick(1932), or Merwin(1942). One of the most famous models is the Altman's Z-score model(Altman, 1968) which was based on the multiple discriminant analysis. This model is widely used in both research and practice by this time. The author suggests the score model that utilizes five key financial ratios to predict the probability of bankruptcy in the next two years. Ohlson(1980) introduces logit model to complement some limitations of previous models. Furthermore, Elmer and Borowski(1988) develop and examine a rule-based, automated system which conducts the financial analysis of savings and loans. 
Since the 1980s, researchers in Korea have started to examine analyses on the prediction of financial distress or bankruptcy. Kim(1987) analyzes financial ratios and develops the prediction model. Also, Han et al.(1995, 1996, 1997, 2003, 2005, 2006) construct the prediction model using various techniques including artificial neural network. Yang(1996) introduces multiple discriminant analysis and logit model. Besides, Kim and Kim(2001) utilize artificial neural network techniques for ex-ante prediction of insolvent enterprises. After that, many scholars have been trying to predict financial distress or bankruptcy more precisely based on diverse models such as Random Forest or SVM. One major distinction of our research from the previous research is that we focus on examining the predicted probability of default for each sample case, not only on investigating the classification accuracy of each model for the entire sample. Most predictive models in this paper show that the level of the accuracy of classification is about 70% based on the entire sample. To be specific, LightGBM model shows the highest accuracy of 71.1% and Logit model indicates the lowest accuracy of 69%. However, we confirm that there are open to multiple interpretations. In the context of the business, we have to put more emphasis on efforts to minimize type 2 error which causes more harmful operating losses for the guaranty company. Thus, we also compare the classification accuracy by splitting predicted probability of the default into ten equal intervals. When we examine the classification accuracy for each interval, Logit model has the highest accuracy of 100% for 0~10% of the predicted probability of the default, however, Logit model has a relatively lower accuracy of 61.5% for 90~100% of the predicted probability of the default. 
On the other hand, Random Forest, XGBoost, LightGBM, and DNN indicate more desirable results, since they show higher accuracy for both the 0~10% and 90~100% intervals of the predicted probability of default but lower accuracy around the 50% level. Regarding the distribution of samples across the predicted probability of default, both LightGBM and XGBoost place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy on a small number of cases, LightGBM or XGBoost could be a more desirable model since they classify a large number of cases into the two extreme intervals of the predicted probability of default, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance. Random Forest and LightGBM follow with good results, while logistic regression shows the worst performance. However, each predictive model has a comparative advantage under some evaluation standard. For instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Collectively, we can construct more comprehensive ensemble models that contain multiple machine learning classifiers and conduct majority voting to maximize overall performance.
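
The two evaluation ideas in this abstract, accuracy within ten equal intervals of the predicted default probability and majority voting across classifiers, can be sketched roughly as follows. This is only an illustration, not the authors' code: the function names, the 0.5 decision threshold, and the rule that ties resolve to "default" (chosen here because the abstract stresses type 2 error) are all assumptions.

```python
def accuracy_by_interval(probs, labels, n_bins=10, threshold=0.5):
    """Return {bin_index: (n_samples, accuracy)} for equal-width probability bins.

    probs  : predicted default probabilities in [0, 1]
    labels : true outcomes, 1 = default, 0 = no default
    The 0.5 threshold is an assumption, not taken from the abstract.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[b] += 1
        hits[b] += int((p >= threshold) == bool(y))
    # Empty bins are omitted so that no division by zero occurs.
    return {b: (counts[b], hits[b] / counts[b]) for b in range(n_bins) if counts[b]}


def majority_vote(predictions):
    """Combine per-model 0/1 predictions; ties resolve to 1 (assumed rule)."""
    voted = []
    for votes in zip(*predictions):
        ones = sum(votes)
        voted.append(1 if 2 * ones >= len(votes) else 0)
    return voted
```

Reporting the sample count next to each interval's accuracy matters here, because the abstract weighs a model's accuracy in the extreme intervals against how many cases it actually places there.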

Feed Value of the Different Plant Parts of Main Forage Rice Varieties (사료용 벼 주요 품종의 수확부위 별 사료가치)

  • Ahn, Eok-Keun;Won, Yong-Jae;Kang, Kyung-Ho;Park, Hyang-Mi;Jung, Kuk-Hyun;Hyun, Ung-Jo;Lee, Yoon-Sung
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.67 no.1
    • /
    • pp.1-8
    • /
    • 2022
  • In order to manufacture feed suitable for consumer use and provide feed value information, we analyzed the feed components of the four main forage rice varieties by plant parts harvested 30 days after heading. The contents of the six feed ingredients were significantly different (p<0.05) among harvested parts. In the panicle, the crude protein (CP) (6.97%) and lignin (3.11%) were the highest, while the crude ash (CA) and neutral detergent fiber (NDF) contents were significantly lower, resulting in a total digestible nutrient (TDN) content of 77.29%, which is higher than that of the stem (64.82%) and leaf blade and sheath (LBS) (63.57%) (p<0.05). In contrast, the content of crude fat (CF) did not differ significantly among parts (p<0.05). In panicles from 'Jonong', 'Nokyang' and 'Yeongwoo', the TDN content of each cultivar was 78.48-79.07%, with no significant difference among the varieties. In 'Mogwoo' (Mw), the CP content was 8.70%, which was much higher than that of other varieties (p<0.05). In particular, the Mw TDN content was slightly lower in the panicle (72.95%) but higher in the stem (75.37%) and LBS (66.49%) than in the other varieties. The CA, NDF, acid detergent fiber (ADF), and lignin contents were also very low compared to other varieties; therefore, the feed value of the stem and LBS was excellent. In addition, the total dry matter weight (DMW) was 123 g per hill, which was much higher than the 82-105 g per hill of other varieties. The distribution of DMW by part was LBS (56.9 g), stem (36.8 g), and panicle (29.3 g), and because the proportion of parts other than the panicle (grain straw ratio: 76%) was much higher than the 43-57% of other varieties, rice straw is advantageous in terms of quantity and feed value when used as forage on farms. The relative feed value (RFV) of the four cultivars ranged from 86.79 to 403.74 across all parts, and hay of grade 3 or higher with an RFV of 100 or more increased with delayed heading in both stems and LBS. 
This is due to the accumulation of starch into grains during ripening, which supports the observation that the RFV of the early flowering 'Jonong' and 'Nokyang' panicles increased.
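
The abstract does not say how RFV was computed. A common convention derives it from ADF and NDF (both as % of dry matter), and the sketch below assumes that convention; the function name and the example input values are hypothetical.

```python
def relative_feed_value(adf_pct, ndf_pct):
    """Relative feed value from ADF and NDF, both given as % of dry matter.

    Assumed standard convention (not stated in the abstract):
      DDM = 88.9 - 0.779 * ADF   digestible dry matter, %
      DMI = 120 / NDF            dry matter intake, % of body weight
      RFV = DDM * DMI / 1.29
    """
    ddm = 88.9 - 0.779 * adf_pct
    dmi = 120.0 / ndf_pct
    return ddm * dmi / 1.29


# Hypothetical fiber values chosen only to illustrate the RFV >= 100 cut-off
# mentioned in the abstract; they are not data from the study.
rfv = relative_feed_value(adf_pct=30.0, ndf_pct=40.0)
meets_cutoff = rfv >= 100
```

Under this formula, lower ADF and NDF both raise RFV, which is consistent with the abstract linking low fiber contents to higher feed value.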

Varietal and Locational Variation of Grain Quality Components of Rice Produced in Middle and Southern Plain Areas in Korea (중ㆍ남부 평야지산 발 형태 및 이화학적 특성의 품종 및 산지간 변이)

  • Choi, Hae-Chune;Chi, Jeong-Hyun;Lee, Chong-Seob;Kim, Young-Bae;Cho, Soo-Yeon
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.39 no.1
    • /
    • pp.15-26
    • /
    • 1994
  • To understand the relative contribution of varietal and environmental variation to various grain quality components in rice, grain appearance, milling recovery, several physicochemical properties of rice grain, and the texture or palatability of cooked rice were evaluated for milled rice materials of seven cultivars (five japonica and two Tongil-type) produced at six locations in the middle and southern plain areas of Korea in 1989, and the data obtained were analyzed. Highly significant varietal variations were detected in all grain quality components of the rice materials, and marked locational variations, accounting for about 14-54% of total variation, were recognized in grain appearance, milling recovery, alkali digestibility, protein content, K/Mg ratio, gelatinization temperature, and breakdown and setback viscosities. Variations due to variety x location interaction were especially large in the overall palatability score of cooked rice and in the consistency or setback viscosities of the amylograph. Tongil-type cultivars showed poor marketing quality, lower milling recovery, slightly lower alkali digestibility and amylose content, a little higher protein content and K/Mg ratio, relatively higher peak, breakdown and consistency viscosities, significantly lower setback viscosity, and more undesirable palatability of cooked rice compared with japonica rices. The japonica rice varieties possessing good palatability of cooked rice were slightly low in protein content and a little high in K/Mg ratio and stickiness/hardness ratio of cooked rice. Rice 1000-kernel weight was significantly heavier in rice materials produced in Iri lowland compared with other locations. Milling recovery from rough to brown rice and ripening quality were lowest in Milyang late-planted rice and highest in Iri lowland and Gyehwa reclaimed-land rice. Amylose content of milled rice was about 1% lower in Gyehwa rice compared with other locations. 
Protein content of polished rice was about 1% lower in rice materials of the middle plain area than in those of the southern plain regions. K/Mg ratio of milled rice was lowest in Iri rice and highest in Milyang rice. Alkali digestibility was highest in Milyang rice and lowest in Honam plain rice, but the temperature of gelatinization initiation of rice flour in the amylograph was lowest in Suwon and Iri rices and highest in Milyang rice. Breakdown viscosity was lowest in Milyang rice, second lowest in Ichon lowland rice, and highest in Gyehwa and Iri rices, while setback viscosity showed the opposite tendency. The stickiness/hardness ratio of cooked rice was slightly lower in southern-plain rices than in middle-plain ones, and the palatability of cooked rice was best in Namyang reclaimed-land rice, followed in order by Suwon $\geq$ Iri $\geq$ Ichon $\geq$ Gyehwa $\geq$ Milyang rices. The rice materials can be classified genotypically into two ecotypes, the japonica and Tongil-type rice groups, and environmentally into three regions, Milyang, middle and Honam lowland, by their distribution on the plane of the 1st and 2nd principal components extracted by principal component analysis from eleven grain quality properties closely associated with the palatability of cooked rice.
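
The share of total variation attributed to variety, location, and their interaction (such as the 14-54% locational portion reported above) can be illustrated with a sum-of-squares partition. The sketch below assumes a balanced variety-by-location table with one value per cell; it is not the authors' analysis, and the function name and sample numbers are hypothetical.

```python
def variance_shares(table):
    """Partition total variation of a variety x location table.

    table[i][j] is the trait value of variety i at location j, one value
    per cell. With a single value per cell, the interaction term cannot be
    separated from residual error (an assumption of this sketch).
    """
    n_var, n_loc = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_var * n_loc)
    var_means = [sum(row) / n_loc for row in table]
    loc_means = [sum(table[i][j] for i in range(n_var)) / n_var
                 for j in range(n_loc)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_variety = n_loc * sum((m - grand) ** 2 for m in var_means)
    ss_location = n_var * sum((m - grand) ** 2 for m in loc_means)
    ss_interaction = ss_total - ss_variety - ss_location

    return {
        "variety": ss_variety / ss_total,
        "location": ss_location / ss_total,
        "variety_x_location": ss_interaction / ss_total,
    }


# Hypothetical values for two varieties at two locations, not study data.
shares = variance_shares([[1.0, 2.0], [3.0, 4.0]])
```

In this made-up table the variety and location effects are purely additive, so the interaction share is zero and the other two shares sum to one.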
