• Title/Summary/Keyword: Accuracy of Prediction


Studies on the Changes of Sex Hormone Concentrations in Milk during the Reproductive Stages of Dairy Cows (유우의 번식과정에 따른 유즙중의 성호르몬 수준 변화에 관한 연구)

  • 김상근;이재근
    • Korean Journal of Animal Reproduction
    • /
    • v.9 no.1
    • /
    • pp.9-30
    • /
    • 1985
  • The study was carried out to find out the changes of the sex hormone levels in the milk of Holstein cows during the reproductive stages such as the estrous cycle, pregnancy and periparturient period. The FSH, LH, estradiol-17$\beta$ and progesterone from the milk samples were assayed by radioimmunoassay methods. The results of this study were summarized as follows: 1. The levels of progesterone and estradiol-17$\beta$ were similar among quarters, and they were higher after milking than before milking, although the difference was not statistically significant. 2. The milk progesterone levels during the estrous cycle reached a peak mean level of 3.55$\pm$0.26ng/$m\ell$ at 15 days after estrus and did not differ with the length of the estrous cycle. The estradiol-17$\beta$ levels during the estrous cycle showed a peak level of 36.40$\pm$2.38pg/$m\ell$ at estrus, and decreased (17.20$\pm$0.46pg/$m\ell$ to 18.65$\pm$1.26pg/$m\ell$) in the luteal phase. 3. The FSH levels during the estrous cycle ranged from 2.25$\pm$0.23mIU/$m\ell$ to 4.35$\pm$0.24mIU/$m\ell$, showing significant changes. The LH levels during the estrous cycle gradually increased and remained at a peak level of 10.90$\pm$0.36mIU/$m\ell$ from 20 to 25 days after estrus. 4. The progesterone levels during pregnancy decreased from 30 to 60 days after artificial insemination, and thereafter continuously increased until 240 days. The estradiol-17$\beta$ levels during pregnancy were 24.56$\pm$1.19pg/$m\ell$ at day 30 after artificial insemination, and increased rapidly until 180 days. The levels again decreased by 26.17$\pm$3.03pg/$m\ell$ until 210 days and markedly increased by 68.00$\pm$8.70pg/$m\ell$ until 240 days. 5. The prolactin levels during pregnancy were 31.27$\pm$2.31ng/$m\ell$ and 42.60$\pm$2.37ng/$m\ell$ at days 150 and 240 after artificial insemination, respectively. 
The LH levels during pregnancy reached a peak of 27.47$\pm$7.90mIU/$m\ell$ at day 30 after artificial insemination, and thereafter gradually decreased. 6. The progesterone levels during the periparturient period reached a peak of 4.61$\pm$0.34ng/$m\ell$ at day 3 prepartum, thereafter gradually decreased, and were 2.05$\pm$0.60ng/$m\ell$ at day 7 postpartum. The estradiol-17$\beta$ levels during the periparturient period were high, from 207.23$\pm$6.04pg/$m\ell$ at day 1 prepartum to 239.90$\pm$13.90pg/$m\ell$ at day 2 prepartum, and thereafter began to decline, reaching 51.87$\pm$1.72pg/$m\ell$ at day 7 postpartum. 7. The prolactin levels during the periparturient period were relatively higher at the time of parturition. The LH levels during the periparturient period ranged from 6.32$\pm$0.32mIU/$m\ell$ to 13.90$\pm$1.37mIU/$m\ell$, showing significant changes. 8. The progesterone levels (4.6$\pm$0.8ng/$m\ell$) of the pregnant cows were significantly higher than those (1.84$\pm$1.4ng/$m\ell$) of nonpregnant cows. Cows inseminated from 61 to 90 days after parturition showed higher progesterone levels. 9. From 20 to 25 days after artificial insemination, the accuracy of pregnancy diagnosis from milk progesterone levels was 94.4% for nonpregnant cows (<2.3ng/$m\ell$) and 75.0% for pregnant cows ( 3.2ng/$m\ell$). The average overall accuracy of pregnancy prediction for nonpregnant and pregnant cows was 83.3%. 10. The results obtained in this study suggest that understanding the endocrinological mechanisms by means of milk hormone analysis during the estrous cycle, pregnancy and parturition would give the basic information needed for increasing the efficiency of reproduction. This study would not only provide an accurate method of early pregnancy diagnosis by milk progesterone levels but also contribute to research on methods of detecting FSH levels in milk, which was difficult in blood serum.
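
The threshold-based diagnosis in finding 9 can be illustrated with a minimal Python sketch. The 2.3 and 3.2 ng/mL cut-offs come from the abstract; the comparison sign for the pregnant threshold is missing there, so ">" is an assumption, as are the "indeterminate" band, the function names, and the sample values. Reading the per-class accuracy as the share of diagnosed cows whose diagnosis was correct is one possible interpretation, not necessarily the authors' definition.

```python
NONPREGNANT_BELOW = 2.3  # ng/mL, threshold stated in the abstract
PREGNANT_ABOVE = 3.2     # ng/mL, stated in the abstract; the ">" direction is assumed


def classify(progesterone_ng_ml):
    """Diagnose pregnancy status from a milk progesterone level (hypothetical helper)."""
    if progesterone_ng_ml < NONPREGNANT_BELOW:
        return "nonpregnant"
    if progesterone_ng_ml > PREGNANT_ABOVE:
        return "pregnant"
    return "indeterminate"  # assumed handling of values between the two cut-offs


def predictive_accuracy(samples, label):
    """Share of cows diagnosed as `label` whose true status is also `label`."""
    diagnosed = [truth for level, truth in samples if classify(level) == label]
    if not diagnosed:
        return None
    return sum(truth == label for truth in diagnosed) / len(diagnosed)


# Made-up (level, true status) pairs, for illustration only.
samples = [
    (1.2, "nonpregnant"),
    (1.9, "nonpregnant"),
    (2.0, "pregnant"),
    (3.6, "pregnant"),
    (4.1, "pregnant"),
    (2.7, "pregnant"),
]
```

With these made-up samples, one of the three "nonpregnant" diagnoses is wrong and both "pregnant" diagnoses are right; the cow at 2.7 ng/mL falls between the cut-offs and is left undiagnosed.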


Evaluation of the quality of Italian Ryegrass Silages by Near Infrared Spectroscopy (근적외선 분광법을 이용한 이탈리안 라이그라스 사일리지의 품질 평가)

  • Park, Hyung-Soo;Lee, Sang-Hoon;Choi, Ki-Choon;Lim, Young-Chul;Kim, Jong-Gun;Jo, Kyu-Chea;Choi, Gi-Jun
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.32 no.3
    • /
    • pp.301-308
    • /
    • 2012
  • Near infrared reflectance spectroscopy (NIRS) has become increasingly used as a rapid and accurate method of evaluating some chemical compositions in forages. This study was carried out to explore the accuracy of NIRS for the prediction of chemical parameters of Italian ryegrass silages. A population of 267 Italian ryegrass silages representing a wide range in chemical parameters and fermentative characteristics was used in this investigation. Samples of silage were scanned in intact fresh condition at 2 nm intervals over the wavelength range 680~2,500 nm, and the optical data were recorded as log 1/Reflectance (log 1/R). The spectral data were regressed against a range of chemical parameters using partial least squares (PLS) multivariate analysis in conjunction with spectral math treatments to reduce the effect of extraneous noise. The optimum calibrations were selected on the basis of the highest coefficients of determination in cross validation ($R^2$) and the lowest standard error of cross validation (SECV). The results of this study showed that NIRS predicted the chemical parameters with a very high degree of accuracy. The $R^2$ and SECV were 0.98 (SECV 1.27%) for moisture, 0.88 (SECV 1.26%) for ADF, 0.84 (SECV 2.0%), 0.93 (SECV 0.96%) for CP and 0.78 (SECV 0.56), 0.81 (SECV 0.31%), 0.88 (SECV 1.26%) and 0.82 (SECV 4.46) for pH, lactic acid, TDN and RFV on a dry matter (%) basis, respectively. The results of this experiment showed that NIRS can be used to predict the chemical composition and fermentation quality of Italian ryegrass silages as a routine analysis method in feeding value evaluation and for farmer advice.
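
The two selection statistics named in the abstract can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: SECV is computed here as the root mean squared cross-validation residual and $R^2$ as one minus the ratio of residual to total sum of squares. NIRS software packages differ in details such as bias correction and degrees of freedom, and the function names and numbers are hypothetical.

```python
import math


def secv(reference, cv_predicted):
    """Standard error of cross validation as the root mean squared residual (assumed definition)."""
    n = len(reference)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(reference, cv_predicted)) / n)


def r_squared(reference, cv_predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(reference) / len(reference)
    ss_res = sum((y - p) ** 2 for y, p in zip(reference, cv_predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in reference)
    return 1.0 - ss_res / ss_tot


# Made-up moisture values (%): laboratory reference vs. cross-validated NIRS prediction.
reference = [70.0, 72.0, 74.0, 76.0]
cv_predicted = [71.0, 71.0, 75.0, 75.0]

example_secv = secv(reference, cv_predicted)
example_r2 = r_squared(reference, cv_predicted)
```

A calibration with a higher `example_r2` and a lower `example_secv` than its competitors would be the one retained under the selection rule described in the abstract.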

Safety and Efficacy of Ultrasound-Guided Percutaneous Core Needle Biopsy of Pancreatic and Peripancreatic Lesions Adjacent to Critical Vessels (주요 혈관 근처의 췌장 또는 췌장 주위 병변에 대한 초음파 유도하 경피적 중심 바늘 생검의 안전성과 효율성)

  • Sun Hwa Chung;Hyun Ji Kang;Hyo Jeong Lee;Jin Sil Kim;Jeong Kyong Lee
    • Journal of the Korean Society of Radiology
    • /
    • v.82 no.5
    • /
    • pp.1207-1217
    • /
    • 2021
  • Purpose To evaluate the safety and efficacy of ultrasound-guided percutaneous core needle biopsy (USPCB) of pancreatic and peripancreatic lesions adjacent to critical vessels. Materials and Methods Data were collected retrospectively from 162 patients who underwent USPCB of the pancreas (n = 98), the peripancreatic area adjacent to the portal vein, the paraaortic area adjacent to pancreatic uncinate (n = 34), and lesions on the third duodenal portion (n = 30) during a 10-year period. An automated biopsy gun with an 18-gauge needle was used for biopsies under US guidance. The USPCB results were compared with those of the final follow-up imaging performed postoperatively. The diagnostic accuracy and major complication rate of the USPCB were calculated. Multiple factors were evaluated for the prediction of successful biopsies using univariate and multivariate analyses. Results The histopathologic diagnosis from USPCB was correct in 149 (92%) patients. The major complication rate was 3%. Four cases of mesenteric hematomas and one intramural hematoma of the duodenum occurred during the study period. The following factors were significantly associated with successful biopsies: a transmesenteric biopsy route rather than a transgastric or transenteric route; good visualization of targets; and evaluation of the entire US pathway. In addition, the number of biopsies required was less when the biopsy was successful. Conclusion USPCB demonstrated high diagnostic accuracy and a low complication rate for the histopathologic diagnosis of pancreatic and peripancreatic lesions adjacent to critical vessels.

Evaluation of Moisture and Feed Values for Winter Annual Forage Crops Using Near Infrared Reflectance Spectroscopy (근적외선분광법을 이용한 동계사료작물 풀 사료의 수분함량 및 사료가치 평가)

  • Kim, Ji Hea;Lee, Ki Won;Oh, Mirae;Choi, Ki Choon;Yang, Seung Hak;Kim, Won Ho;Park, Hyung Soo
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.39 no.2
    • /
    • pp.114-120
    • /
    • 2019
  • This study was carried out to explore the accuracy of near infrared spectroscopy (NIRS) for the prediction of moisture content and chemical parameters of winter annual forage crops. A population of 2454 winter annual forages representing a wide range in chemical parameters was used in this study. Samples of forage were scanned in intact fresh condition at 1 nm intervals over the wavelength range 680-2500 nm, and the optical data were recorded as log 1/Reflectance (log 1/R). The spectral data were regressed against a range of chemical parameters using partial least squares (PLS) multivariate analysis in conjunction with spectral math treatments to reduce the effect of extraneous noise. The optimum calibrations were selected based on the highest coefficients of determination in cross validation ($R^2$) and the lowest standard error of cross-validation (SECV). The results of this study showed that the NIRS calibration models predicted the moisture content and chemical parameters with a very high degree of accuracy, except for barley. The $R^2$ and SECV for the integrated winter annual forage calibration were 0.99 (SECV 1.59%) for moisture, 0.89 (SECV 1.15%) for acid detergent fiber, 0.86 (SECV 1.43%) for neutral detergent fiber, 0.93 (SECV 0.61%) for crude protein, 0.90 (SECV 0.45%) for crude ash, and 0.82 (SECV 3.76%) for relative feed value on a dry matter (%) basis, respectively. The results of this experiment showed that NIRS can be used to predict the moisture and chemical composition of winter annual forages as a routine analysis method for evaluating feed value.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis is being actively conducted, and it is showing remarkable results in various fields such as classification, summary, and generation. Among various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary class classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. In particular, multi-label classification requires a different training method from binary class classification and multi-class classification because of the characteristic of having multiple labels. In addition, since the number of labels to be predicted increases as the number of labels and classes increases, there is a limitation in that performance improvement is difficult due to an increase in prediction difficulty. To overcome these limitations, (i) compressing the initially given high-dimensional label space into a low-dimensional latent label space, (ii) after performing training to predict the compressed label, (iii) restoring the predicted label to the high-dimensional original label space, research on label embedding is being actively conducted. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only the linear relationship between labels or compress the labels by random transformation, it is difficult to understand the non-linear relationship between labels, so there is a limitation in that it is not possible to create a latent label space sufficiently containing the information of the original label. 
Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding. Label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration, is representative. However, the traditional autoencoder-based label embedding has a limitation in that a large amount of information loss occurs when compressing a high-dimensional label space having a myriad of classes into a low-dimensional latent label space. This can be found in the gradient loss problem that occurs in the backpropagation process of learning. To solve this problem, skip connection was devised, and by adding the input of the layer to the output to prevent gradient loss during backpropagation, efficient learning is possible even when the layer is deep. Skip connection is mainly used for image feature extraction in convolutional neural networks, but studies using skip connection in autoencoder or label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to each of the encoder and decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. In addition, the proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using this, we conducted an experiment to predict the compressed keyword vector existing in the latent label space from the paper abstract and to evaluate the multi-label classification by restoring the predicted keyword vector back to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance in multi-label classification based on the proposed methodology compared to traditional multi-label classification methods. 
This shows that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance across domain characteristics and the number of dimensions of the latent label space.

Association between Texture Analysis Parameters and Molecular Biologic KRAS Mutation in Non-Mucinous Rectal Cancer (원발성 비점액성 직장암 환자에서 자기공명영상 기반 텍스처 분석 변수와 KRAS 유전자 변이와의 연관성)

  • Sung Jae Jo;Seung Ho Kim;Sang Joon Park;Yedaun Lee;Jung Hee Son
    • Journal of the Korean Society of Radiology
    • /
    • v.82 no.2
    • /
    • pp.406-416
    • /
    • 2021
  • Purpose To evaluate the association between magnetic resonance imaging (MRI)-based texture parameters and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutation in patients with non-mucinous rectal cancer. Materials and Methods Seventy-nine patients who had pathologically confirmed rectal non-mucinous adenocarcinoma with or without KRAS-mutation and had undergone rectal MRI were divided into a training (n = 46) and validation dataset (n = 33). A texture analysis was performed on the axial T2-weighted images. The association was statistically analyzed using the Mann-Whitney U test. To extract an optimal cut-off value for the prediction of KRAS mutation, a receiver operating characteristic curve analysis was performed. The cut-off value was verified using the validation dataset. Results In the training dataset, skewness in the mutant group (n = 22) was significantly higher than in the wild-type group (n = 24) (0.221 ± 0.283; -0.006 ± 0.178, respectively, p = 0.003). The area under the curve of the skewness was 0.757 (95% confidence interval, 0.606 to 0.872) with a maximum accuracy of 71%, a sensitivity of 64%, and a specificity of 78%. None of the other texture parameters were associated with KRAS mutation (p > 0.05). When a cut-off value of 0.078 was applied to the validation dataset, this had an accuracy of 76%, a sensitivity of 86%, and a specificity of 68%. Conclusion Skewness was associated with KRAS mutation in patients with non-mucinous rectal cancer.
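
Applying a fixed cut-off to a validation set, as described in the abstract, amounts to counting a confusion matrix. The Python sketch below assumes that skewness above the cut-off is read as "KRAS mutant" (consistent with the higher skewness reported for the mutant group, but the exact rule is not stated). Only the 0.078 cut-off is from the abstract; the skewness values and labels are made up, and the function name is hypothetical.

```python
def evaluate_cutoff(values, is_mutant, cutoff):
    """Accuracy, sensitivity and specificity when values above `cutoff` are called mutant."""
    tp = fp = tn = fn = 0
    for value, mutant in zip(values, is_mutant):
        predicted_mutant = value > cutoff  # assumed direction of the decision rule
        if predicted_mutant and mutant:
            tp += 1
        elif predicted_mutant and not mutant:
            fp += 1
        elif not predicted_mutant and mutant:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # mutants correctly identified
        "specificity": tn / (tn + fp),  # wild types correctly identified
    }


# Made-up skewness values and KRAS status, for illustration only.
skewness = [0.30, 0.15, 0.02, -0.10, 0.20, -0.05]
kras_mutant = [True, True, True, False, False, False]

metrics = evaluate_cutoff(skewness, kras_mutant, cutoff=0.078)
```

Because the cut-off is fixed on the training set and only evaluated on the validation set, the three metrics can move in different directions between the two datasets, as they do in the abstract.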

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.143-159
    • /
    • 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and allocate budgets to prepare for rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the next year, and these predictions affect the basic assumptions of IT and industry leaders and organizations about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained in popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; its key function, the 'tweet', lets users report their current thoughts and actions, comment on news and engage in discussions. For an analysis of IT trends, we chose tweet data because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading on technology. Previous studies found that tweet data provide useful information and detect societal trends effectively; these studies also found that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the predicted IT trends for the following year announced by public organizations are mentioned on social network services like Twitter. 
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study compares the Twitter data generated in Seoul (Korea) with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including the removal of stop words and noun extraction, to process various unrefined forms of unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing big streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets in 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; these topics are non-generalized compound words, so they may be mentioned on Twitter using other words. To answer whether the IT trend tweets from Korea are related to the following year's IT trends in the real world, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-Procurement system, which handles the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies on IT trending topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements on Nara Market in 2012 and 2013. 
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to get useful ideas for detecting and predicting dynamic trends in technological and social issues.

Prediction of Forest Fire Danger Rating over the Korean Peninsula with the Digital Forecast Data and Daily Weather Index (DWI) Model (디지털예보자료와 Daily Weather Index (DWI) 모델을 적용한 한반도의 산불발생위험 예측)

  • Won, Myoung-Soo;Lee, Myung-Bo;Lee, Woo-Kyun;Yoon, Suk-Hee
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.14 no.1
    • /
    • pp.1-10
    • /
    • 2012
  • The Digital Forecast of the Korea Meteorological Administration (KMA) is a 5 km gridded weather forecast over the Korean Peninsula and the surrounding oceanic regions in Korean territory. The Digital Forecast provides 12 weather forecast elements such as three-hour interval temperature, sky condition, wind direction, wind speed, relative humidity, wave height, probability of precipitation, 12-hour accumulated rain and snow, as well as daily minimum and maximum temperatures. These forecast elements are updated regularly every three hours for the next 48 hours. The objective of this study was to construct Forest Fire Danger Rating Systems on the Korean Peninsula (FFDRS_KORP) based on the daily weather index (DWI) and to improve the accuracy using the digital forecast data. We produced thematic maps of temperature, humidity, and wind speed over the Korean Peninsula to analyze DWI. To calculate the DWI of the Korean Peninsula, a forest fire occurrence probability model obtained by logistic regression analysis was applied, i.e. $[1+{\exp}\{-(2.494+(0.004{\times}T_{max})-(0.008{\times}EF))\}]^{-1}$. A verification test among the real-time observatory data, the digital forecast and RDAPS data showed that the values predicted from the digital forecast were better than those from the RDAPS data. A comparison with the average forest fire danger rating index (sampled at 233 administrative districts) showed that the digital forecast gave higher relative accuracy than the RDAPS data. The coefficient of determination of the forest fire danger rating was $R^2$=0.854. There was a difference of 0.5 between the national mean fire danger rating index with the real-time observatory data (70) and that with the digital forecast (70.5).
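
The logistic model quoted in the abstract can be evaluated directly. The Python sketch below uses only the coefficients given there (2.494, 0.004, 0.008). The abstract does not define its symbols, so reading $T_{max}$ as daily maximum temperature and $EF$ as an effective humidity term is an assumption, as are the function name and the input values.

```python
import math


def fire_occurrence_probability(t_max, ef):
    """Logistic model from the abstract: [1 + exp{-(2.494 + 0.004*T_max - 0.008*EF)}]^-1.

    t_max: assumed to be the daily maximum temperature.
    ef:    assumed to be an effective humidity term; not defined in the abstract.
    """
    z = 2.494 + 0.004 * t_max - 0.008 * ef
    return 1.0 / (1.0 + math.exp(-z))


# Made-up inputs: a warm, dry day versus a cool, humid day.
dry_day = fire_occurrence_probability(t_max=25.0, ef=30.0)
humid_day = fire_occurrence_probability(t_max=10.0, ef=80.0)
```

The signs of the coefficients mean the modelled probability rises with `t_max` and falls with `ef`, so `dry_day` comes out higher than `humid_day` for these inputs.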

Study on Tourism Demand Forecast and Influencing Factors in Busan Metropolitan City (부산 연안도시 관광수요 예측과 영향요인에 관한 연구)

  • Kyu Won Hwang;Sung Mo Nam;Ah Reum Jang;Moon Suk Lee
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.7
    • /
    • pp.915-929
    • /
    • 2023
  • Improvements in people's quality of life, diversification of leisure activities, and changes in population structure have led to an increase in the demand for tourism and an expansion of the diversification of tourism activities. In particular, for coastal cities where land and marine tourism elements coexist, various factors influence their tourism demands. Tourism requires the construction of infrastructure and content development according to the demand at the tourist destination. This study aims to improve the prediction accuracy and explore influencing factors through time series analysis of tourism scale using agent-based data. Basic local governments in the Busan area were examined, and the data used were the number of tourists and the amount of tourism consumption on a monthly basis. The univariate time series analysis, which is a deterministic model, was used along with the SARIMAX analysis to identify the influencing factor. The tourism consumption propensity, focusing on the consumption amount according to business types and the amount of mentions on SNS, was set as the influencing factor. The difference in accuracy (RMSE standard) between the time series models that did and did not consider COVID-19 was found to be very wide, ranging from 1.8 times to 32.7 times by region. Additionally, considering the influencing factor, the tourism consumption business type and SNS trends were found to significantly impact the number of tourists and the amount of tourism consumption. Therefore, to predict future demand, external influences as well as the tourists' consumption tendencies and interests in terms of local tourism must be considered. This study aimed to predict future tourism demand in a coastal city such as Busan and identify factors affecting tourism scale, thereby contributing to policy decision-making to prepare tourism demand in consideration of government tourism policies and tourism trends.
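
The "times" comparison of accuracy on the RMSE standard can be read as a ratio of two RMSE values. The Python sketch below is an illustration under that assumption. It does not reproduce the SARIMAX fitting itself, the monthly figures are made up, and the function names are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between observed and forecast values."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def rmse_ratio(actual, forecast_a, forecast_b):
    """How many times larger the worse model's RMSE is than the better model's."""
    rmse_a = rmse(actual, forecast_a)
    rmse_b = rmse(actual, forecast_b)
    return max(rmse_a, rmse_b) / min(rmse_a, rmse_b)


# Made-up monthly tourist counts (thousands) and two hypothetical model forecasts.
actual = [100.0, 120.0, 80.0, 90.0]
forecast_a = [110.0, 110.0, 90.0, 80.0]  # e.g. a model ignoring the COVID-19 period
forecast_b = [102.0, 118.0, 82.0, 88.0]  # e.g. a model accounting for it

ratio = rmse_ratio(actual, forecast_a, forecast_b)
```

Here the first forecast misses by 10 each month and the second by 2, so the RMSE values differ by a factor of 5; the abstract reports factors between 1.8 and 32.7 depending on the region.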

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology that is essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalanced distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. 
Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies distributions for minority classes and generates data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. 
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.
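
The abstract's remark that traditional oversampling repeats the same data can be made concrete with a small Python sketch of plain random oversampling. This is not the proposed CGAN model, nor any of the SMOTE variants it is compared against. The rating tuples, the function name and the fixed seed are hypothetical.

```python
import random
from collections import Counter


def random_oversample(ratings, seed=0):
    """Balance rating classes by duplicating randomly chosen minority-class rows."""
    rng = random.Random(seed)
    by_score = {}
    for row in ratings:
        by_score.setdefault(row[2], []).append(row)
    target = max(len(rows) for rows in by_score.values())
    balanced = []
    for rows in by_score.values():
        balanced.extend(rows)
        # Pad each minority class with copies of rows it already contains.
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


# Made-up (user, item, score) ratings skewed toward high scores.
ratings = [
    ("u1", "i1", 5),
    ("u2", "i1", 5),
    ("u3", "i2", 5),
    ("u4", "i2", 4),
    ("u5", "i3", 1),
]

balanced = random_oversample(ratings)
class_counts = Counter(score for _, _, score in balanced)
```

After balancing, every score class has three rows, but the number of distinct rows is unchanged: the added rows are exact duplicates, which is the repetition the abstract associates with overfitting and which generative approaches aim to avoid by synthesizing new rows.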