• Title/Summary/Keyword: Prediction Process


Research about feature selection that use heuristic function (휴리스틱 함수를 이용한 feature selection에 관한 연구)

  • Hong, Seok-Mi;Jung, Kyung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB / v.10B no.3 / pp.281-286 / 2003
  • A large number of features are collected for real-life problem solving, but using all of them is difficult, and collecting correct data for every feature is not easy. When all collected data are used for learning, the resulting model becomes complicated and does not perform well. Interrelationships or hierarchical relations also exist among the features. The number of features can be reduced by analyzing these relations using heuristic knowledge or statistical methods. A heuristic technique refers to learning through repeated trial and error and experience. Experts can approach the relevant problem domain by collecting opinions based on experience, and these properties can be used to reduce the number of features used in learning: experts generate new, highly abstract features from raw data. This paper describes a machine learning model that reduces the number of features used in learning by means of a heuristic function and uses the abstracted features as input values to a neural network. We applied this model to win/lose prediction in professional baseball games. The results show that the model combining the two techniques not only reduces the complexity of the neural network model but also significantly improves classification accuracy compared with using the neural network and the heuristic model separately.

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
  • Predicting IT trends has long been an important subject in information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Toward the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and these predictions affect IT and industry leaders and organizations' basic assumptions about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; its central function, the 'tweet', lets users report their current thoughts and actions, comment on news, and engage in discussions. For an analysis of IT trends, we chose tweet data because Twitter not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading on technology. Previous studies found that tweet data provide useful information and detect societal trends effectively; these studies also found that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the IT trends predicted for the following year by public organizations are mentioned on social network services such as Twitter.
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study compares Twitter data generated in Seoul, Korea with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends while processing large Twitter stream datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. We crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets from 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; because these topics are not generalized compound words, they may be mentioned on Twitter using other words. To determine whether IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-procurement system, which handles the entire procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies for the IT trend topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in Nara Market project announcements in 2012 and 2013.
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year, and ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
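
The comparison of topic frequencies described above can be sketched roughly as follows. This is a minimal illustration, not the study's SAS IRS pipeline: the helper names `count_mentions` and `pearson_r`, the substring-matching rule, and all sample texts are assumptions made for the example, and the abstract does not state which correlation coefficient was used (Pearson is assumed here).

```python
import math


def count_mentions(texts, topics):
    """Count how many texts mention each topic (case-insensitive substring match)."""
    return [sum(topic.lower() in text.lower() for text in texts) for topic in topics]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, NOT taken from the study.
topics = ["big data", "cloud", "green IT"]
tweets = [
    "Big Data is everywhere",
    "cloud and big data",
    "moving to the cloud",
    "cloud costs",
]
notices = [
    "cloud platform build",
    "big data analysis system",
    "cloud migration",
    "big data portal",
]

tweet_counts = count_mentions(tweets, topics)
notice_counts = count_mentions(notices, topics)
r = pearson_r(tweet_counts, notice_counts)
print(tweet_counts, notice_counts, round(r, 3))
```

Exact substring matching also illustrates the limitation noted in the abstract: a compound topic such as 'green IT' is missed whenever users describe it with other words.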

A Study on the Ecosystem Services Value Assessment According to City Development: In Case of the Busan Eco-Delta City Development (도시개발에 따른 생태계서비스 가치 평가 연구: 부산 에코델타시티 사업을 대상으로)

  • Choi, Jiyoung;Lee, Youngsoo;Lee, Sangdon
    • Journal of Environmental Impact Assessment / v.28 no.5 / pp.427-439 / 2019
  • The natural environment and ecology component of environmental impact assessment (EIA) is severely lacking in quantitative evaluation. This study therefore attempted a quantitative assessment of ecosystem services at the site of the Eco-Delta project in Busan. As part of climate change adaptation, the study evaluated and compared the values of carbon fixation and habitat quality using the InVEST model, before and after development, under three land-use change alternatives. Carbon fixation was 216,674.48 Mg of C in 2000 and 203,474.25 Mg of C in 2015, a reduction of about 6.1%, and is projected to drop to 120,490.84 Mg of C in 2030, about 40% lower than in 2015. Alternative 3 of the land-use plan was the best in terms of carbon fixation, showing 6,811.31 Mg of C. Habitat quality also declined, from 0.57 (2000) to 0.35 (2015) and 0.21 (2030), with continued degradation as development proceeds. Alternative 3 again scored highest at 0.21 (Alternative 1: 0.20, Alternative 2: 0.18). In conclusion, this study illustrated that a quantitative method for land-use change in the EIA process can help stakeholders and developers make decisions by identifying the scenario with the lowest carbon impact. It can also support better treatment of land-use plans, greenhouse gases, and natural environmental assets in EIA. The study could be used in environmental policy by providing numerical ecosystem data and predictions. Supplemented with detailed analysis and better access to basic data, this method could be applied widely in ecosystem evaluation.
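
The percentage reductions quoted in the abstract can be re-derived from the reported carbon fixation totals. The short check below uses only the three figures given above; the function name `percent_change` is an illustrative choice.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = decrease)."""
    return (after - before) / before * 100.0


# Carbon fixation totals (Mg of C) as reported in the abstract.
carbon = {2000: 216_674.48, 2015: 203_474.25, 2030: 120_490.84}

change_2000_2015 = percent_change(carbon[2000], carbon[2015])
change_2015_2030 = percent_change(carbon[2015], carbon[2030])

print(f"2000 -> 2015: {change_2000_2015:.1f}%")
print(f"2015 -> 2030: {change_2015_2030:.1f}%")
```

The first figure reproduces the reported 6.1% reduction; the second comes out at roughly 40.8%, which the abstract rounds to 40%.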

Recent Changes in Bloom Dates of Robinia pseudoacacia and Bloom Date Predictions Using a Process-Based Model in South Korea (최근 12년간 아까시나무 만개일의 변화와 과정기반모형을 활용한 지역별 만개일 예측)

  • Kim, Sukyung;Kim, Tae Kyung;Yoon, Sukhee;Jang, Keunchang;Lim, Hyemin;Lee, Wi Young;Won, Myoungsoo;Lim, Jong-Hwan;Kim, Hyun Seok
    • Journal of Korean Society of Forest Science / v.110 no.3 / pp.322-340 / 2021
  • Due to climate change and the resulting rise in spring temperatures, the flowering time of Robinia pseudoacacia has advanced, and simultaneous blooming has occurred across different regions of South Korea. These changes in flowering time have caused a major crisis in the domestic beekeeping industry, and demand for accurate prediction of the flowering time of R. pseudoacacia is increasing. In this study, we developed four models for predicting the flowering time of R. pseudoacacia across the entire country and compared their performance: a Single Model for the country (SM), a Modified Single Model (MSM) using correction factors derived from SM, a Group Model (GM) estimating parameters for each region, and a Local Model (LM) estimating parameters for each site. To this end, we used bloom dates observed at 26 sites across the country over 12 years (2006-2017) together with daily temperature data. Bloom dates in the north central region, where the spring temperature increase was more than twice that of the southern regions, have advanced, and the difference from the southwest region decreased by 0.7098 days per year (p-value=0.0417). Model comparisons showed that MSM and LM performed better than the other models, with RMSEs 24% and 15% lower than that of SM, respectively. Furthermore, validation with 16 additional sites over 4 years revealed that co-kriging of LM performed better than expanding MSM to the entire nation (RMSE: p-value=0.0118, Bias: p-value=0.0471). This study improved predictions of bloom dates for R. pseudoacacia and proposed methods for reliably extending them to the entire nation.
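
The two validation metrics named in the abstract, RMSE and bias, can be computed as in the sketch below. The bloom dates are hypothetical day-of-year values invented for illustration, and bias is taken here as the mean of predicted minus observed, which is an assumption about the sign convention.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    errors = [p - o for o, p in zip(observed, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bias(observed, predicted):
    """Mean error (predicted minus observed); positive means predictions are late."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical bloom dates as day of year, NOT data from the study.
observed = [130, 128, 135, 126]
predicted = [132, 127, 133, 129]

model_rmse = rmse(observed, predicted)
model_bias = bias(observed, predicted)
print(f"RMSE = {model_rmse:.2f} days, bias = {model_bias:+.2f} days")
```

RMSE penalizes large misses regardless of direction, while bias shows whether a model is systematically early or late, which is why the two are reported side by side.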

Coupled Hydro-Mechanical Modelling of Fault Reactivation Induced by Water Injection: DECOVALEX-2019 TASK B (Benchmark Model Test) (유체 주입에 의한 단층 재활성 해석기법 개발: 국제공동연구 DECOVALEX-2019 Task B(Benchmark Model Test))

  • Park, Jung-Wook;Kim, Taehyun;Park, Eui-Seob;Lee, Changsoo
    • Tunnel and Underground Space / v.28 no.6 / pp.670-691 / 2018
  • This study presents the results of the BMT (Benchmark Model Test) simulations of DECOVALEX-2019 Task B. Task B, named 'Fault slip modelling', aims to develop a numerical method for predicting fault reactivation and the coupled hydro-mechanical behavior of faults. The BMT scenario simulations of Task B were conducted to improve the numerical model of each participating group by demonstrating the feasibility of reproducing fault behavior induced by water injection. The BMT simulations consist of seven conditions that differ in injection pressure, fault properties, and hydro-mechanical coupling relations. The TOUGH-FLAC simulator was used to reproduce the coupled hydro-mechanical process of fault slip. In the present study, a coupling module was developed to update changes in the hydrological properties and geometric features of the numerical mesh. We modified the numerical model developed in Task B Step 1 to account for changes in compressibility, permeability, and geometric features with the hydraulic aperture of the fault caused by mechanical deformation. The effects of the storativity and transmissivity of the fault on hydro-mechanical behavior, such as the pressure distribution, injection rate, displacement, and stress of the fault, were examined, and the results of the previous Step 1 simulation were updated using the modified numerical model. The simulation results indicate that the developed model can provide a reasonable prediction of the hydro-mechanical behavior related to fault reactivation. The numerical model will be enhanced through continued interaction and collaboration with the other research teams of DECOVALEX-2019 Task B and validated using field experiment data in a further study.

Characteristics Analysis of Snow Particle Size Distribution in Gangwon Region according to Topography (지형에 따른 강원지역의 강설입자 크기 분포 특성 분석)

  • Bang, Wonbae;Kim, Kwonil;Yeom, Daejin;Cho, Su-jeong;Lee, Choeng-lyong;Lee, Daehyung;Ye, Bo-Young;Lee, GyuWon
    • Journal of the Korean earth science society / v.40 no.3 / pp.227-239 / 2019
  • Heavy snowfall events occur frequently in Gangwon province, and snowfall amounts vary significantly in space because of the complex terrain and the topographic modulation of precipitation. Understanding and predicting the spatial characteristics of heavy snowfall is particularly challenging during snowfall events under easterly winds, which produce significantly different atmospheric conditions and hence different precipitation characteristics. In this study, we investigated the microphysical characteristics of snowfall on the windward and leeward sides of the Taebaek mountain range under easterly conditions. Two snowfall events under easterly winds were selected, and snow particle size distributions (SSDs) were observed at four sites (two windward and two leeward) with PARSIVEL disdrometers. We compared the characteristic parameters of the SSDs at the leeward sites with those at the windward sites. The results show that SSDs at the windward sites have a relatively wide distribution with many small snow particles compared with those at the leeward sites. This characteristic is clearly reflected in the larger characteristic number concentration and characteristic diameter at the windward sites. Snowfall rate and ice water content are also larger at the windward sites than at the leeward sites. The results indicate that the generation of new snow particles is dominant at the windward sites, likely because of orographic lifting. In addition, the windward sites show heavily aggregated particles at near-zero ground temperatures, likely driven by the wet and warm conditions near the ocean.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are increasingly important technologies that are essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering. The evaluation values in the user-item matrix may be distorted depending on the popularity of the product, or there may be new users who have not yet rated anything. The lack of historical data for identifying consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or item characteristics are included. Another problem is that real-world score data are mostly biased toward high scores, resulting in severe imbalances. One cause of this imbalanced distribution is purchasing bias: only users who rate a product highly purchase it, so those with low ratings are less likely to purchase it and thus do not leave negative reviews. Because of these characteristics, reviews by users who purchase products are more likely to be positive, unlike most users' actual preferences.
Therefore, because of this bias, actual rating data are over-learned for the high-incidence classes, distorting the market. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques for addressing this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class characteristics, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted on converting multi-class problems into binary class problems. However, simplifying multi-class problems can cause classification errors when the results are combined with those of classifiers learned from other sub-problems, resulting in the loss of important information about relationships beyond the selected items. Therefore, more effective methods for addressing multi-class imbalance problems need to be developed. We propose a collaborative filtering model that uses CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of minority classes and is used to generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating the data imbalance arising from real data. Our model shows superior recommendation performance compared with existing oversampling techniques and with existing sparse real-world data.
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models, and our model shows the highest prediction accuracy on the RMSE and MAE evaluation measures. Through this study, oversampling based on deep learning can further refine the performance of recommendation systems that use actual data and can be used to build business recommendation systems.
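
For contrast with the proposed CGAN approach, the sketch below shows the naive baseline the abstract criticizes: random oversampling, which balances rating classes purely by repeating existing records. It is not the authors' method or any of the SMOTE variants listed above; the function `random_oversample` and the toy ratings are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, label_of, seed=0):
    """Balance classes by duplicating randomly chosen minority-class samples."""
    rng = random.Random(seed)
    by_class = {}
    for sample in samples:
        by_class.setdefault(label_of(sample), []).append(sample)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Top up each class with exact copies until it reaches the majority size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical (user, item, rating) triples skewed toward high ratings.
ratings = [
    ("u1", "i1", 5), ("u2", "i1", 5), ("u3", "i2", 5),
    ("u4", "i2", 4), ("u5", "i3", 4),
    ("u6", "i3", 1),
]

balanced = random_oversample(ratings, label_of=lambda r: r[2])
counts = Counter(r[2] for r in balanced)
print(counts)
print(len(set(balanced)), "distinct records out of", len(balanced))
```

The class counts become equal, but the number of distinct records does not grow, which is the repetition the abstract identifies as a source of overfitting and the motivation for generating new samples instead.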

Development of Stand Yield Table Based on Current Growth Characteristics of Chamaecyparis obtusa Stands (현실임분 생장특성에 의한 편백 임분수확표 개발)

  • Jung, Su Young;Lee, Kwang Soo;Lee, Ho Sang;Ji Bae, Eun;Park, Jun Hyung;Ko, Chi-Ung
    • Journal of Korean Society of Forest Science / v.109 no.4 / pp.477-483 / 2020
  • We constructed a stand yield table for Chamaecyparis obtusa based on data from an actual forest. The previous stand yield table had a number of disadvantages because it was based on actual forest information. In the present study we used data from more than 200 sampling plots in Chamaecyparis obtusa stands. The analysis included the estimation, recovery, and prediction of the distribution of diameter at breast height (DBH) values, a valuable process for preparing stand yield tables. The DBH distribution model uses a Weibull function, and the site index (base age: 30 years), the standard for assessing forest productivity, was derived using the Chapman-Richards formula. Several estimation formulas for preparing the stand yield table were compared on the fitness index, and the optimal formula was chosen. The analysis shows that the site index of the Chamaecyparis obtusa stands ranges from 10 to 18. The estimated stand volume of each sample plot was found to have an accuracy of 62%. According to the residual analysis, the residuals were evenly distributed around zero, which indicates that the results are useful in the field. Comparing the table constructed in this study with the existing stand yield table, we found that our table yielded comparatively higher growth values. This is probably because the existing table was built from a small amount of research data that did not properly reflect actual conditions. We hope that the stand yield table for Chamaecyparis obtusa, a representative species of the southern regions, will be widely used for forest management. As these forests stabilize and growth progresses, we plan to construct an additional yield table applicable to the production of developed stands.
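
A site index with a base age of 30 years can be illustrated with the Chapman-Richards growth function. The sketch below assumes the common three-parameter form and an anamorphic (guide-curve) projection to the base age; the abstract does not report the fitted coefficients or the projection method, so the values of `A`, `B`, and `C` and the sample measurement are purely hypothetical.

```python
import math


def chapman_richards(age, a, b, c):
    """Dominant height at `age`: a * (1 - exp(-b * age)) ** c."""
    return a * (1.0 - math.exp(-b * age)) ** c


def site_index(height, age, b, c, base_age=30):
    """Project an observed dominant height to the base age.

    Assumes an anamorphic guide curve: stands share b and c and differ
    only in the asymptote, so the asymptote cancels out of the ratio.
    """
    ratio = (1.0 - math.exp(-b * base_age)) / (1.0 - math.exp(-b * age))
    return height * ratio ** c


# Hypothetical coefficients, NOT the values fitted in the study.
A, B, C = 25.0, 0.04, 1.3

height_at_20 = chapman_richards(20, A, B, C)
si = site_index(height_at_20, 20, B, C)
print(f"height at age 20: {height_at_20:.2f} m, site index (base age 30): {si:.2f} m")
```

A stand measured exactly at the base age has a site index equal to its observed height, which is a quick sanity check for any implementation.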

The Clinical Utility of Korean Bayley Scales of Infant and Toddler Development-III - Focusing on using of the US norm - (베일리영유아발달검사 제3판(Bayley-III)의 미국 규준 적용의 문제: 미숙아 집단을 대상으로)

  • Lim, Yoo Jin;Bang, Hee Jeong;Lee, Soonhang
    • Korean journal of psychology:General / v.36 no.1 / pp.81-107 / 2017
  • This study investigates the clinical utility of the Bayley-III in Korea when US norms are used. A total of 98 preterm infants and 93 term infants were assessed with the K-Bayley-III. The performance pattern of the preterm infants was analyzed with a mixed-design ANOVA, which examined differences in Bayley-III scaled scores and composite scores between the full-term and preterm groups and within the preterm group. We then investigated the agreement between classifications of delay made using the BSID-II and the Bayley-III. In addition, ROC plots were constructed to identify a Bayley-III cut-off score with optimum diagnostic utility in this sample. The results were as follows. (1) Preterm infants showed significantly lower functioning on 5 scaled scores and 3 developmental indexes than infants born at term. Significant differences among scores within the preterm group were also found. (2) Compared with the K-BSID-II, the Bayley-III yielded higher Mental Development Index and Psychomotor Developmental Index scores and lower rates of developmental delay. (3) All Bayley-III scales (Cognitive, Language, and Motor) showed an appropriate level of discrimination, but the Bayley-III cut-off composite scores had to be adjusted 13~28 points above 69 to predict delay as defined by the K-BSID-II. This explains the lower rates of developmental delay obtained with the two-standard-deviation criterion. This study provides empirical data showing that scores must be interpreted carefully in clinical applications, identifies the discriminating power of the scales, and proposes more appropriate cut-off scores. It also discusses sampling for establishing Korean norms for the Bayley-III. Infants in Korea should preferably be assessed with norms validated in Korea, and the standardization process for obtaining Korean normative data must be carried out carefully.
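
Choosing a cut-off score from ROC analysis can be sketched as below. The abstract does not say which optimality criterion was used; this example assumes Youden's J (sensitivity + specificity - 1) and treats a score at or below the cut-off as a positive (delayed) classification. All scores and labels are hypothetical.

```python
def sens_spec(scores, delayed, cutoff):
    """Sensitivity and specificity when `score <= cutoff` is classified as delayed."""
    tp = sum(s <= cutoff and d for s, d in zip(scores, delayed))
    fn = sum(s > cutoff and d for s, d in zip(scores, delayed))
    tn = sum(s > cutoff and not d for s, d in zip(scores, delayed))
    fp = sum(s <= cutoff and not d for s, d in zip(scores, delayed))
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, delayed):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, delayed, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical composite scores and reference classifications, NOT study data.
scores = [65, 70, 75, 80, 85, 90, 95, 100, 105, 110]
delayed = [True, True, True, False, True, False, False, False, False, False]

cutoff, j, sens, spec = best_cutoff(scores, delayed)
print(f"cut-off <= {cutoff}: J = {j:.3f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy sample, the conventional cut-off would miss a delayed case that a higher cut-off catches, which mirrors the abstract's finding that the cut-off had to be raised above 69.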

A Prediction of N-value Using Artificial Neural Network (인공신경망을 이용한 N치 예측)

  • Kim, Kwang Myung;Park, Hyoung June;Goo, Tae Hun;Kim, Hyung Chan
    • The Journal of Engineering Geology / v.30 no.4 / pp.457-468 / 2020
  • Problems arising during pile design for plant construction, civil, and architectural work mostly stem from uncertainty in geotechnical characteristics. In particular, the N-value measured through the Standard Penetration Test (SPT) is the most important data. However, it is difficult to obtain N-values by drilling investigation across the entire target area because of many constraints, such as licensing, time, cost, equipment access, and residents' complaints, and it is impossible to obtain geotechnical characteristics through drilling investigation within the short bidding periods of overseas projects. The geotechnical characteristics at points without drilling investigation are usually determined by the engineer's empirical judgment, which can lead to errors in pile design and quantity calculation, causing construction delays and cost increases. This problem could be overcome if N-values at points without drilling investigation could be predicted from a limited, minimal set of drilling investigation data. This study was conducted to predict the N-value using an Artificial Neural Network (ANN), which is one of the artificial intelligence (AI) methods. An ANN treats a limited amount of geotechnical characteristics as a biological logic process, providing more reliable results for input variables. The purpose of this study is to predict N-values at points without drilling investigation through patterns learned by a multi-layer perceptron with the error back-propagation algorithm, using minimal geotechnical data. We reviewed the reliability of the values predicted by the AI method against measured values and confirmed high reliability. To resolve geotechnical uncertainty, we will perform a sensitivity analysis of the input variables to improve learning in the next steps, which may require some technical updates to the program. We hope that our study will be helpful for design work in the future.