• Title/Summary/Keyword: searching model

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the spread of smart devices have produced massive amounts of text data. These data are produced and distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn many researchers to the problem of managing it. Handling it also calls for professionals capable of classifying relevant information, and this is where text classification comes in. Text classification, a challenging task in modern data analysis, assigns a text document to one or more predefined categories or classes. Available techniques include K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing or modifying an algorithm, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary in the documents. If the viewpoints of the training data and the target data differ, the features may also differ between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
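
The idea of admitting only confidently classified unlabeled documents into training can be illustrated with a minimal self-training sketch. This is not RSESLA itself: the word-count classifier, the function names (`train`, `predict`, `self_train`), and the 0.6 confidence threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def train(docs):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {}
    for text, label in docs:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Return (label, confidence) from add-one smoothed word-overlap scores."""
    words = text.lower().split()
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        # Each word contributes its smoothed relative frequency under this label.
        scores[label] = sum((c[w] + 1) / (total + 1) for w in words)
    best = max(scores, key=scores.get)
    # Confidence is the winning label's share of the total score.
    return best, scores[best] / sum(scores.values())


def self_train(labeled, unlabeled, threshold=0.6):
    """Pseudo-label unlabeled texts and keep only the confident ones."""
    counts = train(labeled)
    accepted = []
    for text in unlabeled:
        label, conf = predict(counts, text)
        if conf >= threshold:  # hypothetical cut-off, not from the paper
            accepted.append((text, label))
    # Retrain on the original labels plus the accepted pseudo-labels.
    return train(labeled + accepted), accepted
```

Documents whose vocabulary matches neither class score equally under every label, fall below the threshold, and are left out of retraining, which is the filtering behavior the abstract motivates.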

Dynamic Traffic Assignment Using Genetic Algorithm (유전자 알고리즘을 이용한 동적통행배정에 관한 연구)

  • Park, Kyung-Chul;Park, Chang-Ho;Chon, Kyung-Soo;Rhee, Sung-Mo
    • Journal of Korean Society for Geospatial Information Science / v.8 no.1 s.15 / pp.51-63 / 2000
  • Dynamic traffic assignment (DTA) has been a topic of substantial research during the past decade. While DTA is gradually maturing, many aspects of it still need improvement, especially its formulation and solution algorithm. Recently, with its promise for ITS (Intelligent Transportation System) and GIS (Geographic Information System) applications, DTA has received increasing attention. This potential also implies higher requirements for DTA modeling, especially regarding solution efficiency for real-time implementation. However, DTA poses many mathematical difficulties in the search process because of the complexity of its spatial and temporal variables. Although many solution algorithms have been studied, conventional methods cannot find the solution when the objective function or the constraints are not convex. In this paper, a genetic algorithm is applied to find the solution of DTA, and the Merchant-Nemhauser model is used as the DTA model because it has a nonconvex constraint set. To handle the nonconvex constraint set, the GENOCOP III system, a kind of genetic algorithm, is used. Results for the sample network are compared with those of a conventional method.
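
To show why a genetic algorithm suits nonconvex problems, the following minimal sketch minimizes a one-dimensional nonconvex function over an interval. It is not GENOCOP III and does not model the Merchant-Nemhauser formulation: the test function, the operators (truncation selection, arithmetic crossover, Gaussian mutation, clamping), and all parameter values are assumptions for illustration.

```python
import random


def objective(x):
    """Nonconvex test function with two local minima (illustrative only)."""
    return x**4 - 3 * x**2 + x


def genetic_minimize(f, lo, hi, pop_size=30, generations=40, seed=0):
    """Return (best_x, history) where history holds the best value per generation."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=f)
        history.append(f(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection: keep the better half
        children = [pop[0]]             # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1 - w) * b      # arithmetic crossover
            child += rng.gauss(0, 0.1)       # Gaussian mutation
            child = min(hi, max(lo, child))  # clamp back into the feasible interval
            children.append(child)
        pop = children
    best = min(pop, key=f)
    history.append(f(best))
    return best, history
```

Because the population samples the whole interval rather than following a single descent path, the search is not tied to the basin where it starts, and elitism guarantees the best value never gets worse between generations.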

Development and Analysis of COMS AMV Target Tracking Algorithm using Gaussian Cluster Analysis (가우시안 군집분석을 이용한 천리안 위성의 대기운동벡터 표적추적 알고리듬 개발 및 분석)

  • Oh, Yurim;Kim, Jae Hwan;Park, Hyungmin;Baek, Kanghyun
    • Korean Journal of Remote Sensing / v.31 no.6 / pp.531-548 / 2015
  • Atmospheric Motion Vectors (AMVs) derived from satellite images have shown a Slow Speed Bias (SSB) in comparison with rawinsonde observations. The SSB originates from tracking, selection, and height assignment errors, of which height assignment is known to be the leading error. However, recent works have shown that height assignment error cannot fully explain the SSB. This paper attempts a new approach to examine the possibility of reducing the SSB of COMS AMVs by using a new target tracking algorithm. Tracking error can be caused by the averaging of various wind patterns within a target and by changes in cloud shape over time during the search process. To overcome this problem, a Gaussian Mixture Model (GMM) is adopted to extract the coldest cluster as the target, since the shape of such a target is less subject to transformation. An image filtering scheme is then applied to weight the selected coldest pixels more heavily than the others, which makes the target easier to track. When AMVs derived from our algorithm with the sum of squared distance method and the current COMS AMVs are compared with rawinsonde data, our products show a noticeable improvement over the COMS products: mean wind speed increases by $2.7\,\mathrm{m\,s^{-1}}$ and the SSB is reduced by 29%. However, the bias statistics show a negative impact at mid/low levels with our algorithm, and the number of vectors is reduced by 40% relative to COMS. Therefore, further study is required to improve accuracy for mid/low-level winds and to increase the number of AMVs.
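
The sum of squared distance matching step mentioned above can be sketched as follows. This covers only the displacement search between two images; the GMM clustering and the pixel weighting are omitted, and the function names, the list-of-lists image format, and the search radius are assumptions for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized 2-D patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(img, top, left, h, w):
    """Cut an h-by-w patch out of a list-of-lists image."""
    return [row[left:left + w] for row in img[top:top + h]]


def track(prev, curr, top, left, h, w, search=2):
    """Return the (dy, dx) shift of the target box that minimizes SSD."""
    target = crop(prev, top, left, h, w)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidate boxes that fall outside the second image.
            if t < 0 or l < 0 or t + h > len(curr) or l + w > len(curr[0]):
                continue
            score = ssd(target, crop(curr, t, l, h, w))
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]
```

Dividing the returned pixel shift by the time between the two images, and scaling by the pixel size, would give a motion vector; both factors depend on the sensor and are left out here.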

Estimation Model for Simplification and Validation of Soil Water Characteristics Curve on Volcanic Ash Soil in Subtropical Area in Korea (난지권 화산회토양의 토색별 토양수분 특성곡선 및 단일화 추정모형)

  • Hur, Seung-Oh;Moon, Kyung-Hwan;Jung, Kang-Ho;Ha, Sang-Keun;Song, Kwan-Cheol;Lim, Han-Cheol;Kim, Geong-Gyu
    • Korean Journal of Soil Science and Fertilizer / v.39 no.6 / pp.329-333 / 2006
  • Most volcanic ash soils in South Korea are distributed in Jeju Province, an island in the southern part of Korea with steep mountain slopes. Many soils in Jeju also contain high levels of organic matter (OM) derived from volcanic ash. Irrigation and drainage in volcanic ash soil therefore have to be managed differently from general soil with low OM content, but studies seeking appropriate methods are insufficient because volcanic ash soil covers only 1.3% (130,000 ha) of South Korea. This study was conducted to analyze the soil water content and irrigation quantity appropriate for crops cultivated in volcanic ash soil with high OM content. Although soils of different colors had the same soil texture, the soil water characteristics curves by soil color showed differences in water retention capability according to OM content. However, these characteristics classified by soil color could be unified by a scaling technique based on similitude analysis, which obtains a dimensionless water content from the present water content, the residual water content, and the saturated water content (or the water content at 10 kPa). The relation between gravimetric soil water content (GSWC) and dimensionless water content took the form of a power function. The dimensionless water content (DWC) expresses the relative degree of saturation of the present water content. It was also expressed by the van Genuchten model, which describes the relation between relative saturation and matric potential. These results on the soil water characteristics curve (SWCC) of volcanic ash soil will serve as a basis for irrigation planning in areas with high soil organic matter content.
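
The scaling described above can be written out in a short sketch. The dimensionless water content is computed as (θ − θr) / (θs − θr), and the van Genuchten relation is given in its standard form Se = [1 + (αh)^n]^(−m) with m = 1 − 1/n. The function names and any parameter values passed to them are assumptions; the abstract does not report fitted parameters.

```python
def dimensionless_water_content(theta, theta_r, theta_s):
    """Relative saturation from present, residual, and saturated water content.

    theta_s may also be taken as the water content at 10 kPa, as the abstract notes.
    """
    return (theta - theta_r) / (theta_s - theta_r)


def van_genuchten_saturation(h, alpha, n):
    """Relative saturation at matric suction h (standard van Genuchten form).

    alpha (inverse of the suction unit) and n (> 1) are fitted shape parameters;
    the values used by callers here are hypothetical.
    """
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * h) ** n) ** (-m)
```

Both functions return 1 for a saturated soil and fall toward 0 as the soil dries, which is what allows curves for soils of different colors to be compared on one dimensionless axis.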

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • For a long time, academia has studied how to predict the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded with the rapid growth of online activity, companies are carrying out campaigns of many types at a scale that cannot be compared to the past. However, customers tend to perceive campaigns as spam as fatigue from repeated exposure increases. From a corporate standpoint, the effectiveness of campaigns is itself decreasing: the cost of investing in campaigns rises while the actual campaign success rate remains low. Accordingly, various studies are under way to improve campaign effectiveness in practice. A campaign system ultimately aims to increase the success rate of campaigns by collecting and analyzing customer-related data and using them for campaigns. In particular, there have been recent attempts to predict campaign responses using machine learning. Because campaign data have many and varied features, selecting appropriate features is very important. If all input data are used when classifying a large amount of data, learning time grows as the number of classes expands, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may be degraded by overfitting or by correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set.
Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), and others are widely used as traditional feature selection techniques. However, when there are many risks and many features, these techniques are limited in that classification performance is poor and training takes a long time. Therefore, in this study, we propose an improved feature selection algorithm to enhance campaign effectiveness. The purpose of this study is to improve the existing sequential SFFS method in the process of searching for the feature subsets that underlie machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first, features that have a negative effect are removed, and the sequential method is then applied, which increases search efficiency and enables generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm. Campaign success prediction was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, the improved feature selection algorithm was found to help in analyzing and interpreting prediction results by providing the importance of the derived features. These included features such as age, customer rating, and sales, which were already known to be statistically important.
In addition, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and the last 3-month wireless data usage, were unexpectedly selected as important features for campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign. This makes it possible to analyze and understand the important characteristics of each campaign type.
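
For reference, the baseline SFFS procedure that the study sets out to improve can be sketched as below. This is plain SFFS, not the authors' self-optimizing variant; the function name `sffs` and the caller-supplied `score` function (for example, a cross-validated accuracy) are assumptions.

```python
def sffs(features, score, k):
    """Sequential Floating Forward Selection up to k features.

    score(subset) must return a number where higher is better.
    """
    selected = []
    best_by_size = {}  # best score seen so far for each subset size
    while len(selected) < k:
        # Forward step: add the single feature that raises the score most.
        candidates = [f for f in features if f not in selected]
        add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        s = score(selected)
        if s > best_by_size.get(len(selected), float("-inf")):
            best_by_size[len(selected)] = s
        # Floating step: drop a feature whenever the reduced subset beats
        # the best subset of that smaller size found so far.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            reduced_score = score(reduced)
            if reduced_score > best_by_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_by_size[len(reduced)] = reduced_score
            else:
                break
    return selected
```

The floating step is what separates SFFS from SFS: a feature that looked useful early can be discarded later once a better combination appears, at the cost of extra score evaluations.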

A Study on the Resilience Process of Persons with Disabilities (중도장애인의 레질리언스(Resilience) 과정에 관한 연구)

  • Kim, Mi-Ok
    • Korean Journal of Social Welfare / v.60 no.2 / pp.99-129 / 2008
  • This study analyzed the resilience process of persons with disabilities using the grounded theory approach. The researcher conducted in-depth interviews with 8 persons with disabilities. The data analysis identified 393 concepts on the resilience process, which were categorized into 45 sub-categories and 18 primary categories. In the paradigm model of the resilience process, the causal conditions included 'unawareness of disability before becoming disabled', 'extreme pain', and 'repressing psychological pain', and the contingent conditions were 'disempowerment from staying at home', 'self-isolation from difficulty in accepting the disability', and 'frustration from social barriers and prejudice against persons with disabilities'. The resilience process could also depend on the type and degree of the disability, gender, and the length of time living with the disability. In spite of the causal and contingent conditions, the central way in which persons with disabilities acquired resilience was identified as 'enhancement of the power of positive thinking'. The control conditions that accelerate or retard the central phenomenon were, externally, 'the awareness of not being alone through family, friends, neighborhood, and the social system' and, internally, 'finding purpose in life through religion and help from other persons with disabilities'. The action/interactional sequences strengthened effort, self-searching, and active engagement, and as a result persons with disabilities could find comfort in life, participate in society, and change society's perspective on disability. The core categories of the resilience process were a belief in affirmation and a choice of life by one's own initiative.
In the process analysis, the stages developed in the following order: 'pain', 'strangeness', 'reflection', and 'daily life'. These stages were continuous and causal rather than discrete and complete. Within this process, the types of resilience of persons with disabilities were divided into 'existence reflection', 'course development', 'implicit endeavor', and 'active execution'. Using grounded theory, this study presents the details of the paradigm model, the process, and the types, giving an in-depth understanding of the resilience process of persons with disabilities, and discusses theory construction as well as policy and clinical involvement in the study of persons with disabilities.

Evaluation of Maternal Behavior between Normal Parturition and Expected Cesarean Section in Rats (자연 분만 및 예정된 제왕절개 수술 랫드에 있어서 모성 행동의 차이에 대한 검토)

  • Lee, S.K.;Kang, H.G.;Kim, I.W.;Jeong, J.M.;Hwang, D.Y.;Kim, C.K.;Chae, K.R.;Cho, J.S.
    • Journal of Embryo Transfer / v.22 no.3 / pp.161-165 / 2007
  • Oxytocin is a neurohypophyseal hormone with multiple functions in mammals. It mainly regulates milk ejection, affects uterine contraction, and is related to maternal behavior. Maternal behavior is believed to be suppressed by stress and facilitated by oxytocin. In cesarean section, oxytocin may be administered into the uterus to promote uterine involution. The present study tested the effect of oxytocin infused into the uterus on the maternal behavior of rats, with cesarean section as a stressor. In the first experiment, the pup survival rate was compared between a control group and a group subjected to laparotomy as a stressor, both in rats with natural parturition. In the second experiment, the survival rate over 2 weeks and maternal pup searching behavior (MPSV) were observed in one cesarean-sectioned group without oxytocin and another cesarean-sectioned group with oxytocin. In the first experiment, infanticide was observed in the stressed group, while normal maternal behavior was observed in the control group. In the second experiment, MPSV was observed only in the cesarean-sectioned group with oxytocin, and infanticide was observed in both groups, except for one rat that is thought to have been affected by oxytocin because it was operated on relatively late. This is the first study to show that administration of oxytocin into the uterus during cesarean section is not involved in the regulation of maternal behavior in rats. In conclusion, this study indicates the need for administering oxytocin into the brain in the cesarean-section rat model and for further study of maternal behavior items such as MPSV.

Development and Evaluation of a Nutritional Risk Screening Tool (NRST) for Hospitalized Patients (입원환자의 영양불량위험 검색도구의 개발 및 평가)

  • Han, Jin-Soon;Lee, Song-Mi;Chung, Hye-Kyung;Ahn, Hong-Seok;Lee, Seung-Min
    • Journal of Nutrition and Health / v.42 no.2 / pp.119-127 / 2009
  • Malnutrition in hospitalized patients can adversely affect clinical outcomes and cost. Several nutritional screening tools have been developed to identify patients at risk of malnutrition. However, many of them take considerable time and labor to administer and may not be highly applicable to a Korean population. This study sought to develop and evaluate a Nutritional Risk Screening Tool (NRST) that is simple and quick to administer and widely applicable to Korean hospitalized patients with various diseases. The study was also designed to produce a screening tool that predicts various clinical outcomes and to validate it against the Nutritional Risk Screening 2002 (NRS 2002). Electronic medical records of 424 patients hospitalized at a general hospital in Seoul during a 14-month period were abstracted for anthropometric, medical, biochemical, and clinical outcome variables. The study employed a 4-step process: selecting NRST components, deriving a scoring scheme, validating against a reference tool, and confirming clinical outcome predictability. NRST components were selected by stepwise multiple regression analysis of each clinical outcome (i.e., hospitalization period, complication, disease progress, and death) on several readily available patient characteristics. Age and serum levels of albumin, hematocrit (Hct), and total lymphocyte count (TLC), which remained in the final model for any of the 4 dependent variables, were chosen as NRST components. Odds ratios of malnutrition risk based on NRS 2002 according to levels of the selected components were used to frame the NRST scoring scheme. An NRST score higher than 3.5 was set as the cut-off for malnutrition risk based on sensitivity and specificity against NRS 2002. Lastly, differences in clinical outcomes by patients' NRST results were examined. The results showed that the NRST can significantly predict in-hospital clinical outcomes.
It is concluded that the NRST can be useful for simply and quickly screening patients at high nutritional risk in relation to prospective clinical outcomes.
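
Choosing a cut-off by sensitivity and specificity against a reference tool can be illustrated with a short sketch. The function name and the input format are assumptions, and no patient data from the study are reproduced; only the rule that a score higher than the cut-off flags risk comes from the abstract.

```python
def sensitivity_specificity(scores, at_risk, cutoff):
    """Compare a score-based screen with a reference classification.

    scores:  screening scores, one per patient
    at_risk: reference labels (True = at risk per the reference tool)
    cutoff:  a score strictly higher than this flags the patient
    """
    tp = fn = tn = fp = 0
    for s, truth in zip(scores, at_risk):
        flagged = s > cutoff
        if truth and flagged:
            tp += 1
        elif truth:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    # Assumes at least one at-risk and one not-at-risk patient in the input.
    return tp / (tp + fn), tn / (tn + fp)
```

Evaluating this function over a range of candidate cut-offs and comparing the resulting pairs is one common way to settle on a threshold such as 3.5.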

Development of Music Recommendation System based on Customer Sentiment Analysis (소비자 감성 분석 기반의 음악 추천 알고리즘 개발)

  • Lee, Seung Jun;Seo, Bong-Goon;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.197-217 / 2018
  • Music is one of the most creative acts, expressing human sentiment through sound. Because music readily evokes sentiment and empathy, the music people listen to can either lift or dampen their mood. Sentiment is thus the primary factor in searching for or recommending music. Nevertheless, music recommendation systems based on customer sentiment are still lacking. The algorithms used in previous music recommendation systems are mostly user-based, relying for example on users' play histories and playlists. Based on play histories or playlists across multiple users, distances between pieces of music are calculated from basic information such as genre, singer, and beat, so that similar music can be filtered out and recommended to users. However, such methodologies have limitations such as the filter bubble. For example, a user who listens only to rock music is unlikely to be recommended hip-hop or R&B music that carries a similar sentiment. In this study, we focused on the sentiment of the music itself and developed a methodology that defines a new index for music recommendation systems. Concretely, we propose the "SWEMS" index and, using it, extract a "Sentiment Pattern" for each piece of music used in this research. We expect the "SWEMS" index and "Sentiment Pattern" to serve a variety of purposes, not only in music recommendation systems but also as an algorithm for building predictive models. In this study, we had to develop a music recommendation system based on the emotional adjectives that people generally feel when listening to music. It was therefore necessary to collect as many emotional adjectives as possible. Emotional adjectives were first collected from related previous studies.
More emotional adjectives were collected via social metrics and qualitative interviews. In total, 134 individual adjectives were collected. Through several steps, these were narrowed down to a final set of 60 adjectives. Based on the final adjectives, a music survey was conducted in which each adjective served as an item for evaluating the sentiment of a song. The surveys were taken by expert panels of people who like to listen to music. All survey questions were based on emotional adjectives, and no other information was collected. The evaluated music was divided into popular and unpopular songs, and the variables most relevant to popularity were derived. The derived variables were reclassified through factor analysis, and weights were assigned to the adjectives belonging to each factor. We define the extracted factors as the "SWEMS" index, which describes the sentiment score of music as a numeric value. To implement the algorithm, we applied Case-Based Reasoning, because compared with other methodologies its problem-solving approach resembles that of humans. Using the "SWEMS" index of each piece of music, the algorithm recommends songs whose factor-based emotion values are similar, as measured by Euclidean distance. Using the "SWEMS" index, we can also draw a "Sentiment Pattern" for each song. We found that songs evoking similar emotions show similar "Sentiment Patterns". Through the "Sentiment Pattern", we could also suggest new groupings of music that differ from the conventional genre format. This research should help in quantifying qualitative data. The algorithms can also be used to quantify content itself, which would help users search for similar content more quickly.
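
The Euclidean-distance retrieval step can be sketched as a nearest-neighbor lookup over sentiment vectors. The three-component vectors, the song titles, and the function name `recommend` are hypothetical; the actual SWEMS factors and weights are not given in the abstract.

```python
import math


def recommend(query, catalog, top_n=1):
    """Return the titles whose sentiment vectors lie closest to the query.

    query:   tuple of factor scores for the reference song
    catalog: dict mapping title -> tuple of factor scores (same length)
    """
    ranked = sorted(catalog.items(), key=lambda item: math.dist(query, item[1]))
    return [title for title, _ in ranked[:top_n]]
```

Because only the sentiment vector enters the distance, two songs from different genres can rank as neighbors, which is how this approach sidesteps the genre-based filter bubble described in the abstract.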

Numerical Study on Thermochemical Conversion of Non-Condensable Pyrolysis Gas of PP and PE Using 0D Reaction Model (0D 반응 모델을 활용한 PP와 PE의 비응축성 열분해 기체의 열화학적 전환에 대한 수치해석 연구)

  • Eunji Lee;Won Yang;Uendo Lee;Youngjae Lee
    • Clean Technology / v.30 no.1 / pp.37-46 / 2024
  • Environmental problems caused by plastic waste have been growing continuously around the world, and plastic waste has increased even faster since COVID-19. In particular, PP and PE account for more than half of all plastic production, and the amount of waste from these two materials is at a serious level. As a result, researchers are searching for alternatives to plastic recycling, and plastic pyrolysis is one such alternative. In this paper, a numerical study was conducted on the pyrolysis behavior of non-condensable gas to predict the chemical reaction behavior of the pyrolysis gas. Based on gas products estimated from the preceding literature, the behavior of the non-condensable gas was analyzed according to temperature and residence time. The numerical analysis showed that as temperature and residence time increased, the production of H2 and heavy hydrocarbons increased through conversion of the non-condensable gas, while CH4 and C6H6 decreased by participating in the reaction. In addition, analysis of the production rate showed that the decomposition of C2H4 was the dominant reaction for H2 generation. It was also found that more H2 was produced by PE, with its higher C2H4 content. As future work, experiments are needed to confirm how to increase the conversion rate of H2 and carbon in plastics under the various operating conditions derived from this study's numerical analysis results.