• Title/Summary/Keyword: Rule-based Algorithm

Search Result 795, Processing Time 0.026 seconds

Context Prediction Using Right and Wrong Patterns to Improve Sequential Matching Performance for More Accurate Dynamic Context-Aware Recommendation (보다 정확한 동적 상황인식 추천을 위해 정확 및 오류 패턴을 활용하여 순차적 매칭 성능이 개선된 상황 예측 방법)

  • Kwon, Oh-Byung
    • Asia pacific journal of information systems
    • /
    • v.19 no.3
    • /
    • pp.51-67
    • /
    • 2009
  • Developing an agile recommender system for nomadic users has been regarded as a promising application in mobile and ubiquitous settings. To increase the quality of personalized recommendation in terms of accuracy and elapsed time, estimating future context of the user in a correct way is highly crucial. Traditionally, time series analysis and Makovian process have been adopted for such forecasting. However, these methods are not adequate in predicting context data, only because most of context data are represented as nominal scale. To resolve these limitations, the alignment-prediction algorithm has been suggested for context prediction, especially for future context from the low-level context. Recently, an ontological approach has been proposed for guided context prediction without context history. However, due to variety of context information, acquiring sufficient context prediction knowledge a priori is not easy in most of service domains. Hence, the purpose of this paper is to propose a novel context prediction methodology, which does not require a priori knowledge, and to increase accuracy and decrease elapsed time for service response. To do so, we have newly developed pattern-based context prediction approach. First of ail, a set of individual rules is derived from each context attribute using context history. Then a pattern consisted of results from reasoning individual rules, is developed for pattern learning. If at least one context property matches, say R, then regard the pattern as right. If the pattern is new, add right pattern, set the value of mismatched properties = 0, freq = 1 and w(R, 1). Otherwise, increase the frequency of the matched right pattern by 1 and then set w(R,freq). After finishing training, if the frequency is greater than a threshold value, then save the right pattern in knowledge base. On the other hand, if at least one context property matches, say W, then regard the pattern as wrong. If the pattern is new, modify the result into wrong answer, add right pattern, and set frequency to 1 and w(W, 1). Or, increase the matched wrong pattern's frequency by 1 and then set w(W, freq). After finishing training, if the frequency value is greater than a threshold level, then save the wrong pattern on the knowledge basis. Then, context prediction is performed with combinatorial rules as follows: first, identify current context. Second, find matched patterns from right patterns. If there is no pattern matched, then find a matching pattern from wrong patterns. If a matching pattern is not found, then choose one context property whose predictability is higher than that of any other properties. To show the feasibility of the methodology proposed in this paper, we collected actual context history from the travelers who had visited the largest amusement park in Korea. As a result, 400 context records were collected in 2009. Then we randomly selected 70% of the records as training data. The rest were selected as testing data. To examine the performance of the methodology, prediction accuracy and elapsed time were chosen as measures. We compared the performance with case-based reasoning and voting methods. Through a simulation test, we conclude that our methodology is clearly better than CBR and voting methods in terms of accuracy and elapsed time. This shows that the methodology is relatively valid and scalable. As a second round of the experiment, we compared a full model to a partial model. A full model indicates that right and wrong patterns are used for reasoning the future context. On the other hand, a partial model means that the reasoning is performed only with right patterns, which is generally adopted in the legacy alignment-prediction method. It turned out that a full model is better than a partial model in terms of the accuracy while partial model is better when considering elapsed time. As a last experiment, we took into our consideration potential privacy problems that might arise among the users. To mediate such concern, we excluded such context properties as date of tour and user profiles such as gender and age. The outcome shows that preserving privacy is endurable. Contributions of this paper are as follows: First, academically, we have improved sequential matching methods to predict accuracy and service time by considering individual rules of each context property and learning from wrong patterns. Second, the proposed method is found to be quite effective for privacy preserving applications, which are frequently required by B2C context-aware services; the privacy preserving system applying the proposed method successfully can also decrease elapsed time. Hence, the method is very practical in establishing privacy preserving context-aware services. Our future research issues taking into account some limitations in this paper can be summarized as follows. First, user acceptance or usability will be tested with actual users in order to prove the value of the prototype system. Second, we will apply the proposed method to more general application domains as this paper focused on tourism in amusement park.

Analysis of Genetics Problem-Solving Processes of High School Students with Different Learning Approaches (학습접근방식에 따른 고등학생들의 유전 문제 해결 과정 분석)

  • Lee, Shinyoung;Byun, Taejin
    • Journal of The Korean Association For Science Education
    • /
    • v.40 no.4
    • /
    • pp.385-398
    • /
    • 2020
  • This study aims to examine genetics problem-solving processes of high school students with different learning approaches. Two second graders in high school participated in a task that required solving the complicated pedigree problem. The participants had similar academic achievements in life science but one had a deep learning approach while the other had a surface learning approach. In order to analyze in depth the students' problem-solving processes, each student's problem-solving process was video-recorded, and each student conducted a think-aloud interview after solving the problem. Although students showed similar errors at the first trial in solving the problem, they showed different problem-solving process at the last trial. Student A who had a deep learning approach voluntarily solved the problem three times and demonstrated correct conceptual framing to the three constraints using rule-based reasoning in the last trial. Student A monitored the consistency between the data and her own pedigree, and reflected the problem-solving process in the check phase of the last trial in solving the problem. Student A's problem-solving process in the third trial resembled a successful problem-solving algorithm. However, student B who had a surface learning approach, involuntarily repeated solving the problem twice, and focused and used only part of the data due to her goal-oriented attitude to solve the problem in seeking for answers. Student B showed incorrect conceptual framing by memory-bank or arbitrary reasoning, and maintained her incorrect conceptual framing to the constraints in two problem-solving processes. These findings can help in understanding the problem-solving processes of students who have different learning approaches, allowing teachers to better support students with difficulties in accessing genetics problems.

OD matrix estimation using link use proportion sample data as additional information (표본링크이용비를 추가정보로 이용한 OD 행렬 추정)

  • 백승걸;김현명;신동호
    • Journal of Korean Society of Transportation
    • /
    • v.20 no.4
    • /
    • pp.83-93
    • /
    • 2002
  • To improve the performance of estimation, the research that uses additional information addition to traffic count and target OD with additional survey cost have been studied. The purpose of this paper is to improve the performance of OD estimation by reducing the feasible solutions with cost-efficiently additional information addition to traffic counts and target OD. For this purpose, we Propose the OD estimation method with sample link use proportion as additional information. That is, we obtain the relationship between OD trip and link flow from sample link use proportion that is high reliable information with roadside survey, not from the traffic assignment of target OD. Therefore, this paper proposes OD estimation algorithm in which the conservation of link flow rule under the path-based non-equilibrium traffic assignment concept. Numerical result with test network shows that it is possible to improve the performance of OD estimation where the precision of additional data is low, since sample link use Proportion represented the information showing the relationship between OD trip and link flow. And this method shows the robust performance of estimation where traffic count or OD trip be changed, since this method did not largely affected by the error of target OD and the one of traffic count. In addition to, we also propose that we must set the level of data precision by considering the level of other information precision, because "precision problem between information" is generated when we use additional information like sample link use proportion etc. And we Propose that the method using traffic count as basic information must obtain the link flow to certain level in order to high the applicability of additional information. Finally, we propose that additional information on link have a optimal counting location problem. Expecially by Precision of information side it is possible that optimal survey location problem of sample link use proportion have a much impact on the performance of OD estimation rather than optimal counting location problem of link flow.

The Behavior Analysis of Exhibition Visitors using Data Mining Technique at the KIDS & EDU EXPO for Children (유아교육 박람회에서 데이터마이닝 기법을 이용한 전시 관람 행동 패턴 분석)

  • Jung, Min-Kyu;Kim, Hyea-Kyeong;Choi, Il-Young;Lee, Kyoung-Jun;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.2
    • /
    • pp.77-96
    • /
    • 2011
  • An exhibition is defined as market events for specific duration to present exhibitors' main products to business or private visitors, and it plays a key role as effective marketing channels. As the importance of exhibition is getting more and more, domestic exhibition industry has achieved such a great quantitative growth. But, In contrast to the quantitative growth of domestic exhibition industry, the qualitative growth of Exhibition has not achieved competent growth. In order to improve the quality of exhibition, we need to understand the preference or behavior characteristics of visitors and to increase the level of visitors' attention and satisfaction through the understanding of visitors. So, in this paper, we used the observation survey method which is a kind of field research to understand visitors and collect the real data for the analysis of behavior pattern. And this research proposed the following methodology framework consisting of three steps. First step is to select a suitable exhibition to apply for our method. Second step is to implement the observation survey method. And we collect the real data for further analysis. In this paper, we conducted the observation survey method to obtain the real data of the KIDS & EDU EXPO for Children in SETEC. Our methodology was conducted on 160 visitors and 78 booths from November 4th to 6th in 2010. And, the last step is to analyze the record data through observation. In this step, we analyze the feature of exhibition using Demographic Characteristics collected by observation survey method at first. And then we analyze the individual booth features by the records of visited booth. Through the analysis of individual booth features, we can figure out what kind of events attract the attention of visitors and what kind of marketing activities affect the behavior pattern of visitors. But, since previous research considered only individual features influenced by exhibition, the research about the correlation among features is not performed much. So, in this research, additional analysis is carried out to supplement the existing research with data mining techniques. And we analyze the relation among booths using data mining techniques to know behavior patterns of visitors. Among data mining techniques, we make use of two data mining techniques, such as clustering analysis and ARM(Association Rule Mining) analysis. In clustering analysis, we use K-means algorithm to figure out the correlation among booths. Through data mining techniques, we figure out that there are two important features to affect visitors' behavior patterns in exhibition. One is the geographical features of booths. The other is the exhibit contents of booths. Those features are considered when the organizer of exhibition plans next exhibition. Therefore, the results of our analysis are expected to provide guideline to understanding visitors and some valuable insights for the exhibition from the earlier phases of exhibition planning. Also, this research would be a good way to increase the quality of visitor satisfaction. Visitors' movement paths, booth location, and distances between each booth are considered to plan next exhibition in advance. This research was conducted at the KIDS & EDU EXPO for Children in SETEC(Seoul Trade Exhibition & Convention), but it has some constraints to be applied directly to other exhibitions. Also, the results were derived from a limited number of data samples. In order to obtain more accurate and reliable results, it is necessary to conduct more experiments based on larger data samples and exhibitions on a variety of genres.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.