• Title/Summary/Keyword: similar problem retrieval

Search Result 49, Processing Time 0.027 seconds

Customized Configuration with Template and Options (맞춤구성을 위한 템플릿과 Option 기반의 추론)

  • 이현정;이재규
    • Journal of Intelligence and Information Systems
    • /
    • v.8 no.1
    • /
    • pp.119-139
    • /
    • 2002
  • In electronic catalogs, each item is represented as an independent unit while the parts of the item can be composed of a higher level of functionality. Thus, the search for this kind of product database is limited to the retrieval of most similar standard commodities. However, many industrial products need to configure optional parts to fulfill the required specifications. Since there are many paths in finding the required specifications, we need to develop a search system via the configuration process. In this system, we adopt a two-phased approach. The first phase finds the most similar template, and the second phase adjusts the template specifications toward the required set of specifications by the Constraint and Rule Satisfaction Problem approach. There is no guarantee that the most similar template can find the most desirable configuration. The search system needs backtracking capability, so the search can stop at a satisfied local optimal satisfaction. This framework is applied to the configuration of computers and peripherals. Template-based reasoning is basically the same as case-based reasoning. The required set of specifications is represented by a list of criteria, and matched with the product specifications to find the closest ones. To measure the distance, we develop a thesaurus of values, which can identify the meaning of numbers, symbols, and words. With this configuration, the performance of the search by configuration algorithm is evaluated in terms of feasibility and admissibility.

  • PDF

Efficient Subsequence Searching in Sequence Databases : A Segment-based Approach (시퀀스 데이터베이스를 위한 서브시퀀스 탐색 : 세그먼트 기반 접근 방안)

  • Park, Sang-Hyun;Kim, Sang-Wook;Loh, Woong-Kee
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.344-356
    • /
    • 2001
  • This paper deals with the subsequence searching problem under time-warping in sequence databases. Our work is motivated by the observation that subsequence searches slow down quadratically as the average length of data sequences increases. To resolve this problem, the Segment-Based Approach for Subsequence Searches (SBSS) is proposed. The SBASS divides data and query sequences into a series of segments, and retrieves all data subsequences that satisfy the two conditions: (1) the number of segments is the same as the number of segments in a query sequence, and (2) the distance of every segment pair is less than or equal to a tolerance. Our segmentation scheme allows segments to have different lengths; thus we employ the time warping distance as a similarity measure for each segment pair. For efficient retrieval of similar subsequences, we extract feature vectors from all data segments exploiting their monotonically changing properties, and build a spatial index using feature vectors. Using this index, queries are processed with the four steps: (1) R-tree filtering, (2) feature filtering, (3) successor filtering, and (4) post-processing. The effectiveness of our approach is verified through extensive experiments.

  • PDF

A Study of PLC Simulation for Automobile Panel AS/RS (자동차 패널 자동창고 시스템의 PLC 시뮬레이션 적용 연구)

  • Ko, Min-Suk;Koo, Lock-Jo;Kwak, Jong-Geun;Hong, Sang-Hyun;Wang, Gi-Nam;Park, Sang-Chul
    • Journal of the Korea Society for Simulation
    • /
    • v.18 no.3
    • /
    • pp.1-11
    • /
    • 2009
  • This paper illustrates a case study of PLC logic simulation in a car manufacturing system. It was developed to simulate and verify PLC control program for automobile panel AS/RS. Because car models become varied, the complexity of supply problem is increasing in the car manufacturing system. To cope with this problem, companies use the AS (automated storage) and RS (retrieval system) but it has logical complexity. Industrial automated process uses PLC code to control the AS/RS, however control information and control codes (PLC code) are difficult to understand. This paper suggests a PLC simulation environment, using 3D models and PLC code with realistic data. Data used in this simulation is based on realistic 3D model and I/O model, using actual size and PLC signals, respectively. The environment is similar to a real factory; users can verify and test the PLC code using this simulation before the implementation of AS/RS. Proposed simulation environment can be used for test run of AS/RS to reduce implementation time and cost.

An Efficient Frequent Melody Indexing Method to Improve Performance of Query-By-Humming System (허밍 질의 처리 시스템의 성능 향상을 위한 효율적인 빈번 멜로디 인덱싱 방법)

  • You, Jin-Hee;Park, Sang-Hyun
    • Journal of KIISE:Databases
    • /
    • v.34 no.4
    • /
    • pp.283-303
    • /
    • 2007
  • Recently, the study of efficient way to store and retrieve enormous music data is becoming the one of important issues in the multimedia database. Most general method of MIR (Music Information Retrieval) includes a text-based approach using text information to search a desired music. However, if users did not remember the keyword about the music, it can not give them correct answers. Moreover, since these types of systems are implemented only for exact matching between the query and music data, it can not mine any information on similar music data. Thus, these systems are inappropriate to achieve similarity matching of music data. In order to solve the problem, we propose an Efficient Query-By-Humming System (EQBHS) with a content-based indexing method that efficiently retrieve and store music when a user inquires with his incorrect humming. For the purpose of accelerating query processing in EQBHS, we design indices for significant melodies, which are 1) frequent melodies occurring many times in a single music, on the assumption that users are to hum what they can easily remember and 2) melodies partitioned by rests. In addition, we propose an error tolerated mapping method from a note to a character to make searching efficient, and the frequent melody extraction algorithm. We verified the assumption for frequent melodies by making up questions and compared the performance of the proposed EQBHS with N-gram by executing various experiments with a number of music data.

Generative AI service implementation using LLM application architecture: based on RAG model and LangChain framework (LLM 애플리케이션 아키텍처를 활용한 생성형 AI 서비스 구현: RAG모델과 LangChain 프레임워크 기반)

  • Cheonsu Jeong
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.4
    • /
    • pp.129-164
    • /
    • 2023
  • In a situation where the use and introduction of Large Language Models (LLMs) is expanding due to recent developments in generative AI technology, it is difficult to find actual application cases or implementation methods for the use of internal company data in existing studies. Accordingly, this study presents a method of implementing generative AI services using the LLM application architecture using the most widely used LangChain framework. To this end, we reviewed various ways to overcome the problem of lack of information, focusing on the use of LLM, and presented specific solutions. To this end, we analyze methods of fine-tuning or direct use of document information and look in detail at the main steps of information storage and retrieval methods using the retrieval augmented generation (RAG) model to solve these problems. In particular, similar context recommendation and Question-Answering (QA) systems were utilized as a method to store and search information in a vector store using the RAG model. In addition, the specific operation method, major implementation steps and cases, including implementation source and user interface were presented to enhance understanding of generative AI technology. This has meaning and value in enabling LLM to be actively utilized in implementing services within companies.

A Study on Improvement of Collaborative Filtering Based on Implicit User Feedback Using RFM Multidimensional Analysis (RFM 다차원 분석 기법을 활용한 암시적 사용자 피드백 기반 협업 필터링 개선 연구)

  • Lee, Jae-Seong;Kim, Jaeyoung;Kang, Byeongwook
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.139-161
    • /
    • 2019
  • The utilization of the e-commerce market has become a common life style in today. It has become important part to know where and how to make reasonable purchases of good quality products for customers. This change in purchase psychology tends to make it difficult for customers to make purchasing decisions in vast amounts of information. In this case, the recommendation system has the effect of reducing the cost of information retrieval and improving the satisfaction by analyzing the purchasing behavior of the customer. Amazon and Netflix are considered to be the well-known examples of sales marketing using the recommendation system. In the case of Amazon, 60% of the recommendation is made by purchasing goods, and 35% of the sales increase was achieved. Netflix, on the other hand, found that 75% of movie recommendations were made using services. This personalization technique is considered to be one of the key strategies for one-to-one marketing that can be useful in online markets where salespeople do not exist. Recommendation techniques that are mainly used in recommendation systems today include collaborative filtering and content-based filtering. Furthermore, hybrid techniques and association rules that use these techniques in combination are also being used in various fields. Of these, collaborative filtering recommendation techniques are the most popular today. Collaborative filtering is a method of recommending products preferred by neighbors who have similar preferences or purchasing behavior, based on the assumption that users who have exhibited similar tendencies in purchasing or evaluating products in the past will have a similar tendency to other products. However, most of the existed systems are recommended only within the same category of products such as books and movies. This is because the recommendation system estimates the purchase satisfaction about new item which have never been bought yet using customer's purchase rating points of a similar commodity based on the transaction data. In addition, there is a problem about the reliability of purchase ratings used in the recommendation system. Reliability of customer purchase ratings is causing serious problems. In particular, 'Compensatory Review' refers to the intentional manipulation of a customer purchase rating by a company intervention. In fact, Amazon has been hard-pressed for these "compassionate reviews" since 2016 and has worked hard to reduce false information and increase credibility. The survey showed that the average rating for products with 'Compensated Review' was higher than those without 'Compensation Review'. And it turns out that 'Compensatory Review' is about 12 times less likely to give the lowest rating, and about 4 times less likely to leave a critical opinion. As such, customer purchase ratings are full of various noises. This problem is directly related to the performance of recommendation systems aimed at maximizing profits by attracting highly satisfied customers in most e-commerce transactions. In this study, we propose the possibility of using new indicators that can objectively substitute existing customer 's purchase ratings by using RFM multi-dimensional analysis technique to solve a series of problems. RFM multi-dimensional analysis technique is the most widely used analytical method in customer relationship management marketing(CRM), and is a data analysis method for selecting customers who are likely to purchase goods. As a result of verifying the actual purchase history data using the relevant index, the accuracy was as high as about 55%. This is a result of recommending a total of 4,386 different types of products that have never been bought before, thus the verification result means relatively high accuracy and utilization value. And this study suggests the possibility of general recommendation system that can be applied to various offline product data. If additional data is acquired in the future, the accuracy of the proposed recommendation system can be improved.

PIRS : Personalized Information Retrieval System using Adaptive User Profiling and Real-time Filtering for Search Results (적응형 사용자 프로파일기법과 검색 결과에 대한 실시간 필터링을 이용한 개인화 정보검색 시스템)

  • Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.21-41
    • /
    • 2010
  • This paper proposes a system that can serve users with appropriate search results through real time filtering, and implemented adaptive user profiling based personalized information retrieval system(PIRS) using users' implicit feedbacks in order to deal with the problem of existing search systems such as Google or MSN that does not satisfy various user' personal search needs. One of the reasons that existing search systems hard to satisfy various user' personal needs is that it is not easy to recognize users' search intentions because of the uncertainty of search intentions. The uncertainty of search intentions means that users may want to different search results using the same query. For example, when a user inputs "java" query, the user may want to be retrieved "java" results as a computer programming language, a coffee of java, or a island of Indonesia. In other words, this uncertainty is due to ambiguity of search queries. Moreover, if the number of the used words for a query is fewer, this uncertainty will be more increased. Real-time filtering for search results returns only those results that belong to user-selected domain for a given query. Although it looks similar to a general directory search, it is different in that the search is executed for all web documents rather than sites, and each document in the search results is classified into the given domain in real time. By applying information filtering using real time directory classifying technology for search results to personalization, the number of delivering results to users is effectively decreased, and the satisfaction for the results is improved. In this paper, a user preference profile has a hierarchical structure, and consists of domains, used queries, and selected documents. Because the hierarchy structure of user preference profile can apply the context when users perfomed search, the structure is able to deal with the uncertainty of user intentions, when search is carried out, the intention may differ according to the context such as time or place for the same query. Furthermore, this structure is able to more effectively track web documents search behaviors of a user for each domain, and timely recognize the changes of user intentions. An IP address of each device was used to identify each user, and the user preference profile is continuously updated based on the observed user behaviors for search results. Also, we measured user satisfaction for search results by observing the user behaviors for the selected search result. Our proposed system automatically recognizes user preferences by using implicit feedbacks from users such as staying time on the selected search result and the exit condition from the page, and dynamically updates their preferences. Whenever search is performed by a user, our system finds the user preference profile for the given IP address, and if the file is not exist then a new user preference profile is created in the server, otherwise the file is updated with the transmitted information. If the file is not exist in the server, the system provides Google' results to users, and the reflection value is increased/decreased whenever user search. We carried out some experiments to evaluate the performance of adaptive user preference profile technique and real time filtering, and the results are satisfactory. According to our experimental results, participants are satisfied with average 4.7 documents in the top 10 search list by using adaptive user preference profile technique with real time filtering, and this result shows that our method outperforms Google's by 23.2%.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technologies that are essential to the business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalance distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. Conditional vector y identify distributions for minority classes and generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.131-150
    • /
    • 2015
  • Omission of noun phrases for obligatory cases is a common phenomenon in sentences of Korean and Japanese, which is not observed in English. When an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is more easily omitted in Encyclopedia texts. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are major source for information extraction by intelligent application systems such as information retrieval and question answering systems. However, omission of noun phrases makes the quality of information extraction poor. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem that our system deals with is almost similar to zero anaphora resolution which is one of the important problems in natural language processing. A noun phrase existing in the text that can be used for restoration is called an antecedent. An antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in case of zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage is in charge of detecting the zero anaphor. In the second stage, antecedent search is carried out by considering the candidates. If antecedent search fails, an attempt made, in the third stage, to use the title as the antecedent. The main characteristic of our system is to make use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of zero anaphor comprise the search space. The main technique used in the methods proposed in previous research works is to perform binary classification for all the noun phrases in the search space. The noun phrase classified to be an antecedent with highest confidence is selected as the antecedent. However, we propose in this paper that antecedent search is viewed as the problem of assigning the antecedent indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed in antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we suggest to use a structural SVM which receives a sequence of noun phrases as input and returns the sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm which exploits a subgradient descent methodology used for optimization problems. To train and test our system we selected a set of Wikipedia texts and constructed the annotated corpus in which gold-standard answers are provided such as zero anaphors and their possible antecedents. Training examples are prepared using the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer and subject or object cases omitted are identified. Thus performance of our system is dependent on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor. This is based on binary classification using the regular SVM. The experiment showed that our system's performance is F1 = 68.58%. This means that state-of-the-art system can be developed with our technique. It is expected that future work that enables the system to utilize semantic information can lead to a significant performance improvement.