• Title/Abstract/Keywords: Similar Data

Similar Patent Search Service System using Latent Dirichlet Allocation (잠재 의미 분석을 적용한 유사 특허 검색 서비스 시스템)

  • Lim, HyunKeun;Kim, Jaeyoon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 22, No. 8 / pp. 1049-1054 / 2018
  • Keyword searching was used in the past to find similar patents, and automated classification by machine learning has been used more recently. Keyword searching analyzes data that has been structured through data refinement. Its accuracy is high for short text, but for long text made up of many words, such as a document, it cannot analyze the meaning contained in the sentences. At the semantic analysis level, automatic classification is used to classify sentences composed of several words through unstructured data analysis. There have been attempts to find similar documents by combining the two methods. However, the combined algorithm has a problem because the two analysis methods differ in how they handle unstructured and structured data at the same time. In this paper, we study a method that extracts the keywords implied in a document and uses LDA (Latent Dirichlet Allocation) to classify documents efficiently without human intervention and to find similar patents.

A Study of Economical Sample Size for Reliability Test of One-Shot Device with Bayesian Techniques (베이지안 기법을 적용한 일회성 장비의 경제적 시험 수량 연구)

  • Lee, Youn Ho;Lee, Kye Shin;Lee, Hak Jae;Kim, Sang Moon;Moon, Ki Sung
    • Journal of Applied Reliability / Vol. 14, No. 3 / pp. 162-168 / 2014
  • This paper discusses applying Bayesian techniques, together with test data on similar products, to perform an economical reliability test of a new one-shot device. By using test data on similar products, the reliability test requires a smaller sample size than is currently spent to demonstrate a target reliability at a specified confidence level. A smaller sample size in turn reduces the cost, time, and other resources spent on reliability testing. In this paper, we use similarity to calculate the weight given to similar products, and we analyze the similarity between the new product and the similar products by comparing their essential functions.
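The idea of lowering the sample size by borrowing weighted data from similar products can be illustrated with a minimal sketch. It is not the paper's method: it assumes a uniform Beta(1, 1) prior, zero failures in both the similar-product data and the new test, and a similarity weight between 0 and 1 that discounts the similar-product successes. The function name and all parameters are hypothetical.

```python
import math


def zero_failure_sample_size(target_reliability, confidence,
                             prior_successes=0, similarity=0.0):
    """Smallest number of new zero-failure trials that demonstrates the target.

    Assumptions (illustrative, not from the paper): a uniform Beta(1, 1) prior,
    zero failures in the similar-product data and in the new test, and
    similar-product successes discounted by a similarity weight in [0, 1].
    """
    if not (0.0 < target_reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("target_reliability and confidence must be in (0, 1)")
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    # With s prior successes weighted by w and n new successes, the posterior
    # is Beta(1 + w*s + n, 1), so P(R >= R0) = 1 - R0 ** (1 + w*s + n).
    # Requiring this to reach the confidence level and solving for n gives:
    required_exponent = math.log(1.0 - confidence) / math.log(target_reliability)
    n = required_exponent - 1.0 - similarity * prior_successes
    return max(0, math.ceil(n))


# No similar-product data versus 20 successes weighted at similarity 0.5.
print(zero_failure_sample_size(0.9, 0.9))
print(zero_failure_sample_size(0.9, 0.9, prior_successes=20, similarity=0.5))
```

Under these assumptions, a higher similarity weight lets more of the similar-product evidence count toward the demonstration, so fewer new units need to be tested.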

Machine learning-based categorization of source terms for risk assessment of nuclear power plants

  • Jin, Kyungho;Cho, Jaehyun;Kim, Sung-yeop
    • Nuclear Engineering and Technology / Vol. 54, No. 9 / pp. 3336-3346 / 2022
  • In general, a number of severe accident scenarios derived from Level 2 probabilistic safety assessment (PSA) are typically grouped into several categories to efficiently evaluate their potential impacts on the public with the assumption that scenarios within the same group have similar source term characteristics. To date, however, grouping by similar source terms has been completely reliant on qualitative methods such as logical trees or expert judgements. Recently, an exhaustive simulation approach has been developed to provide quantitative information on the source terms of a large number of severe accident scenarios. With this motivation, this paper proposes a machine learning-based categorization method based on exhaustive simulation for grouping scenarios with similar accident consequences. The proposed method employs clustering with an autoencoder for grouping unlabeled scenarios after dimensionality reductions and feature extractions from the source term data. To validate the suggested method, source term data for 658 severe accident scenarios were used. Results confirmed that the proposed method successfully characterized the severe accident scenarios with similar behavior more precisely than the conventional grouping method.

Voice Similarities between Brothers

  • Ko, Do-Heung;Kang, Sun-Mee
    • Speech Sciences / Vol. 9, No. 2 / pp. 1-11 / 2002
  • This paper aims to provide a guideline for modelling speaker identification and speaker verification by comparing voice similarities between brothers. Five pairs of brothers who are believed to have similar voices participated in this experiment. Before the experiment, perceptual tests were conducted to check whether the brothers' voices were similar. The words were measured both in isolation and in context, and the subjects were asked to read them five times with an interval of about three seconds between readings. Recordings were made at natural speed in a quiet room. The data were analyzed for pitch and formant frequencies using CSL (Computerized Speech Lab), PCQuirer, and MDVP (Multi-Dimensional Voice Program). It was found that the data for initial vowels are much more similar and homogeneous than those for vowels in other positions. The acoustic data showed that voice similarities are strikingly high in both pitch and formant frequencies. It was also found that the correlation coefficient between the parameters above was not significant.

Enhancing Similar Business Group Recommendation through Derivative Criteria and Web Crawling

  • Min Jeong LEE;In Seop NA
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 10 / pp. 2809-2821 / 2023
  • Effective recommendation of similar business groups is a critical factor in obtaining market information for companies. In this study, we propose a novel method for enhancing similar business group recommendation by incorporating derivative criteria and web crawling. We use employment announcements, employment incentives, and corporate vocational training information to derive additional criteria for similar business group selection. Web crawling is employed to collect data related to the derived criteria from 'credit jobs' and 'worknet' sites. We compare the efficiency of different datasets and machine learning methods, including XGBoost, LGBM, Adaboost, Linear Regression, K-NN, and SVM. The proposed model extracts derivatives that reflect the financial and scale characteristics of the company, which are then incorporated into a new set of recommendation criteria. Similar business groups are selected using a Euclidean distance-based model. Our experimental results show that the proposed method improves the accuracy of similar business group recommendation. Overall, this study demonstrates the potential of incorporating derivative criteria and web crawling to enhance similar business group recommendation and obtain market information more efficiently.
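A Euclidean distance-based selection of similar companies could look like the following minimal sketch. The feature vectors, company names, and the z-score scaling step are assumptions made for illustration; the abstract does not specify how the authors' model scales or combines its criteria.

```python
import math


def standardize(rows):
    """Z-score each feature column so no single scale dominates the distance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def most_similar(names, rows, target, k):
    """Return the k companies closest to `target` in scaled feature space."""
    scaled = dict(zip(names, standardize(rows)))
    ranked = sorted((math.dist(scaled[target], vec), name)
                    for name, vec in scaled.items() if name != target)
    return [name for _, name in ranked[:k]]


# Hypothetical features: (number of employees, revenue in arbitrary units).
names = ["A", "B", "C", "D"]
rows = [[100, 10], [101, 10], [900, 80], [130, 14]]
print(most_similar(names, rows, "A", 2))
```

Scaling before measuring distance is a design choice here: without it, the feature with the largest numeric range would decide the ranking almost by itself.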

An Effective Data Model for Forecasting and Analyzing Securities Data

  • Lee, Seung Ho;Shin, Seung Jung
    • International journal of advanced smart convergence / Vol. 5, No. 4 / pp. 32-39 / 2016
  • Machine learning is a field of artificial intelligence (AI), and technology that collects, forecasts, and analyzes securities data is built on machine learning. The difference between using machine learning and not using it is that machine learning, although it seems similar to big data, studies and collects data by itself, which big data cannot do. Machine learning can be used, for example, to recognize a certain pattern of an object and find a criminal or a vehicle used in a crime. To achieve similar intelligent tasks, data must be collected more effectively than before. In this paper, we propose a method of effectively collecting data.

VISUALIZATION OF 3D DATA PRESERVING CONVEXITY

  • Hussain Malik Zawwar;Hussain Maria
    • Journal of applied mathematics & informatics / Vol. 23, No. 1_2 / pp. 397-410 / 2007
  • Visualization of 2D and 3D data arising from a scientific phenomenon, physical model, or mathematical formula as a curve or surface is an important topic in computer graphics. The problem becomes critical when the data possess an inherent shape feature. For example, the data may be positive in one instance and monotone in another. This paper addresses such problems when the data have a convex shape and the visualization is required to preserve that inherent feature. A rational cubic function [5] is used to review the visualization of 2D data and is then generalized to the visualization of 3D data. Moreover, simple sufficient constraints are imposed on the free parameters in the description of rational bicubic functions so that 3D convex data are visualized as convex surfaces.

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching (집합 유사 시퀀스 매칭의 성능 향상을 위한 인덱스 기반 검색 방법)

  • Lee, Juwon;Lim, Hyo-Sang
    • KIPS Transactions on Software and Data Engineering / Vol. 6, No. 11 / pp. 507-520 / 2017
  • The set-based similar sequence matching method measures similarity not for an individual data item but for a set that groups multiple data items. In this method, the similarity of two sets is represented as the size of their intersection. However, the method has a twofold critical performance issue: 1) calculating the intersection size is time-consuming, and 2) the number of set pairs for which the intersection size must be calculated is quite large. In this paper, we propose an index-based search method that improves the performance of set-based similar sequence matching in order to solve these issues. Our method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Through experiments, we show that the proposed method reduces the execution time by a factor of 30 to 50 compared with the existing methods. We also show that the proposed method is scalable, since the performance gap becomes larger as the number of data sequences increases.
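To illustrate how an index can speed up intersection size calculation, here is a minimal sketch using a plain inverted index. This is a generic technique chosen for illustration, not the index structure proposed in the paper, and all function names are hypothetical. Instead of intersecting the query with every stored set, the index touches only the sets that share at least one item with the query.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each item to the ids of the sets that contain it."""
    index = defaultdict(list)
    for set_id, items in enumerate(sets):
        for item in set(items):  # de-duplicate so each set is counted once per item
            index[item].append(set_id)
    return index


def intersection_sizes(query, index):
    """For every indexed set sharing an item with the query, count the shared items."""
    counts = defaultdict(int)
    for item in set(query):
        for set_id in index.get(item, ()):
            counts[set_id] += 1
    return dict(counts)


def similar_sets(query, index, min_overlap):
    """Ids of sets whose intersection with the query has at least min_overlap items."""
    sizes = intersection_sizes(query, index)
    return sorted(set_id for set_id, size in sizes.items() if size >= min_overlap)


stored = [[1, 2, 3], [2, 3, 4], [7, 8]]
index = build_inverted_index(stored)
print(intersection_sizes([2, 3, 9], index))
print(similar_sets([3, 4, 7], index, min_overlap=2))
```

Sets that share no item with the query never appear in the counts, which is where the saving over pairwise intersection comes from.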

Comparison of the effects of irradiation on iso-molded, fine grain nuclear graphites: ETU-10, IG-110 and NBG-25

  • Chi, Se-Hwan
    • Nuclear Engineering and Technology / Vol. 54, No. 7 / pp. 2359-2366 / 2022
  • Selecting graphite grades with superior irradiation characteristics is an important task for designers of graphite-moderated reactors. To provide reference information and data for graphite selection, the effects of irradiation on three fine-grained, iso-molded nuclear grade graphites, ETU-10, IG-110, and NBG-25, were compared based on irradiation-induced changes in volume, thermal conductivity, dynamic Young's modulus, and coefficient of thermal expansion. The data employed in this study were obtained from reported irradiation test results in the high flux isotope reactor (HFIR)(ORNL) (ETU-10, IG-110) and the high flux reactor (HFR)(NRL) (IG-110, NBG-25). Comparisons were made based on the irradiation dose and irradiation temperature. Overall, the three grades showed similar irradiation-induced property change behaviors, which followed the historic data. More or less grade-sensitive behaviors were observed for the changes in volume and thermal conductivity, whereas grade-insensitive behaviors were observed for the changes in dynamic Young's modulus and coefficient of thermal expansion. ETU-10, which has the smallest grain size, appeared to show a relatively smaller VC than IG-110 and NBG-25. A drastic decrease in the difference in thermal conductivity was observed for ETU-10 and IG-110 after irradiation. The similar irradiation-induced property change behaviors observed in this study, especially in the DYM and CTE, may be attributed to the assumed similar microstructures that evolved from the similar-sized coke particles and the same forming method.

A Study on the Real-Time Preference Prediction for Personalized Recommendation on the Mobile Device (모바일 기기에서 개인화 추천을 위한 실시간 선호도 예측 방법에 대한 연구)

  • Lee, Hak Min;Um, Jong Seok
    • Journal of Korea Multimedia Society / Vol. 20, No. 2 / pp. 336-343 / 2017
  • We propose a real-time personalized recommendation algorithm for mobile devices. We use unified collaborative filtering with reduced data. Fuzzy C-means clustering is used to obtain the reduced data, and a Kohonen SOM is applied to obtain the initial values of the cluster centers. The proposed algorithm overcomes data sparsity because it extends the data to similar users and similar items. It also enables real-time service on a mobile device because data clustering reduces computing time. Applying the suggested algorithm to the MovieLens data, we show that it has reasonable performance in comparison with collaborative filtering. We developed an Android-based smartphone application that recommends restaurants along with coupons and restaurant information.