• Title/Summary/Keyword: similarity weight


Measuring Similarity of Android Applications Using Method Reference Frequency and Manifest Information (메소드 참조 빈도와 매니페스트 정보를 이용한 안드로이드 애플리케이션들의 유사도 측정)

  • Kim, Gyoosik;Hamedani, Masoud Reyhani;Cho, Seong-je;Kim, Seong Baeg
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.15-25 / 2017
  • As the value and importance of software grow, software theft and piracy become ever larger problems, so an accurate method for detecting them is strongly needed. In particular, while software theft is relatively easy in the case of Android applications (apps), Android markets have not properly screened illegal apps. In this paper, we propose a method to effectively measure the similarity between Android apps for detecting software theft at the executable-file level. The proposed method statically analyzes executable Android apps and extracts method reference frequencies and manifest information as the main features for similarity measurement. Each app is represented as an n-dimensional vector of these features, and cosine similarity is used as the similarity measure. We demonstrate the effectiveness of the proposed method by evaluating its accuracy against typical source code-based similarity measurement methods. In experiments on Android apps for which both the source files and the executable files are available, the similarity measured at the executable-file level was almost equivalent to the well-known similarity measured at the source-file level.
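
A minimal sketch of the core computation the abstract above describes: apps represented as n-dimensional feature vectors (method reference frequencies plus manifest features) compared with cosine similarity. The feature values below are invented placeholders, not data from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical frequency vectors for two apps over the same feature set.
app_a = [12, 0, 3, 7, 1]
app_b = [10, 1, 2, 8, 0]
print(cosine_similarity(app_a, app_b))  # near 1.0 for highly similar apps
```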

Recommendation System using Associative Web Document Classification by Word Frequency and α-Cut (단어 빈도와 α-cut에 의한 연관 웹문서 분류를 이용한 추천 시스템)

  • Jung, Kyung-Yong;Ha, Won-Shik
    • The Journal of the Korea Contents Association / v.8 no.1 / pp.282-289 / 2008
  • Although collaborative filtering has seen some technological improvements, it still does not fully reflect the actual relations among items. In this paper, we propose a recommendation system that uses associative web document classification by word frequency and α-cut to address these shortcomings. The proposed method extracts words from web documents through morpheme analysis and accumulates term-frequency weights. It generates association rules with the Apriori algorithm, applying the term-frequency weight to their confidence, and calculates the similarity among words using hypergraph partitioning. Finally, it classifies related web documents using the α-cut and computes similarity with adjusted cosine similarity. The results show that the proposed method significantly outperforms existing methods.
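
A hedged sketch of adjusted cosine similarity as it is commonly defined in collaborative filtering: ratings are centered on each user's mean before the cosine is taken. The rating matrix below is invented for illustration and is not the paper's data.

```python
import numpy as np

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between item columns i and j.

    ratings: 2-D array, rows = users, cols = items; np.nan = no rating.
    """
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    centered = ratings - user_means                 # mean-center per user
    mask = ~np.isnan(ratings[:, i]) & ~np.isnan(ratings[:, j])
    u, v = centered[mask, i], centered[mask, j]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [1.0, 5.0, 4.0]])
print(adjusted_cosine(R, 0, 1))
```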

A new method for automatic areal feature matching based on shape similarity using CRITIC method (CRITIC 방법을 이용한 형상유사도 기반의 면 객체 자동매칭 방법)

  • Kim, Ji-Young;Huh, Yong;Kim, Doe-Sung;Yu, Ki-Yun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.29 no.2 / pp.113-121 / 2011
  • In this paper, we propose a method to automatically match areal features based on shape similarity using spatial information. We first extract candidate matching pairs that intersect across two different spatial datasets, and then measure a shape similarity computed as a weighted sum of matching criteria, with the weights derived automatically by the CRITIC method. Matching pairs are selected when their similarity exceeds a threshold determined by outlier detection on an adjusted boxplot of training data. Applying this method to two distinct spatial datasets, a digital topographic map and a street-name address base map, we confirmed visually that the matched buildings have similar shapes and large overlapping areas, and the F-measure reached a high 0.932 in the statistical evaluation.
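
A sketch of the CRITIC weighting step the abstract relies on: each matching criterion receives a weight proportional to its standard deviation (contrast intensity) times its total conflict, 1 minus correlation, with the other criteria. The criterion matrix below is invented example data, not the paper's.

```python
import numpy as np

def critic_weights(X):
    """X: rows = candidate pairs, cols = matching criteria (normalized 0-1)."""
    std = X.std(axis=0, ddof=1)             # contrast intensity per criterion
    corr = np.corrcoef(X, rowvar=False)     # correlations between criteria
    conflict = (1.0 - corr).sum(axis=0)     # conflict with the other criteria
    info = std * conflict                   # information content C_j
    return info / info.sum()                # normalized weights

X = np.array([[0.9, 0.8, 0.7],
              [0.2, 0.4, 0.9],
              [0.6, 0.5, 0.3],
              [0.8, 0.9, 0.6]])
w = critic_weights(X)
shape_similarity = X @ w                    # weighted-sum shape similarity
print(w, shape_similarity)
```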

Implementation of Image Retrieval System using Complex Image Features (복합적인 영상 특성을 이용한 영상 검색 시스템 구현)

  • Song, Seok-Jin;Nam, Ki-Gon
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.8 / pp.1358-1364 / 2002
  • Multimedia data are currently increasing rapidly in the broadcasting and internet fields. For retrieval of still images in multimedia databases, this paper implements a content-based image retrieval system in which the user selects a query region containing a desired object and retrieves similar objects from the image database. To extract color features, the query image is converted to HSV with the proposed method; histograms are built and similarity to database images is obtained through histogram intersection. The query image is also converted to a gray image and wavelet-transformed, after which spatial gray distribution and texture features are extracted using a banded autocorrelogram and the GLCM to obtain similarity values. The final similarity is determined by adding the two similarity values, with a weight applied to each. Drawing on gray-image features as well as color features compensates for the weaknesses of either alone, and the experimental results verify improvements in recall and precision.
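
A minimal sketch of histogram intersection, the color-similarity measure named in the abstract: the sum of bin-wise minima of two normalized histograms, which is 1.0 for identical histograms. The 8-bin hue histograms below are invented illustrations.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Intersection of two normalized histograms (1.0 = identical)."""
    return float(np.minimum(h1, h2).sum())

# Invented hue histograms for a query region and one database image.
query = np.array([0.30, 0.20, 0.10, 0.05, 0.05, 0.10, 0.10, 0.10])
db_img = np.array([0.25, 0.25, 0.05, 0.10, 0.05, 0.10, 0.10, 0.10])
s_color = histogram_intersection(query, db_img)  # 0.9

# The paper combines color and texture similarity with a weighted sum;
# the weights and texture score here are placeholders.
s_texture, w_color, w_texture = 0.8, 0.6, 0.4
print(w_color * s_color + w_texture * s_texture)
```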

Development of Personalized Learning Course Recommendation Model for ITS (ITS를 위한 개인화 학습코스 추천 모델 개발)

  • Han, Ji-Won;Jo, Jae-Choon;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.21-28 / 2018
  • To help users who have difficulty finding a learning course matching their proficiency level, we developed a personalized learning course recommendation model for an Intelligent Tutoring System (ITS). The model analyzes the learner profile and extracts keywords by calculating the weight of each word. The similarity between the extracted word vectors is measured with cosine similarity, and the three courses with the highest similarity are recommended to the learner. To analyze the effects of the recommendation model, we applied it at a Women's Ability Development Center, and the mean, standard deviation, skewness, and kurtosis of the satisfaction-survey items were calculated. The results showed high satisfaction in accuracy, novelty, self-reference, and usefulness, demonstrating the effectiveness of the model. This study is meaningful in that it proposes a learner-centered, machine learning-based recommendation system, which has not been sufficiently researched either domestically or abroad.
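
A hedged sketch of the recommendation step described above: a learner-profile keyword vector is compared against course vectors with cosine similarity and the three most similar courses are returned. Course names and vectors are hypothetical, not the paper's data.

```python
import numpy as np

def top3_courses(profile_vec, course_vecs, course_names):
    """Return the three courses whose vectors are most cosine-similar."""
    p = profile_vec / np.linalg.norm(profile_vec)
    C = course_vecs / np.linalg.norm(course_vecs, axis=1, keepdims=True)
    sims = C @ p                            # cosine similarity per course
    best = np.argsort(sims)[::-1][:3]       # indices of the top three
    return [(course_names[i], float(sims[i])) for i in best]

courses = ["Intro Python", "Data Analysis", "Web Design", "Statistics"]
vectors = np.array([[3, 0, 1], [2, 2, 0], [0, 1, 3], [1, 3, 0]], dtype=float)
profile = np.array([2.0, 1.0, 0.0])         # learner's keyword weights
print(top3_courses(profile, vectors, courses))
```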

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have created enormous amounts of data, and the portion represented as unstructured text has grown geometrically. Because it is difficult to inspect all of this text, it is important to access it rapidly and grasp its key points, and many text summarization studies have been proposed for handling and using such tremendous volumes. In particular, many recent methods apply machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, so-called "automatic summarization". However, almost all summarization methods proposed to date build summaries around the most frequent content in the original documents. Such summaries cannot adequately contain low-weight subjects that are mentioned less often; if a summary covers only the major subject, bias occurs, information is lost, and it becomes hard to ascertain every subject the documents contain. One could summarize so that topics are balanced and every subject appears, but an unbalanced distribution among the subjects would still remain. To retain subject balance in a summary, it is necessary to consider the proportion of every subject the documents originally have and to allocate the subjects' portions equally, so that even sentences of minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries we use two summary-evaluation concepts, "completeness" and "succinctness": completeness means the summary should fully include the contents of the original documents, and succinctness means the summary should contain minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights indicating how strongly each term relates to each topic; from these weights, highly related terms can be identified for every topic, and the subjects of the documents emerge from topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method they are called "seed terms". Because these terms are too few to explain each subject adequately, the seed terms are expanded with sufficiently similar terms to build a well-constructed subject dictionary: Word2Vec is used for word expansion, word vectors are created by Word2Vec modeling, and the cosine similarity between all terms is derived from those vectors. The higher the cosine similarity between two terms, the stronger their relationship is taken to be, so terms with high similarity to each subject's seed terms are selected, and after filtering the expanded terms the subject dictionary is finally constructed. The second phase allocates a subject to every sentence in the original documents. To grasp the content of each sentence, a frequency analysis is first conducted with the terms composing the subject dictionaries, and a TF-IDF weight per subject is then calculated, indicating how much each sentence says about each subject. Because TF-IDF weights can grow without bound, the per-subject weights of each sentence are normalized to values between 0 and 1. Each sentence is then assigned the subject with its maximum TF-IDF weight, which finally yields a sentence group for each subject. The last phase is summary generation: Sen2Vec is used to compute the similarity between subject sentences and form a similarity matrix, and by repeatedly selecting sentences it is possible to generate a summary that fully covers the content of the original documents while minimizing duplication within the summary itself. For evaluation of the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews to generate summaries. A comparison between the proposed method's summary and a frequency-based summary verified that the proposed summary better retains the balance of all the subjects the documents originally have.
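
A sketch of the subject-allocation phase under simplified assumptions: each sentence is scored per subject dictionary (a plain term count stands in for the paper's TF-IDF weight), scores are min-max normalized to [0, 1] per subject, and each sentence is assigned its arg-max subject. Dictionaries and sentences are invented examples.

```python
import numpy as np

subject_dicts = {
    "room":    {"room", "bed", "clean"},
    "food":    {"breakfast", "restaurant", "food"},
    "service": {"staff", "friendly", "service"},
}
sentences = ["the room was clean and the bed comfortable",
             "breakfast at the restaurant was great",
             "staff were friendly and the service fast"]

# Count subject-dictionary terms per sentence (stand-in for TF-IDF weights).
scores = np.array([[sum(w in s.split() for w in terms)
                    for terms in subject_dicts.values()]
                   for s in sentences], dtype=float)

# Normalize each subject's column to [0, 1], then assign arg-max subject.
rng = scores.max(axis=0) - scores.min(axis=0)
norm = (scores - scores.min(axis=0)) / np.where(rng == 0, 1, rng)
labels = [list(subject_dicts)[i] for i in norm.argmax(axis=1)]
print(list(zip(sentences, labels)))
```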

Extraction of Motor Modules by Autoencoder to Identify Trained Motor Control Ability

  • LEE, Jae-Hyuk
    • Journal of Wellbeing Management and Applied Psychology / v.5 no.2 / pp.15-19 / 2022
  • Purpose: This pilot study aimed to clarify features of motor modules during walking in exercise experts who had undergone repeated sports-skill training. To identify motor modules, an autoencoder machine learning algorithm was used, and modules were extracted from muscle activities of the lower extremities. Research design, data and methodology: A total of 10 university students participated: 5 had no prior sports training, and 5 had trained in sports for more than 5 years. Eight muscle activities of the dominant lower extremity were measured. After modules were extracted by the autoencoder, the number of modules and the spatial muscle weight values were compared between the two groups. Results: There was no significant difference between groups in the minimal number of motor modules explaining more than 90% of the original data. In the similarity analysis, however, three motor modules showed high similarity (r>0.8) while one module showed low similarity (r<0.5). Conclusions: This study found not only motor modules common to exercise novices and experts during walking, but also a specific motor module that may be associated with the high motor control ability distinguishing levels of motor performance in sports.
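
A hedged sketch of the similarity analysis reported above: the spatial muscle-weight vectors of extracted modules are compared between groups with Pearson's r, and r > 0.8 is read as high similarity. The eight-muscle weight vectors below are invented, not the study's measurements.

```python
import numpy as np

# Hypothetical spatial weights over eight lower-extremity muscles.
novice_module = np.array([0.9, 0.7, 0.1, 0.0, 0.2, 0.1, 0.6, 0.3])
expert_module = np.array([0.8, 0.8, 0.2, 0.1, 0.1, 0.2, 0.5, 0.4])

r = np.corrcoef(novice_module, expert_module)[0, 1]  # Pearson's r
print(f"r = {r:.2f}:", "high similarity" if r > 0.8 else "low similarity")
```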

A Study on Prescription Similarity Analysis for Efficiency Improvement (처방 유사도 분석의 효율성 향상에 관한 연구)

  • Hwang, SuKyung;Woo, DongHyeon;Kim, KiWook;Lee, ByungWook
    • Journal of Korean Medical classics / v.35 no.4 / pp.1-9 / 2022
  • Objectives: This study aims to increase the efficiency of the prescription similarity analysis method that uses drug composition ratios. Methods: A controlled experiment compared result generation time, quantity of generated data, and accuracy of results between the previous and new analysis methods on 12,598 formulas and 61 prescription groups. Results: The control group took 346 seconds on average and generated 768,478 results, while the test group took 24 seconds and generated 241,739 results. The test group adopted a selective calculation method that used only the overlapping data between two formulas instead of analyzing all possible cases. This simplified data processing and reduced the quantity of data to be processed, making the system up to 14.47 times faster than the previous method while producing identical results. Conclusions: The efficiency of similarity analysis can be improved by reducing the data span and simplifying the calculation process.
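
A sketch of the selective-calculation idea under stated assumptions: instead of scoring every drug in both prescriptions, only drugs appearing in both are compared. The formulas, drug names, and the min-of-ratios score are invented illustrations, not the paper's actual scoring rule.

```python
def overlap_similarity(rx_a, rx_b):
    """Compare composition ratios only over drugs shared by both formulas."""
    shared = rx_a.keys() & rx_b.keys()      # skip non-overlapping drugs
    if not shared:
        return 0.0
    # Sum the smaller of the two composition ratios for each shared drug.
    return sum(min(rx_a[d], rx_b[d]) for d in shared)

# Hypothetical formulas: drug name -> composition ratio (fractions sum to 1).
formula_a = {"ginger": 0.4, "licorice": 0.4, "aconite": 0.2}
formula_b = {"ginger": 0.3, "licorice": 0.5, "ginseng": 0.2}
print(overlap_similarity(formula_a, formula_b))  # 0.7
```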

A Study on the Research Model for the Standardization of Software-Similarity-Appraisal Techniques (소프트웨어 복제도 감정기법의 표준화 모델에 관한 연구)

  • Bahng, Hyo-Keun;Cha, Tae-Own;Chung, Tai-Myoung
    • The KIPS Transactions:PartD / v.13D no.6 s.109 / pp.823-832 / 2006
  • The purpose of similarity (reproduction) degree appraisal is to determine the equality or similarity between two programs; it is a system that presents, through expert eyes, the technical grounds of judgment needed to support the resolution of software intellectual property rights disputes. The most important goals in software appraisal are to avoid over-reliance on an expert's subjective judgment and to obtain accurate appraisal results. However, standards for its systematic techniques have not yet been properly researched and developed, and because each expert may approach the task in a completely different way, even the techniques for the various appraisal types have not been clearly defined. Moreover, analyzing the results of all previously completed appraisal cases, we found in practice that the objectivity and accuracy of some appraisal results were damaged by problems in existing appraisal procedures and techniques or by gaps in experts' professional knowledge. In this paper we present a model for standardizing software-similarity-appraisal techniques, together with objective evaluation methods that reduce the tolerance allowing different experts to reach different results on the same evaluation points. In particular, the model analyzes and evaluates the techniques from several angles: the standard appraisal process, setting the range of an appraisal, setting appraisal domains and items in detail based on unit processes, setting the weight of each object to be appraised, and the degrees of logical and physical similarity, all grounded in effective solutions to the practical problems of existing appraisal techniques and their objective and quantitative standardization. We believe this standardization model will minimize the possibility of mistakes due to an expert's subjective judgment and offer a tool for improving the objectivity and reliability of appraisal results.
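
A minimal sketch of the weighted scoring such a model calls for: each appraisal item receives a similarity in [0, 1] and a weight, and the overall reproduction degree is their weighted sum. The items, similarities, and weights below are hypothetical placeholders, not the paper's standard values.

```python
# item: (similarity in [0, 1], weight)
items = {
    "logical structure": (0.85, 0.4),
    "physical source":   (0.60, 0.4),
    "user interface":    (0.90, 0.2),
}

total_weight = sum(w for _, w in items.values())
reproduction_degree = sum(s * w for s, w in items.values()) / total_weight
print(f"overall similarity: {reproduction_degree:.2f}")
```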

A Study of Economical Sample Size for Reliability Test of One-Shot Device with Bayesian Techniques (베이지안 기법을 적용한 일회성 장비의 경제적 시험 수량 연구)

  • Lee, Youn Ho;Lee, Kye Shin;Lee, Hak Jae;Kim, Sang Moon;Moon, Ki Sung
    • Journal of Applied Reliability / v.14 no.3 / pp.162-168 / 2014
  • This paper discusses the application of Bayesian techniques with test data on similar products to perform an economical reliability test of a new one-shot device. Using test data on similar products, the reliability test requires a smaller sample size than is currently expended to demonstrate a target reliability at a specified confidence level, and a smaller sample size reduces the cost, time, and other resources spent on reliability testing. We use similarity to calculate the weights of the similar products, analyzing the similarity between the new product and similar products by comparing their essential functions.
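
A hedged sketch of the Bayesian sample-size idea under simplified assumptions: prior test data on a similar product enters as a Beta prior on the success probability, and we find the smallest number of additional zero-failure tests that demonstrates a target reliability R0 at confidence C. The prior parameters and targets below are invented examples, not the paper's values.

```python
from scipy.stats import beta

def required_zero_failure_tests(R0, C, a_prior, b_prior, n_max=1000):
    """Smallest n with P(reliability > R0 | n successes) >= C."""
    for n in range(n_max + 1):
        # Posterior after n successes, zero failures: Beta(a_prior + n, b_prior).
        conf = 1.0 - beta.cdf(R0, a_prior + n, b_prior)
        if conf >= C:
            return n
    return None

# Similar-product data weighted into the prior (hypothetical values).
print(required_zero_failure_tests(R0=0.90, C=0.90, a_prior=8.0, b_prior=1.0))
# With a flat prior Beta(1, 1), the same demonstration needs more tests.
print(required_zero_failure_tests(R0=0.90, C=0.90, a_prior=1.0, b_prior=1.0))
```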