• Title/Summary/Keyword: 소프트웨어 저장소 마이닝

Search Result 5, Processing Time 0.02 seconds

Designing a Repository Independent Model for Mining and Analyzing Heterogeneous Bug Tracking Systems (다형의 버그 추적 시스템 마이닝 및 분석을 위한 저장소 독립 모델 설계)

  • Lee, Jae-Kwon;Jung, Woo-Sung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.9
    • /
    • pp.103-115
    • /
    • 2014
  • In this paper, we propose UniBAS(Unified Bug Analysis System) to provide a unified repository model by integrating the extracted data from the heterogeneous bug tracking systems. The UniBAS reduces the cost and complexity of the MSR(Mining Software Repositories) research process and enables the researchers to focus on their logics rather than the tedious and repeated works such as extracting repositories, processing data and building analysis models. Additionally, the system not only extracts the data but also automatically generates database tables, views and stored procedures which are required for the researchers to perform query-based analysis easily. It can also generate various types of exported files for utilizing external analysis tools or managing research data. A case study of detecting duplicate bug reports from the Firfox project of the Mozilla site has been performed based on the UniBAS in order to evaluate the usefulness of the system. The results of the experiments with various algorithms of natural language processing and flexible querying to the automatically extracted data also showed the effectiveness of the proposed system.

An Empirical Study on Frequently used Python APIs in AI-Related Open Source Python Software Projects (인공지능과 관련된 오픈 소스 파이썬 소프트웨어 프로젝트에서 자주 사용되는 파이썬 API들에 대한 연구)

  • Jungil Kim
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.19-22
    • /
    • 2024
  • 전통 소프트웨어 프로젝트 개발과 AI 관련된 소프트웨어 프로젝트 개발에 큰 차이가 있어서 AI 관련된 소프트웨어 프로젝트 개발 환경을 이해하려는 많은 노력이 있었지만 AI 관련 소프트웨어 프로젝트 개발에서 어떤 API들이 자주 사용되는지에 대해서 아직 충분히 조사되지 않았다. 본 논문에서는 "AI 관련 오픈 소스 소프트웨어 프로젝트에서 어떤 파이썬 API들이 자주 사용되는가?"에 대한 연구 질문의 해답을 알아보는 경험 연구를 소개한다. 이 경험 연구의 결과로 AI 관련 오픈 소스 소프트웨어 프로젝트에서 파이썬 표준 라이브러리와 관려된 API들이 가장 자주 사용된다는 것을 확인했다. 또한 기계 학습을 포함해서 데이터 처리, 이미지 처리, 테스팅, 웹 서비스와 관련된 라이브러리들에 있는 API들도 AI 관련 오픈 소스 소프트웨어 프로젝트들에 자주 사용된다는 것을 알아냈다.

  • PDF

Memory Improvement Method for Extraction of Frequent Patterns in DataBase (데이터베이스에서 빈발패턴의 추출을 위한 메모리 향상기법)

  • Park, In-Kyu
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.2
    • /
    • pp.127-133
    • /
    • 2019
  • Since frequent item extraction so far requires searching for patterns and traversal for the FP-Tree, it is more likely to store the mining data in a tree and thus CPU time is required for its searching. In order to overcome these drawbacks, in this paper, we provide each item with its location identification of transaction data without relying on conditional FP-Tree and convert transaction data into 2-dimensional position information look-up table, resulting in the facilitation of time and spatial accessibility. We propose an algorithm that considers the mapping scheme between the location of items and items that guarantees the linear time complexity. Experimental results show that the proposed method can reduce many execution time and memory usage based on the data set obtained from the FIMI repository website.

A Technique to Recommend Appropriate Developers for Reported Bugs Based on Term Similarity and Bug Resolution History (개발자 별 버그 해결 유형을 고려한 자동적 개발자 추천 접근법)

  • Park, Seong Hun;Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.12
    • /
    • pp.511-522
    • /
    • 2014
  • During the development of the software, a variety of bugs are reported. Several bug tracking systems, such as, Bugzilla, MantisBT, Trac, JIRA, are used to deal with reported bug information in many open source development projects. Bug reports in bug tracking system would be triaged to manage bugs and determine developer who is responsible for resolving the bug report. As the size of the software is increasingly growing and bug reports tend to be duplicated, bug triage becomes more and more complex and difficult. In this paper, we present an approach to assign bug reports to appropriate developers, which is a main part of bug triage task. At first, words which have been included the resolved bug reports are classified according to each developer. Second, words in newly bug reports are selected. After first and second steps, vectors whose items are the selected words are generated. At the third step, TF-IDF(Term frequency - Inverse document frequency) of the each selected words are computed, which is the weight value of each vector item. Finally, the developers are recommended based on the similarity between the developer's word vector and the vector of new bug report. We conducted an experiment on Eclipse JDT and CDT project to show the applicability of the proposed approach. We also compared the proposed approach with an existing study which is based on machine learning. The experimental results show that the proposed approach is superior to existing method.

A Technique to Detect Change-Coupled Files Using the Similarity of Change Types and Commit Time (변경 유형의 유사도 및 커밋 시간을 이용한 파일 변경 결합도)

  • Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.2
    • /
    • pp.65-72
    • /
    • 2014
  • Change coupling is a measure to show how strongly change-related two entities are. When two source files have been frequently changed together, they are regarded as change-coupled files and they will probably be changed together in the near future. In the previous studies, the change coupling between two files is defined with the number of common changed time, that is, common commit time of the files. However, the frequency-based technique has limitations because of 'tangled changes', which frequently happens in the development environments with version control systems. The tangled change means that several code hunks have been changed at the same time, though they have no relation with each other. In this paper, the change types of the code hunks are also used to define change coupling, in addition to the common commit time of target files. First, the frequency vector based on change types are defined with the extracted change types, and then, the similarity of change patterns are calculated using the cosine similarity measure. We conducted experiments on open source project Eclipse JDT and CDT for case studies. The result shows that the applicability of the proposed method, compared to the previous studies.