• Title/Summary/Keyword: attribute tree

Search Result 105

A Program-Plagiarism Checker using Abstract Syntax Tree (구문트리 비고를 통한 프로그램 유형 복제 검사)

  • 김영철;김성근;염세훈;최종명;유재우
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.792-802 / 2003
  • Earlier program plagiarism checkers rely on matching simple text, attributes, or token strings. They have difficulty handling program style that has nothing to do with program syntax, such as indentation, spacing, and comments. This paper introduces a plagiarism checking model that compares the syntax trees of the given programs. By using syntax trees, the system overcomes the weakness of style filtering and can compare the structure of programs through syntactic and semantic analysis. We present syntax-tree construction, unparsing, and similarity-checking algorithms for detecting plagiarism in C/C++ programs in Internet-based cyber education, and we estimate plagiarism patterns.
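
The core idea, comparing programs by syntax tree rather than by text, can be sketched in Python. This is an illustration under stated assumptions, not the authors' C/C++ checker: it uses Python's standard `ast` module as the parser, flattens each tree into a breadth-first sequence of node-type names (so comments, whitespace, and identifier names drop out), and scores the two sequences with `difflib.SequenceMatcher`. The helper names `node_types` and `tree_similarity` are hypothetical.

```python
import ast
from difflib import SequenceMatcher


def node_types(source: str) -> list[str]:
    """Flatten a program's syntax tree into node-type names (breadth-first).

    Comments and layout never reach the tree, and identifier names are
    discarded because only type(node).__name__ is kept.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def tree_similarity(src_a: str, src_b: str) -> float:
    """Similarity in [0, 1] between the two flattened syntax trees."""
    return SequenceMatcher(None, node_types(src_a), node_types(src_b)).ratio()


original = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same structure with renamed variables, extra comments, and different spacing.
disguised = """
# my own solution
def add_all(values):
    acc=0      # running sum
    for v in values:
        acc += v
    return acc
"""

unrelated = """
def double(n):
    return n * 2
"""

print(tree_similarity(original, disguised))  # 1.0
print(tree_similarity(original, unrelated))  # below 1.0
```

Every line of the disguised copy differs textually from the original, yet its syntax-tree sequence is identical. The unparsing and plagiarism-pattern estimation steps mentioned in the abstract are not covered by this sketch.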

Text Extraction and Summarization from Web News (웹 뉴스의 기사 추출과 요약)

  • Han, Kwang-Rok;Sun, Bok-Keun;Yoo, Hyoung-Sun
    • Journal of the Korea Society of Computer and Information / v.12 no.5 / pp.1-10 / 2007
  • Many types of information on the web, including news content, contain unnecessary clutter. This clutter makes it difficult to build automated information processing systems such as document summarization, extraction, and retrieval. We propose a system that extracts and summarizes news content from the web. The extraction system takes news content in HTML as input, builds an element tree similar to a DOM tree, and extracts text from the element tree while removing clutter identified by the hyperlink attribute of HTML tags. The extracted text is passed to the summarization system, which extracts key sentences from it; we implement the summarization system using a co-occurrence relation graph. The resulting summaries are expected to be suitable for delivery to PDAs or cellular phones through message services such as SMS.
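
The extraction step can be sketched with Python's standard `html.parser`. This is a simplified assumption about how the hyperlink attribute is used, not the authors' implementation: the sketch builds a small element tree (text and child elements kept in document order) and then discards any text inside an element that carries an `href` attribute, along with `script` and `style` content, on the premise that navigation menus and related-story lists consist mostly of links. The class `ElementTreeBuilder` and the functions `extract_text` and `extract_article` are hypothetical names, and the co-occurrence summarizer is not shown.

```python
from html.parser import HTMLParser


class ElementTreeBuilder(HTMLParser):
    """Builds a minimal element tree. Each node is a dict whose `children`
    list mixes text strings and child nodes in document order."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:  # void elements never contain anything
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag; tolerate sloppy HTML.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append(data.strip())


def extract_text(node, in_link=False):
    """Collect text lying outside hyperlinks (elements with an href attribute)."""
    if node["tag"] in ("script", "style"):
        return []
    in_link = in_link or "href" in node["attrs"]
    out = []
    for child in node["children"]:
        if isinstance(child, str):
            if not in_link:
                out.append(child)
        else:
            out.extend(extract_text(child, in_link))
    return out


def extract_article(html: str) -> str:
    builder = ElementTreeBuilder()
    builder.feed(html)
    builder.close()
    return " ".join(extract_text(builder.root))


page = """<html><body>
<div class="nav"><a href="/">Home</a> <a href="/sports">Sports</a></div>
<p>The council approved the new budget on Monday.</p>
<p>Read more: <a href="/related">Related story</a></p>
<script>var tracking = 1;</script>
</body></html>"""

print(extract_article(page))
```

On this made-up page the budget sentence survives, together with the stray "Read more:" label, while the navigation links, the related-story link, and the script are dropped. A link-density threshold per block would be one way to also catch such leftover labels.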


Case-Based Reasoning Cost Estimation Model Using Two-Step Retrieval Method

  • Lee, Hyun-Soo;Seong, Ki-Hoon;Park, Moon-Seo;Ji, Sae-Hyun;Kim, Soo-Young
    • Land and Housing Review / v.1 no.1 / pp.1-7 / 2010
  • The case-based reasoning (CBR) method helps estimators understand the estimation process more clearly, so CBR is widely used as a methodology for cost estimation. In CBR, the quality of case retrieval affects the relevance of the retrieved cases and hence the overall quality of the CBR system's reminding capability. Retrieving relevant past cases is therefore essential to a robust CBR system. Case retrieval involves three tasks for obtaining appropriate cases: indexing, search, and matching (Aamodt and Plaza 1994). However, previous CBR research has dealt mostly with the matching process, which limits the accuracy and efficiency of case retrieval. To address this issue, this research presents a CBR cost model for building projects with a two-step retrieval process based on decision tree and nearest neighbor methods. Specifically, the proposed cost model has indexing, search, and matching modules. Features in the model are divided into shape-based and scale-based attributes. On this basis, a decision tree is built to facilitate the search task, and the nearest neighbor method is used for the matching task. For the nearest neighbor method, attribute weights are assigned by GA optimization, and similarity is calculated using a distance measure. The proposed CBR cost model was developed using 174 cases and validated using 12 test cases.
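
The two-step retrieval can be sketched in Python as follows. Every concrete detail is an assumption for illustration: the attribute names (`structure`, `roof`, `floor_area`, `floors`), the hand-built two-level index standing in for a decision tree induced from data, the fixed weights standing in for GA-optimized ones, and the choice of a min-max-normalized weighted Euclidean distance with the mean cost of the k nearest cases as the estimate. All costs are made-up values in arbitrary units.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Case:
    structure: str      # shape-based attribute (hypothetical)
    roof: str           # shape-based attribute (hypothetical)
    floor_area: float   # scale-based attribute in m^2 (hypothetical)
    floors: int         # scale-based attribute (hypothetical)
    cost: float = 0.0   # known cost of a past project; unused for a query


def build_index(cases):
    """Indexing: a two-level tree over the shape-based attributes.
    Each leaf holds the past cases that share those attribute values."""
    tree = defaultdict(lambda: defaultdict(list))
    for case in cases:
        tree[case.structure][case.roof].append(case)
    return tree


def weighted_distance(a, b, weights, ranges):
    """Weighted Euclidean distance over min-max-normalized scale attributes."""
    total = 0.0
    for attr, w in weights.items():
        lo, hi = ranges[attr]
        diff = (getattr(a, attr) - getattr(b, attr)) / (hi - lo)
        total += w * diff * diff
    return math.sqrt(total)


def estimate_cost(tree, query, weights, ranges, k=2):
    # Step 1 (search): descend the tree to the leaf matching the query's shape.
    leaf = tree.get(query.structure, {}).get(query.roof, [])
    if not leaf:
        raise ValueError("no past case shares the query's shape attributes")
    # Step 2 (matching): rank the leaf's cases by distance, keep the k nearest.
    nearest = sorted(leaf, key=lambda c: weighted_distance(c, query, weights, ranges))[:k]
    return sum(c.cost for c in nearest) / len(nearest)


cases = [
    Case("RC", "flat", 1000, 5, 100.0),
    Case("RC", "flat", 1200, 6, 120.0),
    Case("RC", "flat", 3000, 15, 300.0),
    Case("steel", "flat", 1100, 5, 150.0),
    Case("RC", "pitched", 1000, 5, 90.0),
]
weights = {"floor_area": 0.6, "floors": 0.4}  # stand-in for GA-tuned weights
ranges = {"floor_area": (1000, 3000), "floors": (5, 15)}

tree = build_index(cases)
print(estimate_cost(tree, Case("RC", "flat", 1100, 5), weights, ranges))  # 110.0
```

Because the search step discards the steel and pitched-roof cases before any distance is computed, the nearest neighbor step only has to rank three candidates, which is the efficiency argument for splitting retrieval into two steps.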

Spanning Tree Aggregation Using Attribute of Service Boundary Line (서비스경계라인 속성을 이용한 스패닝 트리 집단화)

  • Kwon, So-Ra;Jeon, Chang-Ho
    • The KIPS Transactions:PartC / v.18C no.6 / pp.441-444 / 2011
  • In this study, we present a method for efficiently aggregating network state information. It is especially useful for aggregating links that carry both delay and bandwidth in an asymmetric network. In the multi-level topology transformation used to reduce space complexity, the proposed method measures the similarity of logical links, groups similar links, and then integrates each group, which reduces the information distortion of the resulting logical links. The method is applied to transform a full-mesh topology, in which each logical link is represented by a Service Boundary Line (SBL), into a spanning-tree topology. Simulation results show that the accuracy of the aggregated information and of query responses is higher than that of other known methods.
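
As a generic illustration of the final step, turning a full mesh of logical links into a spanning tree, the Python sketch below uses a different and much simpler technique than SBL aggregation: Kruskal's algorithm run on bandwidth, which yields a maximum spanning tree in which the path between any two border nodes keeps the largest achievable bottleneck bandwidth. Links are treated as symmetric with a single (delay, bandwidth) pair each, unlike the asymmetric links the paper targets, and the sketch makes no attempt to limit the distortion of delay information. Node names and link values are made up.

```python
def max_bandwidth_spanning_tree(nodes, links):
    """Kruskal's algorithm, taking the widest links first.

    links: (u, v, delay, bandwidth) tuples describing a full mesh.
    Ties in bandwidth are broken in favour of lower delay.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, delay, bw in sorted(links, key=lambda l: (-l[3], l[2])):
        ru, rv = find(u), find(v)
        if ru != rv:  # keep the link only if it joins two components
            parent[ru] = rv
            tree.append((u, v, delay, bw))
    return tree


def path_metrics(tree, src, dst):
    """Return (total delay, bottleneck bandwidth) of the unique tree path."""
    adj = {}
    for u, v, delay, bw in tree:
        adj.setdefault(u, []).append((v, delay, bw))
        adj.setdefault(v, []).append((u, delay, bw))
    stack = [(src, None, 0, float("inf"))]
    while stack:
        node, prev, delay, bw = stack.pop()
        if node == dst:
            return delay, bw
        for nxt, d, b in adj.get(node, []):
            if nxt != prev:  # a tree has no cycles, so skipping the parent suffices
                stack.append((nxt, node, delay + d, min(bw, b)))
    return None


mesh = [
    ("A", "B", 2, 100), ("A", "C", 5, 40), ("A", "D", 3, 80),
    ("B", "C", 4, 90), ("B", "D", 6, 30), ("C", "D", 1, 70),
]
tree = max_bandwidth_spanning_tree("ABCD", mesh)
print(tree)                          # three links: A-B, B-C, A-D
print(path_metrics(tree, "C", "D"))  # (9, 80)
```

Here the direct C-D link (bandwidth 70, delay 1) is dropped, yet the tree path C-B-A-D still offers bandwidth 80, at the price of a delay of 9 instead of 1. That gap is the kind of information distortion that aggregation methods such as the one described above try to reduce.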

Data Mining Algorithm Based on Fuzzy Decision Tree for Pattern Classification (퍼지 결정트리를 이용한 패턴분류를 위한 데이터 마이닝 알고리즘)

  • Lee, Jung-Geun;Kim, Myeong-Won
    • Journal of KIISE:Software and Applications / v.26 no.11 / pp.1314-1323 / 1999
  • As computers have come into general use, data can easily be generated and collected, so techniques for automatically acquiring useful knowledge from data are needed. In data mining, the acquired knowledge must be both accurate and comprehensible. In this paper, we propose an efficient fuzzy rule generation algorithm based on a fuzzy decision tree for data mining. The method combines the comprehensibility of rules generated from decision trees such as ID3 and C4.5 with the inference and expressive power of fuzzy set theory. In particular, fuzzy rules can efficiently classify patterns whose decision boundaries are not parallel to the attribute axes, which are difficult to handle with methods that place decision boundaries parallel to those axes. The algorithm first determines an appropriate set of membership functions for each attribute through histogram analysis of the data. Given these membership functions, it then constructs a fuzzy decision tree in a manner similar to ID3 and C4.5. A genetic algorithm is also applied to tune the initial set of membership functions. Experiments on several benchmark data sets, including the IRIS data, the Wisconsin breast cancer data, and the credit screening data, show that the proposed method is more efficient, in both performance and rule comprehensibility, than other methods including C4.5.
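
The attribute-selection step of a fuzzy ID3-style tree can be sketched in Python. Under the assumptions here, each sample belongs to every fuzzy term of an attribute with a degree given by that term's membership function, class counts in each branch are sums of those degrees, and the attribute with the lowest membership-weighted entropy is chosen. The fixed triangular membership functions stand in for the histogram-derived, GA-tuned ones described above, and the data and function names are hypothetical.

```python
import math


def triangular(a, b, c):
    """Triangular membership function: 0 outside (a, c), peaking at 1 at b (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def entropy(class_weights):
    """Shannon entropy (bits) of fractional class counts."""
    total = sum(class_weights.values())
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def fuzzy_split_entropy(samples, attr, terms):
    """Membership-weighted entropy after splitting on `attr`.

    Every sample contributes to every branch (fuzzy term) with degree mu(x),
    so class counts within a branch are sums of membership degrees.
    """
    weighted, total = 0.0, 0.0
    for mu in terms.values():
        counts = {}
        for features, label in samples:
            counts[label] = counts.get(label, 0.0) + mu(features[attr])
        size = sum(counts.values())
        if size > 0:
            weighted += size * entropy(counts)
            total += size
    return weighted / total if total else 0.0


# Two fuzzy terms forming a partition of [0, 10]: low(x) + high(x) == 1 there.
terms = {"low": triangular(-10, 0, 10), "high": triangular(0, 10, 20)}
samples = [
    ({"x": 1, "y": 5}, "A"),
    ({"x": 2, "y": 1}, "A"),
    ({"x": 8, "y": 9}, "B"),
    ({"x": 9, "y": 5}, "B"),
]
best = min(["x", "y"], key=lambda a: fuzzy_split_entropy(samples, a, terms))
print(best)  # x
```

Unlike a crisp split, a sample near a boundary (such as y = 5) contributes half its weight to each branch, which is how fuzzy terms soften axis-parallel thresholds.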

Application of Decision Tree for the Classification of Antimicrobial Peptide

  • Lee, Su Yeon;Kim, Sunkyu;Kim, Sukwon S.;Cha, Seon Jeong;Kwon, Young Keun;Moon, Byung-Ro;Lee, Byeong Jae
    • Genomics & Informatics / v.2 no.3 / pp.121-125 / 2004
  • The purpose of this study was to investigate the use of decision trees for classifying antimicrobial peptides. The classification was based on the activities of known antimicrobial peptides against common microbes, including Escherichia coli and Staphylococcus aureus. Feature selection was used to choose an effective subset of features from the available attribute sets. Sequential application of decision trees with 17 nodes (9 leaves) and 13 nodes (7 leaves) yielded classification rates of 76.74% and 74.66% against E. coli and S. aureus, respectively. The angle subtended by the positively charged face and the positive charge gave higher accuracy in both the E. coli and S. aureus data sets. In this study, we describe a successful application of decision trees that helps explain how the physicochemical characteristics of peptides affect the bacterial membrane.
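
The basic operation inside such a decision tree, choosing a threshold on one physicochemical feature, can be sketched in Python. Several choices are assumptions: the feature is a crude net charge (lysine and arginine count +1, aspartate and glutamate count -1, histidine and termini are ignored), which is not necessarily how the study computed positive charge; the split criterion is Gini impurity, as in CART, since the study does not say which criterion it used; and the peptide sequences and activity labels are made up for the example.

```python
def net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: K/R count +1, D/E count -1."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def gini(labels) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Decision stump: the threshold t (split value <= t vs value > t)
    that minimizes the size-weighted Gini impurity of the two sides."""
    best_t, best_score = None, float("inf")
    n = len(values)
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up sequences and labels, for illustration only.
peptides = ["GLFDIVKKVV", "KWKLFKKIEK", "RRWQWRMKKLG", "DEGSAEDLG", "GSSDEAPLG"]
labels = ["active", "active", "active", "inactive", "inactive"]
charges = [net_charge(p) for p in peptides]
print(charges)                          # [1, 4, 5, -4, -2]
print(best_threshold(charges, labels))  # (-2, 0.0)
```

A full tree learner repeats this search over every selected feature at each node and recurses on both sides, stopping at a size limit such as the 9-leaf and 7-leaf trees reported above.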

A Classification Algorithm using Extended Representation (확장된 표현을 이용하는 분류 알고리즘)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.8 no.2 / pp.27-33 / 2017
  • To efficiently provide cloud computing services to users over the Internet, IT resources must be configured in the data center based on virtualization and distributed computing technology. This paper focuses on the problem that, in a wide range of fields, new training data can be added at any time, and new attributes can also be added to the training data at any time. In such cases, rules generated from training data with the former attribute set cannot be used, nor can they be combined with the new data set (which has the newly added attributes). This paper proposes a further development of the new inference engine that handles this case naturally: rules generated from the former data set can be combined with the new data set to form refined rules.

Design and Performance Measurement of a Genetic Algorithm-based Group Classification Method : The Case of Bond Rating (유전 알고리듬 기반 집단분류기법의 개발과 성과평가 : 채권등급 평가를 중심으로)

  • Min, Jae-H.;Jeong, Chul-Woo
    • Journal of the Korean Operations Research and Management Science Society / v.32 no.1 / pp.61-75 / 2007
  • The purpose of this paper is to develop a new group classification method based on a genetic algorithm and to compare its prediction performance with that of existing methods in the area of bond rating. To this end, we conduct various experiments with pilot and general models. Specifically, we first conduct experiments with two pilot models: one searches for the cluster center of each group, and the other searches for both the cluster centers and the attribute weights, in order to maximize classification accuracy. The pilot experiments show that the latter achieves a higher classification accuracy ratio than the former, which justifies searching for both the cluster center of each group and the attribute weights to improve classification accuracy. Building on this result, we design two generalized models employing a genetic algorithm: one maximizes classification accuracy and the other minimizes total misclassification cost. We compare the performance of these two models with that of existing statistical and artificial intelligence models such as MDA, ANN, and decision trees, and conclude that the proposed genetic algorithm-based group classification method significantly outperforms the other methods in terms of both classification accuracy ratio and misclassification cost.
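
The pilot model that searches for both cluster centers and attribute weights can be sketched in Python. The encoding and operators are assumptions, not taken from the paper: each chromosome concatenates one center per group and one weight per attribute, all genes lie in [0, 1] (features are assumed pre-scaled to that range), a sample is assigned to the group whose center is nearest under weighted squared Euclidean distance, fitness is classification accuracy, and the GA uses truncation selection with elitism, one-point crossover, and a single Gaussian mutation per child. The toy data is made up.

```python
import random


def classify(x, centers, weights):
    """Index of the group whose center is nearest under weighted squared distance."""
    def dist(center):
        return sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, center, weights))
    return min(range(len(centers)), key=lambda g: dist(centers[g]))


def decode(chrom, n_groups, n_attrs):
    """Split a flat chromosome into group centers and attribute weights."""
    centers = [chrom[g * n_attrs:(g + 1) * n_attrs] for g in range(n_groups)]
    weights = chrom[n_groups * n_attrs:]
    return centers, weights


def accuracy(chrom, data, labels, n_groups):
    centers, weights = decode(chrom, n_groups, len(data[0]))
    hits = sum(classify(x, centers, weights) == y for x, y in zip(data, labels))
    return hits / len(data)


def evolve(data, labels, n_groups, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    length = n_groups * len(data[0]) + len(data[0])

    def fitness(chrom):
        return accuracy(chrom, data, labels, n_groups)

    pop = [[rng.random() for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(length)  # Gaussian mutation of one gene, clipped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best), history


# Toy two-group data already scaled to [0, 1]; labels are group indices.
data = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.8, 0.9], [0.9, 0.8], [0.85, 0.85]]
labels = [0, 0, 0, 1, 1, 1]
best, acc, history = evolve(data, labels, n_groups=2)
print(acc, history[:5])
```

Because the elite half is carried over unchanged, the best fitness never decreases from one generation to the next. Replacing the accuracy fitness with the negative of total misclassification cost would give a sketch of the second generalized model; how the paper prices each type of error is not stated in the abstract.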

A Study on Using Social Big Data for Expanding Analytical Knowledge - Domestic Big Data supply-demand expectation - (분석지의 확장을 위한 소셜 빅데이터 활용연구 - 국내 '빅데이터' 수요공급 예측 -)

  • Kim, Jung-Sun;Kwon, Eun-Ju;Song, Tae-Min
    • Knowledge Management Research / v.15 no.3 / pp.169-188 / 2014
  • Big data appears to be changing the knowledge management systems and methods of enterprises to a large extent. Further, the methods used to exploit unstructured data, including images, video, sensor data, and text, may determine decisions on expanding the knowledge management of enterprises or governments. In this light, this paper attempts to build a prediction model of demand and supply in Korea's big data market through a data-mining decision tree, using text big data generated on the web and SNS over 3 years, in order to expand the forms of knowledge management. The results indicate that Korea's big data market is led by the government and focused on hardware (H/W) and storage. Further, demanders of big data were found to place importance on attribute factors including interest, quickness, and economics, whereas suppliers place importance on innovation and growth. These results show that the factors affecting acceptance of big data technology differ between suppliers and demanders. This article may provide a basic method for studying how enterprises expand their forms of analysis and connect them with their management activities.


Accountable Attribute-based Encryption with Public Auditing and User Revocation in the Personal Health Record System

  • Zhang, Wei;Wu, Yi;Xiong, Hu;Qin, Zhiguang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.1 / pp.302-322 / 2021
  • In ciphertext-policy attribute-based encryption (CP-ABE), a data user can decrypt only when the user's attributes satisfy the access structure established by the encrypter. CP-ABE has therefore been widely used in personal health record (PHR) systems. However, CP-ABE systems suffer from key abuse: a semi-trusted authority or an authorized user may disclose keys for personal gain, allowing illegal users to access the system. To address two kinds of key abuse, (1) a semi-trusted authority redistributing keys to unauthorized users and (2) authorized users disclosing keys to unauthorized users, we put forward a CP-ABE scheme that provides authority accountability and user traceability and supports arbitrary monotone access structures. Specifically, we employ an auditor to make a fair ruling on the malicious behavior of users. In addition, to handle users leaving the system, we use an indirect revocation method based on a trust tree to implement user revocation. Compared with other existing schemes, our solution achieves user revocation at an acceptable time cost. Our scheme is also proved to be fully secure in the standard model.
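
The access-structure condition that governs decryption can be illustrated, without any cryptography, by evaluating a monotone access structure written as a tree of k-of-n threshold gates (AND is n-of-n, OR is 1-of-n). This Python sketch shows only that policy check; in a real CP-ABE scheme the check is enforced implicitly by the ciphertext and key components rather than by an explicit function. The `Gate` class, the `satisfies` function, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Gate:
    """k-of-n threshold gate: AND is n-of-n, OR is 1-of-n."""
    threshold: int
    children: List["Node"] = field(default_factory=list)


Node = Union[str, Gate]  # a leaf is a single attribute name


def satisfies(policy: Node, attributes: set) -> bool:
    """True if the attribute set satisfies the monotone access structure."""
    if isinstance(policy, str):
        return policy in attributes
    satisfied = sum(satisfies(child, attributes) for child in policy.children)
    return satisfied >= policy.threshold


# doctor AND (cardiology OR emergency)
policy = Gate(2, ["doctor", Gate(1, ["cardiology", "emergency"])])
print(satisfies(policy, {"doctor", "emergency"}))  # True
print(satisfies(policy, {"nurse", "cardiology"}))  # False
print(satisfies(policy, {"doctor"}))               # False
```

Because each gate only counts satisfied children, adding attributes can never turn a satisfying set into a non-satisfying one, which is exactly what makes the structure monotone.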