Missing Pattern Matching of Rough Set Based on Attribute Variations Minimization in Rough Set

  • 이영천 (Department of Computer Engineering, Honam University)
  • Received: 2015.05.12
  • Reviewed: 2015.06.23
  • Published: 2015.06.30

Abstract

In rough set theory, missing attribute values make objects indiscernible during pattern matching, which hampers the computation of reducts and the core and, in turn, the construction of decision trees. Current approaches to estimating missing attribute values include substitution with the most common attribute value, assignment of all possible values of the attribute, the event-covering method, C4.5, and a special LEM2 algorithm. However, these approaches ultimately amount to simple substitution with a frequently occurring or most common attribute value. When important attribute values are missing, the induced decision rules therefore suffer from high information loss, which causes problems in cross-validating those rules. To address this, we propose a method that uses the variation in entropy among attributes to replace missing attribute values in the direction of higher information gain, thereby completing the information table. The validity of the proposed approach is examined by comparing it with the ROSE software, which substitutes missing values on the basis of relatively close similarity relations, applied to the same incomplete information system.
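The abstract does not spell out the substitution procedure, so the following is only a minimal Python sketch of the general idea it names: filling a missing attribute value with the candidate that yields the highest information gain with respect to the decision attribute. The function names (`entropy`, `information_gain`, `fill_missing`), the use of `None` as the missing-value marker, the greedy row-by-row order, and the toy table are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2

MISSING = None  # assumed marker for a missing attribute value


def entropy(labels):
    """Shannon entropy (in bits) of a list of decision labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, decision):
    """Information gain of `attr` for `decision`, ignoring rows where `attr` is missing."""
    known = [r for r in rows if r[attr] is not MISSING]
    if not known:
        return 0.0
    base = entropy([r[decision] for r in known])
    remainder = 0.0
    for value in {r[attr] for r in known}:
        subset = [r[decision] for r in known if r[attr] == value]
        remainder += len(subset) / len(known) * entropy(subset)
    return base - remainder


def fill_missing(rows, attr, decision):
    """Greedily fill each missing `attr` cell with the observed value that
    maximizes the information gain of `attr` (hypothetical strategy)."""
    filled = [dict(r) for r in rows]  # work on a copy of the information table
    candidates = sorted({r[attr] for r in rows if r[attr] is not MISSING})
    if not candidates:
        return filled  # nothing observed to substitute with
    for row in filled:
        if row[attr] is MISSING:
            def gain_if(value, row=row):
                row[attr] = value  # tentatively assign the candidate
                return information_gain(filled, attr, decision)
            best = max(candidates, key=gain_if)
            row[attr] = best
    return filled


# Toy incomplete information table (invented for this example).
table = [
    {"temp": "high", "flu": "yes"},
    {"temp": "high", "flu": "yes"},
    {"temp": "normal", "flu": "no"},
    {"temp": MISSING, "flu": "no"},
]
completed = fill_missing(table, "temp", "flu")
```

In this toy table the most common value of `temp` is `high`, yet assigning `normal` to the incomplete row keeps `temp` perfectly predictive of `flu`, so the gain-based choice differs from most-common-value substitution. A real implementation would also need to handle several incomplete attributes and tie-breaking, which this sketch leaves out.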
