Search | Korea Science

신수용;장병탁
- Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.250-252 / 2000
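In many optimization problems, the components of a solution depend on one another. In such cases, the building-block concept that conventional evolutionary computation relies on makes the problem very difficult to solve. To overcome this, we propose a method that uses a Helmholtz machine to estimate the distribution of the data and then performs optimization. The method is based on conventional evolutionary computation, but instead of crossover and mutation operators it uses a Helmholtz machine to capture the distribution of the data and generates new data from that distribution. We validated the method on several functions that evolutionary computation has difficulty solving.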

장정호;김유섭;장병탁
- Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.440-442 / 2003
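Automatic analysis of concepts or semantic relations within a document collection enables more efficient information acquisition and allows documents to be compared at the concept level rather than only at the word level. This paper presents a method that uses a latent variable model to automatically extract semantic relations among words from a document collection and thereby measure inter-document similarity effectively. As the latent variable model we use a Helmholtz machine, which makes multiple-cause models easy to learn, and from its learning results we construct a semantic kernel for comparing documents. In retrieval experiments on two document collections, MEDLINE and CACM, applying the proposed technique improved average precision by more than 20% over the basic VSM (Vector Space Model).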

신수용;장병탁
- Proceedings of the Korean Information Science Society Conference / 2000.10b / pp.51-53 / 2000
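Estimation of Distribution Algorithms (EDAs) are attracting attention as a way to overcome the limitations of conventional evolutionary computation. An EDA solves an optimization problem by repeatedly estimating the distribution of the data and generating new training data from the estimated distribution. However, most existing EDAs target only optimization problems defined over binary vectors. In this paper, we develop a method that solves continuous function optimization problems by learning the data distribution with a Helmholtz machine, an unsupervised probabilistic neural network model. We verify the advantage of the proposed method by comparing its results on test functions with those of a real-coded genetic algorithm.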
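The select, estimate, and sample loop that an EDA follows can be illustrated with a minimal sketch. This is not the authors' method: the Helmholtz machine is replaced here by a much simpler stand-in, an independent Gaussian fitted to each coordinate, and the function names (`sphere`, `gaussian_eda`), the test objective, and all parameter values are assumptions made for illustration only.

```python
import random
import statistics


def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gaussian_eda(objective, dim, pop_size=100, elite_frac=0.3,
                 generations=60, seed=0):
    """Minimize `objective` by repeatedly fitting and sampling a distribution.

    Assumption: an independent Gaussian per coordinate stands in for the
    Helmholtz machine described in the abstract.
    """
    rng = random.Random(seed)
    # Initial search points drawn uniformly from [-5, 5]^dim.
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    n_elite = max(2, int(pop_size * elite_frac))
    best = min(population, key=objective)

    for _ in range(generations):
        # Selection: keep the best-scoring search points.
        elite = sorted(population, key=objective)[:n_elite]

        # Estimation: fit a mean and standard deviation per coordinate.
        columns = [[p[d] for p in elite] for d in range(dim)]
        means = [statistics.fmean(col) for col in columns]
        stdevs = [max(statistics.stdev(col), 1e-12) for col in columns]

        # Sampling: no crossover or mutation; new points come from the model.
        population = [[rng.gauss(means[d], stdevs[d]) for d in range(dim)]
                      for _ in range(pop_size)]

        candidate = min(population, key=objective)
        if objective(candidate) < objective(best):
            best = candidate
    return best


if __name__ == "__main__":
    solution = gaussian_eda(sphere, dim=3)
    print(solution, sphere(solution))
```

Because each coordinate is modeled independently, this stand-in cannot capture dependencies between variables; a richer density model such as the one named in the abstract would be substituted at the estimation and sampling steps.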

장정호;장병탁
- Journal of KIISE:Software and Applications / v.31 no.5 / pp.595-604 / 2004
Automatic analysis of concepts or semantic relations from text documents enables not only efficient acquisition of relevant information, but also comparison of documents at the concept level. We present a multiple-cause model-based approach to text analysis, where latent topics are automatically extracted from document sets and similarity between documents is measured by semantic kernels constructed from the extracted topics. In our approach, a document is assumed to be generated by various combinations of underlying topics. A topic is defined by a set of words that are semantically related or co-occur frequently within a document. In a network representing a multiple-cause model, each topic is identified by a group of words having high connection weights from a latent node. To facilitate learning and inference in multiple-cause models, some approximation methods are required, and we utilize an approximation by Helmholtz machines. In an experiment on the TDT-2 data set, we extract sets of meaningful words where each set contains some theme-specific terms. Using semantic kernels constructed from latent topics extracted by multiple-cause models, we also achieve significant improvements over the basic vector space model in terms of retrieval effectiveness.
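As a rough illustration of how a topic-based semantic kernel can relate documents that share no words, the sketch below projects term vectors into a topic space before comparing them. The abstract does not give the kernel's exact form, so the form used here, K(d1, d2) = d1^T P P^T d2 with a term-by-topic weight matrix P, is an assumption, as are the function names, the three-word vocabulary, and the hand-set weights (in the paper the weights would come from the learned model).

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def project(doc_vec, term_topic):
    """Map a term vector d to topic space: z = P^T d."""
    n_topics = len(term_topic[0])
    return [sum(doc_vec[t] * term_topic[t][k] for t in range(len(doc_vec)))
            for k in range(n_topics)]


def semantic_similarity(d1, d2, term_topic):
    """Normalized kernel value for K(d1, d2) = d1^T P P^T d2 (assumed form)."""
    return cosine(project(d1, term_topic), project(d2, term_topic))


# Hypothetical vocabulary: ["car", "automobile", "banana"].
# Hypothetical term-by-topic weights: rows are terms, columns are topics.
P = [
    [1.0, 0.0],  # "car"        -> vehicle topic
    [1.0, 0.0],  # "automobile" -> vehicle topic
    [0.0, 1.0],  # "banana"     -> fruit topic
]

doc_a = [1, 0, 0]  # contains only "car"
doc_b = [0, 1, 0]  # contains only "automobile"

plain = cosine(doc_a, doc_b)                      # no shared terms
semantic = semantic_similarity(doc_a, doc_b, P)   # same topic

if __name__ == "__main__":
    print(plain, semantic)
```

In the basic vector space model the two toy documents are orthogonal, while under the topic projection they coincide, which is the kind of effect a semantic kernel is meant to provide.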