통합 검색 | Korea Science

XMDR을 이용한 데이터웨어하우스 실시간 데이터 정제 시스템 설계 (Design of DatawareHouse Real-Time Cleansing System using XMDR)

송홍율;정계동;최영근
- 한국정보통신학회논문지
- /
- 제14권8호
- /
- pp.1861-1867
- /
- 2010
데이터웨어하우스는 기업에서 의사결정이나 기업의 정책을 결정하는데 사용하고 있다. 그러나 분산 환경에서 새로운 시스템이 추가되면 데이터 통합 측면에서 시스템간의 여러 가지 이질적인 특성으로 인해 많은 비용과 시간이 필요로 하게 된다. 따라서 이러한 이질적인 특성을 해결하기 위해 첫째, 데이터 구조의 이질성은 표준기관에서 제정한 표준스키마와 XMDR(eXtended Master Data Registry)를 이용하여 추상화된 쿼리를 생성하고, XMDR에 맞게 쿼리를 분리함으로써 구조적인 이질성을 해결한다. 둘째, 데이터 정의 및 표현의 이질성은 메타데이터에 대한 유사어와 데이터 값의 표현 방식을 정의한 메타데이터 사전을 이용함으로써 해결한다. 특히 본 논문에서는 XMDR을 이용하여 분산 시스템 통합시 로컬시스템의 영향을 최소화하고, 데이터웨어하우스의 정보를 실시간으로 생성하기 위해 분산된 환경에서 데이터 통합을 위한 표준화된 정보를 제공한다.
https://doi.org/10.6109/jkiice.2010.14.8.1861 인용 PDF KSCI

유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안 (Semantic Process Retrieval with Similarity Algorithms)

이홍주
- Asia pacific journal of information systems
- /
- 제18권1호
- /
- pp.79-96
- /
- 2008
One of the roles of the Semantic Web services is to execute dynamic intra-organizational services including the integration and interoperation of business processes. Since different organizations design their processes differently, the retrieval of similar semantic business processes is necessary in order to support inter-organizational collaborations. Most approaches for finding services that have certain features and support certain business processes have relied on some type of logical reasoning and exact matching. This paper presents our approach of using imprecise matching for expanding results from an exact matching engine to query the OWL(Web Ontology Language) MIT Process Handbook. MIT Process Handbook is an electronic repository of best-practice business processes. The Handbook is intended to help people: (1) redesigning organizational processes, (2) inventing new processes, and (3) sharing ideas about organizational practices. In order to use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of the meta-model. Next, we need to find a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial dataset composed of randomly generated numbers without real meaning and used subjective ratings for correct answers and similarity values between processes. To generate a semantic-preserving test data set, we create 20 variants for each target process that are syntactically different but semantically equivalent using mutation operators. These variants represent the correct answers of the target process. We devise diverse similarity algorithms based on values of process attributes and structures of business processes. We use simple similarity algorithms for text retrieval such as TF-IDF and Levenshtein edit distance to devise our approaches, and utilize tree edit distance measure because semantic processes are appeared to have a graph structure. Also, we design similarity algorithms considering similarity of process structure such as part process, goal, and exception. Since we can identify relationships between semantic process and its subcomponents, this information can be utilized for calculating similarities between processes. Dice's coefficient and Jaccard similarity measures are utilized to calculate portion of overlaps between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure the retrieval performance in terms of precision, recall and F measure? the harmonic mean of precision and recall. The tree edit distance shows the poorest performance in terms of all measures. TF-IDF and the method incorporating TF-IDF measure and Levenshtein edit distance show better performances than other devised methods. These two measures are focused on similarity between name and descriptions of process. In addition, we calculate rank correlation coefficient, Kendall's tau b, between the number of process mutations and ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and derivatives of these measures, show greater coefficient than measures based on values of process attributes. However, the Lev-TFIDF-JaccardAll measure considering process structure and attributes' values together shows reasonably better performances in these two experiments. For retrieving semantic process, we can think that it's better to consider diverse aspects of process similarity such as process structure and values of process attributes. We generate semantic process data and its dataset for retrieval experiment from MIT Process Handbook repository. We suggest imprecise query algorithms that expand retrieval results from exact matching engine such as SPARQL, and compare the retrieval performances of the similarity algorithms. For the limitations and future work, we need to perform experiments with other dataset from other domain. And, since there are many similarity values from diverse measures, we may find better ways to identify relevant processes by applying these values simultaneously.
PDF KSCI

검색결과 72건 처리시간 0.017초

XMDR을 이용한 데이터웨어하우스 실시간 데이터 정제 시스템 설계 (Design of DatawareHouse Real-Time Cleansing System using XMDR)

유사도 알고리즘을 활용한 시맨틱 프로세스 검색방안 (Semantic Process Retrieval with Similarity Algorithms)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)