Extraction of similar XML data based on XML structure and processing unit

Park, Jong-Hyun

doi:10.9708/jksci.2017.22.04.059

Journal of the Korea Society of Computer and Information

Vol. 22, No. 4 / Pages 59-65 / 2017 / 1598-849X (pISSN) / 2383-9945 (eISSN)

Korean Society of Computer Information

Park, Jong-Hyun (Dept. of Computer Engineering & Science, Chungnam National University)

Received: 2017.03.14
Reviewed: 2017.04.14
Published: 2017.04.28

https://doi.org/10.9708/jksci.2017.22.04.059

Abstract

XML has established itself as the format for data exchange on the Internet, and the volume of XML instances is large. Extracting similar information from XML instances is therefore an active research topic, but existing work remains insufficient. In this paper, we extract similar information from various kinds of XML instances that serve the same purpose. Because some XML instances are described without a schema, we use only the structure information of the XML instance for extraction. To extract similar information efficiently, we propose a minimum unit of processing and two approaches for finding that unit. The first is a structure-based method that uses only the structure information of the XML instance; the second is a measure-based method that finds the unit by a numerical formula. Both approaches can be applied to any application that needs to extract similar information from XML data, and they can also be used for HTML instances.
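The abstract does not give the algorithm or the formula, so the following is only a minimal sketch of what a structure-only search for a repeated unit could look like, not the method proposed in the paper. The names `signature` and `find_repeated_unit`, the sample document, and the heuristic itself (pick the parent whose children most often share one structural shape) are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def signature(elem):
    """Structure-only signature of an element.

    Built from the tag and the sorted set of child signatures.
    Text content, attributes, child order and child multiplicity are ignored.
    """
    children = sorted({signature(child) for child in elem})
    return elem.tag + "(" + ",".join(children) + ")"


def find_repeated_unit(root):
    """Return (parent_tag, child_tag, count) for the most repeated sibling shape.

    Hypothetical heuristic (an assumption, not taken from the paper): the
    child structure that repeats most often under a single parent is treated
    as the candidate processing unit. Returns None if nothing repeats.
    """
    best = None
    for parent in root.iter():
        counts = Counter(signature(child) for child in parent)
        if not counts:
            continue  # leaf element, no children to compare
        sig, n = counts.most_common(1)[0]
        if n >= 2 and (best is None or n > best[2]):
            child_tag = sig.split("(", 1)[0]
            best = (parent.tag, child_tag, n)
    return best


# Made-up sample instance with no schema; only its structure is inspected.
SAMPLE = """
<catalog>
  <info><owner>demo</owner></info>
  <book><title>A</title><price>10</price></book>
  <book><title>B</title><price>12</price></book>
  <book><price>9</price><title>C</title></book>
</catalog>
"""

root = ET.fromstring(SAMPLE.strip())
print(find_repeated_unit(root))  # -> ('catalog', 'book', 3)
```

The third `book` lists its children in a different order but still matches, because the signature uses a sorted set of child shapes. A measure-based variant would replace the simple count with a numerical score; the abstract does not state that formula, so none is shown here.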

