Search | Korea Science

Automatic Information Extraction for Structured Web Documents (구조화된 웹 문서에 대한 자동 정보추출)

Yun, Bo-Hyun
- Journal of Internet Computing and Services / v.6 no.3 / pp.129-145 / 2005
This paper proposes a web information extraction system that automatically extracts predefined information from web documents (i.e., HTML documents) and integrates the extracted information. The system recognizes unlabeled entities with a probabilistic entity recognition method and semi-automatically extends the existing domain knowledge using the extracted data. Moreover, the system extracts sub-linked information linked to the basic page and integrates similar results extracted from heterogeneous sources. The experimental results show that extracting the sub-linked information and using probabilistic entity recognition improve precision significantly over a system that uses only the domain knowledge. The presented system can also extract a wider variety of information precisely because it can be applied flexibly across domains. Because both the semi-automatic domain knowledge expansion and the probabilistic entity recognition improve the quality of the information, the system can maximize user satisfaction. Thus, the system can satisfy the curiosity of users of movie sites, performance sites, and dining sites, and it can be used to build various comparison-shopping malls and contribute to the revitalization of e-business.

Design and Implementation of Customer Information Retrieval System based on Semantic Web (시맨틱 웹 기반의 고객 정보 검색 시스템의 설계 및 구현)

Hwang Jeong-Hee;Gu Mi-Sug;Lee Hyun-Ah;Ryu Keun-Ho
- The KIPS Transactions:PartD / v.13D no.4 s.107 / pp.525-534 / 2006
An ontology specifies the knowledge in a specific domain and defines its concepts and the relationships between them. Ontologies make it possible to provide services based on the semantic web. Therefore, specifying and defining the knowledge in a domain requires generating an ontology that conceptualizes that knowledge. In this paper, to search for information on potential customers for the post office's home-delivery marketing, we design the specific domain and generate a semantic web-based ontology. We also propose a method for retrieving information using the generated ontology, and we implement a data search robot that collects information based on it. Finally, we confirm that the ontology and the search robot retrieve information accurately.
https://doi.org/10.3745/KIPSTD.2006.13D.4.525

Explanation-Based Data Mining in Data Warehouse (데이터웨어하우스 환경에서의 설명기반 데이터마이닝)

김현수;이창호
- Journal of Intelligence and Information Systems / v.5 no.2 / pp.15-27 / 1999
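As a result of the long-term operation of information systems across industry, large volumes of data are accumulating. Various data mining techniques have been studied to extract useful knowledge from such data. In particular, the advent of the data warehouse provides the data environment that data mining needs. However, data mining results that have not passed through an expert's appropriate judgment and interpretation may yield a flood of content that is trivial, spurious, or irrelevant. Therefore, even if data mining results are statistically significant, a verification process and methodology for their validity and usefulness must be established. The most difficult aspect of data mining is that, to eliminate inductive errors, a person must directly interpret and judge the results and also suggest new search directions. The purpose of this paper is to establish a methodology that verifies the results extracted by data mining and suggests new directions for knowledge discovery. We propose a technique that verifies the results of association rule mining (Associations) by judging whether they can be explained, and to this end we develop Relational Predicate Logic (RPL) as a knowledge representation for both domain knowledge and the results obtained through association rule mining. A mined association rule is explained by showing that it is accounted for by domain knowledge expressed in RPL. In addition, we present an Explanation-based Data Mining Architecture that generalizes the knowledge verified through such explanations, deductively generates new hypotheses, and verifies them through association rule mining to obtain new knowledge.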

A Knowledge-based Wrapper Learning Agent for Semi-Structured Information Sources (준구조화된 정보소스에 대한 지식기반의 Wrapper 학습 에이전트)

Seo, Hee-Kyoung;Yang, Jae-Young;Choi, Joong-Min
- Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.42-52 / 2002
Information extraction (IE) is the process of recognizing and fetching particular information fragments from a document. In previous work, most IE systems relied on manually written extraction rules, called wrappers. Although manual wrapper generation may achieve more accurate extraction, it has problems in flexibility, extensibility, and efficiency. Other studies that generate wrappers automatically have had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources; as a result, real-world information sources with complex document structures could not be analyzed correctly. To resolve these problems, this paper presents an agent-based information extraction system named XTROS that exploits domain knowledge to learn from documents in a semi-structured information source. The system automatically generates a wrapper for each information source and performs information extraction and integration by applying this wrapper to the corresponding source. In XTROS, both the domain knowledge and the wrapper are represented as XML documents. The wrapper generation algorithm first recognizes the meaning of each logical line of a sample document by using the domain knowledge, and then finds the most frequent pattern in the sequence of semantic representations of the logical lines. Finally, the location and structure of this pattern, represented as an XML document, become the wrapper. Based on tests of XTROS on several real-estate information sites, we claim that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction and integration for heterogeneous and complex information sources.
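The two steps of the wrapper generation algorithm described in this abstract (labeling each logical line with the domain knowledge, then finding the most frequent label pattern) can be illustrated with a minimal Python sketch. This is not the XTROS implementation: the keyword table `DOMAIN_KNOWLEDGE`, the function names, and the fixed pattern length are all assumptions made for illustration, and the real system represents both the knowledge and the wrapper as XML.

```python
from collections import Counter

# Hypothetical domain knowledge (not from the paper): semantic label -> keywords.
DOMAIN_KNOWLEDGE = {
    "PRICE": ["$"],
    "BED": ["bed"],
    "BATH": ["bath"],
}


def label_line(line):
    """Return the first label whose keyword occurs in the line, else 'OTHER'."""
    lowered = line.lower()
    for label, keywords in DOMAIN_KNOWLEDGE.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "OTHER"


def most_frequent_pattern(labels, length):
    """Count every window of `length` consecutive labels that contains no
    'OTHER' and return (pattern, count) for the most frequent one, or None."""
    counts = Counter()
    for start in range(len(labels) - length + 1):
        window = tuple(labels[start:start + length])
        if "OTHER" not in window:
            counts[window] += 1
    if not counts:
        return None
    return counts.most_common(1)[0]


# Toy "logical lines" standing in for a real-estate listing page.
lines = [
    "Listing 1", "$250,000", "3 beds", "2 baths",
    "Listing 2", "$310,000", "4 beds", "3 baths",
]
labels = [label_line(line) for line in lines]
pattern = most_frequent_pattern(labels, 3)
print(pattern)
```

In this toy input, the repeated PRICE, BED, BATH sequence is the pattern a wrapper would then use to locate records in the source.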

A System Model Construction from the Ontology Model Using the Domain Model (도메인 모델을 이용한 온톨로지 모델로부터 시스템 모델 생성)

Nam, Swoong-Hwan;Lim, Jae-Hyun;Kim, Chi-Su
- Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.237-240 / 2007
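An ontology model that reuses knowledge can be regarded as a knowledge model that raises the level of reuse. In this paper, aiming to reuse knowledge rather than software, we seek to create relevant mappings between knowledge and software models in the development process. We also use UML as the ontology modeling language and propose an ontology-domain-system method for extracting a system model from a UML-based ontology model.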
https://doi.org/10.3745/PKIPS.y2007m05a.237

A Study of Automatic Extraction of Domain Specified Dictionary (병렬 말뭉치를 이용한 도메인 특화 사전 자동 추출 연구)

Park, Eun-Jin;Hwang, Kum-Ha;Kim, Young-Gil
- Annual Conference on Human and Language Technology / 2009.10a / pp.237-241 / 2009
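In this paper, we use the Moses Toolkit to automatically extract domain-specific Korean-English translation pairs from domain-specific parallel corpora. Because the extracted pairs contain too many errors to be used directly as the translation dictionary of a domain-specific machine translation system, we propose a formula that removes these errors efficiently. After removing errors with the proposed formula, 1,098 Korean-English translation pairs were extracted at a threshold of 0.5, and they affected 29,292 (69.4%) of the 42,200 sentences in the business-domain parallel corpus used in the experiment. Applying the automatically extracted domain-specific translation knowledge to the translation knowledge of an existing machine translation system improved BLEU by 0.0054.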

Retrieval-Augmented Generation for Korean Open-domain Question Answering (RAG를 이용한 한국어 오픈 도메인 질의 응답)

Daewook Kang;Seung-Hoon Na;Tae-Hyeong Kim;Hwi-Jung Ryu;Du-Seong Chang
- Annual Conference on Human and Language Technology / 2022.10a / pp.105-108 / 2022
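Unlike question answering that relies only on the information stored in the parameters of a pretrained language model, open-domain question answering is the task of finding the answer to a question in a large collection of documents. The recently introduced Dense Retrieval uses models such as BERT to retrieve documents by computing the similarity between a question and documents through vector operations on their representations. Among approaches that use Dense Retrieval, RAG improves performance by combining external knowledge obtained through Dense Retrieval with the knowledge inherent in an encoder-decoder model. In this paper, we apply RAG to Korean open-domain question answering data and confirm that it shows some improvement over the baseline.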
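The retrieval step this abstract describes, scoring documents against a question by a vector operation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's system: the vectors below are hand-written stand-ins for BERT-style embeddings, the inner product is assumed as the similarity measure, and the names `dot` and `retrieve` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents whose vectors have the highest
    inner product with the query vector, best first."""
    scored = sorted(
        ((dot(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy 2-dimensional vectors; a real system would obtain them from an encoder.
doc_vecs = {
    "d1": [1.0, 0.0],
    "d2": [0.6, 0.8],
    "d3": [0.0, 1.0],
}
query_vec = [1.0, 0.2]

top_docs = retrieve(query_vec, doc_vecs, k=2)
print(top_docs)
```

In a RAG-style pipeline, the text of the retrieved documents would then be passed to the encoder-decoder model together with the question.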

Structuring the Case Base to Support CAPP (CAPP 지원을 위한 사례베이스의 구조화)

김진백;김유일
- Proceedings of the Korea Association of Information Systems Conference / 1997.10a / pp.149-164 / 1997
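Case-based reasoning (CBR) solves problems using past experience and, unlike rule-based reasoning (RBR), is suited to domains rich in problem-solving experience. From a static perspective, the representation and structuring of cases are important in CBR; from a dynamic perspective, the solution-generation process of case retrieval and adaptation is important. From the static perspective, this paper structures the case base (CB) hierarchically to support CAPP effectively. In addition, to improve the system's problem-solving ability, the CB is divided into an application-domain-dependent CB (DDCB) and a domain-independent CB (DICB): knowledge about past problem-solving experience is represented in the DDCB, and the general problem-solving knowledge held by domain experts is represented in the DICB.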

An Extraction of Property of Ontology Instance Using Stratification of Domain Knowledge (도메인지식의 계층화를 통한 온톨로지 인스턴스의 속성정보 추출)

Chang, Moon-Soo;Kang, Sun-Mee
- Journal of the Korean Institute of Intelligent Systems / v.17 no.3 / pp.291-296 / 2007
Ontologies have been widely used in recent years to accumulate knowledge that machines can comprehend. We believe that machines can manage and analyze information on their own using an ontology. In this paper, we propose an algorithm that extracts properties of ontology instances from structured information already existing in web documents. In particular, by stratifying the domain knowledge that is composed of property information, we were able to improve the algorithm and the quality of the extraction results. In experiments on 20,000 target documents, we were able to extract property information with 83% confidence.
https://doi.org/10.5391/JKIIS.2007.17.3.291

Intelligent Agent with Fuzzy Ontology (퍼지 온톨로지를 이용한 지능형 에이전트)

박종민;양형정;양재동
- Proceedings of the Korean Information Science Society Conference / 2002.10d / pp.376-378 / 2002
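In current e-commerce systems, users without expert knowledge of a domain have difficulty finding the products they want. In addition, because standards for common product information across different e-commerce systems are lacking, finding a desired product takes considerable time and effort. To address this, we extend a semantic web-based ontology language to build a fuzzy product knowledge base and propose an intelligent agent capable of intelligent query processing. Building the product knowledge base with a fuzzy ontology supports users who lack expert domain knowledge, and the knowledge base can be used in query processing as standard product knowledge across different systems.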

Search Result 292, Processing Time 0.023 seconds
