Search | Korea Science

An Efficient Method for Logical Structure Analysis of HTML Tables (HTML 테이블의 논리적 구조분석을 위한 효율적인 방법)

Kim Yeon-Seok;Lee Kyong-Ho
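- Proceedings of the Korean Information Science Society Conference / 2006.06b / pp.334-336 / 2006
As part of the goal of extracting information from web documents, this paper proposes an efficient method that extracts the logical structure of HTML tables and converts it into XML documents. The proposed method consists of two phases: region segmentation and structure analysis. In the region segmentation phase, noise regions of the table are removed and the table is normalized, after which visual and semantic consistency checks distinguish the attribute regions from the value regions in the table. In the structure analysis phase, the proposed table model is applied to the segmented regions to extract a hierarchical structure, from which an XML document is generated. To evaluate the proposed region segmentation method, experiments were run on 1,180 tables; the method achieved an average precision of 86.7%, outperforming previous work.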
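The two phases in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes the simplest possible segmentation (the first row is the attribute region and every later row is a value region), and the names `TableRows`, `table_to_xml`, `record`, and `field` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRows(HTMLParser):
    """Collect the text of every cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_xml(html: str) -> str:
    parser = TableRows()
    parser.feed(html)
    # Assumption: row 0 is the attribute region, the rest is the value region.
    attributes, values = parser.rows[0], parser.rows[1:]
    root = ET.Element("table")
    for row in values:
        record = ET.SubElement(root, "record")
        for name, value in zip(attributes, row):
            ET.SubElement(record, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")
```

A real segmentation step would replace the fixed "row 0" rule with the noise removal and consistency checks the abstract describes.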

Analysis of High Dimensional Data using Low Dimensional Summary Tables (저차원 집계 테이블들을 사용한 고차원 데이터의 온라인 분석)

Choi, Hae-Jung;Kim, Myung
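- Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.16-18 / 2002
To analyze multidimensional data online, summary tables are computed in advance. For large, high-dimensional data, the volume of summary tables is astronomically large, so precomputing them is often practically infeasible. Research on high-dimensional data processing includes work that reduces the number of dimensions or compresses indexes to shorten query processing time, but these approaches do not fundamentally solve the data explosion that occurs during online analysis of high-dimensional data. Observing that low-dimensional summary tables are the ones mainly used in practice when high-dimensional data is analyzed, this study presents a way to analyze data while reducing data explosion. During pre-aggregation, the method skips the generation of the very large high-dimensional summary tables and instead generates, simultaneously and at high speed, only the summary tables of 3 to 6 dimensions or fewer.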
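As a rough Python illustration of the idea in the abstract above, the sketch below builds only the summary tables whose group-by has at most `max_dims` dimensions, filling all of them in a single scan. The function name, the SUM aggregate, and the dictionary-based row format are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def low_dim_summaries(rows, dims, measure, max_dims):
    """Build SUM summary tables for every group-by of at most max_dims dimensions."""
    group_bys = [c for k in range(1, max_dims + 1) for c in combinations(dims, k)]
    tables = {g: defaultdict(float) for g in group_bys}
    for row in rows:  # one scan fills all summary tables together
        for g in group_bys:
            tables[g][tuple(row[d] for d in g)] += row[measure]
    return tables
```

With three dimensions and `max_dims=2`, this produces six summary tables and never materializes the full three-dimensional one.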

R³ : Open Domain Question Answering System Using Structure Information of Tables (R³ : 테이블의 구조 정보를 활용한 오픈 도메인 질의응답 시스템)

Deokhyung Kang;Gary Geunbae Lee
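- Annual Conference on Human and Language Technology / 2022.10a / pp.455-460 / 2022
In open-domain question answering, the answer to a question is obtained by retrieving documents relevant to the question and then analyzing the retrieved documents that may contain the answer. Although tables in documents can be relevant to a question, previous work has focused mainly on retrieving only the text portions of documents. Research on question answering that considers both tables and text has since been carried out, but it has limitations such as losing the structural information of tables. This study proposes R³, an open-domain question answering system that exploits the structural information of tables through additional embeddings in the model. R³ was trained and evaluated on NQ-Open-Multi, a new dataset based on NQ, an open-domain question answering dataset, and was confirmed to outperform a system that does not use the structural information of tables.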

A Visual Programming Tool for Constructing Object-Oriented C++ Class (객체 지향 C++ 클래스 생성을 위한 시각 프로그래밍 도구)

Ha, Su-Cheol
- The Transactions of the Korea Information Processing Society / v.2 no.1 / pp.23-33 / 1995
This paper describes a visual programming tool that gives novice programmers as well as experts the ability to capture the real physical world of a problem domain and to manipulate it in a user-friendly way using icons and symbols. Novices can thereby understand the object-oriented features of C++ incrementally and construct classes easily. For this purpose, we introduce visual metaphors that are displayed as tables. The tables can not only represent objects and classes but can also be regarded as icons themselves; we call them table-icons. Three levels of table-icons (Super Table-Icons, Intermediate Table-Icons, and Detailed Table-Icons) are proposed to follow the evolution of object-oriented concepts. Table-icons are not simple pictographs: they can be activated and expanded into table forms, after which the developer can insert necessary entities into the table body and delete useless ones from it. These table-icons are applied to a diagramming technique, C++gram [18], which is suggested for designing and implementing C++ programs.

A Reactive Chord for Efficient Network Resource Utilization in Mobile P2P Environments (모바일 P2P 환경에서 효율적인 네트워크 자원 활용을 위한 반응적인 코드)

Yoon, Young-Hyo;Kwak, Hu-Keun;Kim, Cheong-Ghil;Chung, Kyu-Sik
- Journal of KIISE:Information Networking / v.36 no.2 / pp.80-89 / 2009
A DHT (Distributed Hash Table) based P2P method compensates for the disadvantages of the existing unstructured P2P method. A DHT algorithm enables fast data search and keeps search efficiency independent of the number of peers. The peers in a DHT method send messages periodically to keep the routing table updated. In a mobile environment, the peers should send messages more frequently to keep the routing table updated and to reduce request failures; however, this increases the overall network traffic. In this paper, we propose a method that reduces the update load of a routing table in the existing DHT by updating it reactively. In the proposed reactive method, a routing table is updated only when a data request arrives, whereas it is updated periodically in the existing proactive method. We perform experiments using the Chord simulator (I3) made by UC Berkeley. The experimental results show that the proposed method improves performance compared to the existing method.
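The difference between the proactive and reactive update policies can be shown with a toy Python counter. This is only a sketch under simplifying assumptions (discrete time steps, one refresh message per update, no node churn); `count_update_messages` and its parameters are hypothetical and are unrelated to the I3 simulator.

```python
def count_update_messages(ticks, request_ticks, period, reactive):
    """Count routing-table refreshes over a run of `ticks` time steps."""
    requests = set(request_ticks)
    updates = 0
    for t in range(ticks):
        if reactive:
            # Reactive: refresh the routing table only when a request arrives.
            if t in requests:
                updates += 1
        elif t % period == 0:
            # Proactive: refresh on a fixed schedule, requests or not.
            updates += 1
    return updates
```

When requests are sparse relative to the refresh period, the reactive policy sends fewer update messages; when requests are dense, the advantage shrinks.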

Estimating The Number of Hierarchical Distinct Values using Arrays of Attribute Value Intervals (속성값 구간 배열을 이용한 계층 상이값 갯수의 계산 기법)

Song, Ha-Joo;Kim, Hyoung-Joo
- Journal of KIISE:Computing Practices and Letters / v.6 no.2 / pp.265-273 / 2000
In relational database management systems (RDBMSs), a table consists of sets of records, each composed of a set of attributes. The number of distinct values (NDV) of an attribute denotes the number of distinct attribute values that actually appear in the database records, and it is widely used in optimizing queries and supporting statistical queries. Object-relational database management systems (ORDBMSs), however, support inheritance between tables, which causes an attribute defined in a super-table to be inherited by its sub-tables automatically. Hence, in ORDBMSs, not only the NDV of an attribute in a single table but also the NDV of an attribute across multiple tables (HNDV) is needed. In this paper, we propose a method that calculates HNDV using arrays of attribute value intervals. In this method, an array of attribute value intervals is created for an attribute of interest in each table in a table hierarchy, and HNDV can be calculated or estimated by merging the arrays of attribute value intervals. The proposed method accurately calculates HNDV using a small amount of additional storage space and is efficient in an environment where only some of the tables in a table hierarchy are frequently updated.

Word Extraction from Table Regions in Document Images (문서 영상 내 테이블 영역에서의 단어 추출)

Jeong, Chang-Bu;Kim, Soo-Hyung
- The KIPS Transactions:PartB / v.12B no.4 s.100 / pp.369-378 / 2005
A document image is segmented and classified into text, picture, or table regions by document layout analysis, and the words in table regions are significant for keyword spotting because they are more meaningful than the words in other regions. This paper proposes a method to extract words from table regions in document images. Because word extraction from table regions in practice amounts to extracting words from the cell regions that compose the table, the cells must be extracted correctly. In the cell extraction module, the table frame is extracted first by analyzing connected components, and the intersection points are then extracted from the table frame. We correct false intersections using the correlation between neighboring intersections, and extract the cells using the intersection information. Text regions in the individual cells are located using the connected component information obtained during cell extraction, and they are segmented into text lines using projection profiles. Finally, we divide the segmented lines into words using gap clustering and special symbol detection. The experiment was performed on table images extracted from Korean documents and shows 99.16% accuracy of word extraction.
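The last step, dividing a text line into words from its projection profile, can be sketched in Python as follows. This is a simplified stand-in, not the paper's algorithm: a fixed `min_gap` threshold replaces the gap clustering, special symbol detection is omitted, and the line is assumed to be a binary image given as a list of pixel rows. The name `split_words` is hypothetical.

```python
def split_words(line, min_gap):
    """Split a binary text-line image into word column spans (start, end)."""
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(col) for col in zip(*line)]
    words, start, gap = [], None, 0
    for x, ink in enumerate(profile):
        if ink:
            if start is None:
                start = x
            gap = 0
            end = x
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # wide gap: close the current word
                words.append((start, end))
                start = None
    if start is not None:
        words.append((start, end))
    return words
```

Both ends of each span are inclusive column indices. Gaps narrower than `min_gap` are treated as spacing between characters of the same word.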
https://doi.org/10.3745/KIPSTB.2005.12B.4.369

Query Optimization Techniques for Horizontal Tables in OLAP Environment (OLAP 환경의 수평적인 테이블에 대한 질의 최적화 방법)

Shin Sung-Hyun;Moon Yang-Sae;Kim Jin-Ho
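- Proceedings of the Korean Information Science Society Conference / 2006.06c / pp.70-72 / 2006
A data warehouse is a repository that stores massive amounts of historical data, and OLAP (On-Line Analytical Processing) operations are used to analyze it from various perspectives. Such a repository typically stores data in wide tables with many columns. However, relational DBMSs limit the number of columns (1,024 in MS SQL Server, Oracle, and others), so more columns than that cannot be stored. Rather than storing the data column-wise (hereafter, a horizontal table), storing it row-wise (hereafter, a vertical table) using the characteristics of a relational DBMS allows large amounts of data to be stored efficiently. Because the schema of the stored table changes, queries on the horizontal table must also be transformed so that they can be applied to the stored vertical table. In addition, an execution strategy must be devised with query optimization in mind so that users receive fast query responses. This study therefore uses heuristics to build query trees for each operation (projection, selection, and join), considers several query paths for query optimization, and analyzes the access paths through various experiments. Based on this query path analysis, we expect to derive optimized execution plans.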
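The horizontal-to-vertical mapping and the corresponding rewrite of a selection can be sketched in Python with plain lists and dictionaries. This is an illustration only: the triple layout `(key, attribute, value)` and the function names `to_vertical` and `select_vertical` are assumptions for the example, not the schema or operators used in the paper.

```python
def to_vertical(horizontal_rows, key):
    """Turn wide rows into (key, attribute, value) triples, skipping NULLs."""
    return [
        (row[key], attr, value)
        for row in horizontal_rows
        for attr, value in row.items()
        if attr != key and value is not None
    ]


def select_vertical(vertical_rows, attr, predicate):
    """Selection rewritten for the vertical table: keys whose `attr` satisfies `predicate`."""
    return sorted({k for k, a, v in vertical_rows if a == attr and predicate(v)})
```

A selection such as "q1 > 5" on the horizontal table becomes a filter on both the attribute name and the value in the vertical table, which is why the query must be transformed along with the schema.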

Implementation of Physics Engine for Interactions Representation of Object at Table Top Display Interface (테이블 탑 디스플레이 인터페이스에서 물체의 상호작용 표현을 위한 물리 엔진 구현)

Jeong, Jong-Mun;Kim, Man-Sun;Oh, Jin-Sik;Kim, Jeong-Sik;Yang, Hyung-Jeong
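- Proceedings of the Korea Information Processing Society Conference / 2007.11a / pp.127-130 / 2007
A tabletop display is one of the interfaces developed for natural interaction between humans and computers. Because it lets users interact with the computer using the hand, an intuitive human tool, it can engage users more than conventional mouse-based systems, and much content is therefore being developed for such systems. In this paper, we implement a physics engine that supports natural object interaction on a tabletop display interface. To this end, the engine supports a 2D-viewpoint-to-3D transformation that enables object selection on the tabletop plane, implements through vector operations the physical phenomena that occur when an object moves and when objects collide, and runs in a multi-user environment over a network. We demonstrate these functions on the tabletop with an air hockey game. Air hockey is a game in which a puck is placed on a table and struck with a racket into the opponent's goal to score points. With the proposed physics engine, users can experience a more realistic interface.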
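The kind of vector operation used for a puck-and-racket collision can be sketched in Python. This is a generic textbook reflection, v' = v - 2 (v . n) n, not the engine from the paper; it assumes a stationary circular racket, a perfectly elastic bounce, and hypothetical function names `reflect` and `collide`.

```python
import math


def reflect(velocity, normal):
    """Reflect a 2D velocity about a surface normal: v' = v - 2 (v . n) n."""
    length = math.hypot(*normal)
    nx, ny = normal[0] / length, normal[1] / length
    dot = velocity[0] * nx + velocity[1] * ny
    return (velocity[0] - 2 * dot * nx, velocity[1] - 2 * dot * ny)


def collide(puck_pos, puck_vel, racket_pos, radius_sum):
    """Bounce the puck off a stationary racket if the two circles overlap."""
    normal = (puck_pos[0] - racket_pos[0], puck_pos[1] - racket_pos[1])
    distance = math.hypot(*normal)
    if distance == 0 or distance > radius_sum:
        return puck_vel  # no contact (or degenerate overlap): keep moving
    if puck_vel[0] * normal[0] + puck_vel[1] * normal[1] >= 0:
        return puck_vel  # already moving away from the racket
    return reflect(puck_vel, normal)
```

The same `reflect` function handles table walls by passing the wall's normal, for example `(0, 1)` for the bottom edge.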
https://doi.org/10.3745/PKIPS.y2007m11a.127

A Study on the Computer-Aided Processing of Sentence-Logic Rule (문장논리규칙의 컴퓨터프로세싱을 위한 연구)

Kum, Kyo-young;Kim, Jeong-mi
- Journal of Korean Philosophical Society / v.139 / pp.1-21 / 2016
To determine quickly and accurately whether a sentence description is consistent and whether it is true or false, we may need the help of a computer. It is thus necessary to study computer processing techniques for doing so. This requires a plan for the whole study, namely a plan for the necessary tables and their processing, and development of the table of the five logic rules. In future research, it will be necessary to create and develop the tables of the ten basic inference rules and the eleven kinds of derived inference rules, to build a DB of those tables, and on that foundation to implement computer processing of sentence logic using JSP for server programming and Java for client programming. In this paper we present the overall research plan with reference to the logic operation table, dividing the logic and inference rules and listing the processes in sequence according to the combinations in which they are used. These are presented as a variable table and a symbol table; subsequent studies will add a processing table and will use JSP server programming and Java client programming to construct a DB activated by subject and predicate parts and to prove whether a sentence is true or false. Using the table prepared in chapter 2 as a guide, chapter 3 shows the creation and development of the table of the five logic rules, i.e., the Rule of Double Negation, De Morgan's Rule, the Commutative Rule, the Associative Rule, and the Distributive Rule. These five logic rules are used in propositional calculus, sentential logic calculus, and statement logic calculus for sentence logic.
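The five logic rules named in the abstract can be checked mechanically with a truth table. The Python sketch below is only an illustration (the paper plans JSP and Java, and its table layout is not given here); it picks one common form of each rule, and `RULES` and `holds` are hypothetical names.

```python
from itertools import product

# Each rule maps to a pair (left side, right side) of Boolean functions of p, q, r.
RULES = {
    "Double Negation": (lambda p, q, r: not (not p), lambda p, q, r: p),
    "De Morgan": (lambda p, q, r: not (p and q), lambda p, q, r: (not p) or (not q)),
    "Commutative": (lambda p, q, r: p and q, lambda p, q, r: q and p),
    "Associative": (lambda p, q, r: (p or q) or r, lambda p, q, r: p or (q or r)),
    "Distributive": (lambda p, q, r: p and (q or r), lambda p, q, r: (p and q) or (p and r)),
}


def holds(rule):
    """Check a rule on all eight rows of the truth table for p, q, r."""
    left, right = RULES[rule]
    return all(left(*row) == right(*row) for row in product([False, True], repeat=3))
```

Calling `holds("De Morgan")` compares both sides of the rule on every assignment of truth values, which is the same exhaustive check a logic operation table encodes.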
https://doi.org/10.20293/jokps.2016.139.1

Search Result 2,305, Processing Time 0.028 seconds
