Search | Korea Science

HTML Text Extraction Using Tag Path and Text Appearance Frequency (태그 경로 및 텍스트 출현 빈도를 이용한 HTML 본문 추출)

Kim, Jin-Hwan; Kim, Eun-Gyung
- Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1709-1715 / 2021
Telling a web crawler which tags and style attributes contain the main content allows the necessary text to be extracted accurately from a web page, but the extraction logic must be modified whenever the page layout changes. A previous study addressed this problem by extracting the text through analysis of text appearance frequency, but its performance varied widely depending on the collection channel of the web page. In this paper, we propose a method that extracts text with high accuracy from various collection channels by analyzing not only the text appearance frequency but also the parent tag paths of the text nodes extracted from the DOM tree of a web page.
https://doi.org/10.6109/jkiice.2021.25.12.1709
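
The abstract above does not spell out how tag paths and text frequency are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple scoring rule: group text nodes by their parent tag path and keep the path that carries the most characters. The names `TagPathCollector` and `extract_main_text` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: text under these tags is never body content.
SKIP = {"script", "style"}
# Void tags have no closing tag, so they are never pushed on the stack.
VOID = {"br", "img", "hr", "meta", "link", "input"}


class TagPathCollector(HTMLParser):
    """Collects (parent tag path, text) pairs for every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # list of (tag path, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIP.intersection(self.stack):
            self.nodes.append(("/".join(self.stack), text))


def extract_main_text(html):
    """Return the text of the tag path that carries the most characters."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text

    chars_per_path = defaultdict(int)
    for path, text in collector.nodes:
        chars_per_path[path] += len(text)
    if not chars_per_path:
        return ""

    best = max(chars_per_path, key=chars_per_path.get)
    return "\n".join(text for path, text in collector.nodes if path == best)
```

Because the rule depends only on tag paths and text volume, the same function can be applied to pages from different sites without per-site selectors, which is the maintenance benefit the abstract describes. A real system would need a more careful score than raw character count.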

Performance Evaluation of Tag Switching (태그 스위칭 기술 성능 분석)

오경희; 이수경; 손홍세; 송주석
- Proceedings of the Korean Information Science Society Conference / 1999.10c / pp.560-562 / 1999
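The emergence of the Internet and the rapid development of networking technology have brought changes such as the appearance of diverse applications and rising bandwidth demand from a growing number of users. These changes, together with the limitations of existing routers, have created a need for higher-performance switching and routing equipment and for extended routing functions, and to this end the IETF is currently standardizing a label switching scheme called MPLS. One of the technologies underlying this standardization work is tag switching, and this paper analyzes its performance. At a time when standards and switches are still under development, performance evaluation results for tag switching should serve as base data, in particular for making efficient use of ATM switching capability and IP-layer capability. We built a network containing routers and tag switches, evaluated its performance using Internet traffic provided by NLANR as the input traffic source, and analyzed the functionality and performance of the switching this technology offers, both in terms of the tag switching architecture and when implemented on an ATM testbed.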

WCTT: Web Crawling System based on HTML Document Formalization (WCTT: HTML 문서 정형화 기반 웹 크롤링 시스템)

Kim, Jin-Hwan; Kim, Eun-Gyung
- Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.495-502 / 2022
Web crawlers, the main tools used today to collect text from the web, are difficult to maintain and extend because researchers must analyze the tags and styles of HTML documents and then implement different collection logic for each collection channel. To solve this problem, a web crawler should be able to collect text by formalizing HTML documents into the same structure. In this paper, we designed and implemented WCTT (Web Crawling system based on Tag path and Text appearance frequency), a web crawling system that collects text with a single collection logic by formalizing HTML documents based on tag path and text appearance frequency. Because WCTT collects text with the same logic for all collection channels, it is easy to maintain and to extend with new collection channels. In addition, it provides a preprocessing function that removes stopwords and extracts only nouns, for uses such as keyword network analysis.
https://doi.org/10.6109/jkiice.2022.26.4.495

A Hybrid Approach to Arbitrate Tag Collisions in RFID systems (RFID 시스템에서 태그 충돌 중재를 위한 하이브리드 기법)

Ryu, Ji-Ho; Lee, Ho-Jin; Seok, Yong-Ho; Kwon, Tae-Kyoung; Choi, Yang-Hee
- Journal of KIISE:Information Networking / v.34 no.6 / pp.483-492 / 2007
In this paper, we propose a new hybrid approach based on the query tree protocol to arbitrate tag collisions in RFID systems. The hybrid query tree protocol combines a tree-based query protocol with a slotted backoff mechanism. The proposed protocol decreases the average identification delay by reducing collisions and idle time. To reduce collisions, we use a 4-ary query tree instead of a binary query tree. To reduce idle time, we introduce a slotted backoff mechanism that cuts the number of unnecessary Query commands. Simulation and numerical analysis show that the proposed protocol achieves lower identification delay than existing tag collision arbitration protocols.
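
To illustrate the trade-off between binary and 4-ary query trees mentioned above, here is a minimal sketch of a plain query tree reader loop. It does not implement the paper's slotted backoff mechanism, whose details the abstract does not give. The function name `query_tree` is hypothetical, and the sketch assumes tag IDs are unique bit strings of equal length, with that length a multiple of `bits_per_step`.

```python
from collections import deque


def query_tree(tag_ids, bits_per_step=1):
    """Identify all tags; return (identified, collisions, idle_queries).

    bits_per_step=1 gives a binary query tree, bits_per_step=2 a 4-ary one.
    Assumes unique, equal-length bit-string IDs (otherwise the loop may not end).
    """
    # All bit patterns that can extend a prefix by one tree level.
    suffixes = [format(i, f"0{bits_per_step}b") for i in range(2 ** bits_per_step)]
    queue = deque(suffixes)
    identified, collisions, idle = [], 0, 0

    while queue:
        prefix = queue.popleft()
        # Every tag whose ID starts with the queried prefix responds.
        responders = [t for t in tag_ids if t.startswith(prefix)]
        if not responders:
            idle += 1                      # nobody answered: wasted query
        elif len(responders) == 1:
            identified.append(responders[0])
        else:
            collisions += 1                # split the prefix and retry
            queue.extend(prefix + s for s in suffixes)

    return identified, collisions, idle
```

For the four IDs `0001`, `0010`, `0111`, and `1100`, the binary tree incurs more collisions while the 4-ary tree incurs more idle queries, which is consistent with the abstract's motivation for pairing the 4-ary tree with a mechanism that reduces idle time.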

A Design and Implementation of Container Localization for Ubiquitous Logistics Environment (유비쿼터스 물류환경을 위한 컨테이너 위치 확인 시스템 설계 및 구현)

Jung Dongho; Jung Yeonsu; Kim Junghyo; Baek Yunju
- Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.205-207 / 2005
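As applications of ubiquitous computing, various services such as location tracking services, industrial control and management systems, and home automation systems are already in use or under development. A cargo container management system in a logistics environment can guarantee the safety of containers being shipped and provide their shipping route and current location, giving importers and exporters an economic advantage. In this paper, we designed and implemented a system that can determine the current location of cargo containers in a ubiquitous logistics environment. The container localization system uses the DV-hop technique to compute the hop count between readers and tags and uses it to determine the precise location of a container. In the simulation, TOSSIM was used to collect information from readers and tags, and the tag locations were computed after an information analysis step in the server program.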
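
The following is a minimal sketch of the generic DV-hop technique in two dimensions, not the system described in the paper. It assumes readers act as anchors with known coordinates, that the network is connected, and that at least three non-collinear anchors exist. The names `hop_counts` and `dv_hop` and the data layout are hypothetical.

```python
from collections import deque
from math import dist


def hop_counts(adjacency, source):
    """Breadth-first search: minimum hop count from source to each reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def dv_hop(adjacency, anchors, node):
    """Estimate the (x, y) position of node.

    adjacency: {node_id: [neighbour ids]}; anchors: {anchor node_id: (x, y)}.
    """
    hops = {a: hop_counts(adjacency, a) for a in anchors}

    def hop_size(a):
        # Average physical distance covered by one hop, as seen from anchor a.
        others = [b for b in anchors if b != a]
        return (sum(dist(anchors[a], anchors[b]) for b in others)
                / sum(hops[a][b] for b in others))

    # The node adopts the hop size of its nearest anchor (in hops).
    nearest = min(anchors, key=lambda a: hops[a][node])
    size = hop_size(nearest)
    d = {a: size * hops[a][node] for a in anchors}  # estimated distances

    # Linearise the circle equations by subtracting the last anchor's equation.
    *rest, last = anchors
    xn, yn = anchors[last]
    rows, rhs = [], []
    for a in rest:
        xa, ya = anchors[a]
        rows.append((2 * (xa - xn), 2 * (ya - yn)))
        rhs.append(xa**2 - xn**2 + ya**2 - yn**2 + d[last]**2 - d[a]**2)

    # Solve the 2x2 least-squares normal equations in closed form.
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * b for r, b in zip(rows, rhs))
    t2 = sum(r[1] * b for r, b in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)
```

The accuracy of the estimate depends on how well hop count tracks physical distance, so irregular topologies can produce large errors.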

Backward Channel Protection Method For RFID Tag Security in the Randomized Tree Walking Algorithm (랜덤화된 트리워킹 알고리즘에서의 RFID 태그 보안을 위한 백워드 채널 보호 방식)

Choi Wonjoon; Roh Byeong-hee; Yoo S. W.; Oh Young Cheol
- The Journal of Korean Institute of Communications and Information Sciences / v.30 no.5C / pp.415-421 / 2005
A passive RFID tag has no power source of its own, so its computational ability is very limited and it can transmit signals only over a very short range. For this reason, most RFID tag security schemes assume that the backward channel from tags to a reader is safe from eavesdropping. However, eavesdroppers near a tag can still overhear its messages. In this paper, we propose a method to protect the backward channel from eavesdropping by illegal readers. The proposed scheme overcomes the problems of conventional schemes such as randomized tree walking, which were proposed to secure tag information in the tree-walking algorithm used as an anti-collision scheme for RFID tags. Using an analytical model, we show the efficiency of the proposed method and also show that it can bring the probability of eavesdropping close to 0 in standardized RFID tag systems such as EPCglobal, ISO, and uCode.

Indoor Positioning Using RFID Technique (RFID 기술을 이용한 실내 위치 추적)

Yoon, Chang-sun; Kim, Tae-in; Kim, Hyeon-jin; Hong, Yeon-chan
- Journal of the Korea Institute of Information and Communication Engineering / v.20 no.1 / pp.207-214 / 2016
RFID technology recognizes information using devices called readers and tags, and is now used in public transportation systems such as Hi-pass. In this paper, we design a system that tracks indoor location using this technology. GPS, the most frequently used location-tracking system, has the drawback that its accuracy decreases when the device is indoors. In the proposed experiment, we simulate the signals produced as located objects move and then compare them with the experimental results. From the extracted data, we report data intended for tracking systems, based on an analysis of routes and errors. Simulations for the tracking were performed by relocating real objects. In the real experiment, we arrange readers around the room, move the tagged object whose location we want to know, and analyze the data from the equipment. This paper presents the analyzed data for future indoor tag tracking applications. We expect that the RFID-based positioning data will be used in other indoor positioning research and development.
https://doi.org/10.6109/jkiice.2016.20.1.207

Detection of Protein Subcellular Localization based on Syntactic Dependency Paths (구문 의존 경로에 기반한 단백질의 세포 내 위치 인식)

Kim, Mi-Young
- The KIPS Transactions:PartB / v.15B no.4 / pp.375-382 / 2008
A protein's subcellular localization is considered an essential part of the description of its associated biomolecular phenomena. As the volume of biomolecular reports has increased, there has been a great deal of research on text mining to detect protein subcellular localization information in documents. It has been argued that linguistic information, especially syntactic information, is useful for identifying the subcellular localizations of proteins of interest. However, previous systems for detecting protein subcellular localization information used only shallow syntactic parsers, and showed poor performance. Thus, there remains a need to use a full syntactic parser and to apply deep linguistic knowledge to the analysis of text for protein subcellular localization information. In addition, we have attempted to use semantic information from the WordNet thesaurus. To improve performance in detecting protein subcellular localization information, this paper proposes a three-step method based on a full syntactic dependency parser and WordNet thesaurus. In the first step, we constructed syntactic dependency paths from each protein to its location candidate, and then converted the syntactic dependency paths into dependency trees. In the second step, we retrieved root information of the syntactic dependency trees. In the final step, we extracted syn-semantic patterns of protein subtrees and location subtrees. From the root and subtree nodes, we extracted syntactic category and syntactic direction as syntactic information, and synset offset of the WordNet thesaurus as semantic information. According to the root information and syn-semantic patterns of subtrees from the training data, we extracted (protein, localization) pairs from the test sentences. Even with no biomolecular knowledge, our method showed reasonable performance in experimental results using Medline abstract data. 
Our proposed method gave an F-measure of 74.53% for training data and 58.90% for test data, significantly outperforming previous methods by 12-25%.
https://doi.org/10.3745/KIPSTB.2008.15-B.4.375

A study on improve survivability of sensor node and design of protocol in RFID Middleware environment (RFID 미들웨어 환경에서 센서 노드의 생존성 향상과 효율적인 프로토콜 설계를 위한 연구)

Choi, Yong-Sik; John, Young-Jun; Park, Sang-Hyun; Han, Soo; Shin, Sung-Ho
- Proceedings of the Korean Information Science Society Conference / 2006.10d / pp.68-73 / 2006
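We analyze the transmission and reception states of sensor nodes in order to improve node survivability and design an efficient protocol. The experimental environment for analyzing the sensor nodes covers the following: sensor node survivability (available battery), sensor node output (detectable range), sensor node communication path (routing table generation), and sensor node bandwidth (size of transmitted data). The results can be applied to management systems based on RFID tags and readers, and to information collection and decision-making systems that use various sensors for disaster prevention. They can also serve as protocol design material for systems that handle data collection, sensor classification, and reception rate control for data received from various sensors.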

A Dictionay Composition for Morphological Analyzer from Corpus (코퍼스로부터 형태소 분석을 위한 사전 구성)

Jung, Min-Su; Jung, Kyu-Chol; Cho, Won-Hong
- Annual Conference on Human and Language Technology / 1998.10c / pp.316-320 / 1998
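In agglutinative languages such as Korean and Japanese, where the syntactic and semantic role of a word is determined by the function of its grammatical morphemes, morphological analysis strongly affects syntactic and semantic analysis, so it is very important in the analysis of Korean. Korean has many idiomatic expressions, so it is not easy to analyze with grammar rules alone, and because many branches are generated, the probability of errors is also high. This paper addresses these problems with a dictionary-centered approach. That requires a very large dictionary, and building one takes time and effort, so we construct the dictionary from an existing corpus to save much of that time and effort. In addition, to find the correct path among the many generated branches, we extract the combination information of the tags in the corpus and assign priorities using statistics on the extracted combinations, namely their frequency of use in the corpus.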
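
The abstract does not say exactly how tag combination frequencies are turned into priorities, so the following is a minimal sketch under one simple assumption: count adjacent tag pairs in a tagged corpus and rank each candidate analysis by the summed frequency of its adjacent tag pairs. The function names and the tag labels (`N`, `J`, `V`, `E`) are hypothetical.

```python
from collections import Counter


def tag_bigram_counts(tagged_corpus):
    """Count how often each pair of adjacent tags occurs in the corpus.

    tagged_corpus: list of sentences, each a list of (morpheme, tag) pairs.
    """
    counts = Counter()
    for sentence in tagged_corpus:
        tags = [tag for _, tag in sentence]
        counts.update(zip(tags, tags[1:]))
    return counts


def rank_analyses(candidates, counts):
    """Order candidate analyses so that the most corpus-typical tag sequence comes first."""
    def score(analysis):
        tags = [tag for _, tag in analysis]
        # Assumption: priority is the summed corpus frequency of adjacent tag pairs.
        return sum(counts[pair] for pair in zip(tags, tags[1:]))

    return sorted(candidates, key=score, reverse=True)
```

A candidate whose tag sequence never occurs in the corpus scores zero and is tried last, which is one way to prune the many branches the abstract mentions.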

Search Result 33, Processing Time 0.021 seconds
