• Title/Summary/Keyword: Electronic Data Collection

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
  • A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its corresponding POS tag, and it is widely used as training data for natural language processing. Such training data is generally assumed to be error-free, but in reality it contains various types of errors, which degrade the performance of systems trained on it. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier and cross-validation. We first train a POS-tagging classifier on the POS-tagged corpus, which contains some errors, and then apply it to the same corpus using cross-validation. The classifier cannot detect errors directly, however, because there is no training data for detecting POS-tagging errors. We therefore detect errors by comparing the outputs of the classifier (the POS probabilities) while adjusting hyperparameters. The hyperparameters are estimated from a small error-tagged corpus, whose text is sampled from the POS-tagged corpus and in which POS errors are marked up by experts. We use recall and precision, which are widely used in information retrieval, as evaluation metrics. Because not all detected errors can be checked, we show that the proposed method is valid by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency tree-tagged corpus and a semantic role-tagged corpus.
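
The cross-validation idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a per-word tag-frequency model stands in for the XGBoost classifier, and the names `train`, `tag_probability`, `detect_errors`, and `precision_recall`, the fold count `k`, and the `threshold` value are all assumptions introduced here. Each token is scored by a model trained on the other folds, and tokens whose annotated tag receives a low out-of-fold probability are flagged as suspects.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count tag frequencies per word (a simple stand-in for the XGBoost classifier)."""
    counts = defaultdict(Counter)
    for word, tag in samples:
        counts[word][tag] += 1
    return counts


def tag_probability(model, word, tag):
    """Relative frequency of `tag` for `word` in the training folds; None if the word is unseen."""
    seen = model.get(word)
    if not seen:
        return None
    return seen[tag] / sum(seen.values())


def detect_errors(corpus, k=5, threshold=0.2):
    """Flag tokens whose annotated tag gets a low out-of-fold probability.

    `k` and `threshold` are hypothetical hyperparameters; in the abstract they
    would be tuned on a small expert-annotated error-tagged corpus.
    """
    suspects = []
    for fold in range(k):
        held_out = [i for i in range(len(corpus)) if i % k == fold]
        training = [corpus[i] for i in range(len(corpus)) if i % k != fold]
        model = train(training)
        for i in held_out:
            word, tag = corpus[i]
            p = tag_probability(model, word, tag)
            if p is not None and p < threshold:
                suspects.append((i, word, tag, p))
    return sorted(suspects)


def precision_recall(flagged, gold):
    """Precision and recall of flagged token indices against expert-marked error indices."""
    flagged, gold = set(flagged), set(gold)
    hits = len(flagged & gold)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy corpus with one deliberately wrong tag at index 7.
    corpus = [("the", "DET"), ("dog", "NOUN")] * 10
    corpus[7] = ("dog", "DET")
    print(detect_errors(corpus))
```

Lowering `threshold` flags fewer tokens (typically higher precision, lower recall), which is the kind of trade-off the small error-tagged corpus would be used to tune.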

A Study on Policy Researchers' Requirements for Policy Information Providing Service (정책정보제공서비스에 대한 정책연구자 요구분석에 관한 연구)

  • Noh, Younghee;Sim, Jae-Yun
    • Journal of the Korean Society for Library and Information Science / v.48 no.3 / pp.137-168 / 2014
  • This study sought to identify the future development direction of policy information services in Korea by identifying users' needs for policy information and analyzing their behavior. For this purpose, we analyzed policy information users' needs and behavior through surveys and interviews, and the results are as follows. First, the most common purpose of using policy information was to understand policy trends, and the Internet was the most common means of searching for policy information. Second, electronic resources showed a high rate of use, but the utilization ratio of domestic materials was higher, and among overseas data, U.S. resources were used the most. Third, users most often used materials produced within the last 2-5 years, and Web DBs (journals, academic articles, etc.) and reports were the most used material types. Fourth, in the survey of opinions on methods for improving the efficiency of policy information utilization, cooperation between government agencies' libraries, cooperation between the agencies that produce policy information resources, and an overall national collection of policy information were rated the highest.

A Study on the Roles of Academic Library for Supporting Class and Learning Activities in Korea (대학도서관의 수업·학습 활동 지원 역할에 관한 연구)

  • Lee, Yong-Jae;Lee, Ji-Wook
    • Journal of Korean Library and Information Science Society / v.50 no.4 / pp.359-379 / 2019
  • This study aims to suggest ways to reinforce academic libraries' support for users' class and learning activities. For this purpose, it collected the development plans of academic libraries in Korea and analyzed their plans for supporting class and learning activities. The results show that most libraries emphasized 'expansion of learning materials' and included it in their development plans. The next most common action plans were, in order, 'expansion of reading education and reading programs', 'expansion of electronic materials', and 'expansion of characterized materials'. This study suggests 'user-centered collection development and expansion of learning materials', 'activation of library services making use of big data', and 'enlargement of engagement services for handicapped and foreign students' as ways to strengthen the services of academic libraries that support users' class and learning activities.

Implementation of Virtual Reader and Tag Emulator System Using DSP Board (DSP 보드를 이용한 가상의 리더와 태그 에뮬레이터 시스템 구현)

  • Kim, Young-Choon;Joo, Hae-Jong;Choi, Hae-Gill;Cho, Moon-Taek
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.10 / pp.3859-3865 / 2010
  • The emulator system, which models a virtual reader and tags, is realized using a commercial signal generator, data collection equipment, and a DSP board. Using a virtual UHF RFID (860-960 MHz) reader/tag module, it provides a way to verify that a developed RFID reader, tag protocol, and RF properties conform to international standards (ISO 18000-6 Type C, EPCglobal C1G2). In this paper, Visual DSP is applied on the DSP board to implement the proposed reader and tag models, and the system is composed of signal generators, signal analyzers, the target readers or tags under performance verification, an RFID emulator control computer, and control programs.

A Best-Effort Control Scheme on FDDI-Based Real-Time Data Collection Networks (FDDI 기반 실시간 데이타 수집 네트워크에서의 최선노력 오류제어 기법)

  • Lee, Jung-Hoon;Kim, Ho-Chan
    • Journal of KIISE:Information Networking / v.28 no.3 / pp.347-354 / 2001
  • This paper proposes and analyzes an error control scheme that tries to recover from transmission errors within the deadline of a message on FDDI networks. The error control procedure does not interfere with other normal message transmissions, because it delivers the retransmission request via asynchronous traffic and the retransmitted message via the overallocated bandwidth that is inevitably produced by the bandwidth allocation scheme for hard real-time guarantees. The receiver counts the number of tokens it meets, determines whether the message transmission has completed, and finally sends an error report. The analysis results, along with a simulation performed via SMPL, show that the proposed scheme is able to enhance the deadline meet ratio of messages by overcoming network errors. Using the proposed error control scheme, a hard real-time network can be built at a cost lower than, but with performance comparable to, that of a dual-link network.

An Evaluation Study on the Copyright Protection Environment for Digital Libraries (디지털도서관의 저작권보호 환경 평가 연구)

  • 이종문
    • Journal of the Korean Society for Information Management / v.19 no.3 / pp.211-326 / 2002
  • This study analyzes and evaluates the copyright protection environment for digital reproduction and transmission, identifies the problems involved, and suggests recommendations for improvement. Fifty libraries that are permitted to perform digital reproduction and transmission under the Copyright Act were surveyed to examine the present state of digitization and the systems employed. In addition, library users sampled from 5 libraries that carry out all 6 technical measures required by the Copyright Act were surveyed to examine their use of digital materials and their perception of copyright. After reviewing descriptive statistics, frequency and cross-tabulation analyses were performed. The results show that most of the libraries, except for the industry university libraries, have implemented digital library systems; however, while the introduction rates of digital reproduction and transmission systems are high (68.0% and 84.0%, respectively), that of copyright protection systems is low (26.0%). Of the libraries surveyed, 84.0% digitize full text, but the libraries have digital collections of fewer than 5,000 items, and only 33.3% of them digitize materials after securing the copyrights. Regulations on copyright protection are not properly observed, and users' perception of copyright as well as their use of electronic books appear to be relatively low.

The Effects of Characteristics of User and System on the Perceived Cognition and the Continuous Use Intention of Fintech (핀테크(fintech) 사용자와 시스템 특성이 지각된 인식과 지속사용의도에 미치는 영향)

  • Lee, Jun-Sang;Park, Jun-Hong
    • Journal of the Korea Convergence Society / v.9 no.1 / pp.291-301 / 2018
  • The purpose of this study is to investigate how the characteristics of FinTech users and systems affect perceived awareness and the intention of continuous use. Data were collected through a survey of 600 people, consisting of residents of Gwangju and office workers who use smartphones. The results are as follows. First, self-efficacy, innovation, and fitness for FinTech services were found to influence the perceived awareness and intention to use of FinTech service users. Second, the system characteristics have a positive effect on perceived awareness and the intention to use FinTech services. Third, the hypotheses about risk in the user and system characteristics were rejected. This seems to be because the primary concerns were the leakage of personal information, security and privacy, and the increasing number of financial fraud cases involving electronic financial transactions. Therefore, to spread FinTech services, it would be effective for companies' marketing strategies to eliminate inconveniences, such as risks, that hinder convenience and the intention to use.

Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick;Kim, Han-Joon
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.59-73 / 2009
  • Keyword extraction is an essential technique for text mining applications such as information retrieval, text categorization, summarization, and topic detection. A set of keywords extracted from large-scale electronic document data is used as significant features for text mining algorithms, and it contributes to improving the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection of Internet news portal sites. We use six variants of the traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering'. To demonstrate the effectiveness of our method, we analyze the usefulness of keywords extracted from Korean news articles and present the changes in the keywords of each news domain over time.
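
As a rough illustration of the approach described in the abstract above, the sketch below computes one standard TF-IDF form (`tf * log(N / df)`) per news domain and then applies a simple cross-domain filter. The abstract does not specify the six TF-IDF variants or the exact filtering rule, so everything here is an assumption: the function names `tf_idf`, `top_keywords`, and `cross_domain_filter` are hypothetical, and the filter shown (dropping a keyword that also ranks as a keyword in another domain) is only one plausible reading of 'cross-domain comparison filtering'.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def top_keywords(docs, k):
    """Sum TF-IDF weights over one domain's documents and keep the k heaviest terms."""
    totals = Counter()
    for doc_weights in tf_idf(docs):
        totals.update(doc_weights)
    return [term for term, _ in totals.most_common(k)]


def cross_domain_filter(domain_keywords):
    """Assumed filter: drop keywords that appear in more than one domain's list."""
    seen = Counter(t for kws in domain_keywords.values() for t in set(kws))
    return {
        domain: [t for t in kws if seen[t] == 1]
        for domain, kws in domain_keywords.items()
    }


if __name__ == "__main__":
    # Tiny made-up example; real input would be tokenized news articles.
    domains = {
        "sports": [["goal", "goal", "korea"], ["goal", "team", "korea"],
                   ["team", "match", "coach"]],
        "politics": [["vote", "vote", "korea"], ["vote", "party", "korea"],
                     ["party", "election", "minister"]],
    }
    keywords = {name: top_keywords(docs, k=5) for name, docs in domains.items()}
    print(cross_domain_filter(keywords))
```

In this toy input, "korea" scores as a keyword in both domains, so the assumed filter removes it from each list and leaves only domain-specific terms.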

A Consideration on Connecting Operations among Freeway Management Companies (고속도로 관리자간 상호 연계체계 수립에 관한 고찰(한국도로공사가 관리하는 노선을 중심으로))

  • Lee, Gi-Yeong;Kim, Dong-Nyeong;Son, Ui-Yeong;Lee, Cheong-Won
    • Journal of Korean Society of Transportation / v.24 no.4 s.90 / pp.19-29 / 2006
  • With the rapid increase in private-sector highway investment, highway operators have become more diverse, causing various unexpected problems. Highways built with private capital have different toll rates and management systems. It is desirable to reduce the inconvenience to drivers who use two or more highway sections that span different jurisdictions. The main purposes of this research are (i) to survey data and state the problems and (ii) to predict future problems and prepare appropriate countermeasures. This research is divided into two parts: the management system and the fare collection system. The major findings of this study are as follows: optimum toll operation models depend on the charging systems and interchange shapes of the two interconnecting expressways. Assuming that the current payment system and the new electronic payment system are used concurrently for a while, some alternatives for interoperable toll collection are suggested, focusing on efficiency for tollway corporations as well as convenience for drivers. The advantages and disadvantages of the alternatives, including their characteristics, are compared.

RFID based the SME algorithm for the multi-lane-supported ETCS (다차선 서비스를 제공하는 자동요금징수시스템을 위한 RFID 기반 SME 알고리즘)

  • Cha, Jin;Jung, Jong-In;Jang, Sang-Woo;Lee, Sang-Sun
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.1C / pp.8-16 / 2012
  • To extend the ETCS (Electronic Toll Collection System) model, which has been operated successfully until now, with multi-lane service, a wireless communication system based on RF-DSRC (Radio Frequency - Dedicated Short Range Communication) has been used for the multi-lane-supported ETCS. In this paper, an SME algorithm that incorporates data flow and format into RFID communication technology is newly proposed to overcome the technical problems of the RF-DSRC communication system. In addition, to verify the SME algorithm, experiments based on ETCS and 900 MHz RFID were carried out. The results of the RFID detection experiment at varying velocities and the precision experiment on the information stored in the RFID tag show that RFID is detected at speeds below 70 km/h and that the estimation precision is more than 90%.