• Title/Summary/Keyword: reading information processing


HTML Tag Depth Embedding: An Input Embedding Method of the BERT Model for Improving Web Document Reading Comprehension Performance (HTML 태그 깊이 임베딩: 웹 문서 기계 독해 성능 개선을 위한 BERT 모델의 입력 임베딩 기법)

  • Mok, Jin-Wang;Jang, Hyun Jae;Lee, Hyun-Seob
    • Journal of Internet of Things and Convergence / v.8 no.5 / pp.17-25 / 2022
  • Recently, massive amounts of data have been generated as the number of edge devices increases. In particular, the number of raw, unstructured HTML documents has grown. Therefore, MRC (Machine Reading Comprehension), in which a natural language processing model finds the important information within an HTML document, is becoming more important. In this paper, we propose HTDE (HTML Tag Depth Embedding Method), which allows BERT to learn the depth of the HTML document structure. HTDE builds a tag stack from the HTML document for each BERT input token and then extracts the depth information. We then add an HTML embedding layer, which takes the depth of the token as input, to the input embedding step of BERT. Since tokenization using HTDE identifies the HTML document structure through the relationships among surrounding tokens, HTDE improves the accuracy of BERT on HTML documents. Finally, we demonstrate that the proposed method achieves higher accuracy than the conventional embedding of BERT.
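
The tag-stack step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names are hypothetical, whitespace splitting stands in for BERT's subword tokenizer, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagDepthParser(HTMLParser):
    """Hypothetical helper: records (token, depth) for every text token."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.tokens = []   # list of (token, depth) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        depth = len(self.stack)
        # Assumption: whitespace tokens instead of BERT subword tokens.
        for word in data.split():
            self.tokens.append((word, depth))


def token_depths(html):
    """Return a list of (token, tag depth) pairs for an HTML string."""
    parser = TagDepthParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


print(token_depths("<html><body><p>Hello <b>world</b></p></body></html>"))
```

In a setup like the one the abstract describes, each depth value would then index an additional embedding table whose output is combined with the usual BERT input embeddings; that layer is omitted here because the listing gives no details about it.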

Effects of content and formal schema on reading comprehension (내용과 형식 스키마가 독해에 미치는 영향)

  • Yeon, Jun-Hum
    • English Language & Literature Teaching / no.3 / pp.95-122 / 1997
  • The purpose of this research was to investigate the effects of content and formal schema on reading comprehension. Five hundred fifty-nine high school subjects were assigned to one of the following levels and treatment conditions: (1) Higher level & Schema Activation, (2) Higher level & Non-schema Activation, (3) Lower level & Schema Activation, and (4) Lower level & Non-schema Activation. To evaluate the effects of schema activation, two experiments were conducted: one related to the content schema and the other to the formal schema. To evaluate the effects of content schema, three types of tests were conducted: (1) a cloze test, (2) guessing the meanings of nonsense words, and (3) an immediate recall test. To evaluate the effects of formal schema instruction, four kinds of tests were conducted: (1) sorting sentences according to importance, (2) identifying signal words, (3) an immediate recall test, and (4) identifying specific information. For the content schema condition, results indicated that subjects given titles or pictures before reading in the "Content Schema Activation" treatment scored better than those in the other treatment on all types of tests, regardless of their level. Schema activation helped the subjects increase the cognitive predictability of missing words and participate in the tasks more actively, with more risk-taking. It was also shown that good readers tend to process words meaningfully, while poor readers tend to process words phonetically or morphologically. Formal schema activation through teaching text organization also had a significant influence on three types of tests (sorting sentences according to importance, identifying signal words, and immediate recall), but not on identifying specific information.
The implications of this study can be briefly noted as follows: (1) In teaching reading, the student's background knowledge should be activated as a pre-reading activity. (2) In reading, it is more important to emphasize the student's schema than the features of the text. (3) Various educational interventions should be introduced, especially for lower-level students. (4) Teaching text structures can be a powerful method for the top-down processing strategy.


The Applicability of Schema Theory to Scientific Texts

  • Im, Byung-Bin;Lee, Jong-Hee
    • English Language & Literature Teaching / v.10 no.1 / pp.1-22 / 2004
  • The primary purpose of this study is to investigate the applicability of content and formal schemata to processing scientific texts, which encompass human knowledge of the physical world. In general, schema theory is based on the culture-oriented background of a text. From this point of view, the question of whether both content and formal schemata are applicable to the comprehension of a scientific text deserves focused attention in terms of information processing modes. The results of the empirical study indicate that, whereas the universality of general knowledge about the natural world attenuates the tenets of schema theory, the rhetorical organization of scientific texts encourages the application of the schema-based approach; the reader's familiarity with the structural patterns of a text facilitates reading comprehension.


XML-based EDI Document Processing System with Binary Format Mapping Rules

  • Kim, Chang-Su;Jung, Hoe-Kyung
    • Journal of information and communication convergence engineering / v.10 no.3 / pp.258-263 / 2012
  • Recently, the volume of electronic data interchange (EDI) document processing for port logistics has increased sharply. The existing system processes EDI documents in a script mode, but because of its complicated script preparation procedure and low document processing efficiency, it cannot meet demand as document traffic increases. In this paper, an EDI electronic document processing system was designed and implemented around a document scanner and a mapper, binary-format electronic document processing tools that do not require script files when converting extensible markup language (XML)-based electronic documents. Compared with conventional systems, the new system retains the merits of XML for reading and writing while offering improved speed, ease of use, and good portability across systems.

The effects on intervening of dyslexia high-risk group middle and high school students in Childcare Facilities : Apply the intervention program improves auditory processing (아동 양육 시설의 난독증 고위험군 중·고등학생에 대한 중재 효과: 청각정보처리 개선 중재프로그램 적용)

  • Kim, Eun-Hee;Song, Sun-Hee
    • Journal of Digital Convergence / v.14 no.7 / pp.1-10 / 2016
  • This study examines changes in reading and auditory perception ability after an intervention aimed at improving auditory processing, a neurological factor, in middle and high school students at high risk of dyslexia, and seeks to verify the intervention's efficacy in order to build a systematic intervention program for students with dyslexia. The study targeted a dyslexia high-risk group of 168 middle and high school students, identified through a preliminary screening test, in childcare facilities, a socially vulnerable setting, in Gyeonggi Province. These students were assessed in depth on reading and auditory perception ability, and 24 students were selected. The selected students received 20 sessions of auditory perception simulation and fluent reading training based on the TOMATIS Method, which aims to improve the auditory system that transmits language information, and then took post-tests. In conclusion, the intervention was effective: statistical analysis of the preliminary and post tests showed significant changes in reading, comprehension, and auditory processing ability. An important contribution of this research is that it shows the need for a neuroscientific approach to the diagnosis of and intervention for dyslexia.

KorPatELECTRA : A Pre-trained Language Model for Korean Patent Literature to improve performance in the field of natural language processing(Korean Patent ELECTRA)

  • Jang, Ji-Mo;Min, Jae-Ok;Noh, Han-Sung
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.15-23 / 2022
  • In the field of patents, NLP (Natural Language Processing) is a challenging task because of the linguistic specificity of patent literature, so there is an urgent need for research on a language model optimized for Korean patent literature. Recently, in the field of NLP, there have been continuous attempts to build pre-trained language models for specific domains to improve performance on various tasks in those domains. Among them, ELECTRA, a pre-trained language model released by Google after BERT, uses a new method called RTD (Replaced Token Detection) to increase training efficiency. The purpose of this paper is to propose KorPatELECTRA, which is pre-trained on a large amount of Korean patent literature data. In addition, pre-training was optimized by preprocessing the training corpus according to the characteristics of patent literature and by applying a patent vocabulary and tokenizer. To confirm its performance, KorPatELECTRA was tested on NER (Named Entity Recognition), MRC (Machine Reading Comprehension), and patent classification tasks using actual patent data, and it achieved the best performance on all three tasks compared with general-purpose language models.
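
The RTD objective mentioned in this abstract can be illustrated with a small sketch of how the per-token training labels are formed. This is an assumption-laden toy: in ELECTRA a small generator network proposes the replacements, whereas here a seeded random choice from a hypothetical vocabulary stands in for it, and the function names are illustrative only.

```python
import random


def corrupt(tokens, vocab, replace_prob=0.15, seed=0):
    """Replace some tokens at random (a stand-in for ELECTRA's generator)."""
    rng = random.Random(seed)
    return [rng.choice(vocab) if rng.random() < replace_prob else tok
            for tok in tokens]


def rtd_labels(original, corrupted):
    """Label each position 1 if the token differs from the original, else 0.

    A replacement that happens to equal the original token counts as
    "original" (label 0).
    """
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]
print(rtd_labels(original, corrupted))
```

The discriminator is trained to predict these labels for every input position, not only for a masked subset, which is the source of the training-efficiency gain the abstract refers to.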

Spatio-temporal Load Analysis Model for Power Facilities using Meter Reading Data (검침데이터를 이용한 전력설비 시공간 부하분석모델)

  • Shin, Jin-Ho;Kim, Young-Il;Yi, Bong-Jae;Yang, Il-Kwon;Ryu, Keun-Ho
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.11 / pp.1910-1915 / 2008
  • Load analysis for the distribution system and its facilities has relied on measurement equipment, and load monitoring incurs huge costs for installation and maintenance. This paper presents a new model that analyzes the load of facilities under a feeder every 15 minutes using meter reading data, which can be obtained from power consumers every 15 minutes or every month, without installing any measuring equipment. After a data warehouse is constructed by interfacing with the legacy systems required for the load calculation, the relationship between the distribution system and the power consumers is established. Once the load pattern of power customers who are not covered by Automatic Meter Reading (AMR) has been forecast by applying the clustering and classification algorithms of temporal data mining, a single-line diagram per feeder is created and a power flow calculation is executed. The calculation result is analyzed using various temporal and spatial analysis methods such as an Internet Geographic Information System (GIS), single-line diagrams, and Online Analytical Processing (OLAP).
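
As a rough illustration of the aggregation idea in this abstract, the sketch below spreads a monthly reading over 15-minute intervals and sums customer loads per feeder. It is not the paper's model: the daily load pattern is simply assumed to be given (the paper forecasts it with clustering and classification), and all names, the 30-day month, and the data layout are hypothetical.

```python
from collections import defaultdict

INTERVALS_PER_DAY = 96  # 24 hours * four 15-minute intervals


def spread_monthly_reading(monthly_kwh, daily_pattern, days=30):
    """Estimate one day of 15-minute energy values (kWh) for a non-AMR customer.

    daily_pattern holds 96 non-negative weights giving the shape of a
    typical day; it is assumed to come from a separate forecasting step.
    """
    total_weight = sum(daily_pattern)
    daily_kwh = monthly_kwh / days
    return [daily_kwh * w / total_weight for w in daily_pattern]


def feeder_load(customer_series, feeder_of):
    """Sum the per-interval loads of all customers connected to each feeder."""
    loads = defaultdict(lambda: [0.0] * INTERVALS_PER_DAY)
    for customer_id, series in customer_series.items():
        feeder = feeder_of[customer_id]
        for i, kwh in enumerate(series):
            loads[feeder][i] += kwh
    return dict(loads)


flat_day = [1.0] * INTERVALS_PER_DAY
series = {
    "c1": spread_monthly_reading(2880.0, flat_day),
    "c2": spread_monthly_reading(2880.0, flat_day),
}
loads = feeder_load(series, {"c1": "F1", "c2": "F1"})
print(loads["F1"][:4])
```

Customers with 15-minute AMR data would bypass `spread_monthly_reading` and feed their measured series directly into `feeder_load`.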

The Efficacy of Zoom Technology as an Educational Tool for English Reading Comprehension Achievement in EFL Classroom

  • Kim, HyeJeong
    • International Journal of Advanced Culture Technology / v.8 no.3 / pp.198-205 / 2020
  • The purpose of this study is to investigate the effect of real-time remote video instruction using Zoom on learners' English reading achievement. The study also sought to identify the efficiency of Zoom video lectures and to consider how to supplement them by surveying learners' opinions of and satisfaction with Zoom video lectures. To this end, control and experimental groups were set up, and two achievement tests and a questionnaire were administered. The study's results demonstrated that Zoom video lectures have a positive effect on learners' English reading achievement. The questionnaire found that learners are satisfied with Zoom video lectures for the following reasons: 'increased interest in and motivation towards learning', 'self-directed learning', 'active interaction', 'ease of access', and 'ease of information retrieval'. At the same time, the questionnaire also found that some learners are dissatisfied with Zoom video lectures due to 'mechanical errors or defects', 'poor audio quality', and 'the need to add customized functions for efficient classes'. In practice, Zoom video lectures must be supplemented with automatic attendance processing, convenient data upload and download, and more efficient video screen management. Given the recent increase in online classes, we, as instructors, must develop teaching activities and/or strategies for video lectures that can encourage active participation by learners.

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

  • Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
    • Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
  • Because of speech recognition problems in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech information and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in AVSR systems. This study aims to improve the recognition rate of uttered words using only lip shape detection, for an efficient AVSR system. After preprocessing for lip region detection, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. The utterance period detection achieves a 91% success rate, outperforming general threshold methods. In lip reading recognition, the user-dependent experiment records a recognition rate of 88.5% and the user-independent experiment 80.2%, both improvements over previous studies.

Design and Implement of Power-Data Processing System with Optimal Sharding Method in Ethereum Blockchain Environments

  • Lee, Taeyoung;Park, Jaehyung
    • Journal of the Korea Society of Computer and Information / v.26 no.12 / pp.143-150 / 2021
  • The power industry is currently shifting from manual meter reading to remote meter reading using AMI (Advanced Metering Infrastructure). If the power data generated by the AMI is recorded on a blockchain, integrity is guaranteed because forgery and tampering are prevented. As data sharing becomes transparent, new business can be created. However, the Ethereum blockchain is not suitable for processing large numbers of transactions because of its limited processing speed. To overcome this limitation, various On/Off-Chain methods are being investigated. In this paper, we propose an interface server using data sharding as a solution for storing large amounts of power data in Ethereum blockchain environments. Experimental results show that our power-data processing system with the sharding method reduces the data omission rate that occurs when transactions are transmitted to Ethereum to 0% and increases the processing speed approximately 9 times.
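
The general idea of data sharding in front of a blockchain can be sketched as follows. The abstract does not state the sharding key or strategy, so hashing on a meter ID is purely an assumption, the function and field names are hypothetical, and no Ethereum client calls are shown.

```python
import hashlib


def shard_of(meter_id, num_shards):
    """Map a meter ID to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would not give a stable assignment across restarts.
    """
    digest = hashlib.sha256(meter_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_readings(readings, num_shards):
    """Split meter readings into num_shards lists, keyed by meter ID."""
    shards = [[] for _ in range(num_shards)]
    for reading in readings:
        shards[shard_of(reading["meter_id"], num_shards)].append(reading)
    return shards


readings = [{"meter_id": f"M{i:04d}", "kwh": 1.5} for i in range(10)]
for index, shard in enumerate(shard_readings(readings, 3)):
    print(index, [r["meter_id"] for r in shard])
```

Each shard could then be submitted by its own worker or queue, so that no single transaction stream has to carry the full data volume.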