• Title/Summary/Keyword: chunking

Implicit Learning with Artificial Grammar: Simulations using EPAM IV (인공 문법을 사용한 암묵 학습: EPAM IV를 사용한 모사)

  • 정혜선
    • Korean Journal of Cognitive Science, v.14 no.1, pp.1-9, 2003
  • In implicit learning tasks, human participants learn grammatical letter strings better than random letter strings. After learning grammatical letter strings, participants were able to judge the grammaticality of new letter strings that they had never seen before. EPAM (Elementary Perceiver and Memorizer) IV, a rote learner without any rule abstraction mechanism, was used to simulate these results. The results showed that EPAM IV with a within-item chunking function was able to learn grammatical letter strings better than random letter strings and to discriminate grammatical letter strings from non-grammatical ones. The success of EPAM IV in simulating human performance strongly indicates that recognition memory based on chunking plays a critical role in implicit learning.
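
A minimal illustration of the chunking-based recognition idea (not EPAM IV itself, whose discrimination net is far richer): treat frequently seen substrings of the training strings as familiar chunks, and judge a new string by how familiar its substrings are. The training strings and scoring scheme below are invented for the sketch.

```python
from collections import Counter

def learn_chunks(strings, max_len=3):
    """Count substrings (length 2..max_len) of the training strings;
    frequently seen substrings stand in for familiar 'chunks'."""
    chunks = Counter()
    for s in strings:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                chunks[s[i:i + n]] += 1
    return chunks

def familiarity(s, chunks, max_len=3):
    """Fraction of a string's substrings that are known chunks."""
    hits = total = 0
    for n in range(2, max_len + 1):
        for i in range(len(s) - n + 1):
            total += 1
            hits += chunks[s[i:i + n]] > 0
    return hits / total if total else 0.0

train = ["TPTS", "TPPTS", "TPTXVS"]    # invented 'grammatical' strings
known = learn_chunks(train)
print(familiarity("TPTXV", known))     # grammar-like string: 1.0
print(familiarity("XQVPT", known))     # random string: ~0.14
```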

The effect on problem solving according to mental demand of items and chunking (문제의 요구주의력과 덩이지식화 효과가 문제해결에 미치는 영향)

  • Ahn, Soo-Young;Kwon, Jae-Sool
    • Journal of The Korean Association For Science Education, v.15 no.3, pp.263-274, 1995
  • The purpose of this study was to examine the effect of the mental demand of items and the problem solver's level of chunking on problem solving. The principal findings were as follows: 1) As the mental demand of items increased, students' achievement scores decreased, and the more mental demand an item required, the higher (or at least equal) its hierarchical level was. These results showed that the mental demand of an item is a main factor in determining the difficulty of problem solving. 2) Even for items with the same mental demand, students' achievement scores differed between the balance beam task and the 2nd law task (balance beam scores < 2nd law scores). 3) Achievement scores of the LM group, who used chunked knowledge to solve the balance beam task, were higher than those of the non-LM group, who used non-chunked knowledge. 4) The level of chunked knowledge the non-LM group used differed between the two tasks, whereas the LM group used the same level of chunked knowledge for both. 5) After treatment, the achievement scores of the non-LM group were the same across the two tasks, owing to the chunking effect of the treatment, while the scores of the LM group did not change before and after treatment. The chunking effect of the treatment thus affected the non-LM group but not the LM group.

High Speed Korean Dependency Analysis Using Cascaded Chunking (다단계 구단위화를 이용한 고속 한국어 의존구조 분석)

  • Oh, Jin-Young;Cha, Jeong-Won
    • Journal of the Korea Society for Simulation, v.19 no.1, pp.103-111, 2010
  • Syntactic analysis is an important step in natural language processing. However, existing Korean syntactic analyzers are hard to use in practice because of their low performance and lack of robustness. We propose a new robust, fast, and high-performance Korean syntactic analyzer using CRFs. We treat parsing as a labeling problem and use cascaded chunking for Korean parsing: at each step, we label syntactic information on each Eojeol using CRFs, with part-of-speech tags and Eojeol syntactic tags as features. Experimental results using 10-fold cross validation show significant improvements in robustness, speed, and performance on long Korean sentences.
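
Since the abstract does not spell out the algorithm, here is a hedged sketch of cascaded chunking: units are tagged and merged step by step until one unit remains, and each merge records a dependency arc. A trivial always-attach-rightward rule stands in for the paper's CRF tagger, and all names below are invented.

```python
def tag_step(units):
    """Stand-in for the CRF: decide per unit whether it depends on the
    unit to its right (here, trivially, everything attaches rightward)."""
    return ["ATTACH" if i < len(units) - 1 else "KEEP"
            for i in range(len(units))]

def cascade_parse(eojeols):
    units = [{"head": w} for w in eojeols]
    arcs = []
    while len(units) > 1:             # one cascade step per iteration
        tags, merged, i = tag_step(units), [], 0
        while i < len(units):
            if tags[i] == "ATTACH" and i + 1 < len(units):
                arcs.append((units[i]["head"], units[i + 1]["head"]))
                merged.append(units[i + 1])   # dependent folds into its head
                i += 2
            else:
                merged.append(units[i])
                i += 1
        units = merged
    return arcs                       # (dependent, head) pairs

print(cascade_parse(["나는", "어제", "학교에", "갔다"]))
```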

Chunking of Contiguous Nouns using Noun Semantic Classes (명사 의미 부류를 이용한 연속된 명사열의 구묶음)

  • Ahn, Kwang-Mo;Seo, Young-Hoon
    • The Journal of the Korea Contents Association, v.10 no.3, pp.10-20, 2010
  • This paper presents a strategy for chunking contiguous noun sequences using semantic classes. We call a sequence of contiguous nouns that can be treated as a single noun a compound noun phrase. For chunking compound noun phrases, we use noun pairs extracted from a syntactically tagged corpus together with their semantic class pairs. For reliability, these noun pairs and semantic classes are built from the syntactically tagged corpus and the detailed dictionary of the Sejong corpus. Compound noun phrases of arbitrary length can also be chunked with this information. We use 38,940 'left noun - right noun' pairs, 65,629 'left noun - semantic class of right noun' pairs, 46,094 'semantic class of left noun - right noun' pairs, and 45,243 'semantic class of left noun - semantic class of right noun' pairs for compound noun phrase chunking. The test data are 1,000 unseen sentences containing contiguous noun sequences of length 2 or more, randomly selected from the Sejong morphologically tagged corpus. Our experimental results are 86.89% precision, 80.48% recall, and 83.56% f-measure.
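
A sketch of the backoff lookup that the four pair tables suggest: try the lexical noun pair first, then the mixed pairs, then the class pair. The dictionary contents and semantic classes below are invented; the real tables are built from the Sejong resources.

```python
noun_class = {"사과": "FOOD", "주스": "FOOD", "학교": "PLACE"}

nn_pairs = {("사과", "주스")}    # left noun  - right noun
nc_pairs = {("사과", "FOOD")}    # left noun  - right class
cn_pairs = {("FOOD", "주스")}    # left class - right noun
cc_pairs = {("FOOD", "FOOD")}    # left class - right class

def can_chunk(left, right):
    """True if two adjacent nouns belong to one compound noun phrase,
    backing off from lexical pairs to semantic-class pairs."""
    lc, rc = noun_class.get(left), noun_class.get(right)
    return ((left, right) in nn_pairs or (left, rc) in nc_pairs or
            (lc, right) in cn_pairs or (lc, rc) in cc_pairs)

def chunk_nouns(nouns):
    """Greedily grow chunks over a contiguous noun sequence, so phrases
    of arbitrary length fall out of the pairwise decisions."""
    chunks, current = [], [nouns[0]]
    for n in nouns[1:]:
        if can_chunk(current[-1], n):
            current.append(n)
        else:
            chunks.append(current)
            current = [n]
    chunks.append(current)
    return chunks

print(chunk_nouns(["사과", "주스", "학교"]))  # [['사과', '주스'], ['학교']]
```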

Content-based Chunking for Incremental Computation and Possible Hazards (Incremental Computation를 위한 Content-based Chunking 기법과 발생 가능한 위험요소 분석)

  • Joo, Young-Hyun;Kim, Jee-Hong;Eom, Young-Ik
    • Proceedings of the Korean Information Science Society Conference, 2012.06a, pp.27-29, 2012
  • Recently, large volumes of data have been created, modified, and deleted online on top of distributed processing environments. Many studies address efficient data processing in such environments; in particular, incremental computation, which splits input data into content-based chunks and applies MapReduce to process them efficiently, has drawn attention. In this paper, we analyze the content-based chunking technique commonly used in this line of work and describe the hazards that can arise from it.
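
For readers unfamiliar with content-based (content-defined) chunking, the sketch below cuts a byte stream wherever a rolling hash of the last few bytes matches a boundary pattern, so chunk boundaries follow content rather than offsets and a local edit does not shift every later chunk. The polynomial hash stands in for the Rabin fingerprint used in practice, and WINDOW, PRIME, MASK, and the size bounds are illustrative values, not ones from the paper.

```python
import os

WINDOW, PRIME, MASK = 16, 31, 0x0FFF   # MASK gives ~4 KiB average chunks

def content_chunks(data, min_size=1024, max_size=16384):
    top = PRIME ** (WINDOW - 1)
    chunks, start, h, window = [], 0, 0, []
    for i, b in enumerate(data):
        window.append(b)
        if len(window) > WINDOW:
            h -= window.pop(0) * top   # drop the oldest byte's term
        h = h * PRIME + b              # take in the newest byte
        size = i + 1 - start
        # Cut on the boundary pattern, with min/max size guards.
        if (size >= min_size and (h & MASK) == MASK) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h, window = i + 1, 0, []
    if start < len(data):
        chunks.append(data[start:])
    return chunks

blob = os.urandom(200_000)
parts = content_chunks(blob)
print(len(parts), sum(len(p) for p in parts) == len(blob))  # e.g. ~40 True
```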

CORE-Dedup: IO Extent Chunking based Deduplication using Content-Preserving Access Locality (CORE-Dedup: 내용보존 접근 지역성 활용한 IO 크기 분할 기반 중복제거)

  • Kim, Myung-Sik;Won, You-Jip
    • Journal of the Korea Society of Computer and Information, v.20 no.6, pp.59-76, 2015
  • The recent wide spread of embedded devices and the growth of broadband communication have led to a rapid increase in the volume of created and managed data. As a result, data centers must increase storage capacity cost-effectively to store the created data. Data deduplication is one way to save storage space by removing redundant data. This work proposes an IO extent based deduplication scheme, called CORE-Dedup, that exploits content-preserving access locality. We acquire IO traces from the block device layer of a virtual machine host and compare the deduplication performance of fixed-size and IO extent based chunking. For a workload of 10 users' compile jobs in a virtual machine environment, 4 KB fixed-size chunking and IO extent based chunking use 14,500 and 1,700 chunk index entries, respectively. The deduplication rates are 60.4% and 57.6% for fixed-size and IO extent chunking, respectively.
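
A toy illustration of why extent-based chunking shrinks the chunk index: a recurring IO extent costs one fingerprint entry, where 4 KiB fixed-size chunking costs one entry per piece. The trace below is invented; CORE-Dedup's real trace handling is more involved.

```python
import hashlib

def fingerprints(chunks):
    """One chunk-index entry per distinct fingerprint."""
    return {hashlib.sha1(c).hexdigest() for c in chunks}

def fixed_chunks(extent, size=4096):
    return [extent[i:i + size] for i in range(0, len(extent), size)]

# The same 16 KiB extent written twice: under content-preserving access
# locality, recurring requests carry the same content.
extent = b"".join(bytes([i]) * 4096 for i in range(4))
trace = [extent, extent]

fixed = [c for e in trace for c in fixed_chunks(e)]
print(len(fingerprints(fixed)))   # 4 entries with 4 KiB fixed chunks
print(len(fingerprints(trace)))   # 1 entry with whole-extent chunks
```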

A Hybrid of Rule based Method and Memory based Learning for Korean Text Chunking (한국어 구 단위화를 위한 규칙 기반 방법과 기억 기반 학습의 결합)

  • 박성배;장병탁
    • Journal of KIISE:Software and Applications, v.31 no.3, pp.369-378, 2004
  • In partially free word order languages like Korean and Japanese, the rule-based method is effective for text chunking; thanks to well-developed overt postpositions and endings, even a few rules achieve performance as high as machine learning methods. However, rules alone cannot handle exceptions. Exception handling is important in natural language processing, and exceptions can be processed efficiently with memory-based learning. In this paper, we propose a hybrid of the rule-based method and memory-based learning for Korean text chunking. The proposed method is primarily based on rules, and the chunks estimated by the rules are then verified by a memory-based classifier. An evaluation of the proposed method on the Korean STEP 2000 corpus shows an improvement in F-score over the rules or various machine learning methods alone. The final F-score is 94.19, while those of the rules and of SVMs, the best machine learning method for this task, are just 91.87 and 92.54, respectively.
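
A hedged sketch of the hybrid: a rule proposes a chunk tag, and a memory-based lookup (nearest stored instance by feature overlap) overrides it only when a training instance matches the context closely, i.e., a known exception. The POS tags, rule, and instances below are invented for illustration.

```python
def rule_tag(pos):
    """Toy rule: any noun-like POS opens a noun-phrase chunk."""
    return "B-NP" if pos.startswith("N") else "O"

memory = [  # (feature tuple, correct tag) instances from training data
    (("NNG", "JKS"), "B-NP"),
    (("NNG", "NNG"), "I-NP"),   # an exception the toy rule gets wrong
    (("VV",  "EF"),  "O"),
]

def hybrid_tag(pos, next_pos, threshold=2):
    feats = (pos, next_pos)
    # Nearest stored instance by feature overlap (memory-based learning).
    sim, tag = max((sum(a == b for a, b in zip(f, feats)), t)
                   for f, t in memory)
    # The rule gives the first estimate; memory overrides it only when
    # an instance matches closely enough to count as a known exception.
    return tag if sim >= threshold else rule_tag(pos)

print(hybrid_tag("NNG", "NNG"))   # memory corrects the rule: I-NP
print(hybrid_tag("NNP", "XSN"))   # no close instance, rule wins: B-NP
```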

Improving the Performance of Korean Text Chunking by Machine Learning Approaches based on Feature Set Selection (자질집합선택 기반의 기계학습을 통한 한국어 기본구 인식의 성능향상)

  • Hwang, Young-Sook;Chung, Hoo-jung;Park, So-Young;Kwak, Young-Jae;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications, v.29 no.9, pp.654-668, 2002
  • In this paper, we present an empirical study on improving Korean text chunking with machine learning and feature set selection. We focus on two issues: selecting a feature set for Korean chunking, and alleviating data sparseness. To select a proper feature set, we use a heuristic method that searches the space of feature sets, using the performance estimated by a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. To smooth data sparseness, we suggest using a general part-of-speech tag set and selective lexical information, taking the characteristics of Korean into consideration. Experimental results show that chunk tags and lexical information within a given context window are important features while spacing unit information matters less, independently of the machine learning technique. Furthermore, using selective lexical information not only has a smoothing effect but also reduces the feature space compared with using all lexical information. Korean text chunking based on memory-based learning and on decision tree learning with the selected feature set achieves precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
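
A minimal sketch of the heuristic feature-set search described above: greedy forward selection that repeatedly adds the feature with the largest estimated "incremental usefulness". The evaluate() function is a toy placeholder for training and cross-validating the chunker on a candidate feature set, and the utility numbers are invented.

```python
def evaluate(feature_set):
    """Placeholder for the estimated F-score of a chunker trained with
    these features (a toy additive model, not real measurements)."""
    utility = {"chunk_tag": 0.40, "lexical": 0.30,
               "pos_tag": 0.25, "spacing": 0.02}
    return sum(utility[f] for f in feature_set)

def forward_select(candidates, min_gain=0.03):
    selected, best = [], 0.0
    while candidates:
        gains = {f: evaluate(selected + [f]) - best for f in candidates}
        f, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain < min_gain:
            break                 # no remaining feature is useful enough
        selected.append(f)
        candidates.remove(f)
        best += gain
    return selected, best

print(forward_select(["pos_tag", "lexical", "chunk_tag", "spacing"]))
# -> (['chunk_tag', 'lexical', 'pos_tag'], ~0.95); 'spacing' is dropped
```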

A Corpus-based Lexical Analysis of the Speech Texts: A Collocational Approach

  • Kim, Nahk-Bohk
    • English Language & Literature Teaching, v.15 no.3, pp.151-170, 2009
  • Recently, speech texts have been increasingly used in English education because of their various advantages as language teaching and learning materials. The purpose of this paper is to analyze speech texts with a corpus-based lexical approach and to suggest productive methods that use them as the main resource for English speaking or writing courses, along with actual classroom adaptations. First, the study shows that a speech corpus has unique features, such as different selections of pronouns, nouns, and lexical chunks, in comparison to a general corpus. Next, from a collocational perspective, the study demonstrates that the speech corpus contains a wide variety of the collocations and lexical chunks that a number of linguists describe (Lewis, 1997; McCarthy, 1990; Willis, 1990). In other words, speech texts not only have considerable lexical potential that could be exploited to facilitate chunk learning, but learners are also unlikely to unlock this potential autonomously. Based on this result, teachers can develop a learners' corpus and use it by chunking the speech text. This new approach of adapting speech samples as materials for college students' speaking or writing ability should be implemented as shown in the samplers. Finally, to foster learners' productive skills more communicatively, a few practical suggestions are made, such as chunking and windowing chunks of speech and presentation, and the pedagogical implications are discussed.
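
As a small illustration of how such a learners' corpus of chunks might be mined from speech transcripts (an illustration, not a method from the paper), the sketch below counts recurring word n-grams and keeps the frequent ones as candidate lexical chunks.

```python
from collections import Counter

def candidate_chunks(transcripts, n=3, min_count=2):
    """Recurring word n-grams across transcripts, as chunk candidates."""
    counts = Counter()
    for text in transcripts:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return [(c, k) for c, k in counts.most_common() if k >= min_count]

speeches = [
    "ask not what your country can do for you",
    "what your country can do is up to you",
]
print(candidate_chunks(speeches))  # ('what your country', 2), ...
```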

Data Deduplication Method using Locality-based Chunking policy for SSD-based Server Storages (SSD 기반 서버급 스토리지를 위한 지역성 기반 청킹 정책을 이용한 데이터 중복 제거 기법)

  • Lee, Seung-Kyu;Kim, Ju-Kyeong;Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers, v.50 no.2, pp.143-151, 2013
  • NAND flash-based SSDs (Solid State Drives) have the advantages of fast input/output performance and low power consumption, so they are widely used as storage in tablets, desktop PCs, smartphones, and servers. However, SSDs wear out as the number of writes increases. To improve SSD lifespan, a variety of data deduplication techniques have been introduced. The common fixed-size splitting method allocates chunks of a fixed size without considering data locality, so it may perform unnecessary chunking and hash key generation, while the variable-size splitting method incurs excessive computation because it compares data byte by byte for deduplication. This paper proposes an adaptive chunking method based on the application locality and file name locality of data written to SSD-based server storage. The proposed method splits data into 4 KB or 64 KB chunks adaptively, according to the application locality and file name locality of duplicated data, so that it reduces the overhead of chunking and hash key generation and prevents duplicated data from being written. Experimental results show that the proposed method improves write performance and reduces power consumption and operation time compared with the existing variable-size splitting method and fixed-size splitting with 4 KB chunks.
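
A hedged sketch of the adaptive policy as the abstract describes it: choose a 4 KB or 64 KB chunk size per write from locality hints, then deduplicate with fixed-size chunks at that granularity. The locality table, threshold, and names are invented, since the abstract does not specify how locality is estimated.

```python
import hashlib

SMALL, LARGE = 4 * 1024, 64 * 1024

def pick_chunk_size(app, filename, locality):
    """High-locality sources get small chunks (finer deduplication);
    others get large chunks (less chunking and hashing overhead)."""
    score = max(locality.get(app, 0.0), locality.get(filename, 0.0))
    return SMALL if score > 0.5 else LARGE

def write_chunks(data, size, index):
    """Fixed-size dedup at the chosen granularity; returns bytes stored."""
    stored = 0
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        key = hashlib.sha1(chunk).hexdigest()
        if key not in index:         # only previously unseen chunks persist
            index[key] = len(chunk)
            stored += len(chunk)
    return stored

locality = {"compiler": 0.9, "video.mkv": 0.1}        # invented hint table
index = {}
size = pick_chunk_size("compiler", "a.o", locality)   # -> 4096
print(write_chunks(b"\x00" * 128 * 1024, size, index))  # stores only 4096
```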