Search | Korea Science

The Parallel Corpus Approach to Building the Syntactic Tree Transfer Set in the English-to-Vietnamese Machine Translation

Dien Dinh;Ngan Thuy;Quang Xuan;Nam Chi
- Proceedings of the IEEK Conference / summer / pp.382-386 / 2004
Recently, following the machine learning trend, most machine translation systems around the world use two syntactic tree sets, one for each language, to learn syntactic tree transfer rules. However, for the English-Vietnamese language pair, this approach is not feasible because there is so far no Vietnamese syntactic tree set corresponding to the English one. Building a very large corresponding Vietnamese syntactic tree set (thousands of trees) requires a great deal of work and the investment of specialists in linguistics. To take advantage of our available English-Vietnamese Corpus (EVC), which is tagged with word alignments, we choose the SITG (Stochastic Inversion Transduction Grammar) model to construct English-Vietnamese syntactic tree sets automatically. This model parses the two languages at the same time and then carries out the syntactic tree transfer. The resulting English-Vietnamese bilingual syntactic tree set is the basic training data for machine learning models that automatically transfer English syntactic trees to Vietnamese ones. We tested the syntax analysis on over 10,000 of the 500,000 sentences in our English-Vietnamese bilingual corpus, and the first stage gave an encouraging result (about 80% analyzed) [5]. We have used the TBL (Transformation-Based Learning) algorithm to carry out automatic transformations from English syntactic trees to Vietnamese ones based on that parallel syntactic tree transfer set [6].

Investigation into Longitudinal Writing Development Using Linear Mixed Effects Model (선형 혼합 모형을 통해 살펴본 쓰기 능력의 장기적인 발전 양상 탐색)

Lee, Young-Ju
- The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.315-319 / 2022
This study investigates longitudinal writing development in terms of syntactic complexity using a linear mixed effects (LME) model. It draws on essays written by four case study participants. Participants voluntarily wrote essays outside of the classroom and submitted first and second drafts, after reflecting on automated writing evaluation feedback (i.e., Criterion), every month over one year. A total of 48 first drafts were analyzed, and syntactic complexity features were selected from the Syntactic Complexity Analyzer. The LME results showed a significant positive linear relationship between time and mean length of T-unit, and also between time and the ratio of dependent clauses to independent clauses, indicating that the case study participants wrote longer T-units and a higher proportion of dependent clauses over the year.
https://doi.org/10.17703/JCCT.2022.8.2.315

Korean Parsing Model using Various Features of a Syntactic Object (문장성분의 다양한 자질을 이용한 한국어 구문분석 모델)

Park So-Young;Kim Soo-Hong;Rim Hae-Chang
- The KIPS Transactions:PartB / v.11B no.6 / pp.743-748 / 2004
In this paper, we propose a probabilistic Korean parsing model that uses a syntactic feature, a functional feature, a content feature, and a size feature of a syntactic object for effective syntactic disambiguation. It restricts grammar rules to a binary-oriented form to deal with properties of Korean such as variable word order and constituent ellipsis. In experiments, we analyze the parsing performance of each feature combination. The results show that combining different features is preferable to combining similar features. Notably, the functional feature is more useful than the combination of the content feature and the size feature.
https://doi.org/10.3745/KIPSTB.2004.11B.6.743

Probing Sentence Embeddings in L2 Learners' LSTM Neural Language Models Using Adaptation Learning

Kim, Euhee
- Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.13-23 / 2022
In this study, we used a probing method to evaluate how a pre-trained L2 LSTM language model represents sentences with relative and coordinate clauses. The probing experiment employed adapted models based on the pre-trained L2 language models to trace the syntactic properties of sentence embedding vector representations. The dataset for probing was automatically generated using several templates related to different sentence structures. To classify the syntactic properties of sentences for each probing task, we measured the adaptation effects of the language models using syntactic priming. We performed linear mixed-effects model analyses to examine the relation between adaptation effects statistically and to reveal how the L2 language models represent syntactic features of English sentences. When the L2 language models were compared with the baseline L1 Gulordava language models, analogous results were found for each probing task. In addition, it was confirmed that the L2 language models encode syntactic features of relative and coordinate clauses hierarchically in the sentence embedding representations.
https://doi.org/10.9708/jksci.2022.27.03.013

Processing Nominal Suffixes in Korean: Evidence from Priming Experiments

Ahn, Hee-Don;An, Duk-Ho;Choi, Jung-Yun;Hwang, Jong-Bai;Jeon, Moon-Gee;Kim, Ji-Hyon
- Language and Information / v.15 no.1 / pp.1-12 / 2011
This study investigates morphologically complex nouns in Korean through a series of priming studies. Two experiments examined whether morphological affixes on Korean nouns were decomposed or processed as a whole. Two types of morphological affixes were examined: morpho-syntactic case markers and the plural marker '-tul'. Results showed that priming occurred for the plural marker with SOAs of 80 ms and 160 ms, but no priming occurred for the morpho-syntactic case markers. These results suggest that morphological processing differs for these two types of affixes. We argue that Korean nouns with the plural suffix are decomposed into the stem and affix, supporting the Decomposition Model (Pinker & Ullman, 2002). We suggest that while plural markers are truly morphological affixes, case markers in Korean are morpho-syntactic and thus presuppose the existence of other syntactic elements, such as the matrix verb, hence the lack of priming effects.

Continuous Speech Recognition using Syntactic Analysis and One-Stage DMS/DP (구문 분석과 One-Stage DMS/DP를 이용한 연속음 인식)

안태옥
- Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.201-207 / 2004
This paper studies continuous speech recognition using syntactic analysis and one-stage DMS/DP. To perform the recognition, we first build a DMS model with a section division algorithm and then recognize continuous speech data through the one-stage DMS/DP method using syntactic analysis. In addition to speech recognition experiments with the proposed method, we test the conventional one-stage DP method under the same data and conditions. The recognition experiments show that one-stage DMS/DP using syntactic analysis is superior to the conventional method.

Fuzzy Syntactic Pattern Recognition Approach for Extracting and Classifying Flaw Patterns from an Eddy-Current Signal Waveform

Kang, Soon-Ju
- Journal of Electrical Engineering and information Science / v.2 no.4 / pp.59-65 / 1997
In this paper, a general fuzzy syntactic method is presented for recognizing flaw patterns and measuring flaw characteristic parameters in a non-destructive inspection signal, called the eddy-current signal. Solutions are given to the subtasks of primitive pattern selection, signal-to-symbol transformation, pattern grammar formulation, and event-synchronous flaw pattern extraction based on the grammars. Fuzzy attribute grammars are used as the model for the pattern grammar because of their descriptive power in the face of uncertain constraints caused by noise or distortion in the signal waveform, and because of their ability to handle syntactic as well as semantic information. This approach has been implemented, and the performance of the resultant system has been evaluated using a library of flaw patterns obtained from steam generator tubes in nuclear power plants by an eddy current-based non-destructive inspection method.

Korean Semantic Role Labeling using Stacked Bidirectional LSTM-CRFs (Stacked Bidirectional LSTM-CRFs를 이용한 한국어 의미역 결정)

Bae, Jangseong;Lee, Changki
- Journal of KIISE / v.44 no.1 / pp.36-43 / 2017
Syntactic information represents the dependency relation between predicates and arguments, and it helps improve the performance of Semantic Role Labeling (SRL) systems. However, syntactic analysis adds computational overhead, and the SRL system can inherit incorrect syntactic information from it. To solve this problem, we exclude syntactic information and use only morpheme information to construct SRL systems. In this study, we propose an end-to-end SRL system that uses only morpheme information with a Stacked Bidirectional LSTM-CRFs model, which extends the LSTM RNN suited to sequence labeling problems. Our experimental results show that the proposed model performs better than other models.
https://doi.org/10.5626/JOK.2017.44.1.36

Generalized LR Parser with Conditional Action Model (CAM) using Surface Phrasal Types (표층 구문 타입을 사용한 조건부 연산 모델의 일반화 LR 파서)

곽용재;박소영;황영숙;정후중;이상주;임해창
- Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.81-92 / 2003
Generalized LR (GLR) parsing is an enhanced LR parsing method that overcomes the limitation of the one-way linear stack of the traditional LR parser by using a graph-structured stack, and it has served as a firm starting point for other variations of natural language (NL) parsing equipped with various mechanisms. In this paper, we propose a Conditional Action Model that can solve the problems of conventional probabilistic GLR methods. Previous probabilistic GLR parsers have used relatively limited contextual information for disambiguation because of the high complexity of the internal GLR stack. Our proposed model uses Surface Phrasal Types, which represent the structural characteristics of the parse, as additional contextual information, so that more specific structural preferences can be reflected in the parser. Experimental results show that our GLR parser with the proposed Conditional Action Model outperforms the previous methods by about 6-7% without any lexical information, and that our model can utilize the rich stack information for syntactic disambiguation in a probabilistic LR parser.

A comparative study of Entity-Grid and LSA models on Korean sentence ordering (한국어 텍스트 문장정렬을 위한 개체격자 접근법과 LSA 기반 접근법의 활용연구)

Kim, Youngsam;Kim, Hong-Gee;Shin, Hyopil
- Korean Journal of Cognitive Science / v.24 no.4 / pp.301-321 / 2013
For the task of sentence ordering, this paper attempts to utilize the Entity-Grid model, a type of entity-based modeling approach, as well as Latent Semantic Analysis (LSA), which is based on vector space modeling. The task is well known as one of the fundamental tools used to measure text coherence and to enhance text generation processes. For the implementation of the Entity-Grid model, we attempt to use the syntactic roles of the nouns in the Korean text for the ordering task and measure their impact on the result, since their contribution has been discussed in previous research. Contrary to the case of German, it shows a positive result. To obtain information on the syntactic roles, we use Korean case markers on the nouns. The results reveal that these cues can be helpful for measuring text coherence. In addition, we compare the results with those of the LSA-based model, discussing the advantages and disadvantages of the models and options for future studies.
