Search | Korea Science

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

Modi, Deepa;Nain, Neeta;Nehra, Maninder
- Journal of Multimedia Information System / v.5 no.3 / pp.147-154 / 2018
Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and alter text written in natural languages. Natural-language text can be processed by analyzing it through annotation tasks such as parsing, chunking, part-of-speech tagging and lexical analysis. These annotation tasks depend on the morphological structure of the particular language. The focus of this work is part-of-speech (POS) tagging for the Hindi language. Part-of-speech tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These categories can be noun, verb, time, date, number, etc. Hindi is the most widely used language of India and its official language. It is also among the top five most spoken languages of the world. For English and other languages, a diverse range of POS taggers is available, but these taggers cannot be applied to Hindi, as Hindi is one of the most morphologically rich languages. Furthermore, there is a significant difference between the morphological structures of these languages. Thus, in this work, a POS tagger is presented for the Hindi language. The paper presents a hybrid approach to Hindi POS tagging that combines "Probability-based and Rule-based" approaches. For tagging known words, a unigram model of the probability class is used, whereas for tagging unknown words various lexical and contextual features are used. Various finite-state automata are constructed to represent the different rules, and regular expressions are then used to implement these rules. A tagset is also prepared for this task, which contains 29 standard part-of-speech tags. The tagset also includes two unique tags, i.e., a date tag and a time tag. These date and time tags support all possible formats.
Regular expressions are used to implement all pattern-based tags such as time, date, number and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-made corpus. The hybrid approach uses a probability-based model to increase automatic tagging and a rule-based model to limit the need for an already trained corpus. The approach is based on a very small labeled training set (around 9,000 words) and achieves a best precision of 96.54% and an average precision of 95.08%. It also achieves a best accuracy of 91.39% and an average accuracy of 88.15%.
https://doi.org/10.9717/JMIS.2018.5.3.147
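
The hybrid scheme described in the abstract above (a unigram lookup for known words, regular-expression rules for pattern-based tags) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tag names, the regular expressions, the fallback tag and the function names `train_unigram` and `tag` are all assumptions made for illustration, since the abstract does not list the 29-tag tagset or the actual rules, and the lexical and contextual features used for unknown words are omitted.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern rules (assumed tag names and formats, not the paper's).
# Order matters: more specific patterns are tried first.
PATTERN_RULES = [
    ("DATE", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")),
    ("TIME", re.compile(r"\d{1,2}:\d{2}(:\d{2})?")),
    ("NUM", re.compile(r"\d+(\.\d+)?")),
    ("SYM", re.compile(r"[^\w\s]+")),
]


def train_unigram(tagged_words):
    """Map each known word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for word, tag_name in tagged_words:
        counts[word][tag_name] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def rule_tag(token, default_tag="UNK"):
    """Tag an unknown token with the first pattern that matches it entirely."""
    for tag_name, pattern in PATTERN_RULES:
        if pattern.fullmatch(token):
            return tag_name
    return default_tag


def tag(tokens, model):
    """Use the unigram model for known words and the rules for the rest."""
    return [(t, model[t]) if t in model else (t, rule_tag(t)) for t in tokens]


if __name__ == "__main__":
    training = [("राम", "NOUN"), ("खाता", "VERB"), ("खाता", "VERB"), ("खाता", "NOUN")]
    model = train_unigram(training)
    print(tag(["राम", "खाता", "12/05/2018", "10:30", "42", "?"], model))
```

In this sketch the unigram model is consulted first and the rules act only as a fallback; whether the paper applies pattern rules before or after the lookup is not stated in the abstract.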

A implementation and evaluation of Rule-Based Reverse-Engineering Tool (규칙기반 역공학 도구의 구현 및 평가)

Bae Jin Young
- Journal of the Korea Society of Computer and Information / v.9 no.3 / pp.135-141 / 2004
As software has become larger and more diverse, software maintenance has become more complex and difficult, and consequently maintenance has come to account for the largest share of cost in the software life cycle. We design a reverse engineering tool for an environment that restructures software into an object-oriented system. We design rule-based reverse engineering using class information. We allow the maintainer to issue interactive queries using the Prolog language. In the class extraction and restructuring method, we use a similarity formula based on the relationship between variables and functions in order to extract the most appropriate class. The visibility of the extracted class can be identified automatically. We also allow the maintainer to issue queries in a logic language, which helps practical maintenance. The purpose of this paper is therefore to propose a reverse engineering tool and to evaluate it.

Some Issues on Causative Verbs in English

Cho, Sae-Youn
- Language and Information / v.13 no.1 / pp.77-92 / 2009
Geis (1973) has provided various properties of the subjects and by + Gerund Phrase (GerP) in English causative constructions. Among them, the two main issues of Geis's analysis are as follows: first, unlike in Lakoff (1965; 1966), the subject of English causative constructions, including causative-inchoative verbs such as liquefy, should be acts or events, not persons; second, the by + GerP in the construction is a complement of the causative verb. In addition to these issues, Geis has provided various data exhibiting other idiosyncratic properties and proposed some transformational rules, such as the Agent Creation Rule, and rule orderings to explain them. Against Geis's claim, I propose that English causative verbs require either Proper nouns or GerP subjects and that the by + GerP in the constructions, as a Verbal Modifier, needs Gerunds whose understood Affective-agent subject is identical to the subject of the causative verb with respect to the semantic index value. This enables us to solve the two main issues. At the same time, the other properties Geis mentioned can also be easily accounted for in Head-driven Phrase Structure Grammar (HPSG) by positing a few lexical constraints. On this basis, it is shown that, given the few lexical constraints and existing grammatical tools in HPSG, the constraint-based analysis proposed here gives a simpler explanation of the properties of English causative constructions provided by Geis, without transformational rules and rule orderings.

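Development of SMART-B, a SWRL-based Backward Chaining Inference Engine for the Next-Generation Web (차세대 웹을 위한 SWRL 기반 역방향 추론엔진 SMART-B 의 개발)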

Song, Yong-Uk;Hong, Jun-Seok;Kim, U-Ju;Lee, Seong-Gyu;Yun, Suk-Hui
- Proceedings of the Korea Intelligent Information System Society Conference / 2005.11a / pp.488-496 / 2005
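Whereas the current Web is based on HTML and focuses on the interface with human users, the next-generation Web is based on XML and various XML-based standards and is shifting its focus to the interface with software agents. To serve as the brain of a software agent on the next-generation Web, an inference engine must be able to faithfully understand the Semantic Web, the standard language of the next-generation Web. As part of the groundwork for this, OWL (Web Ontology Language) and RuleML (Rule Markup Language) have been proposed to the W3C. In this study, we aim to develop SMART-B (SeMantic web Agent Reasoning Tools - Backward chaining inference engine), a backward chaining inference engine that uses SWRL for rule representation and OWL for fact representation. To this end, we analyzed the functions required for SWRL-based backward chaining inference and designed a backward chaining inference algorithm that incorporates the requirements of the next-generation Semantic Web into an existing backward chaining algorithm. In addition, to ensure independence from and portability across the various platforms of a ubiquitous environment and to overcome performance differences among devices, we are developing the management tools for the fact base and rule base, the backward chaining inference engine, and related parts as unit components in the Java programming language.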
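
As a rough illustration of backward chaining in general, the following Python sketch proves a goal by working backward from it to known facts. It is not SMART-B: it handles only ground (variable-free) atoms written as plain strings, does not parse SWRL or OWL, and is written in Python rather than the Java used in the study. The function name `backward_chain` and the example atoms are hypothetical.

```python
def backward_chain(goal, facts, rules, visiting=frozenset()):
    """Return True if `goal` can be proved from `facts` using `rules`.

    facts: set of atoms known to be true.
    rules: list of (head, body) pairs meaning "head holds if every atom in body holds".
    visiting: goals already on the current proof path, used to stop cyclic rules.
    """
    if goal in facts:
        return True
    if goal in visiting:
        return False  # the goal depends on itself along this path
    path = visiting | {goal}
    for head, body in rules:
        # Try each rule that concludes the goal; all of its premises must be provable.
        if head == goal and all(backward_chain(g, facts, rules, path) for g in body):
            return True
    return False


if __name__ == "__main__":
    facts = {"hasParent(kim, lee)", "hasParent(lee, park)"}
    rules = [
        ("hasGrandparent(kim, park)", ["hasParent(kim, lee)", "hasParent(lee, park)"]),
    ]
    print(backward_chain("hasGrandparent(kim, park)", facts, rules))
```

A full engine for SWRL rules would additionally need variable unification and access to an OWL fact base, both of which are left out of this sketch.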

A Semantics of Sequence of Tense without a Sequence-of-tense Rule

Song, Mean-Young
- Language and Information / v.4 no.2 / pp.93-105 / 2000
I argue in this paper that the sequence of tense (SOT) phenomenon can be accounted for without positing a SOT rule, focusing on the contrast between the past-under-past sentences which lead to ambiguity and those which do not. The different interpretation of past-under-past sentences depends on whether stative or non-stative predicates occur in the complement clauses of propositional attitude verbs. Based on this, I also argue that the embedded past tense does not contribute to the semantic past tense in the complement clause. Instead, it is due to the occurrence of the stative or non-stative predicates in the complement clauses. The stative predicates are associated with the temporal precedence or the overlap relation, whereas the non-stative predicates are associated with the precedence relation only. This fact triggers the contrast in past-under-past sentences. (Korea University)

A Study on the Development of CAD System for VFD Element Tools (형광 표시관 부품의 금형 자동설계 시스템에 관한 연구)

박상봉
- Proceedings of the Korean Society of Precision Engineering Conference / 1997.04a / pp.724-728 / 1997
A CAD system for the grid element of a vacuum fluorescent display (VFD) has been developed. In order to reduce design man-hours and human errors, the system automates the design process using a knowledge base system. In VFD product design, the most important consideration is the short product life cycle, so a CAD system for VFD products is needed. The developed system is based on a knowledge base that incorporates much expert know-how from the practice field. C-language under the HP-UNIX system and the CIS customer language of the EXCESS CAD/CAM system are used as the overall CAD environment. Results of this system will provide effective aids to the designer in this field.

An Expert System of the Very Thin Sheet Metal Press Die Automated Design for VFD Grid (진공형광소자 전극의 극박판 프레스 금형 자동설계 전문가 시스템)

박상봉
- Journal of the Korean Society for Precision Engineering / v.15 no.5 / pp.50-58 / 1998
A suitable expert system model for very thin sheet metal press die design has been proposed. Using the proposed model, an expert system for very thin sheet metal press dies has been developed. This study shows that the results produced by the developed system for three kinds of specimens are applicable in actual practice. In addition, the possibility of expanding this system is discussed. The developed system is based on a knowledge base that incorporates much expert know-how from the practice field. C-language under the HP-UNIX system and the CIS customer language of the EXCESS CAD/CAM system have been used as the overall CAD environment. Results from this system will provide effective aids to the designer in this field.

Transition and Parsing State and Incrementality in Dynamic Syntax

Kobayashi, Masahiro;Yoshimoto, Kei
- Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.249-258 / 2007
This paper presents an implementation of a grammar of Dynamic Syntax for Japanese. Dynamic Syntax is a grammar formalism which enables a parser to process a sentence in an incremental fashion, establishing the semantic representation. Currently the application of lexical rules and transition rules in Dynamic Syntax is carried out arbitrarily, and this leads to inefficient parsing. This paper provides an algorithm for rule application and a partitioned parsing state for efficient parsing, with special reference to processing Japanese, which is a head-final language. At the present stage the parser is still small but can parse scrambled sentences, relative clause constructions, and embedded clauses. The parser is written in Prolog, and this paper shows that the parser can process null arguments in a complex sentence in Japanese.

Knowledge-Based Approach for Computer-Aided Simulation Modeling (컴퓨터에 의해 수행되어지는 시뮬레이션 모델링을 위한 지식베이스 접근방법)

Lee, Young-Hae;Kim, Nam-Young
- IE interfaces / v.2 no.2 / pp.51-62 / 1989
A computer-aided simulation modeling system has been developed to allow the automatic construction of complete discrete simulation models for queueing systems. Three types of knowledge are used in the specification and construction of a simulation model: knowledge of queueing systems, of simulation modeling, and of a target simulation language. This knowledge has been incorporated into the underlying rule base in the form of extraction and construction rules, and implemented via the expert system building tool OPS5. This paper suggests a knowledge-based approach to automatic programming that enables a user who lacks modeling knowledge and simulation language expertise to quickly build executable models.

Internet Governance & Politics of Expertise (인터넷 거버넌스와 전문성의 정치)

Kim, Ji-Yeon
- Review of Korean Society for Internet Information / v.14 no.3 / pp.5-20 / 2013
ICANN has been governing the Domain Name System (DNS) "technically" since 1998. This architecture is called Internet Governance, and it raises many questions: "What does it govern?", "Who delegated this role to ICANN?", "How can the regime ensure fairness?", etc. This article analyzes Internet Governance by applying Foucault's approach to government and, as its method, compares two parts of Internet Governance, the 'core' and the 'edge'. Whereas the 'core' refers to the sites that are governed directly by formal contract, the 'edge', as the rest, means informal friendly relations with ICANN. The 'core' rule historically stemmed from technological communities such as the IAB or IETF. They invented a new world and its population in order to integrate the technical order as protocol and the semiotic order as language, based on a new mode of government. On the other hand, the ".KR" domain, one of the 'edges', has evolved into a more heterogeneous system through contest and conflict between the traditional state and Internet Governance. The governed object of the ".KR" domain is situated at the intersection of the 'protocol user', the 'language-semiotic user' and the 'geographical resident'. Here the 'geographical resident' rule is alien to DNS, which reveals an internal deficiency of Internet Governance. It needs to move to the concept of the 'Hangeul (Korean-language) user' rather than the 'geographical resident'.
