• Title/Summary/Keyword: Syntactic Analysis


An Effective Method for Comparing Control Flow Graphs through Edge Extension (에지 확장을 통한 제어 흐름 그래프의 효과적인 비교 방법)

  • Lim, Hyun-Il
    • KIPS Transactions on Computer and Communication Systems / v.2 no.8 / pp.317-326 / 2013
  • In this paper, we present an effective method for comparing control flow graphs, which represent the static structure of binary programs. To compare control flow graphs, we measure similarities by comparing the instructions and syntactic information contained in basic blocks. In addition, we consider the similarities of edges, which represent control flows between basic blocks, through edge extension. Based on the comparison results for basic blocks and edges, we match the most similar basic blocks in the two control flow graphs and then calculate the similarity between the graphs. We evaluate the proposed edge extension method on real-world Java programs with respect to the structural similarity of their control flow graphs. To compare performance, we also ran experiments with a previous structural comparison method for control flow graphs. The experimental results show that the proposed method has sufficient ability to distinguish between control flow graphs with different structural characteristics. Although it takes more time than the previous method, it is more resilient when comparing control flow graphs with similar structural characteristics. Control flow graphs can be used effectively in program analysis and understanding, and the proposed method is expected to be applicable to various areas such as code optimization, detection of similar code, and detection of code plagiarism.
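
A minimal sketch of the idea in Python, assuming a toy CFG representation (basic blocks as instruction lists, edges as node-id pairs); the similarity measure and the greedy matching below are illustrative simplifications, not the paper's exact algorithm.

```python
# Toy basic-block similarity with a simple edge extension: a block's successors'
# instructions are appended so control-flow context contributes to the comparison.
from difflib import SequenceMatcher

def block_similarity(block_a, block_b):
    """Similarity of two basic blocks by comparing instruction sequences."""
    return SequenceMatcher(None, block_a, block_b).ratio()

def extended_block(cfg, node):
    """Edge extension: append successors' instructions to the block's own."""
    extended = list(cfg["blocks"][node])
    for src, dst in cfg["edges"]:
        if src == node:
            extended.extend(cfg["blocks"][dst])
    return extended

def cfg_similarity(cfg_a, cfg_b):
    """Greedily match the most similar (edge-extended) blocks and average the scores."""
    scores = []
    unmatched_b = set(cfg_b["blocks"])
    for a in cfg_a["blocks"]:
        ext_a = extended_block(cfg_a, a)
        best, best_score = None, 0.0
        for b in unmatched_b:
            s = block_similarity(ext_a, extended_block(cfg_b, b))
            if s > best_score:
                best, best_score = b, s
        if best is not None:
            unmatched_b.discard(best)
        scores.append(best_score)
    return sum(scores) / max(len(scores), 1)

# Toy CFGs: blocks map node ids to instruction mnemonics, edges are control flows.
cfg1 = {"blocks": {0: ["load", "add", "store"], 1: ["cmp", "jmp"]}, "edges": [(0, 1)]}
cfg2 = {"blocks": {0: ["load", "add", "store"], 1: ["cmp", "ret"]}, "edges": [(0, 1)]}
print(f"similarity = {cfg_similarity(cfg1, cfg2):.2f}")
```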

Comparative Analysis and Implications of Command and Control(C2)-related Information Exchange Models (지휘통제 관련 정보교환모델 비교분석 및 시사점)

  • Kim, Kunyoung;Park, Gyudong;Sohn, Mye
    • Journal of Internet Computing and Services / v.23 no.6 / pp.59-69 / 2022
  • For effective battlefield situation awareness and command decision-making, seamless information exchange between systems is essential. However, since each system was developed independently for its own purposes, interoperability between systems must be ensured in order to exchange information effectively. In the case of the Korean military, semantic interoperability is pursued by using a common message format for data exchange. However, simply standardizing the data exchange format cannot sufficiently guarantee interoperability between systems. Currently, the U.S. and NATO develop and use information exchange models to achieve semantic interoperability beyond a guaranteed data exchange format. Information exchange models are common vocabularies or reference models used to ensure that systems exchange information at the level of content and meaning. The information exchange models developed and used in the United States initially focused on exchanging information directly related to the battlefield situation, but they have evolved into a universal form that can be used by all government departments and related organizations. NATO, on the other hand, focused on strictly expressing the concepts necessary to carry out joint military operations among member countries, and the scope of its models was also limited to concepts related to command and control. In this paper, the background, purpose, and characteristics of the information exchange models developed and used in the United States and NATO are identified and comparatively analyzed. Through this, we intend to present implications for developing a Korean information exchange model in the future.

Analysis of the Sixth Graders' Strategies and Errors of Division-With-Remainder Problems (나머지가 있는 나눗셈 문장제에 대한 초등학교 6학년 학생들의 해결 전략 및 오류 분석)

  • Ha, Mihyun;Chang, Hyewon
    • Journal of Elementary Mathematics Education in Korea / v.20 no.4 / pp.717-735 / 2016
  • For teaching division-with-remainder (DWR) problems, it is necessary to know students' strategies for and errors on such problems. The purpose of this study is to investigate and analyze students' strategies and errors on DWR problems and to make some meaningful suggestions for teaching various methods of solving them. We constructed a test consisting of fifteen DWR problems, varying in mathematical as well as syntactic structure, to investigate students' solving strategies and errors. To apply this test, we selected 177 students from eight elementary schools in various districts of Seoul. The results were analyzed both qualitatively and quantitatively. The sixth graders' strategies can be classified as Single strategies, Multi strategies, and Assistant strategies, with the Division (D) strategy, Multiplication (M) strategy, and Additive Approach (A) strategy as sub-strategies. We noticed that the most frequently used strategies do not coincide with the most successful ones: while students in the middle group frequently used Assistant strategies, students in the higher group frequently used Single strategies. The sixth graders' errors can be classified as Formula errors (F errors), Calculation errors (C errors), Calculation Product errors (P errors), and Interpretation errors (I errors). In this study, four syntactic elements were varied across the problems: large numbers, the position of the divisor and dividend, divisor size, and vocabulary. When students in the lower group solved the problems, F errors appeared most frequently, whereas in the higher group, I errors appeared most frequently. Based on these results, we make some didactical suggestions.
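
The three sub-strategies can be illustrated on a toy DWR problem; the sketch below is only an illustration of the D, M, and A procedures named above, not material from the study.

```python
# Toy DWR problem: "157 apples are packed 12 to a box; how many full boxes,
# and how many apples are left over?"  All three strategies give (13, 1).

def division_strategy(dividend, divisor):
    # D strategy: divide directly to get quotient and remainder.
    return dividend // divisor, dividend % divisor

def multiplication_strategy(dividend, divisor):
    # M strategy: find the largest multiple of the divisor not exceeding the dividend.
    quotient = 0
    while divisor * (quotient + 1) <= dividend:
        quotient += 1
    return quotient, dividend - divisor * quotient

def additive_strategy(dividend, divisor):
    # A strategy: repeatedly subtract the divisor until it no longer fits.
    quotient, remaining = 0, dividend
    while remaining >= divisor:
        remaining -= divisor
        quotient += 1
    return quotient, remaining

for solve in (division_strategy, multiplication_strategy, additive_strategy):
    print(solve.__name__, solve(157, 12))
```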

PPEditor: Semi-Automatic Annotation Tool for Korean Dependency Structure (PPEditor: 한국어 의존구조 부착을 위한 반자동 말뭉치 구축 도구)

  • Kim Jae-Hoon;Park Eun-Jin
    • The KIPS Transactions:PartB / v.13B no.1 s.104 / pp.63-70 / 2006
  • In general, a corpus contains a lot of linguistic information and is widely used in the fields of natural language processing and computational linguistics. The creation of such a corpus, however, is an expensive, labor-intensive, and time-consuming task. To alleviate this problem, annotation tools for building corpora with rich linguistic information are indispensable. In this paper, we design and implement an annotation tool for establishing a Korean dependency tree-tagged corpus. The ideal approach would be to create the corpus fully automatically without annotators' intervention, but in practice this is impossible. The proposed tool is semi-automatic, like most other annotation tools, and is designed for editing the errors generated by basic analyzers such as a part-of-speech tagger and a (partial) parser. It is also designed to avoid repetitive work during error editing and to be easy and convenient to use. Using the proposed annotation tool, 10,000 Korean sentences containing over 20 words each were annotated with dependency structures. Eight annotators worked four hours a day for two months. We are confident that this yields accurate and consistent annotations as well as reduced labor and time.
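
A minimal sketch of what semi-automatic editing of parser output can look like; the (token, POS, head) record layout and the correction function are assumptions for illustration, since the abstract does not describe PPEditor's internal format.

```python
# Keep the automatic parser's output as (form, pos, head) records; an annotator's
# correction only overrides the head index of one word instead of re-annotating
# the whole sentence. The record layout is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Word:
    form: str   # surface form
    pos: str    # part-of-speech tag from the automatic tagger
    head: int   # index of the governing word (-1 for the root)

# Automatically parsed sentence; it may contain errors for annotators to edit.
sentence = [
    Word("나는", "NP+JX", 2),
    Word("학교에", "NNG+JKB", 2),
    Word("간다", "VV+EF", -1),
]

def correct_head(sent, index, new_head):
    """Apply an annotator's correction: reassign the head of one word."""
    sent[index].head = new_head

correct_head(sentence, 0, 2)  # e.g., confirm that "나는" depends on "간다"
for i, w in enumerate(sentence):
    print(i, w.form, w.pos, "->", w.head)
```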

The Development of an Automatic Indexing System based on a Thesaurus (시소러스를 기반으로 하는 자동색인 시스템에 관한 연구)

  • 임형묵;정상철
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.213-242 / 1993
  • During the past decades, several automatic indexing systems have been developed, such as single-term indexing, phrase indexing, and thesaurus-based indexing systems. Among these, single-term indexing has been known to be superior to the others despite the simplicity with which it extracts meaningful terms. On the other hand, thesaurus-based indexing has been regarded as producing a low retrieval rate, mainly because thesauri do not usually have enough index terms, so that much of the text data fails to be indexed if it does not match any of the index terms in the thesaurus. This paper develops a thesaurus-based indexing system, THINS, that yields a higher retrieval rate than other systems by syntactically analyzing the text data and partially matching it with index terms in the thesaurus. First, the system analyzes the input text syntactically using the machine translation system MATES/EK and extracts noun phrases. After deleting stop words from the noun phrases and stemming the remaining words, it tries to index them with similar index terms in the thesaurus as much as possible. We conducted an experiment on the CACM data set measuring the retrieval effectiveness of THINS against a single-term-based system under HYKIS, a thesaurus-based information retrieval system. It turns out that THINS yields about 10 percent higher precision than the single-term-based system, while showing 8 to 9 percent lower recall. This is a considerable improvement over previous thesaurus-based systems, which yielded 25 to 30 percent lower precision than single-term-based ones. We also argue that the relatively lower recall is caused by the fact that CRCS, the thesaurus included in the CACM data set, is very incomplete, having only slightly more than one thousand terms; thus THINS is expected to produce a much higher rate if it is associated with a currently available large thesaurus.
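
A minimal sketch of the described pipeline (stop-word removal, stemming, and partial matching of noun phrases against thesaurus terms); noun-phrase extraction with MATES/EK is outside the sketch, and the toy stemmer and stop-word list are illustrative assumptions.

```python
# Take a noun phrase (assumed already extracted by a parser), drop stop words,
# stem the rest, and partially match it against thesaurus index terms.
STOP_WORDS = {"the", "a", "an", "of", "for", "and"}

def crude_stem(word):
    # A toy stemmer standing in for a real one; only strips a plural "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def index_phrase(phrase, thesaurus_terms):
    """Return thesaurus terms that share at least one stemmed content word."""
    words = {crude_stem(w.lower()) for w in phrase.split() if w.lower() not in STOP_WORDS}
    matches = []
    for term in thesaurus_terms:
        term_words = {crude_stem(w.lower()) for w in term.split()}
        if words & term_words:          # partial match on shared content words
            matches.append(term)
    return matches

thesaurus = ["information retrieval", "automatic indexing", "machine translation"]
print(index_phrase("the retrieval of documents", thesaurus))   # ['information retrieval']
print(index_phrase("automatic indexing systems", thesaurus))   # ['automatic indexing']
```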

Anaphora Resolution System for Natural Language Requirements Document in Korean based on Syntactic Structure (한국어 자연어 요구문서에서 구문 구조 기반의 조응어 처리 시스템)

  • Park, Ki-Seon;An, Dong-Un;Lee, Yong-Seok
    • The KIPS Transactions:PartB / v.17B no.3 / pp.255-262 / 2010
  • When a system is developed, a requirements document is written by requirements analysts and then translated into formal specifications by specifiers. If a formal specification could be generated automatically from a natural language requirements document, system development cost and system faults arising from experts' misunderstandings would be reduced. Pronouns can be classified into personal and demonstrative pronouns. Given the characteristics of requirements documents, personal pronouns hardly ever occur, so we focus on deciding the antecedents of demonstrative pronouns. To analyze requirements documents automatically with higher accuracy, finding the antecedent of a demonstrative pronoun is very important for automatically eliciting formal requirements from natural language requirements documents via natural language processing. The final goal of this research is to automatically generate formal specifications from natural language requirements documents. To this end, this paper, building on previous research [3], proposes an anaphora resolution system that decides the antecedents of pronouns in Korean natural language requirements documents using natural language processing, and it proposes heuristic rules for the system implementation. In experiments on ten requirements documents, we obtained a recall of 92.45% and a precision of 69.98%.
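
The abstract does not state the heuristic rules themselves, so the following is only a hedged sketch of one common style of rule: prefer the nearest preceding subject noun phrase as the antecedent of a demonstrative pronoun.

```python
# Scan candidate noun phrases preceding the pronoun and pick the nearest one that
# passes simple syntactic filters. The filters below are illustrative assumptions,
# not the cited system's actual rules.

def resolve_antecedent(candidates, pronoun_position):
    """candidates: list of (position, phrase, is_subject) tuples from the parse."""
    preceding = [c for c in candidates if c[0] < pronoun_position]
    if not preceding:
        return None
    # Heuristic 1: prefer preceding subject noun phrases if any exist.
    subjects = [c for c in preceding if c[2]]
    pool = subjects if subjects else preceding
    # Heuristic 2: among the remaining candidates, take the most recent one.
    return max(pool, key=lambda c: c[0])[1]

# Toy requirement: "The system stores the log file. It shall be encrypted."
candidates = [(0, "The system", True), (3, "the log file", False)]
print(resolve_antecedent(candidates, pronoun_position=6))  # -> "The system"
```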

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires the construction of information about both languages, and the amount of information in the English-Korean transfer dictionary is especially critical to translation quality. Newly created words are out-of-vocabulary words and appear as-is in the translated sentence, which decreases translation quality. Compound nouns also make lexical and syntactic analysis complex, and it is difficult to translate them accurately because the transfer dictionary lacks information about them. In order to improve the translation quality of English-Korean machine translation, we must continuously expand the English-Korean transfer dictionary by collecting out-of-vocabulary words and frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary that consists of constructing a corpus from Internet newspapers, extracting words that are not in the existing dictionary as well as frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool supporting this expansion. Expanding the dictionary information is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary, and so it is expected to contribute to enhancing translation quality.
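
A minimal sketch of the described workflow: collect out-of-vocabulary words and frequent adjacent-word pairs (compound-noun candidates) from a corpus, then merge human-attached meanings into the transfer dictionary. The data and function names are illustrative assumptions.

```python
# Find OOV words and frequent word pairs in a corpus, then merge lexicographer-
# supplied Korean equivalents into the transfer dictionary.
from collections import Counter

def find_expansion_candidates(corpus_sentences, transfer_dict, min_freq=2):
    oov = Counter()
    compounds = Counter()
    for sent in corpus_sentences:
        tokens = sent.lower().split()
        for tok in tokens:
            if tok not in transfer_dict:
                oov[tok] += 1
        for first, second in zip(tokens, tokens[1:]):
            compounds[f"{first} {second}"] += 1
    frequent_compounds = {c for c, n in compounds.items()
                          if n >= min_freq and c not in transfer_dict}
    return set(oov), frequent_compounds

def integrate(transfer_dict, attached_meanings):
    """Merge words whose Korean equivalents have been attached by a lexicographer."""
    transfer_dict.update(attached_meanings)
    return transfer_dict

transfer_dict = {"economy": "경제", "policy": "정책"}
corpus = ["Economy policy update announced", "Metaverse policy update delayed"]
oov, compounds = find_expansion_candidates(corpus, transfer_dict)
print(oov)        # {'update', 'announced', 'metaverse', 'delayed'}
print(compounds)  # {'policy update'}
integrate(transfer_dict, {"metaverse": "메타버스"})
```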

Analysis of the Continuity of Reading Passages in the 5th and 6th Grade Elementary School English Textbooks Based on Readability (이독성을 통한 초등학교 5, 6학년 영어 교과서 읽기 지문의 연계성 분석)

  • Jang, Hankyeol;Lee, Je-Young
    • The Journal of the Korea Contents Association / v.22 no.6 / pp.116-124 / 2022
  • The purpose of this study is to examine the vertical and horizontal continuity between grades and between publishers, respectively, by analyzing the readability of the reading passages included in English textbooks for the 5th and 6th grades of elementary school. To do so, a corpus was constructed from the reading passages contained in 10 textbooks, and the reading passages in each textbook were analyzed with Coh-Metrix. In addition, one-way ANOVA was used to examine whether readability differed significantly between grades and between publishers. The results are as follows. First, in analyzing the difference in readability between publishers within the same grade, there was a statistically significant difference between fifth-grade textbooks in the L2 readability index. Second, in analyzing the vertical continuity between grades within a publisher, the difficulty of textbook A was higher in grade 6 than in grade 5 based on FRE and FKGL, a statistically significant difference. On the other hand, when L2 readability was used as the standard, the difficulty of textbook B was lower in 6th grade than in 5th grade. This result seems to arise because FRE and FKGL calculate readability from sentence and word length, whereas L2 readability is based on content word overlap, word frequency, and the syntactic similarity of sentences.
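
For reference, FRE and FKGL are computed from words per sentence and syllables per word; the sketch below uses a crude vowel-group syllable counter as a stand-in for Coh-Metrix's internal counters, so the numbers are approximations.

```python
# Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) on raw text.
import re

def count_syllables(word):
    # Approximation: count groups of consecutive vowels (at least one per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)          # words per sentence
    spw = syllables / len(words)               # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch Reading Ease
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # Flesch-Kincaid Grade Level
    return fre, fkgl

fre, fkgl = readability("I like my school. We play soccer after class.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")
```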

Characteristics and Meanings of the SF Genre in Korea - From Propaganda of Modernization to Post-Human Discourse (한국 SF의 장르적 특징과 의의 -근대화에 대한 프로파간다부터 포스트휴먼 담론까지)

  • Lee, Ji-Yong
    • Journal of Popular Narrative / v.25 no.2 / pp.33-69 / 2019
  • This thesis aims to reveal the meanings of SF as a genre in Korea. Most studies on the characteristics of SF novels in Korea have examined the meanings of characteristic elements of SF or peripherally reviewed the characteristics of individual works. However, these approaches have a limitation: they analyze SF texts through existing methodologies while overlooking the identity of SF as a genre. To clearly define the value of texts in the SF genre, an understanding of the customs and codes of the genre is needed first. Thus, this thesis deals broadly with the historical context in which SF was accepted by Korean society, and with the meanings and characteristics SF works took on as they were created and built up relationships with readers. In addition to seriously investigating SF as a popular and genre narrative that has not been fully treated in academic discourse, this thesis aims to practically reconsider the present and future possibilities of SF, which are being re-examined now that scientific imagination is regarded as important in the 21st century. This thesis considers the basic signification of Korean SF texts within academic discourse. Through this work, the many Korean SF works that have not been fully handled in the study of literature and cultural phenomena can be evaluated for their significance within academic discourse and examined through diverse subsequent research. As a result, this work should contribute to the development of discourse and the expansion of the Korean narrative field, which has changed in diverse ways since the turn of the 21st century.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has been conducted. Much social media on the Internet generates unstructured or semi-structured data every second, often written in the natural or human languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results that are far from users' intentions. Even though much progress has been made in recent years in enhancing the performance of search engines to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of the examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method, based on the Naïve Bayes model, one of the supervised learning algorithms, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and sense information. For the experiments of this study, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both combined and as separate resources, using cross-validation. Only nouns, the targets of word sense disambiguation here, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined into the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because it is tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. The terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, term vectors were extracted from the input sentences, and then term vectors for the sentences were created. Given an extracted term vector and the sense vector model built during the pre-processing stage, sense-tagged terms were determined by vector-space-model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus: the experiments show that better precision and recall are obtained with the merged corpus. This study suggests that the approach can practically enhance the performance of Internet search engines and help derive a more accurate meaning of a sentence in natural language processing relevant to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem, and it assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of the senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
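
A minimal sketch of Naïve Bayes word sense disambiguation over context words with add-one smoothing; the tiny hand-made training set stands in for the dictionary examples and the Sejong Corpus, and the sense-vector/vector-space stage described above is not reproduced here.

```python
# Score each sense of an ambiguous noun by P(sense) times the product of
# P(context word | sense), with add-one smoothing, and pick the best sense.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (sense, context_words). Returns priors and word counts."""
    priors = Counter()
    word_counts = defaultdict(Counter)
    for sense, words in examples:
        priors[sense] += 1
        word_counts[sense].update(words)
    return priors, word_counts

def disambiguate(context_words, priors, word_counts):
    total = sum(priors.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    best_sense, best_score = None, float("-inf")
    for sense in priors:
        score = math.log(priors[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context_words:
            score += math.log((word_counts[sense][w] + 1) / denom)  # add-one smoothing
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Two senses of the Korean noun "배": 'pear' (fruit) vs. 'ship' (vessel).
examples = [
    ("pear", ["과일", "달다", "먹다"]),
    ("pear", ["사과", "과일", "시장"]),
    ("ship", ["바다", "항구", "타다"]),
    ("ship", ["선원", "바다", "출항"]),
]
priors, word_counts = train(examples)
print(disambiguate(["바다", "타다"], priors, word_counts))  # -> 'ship'
```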