• Title/Summary/Keyword: lexical information


A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo;Li, Bin
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.31-41 / 2021
  • This paper makes a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing and analyzing a large-scale rough annotated corpus. The texts from Si Ku Quan Shu (a collection of Chinese ancient books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. Through data analysis, this study arrives at four findings. Firstly, the high-frequency words in ancient Chinese are stable to a certain extent. Secondly, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Thirdly, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the unique high-frequency words of each dynasty are mainly official titles with real power. These findings break away from the qualitative methods used in traditional research on Chinese language history and instead use quantitative methods to draw macroscopic conclusions from a large-scale corpus.
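
The abstract does not give the segmentation pipeline or the exact formulas, but the statistics it names are straightforward to compute once the texts are segmented. A minimal sketch, assuming a hypothetical `words_by_period` mapping from dynasty to a pre-segmented word list:

```python
# A minimal sketch (not the authors' pipeline): given pre-segmented words grouped
# by dynasty, compute word frequency, standardized type/token ratio (mean TTR over
# fixed-size chunks), and the share of monosyllabic vs. dissyllabic words.
from collections import Counter

def sttr(words, chunk=1000):
    """Mean type/token ratio over consecutive chunks of `chunk` tokens."""
    ratios = [len(set(words[i:i + chunk])) / chunk
              for i in range(0, len(words) - chunk + 1, chunk)]
    return sum(ratios) / len(ratios) if ratios else 0.0

def syllable_profile(words):
    """Proportions of monosyllabic (1-character) and dissyllabic (2-character) words."""
    total = len(words)
    mono = sum(1 for w in words if len(w) == 1)
    di = sum(1 for w in words if len(w) == 2)
    return mono / total, di / total

# `words_by_period` is a hypothetical structure: {dynasty: [word, word, ...]}
words_by_period = {"Tang": ["天", "子", "天下", "將軍"], "Yuan": ["平章", "政事", "軍", "民"]}
for period, words in words_by_period.items():
    freq = Counter(words)
    mono, di = syllable_profile(words)
    print(period, freq.most_common(3), round(sttr(words, chunk=2), 3),
          round(mono, 2), round(di, 2))
```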

A SDL Hardware Compiler for VLSI Logic Design Automation (VLSI의 논리설계 자동화를 위한 SDL 하드웨어 컴파일러)

  • Cho, Joung Hwee;Chong, Jong Wha
    • Journal of the Korean Institute of Telematics and Electronics / v.23 no.3 / pp.327-339 / 1986
  • In this paper, a hardware compiler for the symbolic description language (SDL) is proposed for logic design automation. Lexical analysis is performed on SDL, which describes the behavioral characteristics of a digital system at the register transfer level, by the proposed Algorithm I. Algorithm I produces the expressions for the control unit and for the data transfer unit. In order to obtain network description language (NDL) expressions equivalent to gate-level logic circuits, another algorithm, Algorithm II, is proposed. Syntax analysis of the data produced by Algorithm I is also performed using circuit elements such as D flip-flops and 2-input AND, OR, and NOT gates. This SDL hardware compiler is implemented in the C programming language on a VAX-11/750 running UNIX, and its efficiency is shown by experiments with logic design examples.
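
The SDL grammar and Algorithms I and II are not reproduced in the abstract, so the following is only a generic illustration of the lexical-analysis step mentioned above: a regular-expression tokenizer for a hypothetical register-transfer statement (the token set and syntax are invented for illustration, not taken from the paper).

```python
# Illustrative only: the abstract does not give the SDL grammar, so this tokenizer
# uses a made-up register-transfer syntax (e.g. "R1 <- R2 AND R3") to show the kind
# of lexical analysis a hardware compiler performs before syntax analysis.
import re

TOKEN_SPEC = [
    ("ASSIGN", r"<-"),
    ("OP",     r"\b(?:AND|OR|NOT)\b"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(stmt):
    """Return (token_type, lexeme) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(stmt) if m.lastgroup != "SKIP"]

print(tokenize("R1 <- R2 AND R3"))
# [('IDENT', 'R1'), ('ASSIGN', '<-'), ('IDENT', 'R2'), ('OP', 'AND'), ('IDENT', 'R3')]
```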


A Study on the Extension of Disaster Safety Information Service based on Linked Open Data (LOD기반의 재난안전 정보서비스 확장에 관한 연구)

  • Kim, Tae-Young;Gang, Ju-Yeon;Kim, Hye-Young;Kim, Yong
    • Journal of the Korean Society for Library and Information Science / v.51 no.3 / pp.163-188 / 2017
  • This study aims to propose a disaster safety information service model based on LOD for effective management and dissemination of such information. To achieve this aim, the current state of disaster safety information was analyzed through online searches and face-to-face interviews, and the information was divided into 6 types. Finally, this study proposed a specific process for building a disaster safety information LOD service, with considerations reflecting the characteristics of the information. The process for building LOD was based on the Guidelines for Building Linked Data published by the National Information Society Agency. In particular, an ontology concept model was defined using standard lexical resources and modeling tools based on the 6 types of disaster safety information, and classes and properties were proposed. The results of this study will make disaster safety information more useful to the general public.
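
As a rough illustration of what "building LOD" means in practice, the sketch below encodes one disaster record as RDF triples with rdflib. The class and property names (ds:DisasterEvent, ds:hasLocation, ds:hasMagnitude) are placeholders, not the ontology classes and properties actually proposed in the study.

```python
# A minimal Linked Open Data sketch with rdflib. The namespace, class, and property
# names are assumptions for illustration, not the paper's ontology.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef

DS = Namespace("http://example.org/disaster-safety#")
g = Graph()
g.bind("ds", DS)

event = URIRef("http://example.org/disaster-safety/event/example-earthquake")
g.add((DS.DisasterEvent, RDF.type, RDFS.Class))       # declare an ontology class
g.add((event, RDF.type, DS.DisasterEvent))            # an instance of that class
g.add((event, DS.hasLocation, Literal("Gyeongju")))   # property values as literals
g.add((event, DS.hasMagnitude, Literal(5.8)))

print(g.serialize(format="turtle"))                   # publishable Turtle output
```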

Verification of the Usefulness of the Mock TOEIC Test using Corpus Indices : Focusing on the Analysis of Difficulty and Discrimination (코퍼스 지표를 활용한 모의 토익시험의 유용성 검증 : 난이도와 변별도 분석을 중심으로)

  • Lee, Yena
    • The Journal of the Korea Contents Association / v.21 no.10 / pp.576-593 / 2021
  • In this study, in order to investigate the factors that affect the correct answer rate and the discrimination of the TOEIC test, a regression analysis was performed using the corpus indices, derived from item analysis of each part, that influence the correct answer rate and discrimination. The basic count index word_length, the consistency index LSA_overlap_adjacent_sentences, the lexical diversity index MTLD_VOCD, the conjunction index All_logical_causal_connectives_incidence, the situational model index causal_particles_causal_verbs_Ratio, the syntactic complexity index Left_embeddedness, and the syntactic pattern density index Infinitive_density were found to have negative effects on the correct answer rate. These factors that lower the correct answer rate can be utilized when setting learning goals. The lexical diversity index MTLD_VOCD, the conjunction index Additive_connectives_incidence, the syntactic pattern density index Infinitive_density, and the lexical information index person1_2_pronoun_incidence were found to have a positive effect on discrimination. The factors that increase discrimination may provide important information for developing a learning program.
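
A hedged sketch of the kind of analysis described: ordinary least squares regression of item correct-answer rate on a few of the named corpus indices. The data below is synthetic and the predictors are only a subset; this is not the authors' dataset or model.

```python
# Sketch only: regress a synthetic correct-answer rate on three corpus indices
# (word_length, MTLD_VOCD, Infinitive_density) with made-up values, to show how
# the sign of each coefficient indicates a negative or positive effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_items = 40
X = np.column_stack([
    rng.normal(6.0, 1.0, n_items),    # word_length
    rng.normal(70.0, 10.0, n_items),  # MTLD_VOCD (lexical diversity)
    rng.normal(0.02, 0.01, n_items),  # Infinitive_density
])
# Synthetic target: longer words lower the correct-answer rate, diversity raises it.
y = 0.8 - 0.03 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.02, n_items)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())   # coefficient signs show each index's direction of effect
```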

An Example-Based English Learning Environment for Writing

  • Miyoshi, Yasuo;Ochi, Youji;Okamoto, Ryo;Yano, Yoneo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.292-297 / 2001
  • In learning to write in a second/foreign language, a learner has to acquire not only lexical and syntactic knowledge but also the skill of choosing suitable words for the content s/he is interested in. To support this, a learning system should infer the learner's intention and give example phrases related to that content. However, a learner cannot always express the content of the desired phrase as input to the system. Therefore, the system should be equipped with a diagnosis function for the learner's intention. In addition, the system should be equipped with an analysis function that scores the similarity between the learner's intention and the phrases stored in the system at both the syntactic and idiomatic levels, in order to present appropriate example phrases to the learner. In this paper, we propose an architecture for an interactive support method for English writing learning based on an analogical search technique over sample phrases from corpora. Our system can show candidate variations of the next phrase to write and analogous sentences from the corpora that express what the learner wants to say.
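
As a toy illustration of retrieving example phrases close to a learner's input, the sketch below uses plain bag-of-words cosine similarity over a tiny invented phrase corpus; the paper's analogical search additionally works at the syntactic and idiomatic levels, which is not modeled here.

```python
# Toy retrieval of the stored example phrase most similar to a learner's input,
# using bag-of-words cosine similarity. The phrase corpus is invented.
from collections import Counter
from math import sqrt

corpus = [
    "I am looking forward to hearing from you",
    "I would like to apply for the position",
    "Thank you for your quick reply",
]

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

learner_input = "I look forward to hear from you"
best = max(corpus, key=lambda phrase: cosine(learner_input, phrase))
print(best)   # the closest stored example phrase is offered to the learner
```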


Generalized Binary Second-order Recurrent Neural Networks Equivalent to Regular Grammars (정규문법과 동등한 일반화된 이진 이차 재귀 신경망)

  • Jung Soon-Ho
    • Journal of Intelligence and Information Systems / v.12 no.1 / pp.107-123 / 2006
  • We propose the Generalized Binary Second-order Recurrent Neural Network (GBSRNN), which is equivalent to regular grammars, and show how a lexical analyzer recognizing regular languages can be implemented with it. Every equivalent representation of a regular grammar can be implemented in circuits using the GBSRNN, since it has binary-valued components and reflects the structural relationships of a regular grammar. For a regular grammar with m symbols, p terminals, q nonterminals, and an input string of length k, the size of the corresponding GBSRNN is $O(m(p+q)^2)$, its parallel processing time is $O(k)$, and its sequential processing time is $O(k(p+q)^2)$.
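
The GBSRNN construction itself is not spelled out in the abstract, but the general idea of a binary second-order recurrent network recognizing a regular language can be sketched: the weight tensor encodes a finite-automaton transition function, states and inputs are one-hot binary vectors, and one multiplicative update is applied per input symbol (O(k) sequential steps). The example automaton below (even number of 'a' over {a, b}) is a hypothetical illustration, not the paper's construction.

```python
# Binary second-order recurrent update: W[next, current, symbol] = 1 iff the
# automaton moves from `current` to `next` on `symbol`. Example DFA is made up.
import numpy as np

symbols = {"a": 0, "b": 1}
n_states = 2                      # q0 = even number of 'a' (accepting), q1 = odd
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}

W = np.zeros((n_states, n_states, len(symbols)), dtype=int)
for (i, sym), k in delta.items():
    W[k, i, symbols[sym]] = 1

def accepts(string, accepting=(0,)):
    s = np.array([1, 0])                                      # one-hot start state q0
    for ch in string:
        x = np.eye(len(symbols), dtype=int)[symbols[ch]]      # one-hot input symbol
        s = (np.einsum("kij,i,j->k", W, s, x) > 0).astype(int)  # second-order update
    return bool(s[list(accepting)].any())

print(accepts("abba"), accepts("ab"))   # True False
```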


A Swearword Filter System for Online Game Chatting (온라인게임 채팅에서의 비속어 차단시스템)

  • Lee, Song-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.7 / pp.1531-1536 / 2011
  • We propose an automatic swearword filter system for online game chatting using Support Vector Machines (SVM). We collected chatting sentences from online games and tagged each as a normal sentence or a sentence containing swearwords. We use syllable n-grams and the lexical form and part-of-speech (POS) tag of each word as features, and select useful features by the chi-square statistic. Each selected feature is represented with a binary weight and used to train the SVM, which classifies each chatting sentence as containing a swearword or not. In experiments, we achieved an overall F1 score of 90.4%.
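
A rough scikit-learn sketch of the described pipeline: binary character n-gram features, chi-square feature selection, and a linear SVM. The paper works on Korean syllable n-grams plus lexical/POS features; the toy English sentences and parameter choices below are made up.

```python
# Sketch of the pipeline shape only: binary char n-grams -> chi-square selection
# -> linear SVM. Toy data; not the paper's Korean corpus or feature set.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

sentences = ["good game well played", "you stupid idiot", "nice shot", "shut up idiot"]
labels    = [0, 1, 0, 1]          # 1 = contains a swearword

clf = Pipeline([
    ("ngrams", CountVectorizer(analyzer="char_wb", ngram_range=(1, 3), binary=True)),
    ("select", SelectKBest(chi2, k=30)),
    ("svm",    LinearSVC()),
])
clf.fit(sentences, labels)
print(clf.predict(["what an idiot", "great game"]))   # typically [1 0] on this toy data
```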

Extraction of Protein-Protein Interactions based on Convolutional Neural Network (CNN) (Convolutional Neural Network (CNN) 기반의 단백질 간 상호 작용 추출)

  • Choi, Sung-Pil
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.194-198 / 2017
  • In this paper, we propose a revised Deep Convolutional Neural Network (DCNN) model to extract Protein-Protein Interactions (PPIs) from the scientific literature. The proposed method has the merit of improving performance by applying various global features in addition to the simple lexical features used in conventional relation extraction approaches. In experiments on AIMed, the collection most widely used for PPI extraction, the proposed model achieves a state-of-the-art score (78.0 F-score), the best performance reported so far in this domain. The paper also shows that, without feature engineering based on complicated language processing, a convolutional neural network with embeddings can achieve superior PPI extraction performance.
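
A minimal PyTorch sketch of a CNN-over-embeddings sentence classifier, to illustrate the architecture family only; the paper's revised DCNN adds global features and is trained and evaluated on AIMed, none of which is reproduced here. All layer sizes are arbitrary assumptions.

```python
# Minimal CNN text classifier: embedding -> 1-D convolution -> global max pooling
# -> linear output (interaction / no interaction). Sizes are illustrative.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, n_filters=64, kernel=3, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=kernel, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)        # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))                   # (batch, n_filters, seq_len)
        x = torch.max(x, dim=2).values                 # global max pooling
        return self.fc(x)                              # (batch, n_classes)

model = TextCNN()
dummy = torch.randint(0, 5000, (4, 30))                # 4 sentences, 30 tokens each
print(model(dummy).shape)                              # torch.Size([4, 2])
```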

A Method of Function-word Recognition by Relative Frequency (상대빈도를 이용한 문법형태소의 인식 방법)

  • 강승식
    • Korean Journal of Cognitive Science / v.10 no.2 / pp.11-16 / 1999
  • It is expected that some Josa/Eomi (Korean particles and verbal endings) are used frequently in Korean documents while others are not. In this paper, we confirm this through experiments and show that such information is very useful for Korean language processing. In the case of Josa, the 9 most frequent Josa accounted for 70% of all Josa occurrences, and 20, 32, and 69 Josa accounted for 90%, 95%, and 99%, respectively. Similarly, the 10 most frequent Eomi accounted for 70% of all Eomi occurrences, and 33, 54, and 117 Eomi accounted for 90%, 95%, and 99%, respectively. We propose a method of constructing Josa/Eomi dictionaries classified by this frequency information. Furthermore, the Josa/Eomi frequency results are very useful for identifying unregistered morphemes and resolving lexical ambiguities.
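
The frequency-tier idea can be sketched as follows: rank the Josa (or Eomi) by corpus frequency and find how many forms are needed to cover 70/90/95/99% of all occurrences. The counts below are invented, not the paper's statistics.

```python
# Sketch of building frequency tiers: how many of the most frequent forms are
# needed to reach each cumulative coverage threshold. Toy counts only.
from collections import Counter

josa_counts = Counter({"은": 900, "는": 850, "이": 800, "가": 780, "을": 700,
                       "를": 690, "의": 600, "에": 550, "도": 300, "만": 120,
                       "까지": 60, "조차": 20})

def coverage_cutoffs(counts, thresholds=(0.70, 0.90, 0.95, 0.99)):
    total = sum(counts.values())
    cumulative, cutoffs, it = 0, {}, iter(thresholds)
    target = next(it)
    for rank, (form, freq) in enumerate(counts.most_common(), start=1):
        cumulative += freq
        while cumulative / total >= target:
            cutoffs[target] = rank          # `rank` forms cover `target` of occurrences
            target = next(it, None)
            if target is None:
                return cutoffs
    return cutoffs

print(coverage_cutoffs(josa_counts))   # {0.7: 6, 0.9: 8, 0.95: 9, 0.99: 11}
```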


Detection of Porno Sites on the Web using Fuzzy Inference (퍼지추론을 적용한 웹 음란문서 검출)

  • 김병만;최상필;노순억;김종완
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.5 / pp.419-425 / 2001
  • A method to detect the many pornographic documents on the internet is presented in this paper. The proposed method applies a fuzzy inference mechanism to conventional information retrieval techniques. First, several example porn sites are provided by users, and candidate words representative of pornographic documents are extracted from these documents. In this process, lexical analysis and stemming are performed. Then, several values, namely the term frequency (TF), the document frequency (DF), and the Heuristic Information (HI), are computed for each candidate word. Finally, fuzzy inference over these three values is performed to weight the candidate words, and the weights are used to determine whether a given site is pornographic or not. In experiments on a small test collection, the proposed method was shown to be useful for detecting pornographic sites automatically.
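
A toy sketch of the weighting step: each candidate word has TF, DF and HI values, simple ramp membership functions map them to "high" degrees, and a min/max rule base yields a word weight that feeds a crude site-level decision. The membership functions, rules, and threshold are illustrative assumptions, not those used in the paper.

```python
# Toy fuzzy inference over TF, DF and HI to weight candidate words; all values,
# memberships, rules, and the decision threshold are invented for illustration.
def high(x, lo=0.0, hi=1.0):
    """Degree to which x is 'high' on [lo, hi] (linear ramp membership)."""
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

def word_weight(tf, df, hi_score):
    r1 = min(high(tf), high(df))   # Rule 1: IF TF is high AND DF is high THEN weight is high
    r2 = high(hi_score)            # Rule 2: IF HI is high THEN weight is high
    return max(r1, r2)             # max-aggregation of the rule outputs

candidates = {"adult": (0.8, 0.9, 0.7), "movie": (0.6, 0.4, 0.1), "free": (0.4, 0.8, 0.2)}
weights = {w: word_weight(*vals) for w, vals in candidates.items()}
print(weights)

is_porno_site = sum(weights.values()) / len(weights) > 0.5   # crude site-level decision
print(is_porno_site)
```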
