• Title/Summary/Keyword: Compound words


The Philosophy and Linguistics of Dao : the Ancient Chinese Philosophy and Language (도의 철학과 도의 언어학 -고대 중국의 철학과 언어-)

  • 정재현
    • Lingua Humanitatis
    • /
    • v.5
    • /
    • pp.109-126
    • /
    • 2003
  • The aim of this paper is to elucidate ancient Chinese philosophy and linguistics through the concept of the Dao. Ancient Chinese thought developed together with ancient Chinese theories of language and the linguistic features of Classical Chinese, and the concept of the Dao served as an intermediary among them. The Dao that ancient Chinese philosophers sought has several characteristics: ethical normativity, wholeness, dynamicity, and non-reducibility. Linguistic studies reveal these characteristics as well. The following linguistic features of Classical Chinese are the cause and/or the effect of such Dao-based philosophy and linguistics: no explicit subject-predicate sentential structure, no parts of speech, heavy reliance on word order and context for meaning determination, no explicit distinction between compound words and sentences, the pictographic or ideographic nature of Chinese graphs, and the absence of a copula.

  • PDF

Biodegradation of and comparison of adaptability to detergents (미생물에 의한 계면활성제의 분해능과 적응력의 비교)

  • 이혜주;홍순우
    • Korean Journal of Microbiology
    • /
    • v.18 no.4
    • /
    • pp.155-160
    • /
    • 1980
  • Microorganisms utilizing anionic detergents as their carbon and sulfur sources were isolated from soils and sewage. Alkyl benzene sulfonate (Hiti) and sodium dodecyl sulfate (SDS) were the detergent compounds tested. Three of the isolated microorganisms were identified as Pseudomonas spp. and the others as Klebsiella, Enterobacter, and Acinetobacter. The biodegradation rates of the detergents and the growth rates of Acinetobacter strain II-8 and Pseudomonas strains H-3-1 and 554, among the six isolated microorganisms, were investigated with colorimetric, Warburg manometric, and ultraviolet absorption analyses. With the adaptation method, four serial successive transfers to fresh culture broth, ABS and SDS could be degraded to 40%-60% and 70%-75%, respectively, whereas with the nonadaptation method they were degraded to only 30%-45% and 45%-65%, respectively. In other words, detergent-degrading ability was increased to a certain extent by successive transfer to fresh minimal media. We conclude that the development of adaptation was effective in the removal of recalcitrant compounds.

  • PDF

A Method for Extracting Equipment Specifications from Plant Documents and Cross-Validation Approach with Similar Equipment Specifications (플랜트 설비 문서로부터 설비사양 추출 및 유사설비 사양 교차 검증 접근법)

  • Jae Hyun Lee;Seungeon Choi;Hyo Won Suh
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.29 no.2
    • /
    • pp.55-68
    • /
    • 2024
  • Plant engineering companies create or refer to requirements documents for each related field, such as plant process/equipment/piping/instrumentation, in different engineering departments. The process-related requirements document includes not only a description of the process but also the requirements of the equipment or related facilities that will operate it. Since the authors and reviewers of the requirements documents differ, inconsistencies may occur between equipment or part design specifications described in different requirements documents. Ensuring consistency in these matters can increase the reliability of the overall plant design information. However, the volume of documents and the scattering of requirements for the same equipment and parts across different documents make it challenging for engineers to trace and manage requirements. This paper proposes a method to analyze requirement sentences and calculate their similarity in order to identify semantically identical sentences. To calculate the similarity of requirement sentences, we propose a named-entity recognition method that identifies the compound words for the parts and properties that are semantically central to the requirements. A method to calculate the similarity of the identified compound words for parts and properties is also proposed. The proposed method is explained using sentences from practical documents, and experimental results are described.
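The comparison of part/property compound words described in this abstract could be sketched, very roughly, as a token-overlap measure. This is a minimal illustration with invented terms, not the authors' method (which uses named-entity recognition over requirement sentences); Jaccard similarity is an assumption standing in for their similarity calculation:

```python
# Toy sketch: compare two part/property compound words by the overlap
# of their constituent tokens (Jaccard similarity). The example terms
# are invented; a real system would compare recognized entities.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

# "pressure" and "gauge" are shared; "discharge" is not: 2/3.
print(jaccard("discharge pressure gauge", "pressure gauge"))
```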

XML Document Analysis based on Similarity (유사성 기반 XML 문서 분석 기법)

  • Lee, Jung-Won;Lee, Ki-Ho
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.6
    • /
    • pp.367-376
    • /
    • 2002
  • XML allows users to define elements using arbitrary words and organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval and document management. In this paper, we propose a new methodology for computing similarity that considers XML semantics: the meanings of the elements and the nested structures of XML documents. We generate extended-element vectors, using a thesaurus to normalize synonyms, compound words, and abbreviations, and build a similarity matrix from them. We then compute the similarity between XML elements. We also discover and minimize XML structure using automata (NFA, nondeterministic finite automata, and DFA, deterministic finite automata). We compute the similarity between XML structures using the element similarity matrix and the minimized XML structures. Our methodology, which considers XML semantics, shows 100% accuracy in identifying the category of real documents from an online bookstore.
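The element-level step this abstract describes (normalize names via a thesaurus, then build a similarity matrix) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy thesaurus entries, element names, and the binary 1.0/0.0 similarity are all invented for the example.

```python
# Toy thesaurus mapping synonyms/abbreviations to a canonical term.
THESAURUS = {
    "author": "writer",    # synonym -> canonical form (invented entry)
    "isbn": "identifier",  # abbreviation -> canonical form (invented entry)
}

def normalize(element_name: str) -> str:
    """Lowercase and map through the thesaurus to a canonical term."""
    name = element_name.lower()
    return THESAURUS.get(name, name)

def element_similarity(a: str, b: str) -> float:
    """1.0 if normalized names match, else 0.0 (a real system would
    use graded thesaurus distances rather than exact equality)."""
    return 1.0 if normalize(a) == normalize(b) else 0.0

def similarity_matrix(elems_a, elems_b):
    return [[element_similarity(x, y) for y in elems_b] for x in elems_a]

# "Author" matches "writer" only via the thesaurus normalization.
print(similarity_matrix(["Author", "ISBN"], ["writer", "title"]))
# prints [[1.0, 0.0], [0.0, 0.0]]
```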

A Harmful Site Judgement Technique based on Text (문자 기반 유해사이트 판별 기법)

  • Jung, Kyu-Cheol;Lee, Jin-Kwan;Lee, Taehun;Park, Kihong
    • The Journal of Korean Association of Computer Education
    • /
    • v.7 no.5
    • /
    • pp.83-91
    • /
    • 2004
  • Through this research, it was possible to set up a classification system that distinguishes 'harmful information sites', which damage teenagers' emotional health, from 'general sites'. To intercept harmful information sites, the system uses content-based filtering. Instead of existing methods, it picks the most frequently used compound keywords and assigns each harmful word a harmfulness-degree point according to the harmful-word classification suggested by the ICEC (Information Communication Ethics Committee). To test the harmful-information blocking system, the standard harmfulness-degree threshold was set to 3.5 based on the result of a prior study; then one hundred 'harmful information sites' and one hundred 'general sites' were picked at random and classified with the new system. The system classified 78% of the harmful sites as 'harmful information sites' and 96% of the general sites as 'general sites'. These results confirm the validity of the new classification system.

  • PDF
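The scoring idea in the abstract above, harmful words carrying harmfulness-degree points compared against a 3.5 threshold, can be sketched like this. The word list and point values are invented placeholders (not the ICEC list), and whether the points are summed or averaged is not stated in the abstract; this sketch averages:

```python
# Invented placeholder word list; the real system uses the ICEC
# harmful-word classification and its harmfulness-degree points.
HARMFUL_POINTS = {"gamble": 4.0, "violence": 3.0, "adult": 5.0}
THRESHOLD = 3.5  # the threshold reported in the abstract

def page_score(words):
    """Average harmfulness-degree point over the harmful words found
    (averaging is an assumption; the paper may aggregate differently)."""
    scores = [HARMFUL_POINTS[w] for w in words if w in HARMFUL_POINTS]
    return sum(scores) / len(scores) if scores else 0.0

def is_harmful(words):
    return page_score(words) >= THRESHOLD

print(is_harmful(["adult", "gamble", "news"]))  # True: average is 4.5
print(is_harmful(["news", "weather"]))          # False: no harmful words
```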

Related Term Extraction with Proximity Matrix for Query Related Issue Detection using Twitter (트위터를 이용한 질의어 관련 이슈 탐지를 위한 인접도 행렬 기반 연관 어휘 추출)

  • Kim, Je-Sang;Jo, Hyo-Geun;Kim, Dong-Sung;Kim, Byeong Man;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.1
    • /
    • pp.31-36
    • /
    • 2014
  • Social network services (SNS) including Twitter and Facebook are good resources for extracting various issues such as public interest, trends, and topics. This paper proposes a method to extract query-related issues by calculating the relatedness between terms on Twitter. As a term that frequently appears near query terms should be semantically related to the query, we calculate term relatedness in retrieved documents by summing a proximity score that is proportional to term frequency and inversely proportional to the distance between words. Terms whose relatedness exceeds a threshold are then extracted as query-related issues, and our system displays those issues as a connected network. By analyzing single transitions in the connected network, compound words are easily obtained.
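The proximity summation described above can be sketched as follows. This is a minimal illustration with an invented example tweet; the exact weighting in the paper may differ (1/distance is an assumption consistent with "inversely proportional to distance"):

```python
# Sketch: a candidate term's relatedness to the query is summed over
# all (candidate, query) occurrence pairs, each contribution inversely
# proportional to the token distance between them, so frequent and
# nearby terms score higher.
def relatedness(tokens, query, candidate):
    q_pos = [i for i, t in enumerate(tokens) if t == query]
    c_pos = [i for i, t in enumerate(tokens) if t == candidate]
    score = 0.0
    for c in c_pos:
        for q in q_pos:
            d = abs(c - q)
            if d > 0:
                score += 1.0 / d  # closer occurrences contribute more
    return score

tweet = "olympics opening ceremony olympics medal".split()
# "medal" is 1 away from one "olympics" and 4 from the other: 1 + 1/4.
print(relatedness(tweet, "olympics", "medal"))  # 1.25
```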

Using Multimedia to Improve Listening Comprehension in the EFL Classroom

  • Park, Seung-Won
    • English Language & Literature Teaching
    • /
    • v.8 no.2
    • /
    • pp.105-115
    • /
    • 2003
  • The four skills of a language are all required for communication, and they are very important for a learner to develop balanced language acquisition. In the global era, listening and speaking skills are emphasized more than reading and writing proficiency, because learners' communicative competence is needed more than accurate knowledge of the language's structure. For this reason, listening comprehension should be taught effectively using the following strategies. First, the sound differences of the language must be taught; language is a complicated process that conveys comprehensive meaning combined with the internal and external factors of a language, so the meaning of the sounds of language should be transmitted in units of vocabulary and syntax. Second, good listening comprehension requires familiarity and extensive experience with many English words in order to understand English sentences unconsciously. Third, since understanding the structure of the language aids listening comprehension, better listening comprehension becomes possible through meaningful exercises. Fourth, the compound process of listening comprehension requires a comprehensive, not a piecemeal, understanding of language. Fifth, the appropriate application of multimedia courseware improves listening comprehension more than existing audio, video, and tape recorders. Multimedia courseware is useful as follows: a learner can take as many lessons as he or she wants; repeating a lesson takes little time; it provides lively pictures with native speakers' voices; it gives the learner continuous feedback through interaction with the computer; and it adjusts the lesson to the learner's level.

  • PDF

Korean Head-Tail Tokenization and Part-of-Speech Tagging by using Deep Learning (딥러닝을 이용한 한국어 Head-Tail 토큰화 기법과 품사 태깅)

  • Kim, Jungmin;Kang, Seungshik;Kim, Hyeokman
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.4
    • /
    • pp.199-208
    • /
    • 2022
  • Korean is an agglutinative language in which one or more morphemes combine to form a single word. Part-of-speech tagging separates each morpheme from a word and attaches a part-of-speech tag. In this study, we propose a new Korean part-of-speech tagging method based on the Head-Tail tokenization technique, which divides a word into a lexical-morpheme part and a grammatical-morpheme part without decomposing compound words. In this method, the Head and Tail are divided at a syllable boundary without restoring irregular deformations or abbreviated syllables. A Korean part-of-speech tagger was implemented using Head-Tail tokenization and deep learning. To address the low tagging accuracy caused by the large number of complex tags generated from the segmented tokens, we reduced the tag set to complex tags composed of large-classification tags, which improved tagging accuracy. The performance of the Head-Tail part-of-speech tagger was evaluated using BERT, syllable-bigram, and subword-bigram embeddings; both syllable-bigram and subword-bigram embeddings improved performance compared to plain BERT. Part-of-speech tagging performed by integrating the Head-Tail tokenization model and the simplified part-of-speech tagging model achieved 98.99% word-unit accuracy and 99.08% token-unit accuracy. The experiments also showed that tagging performance improved when the maximum token length was limited to twice the number of words.
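The Head-Tail split at a syllable boundary can be sketched as below. This is only an illustration of the idea, the paper learns the split with deep learning rather than a lexicon; the small list of grammatical tails here is invented for the example:

```python
# Toy sketch: split an eojeol (Korean word) at a syllable boundary into
# a lexical Head and a grammatical Tail, without restoring irregular or
# abbreviated forms. The tail lexicon is a tiny invented placeholder.
TAIL_ENDINGS = ["에서", "에게", "으로", "은", "는", "이", "가", "을", "를"]

def head_tail(word: str):
    """Return (head, tail); longest matching tail wins."""
    for tail in sorted(TAIL_ENDINGS, key=len, reverse=True):
        if word.endswith(tail) and len(word) > len(tail):
            return word[:-len(tail)], tail
    return word, ""  # no grammatical tail found

print(head_tail("학교에서"))  # ('학교', '에서')
print(head_tail("학교"))      # ('학교', '')
```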

Korean Word Segmentation and Compound-noun Decomposition Using Markov Chain and Syllable N-gram (마코프 체인 및 음절 N-그램을 이용한 한국어 띄어쓰기 및 복합명사 분리)

  • 권오욱
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.274-284
    • /
    • 2002
  • Word segmentation errors occurring in text preprocessing often insert incorrect words into the recognition vocabulary and cause poor language models for Korean large-vocabulary continuous speech recognition. We propose an automatic word segmentation algorithm using Markov chains and syllable-based n-gram language models in order to correct word segmentation errors in text corpora. We assume that a sentence is generated from a Markov chain: spaces and non-space characters are generated on self-transitions and other transitions of the Markov chain, respectively. The word segmentation of the sentence is then obtained by finding the maximum-likelihood path using syllable n-gram scores. In experiments, the algorithm showed 91.58% word accuracy and 96.69% syllable accuracy for word segmentation of 254 newspaper-column sentences with all spaces removed. The algorithm improved word accuracy from 91.00% to 96.27% for word segmentation correction at line breaks and yielded 96.22% accuracy for compound-noun decomposition.
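The maximum-likelihood segmentation idea can be sketched with a tiny syllable bigram model that treats the space as a token. This is a toy illustration, the probabilities and syllables are invented, and it exhaustively tries every space pattern, whereas the paper finds the best path efficiently over the Markov chain:

```python
import math
from itertools import product

# Invented toy bigram log-probabilities; "_" stands for a space token.
BIGRAM_LOGP = {
    ("s", "e"): math.log(0.9),
    ("e", "_"): math.log(0.8),
    ("_", "g"): math.log(0.8),
    ("g", "o"): math.log(0.9),
    ("e", "g"): math.log(0.1),
}
DEFAULT = math.log(0.05)  # back-off score for unseen bigrams

def score(seq):
    return sum(BIGRAM_LOGP.get(p, DEFAULT) for p in zip(seq, seq[1:]))

def segment(syllables):
    """Try every space-insertion pattern, keep the most likely one
    (brute force; the paper uses a best-path search instead)."""
    best, best_s = None, -math.inf
    for pattern in product([False, True], repeat=len(syllables) - 1):
        seq = [syllables[0]]
        for syl, sp in zip(syllables[1:], pattern):
            if sp:
                seq.append("_")
            seq.append(syl)
        s = score(seq)
        if s > best_s:
            best, best_s = seq, s
    return "".join(best).replace("_", " ")

print(segment(list("sego")))  # "se go" under the toy model
```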

Functional Expansion of Morphological Analyzer Based on Longest Phrase Matching For Efficient Korean Parsing (효율적인 한국어 파싱을 위한 최장일치 기반의 형태소 분석기 기능 확장)

  • Lee, Hyeon-yoeng;Lee, Jong-seok;Kang, Byeong-do;Yang, Seung-weon
    • Journal of Digital Contents Society
    • /
    • v.17 no.3
    • /
    • pp.203-210
    • /
    • 2016
  • Korean freely allows the omission of sentence elements and has free modification scope, so handling these phenomena in the morphological analyzer is better than in the parser. In this paper, we propose functional expansions of the morphological analyzer to ease the burden of parsing, based on a longest-phrase matching method. When a series of morphemes has a single syntactic category, as produced by the processing of unknown words, compound verbs, compound nouns, numbers, and symbols, our method combines them into one syntactic unit and treats them as such by assigning semantic features to the unit. The proposed morphological analysis method removes unnecessary morphological ambiguities and reduces the number of morphological analysis results, thereby improving the accuracy of the tagger and parser. Empirically, we found that our method reduces the number of parse trees by 73.4% and parsing time by 52.4% on average.
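The merging step described above, combining consecutive morphemes that jointly form one syntactic unit, can be sketched as follows. The patterns, tags, and example words are invented placeholders, and this greedy pairwise merge only illustrates the idea, not the paper's longest-phrase matching rules:

```python
# Invented merge patterns: a pair of adjacent tags that collapses into
# one unit with a single tag (e.g. noun+noun -> compound noun).
PATTERNS = [
    (("NUM", "NUM"), "NUM"),    # split number pieces -> one number
    (("NOUN", "NOUN"), "NOUN"), # compound noun
]

def merge_longest(tagged):
    """tagged: list of (form, tag). Greedily merge adjacent pairs whose
    tag sequence matches a pattern, repeating until no merge applies."""
    changed = True
    while changed:
        changed = False
        for i in range(len(tagged) - 1):
            tags = (tagged[i][1], tagged[i + 1][1])
            for pat, new_tag in PATTERNS:
                if tags == pat:
                    merged = (tagged[i][0] + tagged[i + 1][0], new_tag)
                    tagged = tagged[:i] + [merged] + tagged[i + 2:]
                    changed = True
                    break
            if changed:
                break
    return tagged

# Three consecutive nouns collapse into one compound-noun unit.
print(merge_longest([("기계", "NOUN"), ("번역", "NOUN"), ("시스템", "NOUN")]))
```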