• Title/Abstract/Keywords: linguistic features

181 search results; processing time 0.025 seconds

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 3
    • /
    • pp.1639-1658
    • /
    • 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have been employed to identify named-entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they are unable to recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationship between certain words that simultaneously appear around specific words in the abstracts of academic journals. The method is executed as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.
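The steps listed in this abstract (sentence detection, POS tagging, then analysis of the non-noun context around entity candidates) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `TOY_LEXICON`, the helper names, and the two-token window are hypothetical, and a real system would use a trained POS tagger rather than a lookup table.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a trained POS tagger.
TOY_LEXICON = {
    "we": "PRON", "propose": "VERB", "uses": "VERB", "called": "VERB",
    "a": "DET", "the": "DET", "method": "NOUN", "on": "ADP", "is": "VERB",
}

def split_sentences(text):
    """Naive sentence detection: split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag(sentence):
    """Tag tokens with the toy lexicon; unknown words default to NOUN,
    so unregistered words become entity candidates."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", sentence)
    return [(tok, TOY_LEXICON.get(tok.lower(), "NOUN")) for tok in tokens]

def context_patterns(tagged, window=2):
    """For each NOUN candidate, collect the non-noun words that appear
    within `window` tokens to its left."""
    patterns = []
    for i, (tok, pos) in enumerate(tagged):
        if pos != "NOUN":
            continue
        left = tagged[max(0, i - window):i]
        ctx = tuple(w.lower() for w, p in left if p != "NOUN")
        patterns.append((tok, ctx))
    return patterns

def collocation_counts(text, window=2):
    """Count how often each non-empty context pattern precedes a candidate."""
    counts = Counter()
    for sent in split_sentences(text):
        for _, ctx in context_patterns(tag(sent), window):
            if ctx:
                counts[ctx] += 1
    return counts
```

Frequent context patterns (for example, a verb such as "called" directly before a candidate) could then serve as features for a recognition model; how the paper actually classifies them is not specified in the abstract.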

The Language·Society·Culture in a Community of Practice: The Linguistic Features and Students' Perspectives on English Signboards

  • 이영화
    • 한국콘텐츠학회논문지
    • /
    • Vol. 18, No. 6
    • /
    • pp.364-373
    • /
    • 2018
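  • The purpose of this study is to examine the linguistic features of English signboards in urban areas of Korea and, through university students' perceptions of those signboards, their sociocultural aspects. The research methods included photographing English signboards in the areas concerned and surveying students. The analysis showed that 55.4% of the English signboards were written in English only, and these were concentrated mainly in alcohol and beverage businesses and clothing businesses. In terms of text structure, English-only signboards of '2-3 words' (43%) and mixed English and Korean signboards of '4-5 words' (25%) together accounted for about 68% of the total. About 70% of the English signboards were used by alcohol and beverage businesses (27%), restaurants (23%), and clothing businesses (21%), and only 42% of respondents felt that these signboards harmonized with their surroundings. The requirements for a good English signboard were 'visibility' (27%), 'expression of the business type' (23%), 'sophistication and luxuriousness' (19%), and 'design and creativity' (15%), and signboards meeting them were most common in the Sinchon area. Negatively perceived English signboards, on the other hand, were most common in the clothing business. Current English signboards are on the whole highly inadequate, so policy and institutional efforts are needed to foster a beautiful and harmonious English signboard culture.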

Science Popularizing Mechanism of a Science Magazine in terms of the Linguistic Features of Earth Science Articles in 'Science Donga'

  • 함석진;맹승호;김찬종
    • 한국지구과학회지
    • /
    • Vol. 31, No. 1
    • /
    • pp.51-62
    • /
    • 2010
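  • To find, in the linguistic features of science magazine texts, the mechanism that lets a science magazine serve as a point of contact between scientists and the general public and so contribute to the popularization of science, 12 earth science articles published in Science Donga were selected. A register analysis was carried out to identify the linguistic features of the selected texts. The results were as follows. 1) Articles written by journalists had a high proportion of epistemic predicates and verbal predicates expressing scientists' thinking and speech. 2) In journalists' articles a high proportion of sentence subjects were people, whereas in scientists' articles people rarely appeared and the subject was often omitted. 3) Scientists' articles mostly used the declarative mood, while journalists' articles contained many non-declarative sentences, such as interrogative and propositive moods and ellipsis. 4) The density of clauses within sentences was similar in journalists' and scientists' articles. 5) As for information structure, journalists' articles showed a simple pattern of information development, while scientists' articles showed a somewhat more complex information structure. These linguistic features show that a science magazine contributes to the popularization of science not only by presenting difficult scientific content as easy, familiar text through journalists' articles, but also by familiarizing the public with the language of science, a part of the culture of science, through the refined science texts written by scientists.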

Linguistic Theory in India and Panini

  • 김형엽
    • 인문언어
    • /
    • Vol. 1, No. 2
    • /
    • pp.123-139
    • /
    • 2001
  • In the history of world linguistics, the scholars of India can be regarded as representative linguists who provided the cornerstone for the academic development of the field. Without looking into the contents of the Indian linguistic theories devised and developed in the past, it would be almost impossible to account for the origin of descriptive linguistics and historical linguistics. These linguistic trends became full-fledged in the 19th and 20th centuries and are still adopted by many researchers in order to analyze newly discovered languages and to train students who are only just beginning their linguistic studies. In this paper I will show how far the influence of Indian linguistics has colored the historical course of linguistic development. In particular, through an analysis of Panini's grammar, I will demonstrate the close relationship between Indian linguistic theory and generative grammar, the most active theory at present. The methods that Panini applied to formulate rules such as sutras carry a great deal of information that can also be found in the rules postulated in generative grammar. One feature common to both linguistic theories is simplicity of rule representation. In generative grammar a rule has to be stated without any redundancy. When a certain number of sounds such as p, b, m show the same phonological change involving the lips (labial, in linguistic terms), separate rules need not be given for each sound. It is better to combine the three sounds in a single rule by grouping them under the shared phonetic feature 'labial'. In Panini's grammar the form of a rule was likewise decided on the basis of simplicity. For example, sutra 6.1.77 states the phonological connection between the vowels i, u, r, l and the semi-vowels y, v, r, l. It does not, however, require four individual rules to be postulated. Instead, a single rule covering both the vowels and the semi-vowels is proposed, and linguistically the rule makes it clear that the simpler the rules are, the better they reflect the efficiency of human language acquisition. Although the systems introduced in Panini's grammar are somewhat removed from language education itself, we cannot deny that the grammar marks a turning point in linguistic development. It is essential for us to reconsider the grammar from the viewpoint of modern linguistic theories in order to understand their root and trunk more thoroughly. It will also help us to predict the direction in which linguistics will proceed in the future.

A Study on the Emotional Evaluation of Fabric Color Patterns

  • Koo, Hyun-Jin;Kang, Bok-Choon;Um, Jin-Sup;Lee, Joon-Whan
    • 감성과학
    • /
    • Vol. 5, No. 3
    • /
    • pp.11-20
    • /
    • 2002
  • Two new models are developed for the objective evaluation of fabric color patterns by applying multiple regression analysis and an adaptive fuzzy-rule-based system. The physical features of fabric color patterns are extracted through digital image processing, and the emotional features are collected based on the psychological experiments of Soen [3, 4]. The principal physical features are the hue, saturation, intensity, and texture of the color patterns. The emotional features are represented by thirteen pairs of opposing adjectives. The multiple regression analyses and the adaptive fuzzy system are used as tools to analyze the relations between the physical and emotional features. As a result, both of the proposed models show competent approximation performance and yield linguistic interpretations similar to those of Soen's psychological experiments.

"Say Hello to Vietnam!": A Multimodal Analysis of British Travel Blogs

  • Thuy T.H. Tran
    • 수완나부미
    • /
    • Vol. 15, No. 2
    • /
    • pp.91-129
    • /
    • 2023
  • This paper reports the findings of a multimodal study conducted on 10 travel blog posts about Vietnam by seven British professional travel bloggers. The study takes a sociolinguistic view of tourism by seeing travel blogs as a source of linguistic and other semiotic materials while considering language as situated practice for the social construction of fundamental categories such as "human," "society," and "nation." It borrows concepts from Halliday's Systemic Functional Linguistics for the interpersonal metafunction to develop an analytical framework to study how the co-occurrence of text and still images in these travel blog posts formulated the portrayal of Vietnam as a tourism destination and indicated the main sociolinguistic features of the blogs. The analysis of appreciation values and interactive qualities encoded in evaluative adjectives and still images shows that Vietnam is generally portrayed as a country of identity and diversity. It provides tourists with positive experiences in terms of places of interest, food, and local lifestyles and is cost-competitive. Strangerhood and authenticity are two outstanding sociolinguistic features exhibited in these travel blog posts. The findings of this study also underline the co-contribution of the linguistic sign, in this case evaluative adjectives, and the visual sign, in this case still images, as interpersonal meaning-making resources. To portray Vietnam, still images served as integral elements to evidence the credibility of verbal narrations. To unveil sociolinguistic characteristics of travel blogs, still images supported the linguistic realizations of authenticity and strangerhood in the posts, and in some cases delivered an even stronger message than words. Not only does the study present a source of feedback from international travelers on tourism practice in Vietnam, but it also provides insights into multimodal analysis of tourism discourse, which remains an under-researched area in Vietnam.

Differentiation of Aphasic Patients from the Normal Control Via a Computational Analysis of Korean Utterances

  • Kim, HyangHee;Choi, Ji-Myoung;Kim, Hansaem;Baek, Ginju;Kim, Bo Seon;Seo, Sang Kyu
    • International Journal of Contents
    • /
    • Vol. 15, No. 1
    • /
    • pp.39-51
    • /
    • 2019
  • Spontaneous speech provides rich information defining the linguistic characteristics of individuals. As such, computational analysis of speech would enhance the efficiency of evaluating patients' speech. This study aims to provide a method to differentiate persons with and without aphasia based on language usage. Ten aphasic patients and their counterpart normal controls participated, and they were all tasked to describe a set of given words. Their utterances were linguistically processed and compared to each other. Computational analyses from PCA (Principal Component Analysis) to machine learning were conducted to select the relevant linguistic features, and consequently to classify the two groups based on the features selected. It was found that functional words, not content words, were the main differentiator of the two groups. The most viable discriminators were demonstratives, function words, sentence-final endings, and postpositions. The machine learning classification model was found to be quite accurate (90%) and impressively stable. This study is noteworthy as it is the first attempt to use computational analysis to characterize the word usage patterns of Korean aphasic patients, thereby discriminating them from the normal group.

Lexical Bundles in Computer Science Research Articles: A Corpus-Based Study

  • Lee, Je-Young;Lee, Hye Jin
    • International Journal of Contents
    • /
    • Vol. 14, No. 4
    • /
    • pp.70-75
    • /
    • 2018
  • The purpose of this corpus-based study was to find 4-word lexical bundles in computer science research articles. As the demand for research articles (RAs) for international publication increases, the need to acquire field-specific writing conventions for this academic genre has become a burning issue. In particular, one area of burgeoning interest in the examination of the rhetorical structures and linguistic features of RAs is the use of lexical bundles, the indispensable building blocks that make up an academic discourse. To illustrate, different academic discourses rely on distinctive repertoires of lexical bundles. Because lexical bundles are often acquired as a whole, the recurring multi-word sequences can be retrieved automatically to make written discourse more fluent and natural. Therefore, the proper use of rhetorical devices specific to a particular discipline can be a vital indicator of success within the discourse communities. Hence, to identify linguistic features that make up specific registers, this corpus-based study examines the types and usage frequency of lexical bundles in the discipline of CS, one of the most in-demand fields the world over. Given that lexical bundles are empirically derived formulaic multi-word units, identifying the core lexical bundles used in RAs may provide insights into the specificity of particular CS text types. This will in turn provide empirical evidence of register specificity and technicality within the academic discourse of computer science. Based on the results, pedagogical implications and suggestions for future research are discussed.
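As a rough illustration of how recurring 4-word sequences might be retrieved from a corpus, the sketch below counts word 4-grams and keeps those that meet a frequency threshold and occur in a minimum number of texts. The function names and the thresholds `min_freq` and `min_texts` are assumptions made for this example; the abstract does not state the cut-offs or tools the authors used.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-grams occurring at least `min_freq` times overall
    and in at least `min_texts` different texts."""
    freq = Counter()        # total occurrences of each n-gram
    text_range = Counter()  # number of texts each n-gram appears in
    for text in texts:
        tokens = tokenize(text)
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(ngrams)
        text_range.update(set(ngrams))
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and text_range[gram] >= min_texts
    }
```

Note that this simple version lets n-grams run across sentence boundaries; a more careful treatment would split sentences first and normalize frequencies per million words.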

A Study on the Landscape Adjectives for Urban Landscape Analysis

  • 주신하;임승빈
    • 한국조경학회지
    • /
    • Vol. 31, No. 1
    • /
    • pp.1-10
    • /
    • 2003
  • The purpose of this study is to categorize a landscape adjective list for urban landscape analysis. For this purpose, four methods are used. The first method is to survey the foreign landscape adjective lists, such as Feimer's EACL & LACL, the VRM suggested vocabulary, and IEA and LI's aesthetic factors, which are commonly used in domestic research. The second method is to analyze the vocabulary in a Korean linguistics textbook. The third is to investigate Korean adjective lists from 36 domestic studies. The last is to survey adjectives used to express urban landscapes: 24 landscapes from BunDdang, GwaCheon, YakSoo and ApGuJeong were presented to 40 subjects, whose responses were collected and categorized. The frequency analysis of the adjectives and landscape factors was processed by SJTOOL, which was programmed for Korean vocabulary analysis. The results of this study can be summarized as follows. Foreign adjective lists were mainly focused on the physical features of landscapes, and they also had linguistic problems caused by translation. Therefore, it is undesirable to use the foreign adjective lists directly to analyze Korean urban landscapes. The vocabulary from the linguistics textbook has more variety, but it includes many adjectives irrelevant to the urban landscape. More types of adjectives were used in the studies (890 adjectives/295 types) than in the response survey (1,406 adjectives/270 types). Because some adjectives were partly confusing, it is desirable to categorize the adjectives. The categorized adjectives could therefore be more useful and practical for urban landscape analysis.

Corpus-Based Literary Analysis

  • 하명정
    • 한국콘텐츠학회논문지
    • /
    • Vol. 13, No. 9
    • /
    • pp.440-447
    • /
    • 2013
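  • As corpus linguistics has rapidly expanded its standing as a research method in recent years, it has contributed to a deeper understanding of literary texts as well as of linguistic phenomena. Despite this rapid expansion, attempts to reinterpret classics and literary works on the basis of literary text corpora remain very rare in Korean linguistics. This study therefore used WordSmith, a computer concordance program and an analysis tool of corpus linguistics, to investigate the stylistic features and main themes of literary works that exist as large electronic texts. In particular, focusing on keywords that indicate the main characteristics of a text, the study approaches Shakespeare's tragedy Romeo and Juliet with corpus-linguistic techniques and sheds new light on the world of the work. It is therefore considered to be of considerable academic significance, and related follow-up studies are expected.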
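Keyword analysis of this kind compares word frequencies in a study text against a reference corpus. The sketch below assumes a log-likelihood keyness statistic, which is one common choice in corpus tools; the abstract does not say which statistic or reference corpus was used, and the function names here are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness.
    a: word frequency in the study corpus, b: frequency in the reference corpus,
    c: size of the study corpus, d: size of the reference corpus (in tokens)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, ref_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus.
    Assumes both token lists are non-empty."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: -item[1])[:top]
```

A word used at the same relative rate in both corpora scores zero, while words that are markedly more frequent in the study text rise to the top of the list.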