• Title/Summary/Keyword: Linguistic Features

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have identified named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationships among words that appear together around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data are preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. Next, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created by analyzing and classifying the information in the sentences.
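The first three steps of the pipeline described in this abstract (sentence detection, POS tagging, and collecting the non-noun context around noun candidates) can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny tag lexicon `TOY_TAGS`, the function names, and the window size of 2 are all hypothetical, and unknown words are simply assumed to be noun candidates. The final model-building step is not covered.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_TAGS = {
    "we": "PRON", "propose": "VERB", "is": "VERB", "trained": "VERB",
    "based": "VERB", "using": "VERB", "a": "DET", "the": "DET",
    "on": "ADP", "by": "ADP",
}

def split_sentences(text):
    """Naive sentence detection: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag(sentence):
    """Tag each token; words missing from the toy lexicon become noun candidates."""
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", sentence)
    return [(tok, TOY_TAGS.get(tok.lower(), "NOUN")) for tok in tokens]

def context_profile(tagged, window=2):
    """For each noun candidate, count the non-noun (word, tag) pairs near it."""
    profile = {}
    for i, (word, pos) in enumerate(tagged):
        if pos != "NOUN":
            continue
        lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
        context = Counter(
            (w.lower(), t)
            for j, (w, t) in enumerate(tagged[lo:hi], start=lo)
            if j != i and t != "NOUN"  # entity candidates are excluded from the context
        )
        profile.setdefault(word, Counter()).update(context)
    return profile

text = "We propose a method based on LSTM. The model is trained using BERT."
sentences = split_sentences(text)
profile = context_profile(tag(sentences[0]))
```

A classifier could then be trained on such context profiles to decide which candidates are entities; the choice of features and model here is left open because the abstract does not specify them.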

The Language·Society·Culture in a Community of Practice: The Linguistic Features and Students' Perspectives on English Signboards (행위공동체 내의 언어·사회·문화: 영어간판 속 텍스트의 언어적 특성과 사회·문화적 양상에 관한 인식의 고찰)

  • Lee, Younghwa
    • The Journal of the Korea Contents Association / v.18 no.6 / pp.364-373 / 2018
  • This study investigates the linguistic features of English signboards (ES) and their socio-cultural aspects in Korea through university students' perspectives on the ES. The data comprised pictures of the ES and questionnaire responses collected from the students. The findings showed that ES accounted for 55.4%, mainly for beverage and clothing businesses. Texts written in 'only English' had 2-3 words (43%), and those written in a 'combination of English and Korean' had 4-5 words (25%), together amounting to 68% of the total. Seventy percent of ES were used for beverage, food, and clothing businesses, but they were not in harmony with the neighborhood, showing 42% agreement. Good ES required 'visual factors (27%)', 'expression of business (23%)', 'elegant and luxurious style (19%)', and 'design and creativity (15%)', and such ES were most common in the Shinchon area. Overall, the present ES culture was insufficient to create a harmonious atmosphere in Korea, which calls for the support of policies and systems.

Science Popularizing Mechanism of a Science Magazine in terms of the Linguistic Features of Earth Science Articles in 'Science Donga' ('과학동아' 지구과학 기사의 언어적 특성으로 본 과학 잡지의 과학 대중화 기제)

  • Ham, Seok-Jin;Maeng, Seung-Ho;Kim, Chan-Jong
    • Journal of the Korean earth science society / v.31 no.1 / pp.51-62 / 2010
  • The purpose of this study was to investigate how a science magazine helps fill the gap between scientists and the general public, and how it contributes to science popularization. We analyzed the linguistic features of the texts used in a science magazine, using 12 articles (six written by journalists and six written by scientists) from Science Donga. Register analysis was conducted to define the linguistic features of the texts in terms of ideational meaning, interpersonal meaning, and textual meaning. The results of this study are as follows: (1) The articles written by journalists used more mental and verbal processes, in which the conversations and thoughts of scientists were expressed. (2) Human agents were relatively explicit in the journalists' articles, whereas they were implicit or omitted in the scientists' articles. (3) Interrogative sentences, inclusive imperative sentences, and even omissions were frequently found in the journalists' articles, whereas the scientists' articles mainly used declarative statements. (4) The clause densities of the journalists' and the scientists' articles were similarly lower than that of science textbooks. (5) The information structure of the journalists' articles, as revealed by their patterns of Theme and Rheme, was simpler than that of science textbooks, while that of the scientists' articles was more complex than that of the journalists'. Based on the linguistic features of the texts used in science magazines, we found that a science magazine contributes to science popularization in two ways. First, the journalists' articles present science content in a way that readers can follow with ease and feel well acquainted with. Second, the modified articles of scientists help the general public become familiar with the culture of science in terms of the use of science language.

Linguistic Theory in India and Panini (인도의 언어이론과 파니니)

  • 김형엽
    • Lingua Humanitatis / v.1 no.2 / pp.123-139 / 2001
  • In the history of linguistics, the scholars of India can be regarded as representative linguists who provided the cornerstone for the academic development of the field. Without looking into the Indian linguistic theories devised and developed in the past, it would be almost impossible to account for the origin of descriptive linguistics and historical linguistics. These linguistic trends became full-fledged in the 19th and 20th centuries and are still adopted by many researchers to analyze newly discovered languages and to train students who are just beginning their linguistic studies. In this paper I will show how far the influence of Indian linguistics has colored the historical growth of linguistics. In particular, through an analysis of Panini's grammar I will demonstrate the close relationship between Indian linguistic theory and generative grammar, the most active theory at present. The methods that Panini used to formulate rules such as sutras encode a great deal of information that can also be found in the rules postulated in generative grammar. One feature common to both linguistic theories is the simplicity of rule representation. In generative grammar a rule has to be established without any redundancy. When a number of sounds such as p, b, m show the same phonological change involving the lips (labial, in linguistic terms), separate rules need not be given for each sound. It is better to put the sounds together in one rule by grouping the three sounds under the shared phonetic feature 'labial'. In Panini's grammar the form of a rule was likewise decided on the basis of simplicity. For example, sutra 6.1.77 shows the phonological connection between the vowels i, u, ṛ, ḷ and the semi-vowels y, v, r, l. However, it does not require four individual rules to be postulated. Instead, a single rule involving both the vowels and the semi-vowels is given, and linguistically the rule makes it clear that the simpler the rules are, the better they reflect the efficiency of human language acquisition. Although the systems introduced in Panini's grammar are somewhat removed from language education itself, we cannot deny that the grammar marks a turning point in linguistic development. It is essential for us to reconsider the grammar from the viewpoint of modern linguistic theories in order to understand their root and trunk more thoroughly. It will also help us predict the direction in which linguistics will proceed in the future.
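The point about rule simplicity can be made concrete with a small sketch. It assumes the usual reading of sutra 6.1.77 (the vowels i, u, ṛ, ḷ are replaced by y, v, r, l respectively before a vowel) and states it as one rule over two paired classes rather than four separate rules. The function name, the romanized input format, and the reduced vowel inventory are illustrative assumptions; overriding rules of the grammar are ignored except that identical vowels are left alone.

```python
import unicodedata

# One rule over paired classes instead of four separate rules.
IK = ["i", "u", "\u1e5b", "\u1e37"]   # i, u, vocalic r, vocalic l
YAN = ["y", "v", "r", "l"]            # the corresponding semi-vowels
IK_TO_YAN = dict(zip(IK, YAN))

# Simplified vowel inventory in romanized form (assumption for this sketch).
VOWELS = set("aiueo" + "\u0101\u012b\u016b\u1e5b\u1e5d\u1e37")

def join_words(first, second):
    """Join two romanized words, applying the single vowel-to-semi-vowel rule."""
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    last, nxt = first[-1], second[0]
    # Simplification: only an identical following vowel blocks the rule here.
    if last in IK_TO_YAN and nxt in VOWELS and nxt != last:
        return first[:-1] + IK_TO_YAN[last] + second
    return first + second

print(join_words("iti", "api"))
print(join_words("madhu", "ari"))
```

Because the replacement is defined by position in the two lists, adding or removing a sound from the class changes the data, not the rule, which is the economy the abstract attributes to both grammars.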

A Study on the Emotional Evaluation of fabric Color Patterns

  • Koo, Hyun-Jin;Kang, Bok-Choon;Um, Jin-Sup;Lee, Joon-Whan
    • Science of Emotion and Sensibility / v.5 no.3 / pp.11-20 / 2002
  • Two new models are developed for the objective evaluation of fabric color patterns by applying multiple regression analysis and an adaptive fuzzy-rule-based system. The physical features of fabric color patterns are extracted through digital image processing, and the emotional features are collected based on the psychological experiments of Soen[3, 4]. The principal physical features are hue, saturation, intensity, and the texture of the color patterns. The emotional features are represented by thirteen pairs of opposing adjectives. Multiple regression analysis and the adaptive fuzzy system are used as tools to analyze the relations between the physical and emotional features. As a result, both of the proposed models show competent approximation performance and a linguistic interpretation similar to that of Soen's psychological experiments.

"Say Hello to Vietnam!": A Multimodal Analysis of British Travel Blogs

  • Thuy T.H. Tran
    • SUVANNABHUMI / v.15 no.2 / pp.91-129 / 2023
  • This paper reports the findings of a multimodal study conducted on 10 travel blog posts about Vietnam by seven British professional travel bloggers. The study takes a sociolinguistic view of tourism by seeing travel blogs as a source of linguistic and other semiotic materials while considering language as situated practice for the social construction of fundamental categories such as "human," "society," and "nation." It borrows concepts from Halliday's Systemic Functional Linguistics for the interpersonal metafunction to develop an analytical framework for studying how the co-occurrence of text and still images in these travel blog posts formulated the portrayal of Vietnam as a tourism destination and indicated the main sociolinguistic features of the blogs. The analysis of appreciation values and interactive qualities encoded in evaluative adjectives and still images shows that Vietnam is generally portrayed as a country of identity and diversity. It provides tourists with positive experiences in terms of places of interest, food, and local lifestyles, and it is cost-competitive. Strangerhood and authenticity are two outstanding sociolinguistic features exhibited in these travel blog posts. The findings of this study also underline the co-contribution of the linguistic sign, in this case evaluative adjectives, and the visual sign, in this case still images, as interpersonal meaning-making resources. To portray Vietnam, still images served as integral elements evidencing the credibility of verbal narrations. To unveil the sociolinguistic characteristics of travel blogs, still images supported the linguistic realizations of authenticity and strangerhood in the posts, and in some cases delivered an even stronger message than words. Not only does the study present a source of feedback from international travelers on tourism practice in Vietnam, but it also provides insights into the multimodal analysis of tourism discourse, which remains an under-researched area in Vietnam.

Differentiation of Aphasic Patients from the Normal Control Via a Computational Analysis of Korean Utterances

  • Kim, HyangHee;Choi, Ji-Myoung;Kim, Hansaem;Baek, Ginju;Kim, Bo Seon;Seo, Sang Kyu
    • International Journal of Contents / v.15 no.1 / pp.39-51 / 2019
  • Spontaneous speech provides rich information defining the linguistic characteristics of individuals. As such, computational analysis of speech would enhance the efficiency of evaluating patients' speech. This study aims to provide a method to differentiate persons with and without aphasia based on language usage. Ten aphasic patients and their matched normal controls participated, and they were all asked to describe a set of given words. Their utterances were linguistically processed and compared with each other. Computational analyses ranging from PCA (Principal Component Analysis) to machine learning were conducted to select the relevant linguistic features and, consequently, to classify the two groups based on the selected features. It was found that function words, not content words, were the main differentiator of the two groups. The most viable discriminators were demonstratives, function words, sentence-final endings, and postpositions. The machine learning classification model was found to be quite accurate (90%) and impressively stable. This study is noteworthy as the first attempt to use computational analysis to characterize word usage patterns in Korean aphasic patients, thereby discriminating them from the normal group.

Lexical Bundles in Computer Science Research Articles: A Corpus-Based Study

  • Lee, Je-Young;Lee, Hye Jin
    • International Journal of Contents / v.14 no.4 / pp.70-75 / 2018
  • The purpose of this corpus-based study was to find 4-word lexical bundles in computer science research articles. As the demand for research articles (RAs) for international publication increases, the need to acquire field-specific writing conventions for this academic genre has become a pressing issue. In particular, one area of burgeoning interest in the examination of rhetorical structures and linguistic features of RAs is the use of lexical bundles, the indispensable building blocks that make up an academic discourse. To illustrate, different academic discourses rely on distinctive repertoires of lexical bundles. Because lexical bundles are often acquired as a whole, the recurring multi-word sequences can be retrieved automatically to make written discourse more fluent and natural. Therefore, the proper use of rhetorical devices specific to a particular discipline can be a vital indicator of success within the discourse community. Hence, to identify linguistic features that make up specific registers, this corpus-based study examines the types and usage frequency of lexical bundles in the discipline of computer science (CS), one of the most in-demand fields the world over. Given that lexical bundles are empirically derived formulaic multi-word units, identifying the core lexical bundles used in RAs may provide insights into the specificity of particular CS text types. This will in turn provide empirical evidence of register specificity and technicality within the academic discourse of computer science. Based on the results, pedagogical implications and suggestions for future research are discussed.
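Extracting 4-word lexical bundles from a corpus amounts to counting recurring four-token sequences and keeping those that are both frequent and spread across several texts. The sketch below shows one way this might be done; the function name, the tokenization, and the two thresholds (`min_freq`, `min_texts`) are assumptions for illustration, not the cut-offs used in the study, and the two sample sentences are made up.

```python
import re
from collections import Counter, defaultdict

def four_word_bundles(texts, min_freq=2, min_texts=2):
    """Return 4-word sequences occurring at least min_freq times in at least min_texts texts."""
    freq = Counter()             # total occurrences of each 4-gram
    spread = defaultdict(set)    # ids of the texts each 4-gram occurs in
    for doc_id, text in enumerate(texts):
        tokens = re.findall(r"[a-z']+", text.lower())
        # Simplification: 4-grams may cross sentence boundaries.
        for i in range(len(tokens) - 3):
            gram = tuple(tokens[i:i + 4])
            freq[gram] += 1
            spread[gram].add(doc_id)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }

# Made-up sample texts, for illustration only.
texts = [
    "On the other hand the results show that",
    "The model on the other hand is slow",
]
bundles = four_word_bundles(texts)
```

The range requirement (`min_texts`) keeps a phrase repeated by a single author from being counted as a bundle of the whole register.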

A Study on the Landscape Adjectives for Urban Landscape Analysis (도시경관분석을 위한 경관형용사 목록 작성)

  • 주신하;임승빈
    • Journal of the Korean Institute of Landscape Architecture / v.31 no.1 / pp.1-10 / 2003
  • The purpose of this study is to compile and categorize a list of landscape adjectives for urban landscape analysis. For this purpose, four methods are used. The first is to survey foreign landscape adjective lists, such as Feimer's EACL & LACL, the VRM suggested vocabulary, and IEA and LI's aesthetic factors, which are commonly used in domestic research. The second is to analyze the vocabulary in a Korean linguistics textbook. The third is to investigate Korean adjective lists from 36 domestic studies. The last is to survey the adjectives used to express urban landscapes: 24 landscapes from BunDdang, GwaCheon, YakSoo and ApGuJeong were presented to 40 subjects, whose responses were collected and categorized. The frequency analysis of the adjectives and landscape factors was performed with SJTOOL, which was programmed for Korean vocabulary analysis. The results of this study can be summarized as follows. Foreign adjective lists focused mainly on the physical features of landscapes and also had linguistic problems caused by translation. Therefore, it is undesirable to use the foreign adjective lists directly to analyze Korean urban landscapes. The vocabulary from the linguistics textbook has more variety, but it includes many adjectives irrelevant to the urban landscape. More types of adjectives were used in the studies (890 adjectives/295 types) than in the response survey (1,406 adjectives/270 types). Because some adjectives were partly confusing, it is desirable to categorize them. The categorized adjectives could therefore be more useful and practical for urban landscape analysis.

Corpus-Based Literary Analysis (코퍼스에 기반한 문학텍스트 분석)

  • Ha, Myung-Jeong
    • The Journal of the Korea Contents Association / v.13 no.9 / pp.440-447 / 2013
  • Corpus linguistic analyses now enable researchers to examine meanings and structural features of data that are not detected intuitively. While the potential of corpus linguistic techniques has been established and demonstrated for non-literary data, corpus stylistic analyses have rarely been applied to literature. Specifically, this paper explores keywords and their role in text analysis, a primary part of corpus linguistic analysis. It focuses on the application of techniques from corpus linguistics and the interpretation of results, and it addresses the question of what is to be gained from keyword analysis by scrutinizing keywords in Shakespeare's Romeo and Juliet.
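Keyword analysis of this kind compares word frequencies in a target text with those in a reference corpus and ranks the words that are unusually frequent in the target. The abstract does not say which statistic the paper uses; the sketch below assumes the log-likelihood keyness measure, which is a common choice in corpus linguistics. The function names and the toy token lists are hypothetical and are not taken from the play.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word with frequency a in a target of size c and b in a reference of size d."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the target
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top_n=5):
    """Rank words that are relatively more frequent in the target than in the reference."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c > b / d:        # keep only words overused in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: -pair[1])[:top_n]

# Toy token lists, for illustration only.
target = "love love love night the".split()
reference = "the the the king war".split()
ranked = keywords(target, reference)
```

With real data, the target would be the tokenized play and the reference a larger comparison corpus; the choice of reference corpus strongly affects which words come out as key.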