• Title/Abstract/Keywords: Short English Text

16 search results (processing time: 0.017 s)

Research on Keyword-Overlap Similarity Algorithm Optimization in Short English Text Based on Lexical Chunk Theory

  • Na Li;Cheng Li;Honglie Zhang
    • Journal of Information Processing Systems
    • /
    • Vol. 19, No. 5
    • /
    • pp.631-640
    • /
    • 2023
  • Short-text similarity calculation is an active topic in natural language processing research. Conventional keyword-overlap similarity algorithms consider only lexical-item information and neglect the effect of word order. Some optimized variants incorporate word order, but their weights are hard to determine. Building on the keyword-overlap similarity algorithm, this paper proposes a short English text similarity algorithm based on lexical chunk theory (LC-SETSA), which introduces lexical chunk theory from cognitive psychology into short English text similarity calculation for the first time. Lexical chunks are used to segment short English texts; the segmentation results reflect the semantic connotation and fixed word order of the lexical chunks, and the overlap similarity of the lexical chunks is then calculated accordingly. Finally, comparative experiments are carried out, and the results show that the proposed algorithm is feasible, stable, and effective to a large extent.
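To illustrate the idea in the abstract above, the following is a minimal sketch of how segmenting by multi-word chunks can make an overlap measure sensitive to word order. It is not the paper's LC-SETSA algorithm: the chunk lexicon, the greedy longest-match segmentation, the Dice-style formula, and the names `segment` and `overlap_similarity` are all assumptions made for illustration.

```python
def segment(text, chunks=()):
    """Greedy longest-match segmentation against a hypothetical chunk lexicon."""
    words = text.lower().split()
    lexicon = {tuple(c.lower().split()) for c in chunks}
    max_len = max((len(c) for c in lexicon), default=1)
    units, i = [], 0
    while i < len(words):
        # Try the longest candidate first; fall back to a single word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            cand = tuple(words[i:i + n])
            if n == 1 or cand in lexicon:
                units.append(" ".join(cand))
                i += n
                break
    return units


def overlap_similarity(a, b, chunks=()):
    """Dice-style overlap: 2 * |shared units| / (|units in a| + |units in b|)."""
    ua, ub = set(segment(a, chunks)), set(segment(b, chunks))
    if not ua and not ub:
        return 1.0
    return 2 * len(ua & ub) / (len(ua) + len(ub))


a = "he gave up smoking"
b = "he gave smoking up"
word_level = overlap_similarity(a, b)                # word order is ignored
chunk_level = overlap_similarity(a, b, ["gave up"])  # "gave up" must appear intact
```

With plain words the two sentences share every unit, so the score cannot tell them apart; once "gave up" is treated as a single chunk, the reordered sentence no longer matches on it and the score drops.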

Reading Don Lee's Yellow as a Short Story Cycle

  • 이수미
    • 영어영문학
    • /
    • Vol. 57, No. 5
    • /
    • pp.727-755
    • /
    • 2011
  • In this paper, I'll try to read Don Lee's Yellow intertextually with a more canonical text, Sherwood Anderson's Winesburg, Ohio, in order to see what kind of traditions and techniques Yellow references and/or rewrites as a way of tracking this production. Yellow's formal properties as a short story cycle are established through its use of particular conventions. For instance, Yellow follows the short story cycle model that includes the assemblage of recurring characters into one locale. Yellow's characters are all connected to and at some point located in the fictional small town of Rosarita Bay, California. The text form aligns it with established literary conventions and traditions and suggests the author's reliance upon or trust in those modes. Yellow's setting in a small town alludes to and has often been compared to Anderson's Winesburg, Ohio, which is perhaps one of the most well-known and extensively discussed short story cycles in American literature. Also following convention is Lee's construction of Rosarita Bay and the text's third person narrator as a member of that town. Both Rosarita Bay and the narrator become important figures through the related-tale nature of the text. The method of story-telling is similar to how the town Winesburg and its "seemingly sympathetic and non-overtly judgmental" narrator are operational in Anderson's text. In sum, Yellow is opportune for intertextual reading largely because it is a collection of stories that create a linked series.

Automatic Pronunciation Diagnosis System of Korean Students' English Using Purification Algorithm

  • 양일호;김민석;유하진;한혜승;이주경
    • 말소리와 음성과학
    • /
    • Vol. 2, No. 2
    • /
    • pp.69-75
    • /
    • 2010
  • We propose an automatic pronunciation diagnosis system that evaluates the pronunciation of a foreign language without requiring the uttered text. We recorded English utterances spoken by native and Korean speakers, and the utterances spoken by Koreans were evaluated by native speakers on three criteria: fluency, accuracy of phones, and intonation. The system evaluates the utterances of test Korean speakers based on the difference in log-likelihood given two models: one trained on English speech uttered by native speakers, and the other trained on English speech uttered by Korean speakers. We also applied a purification algorithm to increase class differentiability. The purification detects and eliminates non-speech frames, such as short pauses and occlusive silences, that do not help to discriminate between utterances. As a result, our proposed system correlates more highly with the human scores than the baseline system.
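The scoring step described above, a log-likelihood difference between two models, can be sketched as follows. This is only an illustration under strong assumptions: each model is reduced to a single one-dimensional Gaussian given as a hypothetical `(mean, variance)` pair, the frames are plain floats, and the purification step is not modeled. The abstract does not specify the actual acoustic models or features.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def llr_score(frames, native, nonnative):
    """Average per-frame log-likelihood difference (native minus non-native).

    `native` and `nonnative` are hypothetical (mean, variance) pairs standing
    in for the two trained models; higher scores mean the frames look more
    like the native-speaker model.
    """
    diffs = [
        gaussian_loglik(x, *native) - gaussian_loglik(x, *nonnative)
        for x in frames
    ]
    return sum(diffs) / len(diffs)


native_model = (0.0, 1.0)     # assumed toy parameters
nonnative_model = (2.0, 1.0)  # assumed toy parameters
score = llr_score([0.1, -0.2, 0.4], native_model, nonnative_model)
```

Frames that sit closer to the native model's mean yield a positive score, and frames closer to the non-native model's mean yield a negative one.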

An Equal Pair: The Dialogic Narrative Scheme in Bleak House

  • Kim, Myungjin
    • 영어영문학
    • /
    • Vol. 55, No. 6
    • /
    • pp.993-1011
    • /
    • 2009
  • Generally, the parts of Bleak House narrated by Esther have been considered less convincing and reliable than those of the anonymous narrator because of some problematic qualities in her character and narration. However, Esther's narrative shows Dickens' masterly depiction of emotional deprivation, the psychic consequence of Victorian sexual repression for its victim. Therefore, restoring the reliability of Esther's narrative is the prerequisite for claiming its value as an appropriate locus of the text's meanings. On the other hand, the anonymous narrator is not as omniscient as he has been regarded. As the chapters proceed, his omniscient power and authority are conspicuously weakened, and even transferred to other characters such as Esther and Mr. Bucket. This shows that the identity of the omniscient voice is unstable and that Dickens does not intend his voice to be the sole center of the text's meanings. In short, these two narratives are necessary partners in imagining and understanding the society in its wholeness. Alternating and sometimes intersecting throughout the novel, these opposing viewpoints make us see the contradictory, multi-leveled nature of Victorian society. Their equality implies Dickens' notion that more than a single unified voice is needed to portray the ideological conflicts of his age.

Investigating Predictive Features for Authorship Verification of Arabic Tweets

  • Alqahtani, Fatimah;Dohler, Mischa
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 6
    • /
    • pp.115-126
    • /
    • 2022
  • The goal of this research is to examine different techniques for solving the problem of authorship verification for short Arabic texts. Despite the widespread use of Twitter among Arabs, short-text research has so far focused on authorship verification in languages other than Arabic, such as English, Spanish, and Greek. To the best of the researcher's knowledge, no study has examined the task of verifying Arabic-language Twitter texts. This study explores the impact of stylometric and TF-IDF features of very brief texts (Arabic Twitter posts) on user verification. In addition, an analysis was carried out to see how tweet metadata, such as time and source, can improve user verification performance. This research is significant for cyber security in Arabic countries.

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 7
    • /
    • pp.1759-1772
    • /
    • 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize, and classify various types of information elements in unstructured text. Because Chinese text has no natural boundaries like the spaces in English text, Chinese named entity identification is much more difficult. At present, most deep learning-based NER models are built on a bidirectional long short-term memory network (BiLSTM), yet their performance still has room to improve. To further improve performance on Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN), and the highway network. In our model, the IDCNN is used to achieve multiscale context aggregation from a long sequence of words. The highway network is used to connect different network layers effectively, allowing information to pass through the layers smoothly without attenuation. Finally, the globally optimal tag sequence is obtained by introducing a conditional random field (CRF). The experimental results show that, compared with other popular deep learning-based NER models, our model achieves superior performance on two Chinese NER data sets, Resume and Yidu-S4k. The F1-scores are 94.98 and 77.59, respectively.

The Unfulfilled Journey of a Flâneur: Reading "The Man of the Crowd" through the Eyes of the City

  • 남수영
    • 영어영문학
    • /
    • Vol. 56, No. 4
    • /
    • pp.617-635
    • /
    • 2010
  • This paper argues that what Edgar Allan Poe pursues in "The Man of the Crowd" (1840) is not a story that can be told but an active reading that must be mediated. This is not only because the subject of the pursuit, the secret of the flâneur, remains veiled until the end, but also because the story proves itself to be a reading of various other kinds of texts: that is, the contemporary urban texts as well as the city itself. Although the 'man of the crowd' and his double (i.e. the narrator) embrace the figure of a modern flâneur, it is highly questionable whether the image of the flâneur in the story fully qualifies as that of an ideal stroller, who can represent the free spirit of a detached collector. Rather, the narrator's flâneur reflects a panoptic perspective, systematically hierarchizing the constituents of the city. Still, it should be noted that "The Man of the Crowd" raises questions about the idea of creation and appropriation, observation and originality, and reading and storytelling by ascertaining the impossibility of reading and by assimilating to the contemporary texts, not without subtle acknowledgement. In short, this novella tries a new way of storytelling, whose meaning is not to be found in creation but to be mediated in modern experiences.

A Deeping Learning-based Article- and Paragraph-level Classification

  • Kim, Euhee
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 23, No. 11
    • /
    • pp.31-41
    • /
    • 2018
  • Text classification has been studied for a long time in the Natural Language Processing field. In this paper, we propose an article- and paragraph-level genre classification system using Word2Vec-based LSTM, GRU, and CNN models for large-scale English corpora. In both article- and paragraph-level classification, LSTM achieved the best accuracy, followed by GRU and CNN. Thus, the evaluation of LSTM, GRU, and CNN confirms that word-sequence information for articles is more useful than word-feature extraction for paragraphs when pre-trained Word2Vec-based word embeddings are used in both article- and paragraph-level classification tasks.

Equivalence in Translation and its Components

  • 박정준
    • 비교문화연구
    • /
    • Vol. 19
    • /
    • pp.251-270
    • /
    • 2010
  • The aim of this paper is to assess the validity of the translation theory put forward by ESIT (École Supérieure d'Interprètes et de Traducteurs, Université Paris III) and how it differs from other translation theories. First, the paper analyzes the theoretical aspects of this theory by examining the equivalence that may be discerned between the French and Korean translations in relation to the original English text being translated. Employing equivalence in translation may offer new insights into the interminable debate we witness today between literal translation and free translation. In contrast to formal equivalence, Nida's dynamic equivalence suggests that the message retains the same meaning for the reader whether in the original or in a translated text. In short, the aim of dynamic equivalence is to identify the closest equivalent to the source language. The concepts of correspondence and equivalence defined by theoreticians of translation fall within the domain of dynamic equivalence suggested by Nida. In translation theory, the domain of language usage and that of discourse are treated separately. Usage denotes translation through the symbols that make up language itself. Discourse, by contrast, involves newly created expressions, which may be described as a creative equivalence that embodies the original message for the singular situation at hand. The translator will, however, find oneself incorporating the two opposing theories in translating. Translation falls under the criteria of text and not of language; thus one cannot regulate or foresee the special circumstances that may arise in the translation of discourse, and translation reflecting this condition should always be delimited. All other translation should be subject to translation by equivalence.
The interpretation theory of translation (of ESIT) in effect relates to both the empirical and the philosophical approach and suggests a new perspective on translation. In conclusion, this translation theory differs from the skopos theory and the polysystem theory in that it takes into account only the elements closely related to the original text, and in that it was developed for educational purposes, opening new perspectives in the domain of translation theories.

Deep Learning Based Rumor Detection for Arabic Micro-Text

  • Alharbi, Shada;Alyoubi, Khaled;Alotaibi, Fahd
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 21, No. 11
    • /
    • pp.73-80
    • /
    • 2021
  • Nowadays, microblogs have become the most popular platforms for obtaining and spreading information. Twitter is one of the most widely used platforms for sharing everyday life events. However, rumors and misinformation on Arabic social media platforms have become pervasive, which can cause inestimable harm to society. Therefore, it is imperative to study and tackle this issue in order to distinguish verified information from unverified information. Interest in rumor detection on microblogs has been increasing recently; however, it has mostly been applied to the English language, while work on the Arabic language is still an ongoing research topic and needs more effort. In this paper, we propose a combined Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model to detect rumors in a Twitter dataset. Various experiments were conducted to tune the hyper-parameters and achieve the best results. Moreover, different neural network models are used to evaluate performance and compare results. Experiments show that the CNN-LSTM model achieved the best accuracy of 0.95 and an F1-score of 0.94, outperforming the state-of-the-art methods.