• Title/Summary/Keyword: Stop Words


Building Hybrid Stop-Words Technique with Normalization for Pre-Processing Arabic Text

  • Atwan, Jaffar
    • International Journal of Computer Science & Network Security, v.22 no.7, pp.65-74, 2022
  • In natural language processing, commonly used words such as prepositions are referred to as stop-words; they have no inherent meaning and are therefore ignored in indexing and retrieval tasks. Removing stop-words from Arabic text significantly reduces the size of a text corpus, which improves the effectiveness and performance of Arabic-language processing systems. This study investigated the effectiveness of applying stop-word list elimination with normalization as a preprocessing step. The idea was to merge a statistical method with a linguistic method to attain the best efficacy, and to compare the effects of this two-pronged approach on reducing corpus size for Arabic natural language processing systems. Three stop-word lists were considered: an Arabic Text Lookup Stop-list, a Frequency-based Stop-list using Zipf's law, and a Combined Stop-list. An experiment was conducted using a selected file from the Arabic Newswire data set. In the experiment, the size of the corpus was compared after removing the words contained in each list. The results showed that the best reduction in size was achieved by using the Combined Stop-list with normalization, with a word count reduction of 452,930 and a compression rate of 30%.
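
The combination of a lookup list with a frequency-based list described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the English toy text, the `lookup_list` contents, the `top_k` cutoff, and the lowercase-only `normalize` function are all assumptions chosen for illustration, and real Arabic normalization would involve language-specific rules not shown here.

```python
from collections import Counter


def normalize(token):
    """Placeholder normalization (assumption): lowercase only."""
    return token.lower()


def frequency_stop_list(tokens, top_k):
    """Treat the top_k most frequent tokens as stop-words (a simple rank cutoff)."""
    return {word for word, _ in Counter(tokens).most_common(top_k)}


def remove_stop_words(tokens, stop_list):
    """Keep only the tokens that are not in the stop-list."""
    return [t for t in tokens if t not in stop_list]


def reduction_rate(count_before, count_after):
    """Fraction of tokens removed from the corpus."""
    return (count_before - count_after) / count_before


# Toy corpus (hypothetical data, not from the study).
text = "The cat and the dog and the bird of the garden"
tokens = [normalize(t) for t in text.split()]

lookup_list = {"of"}                               # hypothetical hand-built list
freq_list = frequency_stop_list(tokens, top_k=2)   # statistical list
combined = lookup_list | freq_list                 # union of both lists

filtered = remove_stop_words(tokens, combined)
rate = reduction_rate(len(tokens), len(filtered))
print(sorted(combined), filtered, round(rate, 3))
```

Taking the union of the two lists means a word is dropped if either method flags it, which is why a combined list can only remove as many or more tokens than either list alone.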

Aspects of the word-final stop releasing in reading the English isolated words enumerated (영어 나열형 고립 단에 읽기에서 어말 폐쇄음의 파열 양상)

  • Rhee Seok-Chae;Kang Sooha;Park Jihyun;Hwang Sunmin
    • MALSORI, no.46, pp.13-24, 2003
  • This experimental study shows that, when English isolated words are read as an enumerated list, the release of the word-final stop is used to signal enumeration, together with the well-known intonational pattern for enumeration. Furthermore, this study examines the release of stops in word-final position, focusing on the association of stop release or non-release with i) the POA (Place of Articulation) of the word-final stop, ii) the quality of the vowel before the final stop, and iii) the voicing of the stop in word-final position.


Temporal Structures of Word-initial /s/ Plus Stop Sequences in English Words Produced by Korean Learners

  • Seo, Mi-Sun;Kim, Hee-Sung;Shin, Ji-Young;Kim, Kee-Ho
    • Speech Sciences, v.13 no.1, pp.43-54, 2006
  • The purpose of this study is to examine temporal structures of English words beginning with an /s/ plus stop sequence through production experiments with native speakers of Korean learning English and native speakers of English. According to the results of our production experiment, both a beginner and an advanced group of Korean English learners produced /s/ shorter than a following stop, while the opposite pattern was observed in English native speakers' production. An advanced group of Korean English learners were good at producing a stop after /s/ as unaspirated, but their production of a stop following /s/ was different from English native speakers' production in that the closure duration of the stop was much longer.


Effects of Word Frequency on a Lenition Process: Evidence from Stop Voicing and /h/ Reduction in Korean

  • Choi, Tae-Hwan;Lim, Nam-Sil;Han, Jeong-Im
    • Speech Sciences, v.13 no.3, pp.35-48, 2006
  • The present study examined whether higher-frequency words are more exposed to lenition processes such as intervocalic stop voicing or /h/ reduction in the production of Korean speakers. Experiments 1 and 2 tested whether word-internal intervocalic voicing and /h/ reduction, respectively, occur more often in higher-frequency words than in less frequent words. Results showed that the rate of voicing was not significantly different between the high-frequency group and the low-frequency group; rather, both high- and low-frequency words were fully voiced in this prosodic position. However, intervocalic /h/ was deleted more often in high-frequency words than in low-frequency words. In low-frequency words, other phonetic variants such as [h] and [w] were found more often than in the high-frequency group. Thus the results of the present study are inconclusive as to the relationship between word frequency and lenition with the data at hand.


Acquisition of English Voiced Stop in Word Initial Position : Correlation with Vowel Height

  • Yoon, Su-yeon;Seo, Min-kyong;Song, Yoon-Kyoung
    • Proceedings of the KSPS conference, 2000.07a, pp.199-199, 2000
  • Korean has a three-way stop system (aspirated, fortis, and lenis), whereas English has a two-way system (voiced and voiceless). Because the Korean lenis stop is realized as a slightly aspirated voiceless stop, Korean speakers are likely to produce English word-initial voiced stops as voiceless stops. We divided subjects into three groups (native, experienced, and inexperienced) and investigated the differences between the groups. The VOT of the experienced group was the same as that of the native group, but the VOT of the inexperienced group was longer, at 1.8 times that of the native group. We also examined whether the height of the following vowel influences the VOT of the initial stop. For all groups, VOT before a low vowel was shorter than VOT before a high vowel, and this tendency was more salient in the inexperienced group: before high vowels, the VOT of the inexperienced group was 2.05 times that of the native group, whereas before low vowels it was only 1.55 times. The inexperienced speakers thus pronounced English word-initial voiced stops better before low vowels than before high vowels. Samples were also divided into two groups according to the type of coda consonant (nasal or voiceless stop), but the average VOT was similar and there was no significant difference between the two groups, so the type of coda consonant had no influence. Finally, the average for phrases was compared with the average for isolated words. For the native and experienced groups, there was no significant difference between phrases and words; for the inexperienced group, VOT in phrases was shorter than in isolated words, but still longer than that of the native group.


A study of the preconsonantal vowel shortening in Chinese

  • Yun, Ilsung
    • Phonetics and Speech Sciences, v.10 no.4, pp.39-44, 2018
  • This study aimed to examine whether preconsonantal vowel shortening, which occurs in many languages, exists in Chinese. To this end, we compared 15 pairs of Chinese bi-syllabic words with intervocalic unaspirated/aspirated stops. The results revealed that (1) the effect of the feature aspiration of the following stop on the preceding vowel (V1) was neither significant nor consistent though V1 tends to be a little longer before an unaspirated stop; (2) the following unaspirated stop closure (C) was similar to or longer than its aspirated cognate; (3) the durational sum of V1 and C was longer when the stop is unaspirated, and V1 and C had no compensatory relationship; (4) Voice Onset Time (VOT) was significantly longer when the stop is aspirated than unaspirated; (5) the vowel (V2) following VOT was significantly longer when the stop is unaspirated, so the differentials in VOT were partially compensated; (6) despite the partial compensation, the sum of VOT and V2 was longer when the stop is aspirated; (7) words with an intervocalic aspirated stop were longer than those with its unaspirated cognate. It is concluded that while VOT is the most important factor for deciding the timing structure of Chinese words with intervocalic stops, closure duration is crucial for Korean and many other languages.

A Study on the Production of a Stop Plus Nasal Sequence in English Words by Korean Learners

  • Seo, Mi-Sun;Kim, Hee-Sung;Shin, Ji-Young;Kim, Kee-Ho
    • Speech Sciences, v.12 no.3, pp.165-173, 2005
  • This paper investigates the influence of Korean phonology on the production of English words containing a stop plus nasal sequence, through production experiments with a beginner group and an advanced group of Korean learners of English. The results show that both groups were influenced by the Korean phonological rule that realizes a stop as a nasal before a nasal when they pronounced stop plus nasal sequences in English words. The extent of L1 interference was greater in the beginner group than in the advanced group.


Aspects of the word-final stop releasing and its phonetic correlates in reading the English isolated words enumerated (영어 나열형 고립 단어 읽기에서 어말 폐쇄음의 파열 양상 및 그 음성적 상관성)

  • Rhee Seok-Chae;Kang Sooha;Park Jihyun;Hwang Sunmin
    • Proceedings of the KSPS conference, 2003.05a, pp.61-68, 2003
  • This experimental research shows that, when English isolated words are read as an enumerated list, the release of the word-final stop is used to signal enumeration, together with the well-known intonational pattern for enumeration. Furthermore, this study looks for possible phonetic correlates of the release of the stop in word-final position, focusing on the association of stop release or non-release with i) the POA (Place of Articulation) of the word-final stop, ii) the quality of the vowel preceding the final stop, and iii) the voicing of the stop in word-final position.


Closure Duration and Pitch as Phonetic Cues to Korean Stop Identity in AP-medial Position: Perception Test

  • Kang, Hyun-Sook;Dilley, Laura
    • Speech Sciences, v.14 no.4, pp.25-39, 2007
  • The present study investigated some perceptual phonetic attributes of two Korean stop types, aspirated and lax, in medial position of an accentual phrase. The intonational pattern across syllables (Jun, 1993) is argued to depend on the type of stop (aspirated vs. lax) only in the initial position of an accentual phrase. In Kang & Dilley (2007), we showed that significant differences between aspirated and lax stops in medial position of an accentual phrase exist in closure duration, voice-onset time, and fundamental frequency (F0) values for post-stop vowels. In the present perception experiment, we investigated whether these phonetic attributes contribute to the perception of these two types of stops: The closure durations and/or F0's of post-stop vowels on accentual-phrase medial words were altered and twenty native Korean speakers then judged these words as beginning with an aspirated or lax stop. Both closure duration and F0 significantly affected judgments of stop identity. These results indicate that a wider range of acoustic cues that distinguish aspirated and lax Korean stops in production also plays a role in perception. To account for these results we suggest some phonetic and phonological models of consonant-tone interactions for Korean.


The Effect of Prosodic Position and Word Type on the Production of Korean Plosives

  • Jang, Mi
    • Phonetics and Speech Sciences, v.3 no.4, pp.71-81, 2011
  • This paper investigated how prosodic position and word type affect the phonetic structure of Korean coronal stops. Initial segments of prosodic domains were known to be more strongly articulated and longer relative to prosodic domain-medial segments. However, there are few studies examining whether the properties of prosodic domain-initial segments are affected by the information content of words (real vs. nonsense words). In addition, since the scope of domain-initial effect was known to be local to the initial consonant and the effects on the following vowel have been found to be limited, it is thus worth examining whether the prosodic domain-initial effect extends into the vowel after the initial consonant in a systematic way across different prosodic domains. The acoustic properties of Korean coronal stops (lenis /t/, aspirated /tʰ/, and tense /t'/) were compared across Intonational Phrase, Phonological Phrase and Word-initial positions both in real and nonsense words. The durational intervals such as VOT and CV duration were cumulatively lengthened for /t/ and /tʰ/ in the higher prosodic domain-initial positions. However, tense stop /t'/ did not show any variation as a function of prosodic position and word type. The domain-initial lenis stop showed significantly longer duration in nonsense words than in real words. But the prosodic domain-initial effect was not found in the properties of F0 and [H1-H2] of the vowel after initial stops. The present study provided evidence that speakers tend to enhance speech clarity when there is less contextual information as in prosodic domain-initial position and in nonsense words.
