Search | Korea Science

Recognition Method of Korean Abnormal Language for Spam Mail Filtering (스팸메일 필터링을 위한 한글 변칙어 인식 방법)

Ahn, Hee-Kook; Han, Uk-Pyo; Shin, Seung-Ho; Yang, Dong-Il; Roh, Hee-Young
- Journal of Advanced Navigation Technology / v.15 no.2 / pp.287-297 / 2011
As e-mail has come into wide use for its convenience and speed, the volume of malicious and advertising spam has grown with it, causing considerable social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, all of which operate on words or sentences. Spammers, however, alter those features before sending in order to evade filtering systems. In Korean, abnormal usages can be far more numerous than in other languages because each syllable is composed of a chosung (initial consonant), a jungsung (medial vowel), and a jongsung (final consonant). Existing formal expressions and learning algorithms are limited in how promptly and efficiently they can respond to such changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method operates on syllables rather than words and is based on the Smith-Waterman algorithm. In experiments on filter keywords and e-mails extracted from a mail server, we confirmed that Koral is recognized accurately according to the similarity level. The required time and space costs are within permitted limits.
https://doi.org/10.12673/jant.2011.15.2.287
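
The abstract above gives no implementation details, so the following is only a minimal sketch of how sub-word Smith-Waterman matching might look. It assumes that Hangul syllables are split into chosung, jungsung, and jongsung using standard Unicode arithmetic, and that similarity is the best local alignment score divided by the score of a perfect keyword match. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`), the normalization, and the function names `to_jamo`, `smith_waterman`, and `similarity` are all illustrative assumptions, not the authors' method.

```python
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllable block


def to_jamo(text):
    """Split each Hangul syllable into tagged chosung/jungsung/jongsung units.

    Non-Hangul characters are passed through unchanged.
    """
    units = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            units.append(("cho", index // 588))          # 588 = 21 vowels * 28 finals
            units.append(("jung", (index % 588) // 28))
            if index % 28:                               # 0 means no final consonant
                units.append(("jong", index % 28))
        else:
            units.append(ch)
    return units


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity(keyword, text, match=2):
    """Assumed similarity: best local score / score of a perfect keyword match."""
    key_units = to_jamo(keyword)
    if not key_units:
        return 0.0
    return smith_waterman(key_units, to_jamo(text), match=match) / (match * len(key_units))


if __name__ == "__main__":
    # Hypothetical filter keyword and a message with one altered vowel.
    print(similarity("대출", "무료 데출 상담"))
```

Under this sketch, a filter would flag a message when `similarity` exceeds a chosen threshold; the threshold itself would have to be tuned on real mail data.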

A Study of Segmental and Syllabic Intervals of Canonical Babbling and Early Speech

Chen, Xiaoxiang; Xiao, Yunnan
- Cross-Cultural Studies / v.28 / pp.115-139 / 2012
The interval, or duration, of segments, syllables, words, and phrases is an important acoustic feature that influences the naturalness of speech. A number of cross-sectional studies of the acoustic characteristics of children's speech development have found that these intervals tend to change with age. One hypothesis holds that decreases in intervals are greater when children are younger and smaller when they are older (Thelen, 1991); it has been supported by a number of cross-sectional studies (Tingley & Allen, 1975; Kent & Forner, 1980; Chermak & Schneiderman, 1986). The other hypothesis predicts that decreases in intervals are smaller when children are younger and greater when they are older (Smith, Kenney & Hussain, 1996). Researchers have thus arrived at conflicting postulations and inconsistent results about how these intervals change, leaving the issue unresolved. Most acoustic investigations of children's speech production have used cross-sectional designs, which involve studying several groups of children; so far there are only a few longitudinal studies. The issue needs more longitudinal investigation; moreover, acoustic measurements of intervals in child speech are hardly available. All former studies focus on the word stages and exclude the babbling stages, especially the canonical babbling stage, but we need to find out when concrete changes in intervals begin to occur and what causes them. Therefore, we conducted an acoustic study of the interval characteristics of segments and words in canonical babbling (CB) and early speech in an infant aged 0;9 to 2;4 acquiring Mandarin Chinese. The current research addresses the following two questions: 1. Are decreases in intervals greater when children are younger and smaller when they are older, or vice versa? 2. Does child speech, with respect to the acoustic features of intervals, drift in the direction of the language the child is exposed to?
The female infant, whose L1 was Southern Mandarin and who lived in Changsha, was audio- and video-taped at home by the investigators for about one hour, almost weekly, from age 0;9 to 2;4 under natural observation. The recordings were digitized, and parts of the digitized material were labeled. All repetitions were excluded. The utterances were extracted from 44 sessions ranging from 30 minutes to one hour and were divided into segments as well as syllable-sized units. The age stages are 0;9-1;0, 1;1-1;5, 1;6-2;0, and 2;1-2;4. The subject was a normally developing monolingual child of well-educated parents. Segments and syllables from the 44 sessions, which span the transition from babble to speech, were transcribed in narrow IPA and coded for analysis. Babble was coded from age 0;9 to 1;0 and words from 1;0 to 2;4; the data were checked by two professionally trained persons who majored in phonetics. The present investigation is a longitudinal analysis of some temporal characteristics of the child's speech across these four age periods. The answer to Research Question 1 is that our results agree with neither hypothesis: neither that of greater decreases at younger ages (Thelen, 1991) nor that of greater decreases at older ages (Smith, Kenney & Hussain, 1996).
On the whole, segmental and syllabic durations tend to decrease with age, but the changes are neither drastic nor abrupt. For example, /a/ after /k/ in Table 1 shows a greater decrease during 1;1-1;5, while /a/ after /p/, /t/, and /w/ shows a greater decrease during 2;1-2;4. Likewise, /ka/ shows a greater decrease during 1;1-1;5, while /ta/ and /na/ show a greater decrease during 2;1-2;4. Across the age periods, the intervals fluctuate considerably. The answer to Research Question 2 is yes. The babbling stage is a period in which the acoustic features of a child's intervals of segments, syllables, words, and phrases shift in the direction of the language to be learned; babbling and the emergence of children's speech are greatly influenced by the ambient language. Phonetic changes in duration would continue until as late as 10-12 years of age before reaching adult-like levels. With increasing exposure to the ambient language, the variation would become smaller and smaller until children attain adult-like competence. Analysis with SPSS 15.0 shows that the decrease in segmental and syllabic intervals across the four age periods is not statistically significant (p>0.05). This means that the change in segmental and syllabic intervals is continuous, and it reveals that the process of child speech development is gradual and cumulative.

One-probe P300 based concealed information test with machine learning (기계학습을 이용한 단일 관련자극 P300기반 숨김정보검사)

Hyuk Kim; Hyun-Taek Kim
- Korean Journal of Cognitive Science / v.35 no.1 / pp.49-95 / 2024
Polygraph examination, statement validity analysis, and the P300-based concealed information test are the three major examination tools used to determine a person's truthfulness and credibility in criminal procedure. Although polygraph examination is the most common of these, it has little admissibility as evidence because of its weak scientific basis. In the 1990s, to compensate for the weak scientific basis of the polygraph, Farwell and Donchin proposed the P300-based concealed information test technique. The P300-based concealed information test has two strengths. First, it is easy to conduct together with a polygraph. Second, it has an ample scientific basis. Nevertheless, it is used infrequently because of the number of probe stimuli it requires. A probe stimulus contains undisclosed information that is relevant to the crime or other situation under investigation. The traditional P300-based concealed information test protocol requires three or more probe stimuli, but these are hard to obtain because most crime-relevant information is disclosed in the course of an investigation. In addition, the test uses the oddball paradigm, which creates an imbalance between the numbers of probe and irrelevant stimuli. This imbalance may cause systematic underestimation of the P300 amplitude for irrelevant stimuli. To overcome these two limitations, a one-probe P300-based concealed information test protocol is explored with various machine learning algorithms. According to this study, the parameters of the modified one-probe protocol are as follows.
For female and male face stimuli, the recommended stimulus duration is 400 ms, the recommended number of repetitions is 60, the recommended method for analyzing P300 amplitude is the peak-to-peak method, and the recommended cut-offs are 90% for the guilty condition and 30% for the innocent condition. For two-syllable word stimuli, the recommended stimulus duration is 300 ms, the recommended number of repetitions is 60, the recommended analysis method is again the peak-to-peak method, and the recommended cut-offs are again 90% for the guilty condition and 30% for the innocent condition. It was also confirmed that the logistic regression (LR), linear discriminant analysis (LDA), and k-nearest neighbors (KNN) algorithms are feasible methods for analyzing P300 amplitude. The one-probe P300-based concealed information test protocol with machine learning helps increase the use of the P300-based concealed information test and, together with the polygraph examination, supports determining a person's truthfulness and credibility in criminal procedure.
https://doi.org/10.19066/cogsci.2024.35.1.003
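
The abstract above names a peak-to-peak method for P300 amplitude but does not define it. As a rough illustration only, the sketch below takes peak-to-peak amplitude as the largest value inside a P300 search window minus the lowest value from that peak to the end of the epoch. The window bounds, the assumption that sample 0 is stimulus onset, and the function name `peak_to_peak` are hypothetical choices for illustration, not parameters reported by the study.

```python
def peak_to_peak(epoch_uv, sfreq_hz, p300_window_s=(0.3, 0.9)):
    """Peak-to-peak amplitude of one averaged epoch (a sketch).

    epoch_uv      : list of voltages in microvolts, sample 0 = stimulus onset (assumed)
    sfreq_hz      : sampling rate in Hz
    p300_window_s : assumed search window for the positive peak, in seconds
    """
    start = int(round(p300_window_s[0] * sfreq_hz))
    stop = min(int(round(p300_window_s[1] * sfreq_hz)), len(epoch_uv))
    if start >= stop:
        raise ValueError("P300 window lies outside the epoch")

    # Index of the most positive sample inside the search window.
    peak_idx = max(range(start, stop), key=lambda i: epoch_uv[i])

    # Most negative sample from the peak to the end of the epoch.
    trough = min(epoch_uv[peak_idx:])

    return epoch_uv[peak_idx] - trough


if __name__ == "__main__":
    # Toy epoch sampled at 10 Hz; the values are made up for illustration.
    toy_epoch = [0, 1, 2, 5, 3, -2, -1, 0]
    print(peak_to_peak(toy_epoch, sfreq_hz=10, p300_window_s=(0.2, 0.5)))
```

Amplitudes computed this way for probe and irrelevant stimuli could then serve as input features to a classifier such as logistic regression, LDA, or KNN.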

Legibility Evaluation of Two and Three Syllable Words Used in Pesticides According to Font, Thickness, Gender, and Visual Acuity (시력, 폰트, 굵기, 성별에 따른 2음절 및 3음절 농약 제품 표시글자의 가독성 평가)

Hwang, Hae-Young; Song, Young-Woong
- Journal of the Korea Academia-Industrial cooperation Society / v.13 no.8 / pp.3444-3451 / 2012
Safety and health information for the proper use and handling of pesticides is usually printed as text on the surface of pesticide products. However, guidelines or standards for the appropriate presentation of this text are mostly vague or impractical. This study therefore aimed to provide preliminary guidelines for text size based on legibility experiments. To this end, legibility experiments were conducted to test the effects of near vision (0.6, ${\geq}0.8$), gender, font type (thick gothic-type and fine gothic-type), font thickness (plain and bold), and number of syllables (2 and 3) among participants in their 20s. Legibility differed according to visual acuity (p<0.05), and no other main effect was statistically significant. The 'maximum illegible size' for reading at least one word correctly across all text conditions was 2 pt when near vision was ${\geq}0.8$, and 2 pt or 3 pt when near vision was 0.6. The 'minimum legible size' for 100% correct answers was 9 pt for near vision of 0.6 and 5.3 pt for ${\geq}0.8$. The mean character size that could be read without discomfort was 15.5 pt for both males and females at near vision of 0.6, whereas at ${\geq}0.8$ it was 8.5 pt for males and 10 pt for females. Based on these results, characters of 16 pt or larger are recommended for important information such as 'Pesticides' or toxicity, and a minimum character size of 9 pt is recommended for less important information.
https://doi.org/10.5762/KAIS.2012.13.8.3444
