• Title/Summary/Keyword: In Word Probability

Intonational Pattern Frequency of Seoul Korean and Its Implication to Word Segmentation

  • Kim, Sa-Hyang
    • Speech Sciences / v.15 no.2 / pp.21-30 / 2008
  • The current study investigated distributional properties of the Korean Accentual Phrase (AP) and their implications for word segmentation. The properties examined were the frequency of various AP tonal patterns, the types of tonal patterns imposed on content words, and the average number and temporal location of content words within the AP. A total of 414 sentences from the Read speech corpus and the Radio corpus were analyzed. The results showed that 84% of the APs contained one content word and that almost 90% of the content words were located in AP-initial position. When the AP-initial onset was not an aspirated or tense consonant, the most common AP patterns were LH, LHH, and LHLH (78%), and 88% of the multisyllabic content words started with a rising tone in AP-initial position. When the AP-initial onset was an aspirated or tense consonant, the most common AP patterns were HH, HHLH, and HHL (72%), and 74% of the multisyllabic content words started with a level H tone in AP-initial position. The data further showed that 84.1% of APs ended with a final H tone. These findings provide valuable information about the prosodic patterns and structure of Korean APs, and they account for the results of a previous study showing that Korean listeners are sensitive to AP-initial rising and AP-final high tones (Kim, 2007). This is in line with other cross-linguistic research that has revealed a correlation between prosodic probability and speech processing strategy.
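The distributional measures in this abstract (pattern frequencies conditioned on the AP-initial onset, and the share of APs ending in H) reduce to simple conditional tallies. The sketch below shows that computation on a few hypothetical toy records; the record format and the names `RECORDS`, `pattern_distribution`, and `share_ending_in_h` are illustrative assumptions, not the study's data or tools.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (NOT data from the study): (onset class of the
# AP-initial syllable, AP tonal pattern). "asp_tense" marks an aspirated or
# tense onset; "other" marks any other onset.
RECORDS = [
    ("other", "LH"), ("other", "LHH"), ("other", "LHLH"), ("other", "LHL"),
    ("asp_tense", "HH"), ("asp_tense", "HHLH"), ("asp_tense", "HHL"),
]


def pattern_distribution(records):
    """Relative frequency of each AP tonal pattern within each onset class."""
    counts = defaultdict(Counter)
    for onset, pattern in records:
        counts[onset][pattern] += 1
    return {
        onset: {p: n / sum(c.values()) for p, n in c.items()}
        for onset, c in counts.items()
    }


def share_ending_in_h(records):
    """Fraction of APs whose final tone is H."""
    return sum(pattern.endswith("H") for _, pattern in records) / len(records)
```

Run over a real annotated corpus, `pattern_distribution` would yield the per-onset percentages and `share_ending_in_h` the proportion of H-final APs.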

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.279-290 / 2013
  • In digital forensics, log files are stored as large volumes of data for tracing users' past behavior. Without clues, it is difficult for investigators to analyze such large log data manually. In this paper, we propose a text mining technique that extracts intrusion logs from a large log set and recommends them to investigators as reliable evidence. In the training stage, the proposed method preprocesses a training log set, extracts intrusion association words from it with the Apriori algorithm, and computes the probability of intrusion for each association word by combining its support and confidence. Robinson's method of computing confidence, originally used for filtering spam mail, is applied to the extraction of intrusion logs. As a result, an association word knowledge base is constructed that includes the intrusion-probability weights of the association words, which improves accuracy. In the test stage, the probability that a log is an intrusion log and the probability that it is a normal log are each computed with Fisher's inverse chi-square classification algorithm on the basis of the association word knowledge base, and intrusion logs are extracted by combining the two results. The intrusion logs are then recommended to investigators. Because its training method clearly analyzes the meaning of unstructured, large log data, the proposed method mitigates the loss of accuracy caused by data ambiguity. In addition, because it recommends intrusion logs with Fisher's inverse chi-square classification algorithm, it reduces the false positive (FP) rate and the laborious effort of extracting evidence manually.
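Robinson's adjustment and Fisher's inverse chi-square combining are the two probabilistic steps this abstract borrows from spam filtering. Below is a minimal sketch of those two steps as they are commonly formulated for spam filtering, assuming each association word already has a raw intrusion probability `p_w` (the paper derives it from support and confidence; that formula is not reproduced here). The strength `s = 1.0` and prior `x = 0.5` are assumed values, and all function names are hypothetical.

```python
import math


def robinson_f(p_w: float, n: int, s: float = 1.0, x: float = 0.5) -> float:
    """Robinson's degree-of-belief adjustment of a raw per-word probability.

    p_w: raw probability that a log containing word w is an intrusion log
    n:   number of training logs in which w was observed
    s:   strength given to the prior x (assumed value)
    x:   prior probability used for rarely seen words (assumed value)
    With s > 0 and 0 < x < 1 the result stays strictly inside (0, 1).
    """
    return (s * x + n * p_w) / (s + n)


def chi2_q(x2: float, dof: int) -> float:
    """Survival function P(chi^2 with `dof` degrees of freedom >= x2), even dof."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(probs: list[float]) -> float:
    """Combine per-word intrusion probabilities into one score in [0, 1].

    Values near 1 suggest an intrusion log and values near 0 a normal log;
    0.5 means the evidence is balanced or absent.
    """
    n = len(probs)
    intrusion = 1.0 - chi2_q(-2.0 * sum(math.log(1.0 - f) for f in probs), 2 * n)
    normal = 1.0 - chi2_q(-2.0 * sum(math.log(f) for f in probs), 2 * n)
    return (1.0 + intrusion - normal) / 2.0
```

A log whose words all carry high adjusted probabilities, such as `fisher_score([0.99, 0.95, 0.9])`, scores close to 1. Computing both tails separately is what lets the method report strong evidence in either direction instead of a single naive product.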

Brainstorming using TextRank algorithms and Artificial Intelligence (TextRank 알고리즘 및 인공지능을 활용한 브레인스토밍)

  • Sang-Yeong Lee;Chang-Min Yoo;Gi-Beom Hong;Jun-Hyuk Oh;Il-young Moon
    • Journal of Practical Engineering Education / v.15 no.2 / pp.509-517 / 2023
  • The proposed responsive web service provides a related-word recommendation system based on the TextRank algorithm and an idea generation service that works from words selected by the user. For the related-word recommendation system, we discuss how each word is weighted with the TextRank algorithm and how the weights are converted into output probabilities with softmax. For the idea generation service, we discuss the idea generation method and a reinforcement learning method for the AI model using mini-GPT. For the web service itself, we describe how React, Spring Boot, and Flask are linked and how the whole system operates. When the user enters a topic, the service suggests associated words. The user builds a mind map by selecting related words or adding words of their own. When the user selects words to combine from the mind map, the service returns newly generated ideas and related patents. Generated ideas can be shared with other users, and the AI model is improved with user feedback given as star ratings.
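The related-word step pairs TextRank word weighting with a softmax over the resulting scores. The sketch below uses unweighted TextRank over a word co-occurrence graph (PageRank-style update with damping `d = 0.85`), followed by a numerically stable softmax. The window size, iteration count, and function names are assumptions for illustration, not the authors' settings.

```python
import math
from collections import defaultdict


def textrank(tokens, window=2, d=0.85, iterations=50):
    """Score words by TextRank over an undirected co-occurrence graph.

    Two distinct words are linked if they appear within `window` tokens of
    each other (window=2 links only adjacent tokens).
    """
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u)
        scores = {
            w: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[w])
            for w in graph
        }
    return scores


def softmax(scores):
    """Turn word scores into a probability distribution (max-shifted for stability)."""
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}
```

For example, `softmax(textrank(["idea", "mind", "map", "idea", "patent", "idea"]))` gives a distribution in which "idea", the best-connected word, receives the largest probability.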

Color Recommendation for Text Based on Colors Associated with Words

  • Liba, Saki;Nakamura, Tetsuaki;Sakamoto, Maki
    • Journal of Korea Society of Industrial Information Systems / v.17 no.1 / pp.21-29 / 2012
  • In this paper, we propose a new method for selecting colors that represent the meaning of a text, based on the cognitive relation between words and colors. Our method builds on a previous study that revealed the existence of crucial words for estimating the colors associated with the meaning of a text. Using the associative probability of each color with a given word and the strength of the word's color association, we estimate the probability of the colors associated with a given text. The goal of this study is to propose a system that recommends cognitively plausible colors for the meaning of the input text. To build a versatile and efficient database for the system, we conducted two psychological experiments using news site articles. In Experiment 1, we collected 498 words that participants chose as having a strong association with color. In Experiment 2, we investigated which color was associated with each of these words. In addition to these data, we used estimated values of the strength of color association, and of the associated colors, for the words in a very large newspaper corpus (approximately 130,000 words), based on the similarity between words obtained by Latent Semantic Analysis (LSA). Therefore, our method can select colors for a wide variety of words and sentences. Finally, by comparing the correct colors answered by participants with the colors estimated by our method, we verified that the system succeeds in proposing colors that are cognitively associated with the meaning of the input text. The system is expected to be useful in various situations, such as data visualization, information retrieval, and art or web page design.
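The abstract says the text-level color probability is estimated from each word's per-color associative probability and its strength of color association, but it does not give the formula. One plausible reading, assumed here rather than taken from the paper, is a strength-weighted mixture of per-word color distributions, normalized over colors. The tables and names below are hypothetical.

```python
def text_color_distribution(words, color_prob, strength):
    """Estimate P(color | text) as a strength-weighted mix of P(color | word).

    ASSUMED combination rule (not confirmed by the paper):
        P(c | text) is proportional to sum over words w of strength(w) * P(c | w)
    color_prob: word -> {color: P(color | word)}   (hypothetical table)
    strength:   word -> strength of the word's color association
    Words missing from either table are ignored.
    """
    totals = {}
    for w in words:
        if w not in color_prob or w not in strength:
            continue
        for color, p in color_prob[w].items():
            totals[color] = totals.get(color, 0.0) + strength[w] * p
    z = sum(totals.values())
    return {c: v / z for c, v in totals.items()} if z > 0 else {}
```

With this rule, a word that is strongly tied to color (high `strength`) dominates the estimate, which matches the abstract's emphasis on "crucial words" for a text's colors.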

Weighted Bayesian Automatic Document Categorization Based on Association Word Knowledge Base by Apriori Algorithm (Apriori알고리즘에 의한 연관 단어 지식 베이스에 기반한 가중치가 부여된 베이지만 자동 문서 분류)

  • 고수정;이정현
    • Journal of Korea Multimedia Society / v.4 no.2 / pp.171-181 / 2001
  • Previous Bayesian document categorization methods have two problems: they require a lot of time and effort for word clustering, and they hardly reflect the semantic information between words. In this paper, we propose a weighted Bayesian document categorization method based on an association word knowledge base acquired by a mining technique. The proposed method constructs a weighted association word knowledge base from the documents in the training set. A classifier using Bayesian probability then categorizes documents on the basis of this knowledge base. To evaluate the proposed method, we compare its experimental results with those of a weighted Bayesian method using a vocabulary dictionary built by mutual information, a weighted Bayesian method, and a simple Bayesian method. The experimental results show that the weighted Bayesian method using the association word knowledge base improves performance by 0.87%, 2.77%, and 5.09% over the weighted Bayesian method using the mutual-information vocabulary dictionary, the weighted Bayesian method, and the simple Bayesian method, respectively.
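A weighted Bayesian classifier of the kind described here can be sketched as multinomial naive Bayes in which each feature's log-likelihood is multiplied by a weight. In this sketch the features are association-word items (written as hypothetical strings like `"stock&market"`), the weights come from a caller-supplied dictionary standing in for the knowledge base, and Laplace smoothing with `alpha = 1.0` is an assumption. The paper's actual weighting scheme is not reproduced.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Multinomial naive Bayes with per-feature weights on the log-likelihoods."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed)

    def fit(self, docs, labels, weights=None):
        """docs: list of feature lists; weights: feature -> weight (default 1.0)."""
        self.weights = weights or {}
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.n_docs = len(labels)
        return self

    def predict(self, feats):
        """Return the class maximizing log P(c) + sum of weight(f) * log P(f | c)."""
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            total = sum(self.feature_counts[c].values())
            score = math.log(n_c / self.n_docs)
            for f in feats:
                if f not in self.vocab:
                    continue  # unseen features carry no evidence
                p = (self.feature_counts[c][f] + self.alpha) / (total + self.alpha * v)
                score += self.weights.get(f, 1.0) * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best
```

Setting every weight to 1.0 recovers the simple Bayesian baseline the abstract compares against; the weights are where a knowledge base can make informative association words count for more.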

A Word Spacing System based on Syllable Patterns for Memory-constrained Devices (메모리 제약적 기기를 위한 음절 패턴 기반 띄어쓰기 시스템)

  • Kim, Shin-Il;Yang, Seon;Ko, Young-Joong
    • Journal of KIISE:Software and Applications / v.37 no.8 / pp.653-658 / 2010
  • In this paper, we propose a word spacing system that runs in a small amount of memory. We focus on significantly reducing memory usage while keeping performance on par with the latest studies. Our method is based on the Hidden Markov Model and uses only probability information, without any rules. Two types of features are employed: 1) the spacing patterns dependent on each individual syllable, and 2) the transition probabilities between two syllable patterns. Using only the first type of features, we achieved an accuracy of more than 91% while reducing memory by 53% compared with other systems developed for mobile applications. Using both types of features, we achieved an accuracy of more than 94% while reducing memory by 76% compared with another system that uses syllable bigrams as its features.
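The two feature types map naturally onto HMM-style decoding: a per-syllable spacing score (feature 1) and a transition probability between consecutive spacing tags (feature 2), with Viterbi search for the best tag sequence. The sketch below is a simplified instance under stated assumptions. It uses two tags (no space / space after a syllable), scores each syllable with P(tag | syllable) rather than a true emission probability, and all probabilities and names are hypothetical.

```python
import math

NO_SPACE, SPACE = 0, 1
STATES = (NO_SPACE, SPACE)


def viterbi_spacing(syllables, tag_given_syl, trans, default=(0.5, 0.5)):
    """Most probable spacing tag for each syllable (Viterbi decoding).

    tag_given_syl: syllable -> (P(no space after it), P(space after it))
    trans:         trans[prev][cur] = P(cur tag | prev tag)
    Unseen syllables fall back to `default`. All probabilities must be > 0.
    """
    if not syllables:
        return []

    def score(syl, tag):
        return math.log(tag_given_syl.get(syl, default)[tag])

    best = [score(syllables[0], t) for t in STATES]  # best log score ending in tag t
    backptrs = []
    for syl in syllables[1:]:
        ptrs, new_best = [], []
        for t in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(trans[p][t]))
            ptrs.append(prev)
            new_best.append(best[prev] + math.log(trans[prev][t]) + score(syl, t))
        backptrs.append(ptrs)
        best = new_best

    # Trace back from the best final tag.
    tag = max(STATES, key=lambda t: best[t])
    tags = [tag]
    for ptrs in reversed(backptrs):
        tag = ptrs[tag]
        tags.append(tag)
    return tags[::-1]


def apply_spacing(syllables, tags):
    """Insert a space after every syllable tagged SPACE."""
    out = []
    for syl, t in zip(syllables, tags):
        out.append(syl)
        if t == SPACE:
            out.append(" ")
    return "".join(out).rstrip()
```

Only two small tables are stored (one pair of probabilities per syllable plus a 2x2 transition matrix), which is the kind of compactness a memory-constrained spacing system relies on.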

An Analysis on the Competence and the Methods of Problem Solving of Children at the Before of School Age in Four Operations Word Problems (학령 전 아이들의 사칙연산 문장제 해결 능력과 방법 분석)

  • Lee, Dae-Hyun
    • Journal of the Korean School Mathematics Society / v.13 no.3 / pp.381-395 / 2010
  • The purpose of this paper is to examine the competence and methods of five-year-old children, based on their informal knowledge, in solving word problems involving the four operations. The numbers used in the problems were greater than 5 and less than 10. The subjects were 21 five-year-old children who had not learned the four operations. Interviews with observation were used in this research. The researcher gave the children various materials, allowed them to use these materials for problem solving, and read the word problems aloud for the children to solve. The results are as follows. Five-year-old children are competent at solving four-operation word problems. In addition problems, they used mental computation or a counting-all strategy with the materials. Their methods in subtraction, multiplication, and division problems were similar to those used in addition, but the success rates differed. The children performed poorly on division word problems. These results suggest that kindergarten educators should pay attention to children's informal knowledge of the four operations, as well as of shapes, patterns, statistics, and probability. To this end, curricula and programs that provide informal mathematical experiences need to be developed.

Factors Influencing Youngsters' Consumption Behavior on High-End Cosmetics in China

  • GILITWALA, Bhumiphat;NAG, Amit Kumar
    • The Journal of Asian Finance, Economics and Business / v.8 no.1 / pp.443-450 / 2021
  • The paper investigates the factors that affect young Chinese consumers' decisions to buy high-end cosmetics. The study is based on questionnaire responses from 400 respondents in Guangzhou, China. The information was collected and classified by gender, occupation, age, and education to better understand the main characteristics of the sample. Purposive, convenience, and quota sampling, all non-probability sampling techniques, were used. In addition, a pilot test with 30 respondents was carried out to ensure the reliability and validity of the questionnaire. The data were subjected to descriptive statistical analysis and multiple regression analysis to test the hypotheses. The results revealed that, while brand awareness does not affect consumer attitudes toward high-end cosmetics, other factors such as product involvement, perceived quality, subjective norm, and word-of-mouth have a significant effect on consumers' attitudes and intentions toward high-end cosmetics. The findings show that subjective norm, perceived value, word-of-mouth, and consumer attitude toward cosmetic products strongly affect consumers' intention to purchase high-end cosmetics. The paper helps form concrete and effective marketing strategies for high-end cosmetics in China based on various aspects of consumer behavior.

A Study of Development for Korean Phonotactic Probability Calculator (한국어 음소결합확률 계산기 개발연구)

  • Lee, Chan-Jong;Lee, Hyun-Bok;Choi, Hun-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.239-244 / 2009
  • This paper develops the Korean Phonotactic Probability Calculator (KPPC), which estimates phonotactic probability in Korean. KPPC calculates positional segment frequency, position-specific biphone frequency, and position-specific triphone frequency. It also calculates neighborhood density, the number of words that sound similar to a target word. The Phonotactic Calculator developed at the University of Kansas works on computer-readable phonemic transcriptions and calculates positional segment frequency and position-specific biphone frequency derived from 20,000 dictionary words. KPPC, in contrast, calculates positional segment frequency, position-specific biphone frequency, position-specific triphone frequency, and neighborhood density, and it accepts input in either the Korean alphabet (Hangul) or computer-readable phonemic transcription. KPPC can therefore identify words with high or low phonotactic probability and with high or low neighborhood density.
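The measures KPPC computes can be sketched over a word list: position-specific n-gram probabilities (n = 1, 2, 3 for segments, biphones, triphones) and neighborhood density as the count of words one segment substitution, deletion, or addition away. This sketch is a simplification under stated assumptions. It uses plain type counts, whereas calculators of this kind often weight counts by word frequency. It also approximates Hangul input by Unicode NFD decomposition into jamo, which reflects spelling rather than pronunciation. All function names are hypothetical.

```python
import unicodedata
from collections import Counter


def to_segments(word):
    """Split a word into segments.

    Hangul syllables are decomposed into conjoining jamo with Unicode NFD, so
    an onset and a coda consonant get different code points. This is an
    orthographic approximation: no Korean phonological rules are applied, and
    the silent onset 'ㅇ' is kept as a segment. Romanized input is unchanged.
    """
    return tuple(unicodedata.normalize("NFD", word))


def positional_counts(lexicon, n):
    """Count position-specific n-grams (n=1 segments, 2 biphones, 3 triphones)."""
    counts, totals = Counter(), Counter()
    for word in lexicon:
        segs = to_segments(word)
        for i in range(len(segs) - n + 1):
            counts[(i, segs[i:i + n])] += 1
            totals[i] += 1
    return counts, totals


def positional_probabilities(word, counts, totals, n):
    """Probability of each n-gram of `word` at its position in the lexicon."""
    segs = to_segments(word)
    return [
        counts[(i, segs[i:i + n])] / totals[i] if totals[i] else 0.0
        for i in range(len(segs) - n + 1)
    ]


def one_edit_apart(a, b):
    """True if a and b differ by one substitution, deletion, or addition."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(word, lexicon):
    """Number of lexicon words that are one segment edit away from `word`."""
    target = to_segments(word)
    return sum(one_edit_apart(target, to_segments(w)) for w in lexicon)
```

With a toy romanized lexicon `["kam", "kan", "pam", "kat", "am", "kama"]`, the word "kam" has five neighbors, and "k" occupies the first position in four of the six words.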

Speech Recognition using MSHMM based on Fuzzy Concept

  • Ann, Tae-Ock
    • The Journal of the Acoustical Society of Korea / v.16 no.2E / pp.55-61 / 1997
  • This paper proposes a recognition method based on a Multi-Section Hidden Markov Model (MSHMM) with a fuzzy concept for speaker-independent speech recognition. In this method, the training data are divided into several sections, and for each section, multiple observation sequences are obtained from the section's MSVQ codebook in order of shortest distance, each assigned an appropriate probability by a fuzzy rule. An HMM is then built for each section from these multiple observation sequences, and during recognition, the word with the highest probability is selected as the recognized word. For comparison, experiments with conventional recognition methods (DP, MSVQ, DMS, and a general HMM) were run on the same data. The results of all experiments show that the proposed fuzzy MSHMM outperforms the DP method, the MSVQ method, the DMS model, and the general HMM in both recognition rate and computation time, and that it maintains a recognition rate of 92.91% even as the number of speakers increases.
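The core idea, giving each frame graded memberships to its nearest codewords instead of a single hard VQ label and then scoring each word's HMM, can be sketched as follows. This is not the paper's MSHMM. It uses a single codebook (the multi-section split is omitted), an assumed fuzzy c-means-style membership rule with fuzzifier `m = 2`, and an emission defined as the membership-weighted mix of discrete emission probabilities inside a scaled forward algorithm. All model parameters and names are hypothetical.

```python
import math


def fuzzy_memberships(frame, codebook, k=3, m=2.0):
    """Fuzzy-c-means-style memberships of a frame to its k nearest codewords.

    u_j = 1 / sum_l (d_j / d_l) ** (2 / (m - 1)) over the k nearest codewords,
    so the memberships sum to 1. An exact match takes all the membership.
    """
    dists = sorted((math.dist(frame, cw), j) for j, cw in enumerate(codebook))[:k]
    for d, j in dists:
        if d == 0.0:
            return {j: 1.0}
    expo = 2.0 / (m - 1.0)
    return {j: 1.0 / sum((d / d_l) ** expo for d_l, _ in dists) for d, j in dists}


def forward_log_likelihood(frames, codebook, pi, A, B, k=3):
    """Scaled forward algorithm with fuzzy observations.

    Emission of state i for a frame: b_i = sum_j u_j * B[i][j].
    Returns log P(frames | model); B entries must keep each step's sum > 0.
    """
    n = len(pi)
    alpha, log_lik = None, 0.0
    for frame in frames:
        u = fuzzy_memberships(frame, codebook, k)
        emit = [sum(w * B[i][j] for j, w in u.items()) for i in range(n)]
        if alpha is None:
            alpha = [pi[i] * emit[i] for i in range(n)]
        else:
            alpha = [sum(alpha[h] * A[h][i] for h in range(n)) * emit[i]
                     for i in range(n)]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik


def recognize(frames, codebook, models):
    """Return the word whose (pi, A, B) model gives the highest likelihood."""
    return max(models, key=lambda w: forward_log_likelihood(frames, codebook, *models[w]))
```

Because each frame spreads its weight over several nearby codewords, a small shift in a speaker's spectrum changes the emission scores gradually instead of flipping a hard VQ label. This is one way to read the method's robustness as the number of speakers grows.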