• Title/Abstract/Keywords: words

Search results: 9,087 (processing time 0.039 s)

Word2Vec를 이용한 한국어 단어 군집화 기법 (Korean Language Clustering using Word2Vec)

  • 허지욱
    • 한국인터넷방송통신학회논문지 / Vol. 18, No. 5 / pp. 25-30 / 2018
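  • With the recent development of the Internet, research fields that deliver efficient search results so that users can quickly obtain the information they want, such as information retrieval and data extraction, have become increasingly important. However, newly coined Korean words and buzzwords are hard to interpret, so techniques are needed that find and analyze words semantically similar to a given word. Word clustering, one approach to this problem, finds the words in a document that are semantically similar to a given word and groups them together. In this paper, we propose a technique that embeds the words of a given Korean document using Word2Vec and automatically clusters similar Korean words.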
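
The abstract describes embedding the words of a Korean document with Word2Vec and grouping words that are semantically similar to a given word, but it does not say how the grouping step works. The following is a minimal sketch that assumes the vectors have already been learned and compares them with cosine similarity against a fixed threshold. The toy three-dimensional vectors, the 0.8 threshold, and the greedy clustering scheme are hypothetical illustrations, not the paper's method.

```python
import math

# Hypothetical toy embeddings standing in for vectors learned by Word2Vec.
# Real Word2Vec vectors usually have 100-300 dimensions; these have 3.
EMBEDDINGS = {
    "강아지": [0.90, 0.10, 0.00],  # "puppy"
    "개": [0.85, 0.15, 0.05],      # "dog"
    "고양이": [0.70, 0.30, 0.00],  # "cat"
    "자동차": [0.00, 0.20, 0.95],  # "car"
    "버스": [0.05, 0.10, 0.90],    # "bus"
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_words(query, embeddings, threshold=0.8):
    """Return (word, similarity) pairs whose cosine similarity to `query` is at
    least `threshold`, most similar first; the query word itself is excluded."""
    q = embeddings[query]
    scored = [(w, cosine(q, v)) for w, v in embeddings.items() if w != query]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def cluster_words(embeddings, threshold=0.8):
    """Hypothetical greedy clustering: the first unassigned word seeds a cluster
    and absorbs every other unassigned word at least `threshold` similar to it."""
    unassigned = list(embeddings)
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [
            w for w in unassigned
            if cosine(embeddings[seed], embeddings[w]) >= threshold
        ]
        unassigned = [w for w in unassigned if w not in members]
        clusters.append(members)
    return clusters


print(similar_words("강아지", EMBEDDINGS))
print(cluster_words(EMBEDDINGS))
```

On the toy vectors the animal words form one cluster and the vehicle words another. In practice the vectors would come from a trained model; for example, gensim's `Word2Vec` class learns them from tokenized sentences, and `model.wv.most_similar(word, topn=...)` performs a nearest-neighbour lookup comparable to `similar_words`.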

웹사이트에 나타난 디지털 라이프의 특성 분석 (An analysis on the characteristics of digital life reflected in web sites)

  • 조명은;김현경;이현수
    • 한국주거학회논문집 / Vol. 12, No. 2 / pp. 181-190 / 2001
  • The purpose of this study is to analyze websites related to the housing environment on the Internet and to suggest guidelines needed for digital life. Data were collected from 53 websites found by searching with housing-environment terms such as people, living, and town. The websites were analyzed by key words. The results were as follows. The websites fall into four types: e-housing community, e-housing management, e-housing workplace, and e-housing design. These represent a new type of digital life. The key words of e-housing community sites are 3D virtual world, chatting, information, service, community, etc.; e-housing community is concerned with forming new wired communities across time and space. The key words of e-housing management sites are guard management, apartment management, building management, etc.; these sites provide useful information on housing management. The key words of e-housing workplace sites are virtual office, conference, etc.; these sites enable us to work in cyberspace. The key words of e-housing design sites are design, interior, furniture, etc.; these sites provide marketing, consulting, and design services related to the house. A web lifestyle in cyberspace has become common and is bringing many changes to domestic life and the housing environment.

한글 자음과 모음에 대한 유아의 지식이 단어 읽기에 미치는 영향 (The Effects of Alphabet Knowledge on Korean Kindergarteners' Reading of Hangul Words)

  • 최나야;이순형
    • 가정과삶의질연구 / Vol. 25, No. 3 / pp. 151-168 / 2007
  • The purpose of this study was to investigate the causal relationship between kindergarteners' alphabet knowledge and their ability to read words, in connection with the features of the Korean alphabet, Hangul. A total of 289 children aged four to six from three kindergartens in Busan participated in the study. The main results are as follows. First, the participants showed continuous development in knowledge of consonant names, vowel sounds, the vowel stroke-adding principle, and the alphabet composition principle. By contrast, discontinuous development was found in knowledge of consonant sounds and the consonant stroke-adding principle, indicating that kindergarteners can progress at different speeds in different sub-skills of literacy development. The kindergarteners' naming of consonants developed before their recall of consonant sounds, and knowledge of consonant sounds affected knowledge of vowel sounds. Children had difficulty with the more complicated letters in the alphabet stroke-adding principle test and with eve syllables in the alphabet composition principle test. Most importantly, the children's alphabet knowledge was strongly related to their ability to read words written in Hangul: kindergarteners with greater knowledge of letter names, sounds, and principles read words better.

국내 내셔널 남성복 브랜드명의 언어적 특성 (Linguistic Characteristics of Domestic National Men's Wear Brand Names)

  • 나수임
    • 한국의상디자인학회지 / Vol. 16, No. 1 / pp. 91-103 / 2014
  • In this study, 70 national brands were selected from among men's wear brands to examine the linguistic characteristics of domestic national men's wear brand names. The linguistic elements used in these brand names were analyzed, and the formative and semantic characteristics of each name were examined on the basis of previous studies. In terms of linguistic form, long words of over four syllables were preferred over short words, and single words in noun form were frequently used. English is the most widely used language in brand names, and European languages such as French, Spanish, and Italian are also used frequently, reflecting the influence of the country of origin. Regarding semantic characteristics, descriptive brand names are used to convey brand information directly and easily, while freestanding brand names, newly coined words with no relation to the product, are chosen to create a distinctive image. In other words, brand names indicate the specific business and product category of men's wear by building a men's wear brand image (e.g., Man, Homme, Zio), and, by presenting the brand concept, they convey to consumers the properties and benefits associated with the product, such as dignity, masterpiece quality, and a luxurious lifestyle.

품사 부착 말뭉치를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선 (Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Part-of-Speech Tagged Corpus)

  • 임민규;김광호;김지환
    • 대한음성학회지:말소리 / No. 67 / pp. 181-193 / 2008
  • In this paper, we propose a method for improving vocabulary coverage in embedded continuous speech recognition (CSR) using a part-of-speech (POS) tagged corpus. We investigate the 152 POS tags defined in the Lancaster-Oslo-Bergen (LOB) corpus and word-POS tag pairs, and derive a new vocabulary through word addition. Words paired with some POS tags must be included in a vocabulary of any size, whereas the inclusion of words paired with other POS tags depends on the target vocabulary size. The 152 POS tags are therefore categorized according to whether word addition depends on the size of the vocabulary. Using expert knowledge, we first classify the POS tags and then apply different word-addition strategies depending on the POS tags paired with the words. The proposed method is evaluated in terms of coverage and compared with vocabularies of the same size (5,000 words) derived from frequency lists. On a test short message service (SMS) text corpus, the proposed vocabulary covers 95.18% of words, while the conventional vocabularies cover only 93.19% and 91.82% of the words in the same corpus.
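
The coverage comparison above can be illustrated with a small sketch. It assumes that coverage means the share of running words (tokens) in the test text that appear in the vocabulary, and it contrasts a frequency-list vocabulary with a simplified POS-aware one in which words carrying certain tags are always included. The tag names (DET, NOUN, VERB, PRON), the `always_tags` parameter, and the toy corpus are hypothetical; the paper's expert classification of the 152 LOB tags is not reproduced here.

```python
from collections import Counter

# Hypothetical toy training corpus of (word, tag) pairs with simplified tags.
TRAIN = [
    ("the", "DET"), ("cat", "NOUN"), ("the", "DET"), ("dog", "NOUN"),
    ("the", "DET"), ("cat", "NOUN"), ("dog", "NOUN"), ("sat", "VERB"),
    ("me", "PRON"),
]
TEST = ["the", "cat", "saw", "me"]


def frequency_vocabulary(train_tokens, size):
    """Baseline: the `size` most frequent word types in the training text."""
    return {w for w, _ in Counter(train_tokens).most_common(size)}


def pos_aware_vocabulary(tagged_tokens, size, always_tags):
    """Hypothetical POS-aware scheme: words with a tag in `always_tags` are
    included regardless of size; remaining slots go to the most frequent
    other words. (The always-included words alone may exceed `size`.)"""
    vocab = {w for w, tag in tagged_tokens if tag in always_tags}
    counts = Counter(w for w, _ in tagged_tokens if w not in vocab)
    for w, _ in counts.most_common():
        if len(vocab) >= size:
            break
        vocab.add(w)
    return vocab


def coverage(vocab, test_tokens):
    """Token coverage: fraction of running words in `test_tokens` found in `vocab`."""
    return sum(t in vocab for t in test_tokens) / len(test_tokens)


words = [w for w, _ in TRAIN]
baseline = frequency_vocabulary(words, 3)
pos_aware = pos_aware_vocabulary(TRAIN, 3, always_tags={"DET", "PRON"})
print(coverage(baseline, TEST), coverage(pos_aware, TEST))
```

With a target size of three words, the frequency list keeps `the`, `cat`, and `dog`, while the POS-aware vocabulary reserves a slot for the pronoun `me`, which raises coverage of the toy test sentence from 0.5 to 0.75. A real system would also need a policy for the case where the always-included words by themselves exceed the target size.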

Phenomenological References : Arguments for Mentalistic Natural Language Semantics

  • Jun, Jong-Sup
    • 한국언어정보학회지:언어와정보 / Vol. 8, No. 2 / pp. 113-130 / 2004
  • In a prevailing view of meaning and reference (cf. Frege 1892), words pick out entities in the physical world by virtue of their meaning. Linguists and philosophers have debated whether the meaning of a word is inside or outside language users' minds, but in general they have taken it for granted that words refer to entities in the physical world. Hilary Putnam (1975), based on his famous Twin Earth thought experiment, argued that the meaning of a word cannot be inside language users' heads. In this paper, I point out that Putnam's argument makes sense only if words refer to entities in the physical world. That is, Putnam did not provide any argument against mentalistic semantics, since he erroneously assumed that in mentalistic semantics meaning, but not reference, is inside our minds. Mentalistic semanticists, however, assume that words pick out their referents inside our heads (instead of in a possible outside world). A number of arguments for the mentalistic position come from psychology: studies on emotion and visual perception provide numerous cases where words pick out entities not in the physical world but inside our heads. The mentalistic theory has desirable consequences for the philosophy of language in that some classical puzzles of language (e.g., Russell's (1919) well-known puzzle of the excluded middle) are well explained by the proposed theory.

동사 어휘의미망의 반자동 구축을 위한 사전정의문의 중심어 추출 (The Extraction of Head words in Definition for Construction of a Semi-automatic Lexical-semantic Network of Verbs)

  • 김혜경;윤애선
    • 한국언어정보학회지:언어와정보 / Vol. 10, No. 1 / pp. 47-69 / 2006
  • Recently, there has been a surge of interest in the construction and use of a Korean thesaurus. This paper presents a semi-automatic method for generating a lexical-semantic network of Korean '-ha' verbs by analyzing the dictionary definitions of these verbs. First, using several tools that filter and coordinate lexical data, word-definition pairs were prepared for processing in a subsequent step. Then, while inspecting the various definitions of each verb, we extracted and coordinated the head words of the sentences that make up each definition. These head words are taken to be the main conceptual words representing the sense of the verb being defined. Using these head words and related information, this paper shows that a thesaurus can be constructed semi-automatically without difficulty.

저빈도어를 고려한 개념학습 기반 의미 중의성 해소 (Word Sense Disambiguation based on Concept Learning with a focus on the Lowest Frequency Words)

  • 김동성;최재웅
    • 한국언어정보학회지:언어와정보 / Vol. 10, No. 1 / pp. 21-46 / 2006
  • This study proposes a Word Sense Disambiguation (WSD) algorithm based on concept learning, with special emphasis on statistically meaningful lowest-frequency words. Previous work on WSD typically makes use of collocation frequencies and their probabilities. Such probability-based WSD approaches tend to ignore the lowest-frequency words, which can be meaningful in context. In this paper, we present an algorithm that extracts and uses meaningful lowest-frequency words in WSD. The learning method is adopted from the Find-Specific algorithm of Mitchell (1997), in which the search proceeds from specific predefined hypothesis spaces to more general ones. In our model, this algorithm first finds contexts with the most specific classifiers and then moves to more general ones. We build a small seed set and apply it to a relatively large test set. Following the algorithm in Yarowsky (1995), the classified test data are exhaustively added to the seed data, thus expanding it. However, this may introduce a lot of noise into the seed data, so we introduce the maximum a posteriori (MAP) hypothesis, based on Bayes' theorem, to check the noise status of the new seed data. We use a Naive Bayes classifier and show that applying the Find-Specific algorithm improves the accuracy of WSD.
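
The seed-expansion loop described above can be sketched with a multinomial Naive Bayes classifier and Yarowsky-style bootstrapping. This is a generic illustration under stated assumptions, not the paper's Find-Specific method: newly labeled contexts join the seed set only if the posterior probability of their MAP sense reaches a threshold, which stands in for the paper's noise check. The toy "bank" data, add-one smoothing, the 0.75 threshold, and the names `NaiveBayesWSD` and `self_train` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy seed data for the ambiguous word "bank": (context words, sense).
SEED = [
    (["money", "deposit"], "FINANCE"),
    (["loan", "money"], "FINANCE"),
    (["river", "water"], "RIVER"),
    (["fishing", "river"], "RIVER"),
]
UNLABELED = [["deposit", "loan"], ["water", "fishing"], ["money", "river"]]


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-context-words with add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for context, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        return self

    def posteriors(self, context):
        """Return P(sense | context) for each sense, normalised to sum to 1."""
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + v
            score = math.log(n / total)  # log prior
            for w in context:
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            log_scores[sense] = score
        top = max(log_scores.values())  # subtract max for numerical stability
        exps = {s: math.exp(x - top) for s, x in log_scores.items()}
        z = sum(exps.values())
        return {s: e / z for s, e in exps.items()}


def self_train(seed, unlabeled, threshold=0.75, rounds=3):
    """Label each pooled context with its MAP sense; keep only labels whose
    posterior reaches `threshold`, retrain, and repeat until nothing is added."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = NaiveBayesWSD().fit(labeled)
        confident, rest = [], []
        for ctx in pool:
            post = model.posteriors(ctx)
            sense = max(post, key=post.get)  # MAP sense
            if post[sense] >= threshold:
                confident.append((ctx, sense))
            else:
                rest.append(ctx)
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return NaiveBayesWSD().fit(labeled), labeled


model, labeled = self_train(SEED, UNLABELED)
print(labeled[len(SEED):])
```

On the toy data the first round labels `deposit loan` as FINANCE and `water fishing` as RIVER (each with posterior 0.8), while `money river` stays unlabeled because both senses are equally likely; the second round adds nothing and the loop stops.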

한국어의 종성중화 작용이 영어 단어 인지에 미치는 영향 (The Effects of Korean Coda-neutralization Process on Word Recognition in English)

  • 김선미;남기춘
    • 말소리와 음성과학 / Vol. 2, No. 1 / pp. 59-68 / 2010
  • This study addresses whether Korean (L1)-English (L2) non-proficient bilinguals are affected by the native coda-neutralization process when recognizing words in continuous English speech. Korean phonological rules require that, if liaison occurs between words, coda neutralization must apply before liaison, so that the liaison consonants are coda-neutralized ones such as /b/, /d/, or /g/ rather than non-neutralized ones such as /p/, /t/, /k/, /tʃ/, /dʒ/, or /s/. Consequently, if Korean listeners apply their native coda-neutralization rules to English speech input, word detection should be easier when coda-neutralized consonants precede target words than when non-neutralized ones do. Word-spotting and word-monitoring tasks were used in Experiments 1 and 2, respectively. In both experiments, listeners detected words faster and more accurately when vowel-initial target words were preceded by coda-neutralized consonants than when they were preceded by non-neutralized ones. The results show that Korean listeners exploit their native phonological process when processing English, regardless of whether the native process is appropriate.

Intonational Pattern Frequency of Seoul Korean and Its Implication to Word Segmentation

  • Kim, Sa-Hyang
    • 음성과학 / Vol. 15, No. 2 / pp. 21-30 / 2008
  • The current study investigated the distributional properties of the Korean Accentual Phrase (AP) and their implications for word segmentation. The properties examined were the frequency of various AP tonal patterns, the types of tonal patterns imposed on content words, and the average number and temporal location of content words within the AP. A total of 414 sentences from the Read speech corpus and the Radio corpus were analyzed. The results showed that 84% of the APs contained one content word and that almost 90% of content words were located in AP-initial position. When the AP-initial onset was not an aspirated or tense consonant, the most common AP patterns were LH, LHH, and LHLH (78%), and 88% of multisyllabic content words started with a rising tone in AP-initial position. When the AP-initial onset was an aspirated or tense consonant, the most common AP patterns were HH, HHLH, and HHL (72%), and 74% of multisyllabic content words started with a level H tone in AP-initial position. The data further showed that 84.1% of APs end with a final H tone. These findings provide information about the prosodic pattern and structure of Korean APs and account for the results of a previous study showing that Korean listeners are sensitive to AP-initial rising and AP-final high tones (Kim, 2007). This is in line with other cross-linguistic research that has revealed a correlation between prosodic probability and speech-processing strategy.
