• Title/Summary/Keyword: 단어길이 (word length)

Automatic Extraction of Opinion Words from Korean Product Reviews Using the k-Structure (k-Structure를 이용한 한국어 상품평 단어 자동 추출 방법)

  • Kang, Han-Hoon; Yoo, Seong-Joon; Han, Dong-Il
    • Journal of KIISE: Software and Applications / v.37 no.6 / pp.470-479 / 2010
  • Most opinion-word extraction methods proposed in English-language studies are difficult to apply directly to Korean. The manual methods proposed in Korean studies are time-consuming. Extracting Korean opinion words with an English thesaurus loses precision because Korean and English words do not map one to one. Approaches based on Korean phrase analyzers may fail because they select low-frequency opinion words. This study therefore proposes the k-Structure (k=5 or 8) method, which complements existing Korean studies and may improve precision when automatically extracting opinion words from simple sentences in Korean product reviews. A simple sentence is defined as a sentence of at least 3 words that contains an opinion word within a distance of ${\pm}2$ from the attribute name (e.g., the 'battery' of a camera) of an evaluated product (e.g., a 'camera'). In the performance experiment, opinion words for 8 predefined attribute names were automatically extracted with k-Structure from 1,868 product reviews collected from major domestic shopping malls, and their precision was estimated. With k=5, recall was 79.0% and precision 87.0%; with k=8, recall was 92.35% and precision 89.3%. A test using PMI-IR (Pointwise Mutual Information - Information Retrieval), one of the methods proposed in English studies, yielded a recall of 55% and a precision of 57%.
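
The ${\pm}2$ window in the definition of a simple sentence can be illustrated with a small sketch. This is not the k-Structure method itself, whose details the abstract does not give; the function name `candidates_near_attribute`, the whitespace-tokenized English example, and the treatment of every neighboring token as a candidate are all assumptions made for illustration.

```python
def candidates_near_attribute(tokens, attribute, window=2):
    """Collect tokens within `window` positions of each occurrence of `attribute`.

    Hypothetical helper: it only illustrates the +/-2 distance rule and the
    "at least 3 words" condition from the abstract, not k-Structure itself.
    """
    if len(tokens) < 3:  # a simple sentence has at least 3 words
        return []
    found = []
    for i, tok in enumerate(tokens):
        if tok != attribute:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Keep every neighbor in the window except the attribute name itself.
        found.extend(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return found


sentence = ["this", "camera", "battery", "lasts", "long", "enough"]
print(candidates_near_attribute(sentence, "battery"))
```

A real pipeline would work on Korean morphemes and then decide which of these candidates are opinion words; that filtering step is where the actual method would come in.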

A Study on Rhythm Information Visualization Using Syllable of Digital Text (디지털 텍스트의 음절을 이용한 운율 정보 시각화에 관한 연구)

  • Park, Seon-Hee; Lee, Jae-Joong; Park, Jin-Wan
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.120-126 / 2009
  • As the information age advances rapidly, the volume of digital text has grown as well, and with it the number of visualizations meant to make sense of large amounts of digital text. Existing visualization designs for digital text concentrate on representing subject words through stemming algorithms and word-frequency extraction, on highlighting the meaning of the text, and on connections between sentences. As a result, the expression of rhythm, which can visualize the sentiment of digital text, has been insufficient. The syllable is a phonological unit that can express rhythm more efficiently. In a sentence, the syllable is the most basic unit of pronunciation for words, phrases, and sentences, and rhythm factors such as accent, intonation, and length are based on it. Sonority, which is most closely tied to the definition of the syllable, is expressed through airflow initiated in the lungs and the acoustic energy into which its kinetic energy is converted. From this perspective, this study examines the phonological definition and characteristics of the syllable, a property of digital text, and investigates how to visualize rhythm through diagrams. In the experiment, digital text is converted into phonetic symbols, and rhythm information is then visualized as images using the degree of resonance, from which rhythm originates in all languages, and the syllable structure of the digital text. By visualizing syllable information, the approach conveys the syllable information of digital text and expresses its sentiment through diagrams, helping users understand it through a systematic formula. The study thus aims to make the rhythm of a text easy to understand and to realize the visualization of digital text.

Development and effects of Nanta program using speech rhythm for children with limited speech sound production (말소리가 제한된 아동을 위한 말리듬을 이용한 난타 프로그램의 개발과 효과)

  • Park, Yeong Hye; Choi, Seong Hee
    • Phonetics and Speech Sciences / v.13 no.2 / pp.67-76 / 2021
  • Nanta means "tapping" using percussion instruments such as drums, in the rhythm of Samulnori, a traditional Korean music. A Nanta speech rhythm intervention program was developed and applied to children with limited speech sound production, and its effect was investigated. The program provided auditory stimulation with varied loudness, beats, and rhythms. It consists of three stages: respiration, phonation, and articulation with rhythm. Six children with language development delay participated in this study. The children were encouraged to explore sounds and beats and to express them freely. Along with the rhythm, they were also encouraged to produce speech sounds by increasing the length of syllables in mimetic and imitative words. A total of 15 sessions were conducted twice a week, at 40 minutes per session. To assess effectiveness, raw scores on the preschool receptive-expressive scales (PRES) and the receptive-expressive vocabulary test (REVT) were obtained and compared before and after therapy. Wilcoxon signed-rank tests showed significant improvement following the intervention in receptive (p=.027) and expressive (p=.024) language scores on the PRES and in receptive (p=.028) and expressive (p=.028) vocabulary scores. These findings suggest that the Nanta rhythm program can be useful for improving language development and vocabulary in children with limited speech sound production.

Open Domain Machine Reading Comprehension using InferSent (InferSent를 활용한 오픈 도메인 기계독해)

  • Jeong-Hoon, Kim; Jun-Yeong, Kim; Jun, Park; Sung-Wook, Park; Se-Hoon, Jung; Chun-Bo, Sim
    • Smart Media Journal / v.11 no.10 / pp.89-96 / 2022
  • Open domain machine reading comprehension adds a paragraph search function to the model, because no paragraph related to the given question is provided in advance. Document search has been studied extensively with word-frequency-based TF-IDF, but performance drops when there are many documents. Paragraph selection has been studied extensively with word-based embeddings, but these do not accurately capture paragraph context, including sentence characteristics. Document reading comprehension has been studied extensively with BERT, but learning is slow because of the growing number of parameters. To address these three issues, this study uses BM25, which also takes sentence length into account, InferSent to obtain sentence context, and ALBERT to reduce the number of parameters, and proposes an open domain machine reading comprehension model built on them. Experiments were conducted on the SQuAD1.1 dataset. BM25 outperformed TF-IDF in document search by 3.2%. InferSent outperformed Transformer in paragraph selection by 0.9%. Finally, as the number of paragraphs increased in document comprehension, ALBERT was 0.4% higher in EM and 0.2% higher in F1.
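
To make the contrast with plain term-frequency weighting concrete, here is a minimal sketch of Okapi BM25 scoring, which normalizes term frequency by text length. The parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are assumptions; the abstract does not say which BM25 configuration the authors used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Uses the common Okapi BM25 form with a smoothed, non-negative IDF.
    k1 and b are conventional defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average texts are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["word", "length"],
    ["word", "length", "and", "many", "other", "unrelated", "tokens"],
    ["something", "else"],
]
print(bm25_scores(["word", "length"], docs))
```

Both of the first two documents contain each query term once, but the shorter one scores higher because of the length normalization term controlled by `b`.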

An Attention Method-based Deep Learning Encoder for the Sentiment Classification of Documents (문서의 감정 분류를 위한 주목 방법 기반의 딥러닝 인코더)

  • Kwon, Sunjae; Kim, Juae; Kang, Sangwoo; Seo, Jungyun
    • KIISE Transactions on Computing Practices / v.23 no.4 / pp.268-273 / 2017
  • Recently, deep learning encoder-based approaches have been actively applied to sentiment classification. However, the commonly used Long Short-Term Memory (LSTM) network encoder produces lower-quality vector representations as documents grow longer. In this study, for effective classification of sentiment documents, we propose an attention method-based deep learning encoder that generates the document vector representation as a weighted sum of the LSTM outputs, weighted by importance. In addition, we propose two modifications that adapt the attention method-based encoder to sentiment classification: a window attention method and an attention weight adjustment. In the window attention part, weights are obtained per window in order to recognize sentiment features that span more than one word. In the attention weight adjustment part, the learned weights are smoothed. Experimental results showed that the proposed method outperformed the LSTM encoder, achieving 89.67% accuracy.
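
The core idea of building a document vector as an importance-weighted sum of encoder outputs can be sketched in a few lines. This is only a generic attention pooling step: the function name `attention_pool` is hypothetical, the importance scores are passed in as plain numbers rather than learned, and the window attention and weight smoothing described in the abstract are not shown.

```python
import math


def attention_pool(hidden_states, scores):
    """Return (document_vector, weights) for per-step encoder outputs.

    hidden_states: list of equal-length vectors, one per time step
                   (stand-ins for LSTM outputs).
    scores: one unnormalized importance score per time step; in a trained
            model these would come from a learned scoring layer.
    """
    # Softmax over the scores, shifted by the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states, one dimension at a time.
    dim = len(hidden_states[0])
    doc_vec = [
        sum(w * h[k] for w, h in zip(weights, hidden_states))
        for k in range(dim)
    ]
    return doc_vec, weights


vec, w = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
print(vec, w)
```

With a much larger score on the first step, the pooled vector is dominated by the first hidden state, which is how attention lets a long document be summarized by its most relevant positions.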

Image Compression Using DCT Map FSVQ and Single-side Distribution Huffman Tree (DCT 맵 FSVQ와 단방향 분포 허프만 트리를 이용한 영상 압축)

  • Cho, Seong-Hwan
    • The Transactions of the Korea Information Processing Society / v.4 no.10 / pp.2615-2628 / 1997
  • In this paper, a new codebook design algorithm is proposed. It uses a DCT map based on the two-dimensional discrete cosine transform (2D DCT) and a finite state vector quantizer (FSVQ) to design a vector quantizer for image transmission. The map is built by dividing the input image according to edge quantity; using the map, the significant features of the training image are extracted with the 2D DCT. A master codebook for the FSVQ is generated by partitioning the training set with a binary tree structure. The state codebook is constructed from the master codebook, and the index of an input image is then searched in the state codebook rather than the master codebook. Because index coding is an important part of high-speed digital transmission, fixed-length codes are converted to variable-length codes according to the entropy coding rule. Huffman coding assigns transmission codes to the codes of the codebook. This paper proposes a single-side growing Huffman tree to speed up Huffman code generation. Compared with the pairwise nearest neighbor (PNN) and classified VQ (CVQ) algorithms on the Einstein and Bridge images, the new algorithm shows better picture quality, by 2.04 dB and 2.48 dB over PNN and by 1.75 dB and 0.99 dB over CVQ, respectively.
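
The conversion from fixed-length to variable-length index codes can be illustrated with ordinary Huffman coding. The sketch below is the standard heap-based construction, not the single-side growing Huffman tree proposed in the paper, and the index frequencies are invented for the example.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a standard Huffman code.

    freqs maps each symbol (here, a codebook index) to its frequency.
    This is the textbook algorithm, shown only to illustrate
    variable-length coding of indices.
    """
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # Heap entries: (frequency, tie-breaker, symbols under this subtree).
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, counter, syms1 + syms2))
        counter += 1
    return lengths


index_freqs = {0: 8, 1: 4, 2: 2, 3: 2}  # made-up usage counts for 4 indices
lengths = huffman_code_lengths(index_freqs)
avg_bits = sum(index_freqs[s] * lengths[s] for s in index_freqs) / sum(index_freqs.values())
print(lengths, avg_bits)
```

For these made-up counts a fixed-length code needs 2 bits per index, while the Huffman code averages 1.75 bits, which is the kind of saving that motivates entropy coding of the indices.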

Yun Chi-ho's English Diary and English Writing Education (윤치호 영어 일기와 영어 쓰기 교육)

  • Seo, Min-Won
    • The Journal of the Korea Contents Association / v.14 no.8 / pp.528-541 / 2014
  • The purpose of this study is to analyze Yun Chi-ho's English Diary both quantitatively and qualitatively. A corpus of 574 diary texts was created from his first and last years in an English-speaking environment and analyzed with two language analysis programs, RANGE and Coh-Metrix. His later diaries contain more words in total and have a longer average sentence length than his earlier diaries. The Coh-Metrix indices for syntactic complexity and referential cohesion are also higher in his later diaries. A qualitative analysis of 57 diary texts shows some improvement in his use of language forms. The most frequent topics of his journals are Christianity, everyday life, the politics of Korea, and his English studies. His constant effort to keep his journal and to correspond with foreigners, almost entirely in English, is considered one of the key factors in his successful English acquisition. Educational implications for EFL writing courses are discussed.

Korean Coreference Resolution using Stacked Pointer Networks based on Position Encoding (포지션 인코딩 기반 스택 포인터 네트워크를 이용한 한국어 상호참조해결)

  • Park, Cheoneum; Lee, Changki
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.113-121 / 2018
  • Position encoding is a method of applying weights according to the position of words in a sentence. A pointer network is a deep learning model that outputs indices corresponding to the input sequence. This property allows the model to be applied to coreference resolution. However, the performance of pointer networks degrades when the input sequence is long. To solve this problem, we make two contributions to coreference resolution. First, we apply position encoding and dynamic position encoding to pointer networks. Second, we stack encoder layers deeply to obtain high-level abstraction. As a result, the position encoding based stacked pointer networks model proposed in this paper achieved a CoNLL F1 of 71.78%, an improvement of 6.01% over vanilla pointer networks.

Remnants of Culture in Journal Article Titles: A Comparison between the United States and Korea in the Field of Social Sciences (논문 제목상의 문화적 흔적: 한국과 미국의 사회과학분야 비교)

  • Kim, Eungi
    • Journal of Korean Library and Information Science Society / v.46 no.1 / pp.345-372 / 2015
  • Most academic journals in the world today require journal article titles to be submitted in English. However, most authors and reviewers are insensitive to the fact that national-level cultural differences exist in how titles are written. In this paper, journal article titles published in the United States and Korea were compared in order to find cross-national cultural characteristics in these titles. For this study, sample titles in the field of social sciences were obtained from two bibliographic databases, Scopus and RISS. Frequency counts were taken on a number of variables: length of title, types of titles, and n-gram phrases. The study also found a variety of similarities and differences, including the types of words and phrases that Korean authors tend to favor in journal articles. The results showed considerable culture-related variability in the construction of journal article titles. This study suggests that cross-national characteristics of journal article titles should be emphasized in the future.
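
Two of the counted variables, title length and n-gram phrases, are easy to sketch. The function name `title_stats`, the whitespace tokenization, the lowercasing, and the two sample titles are assumptions for illustration; the abstract does not describe the study's actual preprocessing.

```python
from collections import Counter


def title_stats(titles, n=2):
    """Return (mean title length in words, Counter of word n-grams).

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    lengths = []
    ngrams = Counter()
    for title in titles:
        words = title.lower().split()
        lengths.append(len(words))
        # Slide a window of n words across the title.
        ngrams.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return mean_len, ngrams


# Made-up titles, not taken from Scopus or RISS.
sample = ["A Study on Word Length", "A Study on Title Length"]
mean_len, bigrams = title_stats(sample)
print(mean_len, bigrams.most_common(3))
```

Running the same counts separately on two national samples and comparing the most frequent n-grams is one simple way to surface phrases favored by one group of authors.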

Slant Estimation and Correction for the Off-Line Handwritten Hangul String Using Hough transform (Hough 변환을 이용한 오프라인 필기 한글 문자열의 기울기 추정 및 교정)

  • 이성환; 이동준
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.243-260 / 1993
  • This paper presents an efficient method for estimating and correcting the slant of off-line handwritten Hangul strings. In the proposed method, after contours are extracted from the input image, the Hough transform is applied to the contours to detect lines and estimate their slants. When the Hough transform is applied to the contours, pixels that are not part of the same stroke may be detected as a line. To exclude such lines from the slant estimation process, detected lines shorter than a threshold are eliminated. Experiments were performed on address images extracted from live envelopes provided by the Seoul Mail Center. Experimental results show that the proposed method is superior to previous methods, which were developed for handwritten English strings, in estimating the slant of off-line handwritten Hangul strings.
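
The filtering step, dropping detected lines shorter than a threshold before estimating the slant, can be sketched as follows. The sketch assumes the Hough transform has already produced line segments as endpoint pairs; measuring slant as the angle from the vertical axis and combining segments with a length-weighted mean are assumptions, since the abstract does not say how the paper aggregates the detected lines.

```python
import math


def estimate_slant(segments, min_length):
    """Estimate the slant in degrees from detected line segments.

    segments: list of ((x1, y1), (x2, y2)) endpoints, assumed to come from
              a Hough transform over extracted contours.
    min_length: segments shorter than this are discarded as spurious.
    Returns the length-weighted mean angle from the vertical axis,
    or 0.0 if no segment survives the filter.
    """
    weighted_sum = 0.0
    total_length = 0.0
    for (x1, y1), (x2, y2) in segments:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        if length < min_length:
            continue  # likely pixels from different strokes, not a real line
        if dy < 0:
            dx, dy = -dx, -dy  # make the direction independent of endpoint order
        angle = math.degrees(math.atan2(dx, dy))  # 0 degrees means vertical
        weighted_sum += angle * length
        total_length += length
    return weighted_sum / total_length if total_length else 0.0


segments = [((0, 0), (10, 10)), ((0, 0), (1, 0))]
print(estimate_slant(segments, min_length=5))
```

In this toy input the one-pixel segment is removed by the threshold, so the estimate comes only from the long diagonal segment.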