• Title/Summary/Keyword: Sentence Reduction

A Sentence Reduction Method using Part-of-Speech Information and Templates (품사 정보와 템플릿을 이용한 문장 축소 방법)

  • Lee, Seung-Soo;Yeom, Ki-Won;Park, Ji-Hyung;Cho, Sung-Bae
    • Journal of KIISE: Software and Applications
    • /
    • v.35 no.5
    • /
    • pp.313-324
    • /
    • 2008
  • Sentence reduction is an information compression process that removes extraneous words and phrases while retaining the basic meaning of the original sentence. Most research on sentence reduction has required a large number of lexical and syntactic resources and has focused on extracting or removing extraneous constituents such as words, phrases, and clauses through a complicated parsing process. However, this line of research has two problems. First, the lexical resources that can be obtained from the learning data are very limited. Second, sentences are difficult to reduce in languages that have no reliable syntactic parser, because of ambiguity and exceptional expressions. To solve these problems, we propose a sentence reduction method that uses templates and POS (part-of-speech) information without a parsing process. In the proposed method, we create a new sentence using both Sentence Reduction Templates, which decide the form of the reduced sentence, and Grammatical POS-based Reduction Rules, which compose a grammatical sentence structure. In addition, we use the Viterbi algorithm on an HMM (Hidden Markov Model) to avoid the exponential computation that arises when applying the Sentence Reduction Templates. Finally, our experiments show that the proposed method achieves acceptable results in comparison with previous sentence reduction methods.
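
The abstract does not spell out the Viterbi search, so the following is a minimal sketch under assumed data structures: reduction-template choices are treated as hidden states emitting POS tags, and `trans`/`emit` are illustrative log-score tables rather than the authors' actual model.

```python
# Hypothetical sketch of Viterbi decoding over reduction-template states.
# State names, score tables, and the tagging scheme are illustrative assumptions.
import math

def viterbi(pos_tags, states, trans, emit):
    """Return the best-scoring template-state sequence for a POS-tagged sentence.

    trans[(prev_state, state)] : log-score of moving between template states
    emit[(state, pos_tag)]     : log-score of a POS tag under a template state
    """
    V = [{s: trans.get(("<start>", s), -math.inf) + emit.get((s, pos_tags[0]), -math.inf)
          for s in states}]
    back = [{s: None for s in states}]
    for t in range(1, len(pos_tags)):
        V.append({}); back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + trans.get((p, s), -math.inf))
            V[t][s] = (V[t - 1][prev] + trans.get((prev, s), -math.inf)
                       + emit.get((s, pos_tags[t]), -math.inf))
            back[t][s] = prev
    # Dynamic programming keeps only the best path per state, avoiding the
    # exponential enumeration of all template combinations.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(pos_tags) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```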

Public perceptions of the reasons underlying sentence reduction for sex crimes against persons with intellectual disability (지적장애인 대상 성범죄 재판 시 형의 감경사유에 대한 국민들의 인식)

  • Yi, Misun
    • Korean Journal of Forensic Psychology
    • /
    • v.12 no.3
    • /
    • pp.323-341
    • /
    • 2021
  • This study examined public perceptions of the reasons underlying sentence reduction for defendants convicted of sex crimes against persons with intellectual disability. An online survey was conducted among 522 adults in South Korea. Respondents' endorsement of 20 reasons for sentence reduction, as embedded in the respective rulings, and the perceived appropriateness of the statutory sentence for the crimes committed were assessed. The results showed that most respondents endorsed the statutory sentence; moreover, those who disagreed underscored the need for more severe punishment. Almost all respondents viewed the following reasons and explanations unfavorably: impulsiveness caused by sexual arousal or alcohol consumption; an accidental occurrence; and personal characteristics such as the defendant's age, health condition, socioeconomic status, developmental history, and family background. However, there was comparatively more agreement when the reasons for reduction were that the damage caused by the incident was relatively minor or that the defendant showed remorse and made efforts to repair the damage. Differences in respondents' perceptions of the reasons for sentence reduction as a function of gender and educational background were relatively small, but younger respondents held harsher attitudes toward sentence reduction. The present findings underscore the need to be mindful of victims' statements and the characteristics of persons with intellectual disability during sentencing.

Speech Recognition for twenty questions game (스무고개 게임을 위한 음성인식)

  • 노용완;윤재선;홍광석
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.203-206
    • /
    • 2002
  • In this paper, we present a sentence speech recognizer for the twenty questions game. The proposed approach to speaker-independent sentence speech recognition consists of two steps: extracting the number of syllables in each eojeol to reduce the candidate set, and applying a knowledge-based language model for sentence recognition. For the twenty questions game, we implemented the speech recognizer using 956 sentences and 1,095 eojeols. In our experiments, we obtained an 87% sentence recognition rate and a 90.15% eojeol recognition rate.
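
The candidate-reduction step is only named in the abstract; here is a minimal sketch under the assumption that candidates are filtered by their per-eojeol syllable counts. The example sentences and the exact matching rule are illustrative, not the authors' system.

```python
# Illustrative sketch of syllable-count-based candidate reduction.
# The candidate list and the strict signature match are assumptions.

def syllable_signature(sentence: str):
    """Count Hangul syllable blocks (U+AC00..U+D7A3) in each eojeol."""
    return tuple(
        sum(1 for ch in eojeol if '\uAC00' <= ch <= '\uD7A3')
        for eojeol in sentence.split()
    )

def reduce_candidates(signature, candidate_sentences):
    """Keep only candidates whose per-eojeol syllable counts match the input."""
    return [s for s in candidate_sentences if syllable_signature(s) == signature]

# Example usage with a few hypothetical game sentences.
candidates = ["그것은 동물입니까", "그것은 살아 있습니까", "그것은 먹을 수 있습니까"]
print(reduce_candidates(syllable_signature("그것은 동물입니까"), candidates))
```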

Stuttering Reduction Rate during Sentence Reading: Choral Speech and Altered Auditory Feedback (문장읽기에서의 말더듬 감소율: 합독과 변조청각피드백)

  • Park, Jin;Park, Heeyoung
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.109-115
    • /
    • 2012
  • This paper investigates how differently choral speech and altered auditory feedback (i.e., delayed auditory feedback and frequency-altered feedback) enhance speech fluency during sentence reading. To do this, a stuttering reduction rate was used to measure how much the frequency of stuttering decreased during each fluency-enhancing condition (i.e., typical choral reading, DAF, and FAF) relative to typical solo reading. The results showed that stuttering frequency was reduced in all three fluency-enhancing conditions, with the highest mean reduction rate observed during typical choral reading. The stuttering reduction rate observed during typical choral reading, and possible explanations for it, are discussed.
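
The abstract does not give the formula for the reduction rate; a common formulation, stated here as an assumption rather than the authors' exact measure, is:

```latex
% f_solo : stuttering frequency during typical solo reading
% f_cond : stuttering frequency during a fluency-enhancing condition (choral, DAF, FAF)
\[
  \text{Reduction rate (\%)} = \frac{f_{\text{solo}} - f_{\text{cond}}}{f_{\text{solo}}} \times 100
\]
```

For example, under this definition a reader who stutters on 20 words in solo reading but on 5 words during choral reading would show a reduction rate of 75%.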

Factors Affecting Changes in English from a Synthetic Language to an Analytic One

  • Hyun, Wan-Song
    • English Language & Literature Teaching
    • /
    • v.13 no.2
    • /
    • pp.47-61
    • /
    • 2007
  • The purpose of this paper is to survey the major factors that have changed English from a synthetic language to an analytic one. The paper therefore examines the differences between synthetic and analytic languages. In synthetic languages, the relations of words in a sentence are determined synthetically by means of inflections, while in analytic languages, the functions of words in a sentence are determined analytically by means of word order and function words. Thus Old English, with its full inflectional system, shows a synthetic nature. In the course of time, however, Old English inflections came to be lost through phonetic changes and the operation of analogy, which made English dependent on word order and function words to signal the relations of words in a sentence. The major phonetic changes that drove this shift were the change of final /m/ to /n/, the leveling of unstressed vowels, the loss of final /n/, and the decay of schwa in final syllables. These changes led to the reduction of English inflections as well as the loss of grammatical gender. The operation of analogy, the tendency of language to follow certain patterns and to adapt a less common form to a more familiar one, also played an important role in changing English.

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation and can lead to high computational cost and overfitting, so a dimensionality reduction step is necessary to improve model performance. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, one of the fields of Natural Language Processing. The common goal of dimensionality reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. Beyond these algorithms, word embeddings, which learn low-dimensional vector-space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, words similar to them are assumed to have little impact on sentence classification either. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific rules and constructing word embeddings based on Word2Vec. To identify words of low importance in the text, we use information gain to measure importance and cosine similarity to find similar words. First, we eliminate words with comparatively low information gain values from the raw text and build word embeddings. Second, we additionally eliminate words that are similar to the low-information-gain words and build word embeddings. Finally, the filtered text and word embeddings are fed to deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews from the Amazon.com Kindle store, IMDB, and Yelp as datasets and classifies each dataset with the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews. Because Yelp only shows the number of helpful votes, we extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe embeddings that use all the words. One of the proposed methods outperforms the embeddings that use all the words: removing unimportant words improves performance, but removing too many words lowers it.
Future research should consider diverse preprocessing approaches and an in-depth analysis of word co-occurrence for measuring similarity between words. We also applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, and the possible combinations of word embedding methods and elimination methods could be identified.
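
As a concrete illustration of the two-stage elimination described above, here is a minimal sketch, assuming scikit-learn's mutual information as a stand-in for information gain and gensim's Word2Vec for cosine similarity; the corpus, thresholds, and parameters are placeholders, not the study's configuration.

```python
# Illustrative sketch of the two-stage word elimination described above.
# Thresholds, corpus, and parameter values are assumptions for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from gensim.models import Word2Vec

docs = ["the battery life is great", "terrible screen and slow delivery",
        "great value for the price", "slow and terrible support"]
labels = [1, 0, 1, 0]  # e.g., helpful vs. not helpful

# Step 1: information gain (approximated here by mutual information) per word.
vec = CountVectorizer()
X = vec.fit_transform(docs)
ig = mutual_info_classif(X, labels, discrete_features=True)
vocab = vec.get_feature_names_out()
low_ig = {w for w, g in zip(vocab, ig) if g < 0.1}  # assumed threshold

# Step 2: expand the removal set with words similar to the low-IG words.
tokenized = [d.split() for d in docs]
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, seed=1)
expanded = set(low_ig)
for w in low_ig:
    if w in w2v.wv:
        expanded.update(s for s, sim in w2v.wv.most_similar(w, topn=3) if sim > 0.5)

# Filter the corpus before building the final embeddings / classifier input.
filtered_docs = [" ".join(t for t in d.split() if t not in expanded) for d in docs]
print(filtered_docs)
```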

Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.150-160
    • /
    • 2011
  • Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Even though much technical progress has been made, there is still room for improvement in text classification. This paper discusses the remaining issues: three improvement issues are presented, namely automatic training data generation, noisy data treatment, and term weighting and indexing, and four studies with their empirical results are introduced for these issues. First, a semi-supervised learning technique is applied to text classification to efficiently create training data. For effective noisy data treatment, a noisy data reduction method and a text classifier robust to noisy data are developed. Finally, the term weighting and indexing technique is revised by reflecting the importance of sentences in the term weight calculation using summarization techniques.
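
The sentence-importance idea in the last point can be illustrated with a minimal sketch; the weighting scheme and the importance scores below are assumptions for illustration, not the exact formulation evaluated in the paper.

```python
# Illustrative sketch: term frequencies scaled by the importance of the
# sentence each occurrence appears in (importance could come from a summarizer).
from collections import defaultdict

def sentence_weighted_tf(document_sentences, importance_scores):
    """Sum per-term counts, scaling each occurrence by its sentence's importance."""
    tf = defaultdict(float)
    for sent, score in zip(document_sentences, importance_scores):
        for term in sent.lower().split():
            tf[term] += score
    return tf

sents = ["Text classification assigns labels to documents",
         "The weather was nice yesterday",
         "Term weighting strongly affects classification accuracy"]
scores = [0.9, 0.1, 0.8]  # assumed sentence-importance scores
tf = sentence_weighted_tf(sents, scores)
print(sorted(tf.items(), key=lambda kv: -kv[1])[:5])
```

Terms that occur in highly ranked sentences end up with larger weights than terms that occur only in unimportant sentences, which is the intuition behind revising term weighting with summarization scores.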

Unit Generation Based on Phrase Break Strength and Pruning for Corpus-Based Text-to-Speech

  • Kim, Sang-Hun;Lee, Young-Jik;Hirose, Keikichi
    • ETRI Journal
    • /
    • v.23 no.4
    • /
    • pp.168-176
    • /
    • 2001
  • This paper discusses two important issues in corpus-based synthesis: synthesis unit generation based on phrase break strength information, and pruning of redundant synthesis unit instances. First, a new sentence set for recording was designed to build an efficient synthesis database reflecting the characteristics of the Korean language. To obtain prosodic-context-sensitive units, we graded major prosodic phrases into five distinct levels according to pause length and then discriminated intra-word triphones using these levels. Synthetic speech generated with the phrase-break-strength units was evaluated subjectively. Second, a new pruning method based on weighted vector quantization (WVQ) was proposed to eliminate redundant synthesis unit instances from the synthesis database. WVQ takes the relative importance of each instance into account when clustering similar instances with a vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and one based on normal VQ-based clustering. For the same reduction in the number of instances, the proposed method showed the best performance; at a 45% reduction rate, the synthetic speech had almost no perceptible degradation compared with synthesis without instance reduction.
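
The WVQ idea can be approximated with weighted k-means; the sketch below is an assumption-laden illustration (synthetic features, random importance weights, scikit-learn's KMeans with sample weights), not the authors' implementation.

```python
# Illustrative sketch of weighted vector quantization (WVQ) pruning.
# Instance features, weights, and the codebook size are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
instances = rng.normal(size=(200, 12))        # e.g., spectral features of unit instances
importance = rng.uniform(0.1, 1.0, size=200)  # e.g., frequency of use / prosodic salience

# Weighted k-means: important instances pull the codebook centroids toward them.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0)
kmeans.fit(instances, sample_weight=importance)

# Keep, per cluster, the instance closest to the centroid; prune the rest.
kept = []
for c in range(kmeans.n_clusters):
    members = np.where(kmeans.labels_ == c)[0]
    if len(members):
        d = np.linalg.norm(instances[members] - kmeans.cluster_centers_[c], axis=1)
        kept.append(members[np.argmin(d)])
print(f"pruned {len(instances) - len(kept)} of {len(instances)} instances")
```

Using sample weights in the clustering step is what distinguishes this from plain VQ pruning: instances judged more important influence the codebook more and are therefore more likely to be retained.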

The Effect of Rain on Traffic Flows in Urban Freeway Basic Segments (기상조건에 따른 도시고속도로 교통류변화 분석)

  • 최정순;손봉수;최재성
    • Journal of Korean Society of Transportation
    • /
    • v.17 no.1
    • /
    • pp.29-39
    • /
    • 1999
  • An earlier study of the effect of rain found that the capacity of freeway systems is reduced, but it did not address the effects of rain on the nature of traffic flows. Substantial variation with the intensity of adverse weather is to be expected, so its effects must be considered in freeway facility design; however, all of the data in the Highway Capacity Manual (HCM) come from ideal conditions. The primary objective of this study is to investigate the effect of rain on urban freeway traffic flows in Seoul. To do so, we investigated the relations between three key traffic variables (flow rate, speed, and occupancy), their threshold values between the congested and uncongested traffic flow regimes, and the speed distribution. The traffic data from the Olympic Expressway in Seoul were obtained from an image detection system (Autoscope) at 30-second and 1-minute intervals. The slope of the regression line relating flow to occupancy in the uncongested regime decreases when it is raining. In essence, this result indicates that the average service flow rate (which may be interpreted as a freeway capacity) is reduced as weather conditions deteriorate. The reduction is in the range of 10 to 20%, which agrees with the range proposed by the 1994 US HCM. It is noteworthy that the service flow rates of the inner lanes are relatively higher than those of the other lanes. The average speed is also reduced on rainy days, but the flow-speed relationship and the threshold values of speed and occupancy (the critical speed and critical occupancy) are not very sensitive to the weather conditions.
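
The slope comparison can be illustrated with a small sketch; the occupancy and flow numbers below are synthetic assumptions, not measurements from the Olympic Expressway study.

```python
# Illustrative sketch of the flow-occupancy slope comparison for dry vs. rainy
# conditions, using synthetic uncongested-regime detector samples.
import numpy as np

# Occupancy (%) and flow rate (veh/h/lane) samples in the uncongested regime.
occ_dry,  flow_dry  = np.array([4, 6, 8, 10, 12]), np.array([800, 1200, 1600, 2000, 2350])
occ_rain, flow_rain = np.array([4, 6, 8, 10, 12]), np.array([700, 1050, 1400, 1720, 2050])

slope_dry  = np.polyfit(occ_dry,  flow_dry,  1)[0]   # veh/h/lane per % occupancy
slope_rain = np.polyfit(occ_rain, flow_rain, 1)[0]

reduction = (slope_dry - slope_rain) / slope_dry * 100
print(f"dry slope {slope_dry:.0f}, rain slope {slope_rain:.0f}, "
      f"service flow reduction ~{reduction:.0f}%")
```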

An effective teaching method of English composition through error analysis (오류분석을 통한 효율적인 영작문 지도법)

  • Park, Byung-Je
    • English Language & Literature Teaching
    • /
    • no.1
    • /
    • pp.159-187
    • /
    • 1995
  • The purpose of this study is to investigate common errors made by Korean learners in English composition and to find an effective and appropriate method of teaching English composition in Korea. For these purposes, 197 third-year high school students were selected as the subjects of this research. The students were tested on the immediate translation of 31 simple Korean sentences into English, sentences expected to be easy for them to write without difficulty. About two minutes were allowed for each sentence. The results are as follows. First, the total number of errors made by the 197 students was 2,972, and these errors were classified into 13 categories following Duskova's and James's grammatical classifications. The errors with comparatively high frequency were prepositional errors (17.2%) and verbal errors (15.4%), and the errors with low frequency were article errors (1.9%) and to-infinitive errors. Second, when Korean students learn English as a target language, overgeneralization (33.6%) and reduction (17.5%) influenced the learners much more than language transfer (22.2%) did; however, the influence of language transfer, including interference & overgeneralization (15.2%) and interference & reduction (10.7%), was no less than 48.1%. The statistics show that learners tend to analyze, systematize, and regularize the target language when they start to learn a new language.
