• Title/Summary/Keyword: Word-Prediction


Prediction of Physical Examination Demand Using Text Mining (텍스트 마이닝을 이용한 건강검진 수요 예측)

  • Park, Kyungbo;Kim, Mi Ryang
    • Journal of Information Technology Services / v.21 no.5 / pp.95-106 / 2022
  • Recently, physical examinations have become an important strategy for reducing costs for individuals and society. Pre-examination counseling is important for an effective physical examination. However, counseling is often incomplete because the demand for physical examinations is not predicted. Therefore, in this study, the demand for physical examinations was predicted using text mining and stepwise regression. The analysis showed that the most recent text data had high explanatory power for the demand for physical examinations, as did large amounts of data. In addition, a high frequency of the text "health food" was found to reduce the number of health examination customers. Likewise, the higher the frequency of the word "food", the lower the number of physical examination customers. However, when the word "wild ginseng" appeared frequently on Twitter, the number of physical examination customers visiting hospitals increased. In other words, customers consume efficiently by comparing the price of a health examination with the price of consumer goods. The proposed research framework can help predict demand in other industries.

Comparative analysis of model performance for predicting the customer of cafeteria using unstructured data

  • Seungsik Kim;Nami Gu;Jeongin Moon;Keunwook Kim;Yeongeun Hwang;Kyeongjun Lee
    • Communications for Statistical Applications and Methods / v.30 no.5 / pp.485-499 / 2023
  • This study aimed to predict the number of meals served in a group cafeteria using machine learning methodology. Features of the menu were created through the Word2Vec methodology and clustering, and a stacking ensemble model was constructed using Random Forest, Gradient Boosting, and CatBoost as sub-models. Results showed that CatBoost had the best performance with the ensemble model showing an 8% improvement in performance. The study also found that the date variable had the greatest influence on the number of diners in a cafeteria, followed by menu characteristics and other variables. The implications of the study include the potential for machine learning methodology to improve predictive performance and reduce food waste, as well as the removal of subjective elements in menu classification. Limitations of the research include limited data cases and a weak model structure when new menus or foreign words are not included in the learning data. Future studies should aim to address these limitations.

A New Speech Quality Measure for Speech Database Verification System (음성 인식용 데이터베이스 검증시스템을 위한 새로운 음성 인식 성능 지표)

  • Ji, Seung-eun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.3 / pp.464-470 / 2016
  • This paper presents a speech recognition database verification system using speech measures and describes a speech measure extraction algorithm applied to this system. In our previous study, to produce an effective speech quality measure for the system, we proposed a combination of various speech measures that are highly correlated with WER (Word Error Rate). The new combination of various types of speech quality measures in this study predicts speech recognition performance more effectively than each speech measure alone. In this paper, we increase the system independence by employing a GMM acoustic score instead of the HMM score, which is obtained by a secondary speech recognition system. The combination with the GMM score shows a slightly lower correlation with WER than the combination with the HMM score; however, it yields a higher relative improvement in correlation with WER, calculated relative to the correlation of each speech measure alone.
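
The abstract above evaluates speech measures by their correlation with WER. As a rough illustration only, the sketch below computes WER with the standard word-level edit distance and a Pearson correlation between two score lists. The function names and example sentences are hypothetical and are not taken from the paper, whose speech measures and scoring pipeline are not described here.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In this setting, `pearson` would be applied to a list of per-condition quality scores and the matching list of WER values; a coefficient close to 1 or -1 indicates that the measure tracks recognition performance closely.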

Glottal Weighted Cepstrum for Robust Speech Recognition (잡음에 강한 음성 인식을 위한 성문 가중 켑스트럼에 관한 연구)

  • 전선도;강철호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.78-82 / 1999
  • This paper studies the weighted cepstrum, which is widely used for robust speech recognition. In particular, we propose a weighting function with an asymmetric glottal pulse shape, which is applied to the weighted cepstrum extracted by PLP (Perceptual Linear Predictive) analysis based on an auditory model. We also analyze this glottal weighted cepstrum, derived from the glottal pulse of a glottal model, in connection with the cepstrum, and obtain speech features analyzed by both the glottal model and the auditory model. The proposed method is tested by its isolated-word recognition rate in car noise and street environments. The performance of the glottal weighted cepstrum is compared with that of the weighted cepstrum extracted by LP (Linear Prediction) and that of the weighted cepstrum extracted by PLP. Computer simulation results show that the recognition rate of the proposed glottal weighted cepstrum is better than those of the other weighted cepstra.


Propeller Racing of Ocean-going Ships with Twin Screw Propellers (2축선의 프로펠러 레이싱 추정법에 관한 연구)

  • Park, J.H.
    • Journal of Power System Engineering / v.11 no.1 / pp.98-106 / 2007
  • This paper presents a statistical prediction procedure for the propeller racing of ships with twin screw propellers sailing in ocean waves. Propeller racing is one of the most important factors in seakeeping quality in relation to the safety of the main engine and shafting system. It is an especially significant consideration when designing twin-screw ships, for example in view of the allowable maximum propeller diameter. In former studies, propeller racing generally means the situation (propeller exposed) in which the amplitude of the relative motion between the ship hull and the wave surface exceeds the depth of a point on the propeller's rotary disk. Accordingly, the magnitude of the amplitude and the frequency with which it is exceeded have usually been the principal subjects of study. However, the time during which the amplitude exceeds the depth of that point must also be one of the most important factors affecting the trend of propeller racing. This paper proposes a simple, practical, statistics-based method for estimating the duration of propeller exposure related to twin-screw propeller racing in rough, confused seas. It is then confirmed that the method is useful and convenient for considering propeller racing at the basic design stage.


The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4706-4724 / 2020
  • In this study, the voice of customer (VOC), which is text data containing contact history and counseling details, was analyzed together with various structured data, such as company size, loan balance, and savings accounts. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best predictive power, followed by the model using TF-IDF, and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN exhibited better performance in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, indicating that individual customers leave when their loyalty and switching cost are low, are also applicable to corporate customers, and it suggests that VOC data indicating customers' needs are very effective for predicting their behavior.
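
For readers unfamiliar with TF-IDF, one of the techniques the abstract above names, the following is a minimal sketch of one common weighting variant (relative term frequency times the log of inverse document frequency). The function name, the whitespace tokenizer, and the English sample snippets are assumptions made for illustration; the paper's actual preprocessing of Korean VOC text is not described in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / document length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical VOC-style snippets, invented for this example.
sample_docs = [
    "loan rate too high",
    "loan renewal request",
    "loan rate inquiry",
]
sample_weights = tf_idf(sample_docs)
```

A term that occurs in every document (here "loan") receives weight zero, while a term unique to one document (here "high") is weighted most heavily, which is why TF-IDF features can separate distinctive complaints from routine vocabulary.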

Constructing Negative Links from Multi-facet of Social Media

  • Li, Lin;Yan, YunYi;Jia, LiBin;Ma, Jun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.5 / pp.2484-2498 / 2017
  • Various types of social media let people share their personal experiences in different ways. On some social networking sites, some users post reviews, some support these reviews with comments, and some simply rate the reviews as a way of expressing support or not. Unfortunately, explicit negative comments towards other reviews are rare. This means that if there is a link between two users, it must be a positive link. Apparently, negative links are invisible in these social networks; in other words, negative links are redundant to positive links. In this work, we first discuss feature extraction from social media data and propose a new method to compute the distance between each pair of comments or reviews on social media. Then we investigate whether negative links can be predicted via regression analysis when only positive links are manifested in social media data. In particular, we provide a principled way to mathematically incorporate multi-facet data in a novel framework, Constructing Negative Links (CsNL), to predict negative links for discovering hidden information. Additionally, we investigate solutions to general negative link prediction problems with CsNL and its extension. Experiments are performed on real-world data, and the results show that negative links are predictable from multi-facet social media data with the proposed framework CsNL. Essentially, the high prediction accuracy suggests that negative links are redundant to positive links. Further experiments are performed to evaluate coefficients on different kernels. The results show that user-generated content dominates the prediction performance of CsNL.

Analysis and Prediction of Prosodic Phrase Boundary (운율구 경계현상 분석 및 텍스트에서의 운율구 추출)

  • Kim, Sang-Hun;Seong, Cheol-Jae;Lee, Jung-Chul
    • The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.24-32 / 1997
  • This study aims, on the one hand, to describe the relationship between syntactic structure and prosodic phrasing and, on the other, to establish a suitable phrasing pattern for producing more natural synthetic speech. To obtain meaningful results, all the word boundaries in the prosodic database were statistically analyzed and assigned the proper boundary type. The resulting 10 types of prosodic boundaries were classified into 3 types according to the strength of the break: zero, minor, and major. We found that durational information was a main cue for determining the major prosodic boundary. Using bigrams and trigrams of syntactic information, we predicted the major and minor classification of boundary types. With the bigram model, we obtained correct major break prediction rates of 4.60% and 38.2% and insertion error rates of 22.8% and 8.4% on the Test-I and Test-II text databases, respectively. With the trigram model, we obtained correct major break prediction rates of 58.3% and 42.8% and insertion error rates of 30.8% and 11.8% on the Test-I and Test-II text databases, respectively.


Automatic Prediction of 'Anti-Search Variants' of Twitter based on Word Embeddings and Phonetic Similarity (단어 임베딩과 음성적 유사도를 이용한 트위터 '서치 방지 단어'의 자동 예측)

  • Lee, Sangah
    • Annual Conference on Human and Language Technology / 2017.10a / pp.190-193 / 2017
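  • 'Anti-search variants' are variant forms that users of social networking services (SNS) employ to keep the documents they write from being searched and collected. For a single search keyword, variants denoting the same target may exist in several forms, and being able to collect the search results for these variants together would greatly help the many kinds of research in which securing data is important. In this study, we assumed that the closer a given word is to a given keyword in semantic vector space, and the more similar its phonetic form (that is, its pronunciation) is to that of the keyword, the more likely it is to be a variant of that keyword. Accordingly, we aimed to propose variants similar to a given search keyword by using semantic similarity based on word embeddings and phonetic similarity adapted from minimum edit distance. The resulting list of variant candidates included words of various forms, and many of them were confirmed to be in actual use on SNS with the same meaning.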
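
The two signals described in the abstract above can be sketched as follows. This is an illustrative sketch only: the abstract does not say how the two similarities are combined, so the weighted sum and the `alpha` parameter are assumptions, as are all function names and the toy vectors. The edit distance here runs over plain character strings, whereas the paper applies it to phonetic forms.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                       # deletion
                            curr[j - 1] + 1,                   # insertion
                            prev[j - 1] + (char_a != char_b)))  # substitution
        prev = curr
    return prev[-1]


def phonetic_similarity(a: str, b: str) -> float:
    """Edit distance normalized into [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def variant_score(keyword, candidate, vectors, alpha=0.5):
    """Assumed combination: weighted sum of semantic and phonetic similarity."""
    semantic = cosine_similarity(vectors[keyword], vectors[candidate])
    phonetic = phonetic_similarity(keyword, candidate)
    return alpha * semantic + (1 - alpha) * phonetic


# Toy two-dimensional "embeddings", invented for this example.
toy_vectors = {
    "twitter": [1.0, 0.0],
    "twiter": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
```

With these toy vectors, `variant_score("twitter", "twiter", toy_vectors)` is higher than `variant_score("twitter", "banana", toy_vectors)`, because the first candidate is close to the keyword in both spelling and vector space.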


Real-time implementation of the G.728 speech codec using the Vincent6 DSP core (Vincent6 DSP코어를 이용한 G.728 음성 부호화기의 실시간 구현)

  • 성호상
    • Proceedings of the IEEK Conference / 2000.09a / pp.131-135 / 2000
  • In this paper, we implemented the ITU-T G.728 speech codec in real time using the Vincent6 core [1], a high-performance fixed-point DSP (Digital Signal Processor) core. G.728 is an ITU-T standard speech codec with a bit rate of 16 kb/s; its input is a PCM signal sampled at 8 kHz and quantized at 16 bits per sample. G.728 is also known as LD-CELP (Low Delay Code Excited Linear Prediction), and its algorithmic delay is 0.625 ms. The Vincent6 DSP core has VLIW (Very-Long Instruction Word) characteristics and can therefore execute multiple instructions at once. To exploit this, the code was first written in fixed-point arithmetic based on G.728 Annex G and then implemented in Vincent6 assembly code. The final implementation produces bit-exact results on the ITU-T test vectors, requires 34 MCPS (Million Cycles Per Second) of computation, and uses about 9 KByte of data memory and about 57 KByte of program memory.
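
The figures quoted in the abstract above imply a few derived quantities, worked out in the short sketch below. Only the constants come from the abstract; the variable names are hypothetical, and treating the 0.625 ms algorithmic delay as one processing block for the cycle budget is an assumption of this sketch rather than a statement from the paper.

```python
# Constants stated in the abstract.
SAMPLE_RATE_HZ = 8_000            # input sampling rate
BITS_PER_PCM_SAMPLE = 16          # input PCM resolution
CODEC_BIT_RATE_BPS = 16_000       # G.728 output bit rate
ALGORITHMIC_DELAY_US = 625        # 0.625 ms, kept in microseconds to stay in integers
CYCLES_PER_SECOND = 34_000_000    # 34 MCPS reported computational load

# Derived quantities (integer arithmetic avoids floating-point rounding).
samples_per_block = SAMPLE_RATE_HZ * ALGORITHMIC_DELAY_US // 1_000_000
pcm_bit_rate_bps = SAMPLE_RATE_HZ * BITS_PER_PCM_SAMPLE
compression_ratio = pcm_bit_rate_bps // CODEC_BIT_RATE_BPS
coded_bits_per_block = CODEC_BIT_RATE_BPS * ALGORITHMIC_DELAY_US // 1_000_000
cycles_per_block = CYCLES_PER_SECOND * ALGORITHMIC_DELAY_US // 1_000_000

print(f"samples per 0.625 ms block : {samples_per_block}")
print(f"input PCM bit rate         : {pcm_bit_rate_bps} b/s")
print(f"compression ratio          : {compression_ratio}:1")
print(f"coded bits per block       : {coded_bits_per_block}")
print(f"cycle budget per block     : {cycles_per_block}")
```

Under these assumptions, each 0.625 ms block holds 5 input samples and is coded with 10 bits, the codec compresses the 128 kb/s PCM input by a factor of 8, and the reported 34 MCPS load corresponds to 21,250 cycles per block.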
