• Title/Summary/Keyword: Text Index

Using noise filtering and sufficient dimension reduction method on unstructured economic data (노이즈 필터링과 충분차원축소를 이용한 비정형 경제 데이터 활용에 대한 연구)

  • Jae Keun Yoo;Yujin Park;Beomseok Seo
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.119-138 / 2024
  • Text indicators are increasingly valuable in economic forecasting, but are often hindered by noise and high dimensionality. This study aims to explore post-processing techniques, specifically noise filtering and dimensionality reduction, to normalize text indicators and enhance their utility through empirical analysis. Predictive target variables for the empirical analysis include monthly leading index cyclical variations, BSI (business survey index) All industry sales performance, BSI All industry sales outlook, as well as quarterly real GDP SA (seasonally adjusted) growth rate and real GDP YoY (year-on-year) growth rate. This study explores the Hodrick and Prescott filter, which is widely used in econometrics for noise filtering, and employs sufficient dimension reduction, a nonparametric dimensionality reduction methodology, in conjunction with unstructured text data. The analysis results reveal that noise filtering of text indicators significantly improves predictive accuracy for both monthly and quarterly variables, particularly when the dataset is large. Moreover, this study demonstrated that applying dimensionality reduction further enhances predictive performance. These findings imply that post-processing techniques, such as noise filtering and dimensionality reduction, are crucial for enhancing the utility of text indicators and can contribute to improving the accuracy of economic forecasts.
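
The noise-filtering step mentioned in this abstract can be illustrated with a minimal sketch of the Hodrick-Prescott filter. The function name `hp_filter`, the dense solver, and the default smoothing parameter are assumptions made for illustration (1600 is the conventional choice for quarterly data); the abstract does not state the authors' settings or implementation.

```python
def hp_filter(series, lam=1600.0):
    """Split a series into (trend, cycle) with the Hodrick-Prescott filter.

    The trend minimizes sum((y - trend)^2) + lam * sum(second differences of trend ^2),
    which is equivalent to solving (I + lam * D'D) * trend = y.
    Illustrative sketch only; not the authors' implementation.
    """
    y = [float(v) for v in series]
    n = len(y)
    if n < 3:
        # No second difference exists, so the trend is the series itself.
        return y[:], [0.0] * n

    # Build A = I + lam * D'D, where D is the (n-2) x n second-difference operator.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    stencil = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * stencil[p] * stencil[q]

    # Forward elimination. A is symmetric positive definite, so no pivoting is needed.
    b = y[:]
    for k in range(n):
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f != 0.0:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                b[i] -= f * b[k]

    # Back substitution.
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * trend[j] for j in range(i + 1, n))
        trend[i] = s / a[i][i]

    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle
```

A filtered text indicator would be the `trend` component; a perfectly linear input passes through unchanged because its second differences are zero.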

A Comparative Study of WWW Search Engine Performance (WWW 탐색도구의 색인 및 탐색 기능 평가에 관한 연구)

  • Chung Young-Mee;Kim Seong-Eun
    • Journal of the Korean Society for Library and Information Science / v.31 no.1 / pp.153-184 / 1997
  • The importance of WWW search services is increasing as Internet information resources grow explosively. Nine current search services were first evaluated by descriptively comparing their features for indexing, searching, and ranking search results. Second, a couple of search queries were used to evaluate the search performance of those services in terms of retrieval effectiveness, the degree of overlap in the sites retrieved, and the degree of similarity between services. In this experiment, Alta Vista, HotBot, and Open Text Index showed better retrieval effectiveness. The level of similarity among the nine search services was extremely low.
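
The kinds of measures named in this abstract can be sketched as follows. The abstract does not say which formulas the study used, so precision (for retrieval effectiveness) and Jaccard overlap (for overlap between two services' result sets) are assumptions chosen for illustration, as are the function names.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant (0.0 if nothing was retrieved)."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    return hits / len(retrieved)


def overlap(results_a, results_b):
    """Jaccard overlap between two result sets: |A & B| / |A | B|."""
    a, b = set(results_a), set(results_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

Computing `overlap` for every pair of services gives a simple similarity matrix; values near zero would correspond to the low similarity the study reports.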

A Study on Generation Method of Intonation using Peak Parameter and Pitch Lookup-Table (Peak 파라미터와 피치 검색테이블을 이용한 억양 생성방식 연구)

  • Jang, Seok-Bok;Kim, Hyung-Soon
    • Annual Conference on Human and Language Technology / 1999.10e / pp.184-190 / 1999
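  • For the intonation model of a Text-to-Speech system, this paper adopts a data-driven approach: model parameters and a pitch lookup table are extracted from a speech database and built in advance, and at synthesis time they are estimated to generate the final F0 values. The break index between word phrases is subdivided into fixed and variable break indices according to its characteristics, and Intonation Phrases and Accentual Phrases are set on the basis of the predicted break index. In particular, the Accentual Phrase model focuses on accurately modeling the peak, which is perceptually and acoustically important. The peak's time-axis and frequency-axis values and the slopes before and after it are estimated and set as four parameters, and prediction rules for these parameters are built using CART (Classification and Regression Tree). For particles and endings where boundary tones appear, a lookup table consisting of normalized pitch values and key-indexes is built to predict pitch values more precisely. When the proposed intonation model was synthesized with a speech synthesizer built in our laboratory and evaluated in a listening test, it produced more natural synthetic speech than existing commercial Text-to-Speech systems.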

A Study on City Brand Evaluation Method Using Text Mining : Focused on News Media (텍스트 마이닝 기법을 활용한 도시 브랜드 평가방법론 연구 : 뉴스미디어를 중심으로)

  • Yoon, Seungsik;Shin, Minchul;Kang, Juyoung
    • Journal of Information Technology Services / v.18 no.1 / pp.153-171 / 2019
  • Competition among cities has become fierce with decentralization and globalization, and each city tries to establish a brand image to build its competitiveness and implement policies based on it. At present, surveys, expert interviews, and similar methods are commonly used to establish city brands. Because these are sampling-based methods, it is difficult for them to capture the experiential component, the largest component of a city brand. In this paper, therefore, based on the city brand measurements and components of previous research, the words representing each city image property were extracted and assigned to five indicators to form the evaluation index. The constructed indicators were validated through a review by three experts. Using the index, we analyzed the brands of four cities, Ulsan, Incheon, Yeosu, and Gyeongju, and identified the factors by using Topic Modeling and Word Cloud. This methodology is expected to reduce costs and enable timely monitoring when identifying and analyzing urban brand images in the future.

Topic Modeling on Research Trends of Industry 4.0 Using Text Mining (텍스트 마이닝을 이용한 4차 산업 연구 동향 토픽 모델링)

  • Cho, Kyoung Won;Woo, Young Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.7 / pp.764-770 / 2019
  • In this research, text mining techniques were used to analyze papers related to the "4th Industry". For the analysis, a total of 685 papers were collected by searching with the keyword "4th industry" in the Korea Journal Index (KCI) from 2016 to 2019. We used a Python-based web scraping program to collect the papers and topic modeling techniques based on the LDA algorithm, implemented in the R language, for data analysis. Perplexity analysis of the collected papers determined nine topics to be optimal, and nine representative topics were extracted using the Gibbs sampling method. The results confirmed that artificial intelligence, big data, Internet of Things (IoT), digital, network, and so on have emerged as the major technologies, and that research has been conducted on the changes these technologies bring to various fields related to the 4th industry, such as industry, government, education, and jobs.

A Sentiment Analysis of Customer Reviews on the Connected Car using Text Mining: Focusing on the Comparison of UX Factors between Domestic-Overseas Brands (텍스트 마이닝을 활용한 커넥티드 카 고객 리뷰의 감성 분석: 국내-해외 브랜드간 UX 요인 비교를 중심으로)

  • Youjung Shin;Junho Choi;Sung Woo Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.4 / pp.517-528 / 2023
  • The purpose of this study is to analyze and compare UX factors of the connectivity systems of domestic and overseas car brands. Using text mining analysis, UX factors of domestic and overseas brands were compared through a positive-negative sentiment index. After collecting 120,000 reviews on Hyundai Motor Group (Hyundai, Kia, Genesis) and 190,000 on Tesla, BMW, and Mercedes, pre-processing was performed. Keywords were classified into 11 UX factors in 3 dimensions: system connection, information, and service. For domestic brands, the sentiment index for 'safety' was the highest. For overseas brands, 'entertainment' was the most positive UX factor.

A Study on Questionnaire Improvement using Text Mining (텍스트 마이닝 기법을 활용한 설문 문항 개선에 관한 연구)

  • Paek, Yun-Ji;Jung, Chang-Hyun
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.121-128 / 2020
  • The Marine Safety Culture Index (MSCI) was developed in the year 2018 for objectively assessing the public safety culture levels and for incorporating it as data to spread knowledge regarding the marine safety culture. The method for calculating the safety culture index should include issues that may affect the safety culture and should consist of appropriate attributes for estimating the current status. In addition, continuous verification and supplementation are required for addressing social and economic changes. In this study, to determine whether the questionnaire designed by marine experts reflects the people's interests and needs, we analyzed 915 marine safety proposals. Text mining was employed for analyzing the unstructured data of the marine safety proposals, and network analysis and topic modeling were subsequently performed. Analysis of the marine safety proposals was centered on attributes such as education, public relations, safety rules, awareness, skilled workers, and systems. Eighteen questions were modified and supplemented for reflecting the marine safety proposals, and reliability of the revised questions was analyzed. Furthermore, compared to the previous year, the questionnaire's internal consistency was improved upon and was rated at a high value of 0.895. It is expected that by employing the derived marine safety culture index and incorporating the improved questionnaire that reflects the requirements of marine experts and the people, the improved questionnaire will contribute to the establishment of policies for spreading knowledge regarding the marine safety culture.

Efficient Dynamic Index Structure for SSD (SPM) (SSD에 적합한 동적 색인 저장 구조 : SPM)

  • Jin, Du-Seok;Kim, Jin-Suk;You, Beom-Jong;Jung, Hoe-Kyung
    • The Journal of the Korea Contents Association / v.10 no.2 / pp.54-62 / 2010
  • Inverted index structures have become the most efficient data structure for high-performance indexing of large text collections. For online index maintenance in particular, in-place and merge-based index structures are the two main competing strategies for index construction in dynamic search environments. In both strategies, contiguity of posting information is the mainstay of the design for online index maintenance and query time. With the emergence of new storage devices (SSD, SCRAM), contiguity of posting information no longer needs to be considered in the design of index structures, because such devices offer low access latency and high I/O throughput. However, SSD (Solid State Drive) is not well suited to traditional inverted structures because of its poor random write throughput in practical systems. In this paper, we propose a new efficient online index structure (SPM) for SSD that significantly reduces query time and improves index maintenance performance.
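
For readers unfamiliar with the data structure this abstract builds on, the following is a minimal in-memory sketch of an inverted index with posting lists. It is not the SPM structure proposed in the paper and ignores on-disk layout entirely; the function names and whitespace tokenization are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted posting list of the document ids that contain it.

    `docs` is a dict of {doc_id: text}. Tokenization is a plain lowercase
    whitespace split, which is an illustrative simplification.
    """
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # Use a set so each document appears at most once per posting list.
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


def search_all(index, query):
    """Return sorted ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

Adding a document means appending its id to many posting lists at once, which is the update pattern that makes contiguity and random-write cost central to the design trade-offs the abstract describes.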

Developing Innovation Index of Hospital Service Using 6 Sigma and SERVQUAL (6 시그마와 SERVQUAL을 활용한 병원서비스 혁신지표 개발)

  • Oh, Ka-Eun;Bak, Won-Sook;Han, Sang-Sook;Park, Sang-Chan;Lee, Sang-Chul
    • Journal of Korean Society for Quality Management / v.41 no.4 / pp.555-566 / 2013
  • Purpose: The purpose of this study is to develop an innovation index of hospital service by integrating 6 sigma and SERVQUAL. Methods: This study used DMA (Define, Measure and Analysis) from 6 sigma and 5 Factors from SERVQUAL. To test the data, a chi-square test, association analysis, and behavior analysis were conducted. Results: This study derived the management index through CTQ (Critical to Quality) and Chosen few X using the 6 sigma process. Finally, this study developed 5 Factors: Equipment Utilization in Tangibility; Ratio of Patients/Disease/Behavior/Treatment in Reliability; Survival Rate, Cancellation Rate of Reservation, Churn Rate, Interval of Treatment and Confidence in Responsiveness; Frequency of Patients/Disease/Behavior/Treatment in Assurance; and Contrast to Best Department/Best Doctor/Best Doctor in Faculty/Average of Mine in Empathy. Conclusion: This study developed an innovation index of hospital service. By managing this index, a hospital can shorten the total treatment cycle, adjust patient behavior, and increase equipment utilization. Ultimately, the hospital can accomplish innovation in healthcare service.

An Index Method for Storing and Extracting XML Documents (XML 문서의 저장과 추출을 위한 색인 기법)

  • Kim Woosaeng;Song Jungsuk
    • Journal of Korea Multimedia Society / v.8 no.2 / pp.154-163 / 2005
  • Because most index techniques studied so far for XML documents use an absolute coordinate system, update operations impose a large burden. To express the structural relations between elements, attributes, and text, the structure of the coordinates must be reconstructed. Because this reconstruction cascades through the entire XML document rather than being limited to the node being changed, frequent update operations may cause a serious performance problem. In this paper, we propose an index technique based on an extensible index that does not cause serious performance degradation. It limits the number of nodes that participate in the reconstruction process and improves overall performance. The extensible index also handles containment relationship queries with simple SQL expressions.
