• Title/Summary/Keyword: Term Frequency

Document classification using a deep neural network in text mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

  • Lee, Bo-Hui; Lee, Su-Jin; Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.615-625 / 2020
  • In text mining, a document-term frequency matrix records the frequencies of terms extracted from documents for which group information is available. In this study, we generated a document-term frequency matrix for classifying documents by research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to this matrix, and we also applied term frequency-inverse gravity moment (TF-IGM). To improve classification accuracy, we further generated a document-keyword weighted matrix by extracting keywords, and we classified documents from this keyword matrix using a deep neural network. To find the optimal deep neural network, we measured classification accuracy while varying the number of hidden layers and hidden nodes. The model with eight hidden layers achieved the highest accuracy, and TF-IGM yielded higher classification accuracy than TF-IDF under every parameter setting. The deep neural network was also more accurate than a support vector machine. We therefore propose applying TF-IGM together with a deep neural network to document classification.
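The abstract contrasts TF-IDF with TF-IGM weighting. A minimal Python sketch of both follows, assuming raw term counts, an idf of log(N / df), and the TF-IGM form w = tf * (1 + λ * igm) with λ = 7. The IGM factor is f_1 / Σ_r (f_r * r), where f_r is the number of documents of the r-th most frequent class that contain the term. The paper does not state which variants or which λ it used, so these choices, the function names, and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term as raw count * log(N / df); one {term: weight} dict per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def igm(term, docs, labels):
    """Inverse gravity moment f_1 / sum_r(f_r * r), where f_r is the number of documents
    of the r-th class containing the term, classes sorted so that f_1 >= f_2 >= ..."""
    class_df = Counter(label for doc, label in zip(docs, labels) if term in doc)
    freqs = sorted(class_df.values(), reverse=True)
    denom = sum(f * rank for rank, f in enumerate(freqs, start=1))
    return freqs[0] / denom if denom else 0.0


def tf_igm(docs, labels, lam=7.0):
    """Weight each term as raw count * (1 + lam * igm(term)); lam = 7 is an assumed default."""
    vocab = {term for doc in docs for term in doc}
    igm_of = {term: igm(term, docs, labels) for term in vocab}
    return [
        {term: count * (1 + lam * igm_of[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus with two research fields.
docs = [
    ["neural", "network", "text"],
    ["neural", "image"],
    ["gene", "protein", "text"],
    ["gene", "cell"],
]
labels = ["cs", "cs", "bio", "bio"]
w_idf = tf_idf(docs)
w_igm = tf_igm(docs, labels)
```

Here "neural" and "text" each occur in two of the four documents, so TF-IDF weights them identically in the first document. TF-IGM gives "neural" a weight of 8.0 because both of its occurrences fall in one class, and gives "text" about 3.33 because it is spread evenly across two classes. This class-concentration signal is what TF-IDF cannot see.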

Image Processing Based Time-Frequency Domain Reflectometry for Estimating the Fault Location Close to the Applied Signal Point (케이블 내 근접 결함 추정을 위한 영상 처리 기반의 시간 주파수 영역 반사파 계측법)

  • Jeong, Jong Min; Lee, Chun Ku; Yoon, Tae Sung; Park, Jin Bae
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.12 / pp.1683-1689 / 2014
  • In this paper, we propose an image processing based time-frequency domain reflectometry (TFDR) method for estimating the fault location in a cable. Conventional TFDR uses the Wigner-Ville distribution to analyze the signal jointly in the time and frequency domains when estimating the fault location. However, the Wigner-Ville distribution is a bilinear function, so cross-terms arise. Because of these cross-terms, conventional TFDR cannot estimate the fault location accurately when the fault is close to the point where the reference signal is applied to the cable. The proposed method reduces the cross-terms effectively using binarization and morphological image processing, and it estimates the fault location more accurately than conventional TFDR by using template-matching-based cross-correlation. Experiments on actual cables in several cases verify the performance of the proposed method.
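The underlying reflectometry principle is that the lag at which the reflected signal best matches the reference gives the round-trip time, and half of that time multiplied by the propagation velocity gives the fault distance. The sketch below shows only this principle with a plain one-dimensional time-domain cross-correlation; it is a swapped-in simplification, not the paper's Wigner-Ville, binarization, and image template-matching pipeline. The pulse shape, the 1 ns sample period, the 2e8 m/s propagation velocity, and the function names are all hypothetical.

```python
def cross_correlation(reference, measured):
    """r[k] = sum_n reference[n] * measured[n + k] for every non-negative lag k."""
    return [
        sum(ref * measured[n + k] for n, ref in enumerate(reference) if n + k < len(measured))
        for k in range(len(measured))
    ]


def fault_distance(reference, measured, sample_period_s, velocity_m_per_s):
    """Distance to the strongest reflection, assuming it does not overlap the incident pulse."""
    corr = cross_correlation(reference, measured)
    # Start the search one pulse length after lag 0 so the incident pulse itself is ignored.
    lag = max(range(len(reference), len(corr)), key=lambda k: abs(corr[k]))
    round_trip_s = lag * sample_period_s
    return velocity_m_per_s * round_trip_s / 2  # halve: the signal travels to the fault and back


reference = [1.0, 2.0, 1.0]
measured = [0.0] * 64
for i, a in enumerate(reference):
    measured[i] += a             # incident pulse at sample 0
    measured[40 + i] += 0.5 * a  # attenuated reflection at sample 40
distance_m = fault_distance(reference, measured, sample_period_s=1e-9, velocity_m_per_s=2e8)
```

With these inputs the correlation peaks at lag 40, giving a distance of about 4.0 m. Note the limitation this exposes: if the fault is so close that the reflection overlaps the incident pulse, the search window that skips the first pulse length misses it. That near-fault case is exactly the one the paper targets.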

A Text Mining Analysis for Research Trend about the Mathematics Education (텍스트 마이닝 분석을 통한 수학교육 연구 동향 분석)

  • Jin, Mireu; Ko, Ho Kyoung
    • East Asian mathematical journal / v.35 no.4 / pp.489-508 / 2019
  • In this paper, we used text mining to analyze mathematics education journals published after 2016. To identify trends in mathematics education research, we analyzed the keywords mentioned most frequently in recent mathematics education journals using the term frequency (TF) and term frequency-inverse document frequency (TF-IDF) methods. We also examined how these keywords correspond to the keywords that appear in education for preparing for the future society. The results suggest the characteristics of mathematics education research in terms of upcoming research topics.
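Ranking keywords by raw term frequency and by TF-IDF can give different answers: a word that appears in every article gets an idf of log(1) = 0 and drops out of the TF-IDF ranking. The sketch below illustrates this with a hypothetical three-document corpus. It assumes idf = log(N / df) and sums TF-IDF over documents to obtain a corpus-level score; the authors' exact aggregation is not given in the abstract, and the function names are illustrative.

```python
import math
from collections import Counter


def top_by_tf(docs, k):
    """Top-k terms by total raw frequency across all documents."""
    return Counter(term for doc in docs for term in doc).most_common(k)


def top_by_tfidf(docs, k):
    """Top-k terms by TF-IDF summed over documents, with idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += count * math.log(n_docs / df[term])
    return scores.most_common(k)


# Hypothetical tokenized abstracts.
docs = [
    ["mathematics", "education", "students", "modeling"],
    ["mathematics", "education", "teachers"],
    ["mathematics", "education", "AI", "AI"],
]
tf_rank = top_by_tf(docs, 3)
tfidf_rank = top_by_tfidf(docs, 3)
```

By raw frequency, "mathematics" and "education" lead with three occurrences each. By TF-IDF both score zero because they occur in every document, and "AI" (score 2 * log 3) comes first.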

A Study on the Factors Influencing Semantic Relation in Building a Structured Glossary (구조적 학술용어사전 데이터베이스 구축에 있어서 용어의 의미관계 형성에 영향을 미치는 요인에 관한 연구)

  • Kwon, Sun-Young
    • Journal of the Korean Society for Library and Information Science / v.48 no.2 / pp.353-378 / 2014
  • The purpose of this study is to identify the factors that affect the formation of semantic relations among terms, and what these factors affect, in order to build a database schema for a terminology dictionary with structural definitions. We collected 826,905 keywords from 88,874 social science articles and 985,580 keywords from 125,046 humanities articles published in KCI journals from 2007 to 2011. From these data, we analyzed subject complexity, structural holes, term frequency, occurrence patterns, and their effects on the number of nodes and the number of patterns, which were derived from the semantic relations of linked terms in the established 'STNet' system. The results from the analyzed data and network patterns are summarized as follows. Betweenness centrality, term frequency, and effective size affect the number of semantic relation nodes; among these factors, betweenness centrality had the largest effect, followed by effective size, while term frequency had the smallest effect. Betweenness centrality, term frequency, and effective size also affect the number of semantic relation types, and here term frequency had the largest effect. Therefore, when building a terminology dictionary, betweenness centrality, term frequency, effective size, and subject complexity should be considered in selecting terms. These factors can be expected to improve the quality of terminology dictionaries.
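Effective size is one of Burt's structural-hole measures. For an undirected, unweighted network, a common simplification is ES = n - 2t / n, where n is the number of the focal term's neighbors and t is the number of ties among those neighbors. The minimal sketch below assumes that simplification and a hypothetical term network; the abstract does not say how the STNet network is built or weighted, and the function names are illustrative.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) term pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def effective_size(adjacency, ego):
    """Burt's effective size via the simplification n - 2t/n (undirected, unweighted)."""
    alters = adjacency[ego] - {ego}
    n = len(alters)
    if n == 0:
        return 0.0
    # Each tie among alters is counted once from each endpoint, hence // 2.
    t = sum(len(adjacency[a] & alters) for a in alters) // 2
    return n - 2 * t / n


# Hypothetical links between terms.
edges = [
    ("information", "retrieval"),
    ("information", "library"),
    ("information", "database"),
    ("information", "user"),
    ("retrieval", "database"),
]
adjacency = build_adjacency(edges)
es_information = effective_size(adjacency, "information")
```

"information" has four neighbors with one tie among them, so its effective size is 4 - 2/4 = 3.5. The single redundant tie lowers it slightly from the maximum of 4.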

Offset Frequency Stabilization of He-Ne Lasers Using Phase Locked Loop (PLL을 이용한 헬륨-네온 레이저의 옵셋 주파수 안정화)

  • Yun Dong Hyun; Suh Ho Sung; Lyou Joon
    • Journal of Institute of Control, Robotics and Systems / v.11 no.6 / pp.496-501 / 2005
  • This paper presents experimental results on the frequency offset locking of He-Ne lasers, together with a stability analysis. The master laser is free running, and the slave laser operates in a single mode. The frequency difference between the two lasers is locked at 200 MHz using a PLL servo. The measured beat frequency between the two lasers was 200.004 MHz $\pm$ 0.15 MHz. The square root of the Allan variance was also measured as a time-domain measure of stability. The long-term stability of the beat was worse than its short-term stability: at a gate time of $\tau=1000\;s$, the square root of the Allan variance was about 1 GHz. For the stabilized beat signal at the same gate time of $\tau=1000\;s$, the square root of the Allan variance was about 1.5 kHz. The long-term stability was thus improved by more than several hundred times compared with that without stabilization.
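The abstract reports the square root of the Allan variance at a gate time τ. A minimal sketch of the non-overlapping estimator follows: consecutive averages of m equally spaced readings are formed (so τ = m·τ_0), and the result is the square root of half the mean squared difference between neighboring averages. It works in the input's units (Hz here, as in the abstract); dividing by the nominal frequency would give the dimensionless fractional form. The abstract does not state whether an overlapping or non-overlapping estimator was used, so this choice, the function name, and the sample data are assumptions.

```python
import math


def allan_deviation(freqs, m):
    """Non-overlapping Allan deviation at tau = m * tau0 from equally spaced readings."""
    n_blocks = len(freqs) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average each block of m consecutive readings.
    means = [sum(freqs[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Squared differences between neighboring block averages.
    sq_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return math.sqrt(sum(sq_diffs) / (2 * len(sq_diffs)))


# Hypothetical beat-frequency deviations (Hz) alternating every reading.
readings = [0.0, 1.0] * 4
adev_1 = allan_deviation(readings, 1)
adev_2 = allan_deviation(readings, 2)
```

For readings that alternate every sample, the deviation is sqrt(1/2) at m = 1 but drops to 0 at m = 2, because averaging over pairs cancels the alternation. This is the kind of τ dependence the gate-time comparison in the abstract relies on.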

Comparison of AT1- and Kalman Filter-Based Ensemble Time Scale Algorithms

  • Lee, Ho Seong; Kwon, Taeg Yong; Lee, Young Kyu; Yang, Sung-hoon; Yu, Dai-Hyuk; Park, Sang Eon; Heo, Myoung-Sun
    • Journal of Positioning, Navigation, and Timing / v.10 no.3 / pp.197-206 / 2021
  • We compared two typical ensemble time scale algorithms: AT1 and the Kalman filter. Four commercial atomic clocks (two hydrogen masers and two cesium atomic clocks) provided the measurement data for the algorithms. Allocating relative weights to the clocks is important for generating a stable ensemble time. A 30-day average-weight model, obtained from the average Allan variance of each clock, was applied to the AT1 algorithm. For the reduced Kalman filter (Kred) algorithm, we gave equal weights to the two hydrogen masers. We also compared the frequency stabilities of the algorithms' outputs when the frequency offsets and/or frequency drift offsets estimated by the algorithms were or were not corrected using KRISS-F1, the primary frequency standard developed by KRISS. We found that the Kred algorithm is more effective at generating a stable ensemble time scale in the long term, and that it also yields much better short-term stability when the frequency offset, rather than the phase offset, is used to calculate the Allan deviation.
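AT1-style ensembles weight each clock inversely to its Allan variance, so the more stable clocks (here the hydrogen masers) dominate the result. The minimal sketch below covers only that weighting step, assuming weights proportional to 1/σ² normalized to sum to one. A full AT1 implementation also predicts each clock's time from its estimated frequency, which is not shown. The variance values and offsets are hypothetical; in the abstract's 30-day average-weight model the variances would be 30-day averages. The function names are illustrative.

```python
def inverse_variance_weights(allan_variances):
    """Relative clock weights proportional to 1 / sigma^2, normalized to sum to 1."""
    inverse = [1.0 / v for v in allan_variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def ensemble_offset(clock_offsets, weights):
    """Weighted average of per-clock offsets (e.g. time offsets in ns)."""
    return sum(w * x for w, x in zip(weights, clock_offsets))


# Hypothetical Allan variances: two hydrogen masers, then two cesium clocks.
weights = inverse_variance_weights([1e-30, 1e-30, 1e-28, 1e-28])
offset_ns = ensemble_offset([10.0, 12.0, 50.0, -30.0], weights)
```

With a 100:1 variance ratio, each maser receives about 49.5 % of the weight and each cesium clock about 0.5 %. The large cesium offsets therefore move the ensemble only slightly (to about 10.99 ns).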

Statistical Techniques for Automatic Indexing and Some Experiments with Korean Documents (자동색인의 통계적기법과 한국어 문헌의 실험)

  • Chung Young Mee; Lee Tae Young
    • Journal of the Korean Society for Library and Information Science / v.9 / pp.99-118 / 1982
  • This paper first reviews various techniques proposed for automatic indexing, with particular emphasis on statistical techniques. Based on their index term selection criteria, frequency-based statistical techniques are grouped into three approaches for further investigation: the term frequency approach, the document frequency approach, and the probabilistic approach. In the experimental part of this study, we tested Pao's technique, which is based on Goffman's transition region formula, and Harter's 2-Poisson distribution model with a measure of the potential effectiveness of index terms. The experimental document collection consisted of 30 agriculture-related documents written in Korean. Pao's technique did not yield good results, presumably because of differences in word usage between Korean and English. However, Harter's model holds some promise for indexing Korean documents, because the evaluation results of this experiment were similar to those reported by Harter.
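Goffman's transition point is usually written n = (-1 + sqrt(1 + 8·I_1)) / 2, where I_1 is the number of distinct words that occur exactly once; words whose frequency lies near n form the transition region from which index terms are drawn. The minimal sketch below assumes that formula and a symmetric region of a chosen width around n. The width, the function names, and the token list are illustrative assumptions, not Pao's exact procedure.

```python
import math
from collections import Counter


def transition_point(tokens):
    """Goffman transition point from the count of words occurring exactly once."""
    freqs = Counter(tokens)
    i1 = sum(1 for count in freqs.values() if count == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def transition_region_terms(tokens, width):
    """Terms whose frequency lies within +/- width of the transition point."""
    freqs = Counter(tokens)
    tp = transition_point(tokens)
    return sorted(term for term, count in freqs.items() if abs(count - tp) <= width)


# Hypothetical tokens: ten singletons, two mid-frequency words, one very common word.
tokens = [f"w{i}" for i in range(10)] + ["rice"] * 4 + ["soil"] * 5 + ["the"] * 12
tp = transition_point(tokens)
candidates = transition_region_terms(tokens, width=1)
```

Ten singleton words give n = (-1 + 9) / 2 = 4. The mid-frequency words "rice" and "soil" fall in the region, while the very frequent "the" and the rare singletons are excluded.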

The Joint Frequency Function for Long-term Air Quality Prediction Models (장기 대기확산 모델용 안정도별 풍향·풍속 발생빈도 산정 기법)

  • Kim, Jeong-Soo; Choi, Doug-Il
    • Journal of Environmental Impact Assessment / v.5 no.1 / pp.95-105 / 1996
  • This paper discusses the meteorological joint frequency function, which is indispensable for long-term air quality prediction models, for practical application in Korea. The algorithm proposed by Turner (1964) is based on Pasquill's atmospheric stability classification method and processes daily solar insolation, cloud cover, and cloud height. Despite its necessity and applicability, the computer program implementing it, called STAR (STability ARray), was difficult to use in Korea because the meteorological data format of the original U.S. version differs from the Korean format. To address these problems, the revised STAR program for Korean users provides the following: applicability at any site in Korea through local solar angle correction; compatibility with the data observed by both classes of weather service centers; and an output format examined for compatibility with the prediction models to be used.
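STAR tabulates how often each combination of stability class, wind direction sector, and wind speed class occurs. The sketch below shows only that tabulation step. It assumes that a Pasquill stability class has already been assigned to each observation, that there are 16 direction sectors centered on north, and that there are six speed classes with the limits in `SPEED_UPPER_BOUNDS_KT`. Those limits, the observation tuples, and the function names are assumptions for illustration, and calm handling is omitted.

```python
from collections import Counter

# Assumed upper limits (knots) of the first five speed classes; class 5 is anything faster.
SPEED_UPPER_BOUNDS_KT = [3, 6, 10, 16, 21]


def direction_sector(degrees, n_sectors=16):
    """Sector index 0..n_sectors-1, with sector 0 centered on north."""
    width = 360 / n_sectors
    return int(((degrees + width / 2) % 360) // width)


def speed_class(speed_kt):
    """Index of the first speed class whose upper limit covers the speed."""
    for i, upper in enumerate(SPEED_UPPER_BOUNDS_KT):
        if speed_kt <= upper:
            return i
    return len(SPEED_UPPER_BOUNDS_KT)


def joint_frequency(observations):
    """Relative frequency of each (stability, sector, speed class) from
    (stability_class, wind_dir_deg, wind_speed_kt) observations."""
    counts = Counter(
        (stability, direction_sector(d), speed_class(s)) for stability, d, s in observations
    )
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


# Hypothetical hourly observations.
obs = [("D", 0, 2), ("D", 350, 2.5), ("F", 90, 5), ("D", 10, 12)]
jff = joint_frequency(obs)
```

Both 0° and 350° fall in the north sector, so half of these observations land in the (D, north, lowest speed) cell. A long-term model would read the full table as the probability of each dispersion condition.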

Material as a Key Element of Fashion Trend in 2010~2019 - Text Mining Analysis - (패션 트렌트(2010~2019)의 주요 요소로서 소재 - 텍스트마이닝을 통한 분석 -)

  • Jang, Namkyung; Kim, Min-Jeong
    • Fashion & Textile Research Journal / v.22 no.5 / pp.551-560 / 2020
  • Because fashion design responds quickly and sensitively to change, accurate forecasting of upcoming fashion trends is an important factor in the performance of fashion product planning. This study analyzed the major phenomena of fashion trends using text mining, a big data analysis method. The research questions were as follows. What is the key term of the 2010SS~2019FW fashion trends? Which terms are highly relevant to the key trend term in each year? Which terms relevant to the key trend term appeared frequently in news articles during the same period? Data were collected from the 2010SS~2019FW Pre-Trend data of a leading trend information company in Korea and from 45,038 articles retrieved with the query "fashion+material" from the News Big Data System. Frequencies, correlation coefficients, and coefficients of variation were computed, and mapping was performed, using R 3.5.1. The results showed that the fashion trend information was reflected in the consumer market. The term with the highest frequency in the 2010SS~2019FW fashion trend information was material. In the trend information, the terms most relevant to material by year were comfort, compact, look, casual, blend, functional, cotton, processing, metal, and functional. In the news articles, functional, comfort, sports, leather, casual, eco-friendly, classic, padding, culture, and high-quality appeared with high frequency. Functional was the only fashion material term derived every year for 10 years. This study helps expand the scope and methods of fashion design research and improves the information analysis and forecasting capabilities of the fashion industry.
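The study computed frequencies, correlation coefficients, and coefficients of variation in R 3.5.1. As an illustration in Python rather than R, the sketch below computes the coefficient of variation with the sample standard deviation (which matches R's `sd()`) and a Pearson correlation between two yearly count series. The abstract does not say which correlation coefficient was used, so Pearson is an assumption, and the yearly counts and function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly counts of one term in trend reports and in news articles.
cv_steady = coefficient_of_variation([10, 10, 10])
cv_volatile = coefficient_of_variation([5, 15])
r_trend_news = pearson([1, 2, 3], [2, 4, 6])
```

A term that appears equally often every year has a coefficient of variation of 0, while the counts 5 and 15 give about 0.71. A low value is one way to flag a term, like "functional" in the abstract, that stays present year after year.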

Estimation of Voltage Swell Frequency Caused by Asymmetrical Faults

  • Park, Chang-Hyun
    • Journal of Electrical Engineering and Technology / v.12 no.4 / pp.1376-1385 / 2017
  • This paper proposes a method for estimating the expected frequency of voltage swells caused by asymmetrical faults in a power system. Although voltage swells are less common than voltage sags, repeated swells can have a severe destructive impact on sensitive equipment. Understanding system performance with respect to voltage swells is essential for finding optimal countermeasures. The expected swell frequency at a sensitive load terminal can be estimated from the concept of an area of vulnerability (AOV) and long-term system fault data. This paper describes an effective method for calculating the AOV for voltage swells. An interval estimate of the expected swell frequency is also presented to aid understanding of system performance. The proposed method provides a long-term evaluation of the frequency and degree of voltage swell occurrences.
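The AOV idea can be illustrated with a simple sum: only faults inside the area of vulnerability cause a swell at the load, so the expected yearly swell count is the sum of fault rate times exposed length over the network. The minimal sketch below assumes per-kilometer rates of asymmetrical faults and a precomputed fraction of each line inside the AOV; computing the AOV itself (the paper's contribution) is not shown. The interval function is a swapped-in normal approximation to a Poisson count, not the paper's interval estimator. All data and names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LineSegment:
    name: str
    length_km: float
    asym_fault_rate_per_km_year: float  # historical rate of asymmetrical faults
    fraction_in_aov: float              # share of the segment inside the swell AOV


def expected_swell_frequency(segments):
    """Expected swells per year at the load: sum of rate * length over the AOV."""
    return sum(
        s.asym_fault_rate_per_km_year * s.length_km * s.fraction_in_aov for s in segments
    )


def poisson_normal_interval(expected_per_year, z=1.96):
    """Rough interval for the yearly swell count, treating it as Poisson and using a
    normal approximation (a swapped-in technique, not the paper's estimator)."""
    half_width = z * math.sqrt(expected_per_year)
    return max(0.0, expected_per_year - half_width), expected_per_year + half_width


segments = [
    LineSegment("line A", 50.0, 0.02, 0.4),  # hypothetical data
    LineSegment("line B", 20.0, 0.05, 1.0),
]
rate = expected_swell_frequency(segments)
low, high = poisson_normal_interval(rate)
```

These inputs give about 1.4 expected swells per year. The normal approximation is crude at such small counts (its lower bound is clipped at zero here), which is one reason a dedicated interval method, such as the one the paper presents, is worth having.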