• Title/Summary/Keyword: Frequency of Words

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics, v.35 no.6, pp.739-754, 2022
  • Text data usually consist of many variables, some of which are closely correlated. Such multicollinearity often makes statistical analysis inefficient or inaccurate. In supervised learning, features can be selected by examining the relationship between the target variable and the explanatory variables. In unsupervised learning, however, there is no target variable, so this procedure cannot be used. In this study, we propose a word selection procedure that uses topic models to find latent topics. We substitute the topics for the target variable and select the terms that are most relevant to each topic. Applied to real data, the procedure yields clearer topic interpretations by removing high-frequency words that are prevalent across many topics. In addition, when the selected words are used as features for classifiers such as naïve Bayes classifiers and support vector machines, the proposed procedure gives results comparable to those obtained using class label information.
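
The abstract does not specify how topic relevance is measured. The sketch below is a minimal illustration, not the authors' procedure: it assumes the topic-word probabilities p(w|t) and topic weights p(t) of an already fitted topic model are given, and it borrows the relevance score used in LDAvis (Sievert and Shirley), which weighs a word's probability within a topic against its lift over the word's overall probability. The names `relevance` and `select_words`, the weight `lam`, and the toy probabilities are hypothetical.

```python
import math

def relevance(phi, topic_weights, lam):
    """Relevance of every word to every topic (LDAvis-style, assumed here):
    r(w, t) = lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w)),
    where p(w) = sum_t p(t) * p(w|t) is the word's marginal probability.
    Assumes all p(w|t) > 0, as with smoothed topic-model estimates."""
    n_topics, n_words = len(phi), len(phi[0])
    p_w = [sum(topic_weights[t] * phi[t][w] for t in range(n_topics))
           for w in range(n_words)]
    return [[lam * math.log(phi[t][w]) + (1 - lam) * math.log(phi[t][w] / p_w[w])
             for w in range(n_words)]
            for t in range(n_topics)]

def select_words(phi, topic_weights, vocab, top_n, lam):
    """Keep the union of the top_n most relevant words of each topic."""
    selected = set()
    for row in relevance(phi, topic_weights, lam):
        ranked = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
        selected.update(vocab[w] for w in ranked[:top_n])
    return sorted(selected)

# Toy topic-word probabilities (hypothetical): "said" is frequent in both topics.
vocab = ["said", "goal", "match", "vote", "party"]
phi = [[0.4, 0.3, 0.25, 0.025, 0.025],   # a "sports" topic
       [0.4, 0.025, 0.025, 0.3, 0.25]]   # a "politics" topic
weights = [0.5, 0.5]
print(select_words(phi, weights, vocab, top_n=2, lam=0.3))
# → ['goal', 'match', 'party', 'vote']
```

Lowering `lam` penalizes words that are frequent in every topic. With `lam=1.0` the score reduces to plain p(w|t) and the shared word "said" is selected again, which is the kind of high-frequency, topic-agnostic word the abstract reports removing.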

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems, v.26 no.2, pp.57-78, 2020
  • With the development of the Internet, consumers can easily check product information through e-commerce. Product reviews consulted during purchasing are based on user experience, so consumers both refer to this information and produce it. From the consumer's perspective, reviews can make purchasing decisions more efficient; from the seller's perspective, they can help develop products and strengthen competitiveness. However, reading the vast number of reviews that e-commerce sites offer for the products a consumer wants to compare, in order to understand both the overall assessment and the assessment of the dimensions the consumer considers important, takes considerable time and effort. This is because product reviews are unstructured, so their sentiment and the dimensions they assess cannot be grasped at a glance. For example, consumers who want to buy a laptop would like to compare candidate products on each dimension, such as performance, weight, delivery, speed, and design. In this paper, we therefore propose a method that automatically generates multi-dimensional assessment scores from the reviews of the products being compared. The method consists of two phases: a preparation phase and an individual product scoring phase. In the preparation phase, a dimension classification model and a sentiment analysis model are built from the reviews of a large-category product group. By combining word embedding with association analysis, the dimension classification model overcomes a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences.
The sentiment analysis model is a CNN trained on data tagged as positive or negative at the phrase level, for accurate polarity detection. In the individual product scoring phase, the prepared models are applied to reviews split into phrases. The phrases judged to describe a particular dimension are grouped together, and multi-dimensional assessment scores are obtained by aggregating each dimension's group according to the proportions of its classified phrases. In the experiment, approximately 260,000 reviews of the large-category product group were collected to build the dimension classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through e-commerce were collected and used as experimental data. The dimension classification model classified individual product reviews, broken down into phrases, into six assessment dimensions, combining the existing word embedding method with an association analysis of the co-occurrence frequency between words and dimensions. Combining word embedding and association analysis increased the model's accuracy by 13.7%. The sentiment analysis model assessed reviews more precisely when trained on phrases rather than on sentences: its accuracy was 29.4% higher than that of the sentence-based model. Because products can be compared across multiple dimensions, this study can help both sellers and consumers make more efficient decisions in purchasing and product development. In addition, unstructured text reviews were transformed into objective values such as frequencies and morphemes and analyzed jointly with word embedding and association analysis, improving the objectivity of more precise multi-dimensional analysis.
The model can be an attractive analysis tool, both for enabling more effective service deployment in an evolving, highly competitive e-commerce market and for satisfying both sellers and consumers.
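
The aggregation step of the scoring phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the dimension classifier and the CNN sentiment model have already labelled every phrase, that phrases matching no dimension are labelled `None`, and it reads "according to the proportion" as the share of positive phrases per dimension. The function name `dimension_scores`, the `"pos"`/`"neg"` labels, and the sample phrases are hypothetical.

```python
from collections import defaultdict

def dimension_scores(phrases):
    """phrases: iterable of (dimension, polarity) pairs, polarity in {"pos", "neg"}.
    Returns {dimension: share of positive phrases, between 0 and 1}."""
    counts = defaultdict(lambda: [0, 0])  # dimension -> [positive count, total count]
    for dimension, polarity in phrases:
        if dimension is None:  # phrase not assigned to any assessment dimension
            continue
        counts[dimension][1] += 1
        if polarity == "pos":
            counts[dimension][0] += 1
    return {d: pos / total for d, (pos, total) in counts.items()}

# Hypothetical phrase-level predictions for one laptop's reviews.
reviews = [("weight", "pos"), ("weight", "neg"), ("design", "pos"),
           ("design", "pos"), ("delivery", "neg"), (None, "pos")]
print(dimension_scores(reviews))
# → {'weight': 0.5, 'design': 1.0, 'delivery': 0.0}
```

Computing the same dictionary for two products gives the side-by-side, per-dimension comparison the abstract motivates with the laptop example.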

The study on the structural relation among professors' core competency, college students' cognitive learning competency and life competencies (교수의 핵심역량과 대학생의 인지역량 및 생애역량의 구조적 관계 분석)

  • Kim, Dae-Myung
    • Journal of Digital Convergence, v.15 no.6, pp.97-105, 2017
  • This study examined the structural relations among professors' core competency, college students' cognitive learning competency, and life competencies in a sample of 500 college students from seven provincial areas. The statistical methods used were frequency analysis, descriptive statistics, exploratory factor analysis, reliability analysis, correlation analysis, confirmatory factor analysis, and structural equation modeling. The results are as follows. First, the core competency of lifelong learning educators, as recognized by college students, significantly affects the students' life competencies. Second, the educators' core competency significantly affects the students' cognitive learning competency. Third, the students' cognitive learning competency significantly affects their life competencies. Fourth, the students' cognitive learning competency has a significant mediating effect between the educators' core competency and the students' life competencies. In other words, the educators' core competency has a strong effect on life competencies through the students' cognitive learning competency.
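
The study fitted a structural equation model. As a much simpler stand-in, the mediation logic of the fourth finding can be sketched with ordinary least squares and the product-of-coefficients approach: `a` is the effect of the educators' core competency (X) on cognitive learning competency (M), `b` is the effect of M on life competencies (Y) controlling for X, and `a * b` is the indirect (mediated) effect. This is a hypothetical illustration on made-up scores: it has no latent variables and no significance test (a bootstrap of `a * b` would be needed for that), and the names `ols` and `mediation` are illustrative.

```python
def ols(y, *xs):
    """Least-squares coefficients [intercept, slope_1, ...] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in x] for x in xs]
    k = len(cols)
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols] for ci in cols]  # X'X
    c = [sum(ci[r] * y[r] for r in range(n)) for ci in cols]                   # X'y
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        c[i], c[p] = c[p], c[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [arj - f * aij for arj, aij in zip(a[r], a[i])]
                c[r] -= f * c[i]
    return [c[i] / a[i][i] for i in range(k)]

def mediation(x, m, y):
    """Product-of-coefficients mediation for the path X -> M -> Y."""
    a = ols(m, x)[1]             # path a: X -> M
    _, direct, b = ols(y, x, m)  # path c' (X -> Y given M) and path b (M -> Y given X)
    total = ols(y, x)[1]         # path c: X -> Y ignoring M
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}

# Hypothetical scores for six students.
x = [1, 2, 3, 4, 5, 6]              # perceived core competency of educators
m = [1.5, 1.8, 3.9, 3.7, 5.6, 6.4]  # cognitive learning competency
y = [2.0, 2.9, 4.8, 5.1, 7.3, 8.0]  # life competencies
print(mediation(x, m, y))
```

For linear OLS models fitted on the same data, the total effect splits exactly into the direct effect plus the indirect effect (c = c' + a·b), which is what "mediating effect" decomposes.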

Relationship between rainfall in Korea and Antarctic Oscillation in June (6월의 남극진동이 한국의 6월 강우량에 미치는 영향)

  • Choi, Ki Seon;Kim, Baek Jo;Lee, Jong Ho
    • Journal of the Korean earth science society, v.34 no.2, pp.136-147, 2013
  • This study examined the effect of the Antarctic Oscillation (AAO) in June on June rainfall in Korea using correlation analysis. The results show a strong positive correlation between the two variables. Specifically, June rainfall in Korea is influenced by the Mascarene High and the Australian High, which strengthen in the Southern Hemisphere in a typical positive AAO pattern. When these two anomalous pressure systems strengthen, the cold cross-equatorial flow from the region around Australia toward the equator intensifies, which in turn forces the western North Pacific subtropical high (WNPSH) to extend northward. This development eventually drives the rain belt northward. As a result, the Changma (summer rainy season) begins early in the positive AAO phase, and June rainfall in Korea increases. In addition, a WNPSH that extends further north increases the frequency with which tropical cyclones make landfall in or otherwise affect Korea, which plays an important role in increasing June rainfall.

Topic Modeling on Fine Dust Issues Using LDA Analysis (LDA 기법을 이용한 미세먼지 이슈의 토픽모델링 분석)

  • Yoon, soonuk;Kim, Minchul
    • Journal of Energy Engineering, v.29 no.2, pp.23-29, 2020
  • In this study, news data on fine dust from the past 10 years were collected, and 80 topics were extracted through LDA analysis. Weather-related information made up the main words of the topics, indicating that fine dust becomes a major issue when the temperature falls below 10 degrees Celsius. The frequency of media exposure is positively correlated with the maximum concentration of fine dust. The major topics were fine dust reduction measures and the government's comprehensive measures over the past decade, products related to fine dust such as air purifiers, policies protecting vulnerable people from fine dust, and fine dust reduction through R&D. Measures against fine dust as a social issue appear to be closely related to government policy.
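
The abstract does not say which LDA implementation was used. A minimal collapsed Gibbs sampler for LDA, in the style of Griffiths and Steyvers, shows what extracting topics through LDA computes: per-topic word distributions (`phi`) and per-document topic proportions (`theta`). The hyperparameters `alpha` and `beta`, the iteration count, the function name `lda_gibbs`, and the toy corpus are assumptions; a study with 80 topics over ten years of news would use an optimized library rather than this sketch.

```python
import random

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word ids in range(n_vocab).
    Returns (phi, theta): topic-word and document-topic probabilities."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                             # total words per topic
    z = []                                          # topic assigned to every token
    for d, doc in enumerate(docs):                  # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                         # remove the token's current assignment
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # unnormalized full conditional p(z = t | all other assignments)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k                         # add the token back under its new topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    phi = [[(nkw[k][w] + beta) / (nk[k] + n_vocab * beta) for w in range(n_vocab)]
           for k in topics]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    return phi, theta

# Hypothetical toy corpus: word ids index into vocab.
vocab = ["haze", "temperature", "wind", "policy", "reduction", "purifier"]
docs = [[0, 1, 2, 0, 1, 2, 1], [0, 2, 1, 0, 0], [3, 4, 5, 3, 4], [4, 3, 3, 5, 4, 5]]
phi, theta = lda_gibbs(docs, n_topics=2, n_vocab=len(vocab), iters=100)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
    print(k, [vocab[w] for w in top])
```

Reading off the highest-probability words of each row of `phi`, as the loop does, is how topic keywords such as the weather-related terms in the abstract are typically inspected.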

Analysis and the Standardization Plan of the Terms Used by Seafarers on Small Vessel (소형선박 종사자 사용용어 실태 분석 및 표준화 방안)

  • Kang, Suk-Young;Ryu, Won;Bae, Chang-Won;Kim, Jong-Kwan
    • Journal of the Korean Society of Marine Environment & Safety, v.25 no.7, pp.867-873, 2019
  • As of August 2019, there were 3,823 vessels under 30 tons that could be classified as small vessels; these account for 42.5% of the 9,001 registered vessels in Korea. Many small vessel seafarers use a large number of nonstandard terms derived from foreign languages, which causes problems such as onboard communication breakdowns, difficulties in communication during maritime license interviews, and difficulties in maritime education and training; this is leading to a decline in the job skills of small vessel seafarers. In this study, we therefore analyzed the terminology of small vessel seafarers in detail and proposed a standardization plan. In the terminology analysis, the preliminary terms of the maritime license interview and the high-frequency terms in small vessel educational textbooks were identified, and the corresponding nonstandard terms were examined. A survey and an expert meeting were conducted in which the incorrect Japanese notation, the English notation, and the standard Korean term for each key term were presented, to determine which form respondents found most familiar. Standard terms are used relatively often for nautical terms, whereas the incorrect Japanese notation predominates for engine terms. Analysis by age and tonnage likewise showed that Japanese notation is generally used, while English notation is used infrequently. Based on these findings, short- and long-term plans for the use of standard terms by small vessel seafarers were proposed, including a standard dictionary of the terms these seafarers use, promotion of the importance of using standard terms, active education through educational institutions, and the systematic preparation and implementation of Korean-language education for foreign sailors.

The influence with buddhist music appearing in PanYeombul out of Ogu exorcism of East coast - focused on the song by Kim Janggil - (동해안 오구굿 중 판염불에 나타난 불교음악의 영향 - 김장길의 소리를 중심으로 -)

  • Seo, Jeong-mae
    • (The) Research of the performance art and culture, no.34, pp.277-313, 2017
  • This study analyzes the rhythm of six pieces of PanYeombul sung by Kim Janggil in the Ogu exorcism of the East coast to identify their relationship with Buddhist music. The findings are summarized as follows. First, PanYeombul by Kim Janggil, performed on Oct. 16, 2016, was composed of , , , , , , , , , , and . Even when PanYeombul is performed by the same male shaman, components can be added or omitted depending on circumstances, which means the procedure is flexible. Since, compared with Kim Yongtaek's version, there is a common component of in addition to , the component of can be regarded as an important part of PanYeombul in the Ogu exorcism of the East coast. Second, is usually called 'SinmyojangguDaedalani' in Buddhist ritual. While Kim Yongtaek follows this practice in his title, Kim Janggil uses the title 'YeomhwajangguDaedalani', which distinguishes his song from others. Yeomhwa means "picking up flowers with the fingers," a term used in Buddhism rather than in everyday language. This suggests that by borrowing the term 'Daedalani' from a Buddhist chant while differentiating it from Buddhism, Kim Janggil makes an effort to set his performance apart from Buddhist rituals and give shaman rituals a unique meaning. Third, PanYeombul of the Ogu exorcism of the East coast may be divided into two main parts: the former is PanYeombul and the latter is Jiokga. In PanYeombul, the male shaman sits and sings alone while playing the Jing himself; in Jiokga, by contrast, he stands and sings solo with a gwaeggwari in his hand, accompanied by other musicians in the Samgongjaebi rhythm. Because the song and the accompaniment alternate like a duet, this part is the musical climax. PanYeombul can therefore be divided into PanYeombul and Jiokga, but since the whole is performed and sung solo by one male shaman, it is usually regarded as a single procedure.
Fourth, the Jing, an accompanying instrument in PanYeombul by Kim Janggil, serves to mark phrases and close musical paragraphs. When a Buddhist chant is performed with one note per syllable, the performer needs to catch his breath or clear his throat, and the Jing fills these intervals. Its role in marking phrases and closing paragraphs also helps deliver the words clearly. The Jing rhythm is mostly in small triple time, except for passages in equal small binary time; Sutsoe (♪♩) appears far more frequently than Amsoe (♩♪), and syncopation is common. By frequently using off-beats or short-long rhythms even in accompaniment in equal small binary time, he tries to vary the monotonous, even rhythm and add musical vitality. These patterns resemble the Sutsoe rhythm, which can evoke tension, and Kim Janggil makes them a characteristic of his rhythm. Fifth, all the pieces consist of mi, sol, la, do, and re, and the descending melody do'↘la↘sol↘mi appears most frequently. Descending melodies usually arouse a feeling of sorrow, so the sadness for the deceased is expressed aptly, which suggests his musical talent. Overall, the pieces take on Menari-tori, in which sol is held only briefly in the descending la↘sol↘mi spanning a perfect fourth. Sixth, even when he adopts the lines of a Buddhist chant, he changes them to some degree. For example, he inserted words such as 'Wonwangsaeng' and 'NamuAmitabul' between lines and added Korean words such as hapsosa to lines of the Buddhist service written in Chinese characters. He also inserted vocables such as 'iiiiiii~' to express sadness. These changes maximize the desire of the deceased to go to heaven while diminishing signs of Buddhism and strengthening the features of shamanism.
Seventh, the effort to reduce signs of Buddhism also appears in how he joins the lines of two songs. For example, between the last words 'Wonsuaenapsu' of Dage and the first words 'Jisimgwimyeongrye' of Chiljeongrye there is usually a short pause separating the paragraphs, but he continues from one song to the next without any pause to remove the feel of a Buddhist chant. In terms of melody, too, he distances himself from Buddhist chant, giving shaman rituals traits that differ from Buddhist ones even when he uses the lines of Buddhist rituals. Eighth, the analyzed pieces fall into four categories: no regular melody , , equal small binary time , eotmori melody in ten-eighth time with 3+2+3+2 mixed small time , and Samgongjaebi melody in 3+2+3 mixed small time . Each piece has its own melody. Although of Buddhist ritual is often performed, he evokes a shamanic feeling by using the eotmori melody, and his use of the Samgongjaebi melody is another example of giving the shaman rituals of the East coast a unique character.

Korean Learning Assistant System with Automatically Extracted Knowledge (자동 추출된 지식에 기반한 한국어 학습 지원 시스템)

  • Park, Gi-Tae;Lee, Tae-Hoon;Hwang, So-Hyun;Kim, Byeong Man;Lee, Hyun Ah;Shin, Yoon Sik
    • KIPS Transactions on Software and Data Engineering, v.1 no.2, pp.91-102, 2012
  • Computer-aided language learning has become popular, but the level of automation in constructing Korean learning assistant systems is not high, because a practical language learning system needs large-scale knowledge resources, which are very hard to acquire. In this paper, we propose a Korean learning assistant system that uses easily obtainable knowledge resources such as a corpus, web documents, and a lexicon. Our system has three modules: problem solving, a pronunciation marker, and a writing assistant. The automatic problem generator uses a corpus and a lexicon to create problems with one correct answer and three distracters, and then verifies their suitability using frequency information from web documents. We analyze pronunciation rules for the pronunciation marker and recommend appropriate words and sentences in real time using data extracted from a corpus. In the experiment, we evaluated 400 automatically generated problems, which showed 89.9% problem suitability and 64.9% example suitability.

Hazard assessment for the fishermen's safety in offshore large powered purse seiner using insurance proceeds payment of NFFC in 2013 (2013년 수협 재해 보험급여를 이용한 근해대형선망 어선원의 안전 위험 요소 평가)

  • Lee, Yoo-Won;Cho, Young-Bok;Kim, Sung-Ki;Kim, Seok-Jae;Park, Tae-Geun;Ryu, Kyong-Jin;Kim, Wook-Sung
    • Journal of the Korean Society of Fisheries and Ocean Technology, v.51 no.2, pp.188-194, 2015
  • The powered purse seine fishery is an important fishery, accounting for 19.4% of adjacent-water fishery production in Korea, and commercial fishing is associated with high rates of fatal and non-fatal occupational injury. A hazard analysis of fishermen's safety on offshore large powered purse seiners was conducted using the fishermen's occupational accident records of the National Federation of Fisheries Cooperatives (NFFC) for 2013 (n=583), to provide basic data for improving the health and safety of fishermen's working environment. The occupational accident rate in this fishery was 182.6‰, 30.9 times the rate for all industries. In addition, the death and missing rate was 25.1‰, 17.5 times the all-industry death rate, a very serious level that requires management. Between 72.3% and 85.8% of accidents occurred at sea. By accident type, 'others', slipping, and being struck by an object were the most frequent, in that order. However, the death and missing rate did not follow the frequency pattern of accident types: slipping occurred frequently but carried a low risk of death or disappearance, whereas contact with fishing gear and falls into the water were infrequent but carried a high risk of death or disappearance. The results are expected to contribute to identifying and assessing safety hazards on offshore large powered purse seiners.

Hazard Factors Assessment for the Fishermen's Safety on the Vessel of Offshore Stow Nets on Anchor using Insurance Proceeds Payment of NFFC (수협 재해 보험급여를 이용한 근해안강망 어선원의 안전 위험 요소 평가)

  • LEE, Yoo-Won;CHO, Young-Bok;KIM, Sung-Ki;KIM, Seok-Jae;PARK, Tae-Geun;RYU, Kyong-Jin;KIM, Wook-Sung
    • Journal of Fisheries and Marine Sciences Education, v.27 no.4, pp.1129-1135, 2015
  • The stow net is a stationary gear made from netting, usually shaped like a trawl net without wings. The nets are fixed by anchors and positioned according to the direction and strength of the current. Commercial fishing is associated with high rates of fatal and non-fatal occupational injury. A hazard factor analysis of fishermen's safety on offshore stow net vessels was conducted using the fishermen's occupational accident records of the National Federation of Fisheries Cooperatives (NFFC) from 2012 to 2014 (n=1,144), to provide basic data for improving the health and safety of fishermen's working environment. The average occupational accident rate in this fishery was 206.9‰, 36.9 times the rate for all industries. In addition, the average death and missing rate was 50.4‰, 42.0 times the all-industry death rate, a very serious level that requires management. Between 84.5% and 94.6% of accidents occurred at sea. By accident type, being struck by an object, slipping, contact with machinery, contact with objects or gear, and others were the most frequent, in that order. However, the death and missing rate did not follow the frequency pattern of accident types: slipping occurred frequently but carried a low risk of death or disappearance, whereas contact with fishing gear and falls into the water were infrequent but carried a high risk of death or disappearance. The results are expected to contribute to identifying and assessing safety hazards on offshore stow net vessels.
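
Both NFFC-based abstracts report a fishery rate together with a multiple of the all-industry rate, so the all-industry figure each comparison implies can be backed out by division. This is a minimal sketch: the abstracts do not report the worker populations used as denominators, so the hypothetical helper `per_mille` only illustrates the definition of a per-thousand rate, and `implied_baseline` returns values in whatever unit the abstracts use.

```python
def per_mille(cases, population):
    """Rate per 1,000 (‰): cases / population * 1000."""
    return cases / population * 1000

def implied_baseline(rate, times_baseline):
    """All-industry rate implied by 'rate is times_baseline times the all-industry rate'.
    The result is in the same unit as `rate`."""
    return rate / times_baseline

# Figures reported in the two NFFC-based abstracts.
reported = {
    "purse seine 2013, accidents": (182.6, 30.9),
    "purse seine 2013, death and missing": (25.1, 17.5),
    "stow net 2012-2014, accidents": (206.9, 36.9),
    "stow net 2012-2014, death and missing": (50.4, 42.0),
}
for label, (rate, ratio) in reported.items():
    print(f"{label}: implied all-industry rate {implied_baseline(rate, ratio):.2f}")
```

For these figures the implied all-industry rates are about 5.91 and 1.43 for the 2013 purse seine comparisons and about 5.61 and 1.20 for the 2012-2014 stow net averages.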