• Title/Summary/Keyword: Text mining analysis

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching (Doc2Vec 모형에 기반한 자기소개서 분류 모형 구축 및 실험)

  • Kim, Young Soo;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services / v.19 no.1 / pp.103-112 / 2020
  • Job seekers make various efforts to find a good company, and companies try to recruit good people. Submitting self-introduction essays is nowadays one of the most active parts of the job search. Companies spend time and money reviewing the numerous self-introduction essays submitted by job seekers, while job seekers worry about whether companies will accept their essays. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Real-world data were sampled with stratified sampling to alleviate the class imbalance between passed and failed essays. Documents were embedded with Doc2Vec, which extends Word2Vec, and classified using logistic regression. A decision tree model was chosen as the benchmark, and K-fold cross-validation was used for performance evaluation. Across several experiments, PV-DM achieved a higher area under the curve (AUC) than the other Doc2Vec models, PV-DBOW and Concatenate. Furthermore, PV-DM classifies both passed and failed essays, whereas PV-DBOW classifies failed essays well but cannot classify passed essays. In addition, the logistic regression model using PV-DM embeddings outperforms the decision tree-based classification model. These results imply that companies can reduce the cost of recruiting good job seekers, and that the proposed model can help job candidates pre-evaluate their self-introduction essays.
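
The evaluation protocol in this abstract (stratified sampling against class imbalance, K-fold cross-validation, and AUC) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the Doc2Vec embedding and logistic regression steps are omitted, the scores passed to `auc` stand in for classifier outputs, and the names `stratified_folds` and `auc` are hypothetical.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that each keep roughly the
    original pass/fail ratio (stratified K-fold)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so every fold gets its share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    passed essay (label 1) scores higher than a random failed one (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] * 4 + [0] * 16  # imbalanced: 4 passed, 16 failed
    for f in stratified_folds(labels, k=4):
        print(len(f), sum(labels[i] for i in f))  # each fold: 5 essays, 1 passed
    print(auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # 0.75
```

Dealing each class round-robin guarantees that the rare "passed" class appears in every fold, which keeps the per-fold AUC defined.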

Analysis on Status and Trends of SIAM Journal Papers using Text Mining (텍스트마이닝 기법을 활용한 미국산업응용수학 학회지의 연구 현황 및 동향 분석)

  • Kim, Sung-Yeun
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.212-222 / 2020
  • The purpose of this study is to understand the current status and trends of research published by the Society for Industrial and Applied Mathematics (SIAM), a world leader in the field of industrial mathematics. To this end, titles and abstracts of 6,255 research articles published between 2016 and 2019 were collected, and R was used to fit an LDA topic model and a regression model. The analyses show, first, that a wide variety of topics have been studied in industrial mathematics, such as algebra, discrete mathematics, geometry, topology, and probability and statistics. Second, the rising research topics were fluid mechanics, graph theory, and stochastic differential equations, while the declining topics were computational theory and classical geometry. Based on an understanding of the overall flow and changes in the intellectual structure of industrial mathematics, the results are expected to give researchers implications for future research directions and, in education, for building an industrial mathematics curriculum that reflects the zeitgeist of the field.
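
The abstract does not spell out the regression model. One common reading, assumed here, is to regress each topic's mean yearly LDA proportion on publication year and call topics with a positive slope ascending and a negative slope descending. The sketch below is written in Python rather than the R used in the study, uses invented proportions, and introduces the hypothetical helpers `ols_slope` and `classify_trends`; fitting the LDA model itself is not shown.

```python
from typing import Dict, List, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_trends(years: Sequence[int], topic_share: Dict[str, List[float]],
                    tol: float = 1e-9) -> Dict[str, str]:
    """Label each topic by the sign of the slope of its mean yearly
    document-topic proportion: ascending, descending, or flat."""
    labels = {}
    for topic, shares in topic_share.items():
        b = ols_slope(years, shares)
        labels[topic] = "ascending" if b > tol else "descending" if b < -tol else "flat"
    return labels


if __name__ == "__main__":
    years = [2016, 2017, 2018, 2019]
    # Hypothetical mean topic proportions per year (not the paper's figures).
    shares = {
        "topic_a": [0.05, 0.06, 0.07, 0.08],
        "topic_b": [0.08, 0.07, 0.06, 0.05],
        "topic_c": [0.10, 0.10, 0.10, 0.10],
    }
    print(classify_trends(years, shares))
```

The tolerance `tol` keeps floating-point noise from labelling a constant series as rising or falling.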

Comparison of Topic Modeling Methods for Analyzing Research Trends of Archives Management in Korea: focused on LDA and HDP (국내 기록관리학 연구동향 분석을 위한 토픽모델링 기법 비교 - LDA와 HDP를 중심으로 -)

  • Park, JunHyeong;Oh, Hyo-Jung
    • Journal of Korean Library and Information Science Society / v.48 no.4 / pp.235-258 / 2017
  • The purpose of this study is to analyze research trends in archives management in Korea by comparing LDA (Latent Dirichlet Allocation) topic modeling, the most widely known topic modeling method in text mining, with HDP (Hierarchical Dirichlet Process) topic modeling, which extends LDA. First, we collected 1,027 articles on archives management published from 1997 to 2016 in two Korean archives management journals and four Korean library and information science journals, and performed several preprocessing steps. We then applied LDA and HDP topic modeling. For a more in-depth comparison, we used LDAvis as a topic modeling visualization tool. As a result, LDA topics were dominated by frequent keywords that appeared across all topics, whereas HDP topics showed specific keywords that made the characteristics of each topic easy to identify.
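
LDAvis, the visualization tool used in this study, ranks a topic's terms by a relevance score that mixes in-topic probability with lift, controlled by a weight λ: relevance(w, k) = λ log p(w|k) + (1 - λ) log(p(w|k) / p(w)). A minimal Python sketch follows. The term probabilities are invented for illustration, and `relevance_ranking` is a hypothetical helper, not part of the LDAvis package. It shows how corpus-wide frequent keywords can dominate a topic's top terms at λ = 1 and give way to topic-specific terms as λ decreases.

```python
import math
from typing import Dict, List


def relevance_ranking(phi_k: Dict[str, float], p_w: Dict[str, float],
                      lam: float, top_n: int) -> List[str]:
    """Rank one topic's terms by LDAvis-style relevance:
        lam * log p(w|k) + (1 - lam) * log(p(w|k) / p(w))
    lam = 1 ranks by in-topic probability (frequent words dominate);
    lam = 0 ranks by lift (topic-specific words rise)."""
    scores = {
        w: lam * math.log(phi) + (1 - lam) * math.log(phi / p_w[w])
        for w, phi in phi_k.items()
        if phi > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Hypothetical topic-term probabilities p(w|k) and corpus-wide p(w).
    phi_k = {"records": 0.4, "appraisal": 0.3, "retention": 0.3}
    p_w = {"records": 0.2, "appraisal": 0.02, "retention": 0.05}
    print(relevance_ranking(phi_k, p_w, lam=1.0, top_n=1))  # ['records']
    print(relevance_ranking(phi_k, p_w, lam=0.0, top_n=3))  # ['appraisal', 'retention', 'records']
```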

Trend Analysis of Apartments Demand based on Big Data (빅데이터 기반의 아파트 수요 트렌드 분석에 관한 연구)

  • Kim, Tae-Kyeong;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management / v.18 no.6 / pp.13-25 / 2017
  • Apartments are a major type of residence, and their number has continuously increased. Apartments carry multiple meanings: for the public they serve not only as residences but also as investments, for construction firms they are a major commodity, and for the government they are a critical instrument of public welfare policy. Therefore, it is critical to understand and analyze trends in apartment demand in order to act proactively. The objective of this study is to analyze and identify key trends in apartment demand based on big data drawn from articles in major daily newspapers. The study identifies 17 major trends across seven themes: development, trade, sale in lots, location requirements, policy, residential environment, and investment and profit. The research methods of this study can be applied to further studies of various issues related to the construction industry.

Privacy Policy Analysis Techniques Using Deep Learning (딥러닝을 활용한 개인정보 처리방침 분석 기법 연구)

  • Jo, Yong-Hyun;Cha, Young-Kyun
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.2 / pp.305-312 / 2020
  • The Privacy Act stipulates that the privacy policy, a statement of how personal information is processed, must be disclosed to guarantee the rights of information subjects, and the Fair Trade Commission treats the privacy policy as terms and conditions and reviews it for unfair clauses under the Terms and Conditions Control Act. However, information subjects tend not to read privacy policies because they are complicated and difficult to understand. Simple, legible privacy policies would increase the probability of participating in online transactions, contributing to higher corporate sales and resolving the information asymmetry between operators and information subjects. In this study, complex privacy policies are analyzed using deep learning, and a model is presented for obtaining simplified privacy policies that information subjects can read easily. To build the model, the privacy policies of 258 domestic companies were compiled into a dataset and analyzed using deep learning.

Development of Filtering System ADDAVICHI for Fake Reviews using Big Data Analysis (빅데이터 분석을 활용한 가짜 리뷰 필터링 시스템 ADDAVICHI)

  • Jeong, Davichi;Rho, Young-J.
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.6 / pp.1-8 / 2019
  • Recently, consumer distrust has deepened because 'viral marketing' has filled blogs with posts that exist only for promotion. Marketing programs such as fake or exaggerated reviews were among the most popular in 2016 because they are cheaper and more effective than newspaper and TV ads, and, with advertising spending of 3 trillion 394.1 billion won, they have become a major means of advertising. This 'viral marketing' has created an Internet environment that needs tools to filter information. The fake review filtering application ADDAVICHI presented in this paper extracts, analyzes, and presents blog keywords, the total number of searches, reliability, and satisfaction when users search for content such as "events" and "good restaurants." Reliability shows the number of advertising posts and the total number of posts on a blog, and satisfaction shows the trustworthy (non-advertising) posts divided into positive and negative ones. Finally, the keyword view lists the top three words in positive reviews. In this way, the system helps users interpret information free of advertising.
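
The abstract describes ADDAVICHI's outputs but not how they are computed. The Python sketch below assumes each post already carries an advertising flag and a sentiment label (both would come from models not shown), then derives the ad and total post counts, a reliability ratio, a satisfaction ratio, and the top three keywords of positive non-ad posts. The ratio definitions, the dictionary keys, and `summarize_posts` are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List


def summarize_posts(posts: List[Dict]) -> Dict:
    """Summarize search-result blog posts in the spirit of ADDAVICHI.

    Each post is a dict with hypothetical keys:
      'is_ad'    -> True if an (unshown) ad detector flagged it,
      'positive' -> True if an (unshown) sentiment model judged it positive,
      'text'     -> the post body.
    """
    total = len(posts)
    clean = [p for p in posts if not p["is_ad"]]  # non-advertising posts
    positive = [p for p in clean if p["positive"]]
    words = Counter(w for p in positive for w in p["text"].lower().split())
    return {
        "ad_posts": total - len(clean),
        "total_posts": total,
        # Assumed definitions: share of non-ad posts, and share of clean posts that are positive.
        "reliability": len(clean) / total if total else 0.0,
        "satisfaction": len(positive) / len(clean) if clean else 0.0,
        "top_keywords": [w for w, _ in words.most_common(3)],
    }


if __name__ == "__main__":
    posts = [
        {"is_ad": True, "positive": True, "text": "best best deal"},
        {"is_ad": False, "positive": True, "text": "tasty noodles tasty broth"},
        {"is_ad": False, "positive": True, "text": "tasty broth"},
        {"is_ad": False, "positive": False, "text": "slow service"},
    ]
    print(summarize_posts(posts)["top_keywords"])  # ['tasty', 'broth', 'noodles']
```

Filtering ads before counting keywords is the design point: the promotional post's repeated "best" never reaches the keyword list.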

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications that deal with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, a metaheuristic algorithm. There are two approaches to finding feature subsets: the filter method and the wrapper method. In this study, we use the wrapper method, which evaluates feature subsets with a real classifier, to find an optimal feature subset. The training dataset used in the experiment has severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
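
A wrapper-style genetic algorithm for feature selection can be sketched as follows. Each individual is a binary mask over features, and its fitness is the accuracy of an actual classifier that sees only the selected features. The following are assumptions, not taken from the paper: a leave-one-out 1-nearest-neighbour classifier as the wrapped model, truncation selection, one-point crossover, bit-flip mutation, and the hypothetical names `loo_1nn_accuracy` and `ga_select`. The SMOTE oversampling step is omitted.

```python
import random
from typing import List, Sequence, Tuple

Mask = List[int]


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask) -> float:
    """Wrapper fitness: leave-one-out accuracy of a 1-nearest-neighbour
    classifier that only sees the features switched on in `mask`."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in feats)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


def ga_select(X, y, pop_size: int = 12, generations: int = 20,
              p_mut: float = 0.1, seed: int = 0) -> Tuple[Mask, float]:
    """Genetic search over binary feature masks (1 = keep the feature)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(m: Mask) -> float:
        return loo_1nn_accuracy(X, y, m)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes; features 1 and 2 are noise.
    X = [[0.0, 5.0, 1.0], [0.5, 0.0, 9.0], [1.0, 9.0, 4.0],
         [10.0, 5.5, 1.2], [10.5, 0.3, 8.8], [11.0, 8.7, 4.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(ga_select(X, y))
```

Because the elite half is carried over unchanged, the best fitness found never decreases from one generation to the next.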

Developing an Intelligent System for the Analysis of Signs Of Disaster (인적재난사고사례기반의 새로운 재난전조정보 등급판정 연구)

  • Lee, Young Jai
    • Journal of Korean Society of societal Security / v.4 no.2 / pp.29-40 / 2011
  • The objective of this paper is to develop an intelligent decision support system that advises on disaster countermeasures and the severity of incidents based on collected and analyzed signs of disasters. Concepts from ontology, text mining, and case-based reasoning are adapted to design the system. Its functions include a term-document matrix, frequency normalization, confidence, association rules, and criteria for judgment. Qualitative data collected from the signs of new incidents are processed by these functions and finally compared with, and reasoned against, similar past disaster cases. The system tells disaster managers how dangerous the signs of a new disaster are and suggests a few countermeasures. It will help decision-makers judge how dangerous the signs of a disaster are and carry out specific countermeasures in advance. As a result, disasters can be prevented.
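
The text-mining functions listed here (term-document matrix, frequency normalization, and the confidence of association rules) can be illustrated with a short Python sketch. The sample "sign" texts and all function names are hypothetical, and the paper's case-based reasoning and grading criteria are not reproduced.

```python
from collections import Counter
from typing import Dict, List, Set


def term_document_matrix(docs: List[str]) -> List[Counter]:
    """One Counter of raw term counts per document (a sparse term-document matrix)."""
    return [Counter(d.lower().split()) for d in docs]


def normalize(tdm: List[Counter]) -> List[Dict[str, float]]:
    """Frequency normalization: divide each count by the document's total term count."""
    out = []
    for row in tdm:
        total = sum(row.values())
        out.append({t: c / total for t, c in row.items()} if total else {})
    return out


def support(tdm: List[Counter], terms: Set[str]) -> float:
    """Fraction of documents that contain every term in `terms`."""
    hits = sum(1 for row in tdm if all(row[t] > 0 for t in terms))
    return hits / len(tdm)


def confidence(tdm: List[Counter], antecedent: Set[str], consequent: Set[str]) -> float:
    """conf(A -> B) = support(A and B together) / support(A)."""
    s_a = support(tdm, antecedent)
    return support(tdm, antecedent | consequent) / s_a if s_a else 0.0


if __name__ == "__main__":
    # Hypothetical short reports of disaster signs.
    docs = ["gas leak smell", "gas leak fire", "gas smell", "crack wall"]
    tdm = term_document_matrix(docs)
    print(confidence(tdm, {"leak"}, {"gas"}))            # 1.0
    print(round(confidence(tdm, {"gas"}, {"leak"}), 3))  # 0.667
```

Confidence is asymmetric: every report mentioning a leak also mentions gas, but only two of the three gas reports mention a leak.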

Prediction of Highly Pathogenic Avian Influenza (HPAI) Diffusion Path Using LSTM (LSTM을 활용한 고위험성 조류인플루엔자(HPAI) 확산 경로 예측)

  • Choi, Dae-Woo;Lee, Won-Been;Song, Yu-Han;Kang, Tae-Hun;Han, Ye-Ji
    • The Journal of Bigdata / v.5 no.1 / pp.1-9 / 2020
  • This study was conducted in 2018 with government funding (Ministry of Agriculture, Food and Rural Affairs) and support from the Agricultural, Food, and Rural Affairs Agency (318069-03-HD040), as part of research on artificial intelligence-based HPAI spread analysis and patterning. A model recently used actively in time-series analysis and text mining is LSTM (Long Short-Term Memory), which uses a deep learning model structure. LSTM emerged to resolve the long-term dependency problem that arises during the backpropagation through time (BPTT) process of RNNs. LSTM models forecast well from variable-length sequence data and are still widely used. In this study, we used Call Detail Record (CDR) data provided by KT to identify the movement paths of people expected to be closely related to the virus, and we present the results of predicting movement paths by training an LSTM model on those people's paths. The results could be used to predict the route of HPAI propagation, to select routes or areas on which to focus quarantine, and to reduce the spread of HPAI.
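
To make the long-term dependency point concrete, here is the forward pass of a scalar (hidden size 1) LSTM cell in Python. It is a textbook sketch, not the model trained on KT's CDR data: the weights are hand-picked for illustration, training by BPTT is not shown, and `lstm_step` and `run` are hypothetical names. Because the cell state is updated additively (c = f * c_prev + i * g), a forget gate close to 1 lets information from an early step persist across many later steps, which is what eases the vanishing-gradient problem that plain RNNs face during BPTT.

```python
import math
from typing import Dict, Sequence, Tuple

# Each gate maps to (input weight, recurrent weight, bias).
Weights = Dict[str, Tuple[float, float, float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, W: Weights) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (h, c)."""
    def pre(gate: str) -> float:
        wx, wh, b = W[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new content to write
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))  # candidate cell content
    c = f * c_prev + i * g   # additive cell-state update
    h = o * math.tanh(c)
    return h, c


def run(xs: Sequence[float], W: Weights, h: float = 0.0, c: float = 0.0) -> Tuple[float, float]:
    """Feed a whole sequence through the cell and return the final (h, c)."""
    for x in xs:
        h, c = lstm_step(x, h, c, W)
    return h, c


if __name__ == "__main__":
    # Hand-picked weights: write on x = 1, keep the cell state (forget bias +10).
    W_keep = {"i": (10.0, 0.0, -5.0), "f": (0.0, 0.0, 10.0),
              "o": (0.0, 0.0, 10.0), "g": (5.0, 0.0, 0.0)}
    W_forget = dict(W_keep, f=(0.0, 0.0, -10.0))  # forget gate nearly closed
    seq = [1.0] + [0.0] * 50  # a signal at step 0, then 50 empty steps
    print(run(seq, W_keep)[1])    # cell state still close to 1
    print(run(seq, W_forget)[1])  # cell state has decayed to almost 0
```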

A Literature Review on the Recent Tendency of the Treatment about Atypical Hyperplasia of Breast on the Chinese Herbal Medicine (비정형유방증식에 대한 최근 중의 약물치료 동향에 대한 문헌연구)

  • Kim, Jun-Hee;Lee, In-Seon
    • The Journal of Korean Obstetrics and Gynecology / v.33 no.1 / pp.36-58 / 2020
  • Objectives: We conducted a literature study on treatment trends in China to explore the possibility of Oriental medicine treatment for atypical hyperplasia of the breast (AHB). Methods: RCTs (randomized controlled trials) on AHB were collected from CNKI (China National Knowledge Infrastructure). The search words were "乳腺增生" (mammary hyperplasia), "乳腺囊性增生" (cystic hyperplasia of the breast), "乳癖" (breast lump), "中医" (traditional Chinese medicine), "中药" (Chinese herbal medicine), and "中西医结合" (integrated Chinese and Western medicine). The search period was limited to July 2006 through May 2017. Finally, we selected 107 RCTs, clinical studies that evaluated the effectiveness of Chinese herbal medicine in comparison with Western medicine. After reviewing them, we investigated Chinese herbal medication guides, Chinese treatment methods, and prescriptions. We also investigated the correlation between treatment methods and medicinal herbs so that it can be useful in clinical practice. Results: 1. Herbal medicine was administered according to the menstrual cycle in 63 cases (58.9 percent) and regardless of the menstrual cycle in 44 cases (41.1 percent). 2. In the basic frequency analysis of treatment methods and medicinal herbs, dissipating binds (散結) was the most frequent treatment method. Other frequent treatment methods included activating blood (活血), relieving pain (止痛), soothing the liver (疏肝), regulating qi (理氣), resolving phlegm (化痰), softening hardness (軟堅), resolving depression (解鬱), and moving qi (行氣). Among the herbs, bupleuri radix (柴胡), cyperi rhizoma (香附子), angelicae gigantis radix (當歸), fritillaria thunbergii bulb (貝母), paeoniae radix alba (白芍藥), prunellae spica (夏枯草), and corydalis rhizoma (玄胡索) appeared most frequently. 3. We identified correlations between the frequent treatment methods and the medicinal herbs using text mining. Conclusions: These findings are expected to help implement Korean traditional medicine treatments for AHB.
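
The abstract does not name the association measure used to relate treatment methods and herbs. The Python sketch below assumes each trial is reduced to a set of treatment methods and a set of herbs, counts (method, herb) co-occurrences, and computes lift as one possible association score. The trial records, the choice of lift, and the helper names `cooccurrence` and `lift` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

# (treatment methods, herbs) extracted from one trial; hypothetical structure.
Record = Tuple[Set[str], Set[str]]


def cooccurrence(records: List[Record]) -> Counter:
    """Count how many trials use each (method, herb) pair together."""
    pairs: Counter = Counter()
    for methods, herbs in records:
        pairs.update(product(sorted(methods), sorted(herbs)))
    return pairs


def lift(records: List[Record], method: str, herb: str) -> float:
    """P(method and herb) / (P(method) * P(herb)); above 1 means the pair
    co-occurs more often than independence would predict."""
    n = len(records)
    p_m = sum(method in ms for ms, _ in records) / n
    p_h = sum(herb in hs for _, hs in records) / n
    p_both = sum(method in ms and herb in hs for ms, hs in records) / n
    return p_both / (p_m * p_h) if p_m and p_h else 0.0


if __name__ == "__main__":
    # Invented records for illustration, not data from the 107 RCTs.
    records = [
        ({"散結", "疏肝"}, {"柴胡", "夏枯草"}),
        ({"散結"}, {"夏枯草"}),
        ({"疏肝"}, {"柴胡", "香附子"}),
        ({"活血"}, {"當歸"}),
    ]
    print(cooccurrence(records).most_common(2))
    print(lift(records, "散結", "夏枯草"))  # 2.0
```

Raw co-occurrence counts favour herbs that appear everywhere; lift normalizes by each item's own frequency, so it highlights pairs that are specifically linked.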