• Title/Summary/Keyword: TextMining

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems / v.20 no.2 / pp.215-225 / 2024
  • With the rapid development of the Internet of Things (IoT) and big data technology, related industries generate large amounts of data during operation. Classifying this data accurately has become central to research on data mining and processing in the IoT industry chain. This study constructs a classification model for the IoT industry chain based on an improved random forest algorithm and text analysis, aiming to classify IoT industry chain big data efficiently and accurately by improving on traditional algorithms. The accuracy, precision, recall, and AUC values of the traditional random forest algorithm and the proposed algorithm are compared on different datasets. The experimental results show that the proposed model performs better across datasets: its accuracy and recall on four datasets are better than those of the traditional algorithm, and its accuracy on two datasets, P-I Diabetes and Loan Default, is better than that of the random forest model, yielding better final classification results. With this model, the massive data generated in the IoT industry chain can be classified accurately, providing more research value for data mining and processing technology in the IoT industry chain.
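
The four measures compared in the abstract above (accuracy, precision, recall, AUC) can be illustrated with a minimal sketch for binary labels. This is not the paper's code: the function names and the sample labels and scores are assumptions made for illustration, and AUC is computed here through its pairwise-ranking interpretation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy_precision_recall(y_true, y_pred):
    """Compute accuracy, precision, and recall from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, prec, rec = accuracy_precision_recall(y_true, y_pred)
print(acc, prec, rec, auc(y_true, scores))
```

Running the same functions on the outputs of two classifiers over one dataset gives the kind of side-by-side comparison the abstract describes.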

A Child Emotion Analysis System using Text Mining and Method for Constructing a Children's Emotion Dictionary (텍스트마이닝 기반 아동 감정 분석 시스템 및 아동용 감정 사전 구축 방안)

  • Young-Jun Park;Sun-Young Kim;Yo-Han Kim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.3 / pp.545-550 / 2024
  • In a society undergoing rapid change, modern individuals are facing various stresses, and there's a noticeable increase in mental health treatments for children as well. For the psychological well-being of children, it's crucial to swiftly discern their emotional states. However, this proves challenging as young children often articulate their emotions using limited vocabulary. This paper aims to categorize children's psychological states into four emotions: depression, anxiety, loneliness, and aggression. We propose a method for constructing an emotion dictionary tailored for children based on assessments from child psychology experts.

Voice Phishing Scammers' Psychological Manipulation and Consumer Protection Measures (보이스피싱 심리조작 수법과 소비자 보호 방안: 텍스트 마이닝 기법을 중심으로)

  • Chihun Han;Beomsoo Kim;Jaeyoung Park
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.5 / pp.1089-1100 / 2024
  • Despite various measures implemented by the government and related institutions to prevent voice phishing, incidents of such fraud continue to occur. This study analyzed 448 actual conversations between voice phishing scammers and potential victims using text mining techniques. The text analysis reveals that voice phishing scammers frequently use words emphasizing limited time frames, such as "now," "soon," "in progress," "today," and "first." This indicates that scammers manipulate the victim's psychology through specific words, preventing the victim from making rational decisions. The results of this study can aid the government and related institutions in formulating effective policies for preventing voice phishing and protecting consumers.
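
As a rough illustration of the frequency analysis described above, the sketch below counts the time-pressure terms named in the abstract across a set of transcripts. The term list comes from the abstract; the function name and the two sample transcripts are invented for illustration and are not from the study's data.

```python
import re
from collections import Counter

# Time-pressure terms listed in the abstract.
URGENCY_TERMS = ["now", "soon", "in progress", "today", "first"]


def count_urgency_terms(transcripts, terms=URGENCY_TERMS):
    """Count whole-word occurrences of each term across all transcripts."""
    counts = Counter({term: 0 for term in terms})
    for text in transcripts:
        lowered = text.lower()
        for term in terms:
            # \b keeps "now" from matching inside words such as "know".
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[term] += len(re.findall(pattern, lowered))
    return counts


# Hypothetical transcripts, for illustration only.
sample_transcripts = [
    "You must transfer the money now. The investigation is in progress.",
    "Call back today, now, before it is too late.",
]

urgency_counts = count_urgency_terms(sample_transcripts)
for term, n in urgency_counts.most_common():
    print(term, n)
```

Matching on word boundaries also lets the multi-word term "in progress" be counted without a separate tokenizer.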

Data Mining Research on Maehwado Painting Poetry in the Early Joseon Dynasty

  • Haeyoung Park;Younghoon An
    • Journal of Information Processing Systems / v.19 no.4 / pp.474-482 / 2023
  • Data mining is a technique for extracting valuable information from vast amounts of data by analyzing statistical and mathematical operations, rules, and relationships. In this study, we employed data mining technology to analyze data on the painting poetry of Maehwado (plum blossom paintings) from the early Joseon Dynasty. The data was extracted from the Hanguk Munjip Chonggan (Korean Literary Collections in Classical Chinese) in the Hanguk Gojeon Jonghap database (Korea Classics DB). Using computer information processing techniques, we carried out web scraping and classification of the painting poetry from the Hanguk Munjip Chonggan. Subsequently, we narrowed our focus to the painting poetry specifically related to Maehwado in the early Joseon Dynasty. Based on this refined dataset, we conducted an in-depth analysis and interpretation of the text data at the syllable corpus level. As a result, we found a direct correlation between the corpus statistics for each syllable in Maehwado painting poetry and the symbolic meaning of plum blossoms.

The Analysis of User Perception and Attitude Using SNS Data about Emergency Contraceptive Pills

  • Lee, Sung Hyun
    • Journal of Internet Computing and Services / v.18 no.1 / pp.143-152 / 2017
  • To ensure women's right to self-determination, most countries allow women to buy post-coital contraceptive pills or general medical supplies with ease. This study analyzes how ordinary people perceive and respond to post-coital contraceptive pills by collecting unstructured data with the keyword 'Contraception', rather than relying on conventional fact-finding surveys such as questionnaires and interviews. The results may serve as a reference for establishing policies.

Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of Genomics & Informatics

  • Park, Hyun-Seok
    • Genomics & Informatics / v.16 no.4 / pp.40.1-40.3 / 2018
  • There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation.

A Study on the Research Trends in the Area of Geospatial-Information Using Text-mining Technique Focused on National R&D Reports and Theses (텍스트마이닝 기술을 이용한 공간정보 분야의 연구 동향에 관한 고찰 -국가연구개발사업 보고서 및 논문을 중심으로-)

  • Lim, Si Yeong;Yi, Mi Sook;Jin, Gi Ho;Shin, Dong Bin
    • Spatial Information Research / v.22 no.4 / pp.11-20 / 2014
  • This study aims to provide information about research trends in the area of geospatial information using text-mining methods. We collected national R&D reports and papers from the NDSL (National Discovery for Science Leaders) site, then preprocessed their keywords and classified them into separate sectors. We investigated the appearance rates of keywords in R&D reports and papers and how those rates changed. As a result, we confirmed that research concerning applications is increasing, while research dealing with systems is decreasing. In particular, among the keywords, '3D-GIS', 'sensor', and 'service' except ITS are emerging. These findings could help in identifying research items later.
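
The keyword appearance-rate analysis described above can be sketched as follows, assuming each document is reduced to a publication year and a list of preprocessed keywords. The function names, the data layout, and the sample records are assumptions for illustration, not the study's actual data or procedure.

```python
from collections import defaultdict


def appearance_rates(documents):
    """Return {keyword: {year: share of that year's documents with the keyword}}.

    `documents` is an iterable of (year, keywords) pairs.
    """
    docs_per_year = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for year, keywords in documents:
        docs_per_year[year] += 1
        for kw in set(keywords):  # count each keyword once per document
            hits[kw][year] += 1
    return {
        kw: {year: hits[kw].get(year, 0) / n
             for year, n in sorted(docs_per_year.items())}
        for kw in hits
    }


def trend(rates_by_year):
    """Change in appearance rate between the earliest and latest year."""
    years = sorted(rates_by_year)
    return rates_by_year[years[-1]] - rates_by_year[years[0]]


# Hypothetical records, for illustration only.
sample_docs = [
    (2010, ["system", "gis"]),
    (2010, ["system"]),
    (2013, ["service", "sensor"]),
    (2013, ["system", "service"]),
]

rates = appearance_rates(sample_docs)
for kw in sorted(rates):
    print(kw, rates[kw], trend(rates[kw]))
```

A positive trend marks a keyword that appears in a growing share of documents, and a negative trend marks a declining one.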

Establishment of ITS Policy Issues Investigation Method in the Road Section applied Textmining (텍스트마이닝을 활용한 도로분야 ITS 정책이슈 탐색기법 정립)

  • Oh, Chang-Seok;Lee, Yong-taeck;Ko, Minsu
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.6 / pp.10-23 / 2016
  • Given the need for careful examination using big data, this study attempts to develop and apply a search method for audit issues relating to ITS policies or programs. To this end, the auditing process of the Board of Audit and Inspection was combined with the theoretical framework of boundary analysis proposed by William Dunn as an analysis tool for audit issues. Moreover, we apply the text mining technique to computerize the analysis tool, which is similar to boundary analysis in its concept of approaching meta-problems. As the specific model for the text mining analysis, we applied the antisymmetry-symmetry compound lexeme-based LDA model, based on the Latent Dirichlet Allocation (LDA) methodology proposed by David Blei. Several prime issues were found through a case analysis: a lack of traffic information collection by the urban traffic information system operated by the National Police Agency; overlapping problems between the Ministry of Land, Infrastructure and Transport and the Advanced Traffic Management System; and fabrication of mileage on digital tachographs.

Analysis of the Unstructured Traffic Report from Traffic Broadcasting Network by Adapting the Text Mining Methodology (텍스트 마이닝을 적용한 한국교통방송제보 비정형데이터의 분석)

  • Roh, You Jin;Bae, Sang Hoon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.3 / pp.87-97 / 2018
  • The traffic accident reports generated by the Traffic Broadcasting Network (TBN) are unstructured data. They nevertheless have value as a kind of real-time traffic information, recorded from the viewpoint of the drivers and/or pedestrians who were on the road at that time and place rather than from that of the offenders or victims involved in the accidents. However, these reports, which constitute big data, have not commonly been applied to traffic accident analysis and traffic-related research. By adopting a text-mining technique, this study provides a clue for using them to assess the impacts of traffic accidents. Seven years of traffic reports were analyzed. From the reports, it was possible to identify road names, accident spot names, and times, and to identify the factors that most strongly affect other drivers when a traffic accident occurs. The authors plan to combine unstructured accident data with traffic reports in further study.