• Title/Summary/Keyword: Preprocessed data

Identifying Interdisciplinary Trends of Humanities, Sociology, Science and Technology Research in Korea Using Topic Modeling and Network Analysis (인문사회 과학기술 분야 연구의 학제적 동향 분석 : 토픽 모델링과 네트워크 분석의 활용)

  • Choi, Jaewoong;Jang, Jaehyuk;Kim, Dae Hwan;Yoon, Janghyeok
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.1 / pp.74-86 / 2019
  • As many existing research fields have matured academically, researchers have encountered a number of academic, social, and other problems that cannot be addressed by the internal knowledge and methodologies of existing disciplines. Pioneering researchers are therefore following a new paradigm that breaks the boundaries between prior disciplines, fuses them, and seeks new approaches. Moreover, developed countries, including Korea, are actively supporting and fostering convergence research at the national level. Nevertheless, there is insufficient research analyzing convergence trends in national R&D support projects and the content those projects mainly deal with. This study therefore collected and preprocessed research proposal data from the National Research Foundation of Korea, transforming the proposal documents into term-frequency matrices. Based on the matrices, this study derived detailed research topics through Latent Dirichlet Allocation, a topic modeling algorithm. Next, this study identified the research topics each proposal mainly deals with, visualized the convergence relationships, and analyzed them quantitatively. Specifically, in addition to visualizing the convergence relationships and analyzing the time-varying number of research proposals per topic, this study analyzed the centralities of the detailed research topics to derive clues about convergence in the near future. The results of this study can provide researchers with specific insights on research directions and can be used to monitor domestic convergence R&D trends by year.
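
The two data structures named in this abstract, a term-frequency matrix and a topic network with centralities, can be sketched as follows. This is only an illustration under stated assumptions: the toy proposals, the topic labels, and the helper names `term_frequency_matrix` and `degree_centrality` are hypothetical, whitespace tokenization stands in for the unspecified preprocessing, and the LDA step itself is omitted (the per-proposal topic sets are assumed to come from it). Degree centrality is used as one possible centrality measure; the abstract does not say which ones the authors computed.

```python
from collections import Counter
from itertools import combinations


def term_frequency_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def degree_centrality(topic_sets):
    """Degree centrality in a topic co-occurrence network.

    Two topics are linked when at least one proposal covers both.
    """
    nodes = sorted({t for s in topic_sets for t in s})
    neighbors = {t: set() for t in nodes}
    for s in topic_sets:
        for a, b in combinations(sorted(s), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    denom = max(len(nodes) - 1, 1)
    return {t: len(neighbors[t]) / denom for t in nodes}


# Toy proposals (hypothetical text, not National Research Foundation data).
docs = ["robot ethics law", "robot sensor control", "ethics law policy"]
vocab, tf = term_frequency_matrix(docs)

# Hypothetical main topics per proposal, as a topic model might assign them.
topic_sets = [{"AI", "law"}, {"AI", "robotics"}, {"law"}]
centrality = degree_centrality(topic_sets)
print(vocab)
print(tf)
print(centrality)
```

In this toy network the topic that co-occurs with every other topic receives the highest centrality, which is the kind of signal the abstract describes as a clue about future convergence.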

Hyperspectral imaging technique to evaluate the firmness and the sweetness index of tomatoes

  • Rahman, Anisur;Park, Eunsoo;Bae, Hyungjin;Cho, Byoung-Kwan
    • Korean Journal of Agricultural Science / v.45 no.4 / pp.823-837 / 2018
  • The objective of this study was to evaluate the firmness and the sweetness index (SI) of tomatoes with a hyperspectral imaging (HSI) technique within the wavelength range of 1000 - 1550 nm. Hyperspectral images of 95 tomatoes were acquired with a push-broom hyperspectral reflectance imaging system, and the mean spectrum of each tomato was extracted from its region of interest. The reference firmness and SI of the same samples were measured and calibrated against their corresponding spectral data by partial least squares (PLS) regression with different preprocessing methods. The calibration model developed by PLS regression on the Savitzky-Golay second-derivative preprocessed spectra performed better for both the firmness and the SI of the tomatoes than models developed with other preprocessing methods. The correlation coefficients of prediction ($R_{pred}$) were 0.82 and 0.74, with standard errors of prediction of 0.86 N and 0.63, respectively. Then, the feature wavelengths were identified using a model-based variable selection method, i.e., variable importance in projection, from the PLS regression analyses. Finally, chemical images were derived by applying the respective regression coefficients to the spectral image in a pixel-wise manner. The resulting chemical images provided detailed information on the firmness and the SI of the tomatoes. The results show that the proposed HSI technique has potential for rapid and non-destructive evaluation of the firmness and the sweetness index of tomatoes.
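
The Savitzky-Golay second-derivative preprocessing mentioned above can be illustrated with a minimal sketch. The abstract does not report the window length or polynomial order, so the 5-point quadratic filter here is an assumption, and the function name `savgol_second_derivative` is hypothetical. For that window, the standard second-derivative convolution coefficients are (2, -1, -2, -1, 2) / 7, divided by the squared sample spacing.

```python
def savgol_second_derivative(spectrum, spacing=1.0):
    """Second derivative via a 5-point quadratic Savitzky-Golay filter.

    Points without a full window (two at each end) are dropped, so the
    result is four samples shorter than the input.
    """
    coeffs = (2.0, -1.0, -2.0, -1.0, 2.0)  # to be divided by 7 * spacing**2
    result = []
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        acc = sum(c * y for c, y in zip(coeffs, window))
        result.append(acc / (7.0 * spacing ** 2))
    return result


# Sanity check on a synthetic signal: y = x^2 has second derivative 2.
signal = [float(x * x) for x in range(8)]
deriv = savgol_second_derivative(signal)
print(deriv)
```

A second derivative removes constant and linear baseline offsets from a spectrum, which is one common reason for choosing it before PLS regression.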

Analysis of Text Mining of Consumer's Personality Implication Words in Review of Used Transaction Application (중고거래 어플리케이션 <당근마켓> 리뷰텍스트에 나타난 소비자의 인성 함축단어 텍스트마이닝 분석)

  • Jung, Yea-Rin;Ju, Young-Ae
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.1-10 / 2021
  • This study analyzes the use and meaning of words implying consumer personality in the review text of the used-goods trading application 당근마켓 (Danggeun Market). As of May 2021, data for the preceding six months were collected by our web crawler for Seoul and Gyeonggi Province; a total of 1,368 cases were first collected by random sampling, and 570 cases were finally preprocessed. The results are as follows. First, 48.2% of the review texts were related to the personality of consumers, even though the platform is a commercial one for trading products. Second, the review text is mainly positive and formed a text network structure centered on the keyword 'gratitude'. Third, the review text implying consumer character was divided into two groups: the 'extrovert personality' and the 'introvert personality' of consumers. The individual characteristics of the two groups worked together on the platform. In conclusion, we suggest that consumer personality plays an important role in the platform transaction process, that it will play a role in the platform's services in the future, and that it should be studied from various perspectives.

A Detecting Technique for the Climatic Factors that Aided the Spread of COVID-19 using Deep and Machine Learning Algorithms

  • Al-Sharari, Waad;Mahmood, Mahmood A.;Abd El-Aziz, A.A.;Azim, Nesrine A.
    • International Journal of Computer Science & Network Security / v.22 no.6 / pp.131-138 / 2022
  • Novel Coronavirus (COVID-19) is viewed as one of the main public health threats worldwide. Because of the sudden nature of the outbreak and the infectious power of the virus, it causes people anxiety, depression, and other stress responses. The prevention and control of the novel coronavirus pneumonia have moved into a critical stage. Early prediction and forecasting of the outbreak are essential during this difficult time in order to control its morbidity and mortality. The entire world is investing enormous effort to fight the spread of this lethal infection. In this paper, we used machine learning and deep learning techniques to analyze the situation using data shared by countries and to detect the climate factors that affect the spread of COVID-19, such as humidity, sunny hours, temperature, and wind speed, in order to understand its regular behavior and to forecast the future reach of COVID-19 around the world. We used data collected and published by Kaggle and the Johns Hopkins Center for Systems Science. The dataset has 25 attributes and 9566 objects. Our experiment consists of two phases. In phase one, we preprocessed the dataset for the DL model and reduced the features to four (humidity, sunny hours, temperature, and wind speed) using the Pearson Correlation Coefficient technique (correlation attribute feature selection). In phase two, we used six well-known traditional machine learning techniques for numerical datasets and the Dense Net deep learning model to predict and detect the climatic factors that aid the disease outbreak. We validated the model using a confusion matrix (CM) and measured performance with four metrics: accuracy, f-measure, recall, and precision.
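
The correlation-based feature reduction in phase one can be sketched as follows. This is a minimal illustration, assuming that features are ranked by the absolute Pearson correlation with the target and the top k are kept; the abstract does not give the exact selection rule. The column names, toy values, and the helper `select_top_features` are hypothetical, not the Kaggle or Johns Hopkins data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_features(columns, target, k):
    """Keep the k feature names with the largest |r| against the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy columns (hypothetical values for illustration only).
columns = {
    "temperature": [10.0, 15.0, 20.0, 25.0, 30.0],
    "humidity": [80.0, 70.0, 65.0, 50.0, 40.0],
    "station_id": [3.0, 1.0, 2.0, 1.0, 3.0],
}
target = [1.0, 2.0, 3.0, 4.0, 5.0]
selected = select_top_features(columns, target, k=2)
print(selected)
```

Taking the absolute value keeps strongly negative relationships, such as the toy humidity column, alongside positive ones.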

Performance Comparison for Exercise Motion classification using Deep Learning-based OpenPose (OpenPose기반 딥러닝을 이용한 운동동작분류 성능 비교)

  • Nam Rye Son;Min A Jung
    • Smart Media Journal / v.12 no.7 / pp.59-67 / 2023
  • Recently, research on behavior analysis that tracks human posture and movement has been actively conducted. In particular, OpenPose, open-source software developed by CMU in 2017, is a representative method for estimating human appearance and behavior. OpenPose can detect and estimate various body parts of a person, such as height, face, and hands, in real time, making it applicable to fields such as smart healthcare, exercise training, security systems, and medicine. In this paper, we propose a method for classifying four exercise movements (Squat, Walk, Wave, and Fall-down), which are most commonly performed by users in the gym, using the OpenPose-based deep learning models DNN and CNN. The training data are collected by capturing the user's movements through recorded videos and real-time camera captures. The collected dataset is preprocessed using OpenPose. The preprocessed dataset is then used to train the proposed DNN and CNN models for exercise movement classification. The errors of the proposed models are evaluated using MSE, RMSE, and MAE. The performance evaluation results showed that the proposed DNN model outperformed the proposed CNN model.
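
The three error measures named in this abstract (MSE, RMSE, and MAE) follow directly from their definitions. The sketch below uses hypothetical target and prediction values and a hypothetical helper name, `regression_errors`; it does not reproduce the paper's models or data.

```python
from math import sqrt


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, and MAE for paired targets and predictions."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(d * d for d in diffs) / len(diffs)
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae}


# Hypothetical targets and model outputs for four samples.
errors = regression_errors([1.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.0, 1.0])
print(errors)
```

MSE and RMSE penalize large deviations more heavily than MAE, so reporting all three gives a fuller picture of how two models differ.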

Analysis of online parenting community posts on expanded newborn screening for metabolic disorders using topic modeling: a quantitative content analysis (토픽 모델링을 활용한 광범위 선천성 대사이상 신생아 선별검사 관련 온라인 육아 커뮤니티 게시 글 분석: 계량적 내용분석 연구)

  • Myeong Seon Lee;Hyun-Sook Chung;Jin Sun Kim
    • Women's Health Nursing / v.29 no.1 / pp.20-31 / 2023
  • Purpose: As more newborns have received expanded newborn screening (NBS) for metabolic disorders, the overall number of false-positive results has increased. The purpose of this study was to explore the psychological impacts experienced by mothers related to the NBS process. Methods: An online parenting community in Korea was selected, and questions regarding NBS were collected using web crawling for the period from October 2018 to August 2021. In total, 634 posts were analyzed. The collected unstructured text data were preprocessed, and keyword analysis, topic modeling, and visualization were performed. Results: Of 1,057 words extracted from posts, the top keyword based on 'term frequency-inverse document frequency' values was "hypothyroidism," followed by "discharge," "close examination," "thyroid-stimulating hormone levels," and "jaundice." The top keyword based on the simple frequency of appearance was "XXX hospital," followed by "close examination," "discharge," "breastfeeding," "hypothyroidism," and "professor." As a result of LDA topic modeling, posts related to inborn errors of metabolism (IEMs) were classified into four main themes: "confirmatory tests of IEMs," "mother and newborn with thyroid function problems," "retests of IEMs," and "feeding related to IEMs." Mothers experienced substantial frustration, stress, and anxiety when they received positive NBS results. Conclusion: The online parenting community played an important role in acquiring and sharing information, as well as in providing psychological support related to NBS for mothers of newborns. Nurses can use this study's findings to develop timely and evidence-based information for parents whose children receive positive NBS results to reduce the negative psychological impact.
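
The abstract reports different top keywords under TF-IDF and under simple frequency. A minimal sketch of why the two rankings can diverge is given below. The toy posts and the helper `tfidf_scores` are hypothetical, posts are assumed to be non-empty, and the weighting used (relative term frequency times log(N / document frequency), summed over posts) is one common TF-IDF variant, not necessarily the one used in the study.

```python
from collections import Counter
from math import log


def tfidf_scores(docs):
    """Sum each term's TF-IDF over all documents (documents must be non-empty)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


# Toy posts (hypothetical, not community data).
posts = ["thyroid retest anxiety", "thyroid discharge", "thyroid jaundice retest"]
scores = tfidf_scores(posts)
frequency = Counter(tok for p in posts for tok in p.lower().split())

top_tfidf = max(scores, key=scores.get)
top_freq = frequency.most_common(1)[0][0]
print(top_tfidf, top_freq)
```

A word that appears in every post has an inverse document frequency of zero, so it can lead the simple frequency ranking while scoring nothing under TF-IDF.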

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari;Mohamed Hamdy;Khaled H. Alyoubi;Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.113-123 / 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitudes that stock news on official platforms expresses toward companies' stocks. It supports sound investment decisions and analysts' evaluations. However, research on Arabic SA is limited compared to that on English SA due to the complexity of the Arabic language and its limited corpora. This paper develops a sentiment classification model to predict the polarity of Arabic stock news in microblogs. It also aims to extract the reasons that lead to the polarity categorization, as the main economic causes or aspects, based on semantic units. Therefore, this paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model is used to classify articles as positive, negative, or neutral. It was trained on data collected from an official Saudi stock market article platform, which were then preprocessed and labeled. Moreover, the economic reasons behind the articles, grouped by semantic unit into seven economic aspects to highlight the polarity of the articles, were investigated. The supervised BERT model obtained 88% article classification accuracy based on SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting the polarity of Arabic stock market news and its economic reasons would provide valuable benefits to the stock SA field.

Optimized inverse distance weighted interpolation algorithm for γ radiation field reconstruction

  • Biao Zhang;Jinjia Cao;Shuang Lin;Xiaomeng Li;Yulong Zhang;Xiaochang Zheng;Wei Chen;Yingming Song
    • Nuclear Engineering and Technology / v.56 no.1 / pp.160-166 / 2024
  • The inversion of the radiation field distribution is of great significance at nuclear facility decommissioning sites. However, radiation fields often contain mixtures of multiple radionuclides, which makes the inversion extremely difficult. Many radiation field reconstruction methods, such as the Kriging algorithm and neural networks, cannot solve this problem perfectly. To address this issue, this paper proposes an optimized inverse distance weighted (IDW) interpolation algorithm for reconstructing the gamma radiation field. The algorithm corrects the difference between the experimental and simulated scenarios, and the data are preprocessed with normalization to improve accuracy. The experiment involves setting up gamma radiation fields with three Co-60 radioactive sources and reconstructing them with the optimized IDW algorithm. The results show that the mean absolute percentage error (MAPE) of the reconstruction obtained with the optimized IDW algorithm is 16.0%, which is significantly better than the results obtained with the Kriging method. Importantly, the optimized IDW algorithm is suitable for radiation scenarios with multiple radioactive sources, providing an effective method for obtaining the radiation field distribution in nuclear facility decommissioning engineering.
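
For orientation, the baseline technique and the error measure named in this abstract can be sketched as follows. This shows plain inverse distance weighting, not the paper's optimized variant, whose corrections are not described here. Min-max scaling is assumed for the normalization step, the power parameter of 2 is an assumption, and the measurement points, values, and function names are hypothetical.

```python
def idw_interpolate(samples, query, power=2.0):
    """Plain IDW estimate at `query` from ((x, y), value) samples."""
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        dist_sq = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if dist_sq == 0.0:
            return value  # the query coincides with a measured point
        weight = 1.0 / dist_sq ** (power / 2.0)
        num += weight * value
        den += weight
    return num / den


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be nonzero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical dose-rate readings at two positions (arbitrary units).
samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
estimate = idw_interpolate(samples, (1.0, 0.0))
error = mape([10.0, 20.0], [9.0, 22.0])
print(estimate, error, min_max_normalize([10.0, 20.0, 30.0]))
```

A query point midway between two measurements receives equal weights, so the estimate is simply their average; closer measurements dominate as the query moves toward them.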

Deep Learning-Based Daily Baseball Attendance Prediction (딥러닝 기반 일별 야구 관중 수 예측)

  • Hyunhee Lee;Seoyoung Sohn;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.131-135 / 2024
  • Baseball attracts the largest audience among professional sports in Korea, and attendance is its primary source of income. Previous studies have been limited in reflecting the characteristics of individual stadiums. For instance, the KIA Tigers have the highest away-game revenue among domestic teams but lower home-game earnings. Therefore, we aim to predict daily attendance at Gwangju-KIA Champions Field, the home of the KIA Tigers, using deep learning. We collected and preprocessed daily attendance, dates, weather, and team-related variables for Gwangju-KIA Champions Field from 2018 to 2023. We propose a deep learning-based linear regression model to predict daily attendance. We expect that the proposed deep learning model will be used as basic information to increase the club's revenue.

Classification of mandibular molar furcation involvement in periapical radiographs by deep learning

  • Katerina Vilkomir;Cody Phen;Fiondra Baldwin;Jared Cole;Nic Herndon;Wenjian Zhang
    • Imaging Science in Dentistry / v.54 no.3 / pp.257-263 / 2024
  • Purpose: The purpose of this study was to classify mandibular molar furcation involvement (FI) in periapical radiographs using a deep learning algorithm. Materials and Methods: Full mouth series taken at East Carolina University School of Dental Medicine from 2011 to 2023 were screened. Diagnostic-quality mandibular premolar and molar periapical radiographs with healthy or FI mandibular molars were included. The radiographs were cropped into individual molar images, annotated as "healthy" or "FI," and divided into training, validation, and testing datasets. The images were preprocessed with PyTorch transformations. ResNet-18, a convolutional neural network model, was refined using the PyTorch deep learning framework for the specific image classification task. CrossEntropyLoss served as the training loss function, and the AdamW optimizer was used to update the model weights. The images were loaded with the PyTorch DataLoader for efficiency. The performance of the ResNet-18 model was evaluated with multiple metrics, including training and validation losses, confusion matrix, accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the ROC curve. Results: After adequate training, ResNet-18 classified healthy vs. FI molars in the testing set with an accuracy of 96.47%, indicating its suitability for image classification. Conclusion: The deep learning algorithm developed in this study was shown to be promising for classifying mandibular molar FI. It could serve as a valuable supplemental tool for detecting and managing periodontal diseases.
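
Several of the evaluation metrics listed in this abstract derive from the confusion matrix. The sketch below computes accuracy, sensitivity, and specificity for the two labels used in the study, treating "FI" as the positive class. The labels and predictions are hypothetical, the helper `binary_metrics` is not from the paper, and both classes are assumed to be present so that no denominator is zero.

```python
def binary_metrics(y_true, y_pred, positive="FI"):
    """Confusion-matrix counts and derived metrics for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return {
        "confusion": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of FI molars detected
        "specificity": tn / (tn + fp),  # share of healthy molars cleared
    }


# Hypothetical ground truth and predictions for eight molar images.
y_true = ["FI", "FI", "FI", "healthy", "healthy", "healthy", "healthy", "healthy"]
y_pred = ["FI", "FI", "healthy", "healthy", "healthy", "healthy", "healthy", "FI"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since accuracy alone can hide poor detection of the rarer class.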