• Title/Summary/Keyword: vector data

An Ensemble Classification of Mental Health in Malaysia related to the Covid-19 Pandemic using Social Media Sentiment Analysis

  • Nur 'Aisyah Binti Zakaria Adli;Muneer Ahmad;Norjihan Abdul Ghani;Sri Devi Ravana;Azah Anir Norman
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.2
    • /
    • pp.370-396
    • /
    • 2024
  • The World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March 2020. Lifestyles all over the world have changed since then. In many cases, the pandemic appears to have caused severe mental disorders, anxiety, and depression. Researchers have mostly conducted surveys to identify the impacts of the pandemic on people's mental health. Although surveys can generate better-quality, tailored, and more specific data, social media offers great insight into the impact of the pandemic on mental health. Since people feel connected on social media, this study aims to capture people's sentiments about the pandemic as they relate to mental health issues. Word clouds were used to visualize and identify the most frequent keywords related to COVID-19 and mental health disorders. The study employs a Majority Voting Ensemble (MVE) classifier alongside individual classifiers such as Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR) to classify the sentiment of tweets. The tweets were labeled as positive, neutral, or negative using the Valence Aware Dictionary and sEntiment Reasoner (VADER). Confusion matrices and classification reports provide the precision, recall, and F1-score used to identify the best algorithm for classifying the sentiments.
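
A minimal Python sketch of the VADER labeling and majority-voting ensemble described above (illustrative only; the tweet corpus, preprocessing, and hyperparameters are placeholders, not the authors' pipeline):

```python
# Hypothetical sketch: VADER assigns sentiment labels, and a hard-voting
# ensemble of NB, SVM, and LR is trained on TF-IDF features of the tweets.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

_analyzer = SentimentIntensityAnalyzer()

def vader_label(text: str) -> str:
    """Map VADER's compound score to positive / neutral / negative."""
    score = _analyzer.polarity_scores(text)["compound"]
    if score >= 0.05:
        return "positive"
    if score <= -0.05:
        return "negative"
    return "neutral"

def build_mve():
    """TF-IDF features feeding a majority-voting ensemble of NB, SVM, and LR."""
    voter = VotingClassifier(
        estimators=[("nb", MultinomialNB()),
                    ("svm", LinearSVC()),
                    ("lr", LogisticRegression(max_iter=1000))],
        voting="hard")  # majority vote across the three classifiers
    return make_pipeline(TfidfVectorizer(), voter)

# Usage with a collected tweet corpus (loaded elsewhere):
#   labels = [vader_label(t) for t in train_tweets]
#   model = build_mve().fit(train_tweets, labels)
#   # evaluate with sklearn.metrics.classification_report on held-out tweets
```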

Estimating the tensile strength of geopolymer concrete using various machine learning algorithms

  • Danial Fakhri;Hamid Reza Nejati;Arsalan Mahmoodzadeh;Hamid Soltanian;Ehsan Taheri
    • Computers and Concrete
    • /
    • v.33 no.2
    • /
    • pp.175-193
    • /
    • 2024
  • Researchers have been actively investigating the feasibility of adopting alternative materials to address the mounting environmental and economic challenges associated with traditional concrete-based construction materials such as reinforced concrete. Examining concrete's mechanical properties with laboratory methods is a complex, time-consuming, and costly endeavor, so models that can overcome these drawbacks are urgently needed. Fortunately, the ever-increasing availability of data has paved the way for machine learning methods, which can provide powerful, efficient, and cost-effective models. This study explores the potential of twelve machine learning algorithms to predict the tensile strength of geopolymer concrete (GPC) under various curing conditions. To this end, 221 datasets comprising tensile strength test results of GPC with diverse mix ratios and curing conditions were employed, and a number of unseen datasets were used to assess the overall performance of the machine learning models. Through a comprehensive analysis of statistical indices and a comparison of the models' behavior with laboratory tests, it was determined that nearly all the models exhibited satisfactory potential in estimating the tensile strength of GPC; nevertheless, the artificial neural network and support vector regression models demonstrated the highest robustness. Both the laboratory tests and the machine learning outcomes revealed that GPC composed of 30% fly ash and 70% ground granulated blast furnace slag, mixed with 14 M NaOH, and cured in an oven at 300°F for 28 days exhibited superior tensile strength.
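
A minimal sketch of one of the approaches named above, support vector regression on tabular mix and curing data (feature names, file name, and hyperparameters are assumptions, not the authors' dataset or code):

```python
# Hypothetical sketch: fit an SVR model to predict GPC tensile strength
# from mix ratios and curing conditions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_absolute_error

# Assumed column layout: mix ratios and curing conditions as inputs.
FEATURES = ["fly_ash_pct", "ggbfs_pct", "naoh_molarity", "curing_temp", "curing_days"]
TARGET = "tensile_strength"

def train_svr(df: pd.DataFrame):
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.2, random_state=0)
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return model, r2_score(y_te, pred), mean_absolute_error(y_te, pred)

# Usage: model, r2, mae = train_svr(pd.read_csv("gpc_tensile_tests.csv"))
```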

Generation of Induced Pluripotent Stem Cells from Lymphoblastoid Cell Lines by Electroporation of Episomal Vectors

  • Myunghyun Kim;Junmyeong Park;Sujin Kim;Dong Wook Han;Borami Shin;Hans Robert Scholer;Johnny Kim;Kee-Pyo Kim
    • International Journal of Stem Cells
    • /
    • v.16 no.1
    • /
    • pp.36-43
    • /
    • 2023
  • Background and Objectives: Lymphoblastoid cell lines (LCLs) deposited from disease-affected individuals could be a valuable donor cell source for generating disease-specific induced pluripotent stem cells (iPSCs). However, generating iPSCs from LCLs is still challenging, as no effective gene delivery strategy has yet been developed. Methods and Results: Here, we describe an effective gene delivery method specifically for LCLs. We found that LCLs appear to be refractory to retroviral and lentiviral transduction; consequently, lentiviral and retroviral transduction of OCT4, SOX2, KLF4 and c-MYC into LCLs does not elicit iPSC colony formation. Interestingly, however, we found that transfection of oriP/EBNA-1-based episomal vectors by electroporation is an efficient gene delivery method for LCLs, enabling iPSC generation from LCLs. These iPSCs expressed pluripotency markers (OCT4, NANOG, SSEA4, SALL4) and could form embryoid bodies. Conclusions: Our data show that electroporation is an effective gene delivery method with which LCLs can be efficiently reprogrammed into iPSCs.

The Impact of Urban Heat Island-induced Temperature Differences on the Hatching Rates of Aedes albopictus (도시열섬 현상에 의한 기온차이가 흰줄숲모기(Aedes albopictus) 부화율에 미치는 영향)

  • Jihun Ryu;Kwang Shik Choi
    • Korean journal of applied entomology
    • /
    • v.63 no.1
    • /
    • pp.77-80
    • /
    • 2024
  • Aedes albopictus, a common species in the Republic of Korea, is internationally known as a major vector for various diseases, and it is well-adapted to urban environments. Recent insect outbreaks in urban areas, attributed to climate change and urban heat islands, have increased the necessity of researching the effects on mosquito populations. This study analyzed climate data from 25 Automatic Weather System (AWS) stations in Seoul, identifying urban areas with pronounced heat island effects and suburban areas with milder effects. Nine urban heat island conditions were established based on this analysis, under which the hatching rates of Ae. albopictus were examined. The results revealed an increase in hatching rates correlating with the intensity of the urban heat island effect. Regression analysis further indicated that this trend accelerates as the strength of the heat island effect increases. This study suggests that temperature variations resulting from urban heat island phenomena can significantly influence the hatching rates of Ae. albopictus.

Animal Infectious Diseases Prevention through Big Data and Deep Learning (빅데이터와 딥러닝을 활용한 동물 감염병 확산 차단)

  • Kim, Sung Hyun;Choi, Joon Ki;Kim, Jae Seok;Jang, Ah Reum;Lee, Jae Ho;Cha, Kyung Jin;Lee, Sang Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.137-154
    • /
    • 2018
  • Animal infectious diseases, such as avian influenza and foot-and-mouth disease, occur almost every year and cause huge economic and social damage to the country. To prevent this, the quarantine authorities have made various human and material efforts, but the diseases have continued to occur. Avian influenza was first identified in 1878 and has risen to a national issue because of its high lethality. Foot-and-mouth disease is considered the most critical animal infectious disease internationally. In countries where the disease has not spread, it is recognized as an economic or political disease because it restricts international trade by complicating the import of processed and non-processed livestock, and because quarantine is costly. In a society where the whole nation is connected as a single zone of daily life, there is no way to fully prevent the spread of an infectious disease. Hence, there is a need to detect the occurrence of the disease and take action before it spreads. When either a human or an animal infectious disease is confirmed, an epidemiological investigation of the confirmed case is carried out and measures to prevent further spread are taken according to its results. The foundation of an epidemiological investigation is figuring out where an individual has been and whom they have met. From a data perspective, this can be defined as collecting and analyzing geographic data and relational data to predict the cause of an outbreak, the outbreak location, and future infections. Recently, attempts have been made to develop infectious disease prediction models using big data and deep learning technology, but model-building studies and case reports are still scarce. KT and the Ministry of Science and ICT have been carrying out big data projects since 2014, as part of national R&D projects, to analyze and predict the routes of livestock-related vehicles. To prevent animal infectious diseases, the researchers first developed a prediction model based on regression analysis using vehicle movement data. After that, more accurate prediction models were constructed using machine learning algorithms such as Logistic Regression, Lasso, Support Vector Machine, and Random Forest. In particular, the 2017 prediction model added the risk of diffusion to facilities, and its performance was improved by tuning the hyper-parameters of the modeling in various ways. The confusion matrix and ROC curve show that the model constructed in 2017 is superior to the earlier machine learning model. The difference between the 2016 model and the 2017 model is that the later model also used visiting information on facilities such as feed factories and slaughterhouses, and information on poultry, which was limited to chicken and duck but was expanded to goose and quail. In addition, an explanation of the results was added in 2017 to help the authorities make decisions and to establish a basis for persuading stakeholders. This study reports an animal infectious disease prevention system constructed on the basis of hazardous vehicle movement, farm, and environment big data. The significance of this study is that it describes the evolution of a prediction model using big data in the field; the model is expected to become more complete if the type of virus is also taken into consideration. This will contribute to data utilization and analysis model development in related fields. In addition, we expect that the system constructed in this study will enable more proactive and effective prevention.
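
A minimal sketch of the model comparison described above, scoring the listed classifiers by ROC AUC on a per-farm feature matrix (the feature construction and model settings are assumptions, not the project's implementation):

```python
# Hypothetical sketch: compare logistic regression, L1-penalized (lasso-style)
# logistic regression, SVM, and random forest for outbreak prediction.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def compare_models(X, y):
    """X: per-farm features (e.g. counts of risky-vehicle visits); y: outbreak label."""
    models = {
        "logit": LogisticRegression(max_iter=1000),
        "lasso_logit": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
        "svm": SVC(probability=True),
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    }
    # 5-fold cross-validated ROC AUC for each candidate model
    return {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
            for name, m in models.items()}
```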

On-line Image Guided Radiation Therapy using Cone-Beam CT (CBCT) (콘빔CT (CBCT)를 이용한 온라인 영상유도방사선치료 (On-line Image Guided Radiation Therapy))

  • Bak, Jin-O;Jeong, Kyoung-Keun;Keum, Ki-Chang;Park, Suk-Won
    • Radiation Oncology Journal
    • /
    • v.24 no.4
    • /
    • pp.294-299
    • /
    • 2006
  • Purpose: Using cone beam CT, we can compare the position of the patient at simulation and at treatment. In on-line image guided radiation therapy, this comparison can be used to correct the patient position before treatment. Using cone beam CT, we investigated the setup errors that arise when patients are positioned using only the markings on their skin. Materials and Methods: We obtained data from three patients who received radiation therapy at the Department of Radiation Oncology in Chung-Ang University between August 2006 and October 2006. As in conventional radiation therapy, patients were aligned on the treatment couch after simulation and treatment planning. Patients were aligned with lasers according to the skin markings made at simulation, and cone beam CTs were then obtained. The cone beam CTs were fused with and compared to the simulation CTs, and the displacement vectors were calculated. Treatment couches were adjusted according to the displacement vector before treatment. After treatment, positions were verified with kV X-ray (OBI system). Results: For the head and neck patients, the average sizes of the setup error vectors given by the cone beam CT were 0.19 cm for patient A and 0.18 cm for patient B, with standard deviations of 0.15 cm and 0.21 cm, respectively. For the pelvis patient, the average and standard deviation were 0.37 cm and 0.1 cm. Conclusion: Through on-line IGRT using cone beam CT, we could correct setup errors that would occur in conventional radiotherapy. The importance of on-line IGRT should be emphasized for 3D conformal therapy and intensity-modulated radiotherapy, which have complex target shapes and steep dose gradients.
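
A minimal sketch of the setup-error statistics reported above, computed from per-fraction couch corrections (the input file format and axis convention are assumptions):

```python
# Hypothetical sketch: given translational shifts from CBCT-to-planning-CT
# registration, compute the mean and standard deviation of the 3D setup-error
# vector magnitude, as summarized in the abstract.
import numpy as np

def setup_error_stats(shifts_cm: np.ndarray):
    """shifts_cm: (n_fractions, 3) array of (lat, long, vert) couch corrections in cm."""
    magnitudes = np.linalg.norm(shifts_cm, axis=1)  # length of each displacement vector
    return magnitudes.mean(), magnitudes.std(ddof=0)

# Usage: mean_cm, sd_cm = setup_error_stats(np.loadtxt("patient_A_shifts.csv", delimiter=","))
```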

Accuracy of 5-axis precision milling for guided surgical template (가이드 수술용 템플릿을 위한 5축 정밀가공공정의 정확성에 관한 연구)

  • Park, Ji-Man;Yi, Tae-Kyoung;Jung, Je-Kyo;Kim, Yong;Park, Eun-Jin;Han, Chong-Hyun;Koak, Jai-Young;Kim, Seong-Kyun;Heo, Seong-Joo
    • The Journal of Korean Academy of Prosthodontics
    • /
    • v.48 no.4
    • /
    • pp.294-300
    • /
    • 2010
  • Purpose: Template-guided implant surgery offers several advantages over the traditional approach. The purpose of this study was to evaluate the accuracy of a coordinate synchronization procedure with a 5-axis milling machine for surgical template fabrication, by means of reverse engineering through universal CAD software. Materials and methods: The study was performed on ten edentulous models with embedded gutta-percha (GP) stoppings hidden under a silicone gingival form. A platform for coordinate synchronization was formed on the bottom side of the models, and the casts were imaged with cone beam CT. The vectors of the stoppings were extracted and transferred to those of the planned implants in virtual planning software. The depth of the milling process was set to the level of one half of the stoppings, and the coordinates of the data were synchronized to the model image. Synchronization of the milling coordinates was done by a conversion process referenced to the synchronization platform on the bottom of the model. The models were fixed on the synchronization plate of the 5-axis milling machine, and drilling was performed along the planned vector and to the planned depth, based on the synchronized data, with a twist drill of the same diameter as the GP stopping. For 3D rendering and image merging, the impression tray was set on the cone beam CT, and pre- and post-drilling CT acquisition was done with the model fixed in the impression body. The accuracy analysis was done with Solidworks (Dassault Systèmes, Concord, USA) by measuring the vectors through the top and bottom centers of the stoppings in the experimental models, after merging and reverse engineering the planned and post-drilling CT images. Correlations among the parameters were tested by means of the Pearson correlation coefficient and calculated with SPSS (release 14.0, SPSS Inc., Chicago, USA) (α = 0.05). Results: Due to the inclination, GP remnants in the upper half of the stoppings were observed for every drilled bore. The deviation between the planned image and the reverse-engineered drilled bore was 0.31 (0.15-0.42) mm at the entrance and 0.36 (0.24-0.51) mm at the apex, and the angular deviation was 1.62° (0.54-2.27°). There was a positive correlation between the deviation at the entrance and that at the apex (Pearson correlation coefficient = 0.904, P = .013). Conclusion: The coordinate-synchronized 5-axis milling procedure has adequate accuracy for the production of guided surgical templates.
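
A minimal sketch of the deviation measures reported above, computed from the planned and reverse-engineered drilled axes (the point format is an assumption; this is not the authors' analysis code):

```python
# Hypothetical sketch: entrance deviation, apex deviation, and angular deviation
# between a planned stopping axis and the drilled bore axis.
import numpy as np

def deviations(planned_top, planned_bottom, drilled_top, drilled_bottom):
    """Each argument is an (x, y, z) point in mm; 'top' = entrance, 'bottom' = apex."""
    planned_top = np.asarray(planned_top, float)
    planned_bottom = np.asarray(planned_bottom, float)
    drilled_top = np.asarray(drilled_top, float)
    drilled_bottom = np.asarray(drilled_bottom, float)

    entrance_dev = np.linalg.norm(drilled_top - planned_top)    # mm at the entrance
    apex_dev = np.linalg.norm(drilled_bottom - planned_bottom)  # mm at the apex

    v_planned = planned_bottom - planned_top
    v_drilled = drilled_bottom - drilled_top
    cos_angle = np.dot(v_planned, v_drilled) / (
        np.linalg.norm(v_planned) * np.linalg.norm(v_drilled))
    angular_dev = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))  # degrees
    return entrance_dev, apex_dev, angular_dev
```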

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.127-137
    • /
    • 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, but it has become more important as crimes committed by recidivists steadily increase. In particular, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion during trials and parole screening in the 1990s, research on recidivism prediction became more active, and in the same period empirical studies on recidivism factors were started in Korea as well. Although most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of recidivism prediction, it is also important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure: in general, the cost of misclassifying someone who will not reoffend as a recidivist is lower than the cost of misclassifying someone who will reoffend as a non-recidivist, since the former only adds monitoring costs while the latter incurs social and economic costs. Therefore, in this paper we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, recognized as a high-performance ensemble method in the field of data mining, is applied, and its results are compared with various prediction models such as LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, which is the weighted average of the FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset. As a result, it was confirmed that the XGB model not only showed better prediction accuracy than the other prediction models but also reduced the cost of misclassification most effectively.
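
A minimal sketch of the two-step procedure described above: fit XGBoost, then sweep the decision threshold to minimize an asymmetric misclassification cost (the cost weights and hyperparameters are assumptions, not the paper's values):

```python
# Hypothetical sketch: false negatives (missed recidivists) are weighted more
# heavily than false positives when choosing the decision threshold.
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix

def fit_and_tune_threshold(X_tr, y_tr, X_val, y_val, fn_cost=5.0, fp_cost=1.0):
    model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                          eval_metric="logloss")
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_val)[:, 1]  # predicted probability of recidivism

    best_t, best_cost = 0.5, float("inf")
    for t in np.linspace(0.05, 0.95, 91):
        tn, fp, fn, tp = confusion_matrix(y_val, (proba >= t).astype(int)).ravel()
        cost = fn_cost * fn + fp_cost * fp  # asymmetric total misclassification cost
        if cost < best_cost:
            best_t, best_cost = t, cost
    return model, best_t
```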

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in various fields such as classification, summarization, and generation. Among the various text analysis tasks, text classification is the most widely used in academia and industry. Text classification includes binary classification, which assigns one label from two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. Multi-label classification in particular requires a different training approach from binary and multi-class classification because each instance can carry multiple labels. In addition, since the number of labels to be predicted grows as the number of labels and classes increases, performance improvement becomes difficult due to the increased prediction difficulty. To overcome these limitations, research on label embedding is being actively conducted, which (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed labels, and (iii) restores the predicted labels to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only the linear relationships between labels or compress the labels by random transformation, they have difficulty capturing non-linear relationships between labels and thus cannot create a latent label space that sufficiently preserves the information of the original labels. Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding; label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding suffers heavy information loss when compressing a high-dimensional label space with a very large number of classes into a low-dimensional latent label space, which is related to the vanishing gradient problem that occurs during backpropagation. Skip connections were devised to solve this problem: by adding a layer's input to its output, gradients are preserved during backpropagation, and efficient learning is possible even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from the paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1-score used as performance indicators showed far superior performance for multi-label classification based on the proposed methodology compared to traditional multi-label classification methods. This indicates that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was assessed by comparing its performance across domain characteristics and across different dimensionalities of the latent label space.
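
A minimal sketch of a label-embedding autoencoder with skip connections in each encoder/decoder layer, in the spirit of the methodology described above (layer sizes, latent dimension, and the SkipBlock design are assumptions, not the paper's architecture):

```python
# Hypothetical sketch: each block adds a projected copy of its input to its
# output, so gradients can flow through the skip path during backpropagation.
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.shortcut = nn.Linear(in_dim, out_dim, bias=False)  # projects input for the skip
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x) + self.shortcut(x))  # output = f(x) + proj(x)

class LabelAutoencoder(nn.Module):
    def __init__(self, n_labels=1000, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(SkipBlock(n_labels, 256), SkipBlock(256, latent_dim))
        self.decoder = nn.Sequential(SkipBlock(latent_dim, 256),
                                     nn.Linear(256, n_labels))  # logits over original labels

    def forward(self, y):
        z = self.encoder(y)           # low-dimensional latent label vector
        return self.decoder(z), z

# Training would minimize BCEWithLogitsLoss between the reconstruction and the
# original multi-hot label vector; a separate model then predicts z from the text.
```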

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research can be classified into work using structured data and work using unstructured data. With structured data such as historical stock prices and financial statements, past studies have usually used technical analysis and fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methodologies that can extract meaning by quantifying textual information, an unstructured data type that accounts for a large share of that information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers is to forecast a stock's price using news about the target company. However, according to previous research, not only does news about a target company affect its stock price, but news about related companies can also affect it. Finding highly relevant companies is not easy, however, because of market-wide effects and random signals. Thus, existing studies have found related companies based primarily on predetermined international industry classification standards. Yet recent research shows that the Global Industry Classification Standard has varying homogeneity within its sectors, so forecasting stock prices by grouping all firms in a sector together, without restricting attention to truly relevant companies, can adversely affect predictive performance. To overcome this limitation, we first apply random matrix theory combined with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced, so a simple correlation analysis of the financial market does not reflect the true correlation. To address this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and to find the true correlations between companies. With the true correlations, we perform cluster analysis to find relevant companies. Based on the clustering analysis, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously; each kernel is assigned to predict stock prices using features from the financial news of the target firm or of its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) When looking for relevant companies, looking in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies when cluster analysis is performed on the true correlations obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies; this suggests that it is important not only to develop AI algorithms but also to adopt theory from physics, and it extends existing research that integrated artificial intelligence with complex systems theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue; it is not only important to study artificial intelligence algorithms, but also how the input values are theoretically constructed. Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) may have low relevance to one another, and we suggest that relevance should be defined theoretically rather than simply taken from the GICS.
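
A minimal sketch of the random-matrix-theory filtering and clustering step described above (the Marchenko-Pastur cutoff, the removal of the largest eigenmode as the market mode, and the clustering settings are standard econophysics choices assumed here, not necessarily the authors' exact procedure):

```python
# Hypothetical sketch: keep only eigen-components of the return correlation
# matrix that lie outside the random band, drop the market-wide mode, then
# cluster firms on the filtered ("true") correlations.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def rmt_filtered_correlation(returns: np.ndarray):
    """returns: (T timesteps, N firms) matrix of stock returns."""
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    q = T / N
    lambda_max = (1 + 1 / np.sqrt(q)) ** 2      # Marchenko-Pastur upper edge
    keep = eigvals > lambda_max                 # eigenmodes carrying true correlation
    keep[np.argmax(eigvals)] = False            # drop the largest mode (market-wide effect)
    filtered = (eigvecs[:, keep] * eigvals[keep]) @ eigvecs[:, keep].T
    np.fill_diagonal(filtered, 1.0)
    return filtered

def cluster_firms(filtered_corr, n_clusters=10):
    dist = np.sqrt(np.clip(2 * (1 - filtered_corr), 0, None))  # correlation distance
    z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")
```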