• Title/Summary/Keyword: matrix learning

Scientometrics-based R&D Topography Analysis to Identify Research Trends Related to Image Segmentation (이미지 분할(image segmentation) 관련 연구 동향 파악을 위한 과학계량학 기반 연구개발지형도 분석)

  • Young-Chan Kim;Byoung-Sam Jin;Young-Chul Bae
    • Journal of the Korean Society of Industry Convergence / v.27 no.3 / pp.563-572 / 2024
  • Image processing and computer vision technologies are becoming increasingly important in application fields that require sophisticated image analysis techniques and tools, and image segmentation plays a central role in such analysis. To identify recent research trends in image segmentation techniques, this study used the Web of Science (WoS) database to analyze the R&D topography based on the network structure of the author keyword co-occurrence matrix. The analysis of the R&D map of research articles published from 2015 to 2023 shows that R&D in this field is largely focused on four areas: (1) collecting and preprocessing image data to build higher-performance image segmentation models, (2) image segmentation using statistics-based models or machine learning algorithms, (3) image segmentation for medical image analysis, and (4) deep learning-based image segmentation. The scientometrics-based analysis performed in this study can not only map the trajectory of R&D related to image segmentation but also serve as a marker for future exploration in this dynamic field.
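
The keyword co-occurrence matrix mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' tooling, so the function name `cooccurrence` and the sample keyword lists below are hypothetical; the sketch only shows how pair counts (the edge weights of a keyword network) could be derived from per-article author keywords.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair appears in the same article."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so every pair has one canonical ordering.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical author-keyword lists, one list per article (not data from the study).
articles = [
    ["image segmentation", "deep learning", "u-net"],
    ["image segmentation", "deep learning", "medical imaging"],
    ["image segmentation", "clustering"],
]
pairs = cooccurrence(articles)
```

Each key of `pairs` is one cell of the symmetric co-occurrence matrix; a network layout or clustering step would then operate on these weights.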

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method to identify which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps; this paper focuses on risk identification. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether each item was related to cybercriminality, particularly financial fraud. We then selected some of the vocabulary as keywords if it was related to nominals and symbols. With the selected keywords, we searched web materials such as Twitter, news, and blogs and collected more than 820,000 articles. The collected articles were refined through preprocessing and turned into learning data. Preprocessing is divided into three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. In the morphological analysis step, a complex sentence is broken into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. In the part-of-speech selection step, only nouns and symbols are kept. Because nouns refer to things, they express the intent of a message better than other parts of speech; moreover, the more illegal a text is, the more frequently symbols are used. To turn the selected data into learning data, each item must be classified as legitimate or not, so each item is labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set (70%) and a test data set (30%). SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function; the cost is set higher than in general cases. To show the feasibility of the proposed idea, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and a Collective Intelligence method. Overall accuracy was used as the metric. The overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is clearly superior to that of Term Frequency, MLE, and the other baselines. Hence, the result suggests that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying SVM-like discrimination algorithms to text analysis, and to practitioners in the fields of brand management and opinion mining.
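
The Document-Term Matrix and the 70/30 split described in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the helper names `document_term_matrix` and `split_70_30`, the fixed seed, and the toy documents are all hypothetical, and the SVM training step (gamma 0.5, cost 10 in the abstract) is deliberately left out.

```python
import random


def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows


def split_70_30(items, seed=0):
    """Shuffle a copy of items and split it into 70% / 30% parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (an assumption)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical, already-preprocessed documents (nouns and symbols only) with labels.
docs = [["loan", "$", "loan"], ["weather", "news"], ["loan", "private", "$"], ["news", "blog"]]
labels = ["illegal", "legal", "illegal", "legal"]

vocab, dtm = document_term_matrix(docs)
train, test = split_70_30(list(zip(dtm, labels)))
```

The rows of `dtm` paired with their labels are what a classifier such as an SVM would consume; `train` and `test` correspond to the learning and test sets.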

Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (분리학습 모델을 이용한 수출액 예측 및 수출 유망국가 추천)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.69-88 / 2022
  • One characteristic of South Korea's economic structure is its high dependence on exports, so many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specializing in exports are struggling due to the spread of COVID-19. This study therefore aimed to develop a model that forecasts next year's exports to support SMEs' export strategy and decision making, and proposed a strategy for recommending promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macro-economic variables, and collected them to train our prediction model. Exploratory data analysis (EDA) showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method, in which the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. The characteristics of each group can thus be learned more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on exports to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in countries and goods. For example, in Group 1 most of the export destinations are developing countries and the majority of exported goods are low-value products such as glass and prints. On the other hand, South Korea's major export destinations such as China, the USA, and Vietnam are included in Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and Exponential Moving Average (EMA) for prediction. Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the model, we compared different model structures and algorithms, and the separated learning model performed best. After the model was built, we also provided the variable importance of each group using SHAP values to add explainability. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first stage, a BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second stage, we calculated scores for each country and made recommendations according to the ranking. Using this recommendation framework, potential export countries were selected and information about those countries was presented for each item. This study has several implications. First, most preceding studies examined a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction; by separating the dataset into five homogeneous subgroups, we could enhance the predictive performance of the model, and SHAP values provide a more detailed explanation of the models by group. Lastly, this study has practical implications. Some platforms, including KOTRA, provide trade information, but most of them are based on past data, so it is not easy for companies to predict future trends. By utilizing the model and recommendation strategy in this research, trade-related services on each platform can be improved so that companies, including SMEs, can fully utilize them when making export strategies and decisions.
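
Two ingredients of the separated learning method can be sketched in a few lines: splitting records into five groups by export size, and an exponential moving average used as a forecast. The abstract does not say how the group boundaries or the EMA smoothing factor were chosen, so the equal-sized rank bins, `alpha=0.5`, and the sample numbers below are assumptions for illustration only; the LightGBM models for Groups 1 to 4 are not shown.

```python
def quantile_groups(values, n_groups=5):
    """Assign each value a group id 0..n_groups-1 by rank, giving equal-sized groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values)
    return groups


def ema_forecast(series, alpha=0.5):
    """Exponentially smooth a series; the final level serves as the next-step forecast."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


# Hypothetical, highly skewed export amounts (not data from the study).
exports = [3, 120, 15, 900, 45, 7, 60000, 2500, 30, 11000]
groups = quantile_groups(exports)

# Hypothetical yearly exports for one country-item pair.
forecast = ema_forecast([100.0, 120.0, 110.0])
```

Within each group the target spans a much narrower range than in the full dataset, which is the motivation the abstract gives for training a separate model per group.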

International Comparison of Cognitive Attributes using Analysis on Science Results at TIMSS 2011 Based on the Cognitive Diagnostic Theory (인지진단이론에 근거한 TIMSS 2011의 과학 결과 분석을 통한 인지 속성의 국제비교)

  • Kim, Jiyoung;Kim, Soojin;Dong, Hyokwan
    • Journal of The Korean Association For Science Education / v.35 no.2 / pp.267-275 / 2015
  • This research aims to identify the characteristics of Korean students' cognitive attributes and compare them with those of high-achieving countries that took TIMSS 2011, based on Cognitive Diagnostic Theory. Based on the TIMSS 2011 Science framework, nine cognitive attributes were extracted, and the researchers determined that 216 of the TIMSS 2011 science items require these attributes. This analysis was conducted to produce a Q-matrix. After producing the Q-matrix, multi-level IRT was used to determine each country's characteristics for each cognitive attribute. According to the results, four attributes, 'Use Models', 'Interpret Information', 'Draw Conclusions', and 'Evaluate and Justify', were easier for Korean middle school students. However, the other five attributes, 'Recall/Recognize', 'Explain', 'Classify', 'Integrate', and 'Hypothesize and Design', were harder than in other countries. For Korean students, 'Interpret Information' was the easiest attribute and 'Explain' the hardest. Compared with other countries, the attributes Korean students found easy were the easiest and those they found hard were the hardest, an extreme pattern. Therefore, to give students a more meaningful learning experience, it is better to use all the attributes together rather than only specific attributes when constructing science curricula or textbooks.

A Study on the UAV-based Vegetable Index Comparison for Detection of Pine Wilt Disease Trees (소나무재선충병 피해목 탐지를 위한 UAV기반의 식생지수 비교 연구)

  • Jung, Yoon-Young;Kim, Sang-Wook
    • Journal of Cadastre & Land InformatiX / v.50 no.1 / pp.201-214 / 2020
  • This study aimed at the early detection of trees damaged by pine wilt disease using vegetation indices derived from UAV images. Location data for 193 trees with pine wilt disease were constructed through field surveys, and vegetation index analyses of NDVI, GNDVI, NDRE, and SAVI were performed using multi-spectral UAV images taken at the same time. The K-Means algorithm was adopted to classify damaged trees, and a confusion matrix was used to compare and analyze the classification accuracy. The results are summarized as follows. First, the overall classification accuracy ranked NDVI (88.04%, Kappa coefficient 0.76) > GNDVI (86.01%, Kappa coefficient 0.72) > NDRE (77.35%, Kappa coefficient 0.55) > SAVI (76.84%, Kappa coefficient 0.54), with NDVI the most accurate. Second, K-Means unsupervised classification using NDVI or GNDVI can, to some extent, find the damaged trees. In particular, this technique can help early detection of damaged trees because of its intensive operation, low user intervention, and relatively simple analysis process. In the future, the use of time-series images or the application of deep learning techniques is expected to increase classification accuracy.
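
For reference, the four vegetation indices and the two accuracy measures named in this abstract can be written out as a short sketch. The index formulas are the standard definitions; the soil adjustment factor `L=0.5` for SAVI is the commonly used default and is an assumption here, as are the reflectance values and the confusion-matrix counts, which are illustrative and not taken from the study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green NDVI: the green band replaces the red band."""
    return (nir - green) / (nir + green)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return (nir - red_edge) / (nir + red_edge)


def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index; L=0.5 is an assumed default."""
    return (1 + L) * (nir - red) / (nir + red + L)


def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix (rows = reference)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


# Illustrative 2x2 confusion matrix (damaged vs. healthy); counts are made up.
acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside overall accuracy when comparing classifiers.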

Importance-Performance Analysis (IPA) of the Core Competence of Gifted Education Teachers (영재교육 담당교원의 핵심역량 인식에 대한 중요도와 실행도(IPA) 분석)

  • Lee, Mina;Park, Sung Hee
    • Journal of Gifted/Talented Education / v.25 no.6 / pp.927-949 / 2015
  • The purpose of this study was to identify the difference between importance and performance in gifted education teachers' perception of their core competence through importance-performance analysis (IPA). One hundred fourteen elementary gifted education teachers in math and science participated in the study. The collected survey data were analyzed with an IPA matrix. First, there was a significant difference between importance and performance in the perception of core competence. Second, the core competencies 'understanding knowledge', 'research and instruction', 'passion and motivation', and 'ethics' were high in both perceived importance and performance, whereas 'communication and practices' and 'professional curriculum development' were low in both. Third, math and science gifted education teachers differed in the competence 'passion and motivation': math teachers rated it high in both importance and performance, while science teachers rated its importance low and its performance high. In addition, math gifted education teachers showed lower performance relative to importance in the sub-categories 'knowledge of gifted development', 'gifted child assessment', 'information gathering and its literacy', and 'creative answers to various questions'. Science gifted education teachers showed lower performance relative to importance in the sub-categories 'higher-order thinking skills in its subject', 'teaching methodology for self-directed learning', 'problem behavior of the gifted', and 'counseling the gifted'.
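
An IPA matrix places each item in one of four quadrants according to whether its importance and performance scores lie above or below a pair of reference lines. A minimal sketch follows; using the grand means as reference lines is a common convention but an assumption here, and the item names and scores are hypothetical rather than survey results from the study.

```python
def ipa_quadrants(scores):
    """Map each item to an IPA quadrant. scores: {item: (importance, performance)}."""
    imp_mean = sum(imp for imp, _ in scores.values()) / len(scores)
    perf_mean = sum(perf for _, perf in scores.values()) / len(scores)
    labels = {}
    for item, (imp, perf) in scores.items():
        if imp >= imp_mean and perf >= perf_mean:
            labels[item] = "keep up the good work"
        elif imp >= imp_mean:
            labels[item] = "concentrate here"
        elif perf >= perf_mean:
            labels[item] = "possible overkill"
        else:
            labels[item] = "low priority"
    return labels


# Hypothetical mean ratings on a 5-point scale: (importance, performance).
scores = {"A": (4.6, 4.4), "B": (3.8, 3.5), "C": (4.5, 3.6), "D": (3.9, 4.3)}
quadrants = ipa_quadrants(scores)
```

Items in the "concentrate here" quadrant (high importance, low performance) are the ones an IPA study typically flags for improvement.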

Use of ChatGPT in college mathematics education (대학수학교육에서의 챗GPT 활용과 사례)

  • Sang-Gu Lee;Doyoung Park;Jae Yoon Lee;Dong Sun Lim;Jae Hwa Lee
    • The Mathematical Education / v.63 no.2 / pp.123-138 / 2024
  • This study describes the use of ChatGPT in teaching and in students' learning processes for the course "Introductory Mathematics for Artificial Intelligence (Math4AI)" at 'S' University. We developed a customized ChatGPT and present a learning model in which students use it to supplement their knowledge of the topic at hand. More specifically, students first study the concepts and questions of the course textbook by themselves. Then, for anything they are unsure of, they may submit questions (keywords or open problem numbers from the textbook) to our own ChatGPT at https://math4ai.solgitmath.com/ to get help. Notably, we optimized ChatGPT and minimized inaccurate information by fully utilizing various types of data related to the subject, such as textbooks, labs, discussion records, and codes at http://matrix.skku.ac.kr/Math4AI-ChatGPT/. In this model, when students have questions while studying the textbook by themselves, they can ask about mathematical concepts, keywords, theorems, examples, and problems in natural language through the ChatGPT interface. Our customized ChatGPT then provides the relevant terms, concepts, and sample answers based on previous students' discussions and/or samples of Python or R code used in those discussions. Furthermore, by providing students with real-time, optimized advice based on their level, we can provide personalized education not only for the Math4AI course but also for other courses in college math education. The present study, which incorporates our ChatGPT model into the teaching and learning process of the course, shows promising applicability of AI technology to other college math courses (for instance, calculus, linear algebra, discrete mathematics, engineering mathematics, and basic statistics), to K-12 math education, and to Lifespan Learning and Continuing Education.

Cancer-Subtype Classification Based on Gene Expression Data (유전자 발현 데이터를 이용한 암의 유형 분류 기법)

  • Cho Ji-Hoon;Lee Dongkwon;Lee Min-Young;Lee In-Beum
    • Journal of Institute of Control, Robotics and Systems / v.10 no.12 / pp.1172-1180 / 2004
  • Recently, gene expression data, a product of high-throughput technology, have appeared in earnest, and the related studies (so-called bioinformatics) have come to occupy an important position in biological and medical research. The microarray is a revolutionary technology that enables us to monitor several thousand genes simultaneously and thus to gain insight into phenomena in the human body (e.g. the mechanism of cancer progression) at the molecular level. To obtain useful information from such gene expression measurements, it is essential to analyze the data with appropriate techniques. However, the high dimensionality of the data can cause problems such as the curse of dimensionality and singularity in matrix computation, which makes it difficult to apply conventional data analysis methods. Therefore, developing methods that can treat such data effectively has become a challenging issue in computational biology. This research focuses on gene selection and classification for cancer subtype discrimination based on gene expression (microarray) data.

Study of Joint Histogram Based Statistical Features for Early Detection of Lung Disease (폐질환 조기 검출을 위한 결합 히스토그램 기반의 통계적 특징 인자에 대한 연구)

  • Won, Chul-ho
    • Journal of rehabilitation welfare engineering & assistive technology / v.10 no.4 / pp.259-265 / 2016
  • In this paper, a new method is proposed to classify lung tissues such as Broncho vascular, Emphysema, Ground Glass Reticular, Ground Glass, Honeycomb, and Normal for early detection of lung disease. A total of 459 statistical features were extracted from a joint histogram matrix based on multi-resolution analysis, volumetric LBP, and CT intensity, and dominant features were then selected using AdaBoost learning. The accuracy of the proposed features and of 3D AMFM was 90.1% and 85.3%, respectively. The proposed joint histogram based features show better classification results than 3D AMFM in terms of accuracy, sensitivity, and specificity.

Structural health monitoring through meta-heuristics - comparative performance study

  • Pholdee, Nantiwat;Bureerat, Sujin
    • Advances in Computational Design / v.1 no.4 / pp.315-327 / 2016
  • Damage detection and localisation in structures is essential since it can be a means for preventive maintenance of those structures under service conditions. The use of structural modal data for detecting damage is one of the most efficient methods. This paper presents the comparative performance of various state-of-the-art meta-heuristics for structural damage detection based on changes in modal data. The meta-heuristics include differential evolution (DE), artificial bee colony algorithm (ABC), real-code ant colony optimisation (ACOR), charged system search (ChSS), league championship algorithm (LCA), simulated annealing (SA), particle swarm optimisation (PSO), evolution strategies (ES), teaching-learning-based optimisation (TLBO), adaptive differential evolution (JADE), evolution strategy with covariance matrix adaptation (CMAES), success-history based adaptive differential evolution (SHADE), and SHADE with linear population size reduction (L-SHADE). Three truss structures are used to pose several test problems for structural damage detection. The meta-heuristics are then used to solve the test problems, which are treated as optimisation problems. A comparative performance study is carried out, and the statistically best algorithms are identified.
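
To make the first meta-heuristic in this list concrete, the following is a minimal sketch of the classic DE/rand/1/bin variant of differential evolution. It is not the authors' implementation: the parameter values (`pop_size`, `F`, `CR`, `generations`, `seed`) are assumptions, and the sphere function is a stand-in objective, whereas the paper minimises a damage-detection objective built from changes in modal data.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Minimise f over box bounds with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = f(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

In a damage-detection setting, the decision vector would hold per-element damage ratios and `f` would measure the mismatch between measured and simulated modal data; the adaptive variants named in the abstract (JADE, SHADE, L-SHADE) differ mainly in how `F`, `CR`, and the population size are adjusted during the run.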