• Title/Summary/Keyword: Learning space

Search Result 1,482, Processing Time 0.025 seconds

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.139-157
    • /
    • 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. The k parameter of KNN base classifiers and selected feature subsets for base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance KNN ensemble model by optimizing both k parameters and feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem by using a real dataset from Korean companies. The research data included 1800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for the training model and the other to avoid overfitting. The prediction accuracy against this dataset was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.

A Case Study of Elementary Students' Developmental Pathway of Spatial Reasoning on Earth Revolution and Apparent Motion of Constellations (지구의 공전과 별자리의 겉보기 운동에 대한 초등학생들의 공간적 추론 발달 경로의 사례 연구)

  • Maeng, Seungho;Lee, Kiyoung
    • Journal of The Korean Association For Science Education
    • /
    • v.38 no.4
    • /
    • pp.481-494
    • /
    • 2018
  • This study investigated elementary students' understanding of Earth revolution and its accompanied apparent motion of constellation in terms of spatial reasoning. We designed a set of multi-tiered constructed response items in which students described their own idea about the reason of consecutive movement of constellations for three months and drew a diagram about relative locations of the Sun, the Earth, and the constellations. Sixty-five sixth grade students from four elementary schools participated in the tests both before and after science classes on the relative movement of Earth and Moon. Their answers to the items were categorized inductively in terms of transforming frames of reference which are observed on the Earth and designed from the Space-based perspective. We analyzed those categories by the levels of spatial reasoning and depicted the change of students' levels between pre/post-tests so that we could get an idea on the preliminary developmental pathway of students' understanding of this topic. The lower anchor description was that constellations move around the Earth with geocentric perspective. Intermediate level descriptions were planar understanding of Earth movement, intuitive idea on constellation movement along with the Earth. Students with intermediate levels did not reach understanding of the apparent motion of constellations. As the upper anchor description students understood the apparent motion of constellations according to the Earth revolution and could transform their frames of reference between Earth-based view and Space-based view. The features as the case of evolutionary learning progressions and critical points of students' development for this topic were discussed.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has made. Lots of social media on the Internet generate unstructured or semi-structured data every second and they are often made by natural or human languages we use in daily life. Many words in human languages have multiple meanings or senses. In this result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results which are far from users' intentions. Even though a lot of progress in enhancing the performance of search engines has made over the last years in order to provide users with appropriate results, there is still so much to improve it. Word sense disambiguation can play a very important role in dealing with natural language processing and is considered as one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-base, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method which automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries and avoids expensive sense tagging processes. It experiments the effectiveness of the method based on Naïve Bayes Model, which is one of supervised learning algorithms, by using Korean standard unabridged dictionary and Sejong Corpus. Korean standard unabridged dictionary has approximately 57,000 sentences. Sejong Corpus has about 790,000 sentences tagged with part-of-speech and senses all together. For the experiment of this study, Korean standard unabridged dictionary and Sejong Corpus were experimented as a combination and separate entities using cross validation. Only nouns, target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. Sejong Corpus was easily merged with Korean standard unabridged dictionary because Sejong Corpus was tagged based on sense indices defined by Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added in the named entity dictionary of Korean morphological analyzer. By using the extended named entity dictionary, term vectors were extracted from the input sentences and then term vectors for the sentences were created. Given the extracted term vector and the sense vector model made during the pre-processing stage, the sense-tagged terms were determined by the vector space model based word sense disambiguation. In addition, this study shows the effectiveness of merged corpus from examples in Korean standard unabridged dictionary and Sejong Corpus. The experiment shows the better results in precision and recall are found with the merged corpus. This study suggests it can practically enhance the performance of internet search engines and help us to understand more accurate meaning of a sentence in natural language processing pertinent to search engines, opinion mining, and text mining. Naïve Bayes classifier used in this study represents a supervised learning algorithm and uses Bayes theorem. Naïve Bayes classifier has an assumption that all senses are independent. Even though the assumption of Naïve Bayes classifier is not realistic and ignores the correlation between attributes, Naïve Bayes classifier is widely used because of its simplicity and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research need to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.

A study for improvement of far-distance performance of a tunnel accident detection system by using an inverse perspective transformation (역 원근변환 기법을 이용한 터널 영상유고시스템의 원거리 감지 성능 향상에 관한 연구)

  • Lee, Kyu Beom;Shin, Hyu-Soung
    • Journal of Korean Tunnelling and Underground Space Association
    • /
    • v.24 no.3
    • /
    • pp.247-262
    • /
    • 2022
  • In domestic tunnels, it is mandatory to install CCTVs in tunnels longer than 200 m which are also recommended by installation of a CCTV-based automatic accident detection system. In general, the CCTVs in the tunnel are installed at a low height as well as near by the moving vehicles due to the spatial limitation of tunnel structure, so a severe perspective effect takes place in the distance of installed CCTV and moving vehicles. Because of this effect, conventional CCTV-based accident detection systems in tunnel are known in general to be very hard to achieve the performance in detection of unexpected accidents such as stop or reversely moving vehicles, person on the road and fires, especially far from 100 m. Therefore, in this study, the region of interest is set up and a new concept of inverse perspective transformation technique is introduced. Since moving vehicles in the transformed image is enlarged proportionally to the distance from CCTV, it is possible to achieve consistency in object detection and identification of actual speed of moving vehicles in distance. To show this aspect, two datasets in the same conditions are composed with the original and the transformed images of CCTV in tunnel, respectively. A comparison of variation of appearance speed and size of moving vehicles in distance are made. Then, the performances of the object detection in distance are compared with respect to the both trained deep-learning models. As a result, the model case with the transformed images are able to achieve consistent performance in object and accident detections in distance even by 200 m.

The Pattern Analysis of Financial Distress for Non-audited Firms using Data Mining (데이터마이닝 기법을 활용한 비외감기업의 부실화 유형 분석)

  • Lee, Su Hyun;Park, Jung Min;Lee, Hyoung Yong
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.111-131
    • /
    • 2015
  • There are only a handful number of research conducted on pattern analysis of corporate distress as compared with research for bankruptcy prediction. The few that exists mainly focus on audited firms because financial data collection is easier for these firms. But in reality, corporate financial distress is a far more common and critical phenomenon for non-audited firms which are mainly comprised of small and medium sized firms. The purpose of this paper is to classify non-audited firms under distress according to their financial ratio using data mining; Self-Organizing Map (SOM). SOM is a type of artificial neural network that is trained using unsupervised learning to produce a lower dimensional discretized representation of the input space of the training samples, called a map. SOM is different from other artificial neural networks as it applies competitive learning as opposed to error-correction learning such as backpropagation with gradient descent, and in the sense that it uses a neighborhood function to preserve the topological properties of the input space. It is one of the popular and successful clustering algorithm. In this study, we classify types of financial distress firms, specially, non-audited firms. In the empirical test, we collect 10 financial ratios of 100 non-audited firms under distress in 2004 for the previous two years (2002 and 2003). Using these financial ratios and the SOM algorithm, five distinct patterns were distinguished. In pattern 1, financial distress was very serious in almost all financial ratios. 12% of the firms are included in these patterns. In pattern 2, financial distress was weak in almost financial ratios. 14% of the firms are included in pattern 2. In pattern 3, growth ratio was the worst among all patterns. It is speculated that the firms of this pattern may be under distress due to severe competition in their industries. Approximately 30% of the firms fell into this group. In pattern 4, the growth ratio was higher than any other pattern but the cash ratio and profitability ratio were not at the level of the growth ratio. It is concluded that the firms of this pattern were under distress in pursuit of expanding their business. About 25% of the firms were in this pattern. Last, pattern 5 encompassed very solvent firms. Perhaps firms of this pattern were distressed due to a bad short-term strategic decision or due to problems with the enterpriser of the firms. Approximately 18% of the firms were under this pattern. This study has the academic and empirical contribution. In the perspectives of the academic contribution, non-audited companies that tend to be easily bankrupt and have the unstructured or easily manipulated financial data are classified by the data mining technology (Self-Organizing Map) rather than big sized audited firms that have the well prepared and reliable financial data. In the perspectives of the empirical one, even though the financial data of the non-audited firms are conducted to analyze, it is useful for find out the first order symptom of financial distress, which makes us to forecast the prediction of bankruptcy of the firms and to manage the early warning and alert signal. These are the academic and empirical contribution of this study. The limitation of this research is to analyze only 100 corporates due to the difficulty of collecting the financial data of the non-audited firms, which make us to be hard to proceed to the analysis by the category or size difference. Also, non-financial qualitative data is crucial for the analysis of bankruptcy. Thus, the non-financial qualitative factor is taken into account for the next study. This study sheds some light on the non-audited small and medium sized firms' distress prediction in the future.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.

Study on the Conceptual Hierarchy for Seasonal Change (계절변화 개념 위계에 관한 연구)

  • Jung, Sun-La;Lee, Yong Bok
    • Journal of the Korean earth science society
    • /
    • v.34 no.4
    • /
    • pp.356-367
    • /
    • 2013
  • We study on the concept and reason of seasonal change that 164 university students have. Subsequently the concept types on the seasonal change are classified according to the characteristics and conceptual change after teaching on astronomy. All of the students were simply checked by the questionnaire of multiple choice and essay method before learning on the subjects. And then they answered to questionnaires of similar type after one semester. By the analyzed results, we classify it to three steps of hierarchical concept structure. The first step is the cosmic perspective that is related to the Earth's condition and motion. The second step is the influence of the Earth that is directly affected by the first step. The third step is observer's perspective on the Earth depending on the second step. Among the answers, the first step is prominent and second step is rare. The answers on the reason of seasonal change show some kinds of type which are 1st, 1-2nd, 1-3rd, and 1-2-3rd step. By the result, it is arranged in sequence like as 1-3rd>1st>1-2nd>1-2-3rd type. The lowest number of students was 2nd step of the Sun's altitude and duration of daytime in pre-test. However the students of 2nd step obtained more correct scientific concept on the seasonal change after learning on the subjects, and got the higher score in the post-test than in the pre-test. We found how much important the hierarchical structure on the reason of seasonal change is. As the results, second step on the learning of the Sun's altitude and duration of daytime essentially have to teach after first step. And then third step have to teach. At last, it is sure that the students can obtain the concept of seasonal change.

Estimation of Surface fCO2 in the Southwest East Sea using Machine Learning Techniques (기계학습법을 이용한 동해 남서부해역의 표층 이산화탄소분압(fCO2) 추정)

  • HAHM, DOSHIK;PARK, SOYEONA;CHOI, SANG-HWA;KANG, DONG-JIN;RHO, TAEKEUN;LEE, TONGSUP
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.24 no.3
    • /
    • pp.375-388
    • /
    • 2019
  • Accurate evaluation of sea-to-air $CO_2$ flux and its variability is crucial information to the understanding of global carbon cycle and the prediction of atmospheric $CO_2$ concentration. $fCO_2$ observations are sparse in space and time in the East Sea. In this study, we derived high resolution time series of surface $fCO_2$ values in the southwest East Sea, by feeding sea surface temperature (SST), salinity (SSS), chlorophyll-a (CHL), and mixed layer depth (MLD) values, from either satellite-observations or numerical model outputs, to three machine learning models. The root mean square error of the best performing model, a Random Forest (RF) model, was $7.1{\mu}atm$. Important parameters in predicting $fCO_2$ in the RF model were SST and SSS along with time information; CHL and MLD were much less important than the other parameters. The net $CO_2$ flux in the southwest East Sea, calculated from the $fCO_2$ predicted by the RF model, was $-0.76{\pm}1.15mol\;m^{-2}yr^{-1}$, close to the lower bound of the previous estimates in the range of $-0.66{\sim}-2.47mol\;m^{-2}yr^{-1}$. The time series of $fCO_2$ predicted by the RF model showed a significant variation even in a short time interval of a week. For accurate evaluation of the $CO_2$ flux in the Ulleung Basin, it is necessary to conduct high resolution in situ observations in spring when $fCO_2$ changes rapidly.

Development and Application of an Online Clinical Practicum Program on Emergency Nursing Care for Nursing Students (간호학생의 응급환자간호 임상실습 온라인 프로그램 개발 및 적용)

  • Kim, Weon-Gyeong;Park, Jeong-Min;Song, Chi-Eun
    • Journal of Korea Entertainment Industry Association
    • /
    • v.15 no.1
    • /
    • pp.131-142
    • /
    • 2021
  • Purpose: Clinical practicums via non-face-to-face methods were inevitable due to the COVID-19 pandemic. We developed an online program for emergency nursing care and identified the feasibility of the program and the learning achievements of students. Methods: This was a methodological study. The program was developed by three professors who taught theory and clinical practicum for adult nursing care and clinical experts. Students received four hours of video content and two task activities every week in four-week program. Real-time interactive video conferences were included. Qualitative and qualitative data were collected. Results: A total of 96 students participated in the program. The mean score for overall satisfaction with the online program was 4.72(±1.02) out of 6. Subjects that generally had high learning achievement scores were basic life support care, fall prevention, nursing documentation, infection control, and anaphylaxis care. As a result of a content analysis of 77 reflective logs on the advantages of this program, students reported that "experience in applying nursing process," "case-based learning and teaching method," and "No time and space constraints" were the program's best features. Conclusion: Collaboration between hospitals and universities for nursing is more important than ever to develop online content for effective clinical practicum.

An Exploratory study on derivation and Improvement of Kano Quality Attributes in Untact Classes (비대면 수업의 Kano 품질속성 도출과 개선에 관한 탐색적 연구)

  • Daeho Byun;Jaehoon Yang
    • Journal of Service Research and Studies
    • /
    • v.12 no.2
    • /
    • pp.65-79
    • /
    • 2022
  • Non-face-to-face classes continue due to Covid-19. There have been e-learning classes since the past, but the difference is that the current non-face-to-face classes are blended classes that combine real-time and recording classes or combine face-to-face and non-face classes. It is also characterized by being able to self-filmed or choose various lecture platforms in a place other than a dedicated studio. The advantages of non-face-to-face classes can be learned beyond time and space, and repetitive viewing and learning speed can be adjusted. Greening classes have no time and place constraints, and real-time classes have the advantage of high communication effects with learners. Evaluating whether non-face-to-face classes provide sufficient quality compared to face-to-face classes or e-learning will be necessary if branded classes are considered for post Covid. In this paper, for the evaluation of the service quality of non-face-to-face classes, the essential attributes desired by the instructors were derived from the viewpoint of Kano quality attributes and a quality improvement plan was proposed. After expressing the degree of functions that non-face-to-face classes should have on the X-axis and the satisfaction of learners on the Y-axis, 23 quality attributes were classified into 6 quality dimensions. In addition, satisfaction coefficient, dissatisfaction coefficient, and customer satisfaction improvement index were derived. As a result, 50% of learners were satisfied with non-face-to-face classes, but the preference was slightly higher than satisfaction, suggesting the sustainability of non-face-to-face classes. In terms of the customer satisfaction improvement index, the ranking of attributes with the largest increase in satisfaction when improving class quality was as follows. Professors' quick answers to learners' questions, content that can fully explain the subject, what the professor explains easily, develop high-quality content that can be learned on mobile phones, fairness of attendance checks, and real-time classes should start on time.