• Title/Summary/Keyword: Existing Model

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data. These data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn the interest of many researchers in managing it. The problem also calls for professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis: it assigns a text document to one or more predefined categories or classes. Various techniques are available in this field, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most attempts to improve it have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing or modifying an algorithm, we focus on modifying how the data are used. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noise can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary of the documents, so if the viewpoints of the training data and the target data differ, their features may differ as well. We attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various sources are likely to be formatted differently, which causes difficulties for traditional machine learning algorithms because they were not developed to recognize different types of data representation at once and combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) that selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
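
The idea of keeping only unlabeled documents that several views classify confidently can be sketched as follows. This is a generic agreement-and-confidence filter, not the RSESLA procedure itself, whose details the abstract does not give; the function name `select_confident`, the input layout, and the 0.9 threshold are assumptions made for illustration.

```python
def select_confident(predictions, threshold=0.9):
    """Pick unlabeled documents that every view labels the same way.

    predictions: one entry per unlabeled document; each entry is a list of
    (label, confidence) pairs, one pair per view (classifier).
    Returns (document_index, label) pairs for documents where all views
    agree and the mean confidence is at least `threshold`.
    """
    selected = []
    for index, views in enumerate(predictions):
        if not views:
            continue  # no view produced a prediction for this document
        labels = {label for label, _ in views}
        mean_confidence = sum(conf for _, conf in views) / len(views)
        if len(labels) == 1 and mean_confidence >= threshold:
            selected.append((index, views[0][0]))
    return selected


# Hypothetical predictions from two views for three unlabeled documents.
example = [
    [("sports", 0.95), ("sports", 0.92)],    # views agree, high confidence
    [("sports", 0.95), ("politics", 0.90)],  # views disagree
    [("tech", 0.60), ("tech", 0.70)],        # views agree, low confidence
]
pseudo_labeled = select_confident(example)
```

Documents returned by the filter would be added to the training set with their agreed label; the others stay unlabeled so that uncertain predictions do not degrade the classifier.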

Development of Multimedia Annotation and Retrieval System using MPEG-7 based Semantic Metadata Model (MPEG-7 기반 의미적 메타데이터 모델을 이용한 멀티미디어 주석 및 검색 시스템의 개발)

  • An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD / v.14D no.6 / pp.573-584 / 2007
  • As multimedia information has recently been increasing rapidly, various types of multimedia data retrieval are becoming issues of great importance. Efficient multimedia data processing requires semantics-based retrieval techniques that can extract the meaning of multimedia content. Existing retrieval methods for multimedia data are annotation-based retrieval, feature-based retrieval, and retrieval based on integrating annotations and features. These systems demand a lot of effort and time from the annotator and require complicated calculations for feature extraction. In addition, the created data support only static searches that do not change, and user-friendly, semantic search techniques are not supported. This paper proposes S-MARS (Semantic Metadata-based Multimedia Annotation and Retrieval System), which can represent and extract multimedia data efficiently using MPEG-7. The system provides a graphical user interface for annotating, searching, and browsing multimedia data. It is implemented on the basis of a semantic metadata model for representing multimedia information. The semantic metadata about multimedia data is organized according to a multimedia description schema, written in XML Schema, that basically complies with the MPEG-7 standard. In conclusion, the proposed scheme can be easily implemented on any multimedia platform supporting XML technology. It can be utilized to enable efficient semantic metadata sharing between systems, and it will contribute to improving retrieval correctness and user satisfaction for embedding-based multimedia retrieval algorithms.

The Comparison of Existing Synthetic Unit Hydrograph Method in Korea (국내 기존 합성단위도 방법의 비교)

  • Jeong, Seong-Won;Mun, Jang-Won
    • Journal of Korea Water Resources Association / v.34 no.6 / pp.659-672 / 2001
  • Generally, the design flood for a hydraulic structure is estimated by statistical analysis of runoff data. However, when runoff data are lacking, it is difficult to apply the statistical method. In this case, the synthetic unit hydrograph method is generally used, and models such as the Nakayasu method, Snyder method, SCS method, and HYMO method have been widely used in Korea. In this study, these methods and the KICT method, developed in 2000, are compared and analyzed in 10 study areas. First, the peak flow and peak time of the representative unit hydrograph and the synthetic unit hydrograph in each study area are compared; second, the shapes of the unit hydrographs are compared using the root mean square error (RMSE). With the Nakayasu method, developed in Japan, the synthetic unit hydrograph differs greatly from the representative unit hydrograph in peak flow, peak time, and shape, whereas the KICT method (2000) is superior to the others. The KICT method (2000) is also superior in its use of hydrologic and topographical data. Therefore, the Nakayasu method is not suitable for hydrological practice, and the KICT model is considered a better method for estimating the design flood. However, if another model, i.e. the SCS method, Nakayasu method, or HYMO method, is used, its parameters or regression equations must be adjusted through analysis of real data from Korea.
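
The shape comparison by RMSE can be illustrated with a minimal sketch. It assumes both unit hydrographs are sampled at the same time steps; the function name `rmse` and the ordinate values are made up for illustration and are not data from the study.

```python
import math


def rmse(representative, synthetic):
    """Root mean square error between two unit hydrographs.

    Both arguments are sequences of ordinates (e.g. discharge per unit
    rainfall) sampled at the same time steps.
    """
    if len(representative) != len(synthetic) or not representative:
        raise ValueError("hydrographs must be non-empty and of equal length")
    squared_errors = [(r - s) ** 2 for r, s in zip(representative, synthetic)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ordinates for one study area.
representative_uh = [0.0, 2.0, 5.0, 3.0, 1.0, 0.0]
synthetic_uh = [0.0, 1.5, 4.0, 3.5, 1.5, 0.0]
shape_error = rmse(representative_uh, synthetic_uh)
```

A smaller value means the synthetic unit hydrograph follows the representative one more closely, which is how competing methods can be ranked on shape.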

A Development and Application of the Landscape Evaluation Model Based on the Biotope Classification (비오톱 유형분류를 기반으로 한 경관평가 모형개발 및 적용)

  • Park, Cheon-Jin;Ra, Jung-Hwa;Cho, Hyun-Ju;Kim, Jin-Hyo;Kwon, Oh-Sung
    • Journal of the Korean Institute of Landscape Architecture / v.40 no.4 / pp.114-126 / 2012
  • The purpose of this study is to find a way to evaluate landscape views on the basis of biotope classification before development. The study area is about 10.0 km² around Non Gong Up, Auk Po Myun, Dalsung Gun, Daugu, where a large project has been planned. The results are as follows. The biotope classification yielded 23 types, subdivided into 140 subtypes. Three surveys were performed to select the assessment indicators. The first survey analyzed the importance of 22 assessment indicators selected through a review of the existing literature and on-the-spot research. The second survey performed factor analysis and reclassified the value indicators. The third survey computed additive values for the selected assessment indicators by standardizing the average importance of the indicators so that their sum equals 10. These additive values were then multiplied by the grade of each indicator to make the final evaluation. The expert survey finally selected 19 assessment indicators, including vitality elements and visual obstruction elements. In the first evaluation, 7 spaces were rated grade 1 (especially valuable), 4 spaces grade 2, 5 spaces grade 3, 2 spaces grade 4, and 5 spaces grade 5. In the second evaluation, 15 spaces were rated grade 1 (especially valuable; 1a, 1b) and 28 spaces grade 2 (valuable). Because this study examined only one site, its site suitability evaluation model may not apply well to other similar areas; synthesizing and standardizing various examples is needed to achieve higher objectivity.
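
The weighting step described above (standardize the average importance so the weights sum to 10, then multiply each weight by the indicator's grade) can be sketched as follows. The indicator names, importance values, and grades are hypothetical, and summing the weighted grades into one score is an assumption about how the final evaluation is aggregated.

```python
def additive_weights(mean_importance, total=10.0):
    """Rescale average importance values so the weights sum to `total`."""
    importance_sum = sum(mean_importance.values())
    return {name: total * value / importance_sum
            for name, value in mean_importance.items()}


def weighted_score(weights, grades):
    """Sum of each indicator's weight multiplied by its grade."""
    return sum(weights[name] * grades[name] for name in weights)


# Hypothetical survey averages and grades for two indicators.
importance = {"vitality": 4.0, "visual_obstruction": 1.0}
grades = {"vitality": 3, "visual_obstruction": 5}

weights = additive_weights(importance)   # vitality 8.0, visual_obstruction 2.0
score = weighted_score(weights, grades)  # 8.0 * 3 + 2.0 * 5 = 34.0
```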

Increasing Accuracy of Stock Price Pattern Prediction through Data Augmentation for Deep Learning (데이터 증강을 통한 딥러닝 기반 주가 패턴 예측 정확도 향상 방안)

  • Kim, Youngjun;Kim, Yeojeong;Lee, Insun;Lee, Hong Joo
    • The Journal of Bigdata / v.4 no.2 / pp.1-12 / 2019
  • As Artificial Intelligence (AI) technology develops, it is being applied to various fields such as image, voice, and text, and it has shown good results in certain areas. Researchers have also tried to predict the stock market using AI. Predicting the stock market is known to be a difficult problem because the market is affected by various factors such as the economy and politics. In the field of AI, there are attempts to predict the ups and downs of stock prices by studying stock price patterns with various machine learning techniques. This study suggests a way of predicting stock price patterns based on the Convolutional Neural Network (CNN). A CNN classifies images by extracting features from them through convolutional layers. This study therefore classifies candlestick images made from stock data in order to predict patterns. The study has two objectives. The first, referred to as Case 1, is to predict patterns from images made with same-day stock price data. The second, referred to as Case 2, is to predict the next day's stock price patterns from images produced with daily stock price data. In Case 1, data augmentation methods (random modification and Gaussian noise) are applied to generate more training data, and the generated images are used to fit the model. Given that deep learning requires a large amount of data, this study suggests a data augmentation method for candlestick images. It also compares the accuracies obtained for images with Gaussian noise and for different classification problems. All data in this study were collected through the OpenAPI provided by DaiShin Securities. Case 1 has five labels depending on the pattern: up with up closing, up with down closing, down with up closing, down with down closing, and staying.
The images in Case 1 are created by removing the last candle (-1candle), the last two candles (-2candles), and the last three candles (-3candles) from 60-minute, 30-minute, 10-minute, and 5-minute candle charts. In a 60-minute candle chart, each candle in the image carries 60 minutes of information: an open price, high price, low price, and close price. Case 2 has two labels, up and down. For Case 2, images were generated from 60-minute, 30-minute, 10-minute, and 5-minute candle charts without removing any candle. Considering the nature of stock data, moving the candles in the images is suggested instead of existing data augmentation techniques. The amount by which the candles are moved is defined as the modified value. The average difference in closing prices between candles was 0.0029; therefore, 0.003, 0.002, 0.001, and 0.00025 are used as modified values. The number of images doubled after data augmentation. For the Gaussian noise, the mean was 0 and the variance was 0.01. For both Case 1 and Case 2, the model is based on VGG-Net16, which has 16 layers. As a result, 10-minute -1candle images showed the best accuracy among the 60-minute, 30-minute, 10-minute, and 5-minute candle charts, so 10-minute images were used for the rest of the Case 1 experiments. The images with three candles removed were selected for data augmentation and the application of Gaussian noise. The 10-minute -3candle images resulted in 79.72% accuracy. The accuracy for images with a modified value of 0.00025 and 100% of candles changed was 79.92%. Applying Gaussian noise raised the accuracy to 80.98%. In Case 2, 60-minute candle charts predicted the next day's patterns with 82.60% accuracy. To sum up, this study is expected to contribute to further studies on predicting stock price patterns using images, and it provides a possible method for data augmentation of stock data.
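
The two augmentation ideas (moving candles by a modified value, and adding Gaussian noise with mean 0 and variance 0.01) can be sketched on plain numbers. The abstract does not say exactly how the candles are moved or in what units, so this sketch assumes normalized OHLC values, a random up-or-down shift per candle, and pixel intensities in [0, 1]; the function names are hypothetical.

```python
import random


def shift_candles(candles, modified_value, rng):
    """Move each candle up or down by `modified_value`.

    candles: list of (open, high, low, close) tuples in normalized units.
    All four prices of a candle move together, so its shape is preserved.
    """
    shifted = []
    for open_, high, low, close in candles:
        delta = rng.choice((-modified_value, modified_value))
        shifted.append((open_ + delta, high + delta, low + delta, close + delta))
    return shifted


def add_gaussian_noise(pixels, rng, mean=0.0, variance=0.01):
    """Add Gaussian noise to pixel intensities and clip them to [0, 1]."""
    std = variance ** 0.5  # random.gauss takes a standard deviation
    return [min(1.0, max(0.0, p + rng.gauss(mean, std))) for p in pixels]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
candles = [(1.000, 1.004, 0.998, 1.002), (1.002, 1.005, 1.000, 1.001)]
augmented = shift_candles(candles, 0.00025, rng)
noisy_pixels = add_gaussian_noise([0.0, 0.5, 1.0], rng)
```

Keeping the original images alongside one augmented copy of each doubles the dataset, which matches the doubling reported above.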

An Empirical Study on How the Moderating Effects of Individual Cultural Characteristics towards a Specific Target Affects User Experience: Based on the Survey Results of Four Types of Digital Device Users in the US, Germany, and Russia (특정 대상에 대한 개인 수준의 문화적 성향이 사용자 경험에 미치는 조절효과에 대한 실증적 연구: 미국, 독일, 러시아의 4개 디지털 기기 사용자를 대상으로)

  • Lee, In-Seong;Choi, Gi-Woong;Kim, So-Lyung;Lee, Ki-Ho;Kim, Jin-Woo
    • Asia pacific journal of information systems / v.19 no.1 / pp.113-145 / 2009
  • Recently, due to the globalization of the IT (Information Technology) market, devices and systems designed in one country are used in other countries as well. This phenomenon is a key factor behind the increased interest in cross-cultural, or cross-national, research within the IT area. However, as the IT market becomes bigger and more globalized, a great number of IT practitioners are having difficulty designing and developing devices or systems that provide an optimal experience. This is because the user experience of a device or system is affected not only by tangible factors such as language and a country's economic or industrial power but also by invisible and intangible factors. Among such factors, the cultural characteristics of users from different countries may affect the user experience because they affect how users understand and interpret devices or systems. In other words, when users evaluate the quality of the overall user experience, the cultural characteristics of each user act as a perceptual lens that leads the user to focus on certain elements of the experience. Therefore, the IT field needs to consider cultural characteristics when designing or developing devices or systems and to plan a localization strategy. In this environment, existing IS studies identify culture with country, emphasize the importance of culture from a national-level perspective, and hypothesize that users within the same country share the same cultural characteristics. Under such assumptions, these studies focus on the moderating effects of cultural characteristics at the national level within a certain theoretical framework.
This approach follows cross-cultural studies by scholars such as Hofstede (1980), which provide numerical research results and measurement items for cultural characteristics; using such results or items increases the efficiency of studies. However, national-level culture has limitations in forecasting and explaining individual-level behaviors such as voluntary device or system usage. This is because individual cultural characteristics are the outcome not only of the national culture but also of the cultures of a race, company, local area, family, and other groups, formed through interaction within each group. Therefore, national or nationally dominant cultural characteristics have limitations in forecasting and explaining the cultural characteristics of an individual. Moreover, past studies in psychology suggest that different cultural characteristics may exist within a single individual depending on the subject being measured or its context. For example, regarding individualism vs. collectivism, one of the major cultural characteristics, an individual may show collectivistic characteristics with family or friends but individualistic characteristics in the workplace. Acknowledging these limitations of past studies, this study examined, within the framework of the 'theoretically integrated model of user satisfaction and emotional attachment' developed in a former study, how the effects of different experience elements on emotional attachment or user satisfaction differ depending on individual cultural characteristics related to the use of a system or device. To do this, the study hypothesized the moderating effects of four cultural dimensions (uncertainty avoidance, individualism vs. collectivism, masculinity vs. 
femininity, and power distance) suggested by Hofstede (1980) within the theoretically integrated model of emotional attachment and user satisfaction. Statistical tests of these moderating effects were then conducted using surveys of users of four digital devices (mobile phone, MP3 player, LCD TV, and refrigerator) in three countries (US, Germany, and Russia). To explain and forecast the behavior of personal device or system users, individual cultural characteristics must be measured, and they must be measured independently for each target device or system. Through this suggestion, this study hopes to provide new and useful perspectives for future IS research.

Personification of On-line Shopping Mall -Focusing on the Social Presence- (온라인 쇼핑몰의 의인화 전략 -사회적 실재감을 중심으로-)

  • Park, Ju-Sik
    • Management & Information Systems Review / v.31 no.2 / pp.143-172 / 2012
  • While the B2C e-commerce (EC) market is growing rapidly, many experts argue that B2C EC transactions have not reached their full potential. A notable difference between online and offline consumer markets that is suppressing the growth of B2C EC is the decreased presence of human and social elements in online shopping environments. Online shopping generally lacks human warmth and sociability. This study proposes social presence in the online shopping mall as a substitute for the face-to-face social interaction of traditional commerce, and explores which variables affect social presence (human warmth and sociability) in online shopping malls and how human warmth and sociability influence online store loyalty. To achieve the research objectives, we reviewed literature from the marketing, psychology, and communication research areas and, based on this review, proposed a research model for the online shopping mall. To examine the proposed model, we gathered data using a self-report questionnaire. Respondents were online shoppers with at least five purchases in online shopping malls; because social presence is a feeling that requires frequent contact with a mall to experience, respondents needed sufficient purchase experience. The empirical results are as follows. First, a shopping mall's customization efforts significantly influence perceived social presence in the mall. Second, a shopping mall's responsiveness significantly influences perceived social presence. Third, the perceived activity of the online shopping mall's community significantly influences perceived social presence. Mall managers should therefore activate their customer communities to reinforce social presence, which results in trust building. Finally, perceived social presence significantly influences trust in and enjoyment of the mall, and trust and enjoyment in turn significantly affect store loyalty.
From these findings it can be inferred that perceived social presence is a determinant critical to the formation of the core variables (trust and loyalty) in existing online shopping research.

Performance of Collaboration Activities upon SME's Idiosyncrasy (중소기업 특성에 따른 외부 협업 활동이 혁신성과에 미치는 영향)

  • Lee, Hye Sun;Oh, Junseok;Lee, Jaeki;Lee, Bong Gyou
    • Journal of Internet Computing and Services / v.14 no.6 / pp.95-105 / 2013
  • Recently, the collaboration activities of SMEs have become a vital factor in sustaining a competitive edge. This is because of the rapidly changing and competitive market environment, and because collaboration leverages performance by overcoming the obstacle of limited internal resources. Discussion of the effects of, and relationships between, a firm's collaboration activities and its outputs is not new. However, as ICT and various technologies have diffused into traditional industries, boundaries and practice capabilities within industries are becoming ambiguous. Thus, the contents of products and services and their development methods also cross between industries. Although many researchers have suggested relationships between SMEs' collaboration activities and innovation performance, most of the previous literature focuses on broad perspectives of a firm's environmental factors rather than considering various idiosyncratic factors of SMEs, such as their major product and customer types, at once. Therefore, the purpose of this paper is to analyze how the external collaboration activities of SMEs (small and medium enterprises), according to their idiosyncrasies, act as an input to types of innovation performance. To analyze collaboration effects in detail, we defined factors that represent an SME's business environment: perceived importance of using external resources, perceived importance of external partnership, collaboration and collaboration levels of major product types, customer types, and lastly firm size. We also divided innovation performance into product innovation and process innovation, based on existing research. The empirical analysis uses a probit regression model to observe how each SME's business environment and activities relate to these outcomes.
For the empirical data, 497 samples were extracted from the 'Korean Open Innovation Survey' performed by ETRI (Korean Electronics Telecommunications Research Institute) in 2010. The empirical results indicate that the impact of collaboration varies depending on the innovation type (product or process innovation). The impact of the collaboration level on product innovation tends to be greater when SMEs are developing a final product targeted at individual customers (B2C). On the other hand, the impact on process innovation tends to be higher than that on product innovation when SMEs are developing raw materials for partners or other firms in manufacturing industries (B2B). Perceived importance of using external resources also affected both product and process innovation performance, whereas perceived importance of external partnership was statistically insignificant. An interesting finding was that service products have a negative effect on process innovation performance. In addition, the relationship between firm size, external collaboration activities, and innovation performance indicated that bigger firms (over 100 employees) tend to perform better in both product and process innovation. Finally, the results imply that innovation performance can vary depending on a firm's unique business idiosyncrasies as well as its level of external collaboration activities. These implications can be considered by firms in selecting an appropriate strategy as well as by policy makers.
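
A probit model links a binary outcome (for example, whether a firm achieved product innovation) to predictors through the standard normal CDF: P(y = 1 | x) = Φ(β₀ + β·x). The sketch below only evaluates that link for given coefficients; it does not estimate them, and the coefficient and feature values are invented for illustration rather than taken from the study.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), the standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept, coefficients, features):
    """P(y = 1 | x) under a probit model with the given coefficients."""
    linear_index = intercept + sum(b * x for b, x in zip(coefficients, features))
    return standard_normal_cdf(linear_index)


# Hypothetical coefficients for two predictors: collaboration level and
# a B2C indicator (1 if the firm targets individual customers).
p_innovation = probit_probability(-0.5, [0.4, 0.3], [2.0, 1.0])
```

In practice the coefficients would be estimated by maximum likelihood with a statistics package; the sign and significance of each coefficient are what the abstract's findings report.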

IPC Multi-label Classification based on Functional Characteristics of Fields in Patent Documents (특허문서 필드의 기능적 특성을 활용한 IPC 다중 레이블 분류)

  • Lim, Sora;Kwon, YongJin
    • Journal of Internet Computing and Services / v.18 no.1 / pp.77-88 / 2017
  • Recently, with the advent of a knowledge-based society in which information and knowledge create value, patents, the representative form of intellectual property, have become important, and the number of patents keeps growing. Patents therefore need to be classified appropriately according to the technological topic of the invention so that the vast amount of patent information can be used effectively. IPC (International Patent Classification) is widely used for this purpose. Automatic IPC classification has been studied using data mining and machine learning algorithms to improve the current IPC classification task, in which patent documents are categorized by hand. However, most previous research has focused on applying various existing machine learning methods to patent documents rather than considering the characteristics of the data or the structure of patent documents. In this paper, therefore, we propose using two structural fields, technical field and background, which are considered to affect patent classification; the two fields are selected on the basis of the characteristics of patent documents and the role of the structural fields. We also construct a multi-label classification model to reflect the fact that a patent document can have multiple IPCs. Furthermore, we propose a method for classifying patent documents at the IPC subclass level, which comprises 630 categories, in order to investigate the possibility of applying the IPC multi-label classification model in practice. The effect of the structural fields of patent documents is examined using 564,793 registered patents in Korea, and 87.2% precision is obtained when using the title, abstract, claims, technical field, and background. From these results, we verify that the technical field and background play an important role in improving the precision of IPC multi-label classification at the IPC subclass level.
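
Precision for a multi-label classifier can be computed in several ways, and the abstract does not say which one produced the 87.2% figure. The sketch below shows micro-averaged precision (correctly predicted labels divided by all predicted labels) as one common choice; the function name and the example label sets are hypothetical.

```python
def micro_precision(true_labels, predicted_labels):
    """Micro-averaged precision over a set of multi-label predictions.

    true_labels, predicted_labels: parallel lists of label collections,
    one pair per document (e.g. IPC subclass codes).
    """
    true_positives = 0
    predicted_total = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        truth_set, predicted_set = set(truth), set(predicted)
        true_positives += len(truth_set & predicted_set)
        predicted_total += len(predicted_set)
    return true_positives / predicted_total if predicted_total else 0.0


# Hypothetical IPC subclasses for two patent documents.
truth = [{"G06F", "H04L"}, {"A61K"}]
predicted = [{"G06F", "G06Q"}, {"A61K"}]
precision = micro_precision(truth, predicted)  # 2 correct of 3 predicted
```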

The Factors Influencing Intention to Use Bit Coin of Domestic Consumers (국내 소비자들의 비트코인 사용 의도에 영향을 미치는 요인 연구)

  • Shin, Dong-Hee;Kim, Yong-Moon
    • The Journal of the Korea Contents Association / v.16 no.1 / pp.24-41 / 2016
  • This study concerns Bitcoin, an electronic cash that has recently received global attention. Domestic use of Bitcoin is increasing because of its convenience for micropayments, and Bitcoin can also be exchanged for each country's currency. Against this background, we investigated the degree of understanding and acceptance of Bitcoin. We also applied a modified TAM (Technology Acceptance Model) to identify the factors that affect consumers' intention to use it. Because studies on the intention to use Bitcoin are scarce both domestically and internationally, we first analyzed the features of Bitcoin and extracted factors from preceding research on existing electronic cash. The first result is that 'economic efficiency', a characteristic variable of Bitcoin, influences 'intention to use', the dependent variable, through 'perceived usefulness', a mediating variable. Respondents perceived that the monetary and mental costs incurred when using Bitcoin were lower than those of other forms of cash. Second, 'payment convenience', a characteristic variable, affects 'intention to use' through 'perceived usefulness'. Respondents expected that inconveniences including the transaction process, lack of time for cash management, and exchange changes would be solved by using Bitcoin. Third, 'reliability', a perceived risk variable of Bitcoin, has a direct effect on 'intention to use'. Respondents perceived that payments could be completed without being affected by system breakdowns because the database is processed in a distributed manner across multiple computers. Fourth, 'perceived usefulness', a mediating variable, directly affects 'intention to use'. Consumers who want to use Bitcoin are thus attracted by its varied usability. Based on these results, we aim to provide implications for financial corporations, companies related to electronic cash, and Bitcoin users.