Title/Summary/Keyword: method: data analysis


Informal Quality Data Analysis via Sentiment Analysis and Word2vec Method (감성분석과 Word2vec을 이용한 비정형 품질 데이터 분석)

  • Lee, Chinuk; Yoo, Kook Hyun; Mun, Byeong Min; Bae, Suk Joo
    • Journal of Korean Society for Quality Management / v.45 no.1 / pp.117-128 / 2017
  • Purpose: This study analyzes automobile quality review data to develop an alternative method for analyzing informal data. Existing approaches rely mainly on the frequency of terms in informal data; this research instead exploits correlation information among the data. Method: After sentiment analysis to acquire user opinions on automobile products, three classification methods, naïve Bayes, random forest, and support vector machine, were employed to classify the informal user opinions with respect to automobile quality. Additionally, Word2vec was applied to discover correlated information in the informal data. Result: Among the three classification methods, random forest was the most effective. The Word2vec method discovered the data most closely related to each automobile component. Conclusion: The proposed method is effective in terms of accuracy and sensitivity for analyzing informal quality data; however, only two sentiments (positive or negative) could be categorized because of human labeling errors. Further studies are required to derive more sentiment categories for accurate classification of informal quality data. The Word2vec method also shows promising results in precisely discovering the relevance of components.
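
As a rough sketch of the pipeline this abstract describes (not the authors' code), the example below fits the three named classifiers to a few invented review sentences on TF-IDF features, then trains Word2vec to surface terms correlated with a component term; the corpus, labels, and hyperparameters are all placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from gensim.models import Word2Vec

# Placeholder review sentences and sentiment labels (0 = negative, 1 = positive)
reviews = ["engine noise is terrible", "love the smooth transmission",
           "brakes feel weak and spongy", "great fuel economy and ride"]
labels = [0, 1, 0, 1]

# Compare the three classifiers named in the abstract on TF-IDF features
X = TfidfVectorizer().fit_transform(reviews)
for clf in (MultinomialNB(), RandomForestClassifier(n_estimators=100), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.score(X, labels))

# Word2vec over tokenized reviews; most_similar surfaces correlated terms
w2v = Word2Vec([r.split() for r in reviews], vector_size=50, window=2, min_count=1)
print(w2v.wv.most_similar("engine", topn=3))
```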

UNCERTAINTY ANALYSIS OF DATA-BASED MODELS FOR ESTIMATING COLLAPSE MOMENTS OF WALL-THINNED PIPE BENDS AND ELBOWS

  • Kim, Dong-Su; Kim, Ju-Hyun; Na, Man-Gyun; Kim, Jin-Weon
    • Nuclear Engineering and Technology / v.44 no.3 / pp.323-330 / 2012
  • The development of data-based models requires uncertainty analysis to explain the accuracy of their predictions. In this paper, an uncertainty analysis of the support vector regression (SVR) model, a data-based model, was performed, because previous research showed that the SVR method accurately estimates the collapse moments of wall-thinned pipe bends and elbows. An analytic uncertainty analysis method was used, and estimates with 95% confidence intervals were obtained for 370 test data points. The resulting prediction intervals (PIs) were very narrow, indicating that the predicted values are quite accurate. Therefore, the proposed SVR method can be used effectively to assess and validate the integrity of wall-thinned pipe bends and elbows.
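
The paper's analytic uncertainty method is not reproduced here; as a simpler stand-in, this sketch builds 95% percentile intervals for SVR predictions by bootstrap refitting on synthetic inputs (hypothetical geometry features and collapse moments), so only the narrow-interval intuition carries over.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 3))        # stand-in for wall-thinning geometry inputs
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.05, 200)  # stand-in collapse moment
X_test = rng.uniform(0, 1, (10, 3))

# Refit SVR on bootstrap resamples and collect test predictions
preds = []
for _ in range(200):
    idx = rng.integers(0, len(X), len(X))
    preds.append(SVR(C=10.0).fit(X[idx], y[idx]).predict(X_test))
preds = np.asarray(preds)

# 95% percentile interval per test point; narrow width suggests accurate predictions
lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
print("mean interval width:", (hi - lo).mean())
```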

A New Sampling Method of Marine Climatic Data for Infrared Signature Analysis (적외선 신호 해석을 위한 해양 기상 표본 추출법)

  • Kim, Yoonsik; Vaitekunas, David A.
    • Journal of the Society of Naval Architects of Korea / v.51 no.3 / pp.193-202 / 2014
  • This paper presents a new method of sampling climatic data for infrared signature analysis. Historical hourly data from a stationary marine buoy of the KMA (Korea Meteorological Administration) are used to select a small number of sample points (N = 100) that adequately cover the range of statistics (PDF, CDF) displayed by the original data set (S = 56,670). The method uses coarse bins to subdivide the variable space ($3^5 = 243$ bins) so that the sample points cover the original data range, and a single-point ranking system to select individual points so that uniform coverage (1/N = 0.01) is obtained for each variable. Principal component analysis is used to calculate the joint probability of the coupled climatic variables. The selected sample data show good agreement with the original data set in their statistical distributions and will be used for statistical analysis of the infrared signature and susceptibility of naval ships.
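
A minimal sketch of the coarse-bin sampling idea, assuming tercile bins per variable and random stand-in data for the buoy record; the paper's single-point ranking system and PCA-based joint probability are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(56670, 5))     # stand-in for the hourly buoy record

# Tercile edges per variable -> 3 coarse bins each, 3**5 = 243 cells in total
edges = np.quantile(data, [1 / 3, 2 / 3], axis=0)
codes = sum(np.digitize(data[:, j], edges[:, j]) * 3 ** j for j in range(5))

# Draw at most one point per occupied cell until N points are picked
N, picked = 100, []
for c in np.unique(codes):
    picked.append(rng.choice(np.flatnonzero(codes == c)))
    if len(picked) == N:
        break
sample = data[picked]
print(f"{len(np.unique(codes))} occupied cells, sample shape {sample.shape}")
```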

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim; Kee-Jae Lee; Seung-Joo Lee
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.577-587 / 2023
  • Conventional categorical-data imputation techniques, such as mode imputation, often suffer from overestimation. If the variable has too many categories, the multinomial logistic regression imputation method may be computationally infeasible. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify variables important for the target categorical variable. In the second stage, we use those important variables in logistic regression to impute missing data in binary variables, polytomous regression to impute missing data in categorical variables, and predictive mean matching to impute missing data in quantitative variables. Through analysis of asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods without variable selection, as evidenced by accuracy measures. In the analysis of real survey data, the proposed two-stage method also surpasses the current imputation approach in terms of accuracy.
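
A miniature of the two-stage idea under stated substitutions: random-forest importances stand in for Boruta, and only the binary-target case (logistic regression) is shown; the polytomous and predictive-mean-matching branches are omitted, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.3, 500) > 0).astype(float)
y[rng.random(500) < 0.2] = np.nan      # knock out 20% of the target at random
obs = ~np.isnan(y)

# Stage 1: select important predictors on complete cases (Boruta stand-in)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[obs], y[obs])
keep = np.argsort(forest.feature_importances_)[-3:]

# Stage 2: impute the binary target from the selected predictors only
model = LogisticRegression().fit(X[obs][:, keep], y[obs])
y_imputed = y.copy()
y_imputed[~obs] = model.predict(X[~obs][:, keep])
print("selected columns:", sorted(keep), "| remaining NaNs:", int(np.isnan(y_imputed).sum()))
```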

On the Analysis Method and its Application of Warranty Data (보증데이터 분석방법과 적용에 관한 연구)

  • Kim, Jong-Geol; Kim, Hye-Mi; Yun, Hye-Seon
    • Proceedings of the Safety Management and Science Conference / 2012.04a / pp.525-534 / 2012
  • This study concerns the collection of warranty data and analysis methods for extracting reliable information about products and improving their reliability. In this paper, we classify warranty data analyses into parametric and non-parametric approaches and consider methods for obtaining sound product information. We also survey research trends by grouping related studies. This study can be used to identify effective applications and the conditions under which warranty data analysis applies.
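
To make the parametric/non-parametric split concrete, a small sketch on synthetic, fully observed claim ages: a maximum-likelihood Weibull fit compared against the empirical CDF. Real warranty data is censored, which requires the likelihood-based treatments the paper surveys.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
claims = stats.weibull_min.rvs(1.8, scale=24.0, size=300, random_state=rng)  # months to claim

# Parametric route: maximum-likelihood Weibull fit (location fixed at zero)
shape, _, scale = stats.weibull_min.fit(claims, floc=0)

# Non-parametric route: the empirical CDF of the same claims
x = np.sort(claims)
ecdf = np.arange(1, len(x) + 1) / len(x)
gap = np.max(np.abs(ecdf - stats.weibull_min.cdf(x, shape, scale=scale)))
print(f"Weibull shape={shape:.2f}, scale={scale:.1f}, max CDF gap={gap:.3f}")
```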


Utilization of Social Media Analysis using Big Data (빅 데이터를 이용한 소셜 미디어 분석 기법의 활용)

  • Lee, Byoung-Yup; Lim, Jong-Tae; Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.13 no.2 / pp.211-219 / 2013
  • Methods for analyzing Big Data have evolved alongside Big Data management technology. Many research institutions anticipate a new era of data analysis based on Big Data, and IT vendors have launched standardized technologies for Big Data management. Big Data has also been shaped by improvements in devices and the IT environment. Led by social media, methods for analyzing unstructured data are being developed with a focus on diversity of methods, prediction, and optimization. In the past, data analysis was confined to the optimization of structured data through data mining, OLAP, and statistical analysis, and was used solely to support decision making by chief officers. The new era of data analysis, however, brings evolution on several technological fronts: diverse analysis methods under a new paradigm, new data analysis experts, and so forth. In addition, new patterns of data analysis will emerge with the development of high-performance computing environments and Big Data management techniques. Accordingly, this paper defines possible methods for analyzing social media using Big Data and proposes a practical social media analysis based on data mining methodology.

Study of Mental Disorder Schizophrenia, based on Big Data

  • Hye-Sun Lee
    • International Journal of Advanced Culture Technology / v.11 no.4 / pp.279-285 / 2023
  • This study draws academic implications from trends in domestic research on therapy for the mental disorder schizophrenia and psychosocial treatment. For the analysis, text mining with the R program and the social network analysis method were used, and 65 papers were collected. The results are as follows. First, the collected data were visualized through keyword analysis using the word cloud method. Second, keyword frequency analysis showed that keywords such as intervention, schizophrenia, research, patients, program, effect, society, mind, ability, and function occurred with the highest frequency. Third, LDA (latent Dirichlet allocation) topic modeling classified the literature into three topics: patients and subjects, psychosocial intervention, and the efficacy of interventions. Fourth, the social network analysis derived connectivity, closeness centrality, and betweenness centrality. In conclusion, this study presents significant results: by identifying research trends in schizophrenia and psychosocial therapy through text mining and social network analysis and presenting them through visualization, it provides basic rehabilitation data for schizophrenia and psychosocial therapy.
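
A compact sketch of the abstract's pipeline on placeholder documents (the study itself used R on 65 collected papers): keyword frequencies for the word cloud, LDA topics, and centralities on a term co-occurrence network.

```python
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["schizophrenia intervention program effect",
        "psychosocial intervention patients function",
        "patients society mind ability function"]    # placeholder paper abstracts

# Keyword frequencies (the word-cloud input)
vec = CountVectorizer()
X = vec.fit_transform(docs)
terms = vec.get_feature_names_out()
print(dict(zip(terms, X.sum(axis=0).A1)))

# LDA topic modeling: top terms per topic
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
for k, row in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in row.argsort()[-3:]])

# Term co-occurrence network -> centrality measures
C = (X.T @ X).toarray()
np.fill_diagonal(C, 0)
G = nx.relabel_nodes(nx.from_numpy_array(C), dict(enumerate(terms)))
print(nx.betweenness_centrality(G))
```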

A dimensional reduction method in cluster analysis for multidimensional data: principal component analysis and factor analysis comparison (다차원 데이터의 군집분석을 위한 차원축소 방법: 주성분분석 및 요인분석 비교)

  • Hong, Jun-Ho; Oh, Min-Ji; Cho, Yong-Been; Lee, Kyung-Hee; Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.2 / pp.135-143 / 2020
  • This paper proposes a pre-processing method and a dimensionality reduction method for analyzing shopping-cart data, in which many variables are correlated, when segmenting consumer types in agri-food consumer panel data. Cluster analysis is a widely used method for dividing observations into several clusters in multivariate data, but when several variables are related, cluster analysis after dimensionality reduction may be more effective. In this paper, food consumption data surveyed from 1,987 households were clustered using the K-means method, with 17 variables re-selected for the clustering. Principal component analysis and factor analysis were compared as solutions to the multicollinearity problem and as ways to reduce dimensionality for clustering. Both methods reduced the dataset to two dimensions. Although principal component analysis divided the dataset into three clusters, the differences among the cluster characteristics were not clear; under the factor analysis method, in contrast, the consumption-pattern characteristics of the clusters were well distinguished.
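
A hedged sketch of the comparison: synthetic data with two hidden factors plays the role of the 17 consumption variables, each reduction method projects to two dimensions, and K-means with three clusters is run on each projection.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1,987 households, 17 correlated consumption variables
rng = np.random.default_rng(4)
latent = rng.normal(size=(1987, 2))                  # two hidden factors
data = latent @ rng.normal(size=(2, 17)) + rng.normal(0, 0.5, (1987, 17))
data = StandardScaler().fit_transform(data)

# Reduce to two dimensions by each method, then cluster with K-means
for reducer in (PCA(n_components=2), FactorAnalysis(n_components=2)):
    Z = reducer.fit_transform(data)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)
    print(type(reducer).__name__, np.bincount(km.labels_))
```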

A Method for Engineering Change Analysis by Using OLAP (OLAP를 이용한 설계변경 분석 방법에 관한 연구)

  • Do, Namchul
    • Korean Journal of Computational Design and Engineering / v.19 no.2 / pp.103-110 / 2014
  • Engineering changes are indispensable engineering and management activities that allow manufacturers to develop competitive products and maintain the consistency of their product data. Analysis of engineering changes provides a core functionality for supporting decision making in engineering change management. This study develops a method for analyzing engineering changes based on On-Line Analytical Processing (OLAP), a proven database analysis technology that has been applied to various business areas. The approach automates data processing for engineering change analysis from product databases that follow an international standard for product data management (PDM), and enables analysts to examine various aspects of engineering changes with OLAP operations. The study consists of modeling a standard PDM database and a multidimensional data model for engineering change analysis, implementing both models with PDM and data cube systems, and applying the implemented data cube to core functions of engineering change management: the evaluation and propagation of engineering changes.
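
A pandas pivot table can stand in for the paper's OLAP data cube; the sketch below rolls up invented engineering-change records by part and year, a miniature of the slice-and-roll-up operations described (the paper builds on a standard PDM database instead).

```python
import pandas as pd

# Invented engineering-change records (part affected, year, change cost)
ec = pd.DataFrame({
    "part": ["door", "door", "engine", "engine", "seat"],
    "year": [2012, 2013, 2012, 2013, 2013],
    "cost": [1200, 800, 5000, 4200, 300],
})

# Roll up change count and total cost by part and year (one face of the cube)
cube = ec.pivot_table(values="cost", index="part", columns="year",
                      aggfunc=["count", "sum"], fill_value=0)
print(cube)
```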

A NODE PREDICTION ALGORITHM WITH THE MAPPER METHOD BASED ON DBSCAN AND GIOTTO-TDA

  • Dongjin Lee; Jae-Hun Jung
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.27 no.4 / pp.324-341 / 2023
  • Topological data analysis (TDA) is a recently developed data analysis technique that investigates the overall shape of a given dataset. The mapper algorithm is a TDA method that considers the connectivity of the data and converts it into a mapper graph. Compared to persistent homology, another popular TDA tool that focuses mainly on the homological structure of the data, the mapper algorithm is more of a visualization method that represents the data as a graph in a lower dimension. Because it visualizes the overall data connectivity, it can also serve as a prediction method that places new input points on the mapper graph. Existing mapper packages such as Giotto-TDA, Gudhi, and Kepler Mapper provide a descriptive mapper algorithm; that is, their final output is the mapper graph itself. In this paper, we develop a simple predictive algorithm that identifies, within the established mapper graph, the nodes associated with a newly arriving data point. By checking features of the detected nodes, such as whether they are anomalous, we can characterize the new input point. As an example, we apply the developed algorithm to fraud credit card transaction data and show how it can be used as a node prediction method.
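
A sketch of the node-prediction idea with Giotto-TDA and DBSCAN, assuming the mapper graph exposes member indices through the node_elements vertex attribute; the new point is assigned to the node with the nearest member centroid, a simplification of the paper's cover-based identification, on synthetic data rather than the credit card set.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from gtda.mapper import CubicalCover, Projection, make_mapper_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 2))          # synthetic stand-in for transaction features

# Mapper graph: project to the first coordinate, cover with overlapping
# intervals, and cluster each preimage with DBSCAN
pipe = make_mapper_pipeline(
    filter_func=Projection(columns=0),
    cover=CubicalCover(n_intervals=8, overlap_frac=0.3),
    clusterer=DBSCAN(eps=0.5),
)
graph = pipe.fit_transform(X)          # an igraph.Graph

# Assign a new point to the node whose member centroid is nearest
x_new = np.array([0.2, -0.1])
centroids = [X[v["node_elements"]].mean(axis=0) for v in graph.vs]
node = int(np.argmin([np.linalg.norm(c - x_new) for c in centroids]))
print("new point -> node", node, "with", len(graph.vs[node]["node_elements"]), "members")
```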