• Title/Summary/Keyword: numeric data


Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF

  • Alyamani, Hasan J.
    • International Journal of Computer Science & Network Security / v.22 no.1 / pp.283-287 / 2022
  • Machine learning is the most popular method used in data science. Data growth includes text data as well as numeric data, yet most supervised and unsupervised machine learning algorithms operate on numeric data, so text data must first be converted into numeric form. Many techniques exist for this conversion, and researchers are often unsure which technique is best in which situation. In the proposed work, BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) are studied across different feature sizes to determine the best method. In experiments on text data, both TF-IDF and BOW perform better in the range of 100 to 150 features.
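To make the role of the feature size concrete, the following is a minimal sketch of BOW and TF-IDF vectors built over a capped vocabulary. It is not the paper's implementation: the whitespace tokenizer, the length-normalized term frequency, the smoothed IDF variant, the toy corpus, and the names `build_vocabulary`, `bow_vector`, and `tfidf_vectors` are all assumptions made for illustration.

```python
import math
from collections import Counter

def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent terms across the corpus."""
    counts = Counter(token for doc in docs for token in doc.lower().split())
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]

def bow_vector(doc, vocab):
    """Bag-of-Words: raw count of each vocabulary term in the document."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocab]

def tfidf_vectors(docs, vocab):
    """TF-IDF with a smoothed IDF (an assumed variant, not taken from the paper)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors

# Toy corpus (hypothetical); the study reports 100 to 150 features on real data.
docs = [
    "numeric data is easy to model",
    "text data must become numeric data",
    "models need numeric features",
]
vocab = build_vocabulary(docs, max_features=5)
bow = [bow_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)
```

Changing `max_features` changes the width of every vector, which is the quantity the study varies.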

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping (특징 래핑을 통한 숫자형 특징과 범주형 특징이 혼합된 데이터의 클래스 분류 성능 향상 기법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.36 no.12 / pp.1024-1027 / 2009
  • In this letter, we evaluate classification performance on mixed numeric and categorical data to compare the efficiency of feature filtering and feature wrapping. Because the mixed data consist of numeric and categorical features, the feature selection method was applied to the data set after discretizing its numeric features. In this study, we choose the feature subset that improves the classification performance of the preprocessed data set. The experimental comparison of classification performance shows that the feature wrapping method is more reliable than the feature filtering method in terms of classification accuracy.
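The pipeline described above (discretize numeric features, then select features by filtering or wrapping) can be sketched as follows. Everything here is an assumption for illustration: equal-width binning, mutual information as the filter score, greedy forward selection scored by leave-one-out accuracy of a lookup classifier as the wrapper, and the toy data. The abstract does not state which discretizer, filter, or classifier the authors used, and this toy example does not reproduce their comparison.

```python
import math
from collections import Counter

def discretize(values, n_bins=2):
    """Equal-width binning: map each numeric value to a bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(feature, labels):
    """Filter score: mutual information between one feature and the class."""
    n = len(labels)
    joint, pf, pl = Counter(zip(feature, labels)), Counter(feature), Counter(labels)
    return sum(c / n * math.log((c / n) / ((pf[f] / n) * (pl[l] / n)))
               for (f, l), c in joint.items())

def loo_accuracy(rows, labels, subset):
    """Wrapper score: leave-one-out accuracy of a majority-vote lookup classifier."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        key = tuple(row[j] for j in subset)
        votes = Counter(l for k, (r, l) in enumerate(zip(rows, labels))
                        if k != i and tuple(r[j] for j in subset) == key)
        if not votes:  # unseen value combination: fall back to global majority
            votes = Counter(l for k, l in enumerate(labels) if k != i)
        correct += votes.most_common(1)[0][0] == label
    return correct / len(rows)

def forward_wrapper(rows, labels):
    """Greedily add the feature that most improves the wrapper score."""
    selected, best, improved = [], 0.0, True
    while improved:
        improved = False
        for j in range(len(rows[0])):
            if j in selected:
                continue
            acc = loo_accuracy(rows, labels, selected + [j])
            if acc > best:
                best, candidate, improved = acc, j, True
        if improved:
            selected.append(candidate)
    return selected, best

# Hypothetical mixed data: one numeric column (age) and one categorical (color).
age = [21, 25, 30, 48, 52, 60, 23, 55]
color = ["r", "b", "r", "b", "r", "b", "b", "r"]
labels = ["no", "no", "no", "yes", "yes", "yes", "no", "yes"]

age_bin = discretize(age, n_bins=2)
rows = list(zip(age_bin, color))
filter_scores = [mutual_information(age_bin, labels), mutual_information(color, labels)]
selected, best = forward_wrapper(rows, labels)
```

The filter scores each feature in isolation, while the wrapper scores candidate subsets by the accuracy of the classifier itself, which is the distinction the letter evaluates.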

Predicting numeric ratings for Google apps using text features and ensemble learning

  • Umer, Muhammad;Ashraf, Imran;Mehmood, Arif;Ullah, Saleem;Choi, Gyu Sang
    • ETRI Journal / v.43 no.1 / pp.95-108 / 2021
  • Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It exploits numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. An ensemble learning model is proposed for this purpose that considers term frequency/inverse document frequency (TF/IDF) features. Three TF/IDF features, including unigrams, bigrams, and trigrams, were used. The dataset was scraped from the Google Play store, extracting data from 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, from which the classifier prediction accuracy was then evaluated. The results demonstrate the high potential for machine learning-based classifiers to predict authentic numeric ratings based on actual user reviews.
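As a rough illustration of the unigram, bigram, and trigram features mentioned above, the sketch below extracts n-gram counts from a single review. The whitespace tokenizer, the sample review, and the function names `ngrams` and `ngram_counts` are assumptions; the paper's preprocessing, TF/IDF weighting, and ensemble model are not reproduced here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(review, max_n=3):
    """Count unigrams, bigrams, and trigrams (up to max_n) in one review."""
    tokens = review.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

# Hypothetical user review.
features = ngram_counts("great app but crashes often")
```

These counts would then be weighted by TF/IDF before being passed to a classifier.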

Ethical Conducts in Qualitative Research Methodology: Participant Observation and Interview Process

  • KANG, Eungoo;HWANG, Hee-Joong
    • Journal of Research and Publication Ethics / v.2 no.2 / pp.5-10 / 2021
  • Purpose: Ethical behaviors become more salient when researchers use face-to-face interviews and observation with vulnerable groups or communities, who may be unable to express their emotions during the sessions. The present research aims to investigate how ethical behaviors while conducting research have resonance, owing to the in-depth nature of observation and interview data collection methods. Research design, data and methodology: The present research obtained non-numeric (textual) data from a review of prior literature to investigate ethical conducts in qualitative research. Non-numeric data differs from numeric data in how the data is collected, analyzed, and presented. It is important to formulate written questions and adapt them to what the method requires so that the researcher can understand the studied phenomenon. Results: Our findings show that while conducting qualitative research, researchers must adhere to the following ethical conducts: upholding informed consent, confidentiality and privacy, adhering to the principle of beneficence, and practicing honesty and integrity. Each ethical conduct is discussed in detail to show how it affects the researcher and the research participants. Conclusions: The current authors conclude that five ethical conducts are important for obtaining extensive and rich information during qualitative research and may be used in implementing research policies for researchers who use observation and interview methods of data collection.

Block based Normalized Numeric Image Descriptor (블록기반 정규화 된 이미지 수 표현자)

  • Park, Yu-Yung;Cho, Sang-Bock;Lee, Jong-Hwa
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.61-68 / 2012
  • This paper describes a normalized numeric image descriptor used to assess the luminance and contrast of an image. The proposed descriptor uses each pixel value as a weight on the probability density function (PDF) and is defined through normalization so that it gives an objective representation. The descriptor can be used in adaptive gamma processing because it provides an objective basis for selecting the gamma value.

Printed Numeric Character Recognition using Fractal Dimension and Modified Henon Attractor (프랙탈 차원과 수정된 에농 어트랙터를 이용한 인쇄체 숫자인식)

  • 손영우
    • Journal of Korea Multimedia Society / v.6 no.1 / pp.89-96 / 2003
  • This paper proposes a new method for extracting character features and recognizing numeric characters using the fractal dimension and a modified Henon attractor from chaos theory. First, mesh, projection, and cross-distance features are extracted from numeric character images, and these features are converted into time series data. Then, using the modified Henon system suggested in this paper, the final features of the numeric character image are obtained by calculating the natural measure and the information bit, which indicate the fractal dimension. Finally, numeric character recognition is performed by statistically finding the information bit that shows the minimum difference against the normalized pattern database. Experimental results show a 100% character classification rate for the 10 digits, a 90% recognition rate in real situations, and a recognition speed of 26 characters per second.
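For orientation, the sketch below iterates the classical Henon map, x' = 1 - a x^2 + y and y' = b x, with the commonly used parameters a = 1.4 and b = 0.3. This is the standard map only: the abstract does not specify the paper's modification or how the feature time series enters the system, so the parameters, the starting point, and the function name `henon_orbit` are assumptions for illustration.

```python
def henon_orbit(n_steps, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Iterate the classical Henon map and return the visited (x, y) points."""
    x, y = x0, y0
    orbit = []
    for _ in range(n_steps):
        # Both coordinates are updated simultaneously from the previous point.
        x, y = 1.0 - a * x * x + y, b * x
        orbit.append((x, y))
    return orbit

orbit = henon_orbit(1000)
```

Plotting the points in `orbit` traces the familiar Henon attractor, whose fractal structure is what quantities such as the natural measure describe.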

Image and Attitude on the Alpha-Numeric Brand Name of Fashion Products (패션제품의 숫자 결합 상표명에 대한 이미지와 태도에 관한 연구)

  • Park Hye-Won;Ryou Eun-Jeong
    • The Research Journal of the Costume Culture / v.13 no.3 s.56 / pp.494-502 / 2005
  • The purposes of this study were to investigate the images of alpha-numeric brand names of fashion products, to identify the influence of clothing pursuit benefits on brand name image, and to determine which images significantly affect attitude and purchasing intention. The data were collected via a self-administered questionnaire from 270 male and female undergraduate students in Kyongnam province during March 2004. Using the SPSS 12.0 package, Cronbach's α, frequency analysis, factor analysis, and multiple regression analysis were performed. The results can be summarized as follows. First, the image dimensions of alpha-numeric brand names were composed of natural, new, active, urban, impactive, and interesting images. Clothing pursuit benefits were composed of brand value, attractiveness, fashion, individuality, and economic value pursuit. Second, clothing pursuit benefits had a significant effect on the image preference for alpha-numeric brand names. In particular, individuality pursuit and attractiveness pursuit influenced the natural, new, active, and urban images. Third, the new, active, impactive, and natural images had significant effects on the attitude, purchasing intention, and conformity of products.

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

  • Kim, Young Joon;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.313-321 / 2014
  • In the current era of data-intensive services, the handling of big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, which is implemented in a distributed programming framework, i.e., MapReduce. The proposed method partitions the data space into a grid structure and it then models the probability distributions of classes for grid cells by collecting sufficient statistics using distributed MapReduce tasks. The class labeling of new data is achieved by k-nearest neighbor classification based on Bayesian inference.
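The grid-based idea described above can be sketched on a single machine as a map step that emits one count per (grid cell, class) pair and a reduce step that sums them into per-cell class statistics. This is only an illustration under stated assumptions: the cell width `CELL`, the Laplace smoothing constant `alpha`, the toy records, and all function names are hypothetical, the k-nearest neighbor step over neighboring cells is omitted, and no actual MapReduce framework is used.

```python
from collections import Counter, defaultdict

CELL = 1.0  # assumed grid cell width in every dimension

def cell_of(point, cell=CELL):
    """Map a numeric point to the integer index of its grid cell."""
    return tuple(int(v // cell) for v in point)

def map_phase(records):
    """Map: emit ((cell, label), 1) for every labeled record."""
    for point, label in records:
        yield (cell_of(point), label), 1

def reduce_phase(pairs):
    """Reduce: sum the emitted counts into per-cell class counts."""
    counts = defaultdict(Counter)
    for (cell, label), one in pairs:
        counts[cell][label] += one
    return counts

def posterior(counts, classes, point, alpha=1.0):
    """Class probabilities for the cell containing point, Laplace-smoothed."""
    cell_counts = counts.get(cell_of(point), Counter())
    total = sum(cell_counts.values()) + alpha * len(classes)
    return {c: (cell_counts[c] + alpha) / total for c in classes}

# Hypothetical two-dimensional training records.
records = [
    ((0.2, 0.3), "a"), ((0.7, 0.9), "a"), ((0.5, 0.1), "b"),
    ((1.5, 1.2), "b"), ((1.1, 1.8), "b"),
]
counts = reduce_phase(map_phase(records))
p_near_origin = posterior(counts, ["a", "b"], (0.4, 0.4))
p_empty_cell = posterior(counts, ["a", "b"], (5.5, 5.5))
```

Because the per-cell counts are sums, they can be accumulated independently on separate data partitions and merged afterwards, which is what makes the statistics suitable for distributed collection.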

The Development of Hybrid Model and Empirical Study for the Several Inductive Approaches (여러 가지 Inductive 방법에 대한 통합모델 개발과 그 실증적 유효성에 대한 연구)

  • 김광용
    • Journal of the Korean Operations Research and Management Science Society / v.23 no.3 / pp.185-207 / 1998
  • This research investigates computer-generated hybrid second-order models of two numerically based approaches to risk classification: discriminant analysis (DA) and neural networks (NN). The hybrid second-order models are derived by rule induction using ID3 and tested on several different kinds of data. This new hybrid approach is designed to combine the high prediction accuracy and robustness of DA or NN with the perspicuity of ID3. The hybrid model also eliminates ID3's problem of contradictory inputs. In an empirical validity test using bankruptcy data from small and medium companies, the hybrid model shows high perspicuity, high prediction accuracy for bankruptcy, and simple rules. The hybrid model also performs well regardless of the type of data, whether numeric, non-numeric, or combined.

A comparison between different TV channel input methods using numeric keypads (숫자판을 이용한 TV채널 입력방식에 대한 고찰)

  • Lee, Nam-Sik;Kim, Ho-Seong;Sin, Chan-Su
    • Journal of the Ergonomics Society of Korea / v.17 no.3 / pp.61-70 / 1998
  • The purpose of this paper is to evaluate input methods for the numeric keypads that are widely used in various types of consumer and industrial electronic products. Three methods of entering numerals using keypads were compared: (1) Machine Intelligence, (2) +100 key, and (3) Enter key input methods. Experiments were conducted to compare these three input methods for TV channel selection. Experimental prototypes that simulate TV user interfaces were developed using RAPID™ for usability testing. In the experiment, data on subject performance, such as completion time, operational errors, and user interaction, were collected through auto-logging and video recording. After each test session, subjective preference was also collected using a questionnaire. To analyze the types of operation errors and their causes, operation sequences were analyzed from the collected data. The Enter key input method showed better performance than the other input methods. Based on these results, we can conclude that the input method using numeric keypads should be compatible with generic number counting (to input channel 7, it would be better to input 7 directly than to input 07 or 007) and should switch the channel as quickly as possible. This conclusion can be applied to the design of user interfaces that require numeral inputs.
