• Title/Abstract/Keywords: numeric data

Search results: 242 (processing time: 0.026 s)

Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF

  • Alyamani, Hasan J.
    • International Journal of Computer Science & Network Security / Vol. 22, No. 1 / pp. 283-287 / 2022
  • Machine learning is the most popular method used in data science. Data is growing not only as numeric data but also as text data. Most supervised and unsupervised machine learning algorithms operate on numeric data, so text data must be converted into numeric form. There are many techniques for this conversion, and researchers are often unsure which technique is best in which situation. This work studies BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) with different numbers of features to determine the best method. Experimental results on text data show that both TF-IDF and BOW perform better in the range of 100 to 150 features.
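
As a minimal sketch of the two conversions compared in this study, the Python below builds BOW and TF-IDF vectors over a vocabulary capped at `max_features` terms, the quantity the study varies. The whitespace tokenizer, the most-frequent-terms vocabulary cut, and the smoothed IDF formula are assumptions for illustration, not the paper's reported setup.

```python
import math
from collections import Counter


def tokenize(doc: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption: no stemming or stop-word removal).
    return doc.lower().split()


def build_vocabulary(docs: list[str], max_features: int) -> list[str]:
    # Keep the max_features most frequent corpus terms; ties are broken alphabetically.
    counts = Counter(tok for d in docs for tok in tokenize(d))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


def bow_vectors(docs: list[str], vocab: list[str]) -> list[list[int]]:
    # Bag-of-Words: raw term counts per document over the capped vocabulary.
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([counts[t] for t in vocab])
    return rows


def tfidf_vectors(docs: list[str], vocab: list[str]) -> list[list[float]]:
    # TF = count / document length; smoothed IDF = ln((1 + N) / (1 + df)) + 1.
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    idf = {}
    for t in vocab:
        df = sum(1 for c in doc_counts if c[t] > 0)
        idf[t] = math.log((1 + n) / (1 + df)) + 1
    rows = []
    for c in doc_counts:
        length = sum(c.values()) or 1
        rows.append([c[t] / length * idf[t] for t in vocab])
    return rows
```

Sweeping `max_features` (for example over 50, 100, 150, 200) and scoring a downstream classifier on each setting is one way to run the kind of feature-size comparison the abstract describes.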

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping

  • 이재성;김대원
    • Journal of KIISE: Software and Applications / Vol. 36, No. 12 / pp. 1024-1027 / 2009
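  • To compare the effectiveness of feature selection techniques on mixed-type data, this paper measures classification performance after selecting features through feature filtering and through feature wrapping. Because mixed-type data contains both numeric and categorical features, the numeric features can be discretized into categorical ones, converting the data into a single type before feature selection techniques are applied. In this study, the mixed-type data was preprocessed into single-type data, and widely used feature filtering and feature wrapping techniques were applied to select feature subsets that improve classification performance. Comparing the classification performance of the selected subsets showed that classification accuracy was higher with the feature subsets selected by feature wrapping than with those selected by feature filtering.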
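
The abstract does not name the discretizer, the filter and wrapper methods, or the classifier, so the following is only a hedged sketch of the pipeline it describes: numeric features are discretized with equal-width binning (an assumption), and a wrapper performs greedy forward selection scored by the leave-one-out accuracy of a simple lookup classifier (also an assumption). All function names are hypothetical.

```python
from collections import Counter


def equal_width_bins(values: list[float], n_bins: int) -> list[int]:
    # Discretize a numeric column into n_bins equal-width intervals labelled 0..n_bins-1.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def loo_accuracy(rows, labels, features) -> float:
    # Leave-one-out accuracy of a simple lookup classifier: predict the majority
    # label among the other rows that match on the selected (categorical) features.
    correct = 0
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in features)
        votes = Counter(labels[j] for j, other in enumerate(rows)
                        if j != i and tuple(other[f] for f in features) == key)
        if not votes:  # no matching row: fall back to the majority of all other rows
            votes = Counter(labels[j] for j in range(len(rows)) if j != i)
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


def forward_wrapper(rows, labels):
    # Greedy forward selection: repeatedly add the feature that most improves the
    # wrapped classifier's accuracy, and stop as soon as no feature improves it.
    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        score, feat = max((loo_accuracy(rows, labels, selected + [f]), f) for f in remaining)
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best
```

A filter method would instead rank features by a classifier-independent score such as information gain; the wrapper's extra cost is that it re-scores the classifier for every candidate subset.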

Predicting numeric ratings for Google apps using text features and ensemble learning

  • Umer, Muhammad;Ashraf, Imran;Mehmood, Arif;Ullah, Saleem;Choi, Gyu Sang
    • ETRI Journal / Vol. 43, No. 1 / pp. 95-108 / 2021
  • Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It uses the numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. For this purpose, an ensemble learning model that uses term frequency/inverse document frequency (TF/IDF) features is proposed. Three TF/IDF feature sets were used: unigrams, bigrams, and trigrams. The dataset was scraped from the Google Play Store and covers 14 different app categories. Biased and unbiased user ratings were distinguished using TextBlob analysis to form the ground truth, against which classifier prediction accuracy was then evaluated. The results demonstrate the strong potential of machine learning-based classifiers to predict authentic numeric ratings from actual user reviews.
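
The study combines unigram, bigram, and trigram TF/IDF features with an ensemble of classifiers. The sketch below shows only the n-gram extraction and a hard majority vote over base-classifier predictions. Pooling all three n-gram orders into one bag, the voting rule, and the tie-breaking rule are assumptions, since the abstract does not say how the features are combined or how the ensemble merges its members.

```python
from collections import Counter


def word_ngrams(text: str, n: int) -> list[str]:
    # Contiguous word n-grams over a lowercase whitespace tokenization.
    toks = text.lower().split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def ngram_features(text: str) -> Counter:
    # Unigram, bigram, and trigram counts pooled into one feature bag; in a full
    # pipeline these counts would be TF/IDF-weighted before classification.
    feats = Counter()
    for n in (1, 2, 3):
        feats.update(word_ngrams(text, n))
    return feats


def hard_vote(predictions: list[int]) -> int:
    # Majority vote over base-classifier rating predictions; ties go to the
    # lowest tied rating (an arbitrary rule chosen for this sketch).
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)
```

Building a separate feature set per n-gram order, rather than the pooled bag used here, would mirror the abstract's "three TF/IDF features" more literally.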

Ethical Conducts in Qualitative Research Methodology: Participant Observation and Interview Process

  • KANG, Eungoo;HWANG, Hee-Joong
    • Research Ethics / Vol. 2, No. 2 / pp. 5-10 / 2021
  • Purpose: Ethical behavior becomes more salient when researchers use face-to-face interviews and observation with vulnerable groups or communities, whose members may be unable to express their emotions during the sessions. The present research investigates ethical behavior in qualitative research, which matters especially because observation and interviews are in-depth methods of data collection. Research design, data and methodology: The present research collected non-numeric (textual) data from a review of prior literature to investigate ethical conduct in qualitative research. Non-numeric data differs from numeric data in how it is collected, analyzed, and presented. For the researcher to understand the phenomenon under study, it is important to formulate written questions and adapt them to what the method requires. Results: Our findings show that researchers conducting qualitative research must adhere to the following ethical conduct: upholding informed consent, confidentiality, and privacy; adhering to the principle of beneficence; and practicing honesty and integrity. Each is discussed in detail to show how it affects the researcher and the research participants. Conclusions: The authors conclude that five ethical conducts are important for obtaining extensive, rich information during qualitative research and may inform research policies for researchers who use observation and interview methods of data collection.

Block based Normalized Numeric Image Descriptor

  • 박유영;조상복;이종화
    • Journal of the Institute of Electronics Engineers of Korea SP / Vol. 49, No. 2 / pp. 61-68 / 2012
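  • This paper proposes a normalized numeric descriptor for evaluating image brightness and contrast clearly and objectively. The proposed descriptor uses the value of each pixel in the image as a weight on the probability density function (PDF) and normalizes the result so that the image is represented objectively. Because the proposed normalized image numeric descriptor provides an objective criterion for selecting the gamma value during gamma correction, it enables adaptive gamma correction.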
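
The abstract does not give the descriptor's formula, so the sketch below assumes one plausible reading: build the intensity PDF from each block's histogram, take the PDF-weighted mean of normalized intensities as a value d in [0, 1], and choose gamma so that d maps to mid-grey (gamma = ln 0.5 / ln d). The block scheme, the 0.5 target, and all names are assumptions, not the authors' method.

```python
import math


def normalized_descriptor(pixels: list[int], levels: int = 256) -> float:
    # Build the intensity PDF p(k) from the histogram, then take the PDF-weighted
    # mean of normalized intensities k / (levels - 1); the result lies in [0, 1].
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    n = len(pixels)
    return sum((k / (levels - 1)) * (h / n) for k, h in enumerate(hist))


def adaptive_gamma(descriptor: float, target: float = 0.5) -> float:
    # Choose gamma so that descriptor ** gamma == target (assumed mid-grey target).
    d = min(max(descriptor, 1e-6), 1 - 1e-6)  # keep the logarithm finite and nonzero
    return math.log(target) / math.log(d)


def block_descriptors(image: list[list[int]], block: int) -> list[list[float]]:
    # Descriptor for each non-overlapping block x block tile (assumed block scheme).
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h, block):
        row = []
        for c in range(0, w, block):
            tile = [image[i][j] for i in range(r, min(r + block, h))
                    for j in range(c, min(c + block, w))]
            row.append(normalized_descriptor(tile))
        out.append(row)
    return out
```

A dark block with d = 0.25, for instance, gets gamma = 0.5, which brightens it, since 0.25 ** 0.5 = 0.5.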

Printed Numeric Character Recognition using Fractal Dimension and Modified Henon Attractor

  • 손영우
    • Journal of Korea Multimedia Society / Vol. 6, No. 1 / pp. 89-96 / 2003
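  • This paper proposes a new method for recognizing printed digits using the fractal dimension from chaos theory and a modified Henon attractor. First, mesh features, projection features, and crossing-distance features are extracted from the digit image and converted into time-series data. Next, the modified Henon system proposed in this paper is used to compute the natural measure and information-bit values that represent the fractal dimension. Finally, digits are recognized by comparison with a standard-pattern database using the minimum distance. Experiments yielded a 100% classification rate for the ten digits, and tests on real documents yielded a 90% recognition rate at 26 characters per second, demonstrating the effectiveness of the proposed method.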
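
The modified Henon system and the feature-to-time-series conversion are not specified in the abstract, so this sketch substitutes the standard Henon map (a = 1.4, b = 0.3) and a box-counting estimate of fractal dimension, followed by the minimum-distance matching step. It illustrates the general idea only and does not compute the paper's natural measure or information-bit values; the function names are hypothetical.

```python
import math


def henon_points(n: int, a: float = 1.4, b: float = 0.3, burn_in: int = 100):
    # Standard (unmodified) Henon map: x' = 1 - a*x^2 + y, y' = b*x.
    x, y = 0.0, 0.0
    pts = []
    for i in range(n + burn_in):
        x, y = 1 - a * x * x + y, b * x
        if i >= burn_in:  # discard the transient before the orbit settles on the attractor
            pts.append((x, y))
    return pts


def box_counting_dimension(points, scales) -> float:
    # Count occupied boxes N(eps) at each scale and fit log N = D * log(1/eps) + c
    # by least squares; the slope D is the box-counting dimension estimate.
    xs, ys = [], []
    for eps in scales:
        boxes = {(math.floor(px / eps), math.floor(py / eps)) for px, py in points}
        xs.append(math.log(1 / eps))
        ys.append(math.log(len(boxes)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    den = sum((u - mx) ** 2 for u in xs)
    return num / den


def nearest_template(feature, templates: dict) -> str:
    # Minimum-distance matching against a {label: reference vector} database.
    return min(templates, key=lambda k: math.dist(feature, templates[k]))
```

For the standard Henon attractor the estimate should land near its commonly cited dimension of about 1.26, though finite sampling and the chosen scales shift it somewhat.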

Image and Attitude on the Alpha-Numeric Brand Name of Fashion Products

  • 박혜원;류은정
    • The Research Journal of the Costume Culture / Vol. 13, No. 3 / pp. 494-502 / 2005
  • The purposes of this study were to investigate the images of alpha-numeric brand names of fashion products, to identify the influence of clothing pursuit benefits on brand-name image, and to determine which images significantly affect attitude and purchase intention. Data were collected via a self-administered questionnaire from 270 male and female undergraduate students in Kyongnam province in March 2004. Cronbach's α, frequency analysis, factor analysis, and multiple regression analysis were performed using the SPSS 12.0 package. The results can be summarized as follows. First, the image dimensions of alpha-numeric brand names were natural, new, active, urban, impactive, and interesting; clothing pursuit benefits comprised brand value, attractiveness, fashion, individuality, and economic value pursuit. Second, clothing pursuit benefits had a significant effect on image preference for alpha-numeric brand names; in particular, individuality pursuit and attractiveness pursuit influenced the natural, new, active, and urban images. Third, the new, active, impactive, and natural images had significant effects on attitude, purchase intention, and conformity of products.
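
The reliability statistic reported in the study, Cronbach's α, follows the standard formula α = k/(k − 1) · (1 − Σσᵢ² / σ_T²), where k is the number of questionnaire items, σᵢ² is the variance of item i, and σ_T² is the variance of respondents' total scores. A minimal sketch with toy data (the study's item responses are not reproduced here):

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    # items: one list of scores per questionnaire item, all over the same respondents.
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
    # Population and sample variance give the same ratio, so pvariance is used here.
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))
```

Items that move in perfect lockstep give α = 1; the more the items disagree, the larger the item variances are relative to the total-score variance and the lower α falls.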

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

  • Kim, Young Joon;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 14, No. 4 / pp. 313-321 / 2014
  • In the current era of data-intensive services, the handling of big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, which is implemented in a distributed programming framework, i.e., MapReduce. The proposed method partitions the data space into a grid structure and it then models the probability distributions of classes for grid cells by collecting sufficient statistics using distributed MapReduce tasks. The class labeling of new data is achieved by k-nearest neighbor classification based on Bayesian inference.
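
A single-process sketch of the grid-based approach, under stated assumptions: the mapper and reducer are plain Python functions standing in for MapReduce tasks (no Hadoop API is used), the sufficient statistics are per-cell class counts, and a query is labelled by pooling the counts of the k grid cells nearest to it and taking the class with the largest Laplace-smoothed posterior. The cell width, k, and the smoothing constant are hypothetical parameters.

```python
import math
from collections import Counter, defaultdict


def cell_of(point, width: float) -> tuple:
    # Grid cell index of a point: floor of each coordinate divided by the cell width.
    return tuple(math.floor(v / width) for v in point)


def map_phase(records, width: float):
    # Mapper: emit ((cell, label), 1) for every labelled record.
    for point, label in records:
        yield (cell_of(point, width), label), 1


def reduce_phase(pairs):
    # Reducer: sum counts per (cell, label) key, giving per-cell class histograms.
    stats = defaultdict(Counter)
    for (cell, label), one in pairs:
        stats[cell][label] += one
    return stats


def classify(point, stats, width: float, k: int = 3, alpha: float = 1.0):
    # Pool class counts from the k cells whose centers are nearest the query, then
    # return the label with the largest smoothed posterior (counts + alpha).
    def center(c):
        return [(i + 0.5) * width for i in c]

    nearest = sorted(stats, key=lambda c: math.dist(point, center(c)))[:k]
    pooled = Counter()
    for c in nearest:
        pooled.update(stats[c])
    labels = sorted({lab for counts in stats.values() for lab in counts})
    total = sum(pooled.values()) + alpha * len(labels)
    posterior = {lab: (pooled[lab] + alpha) / total for lab in labels}
    return max(posterior, key=posterior.get), posterior
```

Because the reducer only adds counts, per-cell statistics from different data splits can be merged by summation, which is what lets the counting step run as independent distributed tasks.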

The Development of Hybrid Model and Empirical Study for the Several Inductive Approaches

  • 김광용
    • Journal of the Korean Operations Research and Management Science Society / Vol. 23, No. 3 / pp. 185-207 / 1998
  • This research investigates computer-generated hybrid second-order models of two numerically based approaches to risk classification: discriminant analysis (DA) and neural networks (NN). The hybrid second-order models are derived by rule induction using ID3 and tested on several different kinds of data. The hybrid approach is designed to combine the high prediction accuracy and robustness of DA or NN with the perspicuity of ID3. It also eliminates ID3's problem of contradictory inputs. An empirical test of the hybrid model's validity on bankruptcy data from small and medium-sized companies showed high perspicuity, high accuracy in predicting bankruptcy, and simple rules. The hybrid model also performed well regardless of data type: numeric, non-numeric, or combined.
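
A hedged sketch of the second-order idea: train a first-order model (DA or NN, not shown), replace the training labels with its predictions, and induce an ID3 tree over those predictions. Because a first-order model maps identical inputs to identical outputs, the relabelled data cannot contain contradictory examples, which is one reading of how the hybrid removes ID3's contradictory-input problem. The attribute values and labels in the test data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    # Shannon entropy (bits) of a label list.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr: int) -> float:
    # Entropy reduction obtained by splitting on attribute index attr.
    n = len(rows)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in split.values())


def id3(rows, labels, attrs):
    # Returns a leaf label, or (attr, {value: subtree}) for an internal node.
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return best, branches
```

In the hybrid setting, `labels` would be the first-order model's predictions rather than the observed outcomes, so the induced rules describe what DA or NN decides in a readable form.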

A comparison between different TV channel input methods using numeric keypads

  • 이남식;김호성;신찬수
    • Journal of the Ergonomics Society of Korea / Vol. 17, No. 3 / pp. 61-70 / 1998
  • The purpose of this paper is to evaluate input methods for the numeric keypads widely used in consumer and industrial electronic products. Three methods of entering numerals on a keypad were compared: (1) Machine Intelligence, (2) +100 key, and (3) Enter key. Experiments compared these three input methods for TV channel selection. Experimental prototypes simulating TV user interfaces were developed with RAPID™ for usability testing. During the experiment, subject-performance data such as completion time, operational errors, and user interaction were collected through auto-logging and video recording. After each test session, subjective preference was also assessed with a questionnaire. To analyze the types and causes of operational errors, operation sequences were analyzed from the collected data. The Enter key input method performed better than the other methods. Based on these results, we conclude that numeric-keypad input methods should be compatible with ordinary number counting (to select channel 7, it is better to press 7 directly than 07 or 007) and should switch channels as quickly as possible. This conclusion can be applied to the design of user interfaces that require numeral input.
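
The abstract does not define exactly how each method behaves, so the sketch below models only the Enter-key method as we read it: digits accumulate until ENTER commits them, plus an assumed auto-commit once `max_digits` digits have been typed. Under this model channel 7 takes two key presses (7, ENTER), in line with the study's recommendation to allow entering 7 directly rather than 07 or 007. The class and function names are hypothetical.

```python
class EnterKeyChannelInput:
    """Enter-key method: digits accumulate in a buffer until ENTER commits them."""

    def __init__(self, max_digits: int = 3):
        self.buffer = ""
        self.max_digits = max_digits  # assumed: auto-commit once this many digits are typed

    def press(self, key: str):
        # Returns the committed channel number, or None while input is still pending.
        if key == "ENTER":
            if not self.buffer:
                return None
            channel, self.buffer = int(self.buffer), ""
            return channel
        if not key.isdigit():
            raise ValueError(f"unexpected key: {key!r}")
        self.buffer += key
        if len(self.buffer) == self.max_digits:
            channel, self.buffer = int(self.buffer), ""
            return channel
        return None


def tune(keys, max_digits: int = 3):
    # Feed a key sequence and return the last committed channel (None if none).
    panel = EnterKeyChannelInput(max_digits)
    channel = None
    for k in keys:
        result = panel.press(k)
        if result is not None:
            channel = result
    return channel
```

Without ENTER, a short number such as 12 stays pending until the digit limit is reached, which is the waiting cost that the auto-commit rule and the Enter key each try to remove.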
