• Title/Summary/Keyword: mixed feature set

Combined Feature Set and Hybrid Feature Selection Method for Effective Document Classification (효율적인 문서 분류를 위한 혼합 특징 집합과 하이브리드 특징 선택 기법)

  • In, Joo-Ho;Kim, Jung-Ho;Chae, Soo-Hoan
    • Journal of Internet Computing and Services / v.14 no.5 / pp.49-57 / 2013
  • A novel feature selection approach is proposed for online document classification, where feature selection is an important preprocessing task. Previous studies selected features using information from a single population only. In this paper, a mixed feature set is constructed by selecting features from multiple populations as well as from a single population, based on various kinds of information. The mixed feature set consists of two feature sets: the original feature set, made up of the words in the documents, and the transformed feature set, made up of features generated by latent semantic analysis (LSA). A hybrid feature selection method that combines filter and wrapper methods is used to obtain the optimal feature set from the mixed feature set. We performed classification experiments using the obtained optimal feature sets. The experiments confirmed our expectation that the approach improves classification performance, with accuracy over 90%. In particular, recall and precision also exceeded 90%, with low deviation between categories.
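
The filter-then-wrapper idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter score (spread of per-class means), the nearest-centroid classifier used as the wrapper criterion, the toy data, and all function names are hypothetical. In the toy matrix, some columns stand in for word features and others for LSA-derived features.

```python
from statistics import mean

def filter_scores(X, y):
    """Filter step (assumed score): spread of the per-class means of each feature."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        class_means = [mean(r[j] for r, c in zip(X, y) if c == k) for k in classes]
        scores.append(max(class_means) - min(class_means))
    return scores

def centroid_accuracy(X, y, feats):
    """Wrapper criterion (assumed): training accuracy of a nearest-centroid classifier."""
    classes = sorted(set(y))
    cents = {k: [mean(r[j] for r, c in zip(X, y) if c == k) for j in feats]
             for k in classes}
    hits = 0
    for row, c in zip(X, y):
        pred = min(classes,
                   key=lambda k: sum((row[j] - m) ** 2 for j, m in zip(feats, cents[k])))
        hits += pred == c
    return hits / len(y)

def hybrid_select(X, y, keep):
    """Keep the top `keep` features by filter score, then greedy forward selection."""
    scores = filter_scores(X, y)
    candidates = sorted(range(len(scores)), key=lambda j: -scores[j])[:keep]
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        best_j = None
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X, y, selected + [j])
            if acc > best:  # accept only strict improvements
                best, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best

# Toy mixed feature set: columns 0-1 play the role of word features,
# columns 2-3 the role of LSA features (made-up values).
X = [[1.0, 0.2, 0.9, 0.5],
     [0.9, 0.1, 0.8, 0.5],
     [0.1, 0.3, 0.1, 0.5],
     [0.0, 0.2, 0.2, 0.5]]
y = [0, 0, 1, 1]
selected, accuracy = hybrid_select(X, y, keep=2)
```

The filter step is cheap and prunes the candidate pool; the wrapper step is costlier because it re-evaluates a classifier for every candidate, which is why it runs only on the survivors.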

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping (특징 래핑을 통한 숫자형 특징과 범주형 특징이 혼합된 데이터의 클래스 분류 성능 향상 기법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.36 no.12 / pp.1024-1027 / 2009
  • In this letter, we evaluate classification performance on mixed numeric and categorical data to compare the efficiency of feature filtering and feature wrapping. Because the mixed data are composed of numeric and categorical features, the feature selection method was applied after discretizing the numeric features in the given data set. In this study, we choose the feature subset that improves the classification performance of the data set after preprocessing. The experimental results show that the feature wrapping method is more reliable than the feature filtering method in terms of classification accuracy.
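
The discretization step mentioned above can be illustrated with equal-width binning. The abstract does not say which discretization scheme was used, so equal-width binning and the function name are assumptions made only for this sketch.

```python
def equal_width_bins(values, n_bins):
    """Map numeric values to integer bin labels 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just outside the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

labels = equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 4)
```

After this step every feature is categorical, so a single feature selection criterion can be applied to the whole data set.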

Cluster-based Linear Projection and Mixture of Experts Model for ATR System (자동 목표물 인식 시스템을 위한 클러스터 기반 투영기법과 혼합 전문가 구조)

  • 신호철;최재철;이진성;조주현;김성대
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.3 / pp.203-216 / 2003
  • In this paper, a new feature extraction and target classification method is proposed for the recognition part of a FLIR (Forward Looking Infrared) image-based ATR system. The proposed feature extraction method is a "cluster (= set of classes)-based" version of the earlier fisherfaces method, which is known for its robustness to illumination changes in face recognition. The newly introduced class clustering and cluster-based projection method maximizes the performance of the fisherfaces method. The proposed target image classification method is based on the mixture of experts model, which consists of RBF-type experts and MLP-type gating networks. The mixture of experts model is well suited to an ATR system, which must recognize various targets in a complex feature space produced by variously mixed conditions. In the proposed classification method, one expert takes charge of one cluster; this separated expert structure reduces the complexity of the feature space and achieves more accurate local discrimination between classes. The proposed feature extraction and classification methods showed distinguished performance in recognition tests with a customized FLIR vehicle image database. In particular, robustness to pixelwise sensor noise and unwanted intensity variations was verified by simulation.
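
How a gating network combines expert outputs in a mixture of experts model can be sketched as below. This is an illustration only: the abstract describes MLP-type gating networks, whereas this sketch uses a single linear layer followed by a softmax for brevity, and the centers, width, gate weights, and function names are all hypothetical.

```python
from math import exp

def softmax(zs):
    """Numerically stable softmax."""
    m = max(zs)
    es = [exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]

def rbf_expert(x, center, width):
    """Gaussian RBF response of one expert centered on its cluster."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return exp(-d2 / (2 * width ** 2))

def mixture_output(x, centers, width, gate_weights):
    """Gate-weighted sum of expert responses; also returns the gate values."""
    # Simplified gate: one linear layer + softmax (the paper uses an MLP).
    gate = softmax([sum(w * a for w, a in zip(ws, x)) for ws in gate_weights])
    experts = [rbf_expert(x, c, width) for c in centers]
    return sum(g * e for g, e in zip(gate, experts)), gate

out, gate = mixture_output(
    x=[0.0, 0.0],
    centers=[[0.0, 0.0], [3.0, 4.0]],      # one expert per cluster
    width=1.0,
    gate_weights=[[0.0, 0.0], [0.0, 0.0]],  # zero weights give a uniform gate
)
```

Because each expert responds strongly only near its own cluster, the gate can route an input to the expert that models that region of the feature space.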

An enhanced feature selection filter for classification of microarray cancer data

  • Mazumder, Dilwar Hussain;Veilumuthu, Ramachandran
    • ETRI Journal / v.41 no.3 / pp.358-370 / 2019
  • The main aim of this study is to select the optimal set of genes from microarray cancer datasets that contribute to the prediction of specific cancer types. This study proposes the enhancement of the feature selection filter algorithm based on Joe's normalized mutual information and its use for gene selection. The proposed algorithm is implemented and evaluated on seven benchmark microarray cancer datasets, namely, central nervous system, leukemia (binary), leukemia (3 class), leukemia (4 class), lymphoma, mixed lineage leukemia, and small round blue cell tumor, using five well-known classifiers, including the naive Bayes, radial basis function network, instance-based classifier, decision-based table, and decision tree. An average increase in the prediction accuracy of 5.1% is observed on all seven datasets averaged over all five classifiers. The average reduction in training time is 2.86 seconds. The performance of the proposed method is also compared with those of three other popular mutual information-based feature selection filters, namely, information gain, gain ratio, and symmetric uncertainty. The results are impressive when all five classifiers are used on all the datasets.
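
The three comparison filters named above (information gain, gain ratio, and symmetric uncertainty) can be computed from entropies of discrete variables, as in the sketch below. The proposed filter based on Joe's normalized mutual information is not shown, since the abstract does not give its formula. Function names and toy data are assumptions.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def information_gain(feature, label):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(feature) + entropy(label) - entropy(list(zip(feature, label)))

def gain_ratio(feature, label):
    """Information gain normalized by the entropy of the feature."""
    hx = entropy(feature)
    return information_gain(feature, label) / hx if hx > 0 else 0.0

def symmetric_uncertainty(feature, label):
    """SU = 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    denom = entropy(feature) + entropy(label)
    return 2 * information_gain(feature, label) / denom if denom > 0 else 0.0

label = [0, 0, 1, 1]
su_informative = symmetric_uncertainty([0, 0, 1, 1], label)  # feature mirrors the label
su_irrelevant = symmetric_uncertainty([0, 1, 0, 1], label)   # feature independent of the label
```

A filter ranks genes by such a score and keeps the top-ranked ones before any classifier is trained.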

Refined identification of hybrid traffic in DNS tunnels based on regression analysis

  • Bai, Huiwen;Liu, Guangjie;Zhai, Jiangtao;Liu, Weiwei;Ji, Xiaopeng;Yang, Luhui;Dai, Yuewei
    • ETRI Journal / v.43 no.1 / pp.40-52 / 2021
  • DNS (Domain Name System) tunnels almost obscure the true network activities of users, which makes it challenging for the gateway or censorship equipment to identify malicious or unpermitted network behaviors. An efficient way to address this problem is to conduct a temporal-spatial analysis on the tunnel traffic. Nevertheless, current studies on this topic limit the DNS tunnel to those with a single protocol, whereas more than one protocol may be used simultaneously. In this paper, we concentrate on the refined identification of two protocols mixed in a DNS tunnel. A feature set is first derived from DNS query and response flows and is then combined with deep neural networks to construct a regression model. We benchmark the proposed method with captured DNS tunnel traffic; the experimental results show that the proposed scheme can achieve an identification accuracy of more than 90%. To the best of our knowledge, the proposed scheme is the first to estimate the ratios of two mixed protocols in DNS tunnels.

Economics of Self-Generation by Natural Gas Industry Using the Mixed Integer Program (혼합정수계획법을 이용한 천연가스(LNG) 산업의 자가발전소 건설에 대한 경제성 분석)

  • Lee, Jeong-Dong;Byun, Sang-Kyu;Kim, Tai-Yoo
    • IE interfaces / v.13 no.4 / pp.658-667 / 2000
  • Seasonal variation in natural gas demand, coupled with a rigid and stable gas import pattern, is the characteristic feature of the Korean Liquefied Natural Gas (LNG) industry. This attribute has required a huge investment in the construction of storage facilities. Thus, to minimize the supply cost, it is legitimate to reduce the storage requirement itself. In this study, we combine three alternative methods of dealing with the storage requirement to minimize the supply cost: (1) adding storage tanks, (2) inducing large firm customers, and (3) constructing gas-turbine self-generation facilities. Methodologically, we employ a mixed integer program (MIP) to optimize the system. The model also considers demand and the price-setting scheme in separate modules. The results show that if the alternatives are combined optimally, the number of storage tanks can be reduced substantially compared with the original capacity plan set by the industry authorities. We perform various sensitivity analyses to check the robustness of the results. The methodology presented in this study can be applied to other physical network industries, such as hydraulics. The empirical results will shed some light on the rationalization of capacity planning in the Korean natural gas industry.

A Genetic Algorithm for the Traveling Salesman Problem Using Prufer Number (Prufer 수를 이용한 외판원문제의 유전해법)

  • 이재승;신해웅;강맹규
    • Journal of Korean Society of Industrial and Systems Engineering / v.20 no.41 / pp.1-14 / 1997
  • This study proposes a genetic algorithm using the Prüfer number for the traveling salesman problem (PNGATSP). Nearest-neighbor nodes are mixed with randomly selected nodes at the stage of generating initial solutions. The proposed PNGATSP adopts a few ideas that differ from traditional genetic algorithms. For instance, an exponential fitness function and elitism are used, and the Prüfer number is used for encoding the TSP. Genetic operators are selected by experiments that identify which of four combinations of conventional and new genetic operators produces a good solution. For each combination, a robust set of parameters is determined by an experimental design approach. The features of the Prüfer number code for the TSP and the search power of a GA using the Prüfer number are analysed. The best combination is OX (order crossover) and swap, which is superior to the other tested combinations of genetic operators by a deviation of 1.0% to 12.8%.
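
For readers unfamiliar with the encoding, the sketch below shows the standard decoding of a Prüfer sequence into the edges of a labeled tree. It illustrates the encoding only; how the paper maps such a code to a TSP tour is not described in the abstract and is not reproduced here. The function name and the 0-based node labels are assumptions.

```python
import heapq

def prufer_to_edges(seq):
    """Decode a Prüfer sequence over labels 0..n-1 into the n-1 edges of a tree."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    # Leaves are the nodes that never appear in the sequence.
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest remaining leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:            # v has become a leaf itself
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

edges = prufer_to_edges([3, 3, 3, 4])
```

Every sequence of length n-2 decodes to a valid tree, so crossover and mutation on the sequence never produce an invalid tree, which is one reason such codes are attractive as chromosomes.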

A Hangul Script Matching Algorithm for PDA (PDA상에서의 한글 필기체 매칭 알고리즘)

  • Cho, Mi-Gyung;Cho, Hwan-Gue
    • Journal of KIISE:Software and Applications / v.29 no.10 / pp.684-693 / 2002
  • Electronic ink is data stored in the form of handwritten text or script, without conversion into ASCII by handwriting recognition, on pen-based computers and Personal Digital Assistants (PDAs) to support natural and convenient data input. One of the most important issues is how to search electronic ink in order to use it. We propose and implement a script matching algorithm for electronic ink. The proposed matching algorithm separates the input stroke into a set of primitive strokes using the curvature of the stroke curve. After determining the types of the separated strokes, it produces a stroke feature vector. It then calculates the distance between the stroke feature vector of the input strokes and that of the strokes in the database using dynamic programming. In various experiments, our algorithm showed a high matching rate of over 97.7% for Korean-only script and 94% for data mixing Korean with Chinese characters.
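
The dynamic programming comparison described above can be sketched as an edit distance between two sequences of primitive stroke types. The abstract does not specify the cost function or the stroke type labels, so the unit costs and the labels "H", "V", "C" below are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

# Hypothetical primitive stroke types: H = horizontal, V = vertical, C = curve.
query = ["H", "V", "C"]
stored = ["H", "C"]
distance = edit_distance(query, stored)
```

Retrieval then amounts to returning the database entries with the smallest distance to the query.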

An experimental investigation of thermodynamic performance of R-22 alternative blends (R-22 대체용 혼합냉매의 열역학적 성능에 대한 실험연구)

  • Hwang, E.P.;Kim, C.N.;Park, Y.M.
    • Korean Journal of Air-Conditioning and Refrigeration Engineering / v.9 no.1 / pp.82-91 / 1997
  • R-410a and R-407c, which have the best potential among the substances being considered as R-22 alternatives, were tested as "drop in" refrigerants against a set of R-22 baseline tests for comparison. The performance evaluations were carried out in a psychrometric calorimeter test facility using a residential split-type air conditioner under the ARI rating conditions. Other than the use of a different lubricant and a hand-operated expansion valve, a commercial system was used for the experiment. The measured performance characteristics were compressor power, capacity, VCR, mass flow rate, and COP. The tests showed that R-407c can be directly applied to the existing refrigeration system because its vapor pressure and other thermophysical properties are similar to those of R-22. However, it requires a change to the volume flow rate of the compressor to achieve performance similar to that of R-22 because of its relatively small VCR and capacity. Meanwhile, R-410a has too high a vapor pressure to be applied to the existing system, and this feature results in a relatively low system COP compared to that of R-22. This could be improved by changing the compressor design to take account of R-410a's relatively high VCR and capacity compared to those of R-22.

Estimating Simulation Parameters for Knit Fabrics from Static Drapes (정적 드레이프를 이용한 니트 옷감의 시뮬레이션 파라미터 추정)

  • Ju, Eunjung;Choi, Myung Geol
    • Journal of the Korea Computer Graphics Society / v.26 no.5 / pp.15-24 / 2020
  • We present a supervised learning method that estimates the simulation parameters required to simulate a fabric from the static drape shape of a given fabric sample. The static drape shape was inspired by Cusick's drape, which is used in the apparel industry to classify fabrics according to their mechanical properties. The input vector of the training model consists of the feature vector extracted from the static drape and the density value of a fabric specimen. The output vector consists of six simulation parameters that have a significant influence on deriving the corresponding drape result. To generate a plausible and unbiased training data set, we first collect simulation parameters for 400 knit fabrics and fit a Gaussian Mixture Model (GMM) to them as a generative model. Next, a large number of simulation parameters are randomly sampled from the GMM, and cloth simulation is performed for each sampled parameter set to create a virtual static drape. The generated training data is fitted with a log-linear regression model. To evaluate our method, we check the accuracy of the training results with a test data set and compare the visual similarity of the simulated drapes.
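
The sampling step described above can be sketched as drawing from a Gaussian mixture: pick a component according to its weight, then draw from that component's Gaussian. This is a simplified illustration, assuming diagonal covariances and made-up two-dimensional parameters; the paper's fitted mixture over six simulation parameters is not reproduced, and the function name is hypothetical.

```python
import random

def sample_gmm(weights, means, stds, n, seed=0):
    """Draw n samples from a Gaussian mixture with diagonal covariances."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    samples = []
    for _ in range(n):
        # Choose a mixture component in proportion to its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Sample each dimension independently (diagonal covariance assumption).
        samples.append([rng.gauss(m, s) for m, s in zip(means[k], stds[k])])
    return samples

# Made-up mixture with two components in two dimensions.
samples = sample_gmm(
    weights=[0.7, 0.3],
    means=[[0.0, 0.0], [5.0, 5.0]],
    stds=[[1.0, 1.0], [0.5, 0.5]],
    n=100,
)
```

Each sampled vector would then be passed to the cloth simulator to produce one virtual drape for the training set.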