• Title/Summary/Keyword: Feature learning

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these approaches face space and time limitations when classifying the very large number of web pages on today's internet. Moreover, most classification studies use a uni-gram feature representation, which does not capture the real meaning of words well. Korean web page classification poses an additional problem because many Korean words have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in this environment (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensions. This makes it possible to create a new low-dimensional semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words or documents (or web pages) to be analyzed. Although LSA is good at classification, it has a drawback: when SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, not which dimensions discriminate between them well. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA method that selects the optimal dimensions for both discriminating and representing vectors, minimizing this drawback and improving performance. The proposed method shows better and more stable performance than other LSA methods in a low-dimensional space. In addition, we obtain a further improvement in classification by creating and selecting features through stopword reduction and statistical weighting.
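
The distinction the abstract draws, between dimensions that represent vectors well and dimensions that discriminate between classes well, can be illustrated with a small sketch. It assumes the documents have already been projected into an LSA space by SVD, and it ranks the latent dimensions by a Fisher-style ratio of between-class to within-class variance. The scoring criterion and the names `fisher_scores` and `select_dimensions` are illustrative assumptions; the abstract does not say which selection criterion the paper uses.

```python
from statistics import mean, pvariance


def fisher_scores(vectors, labels):
    """Score each latent dimension by between-class / within-class variance.

    `vectors` are document coordinates in an (assumed, precomputed) LSA space.
    This criterion is an illustrative assumption, not the paper's method.
    """
    classes = sorted(set(labels))
    scores = []
    for d in range(len(vectors[0])):
        overall = mean(v[d] for v in vectors)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v[d] for v, y in zip(vectors, labels) if y == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        elif between > 0:
            scores.append(float("inf"))  # perfectly separated, zero spread
        else:
            scores.append(0.0)
    return scores


def select_dimensions(vectors, labels, k):
    """Return the indices of the k most class-discriminative dimensions."""
    scores = fisher_scores(vectors, labels)
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return sorted(ranked[:k])


# Dimension 0 carries most of the variance but does not separate the classes;
# dimension 1 has little variance but separates them cleanly.
docs = [[5.0, 0.1], [-5.0, 0.2], [4.0, 0.9], [-4.0, 1.0]]
tags = ["a", "a", "b", "b"]
print(select_dimensions(docs, tags, 1))
```

In this toy data a plain truncated SVD would favor dimension 0 because it explains more variance, while a discriminative criterion picks dimension 1.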

A COVID-19 Diagnosis Model based on Various Transformations of Cough Sounds (기침 소리의 다양한 변환을 통한 코로나19 진단 모델)

  • Minkyung Kim;Gunwoo Kim;Keunho Choi
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.57-78 / 2023
  • COVID-19, which started in Wuhan, China in November 2019, spread beyond China in 2020 and spread worldwide in March 2020. It is important to prevent a highly contagious virus like COVID-19 in advance and to treat it actively once confirmed, but because the virus spreads quickly, it is even more important to identify infections quickly and prevent further spread. However, the PCR test for infection is costly and time-consuming, and although self-test kits are easy to access, the cost of buying a kit every time is a burden. Therefore, if COVID-19 status could be determined from the sound of a cough, anyone could easily check whether they are infected anytime and anywhere, with great economic advantages. In this study, we conducted an experiment on a method for identifying COVID-19 infection based on cough sounds. Cough sound features were extracted using MFCC, Mel-spectrogram, and spectral contrast. To ensure cough sound quality, noisy data was removed based on SNR, and only the cough segments were extracted from each audio file by chunking. Since the objective is to classify COVID-19 positive and negative cases, models were trained with the XGBoost, LightGBM, and FCNN algorithms, which are often used for classification, and the results were compared. Additionally, we conducted a comparative experiment on model performance using multidimensional vectors obtained by converting cough sounds into both images and vectors. The experimental results showed that the LightGBM model, using features obtained by converting basic health status information and cough sounds into multidimensional vectors through MFCC, Mel-spectrogram, spectral contrast, and spectrogram, achieved the highest accuracy of 0.74.

Improved Sentence Boundary Detection Method for Web Documents (웹 문서를 위한 개선된 문장경계인식 방법)

  • Lee, Chung-Hee;Jang, Myung-Gil;Seo, Young-Hoon
    • Journal of KIISE:Software and Applications / v.37 no.6 / pp.455-463 / 2010
  • In this paper, we present an approach to sentence boundary detection for web documents that builds on statistical methods and uses rule-based correction. The proposed system uses a classification model learned offline from a training set of human-labeled web documents. Web documents contain many word-spacing errors and frequently lack the punctuation mark that indicates the end of a sentence. As sentence boundary candidates, the proposed method therefore considers every Ending Eomi as well as punctuation marks. We optimize engine performance by selecting the best features, the best training data, and the best classification algorithm. For evaluation, we built two test sets: Set1, consisting of articles and blog documents, and Set2, consisting of web community documents. We use the F-measure to compare results. When detecting only periods as sentence boundaries, our baseline engine scored 96.5% on Set1 and 56.7% on Set2. We improved the baseline engine by adapting the features and the boundary search algorithm. For the final evaluation, we compared the adapted engine with the baseline engine on Set2. The adapted engine improved on the baseline engine by 39.6%, demonstrating the effectiveness of the proposed method for sentence boundary detection.
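
The F-measure used for the comparison above can be sketched as follows. This assumes boundaries are represented as character offsets and that a predicted boundary counts as correct only on an exact match with a gold boundary. The function name `boundary_f1` and the matching rule are illustrative assumptions; the abstract does not describe the paper's scoring procedure.

```python
def boundary_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted sentence-boundary offsets.

    Assumes exact-match scoring on character offsets (an illustrative choice).
    """
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# A detector that finds only two of the four gold boundaries, plus one
# spurious boundary, is penalized on both precision and recall.
print(boundary_f1(predicted=[5, 12, 20], gold=[5, 12, 30, 41]))
```

A period-only detector applied to text that often omits sentence-final punctuation would miss many gold boundaries, which lowers recall and therefore the F-measure.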

Analysis of Fish Blocking Effect using Illuminance Difference (조도 차이를 이용한 어류 차단 효과 분석)

  • Kang, Joon-Gu;Kang, Su-Jin;Kim, Jong-Tae
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.9 / pp.76-83 / 2017
  • Fish respond sensitively to light, so it is possible to develop fish management technology using this feature. In this study, we developed a light-based fish barrier and analyzed its blocking effect using the difference in illuminance for the major fish species in Korea, bass and bluegill. The light was generated by a light emitting diode and the facility was installed vertically from the bottom. Considering the fish's ability to travel upstream, the flow rate was divided into three stages (0.2, 0.1, and 0.05 m/s). To prevent the learning effect, an experiment was carried out with fish that had rested for more than one day in a rearing tank. The experiment was carried out in such a way as to compare the number of fish which travelled upstream after the introduction of the fish barrier and that of the fish which travelled upstream after its removal. It was also carried out after sunset to increase the effectiveness of the barrier. According to the results of the experiment, the fish blocking effect depending on the difference in illuminance was high and, overall, the blocking rate for bass was lower than that for bluegill. Based on the total size of the experimental population, the blocking rates for bass and bluegill were 96.33% and 99.00%, respectively. Based on the number of fish that travelled upstream, the blocking rates for bass and bluegill were 91.73% and 98.73%, respectively.

RBM-based distributed representation of language (RBM을 이용한 언어의 분산 표상화)

  • You, Heejo;Nam, Kichun;Nam, Hosung
    • Korean Journal of Cognitive Science / v.28 no.2 / pp.111-131 / 2017
  • The connectionist model is one approach to studying language processing from a computational perspective. In connectionist modeling, building a representation is just as important as designing the structure of the model, in that it determines the model's level of learning and performance. Connectionist models have used two kinds of representation: localist and distributed. However, the localist representation used in previous studies has the limitation that output-layer units with rare target activation values become inactive, and past distributed representations have the limitation that results are difficult to verify because the represented information is opaque. These have been limitations of connectionist modeling research as a whole. To address these limitations, this paper presents a new method that derives a distributed representation from a localist representation using information abstraction, a feature of the restricted Boltzmann machine. Our proposed method effectively solves the problems of conventional representations by compressing information into a distributed representation and inversely transforming the distributed representation back into a localist representation.

Comprehensive Measures the Elimination of Violence in Schools validated - Centered on the fundamental countermeasures - (학교폭력 근절 종합대책에 대한 유효성 검증 - 근본대책을 중심으로 -)

  • Jung, Sung Sook
    • Convergence Security Journal / v.13 no.5 / pp.187-196 / 2013
  • Recently, school violence has come to the fore as a social phenomenon. The "Comprehensive countermeasures for eradication of school violence" were established as a safety policy by the Safety Administration bureau and the Ministry of Education, Science and Technology under the chairmanship of the Office of the Prime Minister in February 2012. The policy was to be operated on a trial basis for one year from March 2012, but critics have raised considerable concern about its effectiveness. Therefore, 172 high school teachers in Seoul were surveyed with a 5-point Likert-type questionnaire to examine the effectiveness of the "Comprehensive countermeasures for eradication of school violence". Among the fundamental measures, there were a total of 12 countermeasures on 'Practices for personality education' (excluding one unrelated question). Of these, 'Expanding opportunities for various art education and supporting reading activities' ranked highest on average, followed by 'Reflecting results of special features related to character development in admissions officer selection and self-directed learning'. Among the three countermeasures on 'Reinforcement of the roles of the family and society', 'Conducting a pan-governmental annual campaign with broadcasters, the press, and civic groups to combat school violence' ranked highest. Finally, among the 7 countermeasures on 'Countermeasures for harmful factors of game and internet addiction', 'Reinforcement of preventive education on game and internet addiction' ranked highest, followed by 'Development and promotion of various educational contents for preventive education on game and internet addiction'.

A Study of Statistical Learning as a CRM's Classifier Functions (CRM의 기능 분류를 위한 통계적 학습에 관한 연구)

  • Jang, Geun;Lee, Jung-Bae;Lee, Byung-Soo
    • The KIPS Transactions:PartB / v.11B no.1 / pp.71-76 / 2004
  • Recent ERP and CRM systems are mostly focused on conventional functions. However, the rapid progress of the internet and e-commerce has changed the market in the recent business environment. Business is largely becoming e-business, spreading through the development of relationships with partner companies, the rapid development of relationships with customers, and the strengthening of competitiveness through improved business processes within the organization. CRM (customer relationship management) is a marketing process that forms, manages, and strengthens the relationship between customers and a company in order to retain acquired customers and increase their value to the company. It requires a system base that analyzes customer information, since it operates on the basis of various information about customers and is linked to business areas such as production, marketing, and decision making. As ERP extends its functions to SCM, CRM, and SEM (Strategic Enterprise Management), 21st-century ERP is developing into a strategic tool for e-business and, as a means to this end, will subdivide the functions of CRM effectively through analogical learning from data. In addition, one feature of the system is that file classification work that has so far been done manually can be performed automatically by an agent that applies machine learning, allowing users to work more efficiently.

The Improvement of Convergence Characteristic using the New RLS Algorithm in Recycling Buffer Structures

  • Kim, Gwang-Jun;Kim, Chun-Suck
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.4 / pp.691-698 / 2003
  • We extend the use of the method of least squares to develop a recursive algorithm for the design of adaptive transversal filters such that, given the least-squares estimate of the tap-weight vector of the filter at iteration n-1, we may compute the updated estimate of this vector at iteration n upon the arrival of new data. We begin the development of the RLS algorithm by reviewing some basic relations that pertain to the method of least squares. Then, by exploiting a relation in matrix algebra known as the matrix inversion lemma, we develop the RLS algorithm. An important feature of the RLS algorithm is that it utilizes information contained in the input data, extending back to the instant of time when the algorithm is initiated. In this paper, we propose a new tap-weight-updating RLS algorithm for an adaptive transversal filter with a data-recycling buffer structure. We show that, in terms of mean square error versus iteration number, the learning curve of the RLS algorithm with a data-recycling buffer converges faster than that of the existing RLS algorithm. The resulting rate of convergence is also typically an order of magnitude faster than that of the simple LMS algorithm. From three-dimensional simulation results of the mean square error as a function of the degree of channel amplitude distortion and the data-recycling buffer number, we show that the number of samples required to converge to the specified value increases proportionally. With the newly proposed RLS algorithm, the convergence speed of the mean square error improves by a factor of B as the data-recycling buffer number increases.
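
The recursion described above (a gain vector from the matrix inversion lemma, then an update of the tap weights and of the inverse correlation matrix on each new sample) can be sketched in plain Python. This is the standard exponentially weighted RLS update, not the data-recycling variant the paper proposes, whose update rule the abstract does not give. The function name `rls_identify`, the forgetting factor `lam`, the regularization constant `delta`, and the 3-tap test channel are assumptions made for illustration.

```python
import random


def rls_identify(x, d, num_taps, lam=0.99, delta=0.01):
    """Standard RLS for a transversal (FIR) filter; returns (weights, errors).

    x: input samples, d: desired response, lam: forgetting factor,
    delta: small positive constant used to initialize P = I / delta.
    """
    m = num_taps
    w = [0.0] * m
    # P tracks the inverse of the (weighted) input correlation matrix.
    P = [[1.0 / delta if i == j else 0.0 for j in range(m)] for i in range(m)]
    errors = []
    for n in range(len(x)):
        # Tap-input vector u(n) = [x[n], x[n-1], ...], zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(m)]
        Pu = [sum(P[i][j] * u[j] for j in range(m)) for i in range(m)]
        denom = lam + sum(u[i] * Pu[i] for i in range(m))
        gain = [v / denom for v in Pu]
        # A priori estimation error, computed with the previous weights.
        err = d[n] - sum(w[i] * u[i] for i in range(m))
        w = [w[i] + gain[i] * err for i in range(m)]
        uP = [sum(u[i] * P[i][j] for i in range(m)) for j in range(m)]
        P = [[(P[i][j] - gain[i] * uP[j]) / lam for j in range(m)]
             for i in range(m)]
        errors.append(err)
    return w, errors


# Identify a hypothetical noise-free 3-tap channel from its input and output.
rng = random.Random(0)
channel = [0.5, -0.3, 0.1]
x = [rng.uniform(-1.0, 1.0) for _ in range(300)]
d = [sum(channel[k] * x[n - k] for k in range(3) if n - k >= 0)
     for n in range(300)]
weights, errors = rls_identify(x, d, num_taps=3)
print(weights)
```

Plotting the squared entries of `errors` against the iteration number gives the kind of learning curve the abstract compares across algorithms.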

Air-conditioning and Heating Time Prediction Based on Artificial Neural Network and Its Application in IoT System (냉난방 시간을 예측하는 인공신경망의 구축 및 IoT 시스템에서의 활용)

  • Kim, Jun-soo;Lee, Ju-ik;Kim, Dongho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.347-350 / 2018
  • For an IoT system to automatically keep the house at a comfortable temperature for the user, the system needs to predict the optimal start-up time of the air-conditioner or heater to reach the temperature that the user has set. Predicting the optimal start-up time is important because it avoids the extra cost of unnecessary operation of the air-conditioner and heater. This paper introduces an ANN (Artificial Neural Network) and an IoT system that predict the cooling and heating time in households using an air-conditioner and heater. Many variables, such as house structure, house size, and external weather conditions, affect cooling and heating. Of these, measurable variables such as house temperature, house humidity, outdoor temperature, outdoor humidity, wind speed, wind direction, and wind chill were used to create training data for constructing the model. After the ANN model was constructed, an IoT system that uses the model was developed. The IoT system comprises a main system powered by Raspberry Pi 3 and a mobile application powered by Android. The mobile device's GPS sensor and a purpose-built feature are used to predict the user's return.

A Study on Spam Document Classification Method using Characteristics of Keyword Repetition (단어 반복 특징을 이용한 스팸 문서 분류 방법에 관한 연구)

  • Lee, Seong-Jin;Baik, Jong-Bum;Han, Chung-Seok;Lee, Soo-Won
    • The KIPS Transactions:PartB / v.18B no.5 / pp.315-324 / 2011
  • In the Web environment, a flood of spam causes serious social problems such as personal information leaks, monetary loss from phishing, and distribution of harmful content. Moreover, the types and techniques of spam distribution that must be controlled keep changing. The learning-based spam classification method using the Bag-of-Words model has been the most widely used method so far. However, this method is vulnerable to the filter-evasion techniques common in recent spam, because it classifies spam documents using only the keyword occurrence information obtained during classification model training. In this paper, we propose a spam document detection method that counters these evasion techniques by using the word-repetition characteristic of spam documents. Most recent spam documents tend to repeat the key phrases they are designed to spread, and this tendency can be used as a measure for classifying spam documents. We define six variables that represent characteristics of word repetition and use them as a feature set for constructing a classification model. The effectiveness of the proposed method is evaluated in an experiment with blog posts and e-mail data. The experimental results show that the proposed method outperforms other approaches.
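
As a rough illustration of what word-repetition features might look like, the sketch below computes a few statistics from a document's token counts. The abstract does not list the paper's six variables, so the four features here (`n_tokens`, `unique_ratio`, `max_repeat_share`, `repeated_token_share`) and the function name `repetition_features` are hypothetical stand-ins, not the authors' feature set.

```python
import re
from collections import Counter


def repetition_features(text):
    """Compute hypothetical word-repetition features for one document.

    These are illustrative stand-ins, NOT the six variables from the paper.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {"n_tokens": 0, "unique_ratio": 0.0,
                "max_repeat_share": 0.0, "repeated_token_share": 0.0}
    counts = Counter(tokens)
    n = len(tokens)
    # Number of token occurrences that belong to a word used more than once.
    repeated = sum(c for c in counts.values() if c > 1)
    return {
        "n_tokens": n,
        "unique_ratio": len(counts) / n,               # low when words repeat
        "max_repeat_share": max(counts.values()) / n,  # share of the top word
        "repeated_token_share": repeated / n,
    }


print(repetition_features("Buy now buy now BUY cheap"))
```

A feature dictionary like this could be fed to any standard classifier alongside, or instead of, Bag-of-Words counts.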