• Title/Summary/Keyword: 벡터화 방식 (vectorization method)

Search results: 203 (processing time: 0.024 seconds)

Fatigue Classification Model Based On Machine Learning Using Speech Signals (음성신호를 이용한 기계학습 기반 피로도 분류 모델)

  • Lee, Soo Hwa;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.741-747 / 2022
  • Fatigue lowers an individual's capacity and makes work harder to perform. As fatigue accumulates, concentration decreases, which increases the likelihood of safety accidents. Awareness of fatigue is subjective, yet in the field the level of fatigue needs to be measured quantitatively. Previous studies proposed measuring fatigue through expert judgment that combines subjective evaluations, such as multidimensional fatigue scales, with objective indicators such as bio-signal analysis. However, this approach makes it difficult to evaluate fatigue in real time in daily life. This paper presents a fatigue classification model that determines workers' fatigue levels in real time from speech data recorded in the field. Machine learning models, namely logistic classification, support vector machine, and random forest, are trained on speech data collected in the field. In the performance evaluation, the models achieved accuracies of 0.677 to 0.758, with logistic classification performing best. The experimental results indicate that fatigue levels can be classified from speech signals.
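The abstract does not describe the speech features or the training setup, so the following is only a minimal sketch of binary logistic classification trained by batch gradient descent, in dependency-free Python. The two per-utterance features (pitch variability and speech rate), the toy data, the learning rate, and the 0.5 decision threshold are all hypothetical; the study itself compared logistic classification, SVM, and random forest on field-recorded speech.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the bias; w[1:] are the per-feature weights.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(logit(w, xi)) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    # 1 = "fatigued", 0 = "not fatigued" (hypothetical labels).
    return int(sigmoid(logit(w, x)) >= 0.5)

# Hypothetical standardized features per utterance: [pitch variability, speech rate].
X = [[-1.0, -0.8], [-0.6, -1.2], [-1.1, -0.5], [0.9, 1.1], [1.2, 0.7], [0.6, 1.0]]
y = [1, 1, 1, 0, 0, 0]
w = train_logistic(X, y)
print([predict(w, x) for x in X])
```

A linear model like this is cheap enough to evaluate per utterance on-device, which is one plausible reason a logistic classifier suits real-time use, though the abstract attributes its selection only to accuracy.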

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been used in various industries for practical applications. In business intelligence, text mining has been employed to discover new market and technology opportunities and to support rational decision-making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for product-level market information. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making specific and appropriate information difficult to obtain. We therefore propose a new methodology that estimates the market sizes of product groups at a more detailed level than previously available. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to estimate market sizes automatically and bottom-up from individual companies' product information. The overall process is as follows. First, product information data are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales of the extracted products are summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. 
We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 in subsequent experiments. We used the index words of the Korean Standard Industry Classification (KSIC) as a product-name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The proposed model automatically estimated the market sizes of 11,654 specific product lines. For performance verification, the results were compared with the actual market sizes of selected items, yielding a Pearson correlation coefficient of 0.513. Our approach differs from previous studies in several ways. First, it is the first application of text mining and machine learning to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. Second, the level of the market category can be adjusted easily and efficiently to suit the intended use of the information by changing the cosine similarity threshold. Third, it has high potential for practical application because it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs run by government institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of this study is that the model's accuracy and reliability need improvement. The semantic word embedding module could be improved by giving the preprocessed dataset a proper order or by combining Word2Vec with another measure such as Jaccard similarity. 
Also, the product group clustering method could be replaced with other unsupervised machine learning algorithms. Our group is currently working on follow-up studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
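The core of the bottom-up estimate can be sketched as below: group product names by cosine similarity to a KSIC index word, then sum their sales. This sketch assumes the embeddings are already trained (for example with a Word2Vec library). The 3-dimensional vectors, product names, sales figures, and thresholds are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def estimate_market_size(index_vec: Sequence[float],
                         product_vecs: Dict[str, List[float]],
                         sales: Dict[str, float],
                         threshold: float) -> Tuple[List[str], float]:
    """Group products whose embedding is close to a KSIC index word, then sum their sales."""
    group = sorted(name for name, vec in product_vecs.items()
                   if cosine(index_vec, vec) >= threshold)
    return group, sum(sales.get(name, 0.0) for name in group)

# Toy 3-d embeddings (the study used 300-dimensional Word2Vec vectors).
index_vec = [1.0, 0.0, 0.0]  # hypothetical KSIC index word
product_vecs = {"product_a": [0.9, 0.1, 0.0],
                "product_b": [0.8, 0.3, 0.1],
                "product_c": [0.0, 0.2, 0.9]}
sales = {"product_a": 120.0, "product_b": 80.0, "product_c": 500.0}
print(estimate_market_size(index_vec, product_vecs, sales, threshold=0.9))
print(estimate_market_size(index_vec, product_vecs, sales, threshold=0.95))
```

Raising the threshold from 0.9 to 0.95 drops `product_b` from the group. This mirrors the abstract's point that the cosine similarity threshold controls how narrow a market category is.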

Real-time Implementation of the AMR Speech Coder Using OakDSPCore® (OakDSPCore®를 이용한 적응형 다중 비트 (AMR) 음성 부호화기의 실시간 구현)

  • 이남일;손창용;이동원;강상원
    • The Journal of the Acoustical Society of Korea / v.20 no.6 / pp.34-39 / 2001
  • An adaptive multi-rate (AMR) speech coder was adopted by 3GPP and ETSI as a standard for W-CDMA. The AMR coder is based on the CELP algorithm and operates at rates from 12.2 kbps down to 4.75 kbps. It is a source-controlled codec whose rate is adapted to channel error conditions and traffic load. In this paper, we implement the AMR coder as DSP software on the OakDSPCore. The implementation is based on the CSD17C00A chip developed by C&S Technology, and it was verified for bit-exactness using the AMR test vectors provided by ETSI. The implementation requires 20.6 MIPS for the encoder and 2.7 MIPS for the decoder. The AMR coder requires 21.97 kwords of code memory, 6.64 kwords of data memory, and 15.1 kwords of data ROM. In addition, live sound input/output tests with a microphone and speaker demonstrate proper real-time operation without distortion or delay.
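The rate range quoted above translates directly into per-frame payload sizes. The short sketch below assumes the standard eight AMR modes and 20 ms frames, which come from the AMR specification rather than the abstract. The `pick_mode` rule is a hypothetical stand-in for the codec's actual mode-adaptation mechanism.

```python
from typing import Dict, Optional

# The eight AMR codec modes, in bit/s.
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
FRAME_MS = 20  # AMR codes speech in 20 ms frames (160 samples at 8 kHz)

def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    # Integer arithmetic avoids floating-point error in rate * duration.
    bits, rem = divmod(rate_bps * frame_ms, 1000)
    if rem:
        raise ValueError("rate does not give a whole number of bits per frame")
    return bits

def pick_mode(budget_bps: int) -> Optional[int]:
    # Hypothetical policy: the highest mode that fits the bit budget left
    # after channel coding; None if even the lowest mode does not fit.
    fitting = [m for m in AMR_MODES_BPS if m <= budget_bps]
    return max(fitting) if fitting else None

table: Dict[int, int] = {m: bits_per_frame(m) for m in AMR_MODES_BPS}
print(table)
```

With 20 ms frames, the 12.2 kbps mode carries 244 bits per frame and the 4.75 kbps mode carries 95. Trading speech bits for channel-protection bits in this way is what allows the coder to adapt to channel conditions.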

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, information is increasingly emphasized as a new form of capital, and classifying information has become more important for efficiently managing the exponentially growing volume of digital information. In this study, we aim to automatically classify information and provide tailored information that can help companies make technology commercialization decisions. To this end, we propose a method to classify information based on the Korea Standard Industry Classification (KSIC), which indicates the business characteristics of enterprises. Information and document classification has largely relied on machine learning, but there is not enough training data labeled by KSIC, so this study instead calculates the similarity between documents. Specifically, we collect the explanatory text of each KSIC code, use the vector space model to calculate its similarity to the document being classified, and propose a method and model that present the most appropriate KSIC code. IPC data were collected and classified by KSIC, and the methodology was then verified by comparison with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. The highest agreement was obtained when the LT method, a TF-IDF weighting formula, was applied. With this method, the top-ranked KSIC code matched in 53% of cases, and the cumulative match rate within the top five ranks was 76%. This confirms that the technology, industry, and market information that SMEs need can be classified by KSIC more quantitatively and objectively. In addition, the methods and results of this study can serve as baseline data to support experts' qualitative judgment when building concordance tables between heterogeneous classification systems.
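The abstract does not spell out the "LT" formula. The sketch below of the vector-space ranking step assumes it denotes logarithmic term frequency multiplied by inverse document frequency, as in SMART-style weighting notation. Treat this weighting, the code names, and the token lists as illustrative assumptions. Cosine similarity between the document and each code's description ranks the candidate codes, and a top-5 list corresponds to the "cumulative match within the top five ranks" measure.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vec = Dict[str, float]

def lt_weights(tokens: List[str], idf: Dict[str, float]) -> Vec:
    # Assumed "LT" weighting: (1 + ln tf) * idf; terms unseen in the code corpus are dropped.
    tf = Counter(tokens)
    return {t: (1.0 + math.log(c)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(u: Vec, v: Vec) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_codes(code_texts: Dict[str, List[str]], doc: List[str],
               top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank classification codes by cosine similarity to a tokenized document."""
    n = len(code_texts)
    df = Counter(t for toks in code_texts.values() for t in set(toks))
    idf = {t: math.log(n / d) for t, d in df.items()}
    doc_vec = lt_weights(doc, idf)
    scores = [(code, cosine(doc_vec, lt_weights(toks, idf)))
              for code, toks in code_texts.items()]
    scores.sort(key=lambda cs: cs[1], reverse=True)
    return scores[:top_k]

# Hypothetical, pre-tokenized code descriptions (not real KSIC text).
code_texts = {
    "code_A": ["battery", "cell", "lithium", "manufacturing"],
    "code_B": ["semiconductor", "wafer", "manufacturing"],
    "code_C": ["software", "publishing"],
}
print(rank_codes(code_texts, ["lithium", "battery", "electrode"], top_k=2))
```

In this weighting, a term that appears in every code description gets an IDF of zero and therefore does not help discriminate between codes.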

An Agent System for Supporting Adaptive Web Surfing (적응형 웹 서핑 지원을 위한 에이전트 시스템)

  • Kook, Hyung-Joon
    • The KIPS Transactions:PartB / v.9B no.4 / pp.399-406 / 2002
  • The goal of this research has been to develop an adaptive user agent for web surfing. To achieve this goal, the research concentrated on three issues: collecting user data, constructing and improving the user profile, and adapting by applying the user profile. The main outcome is a prototype system that provides the functional definition and component design scheme of an adaptive user agent for the web environment. Internally, the system achieves its operational goal through the cooperation of two independent agents: the IIA (Interactive Interface Agent) and the UPA (User Profiling Agent). To provide a user-friendly interface, the IIA employs two tools. The first is the Keyword Index, a list of a webpage's index terms that also serves as a keyword menu for subsequent queries. The second is the Suggest Link, a hierarchical list of URLs showing the user's past browsing path. The UPA records in the User Profile both the static and the dynamic information obtained from the user's browsing behavior. In particular, a user's interests are represented as Interest Vectors, which are updated or newly created based on vector similarity, so that the profile dynamically tracks the user's ever-shifting interests.
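The update-or-create behavior of Interest Vectors can be sketched as follows. The abstract only states that vectors are updated or created based on their similarity. The cosine measure, the 0.7 threshold, and the moving-average weight `alpha` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def update_profile(profile: List[List[float]], page_vec: Sequence[float],
                   threshold: float = 0.7, alpha: float = 0.3) -> int:
    """Fold a visited page's term vector into the profile; return the interest index touched."""
    best, best_sim = -1, -1.0
    for i, interest in enumerate(profile):
        sim = cosine(interest, page_vec)
        if sim > best_sim:
            best, best_sim = i, sim
    if best >= 0 and best_sim >= threshold:
        # Similar enough: shift the existing interest toward the new page.
        profile[best] = [(1 - alpha) * a + alpha * b
                         for a, b in zip(profile[best], page_vec)]
        return best
    # Nothing similar: start a new interest vector.
    profile.append(list(page_vec))
    return len(profile) - 1

profile: List[List[float]] = []
print(update_profile(profile, [1.0, 0.0, 0.0]))  # creates interest 0
print(update_profile(profile, [0.9, 0.1, 0.0]))  # merged into interest 0
print(update_profile(profile, [0.0, 0.0, 1.0]))  # creates interest 1
```

The moving average lets an interest drift gradually with the user, while a dissimilar page opens a new interest instead of diluting an existing one.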

Floating Point Unit Design for the IEEE754-2008 (IEEE754-2008을 위한 고속 부동소수점 연산기 설계)

  • Hwang, Jin-Ha;Kim, Hyun-Pil;Park, Sang-Su;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.10 / pp.82-90 / 2011
  • With the development of smartphone devices, demand for high-performance floating-point units (FPUs) is increasing. We therefore propose a high-speed single-/double-precision FPU design that includes an add/subtract unit and improved multiply, compare, and convert units. The add/subtract unit, the most commonly used, is optimized with a parallel rounding unit. Matrix operations are used in complex calculations such as graphics computation, so we designed a Multiply-Add Fused (MAF) unit instead of a plain multiplier to compute matrix operations more quickly. Branch instructions whose outcome is decided by a compare operation are used very frequently in programs. To reduce total execution time, we bypass the result of the compare operation before the whole pipeline completes. We also included the additional convert operations introduced in the IEEE 754-2008 standard. To verify our RTL designs, we selected 400,000 test vectors by a weighted random method and simulated each unit. The FPU synthesized with Samsung's 45 nm low-power process met a 600 MHz operating frequency, and a comparison with an existing FPU confirms a reduction in area.
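A fused multiply-add rounds a*b + c once rather than twice, as the fusedMultiplyAdd operation of IEEE 754-2008 specifies. The Python sketch below emulates that single rounding with exact rational arithmetic and compares it with the separate, twice-rounded operations. It says nothing about the paper's hardware datapath, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import Sequence

def fused_multiply_add(a: float, b: float, c: float) -> float:
    # Exact a*b + c via rationals, then a single round-to-nearest-even to double.
    # Finite inputs only; NaN, infinity, and overflow handling are omitted.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def unfused_multiply_add(a: float, b: float, c: float) -> float:
    # Separate multiply and add: the product is rounded before the addition.
    return a * b + c

def dot(xs: Sequence[float], ys: Sequence[float]) -> float:
    # A dot product (the inner loop of a matrix multiply) as a chain of fused operations.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fused_multiply_add(x, y, acc)
    return acc

a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
print(unfused_multiply_add(a, b, c))  # 0.0: the -2**-60 term is lost when the product is rounded
print(fused_multiply_add(a, b, c))    # equals -2**-60, preserved by the single rounding
```

Because each multiply-accumulate step in a dot product is a single fused operation, hardware can issue one MAF instruction per element instead of two. This is the speed argument the abstract makes for matrix computation.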

Design and Implementation of High-dimensional Index Structure for the support of Concurrency Control (필터링에 기반한 고차원 색인구조의 동시성 제어기법의 설계 및 구현)

  • Lee, Yong-Ju;Chang, Jae-Woo;Kim, Hang-Young;Kim, Myung-Joon
    • The KIPS Transactions:PartD / v.10D no.1 / pp.1-12 / 2003
  • Many indexing schemes have recently been proposed for multimedia data such as images and video. However, recent database applications, such as data mining and multimedia databases, must support multi-user environments, and an indexing scheme is useful there only if it has a concurrency control algorithm. We therefore propose a concurrency control algorithm for CBF (the cell-based filtering method), which uses cell signatures to alleviate the curse of dimensionality. In addition, we extend the SHORE storage system from the University of Wisconsin to handle high-dimensional data. The extended SHORE storage system provides conventional storage-manager functions and guarantees the integrity of high-dimensional data. It also scales to large sets of feature vectors without requiring a large main memory. Finally, we implement a web-based image retrieval system on the extended SHORE storage system. Its key features are platform-independent access to high-dimensional data and efficient content-based queries. Lastly, we evaluate the average response time of point queries, range queries, and k-nearest-neighbor queries as the number of threads varies.
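The abstract describes cell-based filtering only briefly. The sketch below illustrates the general idea. It quantizes each dimension of the unit hypercube into a fixed number of cells and uses each point's cell indices as its signature. It then prunes candidates whose cell cannot lie within the query radius before computing exact distances. The cell count, data, and function names are hypothetical. The sketch does not cover the paper's concurrency control algorithm or its SHORE integration.

```python
from typing import List, Sequence, Tuple

CELLS = 4  # hypothetical: each dimension of [0, 1] split into 4 cells

def signature(p: Sequence[float], k: int = CELLS) -> Tuple[int, ...]:
    # Cell index per coordinate (coordinates assumed to lie in [0, 1]).
    return tuple(min(int(x * k), k - 1) for x in p)

def min_dist_sq(q: Sequence[float], sig: Tuple[int, ...], k: int = CELLS) -> float:
    # Lower bound on the squared distance from q to any point inside the cell.
    total = 0.0
    for x, c in zip(q, sig):
        lo, hi = c / k, (c + 1) / k
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
    return total

def range_query(points: Sequence[Sequence[float]], sigs: Sequence[Tuple[int, ...]],
                q: Sequence[float], r: float) -> Tuple[List[int], int]:
    """Return indices within distance r of q, plus how many points needed an exact check."""
    hits, refined = [], 0
    for i, (p, sig) in enumerate(zip(points, sigs)):
        if min_dist_sq(q, sig) > r * r:
            continue  # filtered out using the signature alone
        refined += 1
        if sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r:
            hits.append(i)
    return hits, refined

points = [[0.1, 0.1], [0.12, 0.15], [0.9, 0.9], [0.5, 0.6]]
sigs = [signature(p) for p in points]
print(range_query(points, sigs, [0.1, 0.1], 0.1))
```

Because the cell bound never exceeds the true distance, filtering causes no false dismissals. Only the points whose cells survive need their full vectors read, which is what makes signatures attractive for high-dimensional data.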

The Role of Geographic Information System and Its Functional Intergration Strategy in the Conventional Transportation Planning Process (전통교통계획과정에 있어서 GIS의 역할 및 기능적 통합방안에 관한 연구)

  • Choi, Kee-Choo
    • Journal of Korean Society for Geospatial Information Science / v.1 no.1 s.1 / pp.127-140 / 1993
  • The purpose of this paper is to examine the potential benefits of combining transportation planning models with geographic information systems (GIS). The hope is that integrating these systems can alleviate inherent problems of transportation planning models such as user-unfriendliness, labor intensiveness, and theoretical limitations. Specifically, this paper focuses on the incompatibility between GIS and conventional transportation planning models in handling network topologies. Resolving this topological conflict is a cornerstone for eliminating the user-unfriendliness and labor-intensiveness issues. This paper presents an algorithm that converts GIS topology into transportation network topology. The FORTRAN-based topology conversion algorithm generates transportation networks from the GIS cartographic file and establishes a communication channel between the two systems.
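The paper's conversion algorithm was written in FORTRAN and is not detailed in the abstract. The Python sketch below shows only the general idea of turning GIS arcs into a transportation network. Arc endpoints become numbered nodes, and each arc becomes a pair of directed links weighted by its polyline length. The input format, the coordinate-rounding snap, and the two-way-link assumption are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Link = Tuple[int, int, float]  # (from_node, to_node, length)

def arcs_to_network(arcs: List[List[Point]],
                    ndigits: int = 3) -> Tuple[Dict[Point, int], List[Link]]:
    """Convert GIS polylines (arcs) into a node table and a directed link list."""
    node_ids: Dict[Point, int] = {}

    def node_for(pt: Point) -> int:
        # Snap nearly coincident endpoints by rounding their coordinates.
        key = (round(pt[0], ndigits), round(pt[1], ndigits))
        if key not in node_ids:
            node_ids[key] = len(node_ids) + 1  # 1-based node numbers
        return node_ids[key]

    links: List[Link] = []
    for arc in arcs:
        start, end = node_for(arc[0]), node_for(arc[-1])
        # Interior shape points contribute to length but do not become nodes.
        length = sum(math.dist(p, q) for p, q in zip(arc, arc[1:]))
        links.append((start, end, length))
        links.append((end, start, length))  # assume every street is two-way
    return node_ids, links

arcs = [
    [(0.0, 0.0), (3.0, 4.0)],
    [(3.0, 4.0), (3.0, 10.0)],
    [(0.0, 0.0), (0.0, 6.0), (3.0, 10.0)],
]
nodes, links = arcs_to_network(arcs)
print(nodes)
print(links)
```

Rounding is a crude snap. Two endpoints that straddle a rounding boundary can still end up as separate nodes, so a production conversion would need a proper tolerance search.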

A Study on Improving Facial Recognition Performance to Introduce a New Dog Registration Method (새로운 반려견 등록방식 도입을 위한 안면 인식 성능 개선 연구)

  • Lee, Dongsu;Park, Gooman
    • Journal of Broadcast Engineering / v.27 no.5 / pp.794-807 / 2022
  • Although dog registration is mandatory under the revised Animal Protection Act, the registration rate is low because the current registration method is inconvenient. This paper studies how to improve the performance of dog face recognition technology, which is being reviewed as a new registration method. Using deep learning, we generated an embedding vector for dog face recognition and experimented with a method for identifying individual dogs. We built a dog image dataset for training and experimented with InceptionNet and ResNet-50 as backbone networks. The models were trained with triplet loss, and the experiments were divided into face verification and face recognition. The ResNet-50-based model achieved the best face verification performance (93.46%) and the highest face recognition performance (91.44% at rank-5). The experimental methods and results presented in this paper can be used in various fields, such as checking whether a dog is registered and identifying individual dogs at facilities that admit dogs.
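The triplet-loss objective and the two evaluation modes, verification and rank-k recognition, can be sketched on precomputed embeddings as follows. In the study, the embeddings would come from a backbone such as ResNet-50. Here they are tiny hand-written vectors, and the 0.2 margin, the verification threshold, and the dog IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    # Embeddings are commonly L2-normalized before computing distances.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    # Zero once the negative is at least `margin` farther from the anchor than the positive.
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def verify(e1, e2, threshold: float = 1.0) -> bool:
    # Face verification: treat two faces as the same dog if their embeddings are close enough.
    return sq_dist(e1, e2) <= threshold

def identify(query, gallery: Dict[str, Sequence[float]], k: int = 5) -> List[str]:
    # Face recognition: registered dog IDs ranked by distance; a rank-k hit means
    # the true ID appears among the first k.
    return sorted(gallery, key=lambda dog_id: sq_dist(query, gallery[dog_id]))[:k]

gallery = {"dog_a": [1.0, 0.0], "dog_b": [0.0, 1.0], "dog_c": [0.7, 0.7]}
print(identify([0.9, 0.1], gallery, k=2))
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

During training, only triplets with a nonzero loss produce gradients. This is why triplet-based training typically focuses on mining hard negatives, that is, other dogs whose faces embed close to the anchor.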

Predicting Crime Risky Area Using Machine Learning (머신러닝기반 범죄발생 위험지역 예측)

  • HEO, Sun-Young;KIM, Ju-Young;MOON, Tae-Heon
    • Journal of the Korean Association of Geographic Information Studies / v.21 no.4 / pp.64-80 / 2018
  • In Korea, citizens have access only to general information about crime, so it is difficult for them to know how exposed they are to it. If the police could predict crime-risk areas, they could respond to crime efficiently despite limited police and enforcement resources. However, Korea has no such prediction system, and related research is scarce. Against this background, the ultimate goal of this study is to develop an automated crime prediction system. As a first step, we built a big data set consisting of local crime records and physical and non-physical urban data, and then developed a crime prediction model using machine learning. Finally, we assumed several possible scenarios, calculated the probability of crime, and visualized the results on a map to improve public understanding. The factors affecting crime occurrence identified in previous and case studies were processed into a big data set for machine learning. These comprised actual crime records, weather information (temperature, rainfall, wind speed, humidity, sunshine, insolation, snowfall, cloud cover), and local information (average building coverage ratio, average floor area ratio, average building height, number of buildings, average appraised land value, average area of residential buildings, average number of above-ground floors). To construct the crime prediction model, we used three supervised machine learning algorithms known to be powerful and accurate in various fields: decision tree, random forest, and SVM. The decision tree model, which had the lowest RMSE, was selected as the optimal prediction model. Based on this model, several scenarios were set for theft and violence, the most frequent crimes in case city J, and the probability of crime was estimated for each 250 m × 250 m grid cell. 
As a result, we found that high crime-risk areas occur in three patterns in case city J. The probability of crime was divided into three classes and mapped on the 250 m × 250 m grid. In sum, we developed a crime prediction model using a machine learning algorithm and visualized crime-risk areas on a map that can recalculate the model and update the visualization as time and urban conditions change.
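The model-selection and mapping steps can be sketched as follows. The sketch chooses the candidate model with the lowest RMSE, averages predicted probabilities within 250 m × 250 m grid cells, and bins each cell into one of three classes. The prediction values, model names, coordinate convention (metres in a projected coordinate system), and class cut-offs are hypothetical, since the abstract does not state the thresholds the study used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

CELL_M = 250.0  # grid resolution from the study: 250 m x 250 m

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def select_model(y_true: Sequence[float],
                 preds_by_model: Dict[str, Sequence[float]]) -> str:
    # Keep the candidate (e.g. decision tree, random forest, SVM) with the lowest RMSE.
    return min(preds_by_model, key=lambda name: rmse(y_true, preds_by_model[name]))

def grid_risk(points: Iterable[Tuple[float, float, float]],
              cell: float = CELL_M) -> Dict[Tuple[int, int], float]:
    # points: (x_m, y_m, predicted crime probability); returns the mean probability per cell.
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for x, y, p in points:
        key = (int(x // cell), int(y // cell))
        sums[key] += p
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

def risk_class(p: float, low: float = 1 / 3, high: float = 2 / 3) -> str:
    # Three classes, as in the study; these cut-offs are hypothetical.
    if p >= high:
        return "high"
    if p >= low:
        return "medium"
    return "low"

y_true = [0.0, 1.0, 1.0]
preds = {"decision_tree": [0.1, 0.9, 0.8], "svm": [0.4, 0.6, 0.5]}
print(select_model(y_true, preds))
grid = grid_risk([(10.0, 10.0, 0.8), (200.0, 240.0, 0.6), (260.0, 10.0, 0.1)])
print({cell: risk_class(p) for cell, p in grid.items()})
```

Keeping the grid step separate from the model means that when weather or urban inputs change, only the predictions need to be recomputed before the map is redrawn. This matches the abstract's goal of a map that updates as conditions change.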