• Title/Summary/Keyword: Automatic Recognition


ICLAL: In-Context Learning-Based Audio-Language Multi-Modal Deep Learning Models (ICLAL: 인 컨텍스트 러닝 기반 오디오-언어 멀티 모달 딥러닝 모델)

  • Jun Yeong Park;Jinyoung Yeo;Go-Eun Lee;Chang Hwan Choi;Sang-Il Choi
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.514-517
    • /
    • 2023
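  • This study addresses a multi-modal deep learning model for applying in-context learning to audio-language tasks. The aim is to develop a multi-modal model that, during training, learns representations through which audio and text can communicate, and that can then perform a variety of audio-text tasks. The model connects an audio encoder to a language encoder; the language model is an autoregressive large language model (LLM) with 6.7B or 30B parameters. The audio encoder is an audio feature extraction model pre-trained with self-supervised learning. Because the language model is relatively large, training uses the Frozen method: the language model's parameters are fixed and only the audio encoder's parameters are updated. The training tasks are automatic speech recognition and abstractive summarization. After training, the model was tested on a question answering task. The results suggest that additional training is needed to generate correct answer sentences, but the model pre-trained on speech recognition generated grammatically correct sentences that use keywords similar to those in the correct answer.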

Design of Vehicle-mounted Loading and Unloading Equipment and Autonomous Control Method using Deep Learning Object Detection (차량 탑재형 상·하역 장비의 설계와 딥러닝 객체 인식을 이용한 자동제어 방법)

  • Soon-Kyo Lee;Sunmok Kim;Hyowon Woo;Suk Lee;Ki-Baek Lee
    • The Journal of Korea Robotics Society
    • /
    • v.19 no.1
    • /
    • pp.79-91
    • /
    • 2024
  • Large warehouses are building automation systems to increase efficiency. However, small warehouses, military bases, and local stores cannot introduce automated logistics systems because they lack space and budget, so they handle tasks manually and fail to improve efficiency. To solve this problem, this study designed small loading and unloading equipment that can be mounted on transportation vehicles. The equipment can be controlled remotely, and it is controlled automatically from the point where pallets loaded with cargo become visible in real-time video from an attached camera. Cargo recognition and control command generation for automatic control are achieved through a newly designed deep learning model. This model is based on the YOLOv3 structure and is designed to be optimized for the loading and unloading equipment and its mission environments. The trained model recognized 10 types of pallets with different shapes and colors with an average accuracy of 100% and estimated their state with an accuracy of 99.47%. In addition, control commands were generated that inserted the forks into pallets without failure in 14 scenarios simulating actual loading and unloading situations.

Research on PEFT Feasibility for On-Device Military AI (온 디바이스 국방 AI를 위한 PEFT 효용성 연구)

  • Gi-Min Bae;Hak-Jin Lee;Sei-Ok Kim;Jang-Hyong Lee
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.51-54
    • /
    • 2024
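  • This paper proposes an efficient training method for on-device military AI. Instead of retraining the entire model, the proposed method applies LoRA, a PEFT technique that fine-tunes only the necessary parts and thereby greatly reduces computational cost and time. LoRA does not directly modify the existing network weights; it trains additional low-rank matrices, so the model can adapt efficiently to new tasks without major changes to its structure. Quantization was also applied, a model-compression technique that uses lower-precision floating point (FP16, FP8) or integer (INT8) formats instead of 32-bit floating point (FP32) for the trainable parameters and for the input and output data of operations. As a result, the GPU usage required during training fell from 32GB to 5.7GB, a reduction of 82.19%. When model performance was evaluated on the same data under the same conditions, the error of the model with LoRA and quantization was 53.34% higher than that of the base model at the same number of training iterations. When the number of training iterations was increased to reduce this performance loss, the error increase dropped to 29.29%.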
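
The low-rank update and the reported memory reduction can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the matrix names `W`, `A`, `B`, and the `alpha / r` scaling follow the commonly used LoRA formulation and are assumptions here, as is the tiny example size.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_forward(x, W, A, B, alpha):
    """Compute y = x W + (alpha / r) * x A B.

    W (d_in x d_out) stays frozen; only the low-rank factors
    A (d_in x r) and B (r x d_out) would be trained.
    """
    r = len(B)  # adapter rank (assumed hyperparameter)
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


def reduction_percent(before, after):
    """Relative reduction in percent, e.g. for GPU usage in GB."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights
    A = [[1.0], [1.0]]             # rank-1 adapter, input side
    B = [[1.0, 2.0]]               # rank-1 adapter, output side
    print(lora_forward(x, W, A, B, alpha=1.0))
    print(round(reduction_percent(32, 5.7), 2))
```

With `B` initialized to zeros the adapter contributes nothing, so training starts from the behavior of the frozen base model; the second call reproduces the 82.19% figure from the 32GB and 5.7GB values in the abstract.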


How Through-Process Optimization (TPO) Assists to Meet Product Quality

  • Klaus Jax;Yuyou Zhai;Wolfgang Oberaigner
    • Corrosion Science and Technology
    • /
    • v.23 no.2
    • /
    • pp.131-138
    • /
    • 2024
  • This paper introduces Primetals Technologies' Through-Process Optimization (TPO) Services and Through-Process Quality Control (TPQC) System, which integrate domain knowledge, software, and automation expertise to assist steel producers in achieving operational excellence. TPQC collects high-resolution process and product data from the entire production route, providing visualizations and facilitating quality assurance. It also enables the application of artificial intelligence techniques to optimize processes, accelerate steel grade development, and enhance product quality. The main objective of TPO is to grow and digitize operational know-how, increase profitability, and better meet customer needs. The paper describes the contribution of these systems to achieving operational excellence, with a focus on quality assurance. Transparent and traceable production data is used for manual and automatic quality evaluation, resulting in product quality status and guiding the product disposition process. Deviation management is supported by rule-based and AI-based assistants, along with monitoring, alarming, and reporting functions ensuring early recognition of deviations. Embedded root cause proposals and their corrective and compensatory actions facilitate decision support to maintain product quality. Quality indicators and predictive quality models further enhance the efficiency of the quality assurance process. Utilizing the quality assurance software package, TPQC acts as a "one-truth" platform for product quality key players.

Rock Joint Trace Detection Using Image Processing Technique (영상 처리를 이용한 암석 절리 궤적의 추적)

  • 이효석;김재동;김동현
    • Tunnel and Underground Space
    • /
    • v.13 no.5
    • /
    • pp.373-388
    • /
    • 2003
  • The geometry of rock discontinuities has usually been investigated by direct measurement on rock exposures. This kind of field work has disadvantages, for example restricted survey areas and excessive consumption of time and labor. To overcome these disadvantages, image processing can be regarded as an alternative that offers the additional advantage of automatic and objective tools when used with adequate computerized algorithms. This study focused on recognizing rock discontinuities captured in digital camera images of rock exposures and on producing the discontinuity map automatically. The whole process was written using macro commands built into the image analyzer ImagePro Plus ver. 4.1 (Media Cybernetics). The image processing procedure developed in this research can be divided into three steps: enhancement, recognition, and extraction of discontinuity traces from the digital image. Enhancement combines and applies several filters to remove or reduce various types of noise in the image of the rock surface. In the next step, discontinuity traces are recognized using local topographic features characterized by the differences in gray level between discontinuity and rock. The segments of discontinuity traces extracted from the image were reformulated using an algorithm of computer decision-making criteria and linked to form complete discontinuity traces. To verify the image processing algorithms and their sequence, discontinuity traces digitally photographed on a rock slope were analyzed. The results showed that about 75~80% of the discontinuities could be detected. The algorithms and computer codes developed in this research need further development, especially in combining digital filters to produce images better suited to the extraction of discontinuity traces and in setting seed pixels automatically when linking trace segments into a complete discontinuity trace.

Application of Computer-Aided Diagnosis for the Differential Diagnosis of Fatty Liver in Computed Tomography Image (전산화단층촬영 영상에서 지방간의 감별진단을 위한 컴퓨터보조진단의 응용)

  • Park, Hyong-Hu;Lee, Jin-Soo
    • Journal of the Korean Society of Radiology
    • /
    • v.10 no.6
    • /
    • pp.443-450
    • /
    • 2016
  • This study is a preliminary experimental investigation toward implementing a computer-aided diagnosis system, applying texture feature analysis and ROC curve analysis to abdominal computed tomography images of fatty liver patients. The aim was to provide doctors with objective and reliable diagnostic information on fatty liver. Experimental image sections were configured from abdominal computed tomography images by applying the wavelet transform, and the results of statistical analysis on six parameters representing texture feature values are presented. Entropy, average luminance, and strain rate showed relatively high recognition rates of 90% or more, while control, flatness, and uniformity showed relatively low recognition rates of about 70%. In the ROC curve analysis, all six parameters scored 0.900 or more (p = 0.0001), a meaningful result for recognition of the disease. Cut-off values of the six parameters for predicting the disease were also determined. These results could serve as preliminary diagnostic data for automatic detection and eventual diagnosis of disease in future abdominal computed tomography images.
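
As a rough illustration of the ROC analysis and cut-off determination mentioned above, the sketch below computes the area under the ROC curve for one texture parameter and picks a cut-off. The abstract does not say how the cut-off values were chosen; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made here, and the function names and sample scores are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores, neg_scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score is >= threshold.
    On ties in J, the lowest such threshold is kept.
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        sensitivity = sum(s >= t for s in pos_scores) / len(pos_scores)
        specificity = sum(s < t for s in neg_scores) / len(neg_scores)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    fatty = [0.7, 0.8, 0.9]    # hypothetical feature values, diseased group
    normal = [0.2, 0.3, 0.4]   # hypothetical feature values, control group
    print(roc_auc(fatty, normal))
    print(youden_cutoff(fatty, normal))
```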

Eye Region Detection Method in Rotated Face using Global Orientation Information (전역적인 에지 오리엔테이션 정보를 이용한 기울어진 얼굴 영상에서의 눈 영역 추출)

  • Jang, Chang-Hyuk;Park, An-Jin;Kurata Takeshi;Jain Anil K.;Park, Se-Hyun;Kim, Eun-Yi;Yang, Jong-Yeol;Jung, Kee-Chul
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.11 no.4
    • /
    • pp.82-92
    • /
    • 2006
  • In the field of image recognition, research on face recognition has recently attracted a lot of attention. The most important step in face recognition is automatic eye detection, which is studied as a prerequisite stage. Existing eye detection methods, which focus on the frontal face, can be classified into two main categories: active infrared (IR)-based approaches and image-based approaches. This paper proposes an eye region detection method for non-frontal faces. The proposed method builds on the edge-based method, which has the fastest computation time. To extract the eye region in non-frontal faces, the method uses the edge orientation histogram of the global face region. Problems caused by noise and unfavorable ambient light are solved by using the width-to-height proportion as local information and the relationship between components as global information within the approximately extracted region. In the experiments, the proposed method improved precision by solving three problems caused by edge information, and it achieved a detection accuracy of 83.5% and a computation time of 0.5 sec per face image on 300 face images provided by The Weizmann Institute of Science.
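
An edge orientation histogram of the kind the abstract refers to can be sketched as follows. This is only an illustration under stated assumptions: Sobel gradients, magnitude-weighted voting, 8 bins over [0, pi), and the function names are all choices made here, not details taken from the paper.

```python
import math


def sobel_gradients(img, r, c):
    """Horizontal and vertical Sobel responses at interior pixel (r, c)."""
    gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]) \
        - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1])
    gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]) \
        - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1])
    return gx, gy


def orientation_histogram(img, bins=8):
    """Accumulate gradient magnitude into orientation bins over [0, pi)."""
    hist = [0.0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx, gy = sobel_gradients(img, r, c)
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, no edge evidence
            angle = math.atan2(gy, gx) % math.pi  # fold opposite directions together
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    return hist


if __name__ == "__main__":
    # Grayscale grid with a single vertical edge between columns 2 and 3.
    image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
    hist = orientation_histogram(image)
    print(hist.index(max(hist)))  # index of the dominant orientation bin
```

The dominant bin gives a coarse estimate of the prevailing edge direction, which is one way a global histogram could indicate how far a face is rotated.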


Real-Time Human Tracker Based on Location and Motion Recognition of User for Smart Home (스마트 홈을 위한 사용자 위치와 모션 인식 기반의 실시간 휴먼 트랙커)

  • Choi, Jong-Hwa;Park, Se-Young;Shin, Dong-Kyoo;Shin, Dong-Il
    • The KIPS Transactions:PartA
    • /
    • v.16A no.3
    • /
    • pp.209-216
    • /
    • 2009
  • The ubiquitous smart home is the home of the future that takes advantage of context information from the human and the home environment and provides automatic home services for the human. Human location and motion are the most important contexts in the ubiquitous smart home. We present a real-time human tracker that predicts human location and motion for the ubiquitous smart home. We used four network cameras for real-time human tracking. This paper explains the architecture of the real-time human tracker and presents an algorithm detailing its two functions (prediction of human location and of human motion). Location prediction uses three kinds of images (IMAGE1: empty room image, IMAGE2: image with furniture and home appliances in the home, IMAGE3: IMAGE2 with the human added). The real-time human tracker determines which piece of furniture (or home appliance) the human is at by analyzing the three images, and predicts human motion using a support vector machine (SVM). In a performance experiment, locating the human from the three images took an average of 0.037 seconds. The SVM features for motion recognition are derived from the number of pixels in each array line of the moving object. We evaluated each motion 1000 times. The average accuracy over all motions was 86.5%.
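
The three-image location step can be sketched with simple image differencing. The abstract does not give the actual analysis, so everything below is an assumption for illustration: grayscale images as nested lists, a fixed difference threshold, labeled furniture bounding boxes, and the rule that the human is at the furniture whose pixels they occlude the most.

```python
def diff_mask(img_a, img_b, threshold=30):
    """Mark pixels whose gray levels differ by more than the threshold."""
    return [[abs(a - b) > threshold for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def locate_human(image1, image2, image3, furniture_boxes, threshold=30):
    """Return the furniture name the human overlaps most, or None.

    image1: empty room, image2: room with furniture, image3: image2 plus human.
    furniture_boxes maps a name to (top, left, bottom, right), half-open.
    """
    furniture = diff_mask(image1, image2, threshold)  # pixels added by furniture
    human = diff_mask(image2, image3, threshold)      # pixels changed by the human
    scores = {}
    for name, (top, left, bottom, right) in furniture_boxes.items():
        scores[name] = sum(
            1
            for r in range(top, bottom)
            for c in range(left, right)
            if furniture[r][c] and human[r][c]
        )
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


if __name__ == "__main__":
    empty = [[0] * 6 for _ in range(4)]
    furnished = [[100, 100, 0, 0, 200, 200] for _ in range(4)]
    with_human = [row[:] for row in furnished]
    for r in (1, 2):
        for c in (1, 2):
            with_human[r][c] = 50  # hypothetical human pixels
    boxes = {"sofa": (0, 0, 4, 2), "tv": (0, 4, 4, 6)}
    print(locate_human(empty, furnished, with_human, boxes))
```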

Automatic Recognition and Normalization System of Korean Time Expression using the individual time units (시간의 단위별 처리를 이용한 자동화된 한국어 시간 표현 인식 및 정규화 시스템)

  • Seon, Choong-Nyoung;Kang, Sang-Woo;Seo, Jung-Yun
    • Korean Journal of Cognitive Science
    • /
    • v.21 no.4
    • /
    • pp.447-458
    • /
    • 2010
  • Time expressions are a very important form of information in different types of data. Thus, the recognition of a time expression is an important factor in the field of information extraction. However, most previously designed systems consider only a specific domain, because time expressions do not have a regular form and frequently include different ellipsis phenomena. We present a two-level recognition method consisting of extraction and transformation phases to achieve generality and portability. In the extraction phase, time expressions are extracted by atomic time units for extensibility. Then, in the transformation phase, omitted information is restored using basis time and prior knowledge. Finally, every complete atomic time unit is transformed into a normalized form. The proposed system can be used as a general-purpose system, because it has a language- and domain-independent architecture. In addition, this system performs robustly in noisy data like SMS data, which include various errors. For SMS data, the accuracies of time-expression extraction and time-expression normalization by using the proposed system are 93.8% and 93.2%, respectively. On the basis of these experimental results, we conclude that the proposed system shows high performance in noisy data.
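
The two-phase design (extraction by atomic time unit, then restoration of omitted information from a base time) can be sketched as below. This is a simplified illustration, not the paper's system: the regular expression, the unit table, the default values, and the ISO-style output are assumptions, and only digit-plus-suffix expressions are handled.

```python
import re
from datetime import datetime

# Hypothetical unit table: Korean suffix -> field name, ordered coarse to fine.
UNITS = [("년", "year"), ("월", "month"), ("일", "day"), ("시", "hour"), ("분", "minute")]
UNIT_PATTERN = re.compile(r"(\d+)\s*(년|월|일|시|분)")
DEFAULTS = {"month": 1, "day": 1, "hour": 0, "minute": 0}


def extract_units(text):
    """Extraction phase: collect atomic time units such as '3월' or '5일'."""
    names = dict(UNITS)
    return {names[suffix]: int(value) for value, suffix in UNIT_PATTERN.findall(text)}


def normalize(units, base):
    """Transformation phase: fill omitted units, then emit a normalized string.

    Units coarser than the finest one mentioned are restored from the base
    time; finer units fall back to defaults.
    """
    if not units:
        return None
    order = [name for _, name in UNITS]
    finest = max(order.index(name) for name in units)
    filled = {}
    for i, name in enumerate(order):
        if name in units:
            filled[name] = units[name]
        elif i < finest:
            filled[name] = getattr(base, name)  # omitted coarser unit
        else:
            filled[name] = DEFAULTS[name]       # unspecified finer unit
    return datetime(**filled).isoformat(timespec="minutes")


if __name__ == "__main__":
    base_time = datetime(2010, 1, 1, 9, 30)  # e.g. the message timestamp
    units = extract_units("3월 5일 2시에 만나")
    print(units)
    print(normalize(units, base_time))
```

Keeping extraction per unit means a new unit (for example seconds) only needs a new table entry, which reflects the extensibility argument in the abstract.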


Container Image Recognition using Fuzzy-based Noise Removal Method and ART2-based Self-Organizing Supervised Learning Algorithm (퍼지 기반 잡음 제거 방법과 ART2 기반 자가 생성 지도 학습 알고리즘을 이용한 컨테이너 인식 시스템)

  • Kim, Kwang-Baek;Heo, Gyeong-Yong;Woo, Young-Woon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.7
    • /
    • pp.1380-1386
    • /
    • 2007
  • This paper proposes an automatic recognition system for shipping container identifiers that uses a fuzzy-based noise removal method and an ART2-based self-organizing supervised learning algorithm. Generally, the characters of a shipping container identifier are either black or white. Based on this feature, all areas of a container image that are neither black nor white are regarded as noise, and identifier areas are discriminated from noise by a fuzzy-based noise detection method. Identifier areas are extracted by applying edge detection with the Sobel mask, followed by vertical and horizontal block extraction, to the noise-removed image. The extracted areas are binarized with the iteration binarization algorithm, and individual identifiers are extracted by an 8-directional contour tracking method. For identifier recognition, the paper proposes an ART2-based self-organizing supervised learning algorithm, which improves learning performance by applying generalized delta learning and the Delta-bar-Delta algorithm. Experiments on real images of shipping containers showed that the proposed identifier extraction method and the ART2-based self-organizing supervised learning algorithm perform better than previously proposed methods.
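
The binarization step can be illustrated with a common iterative threshold selection scheme. The abstract does not specify which iterative algorithm was used, so the version below (repeatedly setting the threshold to the midpoint of the two class means until it stabilizes) is an assumption, as are the function names and the sample pixel values.

```python
def iterative_threshold(pixels, eps=0.5):
    """Pick a threshold by iterating on the midpoint of the two class means."""
    t = sum(pixels) / len(pixels)  # start from the global mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t  # all pixels fall in one class, nothing to separate
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2.0
        if abs(new_t - t) < eps:
            return new_t
        t = new_t


def binarize(pixels, threshold):
    """Map each gray level to 1 (above threshold) or 0."""
    return [1 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    region = [10, 12, 14, 200, 210, 220]  # hypothetical gray levels
    t = iterative_threshold(region)
    print(t, binarize(region, t))
```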