• Title/Abstract/Keyword: audio application

252 search results

적응적 QoS를 지원하는 인터넷 화상전화의 구현 (Implementation of Internet Video Phone Supporting Adaptive QoS)

  • 최태욱;김영주;정기동
    • 정보처리학회논문지C / Vol. 10C, No. 4 / pp.479-484 / 2003
  • Current Internet telephony cannot provide call quality comparable to that of conventional telephones because of variable bandwidth, packet loss, and delay. Guaranteeing QoS is even harder for video phones, which also carry video. This paper reviews application-level QoS control techniques that can adapt to varying network conditions and describes error-control and congestion-control techniques suitable for video telephony. Based on these techniques, we design and implement an Internet video phone that supports adaptive QoS. Experimental results show that the implemented video phone greatly reduces packet loss of audio and video data and adjusts its transmission rate with consideration for competing TCP flows.
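
The congestion-control behavior described in this abstract, adjusting the sending rate while staying fair to competing TCP flows, is commonly realized with an equation-based, TCP-friendly rate estimate. The Python sketch below implements the standard TCP throughput equation (in the spirit of TFRC, RFC 5348) purely as an illustration; the function name and parameters are hypothetical and this is not the controller used in the paper.

```python
# Illustrative sketch only: an equation-based, TCP-friendly sending-rate cap
# in the spirit of TFRC (RFC 5348); not the paper's actual rate controller.
import math

def tcp_friendly_rate(packet_size_bytes, rtt_s, loss_rate, rto_s=None):
    """Return an allowed sending rate in bytes/s from the TCP throughput equation."""
    if loss_rate <= 0.0:
        return float("inf")                    # no observed loss: no equation-based cap
    rto_s = rto_s if rto_s is not None else 4 * rtt_s
    b = 1                                      # packets acknowledged per ACK
    denom = (rtt_s * math.sqrt(2 * b * loss_rate / 3)
             + rto_s * 3 * math.sqrt(3 * b * loss_rate / 8)
               * loss_rate * (1 + 32 * loss_rate ** 2))
    return packet_size_bytes / denom

# Example: 1000-byte packets, 80 ms RTT, 2% loss event rate
print(f"allowed rate: {tcp_friendly_rate(1000, 0.08, 0.02):.0f} B/s")
```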

32 비트 곱셈기를 사용한 골드스미트 배정도실수 역수 계산기 (Goldschmidt's Double Precision Floating Point Reciprocal Computation using 32 bit multiplier)

  • 조경연
    • 한국산학기술학회논문지 / Vol. 15, No. 5 / pp.3093-3099 / 2014
  • Floating-point arithmetic is widely used in graphics processors, multimedia processors, and speech processors, while high-level languages such as C and Java provide single-precision and double-precision floating-point types. This paper proposes an algorithm that computes the reciprocal of a double-precision floating-point number using a 32-bit multiplier. The double-precision mantissa is divided into an upper part and a lower part; the reciprocal of the upper part is computed with the Goldschmidt algorithm, and this value is used as the initial value for computing the reciprocal of the full double-precision number. Because the number of multiplications in the proposed algorithm depends on the input value, a method for calculating the average number of multiplications is derived, and the average number of multiplications is evaluated for approximate-reciprocal tables of several sizes.
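
The reciprocal computation described above is built on Goldschmidt iteration. The sketch below is a minimal floating-point illustration of that iteration, assuming the operand is normalized to [1, 2) and a table-lookup approximation is supplied as the initial value; the paper's split of the mantissa into upper and lower halves on a 32-bit multiplier is not modeled, and the names are illustrative.

```python
# Illustrative sketch of Goldschmidt iteration for a reciprocal 1/d,
# with d normalized to [1, 2); the paper's fixed-point upper/lower mantissa
# decomposition on a 32-bit multiplier is not reproduced here.
def goldschmidt_reciprocal(d: float, approx: float, iterations: int = 4) -> float:
    """Refine an initial table approximation `approx` of 1/d.

    Each step multiplies both the running numerator and denominator by
    f = 2 - denominator, driving the denominator toward 1 and the
    numerator toward 1/d (quadratic convergence).
    """
    n, den = approx, d * approx       # pre-scale with the table lookup
    for _ in range(iterations):
        f = 2.0 - den
        n *= f
        den *= f
    return n

# Example: reciprocal of 1.5 starting from a crude table value 0.66
print(goldschmidt_reciprocal(1.5, 0.66))   # ~0.666666...
```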

Connection Management Scheme using Mobile Agent System

  • Lim, Hee-Kyoung;Bae, Sang-Hyun;Lee, Kwang-Ok
    • 통합자연과학논문집 / Vol. 11, No. 4 / pp.192-196 / 2018
  • The mobile agent paradigm can be exploited in a variety of ways, ranging from low-level system administration tasks to middleware to user-level applications. Mobile agents can be useful in building middleware services such as active mail systems and distributed collaboration systems. An active mail message is a program that interacts with its recipient using a multimedia interface and adapts the interaction session based on the recipient's responses. The mobile agent paradigm is well suited to this type of application, since it can carry a sender-defined session protocol along with the multimedia message. Mobile agent communication is possible via method invocation on virtual references. Agents can make synchronous, one-way, or future-reply type invocations. Multicasting is possible, since agents can be aggregated hierarchically into groups. A simple check-pointing facility has also been implemented. Another proposed solution is to use multi-agent computer systems to access, filter, evaluate, and integrate this information. We present the overall architectural framework, our agent design commitments, and the agent architecture that enables the above characteristics. In addition, the information handled by a mobile agent system, such as text, graphics, images, audio, and video, is organized into a large-capacity multimedia database system. However, such systems have problems in establishing connections over multiple subnetworks, such as the lack of end-to-end connections, transmission delays due to ATM address resolution, and the absence of QoS protocols. In this paper, we propose a new connection management scheme to improve the connection management of mobile agent systems.
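
The abstract mentions synchronous, one-way, and future-reply invocations on an agent's virtual reference. The Python sketch below only illustrates the difference between these three invocation styles, using a local thread pool as a stand-in for remote invocation; the class and method names are hypothetical and do not come from the described system.

```python
# Illustrative sketch of the three invocation styles mentioned above
# (synchronous, one-way, future-reply), using a local thread pool as a
# stand-in for remote method invocation on an agent's virtual reference.
from concurrent.futures import ThreadPoolExecutor

class AgentProxy:
    """Hypothetical stand-in for a mobile agent's virtual reference."""
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)

    def handle(self, message: str) -> str:            # the "remote" method
        return f"processed: {message}"

    def invoke_sync(self, message: str) -> str:       # block until the reply arrives
        return self._pool.submit(self.handle, message).result()

    def invoke_oneway(self, message: str) -> None:    # fire and forget, no reply
        self._pool.submit(self.handle, message)

    def invoke_future(self, message: str):            # future-reply: claim the result later
        return self._pool.submit(self.handle, message)

agent = AgentProxy()
print(agent.invoke_sync("mail"))        # synchronous
agent.invoke_oneway("log event")        # one-way
fut = agent.invoke_future("query")      # future-reply
print(fut.result())
```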

A completely non-contact recognition system for bridge unit influence line using portable cameras and computer vision

  • Dong, Chuan-Zhi;Bas, Selcuk;Catbas, F. Necati
    • Smart Structures and Systems / Vol. 24, No. 5 / pp.617-630 / 2019
  • Currently, most vision-based structural identification research focuses either on structural input (vehicle location) estimation or on structural output (structural displacement and strain responses) estimation. Structural condition assessment at the global level using only the vision-based structural output cannot give a normalized response irrespective of the type and/or load configuration of the vehicles. Combining the vision-based structural input with the structural output from non-contact sensors overcomes this disadvantage while reducing cost, time, and labor, including cable wiring work. In conventional traffic monitoring, traffic closure is sometimes required for bridge structures, which may cause other severe problems such as traffic jams and accidents. In this study, a completely non-contact structural identification system is proposed, mainly targeting the identification of the bridge unit influence line (UIL) under operational traffic. Both the structural input (vehicle location information) and output (displacement responses) are obtained using only cameras and computer vision techniques. Multiple cameras are synchronized by audio signal pattern recognition. The proposed system is verified with a laboratory experiment on a scaled bridge model under a small moving truck load and a field application on a campus footbridge under a moving golf cart load. The UILs are successfully identified in both bridge cases. Pedestrian loads are also estimated with the extracted UIL, and the predicted weights of the pedestrians are observed to be within acceptable ranges.
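
One common way to realize the audio-based camera synchronization mentioned above is to cross-correlate the cameras' audio tracks and take the correlation peak as the time offset. The sketch below shows this idea with NumPy on synthetic signals; it is an illustration under that assumption, not the authors' exact procedure.

```python
# Illustrative sketch: estimate the time offset between two cameras by
# cross-correlating their audio tracks (one simple way to realize the
# audio-based synchronization mentioned above; not the authors' exact method).
import numpy as np

def audio_offset_seconds(audio_a: np.ndarray, audio_b: np.ndarray,
                         sample_rate: int) -> float:
    """Return the lag (s) at the correlation peak; a negative value means
    audio_b is delayed relative to audio_a by |lag| seconds."""
    a = (audio_a - audio_a.mean()) / (audio_a.std() + 1e-12)
    b = (audio_b - audio_b.mean()) / (audio_b.std() + 1e-12)
    corr = np.correlate(a, b, mode="full")          # full cross-correlation
    lag = np.argmax(corr) - (len(b) - 1)            # sample offset of best match
    return lag / sample_rate

# Example with synthetic signals: b is a copy of a delayed by 0.5 s
sr = 8000
a = np.random.randn(sr * 2)
b = np.concatenate([np.zeros(sr // 2), a])[: sr * 2]
print(audio_offset_seconds(a, b, sr))   # ~ -0.5 (b is delayed by 0.5 s)
```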

Urdu News Classification using Application of Machine Learning Algorithms on News Headline

  • Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security / Vol. 21, No. 2 / pp.229-237 / 2021
  • Our modern, information-hungry age demands delivery of information at unprecedented rates. Timely delivery of noteworthy information about recent events can help people from different segments of life in a number of ways. As the world has become a global village, the volume and speed of the news flow demand the involvement of machines to help humans handle the enormous data. News is presented to the public in the form of video, audio, images, and text. News text available on the Internet is a source of knowledge for billions of Internet users. The Urdu language is spoken and understood by millions of people from the Indian subcontinent. The availability of online Urdu news enables this segment of humanity to improve its understanding of the world and make better decisions. This paper uses available online Urdu news data to train machines to automatically categorize provided news. Various machine learning algorithms were trained on news headlines, and the results demonstrate that the Bernoulli Naïve Bayes (Bernoulli NB) and Multinomial Naïve Bayes (Multinomial NB) algorithms outperformed the other algorithms in terms of all performance parameters. The maximum accuracy achieved for the dataset was 94.278%, by the Multinomial NB classifier, followed by the Bernoulli NB classifier with an accuracy of 94.274%, when Urdu stop words were removed from the dataset. The results suggest that the short text of news headlines can be used as input for the text categorization process.
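
As a concrete illustration of headline-level classification with Multinomial Naive Bayes, the sketch below uses scikit-learn on a tiny English placeholder dataset; the headlines and labels are hypothetical, and the paper's Urdu corpus and stop-word handling are not reproduced.

```python
# Illustrative sketch: headline-level news classification with Multinomial
# Naive Bayes, as in the study above. The toy dataset and label names are
# placeholders, not the paper's Urdu corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

headlines = [
    "team wins final match",         # sports
    "star batsman scores century",   # sports
    "parliament passes new budget",  # politics
    "minister announces reforms",    # politics
]
labels = ["sports", "sports", "politics", "politics"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(headlines, labels)

print(model.predict(["captain leads team to victory"]))   # -> ['sports']
```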

Development and Distribution of Deep Fake e-Learning Contents Videos Using Open-Source Tools

  • HO, Won;WOO, Ho-Sung;LEE, Dae-Hyun;KIM, Yong
    • 유통과학연구 / Vol. 20, No. 11 / pp.121-129 / 2022
  • Purpose: Artificial intelligence is widely used, particularly the popular neural network approach known as Deep learning. Improvements in computing speed and capacity have accelerated the adoption of Deep learning applications. Applying Deep learning to education has various effects and possibilities for creating and managing educational content and services that can replace human cognitive activity. Among Deep learning applications, Deep fake technology is used to combine and synchronize human faces with voices. This paper shows how to develop e-Learning content videos using these technologies and open-source tools. Research design, data, and methodology: This paper proposes a four-step development process, presented step by step in the Google Colab environment with source code. This technology can produce various video styles. Its advantage is that the characters in a video can be extended to any historical figure, celebrity, or even movie hero, producing immersive videos. Results: Prototypes for each case are designed, developed, presented, and shared on YouTube. Conclusions: The method and process of creating e-Learning video content from image, video, and audio files using open-source Deep fake technology were successfully implemented.

A Research of User Experience on Multi-Modal Interactive Digital Art

  • Qianqian Jiang;Jeanhun Chung
    • International Journal of Internet, Broadcasting and Communication / Vol. 16, No. 1 / pp.80-85 / 2024
  • The concept of single-modal digital art originated in the 20th century and has evolved through three key stages. Over time, digital art has moved toward multi-modal interaction, representing a new era in art forms. Based on multi-modal theory, this paper explores the characteristics of interactive digital art as an innovative art form and its impact on user experience. Through an analysis of practical applications of multi-modal interactive digital art, this study summarises the impact of the creative models of digital art on the physical and mental aspects of user experience. In creating audio-visual-based art, multi-modal digital art should seamlessly incorporate sensory elements and leverage computer image processing technology. Focusing on user perception, emotional expression, and cultural communication, it strives to establish an immersive environment with user experience at its core. Future research, particularly with emerging technologies such as Artificial Intelligence (AI) and Virtual Reality (VR), should not merely prioritize technology but aim for meaningful interaction. Through multi-modal interaction, digital art is poised to continue innovating, offering new possibilities and expanding the realm of interactive digital art.

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal / Vol. 46, No. 1 / pp.22-34 / 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial-intelligence-based speech recognition technologies. Degraded-performance services can be deployed as limited systems that ensure good performance only in certain environments, but this impairs the overall quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model that is robust to various noise settings, mimicking the elements of human dialogue recognition. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using the three-feature multi-fusion method is 1.711%, compared with the general rate of 3.939%. The model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
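
A minimal sketch of the log-Mel spectrogram front end that the audio branch of such an AVSR model typically consumes is shown below; it assumes librosa is available, and the frame parameters are common defaults rather than the values used in the paper.

```python
# Illustrative sketch: computing log-Mel spectrogram features such as those
# consumed by the AVSR model above on its audio branch (librosa is assumed to
# be installed; parameter values are typical defaults, not the paper's).
import numpy as np
import librosa

def log_mel_spectrogram(wave: np.ndarray, sr: int = 16000,
                        n_mels: int = 80) -> np.ndarray:
    """Return an (n_mels, frames) log-Mel spectrogram in dB."""
    mel = librosa.feature.melspectrogram(y=wave, sr=sr, n_fft=400,
                                         hop_length=160, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

# Example on one second of synthetic audio
wave = 0.1 * np.random.randn(16000).astype(np.float32)
print(log_mel_spectrogram(wave).shape)   # (80, 101)
```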

Spoken-to-written text conversion for enhancement of Korean-English readability and machine translation

  • HyunJung Choi;Muyeol Choi;Seonhui Kim;Yohan Lim;Minkyu Lee;Seung Yun;Donghyun Kim;Sang Hun Kim
    • ETRI Journal / Vol. 46, No. 1 / pp.127-136 / 2024
  • The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.
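
A toy illustration of one small piece of spoken-to-written conversion, rewriting runs of spoken Korean digit words as digit strings, is sketched below; the system described above is data-driven and far broader, so this dictionary-and-regex example only conveys the idea, and the example sentence is a placeholder.

```python
# Illustrative sketch: a tiny rule-based spoken-to-written pass that turns
# runs of Korean spoken digit words into digit strings. The real system
# described above is data-driven; this toy only shows the idea.
import re

SPOKEN_DIGITS = {"공": "0", "영": "0", "일": "1", "이": "2", "삼": "3",
                 "사": "4", "오": "5", "육": "6", "칠": "7", "팔": "8", "구": "9"}

def spoken_to_written(text: str) -> str:
    """Rewrite runs of two or more spoken digit words (e.g. '일 이 삼') as '123'."""
    pattern = re.compile(r"(?:[공영일이삼사오육칠팔구]\s*){2,}")
    def repl(match: re.Match) -> str:
        return "".join(SPOKEN_DIGITS[ch] for ch in match.group(0) if ch in SPOKEN_DIGITS)
    return pattern.sub(repl, text)

print(spoken_to_written("전화번호는 공 일 공 이 삼 사 오 입니다"))
# -> '전화번호는 0102345입니다'
```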

일회성 작업 처리를 위한 통합 스마트폰 앱 (A Universal Smart-phone APP for Processing One-shot Tasks)

  • 차신;소선섭;정진만;윤영선;은성배
    • 한국멀티미디어학회논문지 / Vol. 20, No. 3 / pp.562-570 / 2017
  • One-shot tasks, such as a MERSC handling policy or a cinema poster, are too small, diverse, and sporadic to be built as dedicated apps or web applications. They are usually shared in the form of notes posted in the field or messages on smartphones. To support interoperability with Internet web sites, QR/NFC tags are attached to them. The problem with the web-based approach is that the HTML5 standard does not provide access to smartphone resources such as the camera, audio, and magnetic sensors. In this paper, we propose a universal smartphone application for handling various one-shot tasks with the same UI/UX. One-shot tasks are described as HTML5 web documents, and the URLs of these web documents are stored in QR/NFC tags. A smartphone scans a tag, and the corresponding web document is then retrieved and presented. QR tags can be delivered to other smartphones through messages or SNS. We address the resource-access limitation of the HTML5 standard by supplying a resource access library written in JavaScript. We present the overall architecture and the internal structure of the QR/NFC tags, and we show that our scheme can be applied to build various one-shot tasks.
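
A minimal sketch of generating a QR tag that carries the URL of a one-shot task's HTML5 document is shown below; it assumes the third-party `qrcode` package is installed, and the URL is a placeholder rather than one from the paper.

```python
# Illustrative sketch: generating a QR tag that carries the URL of a one-shot
# task's HTML5 document, as in the scheme above. The `qrcode` package is
# assumed to be installed; the URL is a hypothetical placeholder.
import qrcode

task_url = "https://example.org/one-shot/cinema-poster.html"  # hypothetical task page

img = qrcode.make(task_url)          # encode the URL into a QR code image
img.save("one_shot_task_qr.png")     # print or share this tag; scanning opens the task

print("QR tag written for", task_url)
```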