• Title/Summary/Keyword: video generation (비디오 생성)


An Efficient P2Proxy Caching Scheme for VOD Systems (VOD 시스템을 위한 효율적인 P2Proxy 캐싱 기법)

  • Kwon Chun-Ja;Choi Chi-Kyu;Lee Chi-Hun;Choi Hwang-Kyu
    • The KIPS Transactions:PartA / v.13A no.2 s.99 / pp.111-122 / 2006
  • As VOD service over the Internet becomes popular, large scalable VOD systems in P2P streaming environments have become increasingly important. In this paper, we propose a new proxy caching scheme, called P2Proxy, that replaces the traditional proxy with a scalable P2P proxy in a P2P streaming environment. In the proposed scheme, each client in a group stores a different part of the stream from a server in its local buffer, and the group of clients as a whole is used as a proxy. Each client receives the requested stream from other clients as long as the parts of the stream are available in the client group. Only the missing parts of the stream, which are not in the client group, are received directly from the server. We present the caching process between the clients in a group and a server, and then describe the group creation process. We also propose a directory structure for sharing caching information among clients. By using the directory information, we minimize the message exchange overhead for stream caching and playback. We further propose a recovery method for failures caused by the irregular behavior of P2P clients. Finally, we evaluate the performance of the proposed scheme and compare it with existing P2P streaming systems.
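
The lookup described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the directory layout (segment index mapped to a client id), the function names, and the client ids are all hypothetical, chosen only to show how segments cached in the group are served by peers while missing segments fall back to the server.

```python
from typing import Dict, List, Tuple

# Hypothetical directory: segment index -> id of the client that caches it.
Directory = Dict[int, str]


def plan_sources(directory: Directory, num_segments: int) -> List[Tuple[int, str]]:
    """Pair each segment with its source; segments absent from the group come from the server."""
    return [(seg, directory.get(seg, "server")) for seg in range(num_segments)]


def drop_client(directory: Directory, client_id: str) -> Directory:
    """Recovery sketch: forget every segment held by a client that left or failed."""
    return {seg: owner for seg, owner in directory.items() if owner != client_id}


# Three clients cache segments 0, 1 and 3 of a five-segment stream.
directory = {0: "client_a", 1: "client_b", 3: "client_c"}
plan = plan_sources(directory, 5)

# If client_b disappears, segment 1 is fetched from the server again.
plan_after_failure = plan_sources(drop_client(directory, "client_b"), 5)
```

Keeping the directory as a plain mapping means a client can decide where to fetch each segment without asking its peers first, which is the kind of message saving the abstract attributes to the directory information.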

The Directions and Tasks for the Creation of Exhibition Contents Based on Oral Records: Focused on 'A Research Project of Producing Oral History Video Clips Displayed at the Exhibition of IMF Situations' of National Museum of Korean Contemporary History (구술 기록에 기반한 박물관 전시콘텐츠 생성의 방향과 과제 - 대한민국역사박물관의 '전시 맞춤형 구술영상 제작 연구'를 중심으로 -)

  • Cho, Sungsil
    • Korean Association of Arts Management / no.56 / pp.305-327 / 2020
  • This study began with the question of whether museum oral history recording projects, which have increased steadily in recent years, are being used in varied forms, especially in exhibitions. The paper emphasizes the need for oral history projects to lead to a range of museum activities, including exhibitions and education. As a practical example, it analyzes 'A Research Project of Producing Oral History Video Clips for the Exhibition of IMF Financial Crisis Situations' of the National Museum of Korean Contemporary History to explore future directions and tasks for oral history projects in museums. The research argues that oral history functions as an exhibition representation device that more actively reveals the reality of a specific historical event. It therefore suggests that, by using oral history, the museum can develop into a venue for diverse discourses in which citizens actively participate.

Detection of video editing points using facial keypoints (얼굴 특징점을 활용한 영상 편집점 탐지)

  • Joshep Na;Jinho Kim;Jonghyuk Park
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.15-30 / 2023
  • Recently, various services using artificial intelligence (AI) have been emerging in the media field as well. However, most video editing, which involves finding editing points and joining clips, is still carried out manually, requiring a lot of time and human resources. Therefore, this study proposes a methodology that uses Video Swin Transformer to detect the edit points of a video according to whether the person in the video is speaking. To this end, the proposed structure first detects facial keypoints through face alignment. Through this process, the temporal and spatial changes of the face are captured from the input video data. Then, the Video Swin Transformer-based model proposed in this study classifies the behavior of the person in the video. Specifically, the feature map generated from the video data by Video Swin Transformer is combined with the facial keypoints detected through face alignment, and utterance is classified through convolution layers. In conclusion, the proposed video editing point detection model using facial keypoints reached a performance of 89.17%, compared with 87.46% for the model without facial keypoints.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models that offer varied voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in terms of both naturalness and speaker similarity.

Performance of Uncompressed Audio Distribution System over Ethernet with a L1/L2 Hybrid Switching Scheme (L1/L2 혼합형 중계 방법을 적용한 이더넷 기반 비압축 오디오 분배 시스템의 성능 분석)

  • Nam, Wie-Jung;Yoon, Chong-Ho;Park, Pu-Sik;Jo, Nam-Hong
    • Journal of the Institute of Electronics Engineers of Korea TC / v.46 no.12 / pp.108-116 / 2009
  • In this paper, we propose an Ethernet-based audio distribution system with a new L1/L2 hybrid switching scheme and evaluate its performance. The proposed scheme not only offers the guaranteed low latency and jitter that are essential for distributing high-quality uncompressed audio traffic, but also provides efficient transmission of data traffic in an Ethernet environment. The audio distribution system with the proposed scheme consists of a master node and a number of relay nodes, and all nodes are connected in a daisy-chain topology through uplinks and downlinks. The master node generates an audio frame for each cycle of 125 µs, and the audio frame has 24 time-slotted audio channels for carrying stereo 24 channels of 16-bit PCM sampled audio. On receiving the audio frame from its upstream node via the downlink, each intermediate node inserts its audio traffic into the time slot reserved for it, and then relays the frame to the next node through physical layer (L1) repeating. After reaching the end node, the audio frame is looped back through the uplink. As the frame is repeated along the uplink, each node copies the audio slot it has to receive and then plays the audio. When the audio transmission is completed, each node works as a normal L2 switch, so data frames are switched during the remaining period. To support this L1/L2 hybrid switching capability, we insert glue logic for parsing and multiplexing audio and data frames at the MII (Media Independent Interface) between the physical and data link layers. The proposed scheme can provide better delay performance and transmission efficiency than legacy Ethernet-based audio distribution systems. To verify the feasibility of the proposed L1/L2 hybrid switching scheme, we use OMNeT++ as a simulation tool with various parameters. The simulation results show that the proposed scheme provides outstanding characteristics in terms of both jitter for audio traffic and transmission efficiency for data traffic.
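
A rough worked calculation may help put the frame parameters in perspective. The cycle time, slot count, and sample width come from the abstract; the 48 kHz sampling rate, the reading that each slot carries one stereo pair, and the 100 Mbit/s link speed are assumptions made here for illustration, and frame headers and preamble are ignored.

```python
# Values stated in the abstract.
CYCLE_US = 125          # audio frame period in microseconds
SLOTS_PER_FRAME = 24    # time slots per audio frame
BITS_PER_SAMPLE = 16    # 16-bit PCM

# ASSUMPTIONS (not stated in the abstract).
SAMPLE_RATE_HZ = 48_000     # assumed audio sampling rate
CHANNELS_PER_SLOT = 2       # assumed: one stereo pair per slot
LINK_BPS = 100_000_000      # assumed Fast Ethernet link

frames_per_second = 1_000_000 // CYCLE_US                 # 8000 frames/s
samples_per_cycle = SAMPLE_RATE_HZ // frames_per_second   # samples per channel per frame
slot_bytes = samples_per_cycle * CHANNELS_PER_SLOT * BITS_PER_SAMPLE // 8
frame_payload_bytes = slot_bytes * SLOTS_PER_FRAME
audio_bps = frame_payload_bytes * 8 * frames_per_second

# Share of each 125 us cycle spent on audio payload; the rest is left for L2 data switching.
audio_time_us = frame_payload_bytes * 8 * 1_000_000 / LINK_BPS
data_time_us = CYCLE_US - audio_time_us

print(f"{slot_bytes} bytes/slot, {frame_payload_bytes} bytes/frame, {audio_bps / 1e6:.3f} Mbit/s")
print(f"audio occupies about {audio_time_us:.2f} us of each {CYCLE_US} us cycle")
```

Under these assumptions the audio payload takes well under half of each cycle, which is consistent with the abstract's point that nodes can act as ordinary L2 switches for the remaining period.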

A Mobile Landmarks Guide : Outdoor Augmented Reality based on LOD and Contextual Device (모바일 랜드마크 가이드 : LOD와 문맥적 장치 기반의 실외 증강현실)

  • Zhao, Bi-Cheng;Rosli, Ahmad Nurzid;Jang, Chol-Hee;Lee, Kee-Sung;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.1-21 / 2012
  • In recent years, mobile phones have evolved extremely fast. They are equipped with high-quality color displays, high-resolution cameras, and real-time accelerated 3D graphics. Other features include a GPS sensor and a digital compass. This evolution significantly helps application developers use the power of smartphones to create a rich environment that offers a wide range of services and exciting possibilities. In outdoor mobile AR research to date, there are many popular location-based AR services, such as Layar and Wikitude. These systems have a major limitation: the AR content is rarely overlaid precisely on the real target. Another line of research is context-based AR services that use image recognition and tracking. There, the AR content is precisely overlaid on the real target, but the real-time performance is restricted by the retrieval time, and the approach is hard to implement over a large-scale area. In our work, we combine the advantages of location-based AR with those of context-based AR. The system first finds the surrounding landmarks easily and then performs recognition and tracking on them. The proposed system consists of two major parts: a landmark browsing module and an annotation module. In the landmark browsing module, users can view augmented virtual information (information media), such as text, pictures, and video, in their smartphone viewfinder when they point the smartphone at a certain building or landmark. For this, a landmark recognition technique is applied. SURF point-based features are used in the matching process because of their robustness. To ensure that the image retrieval and matching processes are fast enough for real-time tracking, we exploit the contextual device (GPS and digital compass) information. This is necessary to select the nearest landmarks in the pointed orientation from the database. The queried image is matched only against this selected data, so the matching speed increases significantly.
The second part is the annotation module. Instead of only viewing the augmented information media, users can create virtual annotations based on linked data. Full knowledge of the landmark is not required: users can simply look for the appropriate topic by searching linked data with a keyword. This helps the system find the target URI in order to generate the correct AR content. In addition, in order to recognize target landmarks, images of the selected building or landmark are captured from different angles and distances. This procedure resembles building a connection between the real building and the virtual information that exists in Linked Open Data. In our experiments, the search range in the database is reduced by clustering images into groups according to their coordinates. A grid-based clustering method and user location information are used to restrict the retrieval range. In existing research that uses clustering and GPS information, the retrieval time is around 70~80 ms. Experimental results show that our approach reduces the retrieval time to around 18~20 ms on average. The total processing time is therefore reduced from 490~540 ms to 438~480 ms. The performance improvement will become more obvious as the database grows. This demonstrates that the proposed system is efficient and robust in many cases.
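
The candidate filtering described in this abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the grid cell size, the 60-degree field of view, the 3x3 neighbourhood search, the flat-earth bearing approximation, and all landmark names and coordinates are assumptions. Only landmarks that pass this filter would then go on to SURF matching.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

CELL_DEG = 0.001  # hypothetical grid cell size in degrees

Coord = Tuple[float, float]  # (latitude, longitude)


def cell_of(lat: float, lon: float) -> Tuple[int, int]:
    """Map a coordinate to its grid cell index."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_grid(landmarks: Dict[str, Coord]) -> Dict[Tuple[int, int], List[str]]:
    """Cluster landmark names by the grid cell that contains them."""
    grid: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, (lat, lon) in landmarks.items():
        grid[cell_of(lat, lon)].append(name)
    return grid


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Approximate compass bearing from point 1 to point 2 (valid for short distances)."""
    d_north = lat2 - lat1
    d_east = (lon2 - lon1) * math.cos(math.radians(lat1))
    return math.degrees(math.atan2(d_east, d_north)) % 360.0


def candidates(grid, landmarks, lat, lon, heading, fov=60.0) -> List[str]:
    """Landmarks in the user's cell or its 8 neighbours that lie within the camera's field of view."""
    ci, cj = cell_of(lat, lon)
    found = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for name in grid.get((ci + di, cj + dj), []):
                l_lat, l_lon = landmarks[name]
                # Smallest angular difference between the bearing and the compass heading.
                diff = abs((bearing_deg(lat, lon, l_lat, l_lon) - heading + 180.0) % 360.0 - 180.0)
                if diff <= fov / 2:
                    found.append(name)
    return sorted(found)


# Hypothetical landmarks around a user standing at (37.45005, 126.65005).
landmarks = {
    "north_hall": (37.45055, 126.65005),
    "south_gate": (37.44955, 126.65005),
    "far_tower": (37.46005, 126.65005),
}
grid = build_grid(landmarks)
facing_north = candidates(grid, landmarks, 37.45005, 126.65005, heading=0.0)
facing_south = candidates(grid, landmarks, 37.45005, 126.65005, heading=180.0)
```

The reported timings are also internally consistent: replacing a 70~80 ms retrieval step with an 18~20 ms one turns a 490~540 ms total into 438~480 ms (490 - 70 + 18 = 438 and 540 - 80 + 20 = 480).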

Wavelet Transform-based Face Detection for Real-time Applications (실시간 응용을 위한 웨이블릿 변환 기반의 얼굴 검출)

  • 송해진;고병철;변혜란
    • Journal of KIISE:Software and Applications / v.30 no.9 / pp.829-842 / 2003
  • In this paper, we propose a new face detection and tracking method based on template matching for real-time applications such as teleconferencing, telecommunication, the front stage of surveillance systems using face recognition, and video-phone applications. Since the main purpose of the paper is to track a face regardless of the environment, we use a template-based face tracking method. To generate robust face templates, we apply a wavelet transform to the average face image and extract three types of wavelet template from the transformed low-resolution average face. Because template matching is generally sensitive to changes in illumination conditions, we apply min-max normalization with histogram equalization according to the variation of intensity. A tracking method is also applied to reduce the computation time and to predict the face candidate region precisely. Finally, facial components are detected, and the size of the facial ellipse is estimated from the relative distance between the two eyes.
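
The illumination compensation mentioned in this abstract combines two standard operations, sketched below on a flat list of 8-bit grayscale pixel values. The abstract does not say how the two steps are combined "according to the variation of intensity", so applying one after the other here is an assumption, as are the function names and the sample values.

```python
from typing import List


def min_max_normalize(pixels: List[int], lo: int = 0, hi: int = 255) -> List[int]:
    """Stretch pixel intensities linearly so they span [lo, hi]."""
    p_min, p_max = min(pixels), max(pixels)
    if p_max == p_min:
        return [lo for _ in pixels]  # flat image: nothing to stretch
    scale = (hi - lo) / (p_max - p_min)
    return [round(lo + (p - p_min) * scale) for p in pixels]


def equalize_histogram(pixels: List[int], levels: int = 256) -> List[int]:
    """Remap intensities through the cumulative histogram so they spread more evenly."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # only one intensity level present
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in pixels]


# A dim, low-contrast patch (hypothetical values).
dim_patch = [50, 60, 70, 80]
stretched = min_max_normalize(dim_patch)
equalized = equalize_histogram(stretched)
```

Normalizing both the template and the candidate region in this way makes a template match depend less on overall brightness, which is the sensitivity the abstract is addressing.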

Scientific Thinking Types and Processes Generated in Inductive Inquiry by College Students (대학생들의 귀납적 탐구에서 나타난 과학적 사고의 유형과 과정)

  • Kwon, Yong-Ju;Choi, Sang-Ju;Park, Yun-Bok;Jeong, Jin-Su
    • Journal of The Korean Association For Science Education / v.23 no.3 / pp.286-298 / 2003
  • The purpose of this study was to analyze the types and processes of scientific thinking generated by college students in inductive inquiry. The subjects were three college students. Three inductive tasks were developed: Caminalcules set 1, a task consisting of 6 imaginary animals; a potato task on the interaction between juiced potato and $H_2O_2$; and Caminalcules set 2. The subjects' thinking types and processes were investigated through the thinking-aloud method and interviews. Their performances were recorded on videotape and analyzed. The subjects showed 5 types of inductive thinking in the first task: observing, discovering commonness, discovering pattern, classifying, and discovering hierarchy. The process of inductive thinking shown by the students was as follows: observing $\rightarrow$ discovering commonness $\rightarrow$ classifying $\rightarrow$ discovering pattern $\rightarrow$ discovering hierarchy. The subtypes of inductive thinking in observing were investigated by analyzing the subjects' performance on the second task. In the protocol analysis, the students' thinking types in observing were classified as simple observing and operational observing. Operational observing was further categorized into conjectural observing and predictive observing. The subtypes of inductive thinking in classification and hierarchy were investigated by analyzing the subjects' performance on the third task. In the protocol analysis, the students' thinking types in classification were searching criteria for classifying and selecting criteria for classifying. The subtypes of discovering hierarchy were classifying groups and hierarchical ordering. The process of classifying groups proceeded from searching criteria for classifying to selecting criteria for classifying.

A Fast 4X4 Intra Prediction Method using Motion Vector Information and Statistical Mode Correlation between 16X16 and 4X4 Intra Prediction in H.264|MPEG-4 AVC (H.264|MPEG-4 AVC 비디오 부호화에서 움직임 벡터 정보와 16X16 및 4X4 화면 내 예측 최종 모드간 통계적 연관성을 이용한 화면 간 프레임에서의 4X4 화면 내 예측 고속화 방법)

  • Na, Tae-Young;Jung, Yun-Sik;Kim, Mun-Churl;Hahm, Sang-Jin;Park, Chang-Seob;Park, Keun-Soo
    • Journal of Broadcast Engineering / v.13 no.2 / pp.200-213 / 2008
  • H.264|MPEG-4 AVC is a new video coding standard defined by the JVT (Joint Video Team), which consists of ITU-T and ISO/IEC. Many techniques are adopted for compression efficiency. Intra prediction in an inter frame is one example, but it leads to an excessive amount of encoding time due to the candidate mode decision and the RD cost calculation. For this reason, fast determination of the best intra prediction mode is the main issue in saving encoding time. In this paper, the number of candidate modes for $4{\times}4$ intra prediction is reduced by using the statistical relation between $16{\times}16$ and $4{\times}4$ intra predictions. First, the block mode of each macroblock is predicted using the motion vector obtained after inter prediction. If intra prediction is needed, a correlation table between the $16{\times}16$ and $4{\times}4$ intra predicted modes is created using the probabilities observed during each I frame coding process. Second, using this result, only the candidate modes for $4{\times}4$ intra prediction that reach a predefined probability value are considered within the same GOP. For the experiments, JM11.0, the reference software of H.264|MPEG-4 AVC, is used, and the experimental results show that the encoding time can be reduced by up to 51.24% with negligible PSNR drop and bitrate increase.
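
The correlation-table idea in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the paper's algorithm: the abstract does not define the probability criterion, so the sketch keeps every 4x4 mode whose conditional probability meets a threshold, and it falls back to all nine 4x4 intra modes when no statistics are available. The observation data and the threshold value are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

NUM_4X4_MODES = 9  # H.264 defines nine intra 4x4 prediction modes (0..8)


def build_table(observations: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, float]]:
    """From (best 16x16 mode, best 4x4 mode) pairs seen in I frames, estimate P(mode4 | mode16)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for mode16, mode4 in observations:
        counts[mode16][mode4] += 1
    table = {}
    for mode16, counter in counts.items():
        total = sum(counter.values())
        table[mode16] = {mode4: n / total for mode4, n in counter.items()}
    return table


def candidate_modes(table: Dict[int, Dict[int, float]], mode16: int, threshold: float) -> List[int]:
    """4x4 modes worth testing given the 16x16 mode; full search if nothing is known."""
    probs = table.get(mode16)
    if not probs:
        return list(range(NUM_4X4_MODES))
    picked = sorted(m for m, p in probs.items() if p >= threshold)
    return picked or list(range(NUM_4X4_MODES))


# Hypothetical statistics gathered while coding an I frame.
observations = [(0, 0)] * 6 + [(0, 1)] * 3 + [(0, 5)] * 1 + [(2, 2)] * 4
table = build_table(observations)

reduced = candidate_modes(table, mode16=0, threshold=0.2)      # frequent modes only
full_search = candidate_modes(table, mode16=3, threshold=0.2)  # unseen 16x16 mode
```

Each mode removed from the candidate list saves one prediction and one RD cost evaluation per 4x4 block, which is where the encoding-time reduction would come from.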

Clustering Method based on Genre Interest for Cold-Start Problem in Movie Recommendation (영화 추천 시스템의 초기 사용자 문제를 위한 장르 선호 기반의 클러스터링 기법)

  • You, Tithrottanak;Rosli, Ahmad Nurzid;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.57-77 / 2013
  • Social media has become one of the most popular media in web and mobile applications. In 2011, social networks and blogs were still the top destination of online users, according to a study from the Nielsen Company. In that study, nearly 4 in 5 active users visited social networks and blogs. Social network and blog sites rule Americans' Internet time, accounting for 23 percent of time spent online. Facebook is the social network on which U.S. Internet users spend more time than on other services such as Yahoo, Google, AOL Media Network, Twitter, and LinkedIn. As a recent trend, most companies promote their products on Facebook by creating a "Facebook Page" that refers to a specific product. The "Like" option allows users to subscribe to a page and receive updates on what interests them. Film makers, who produce many films around the world, also market and promote their films by exploiting the advantages of the "Facebook Page". In addition, a great number of streaming service providers allow users to subscribe to their service to watch and enjoy movies and TV programs. Users can instantly watch movies and TV programs over the Internet on PCs, Macs, and TVs. Netflix alone, as the world's leading subscription service, has more than 30 million streaming members in the United States, Latin America, the United Kingdom, and the Nordics. As a matter of fact, a million movies and TV programs of different genres are offered to subscribers. However, users need to spend a lot of time finding the right movies related to the genres they are interested in. In recent years, many researchers have proposed methods to improve the prediction of the rating or preference that would give the most related items, such as books, music, or movies, to the target user or to a group of users who share an interest in particular items.
One of the most popular methods for building a recommendation system is traditional Collaborative Filtering (CF). The method computes the similarity between the target user and other users, who are then clustered by shared interest in items according to the items they have rated. The method then predicts other items from the same group of users to recommend to that group. There are many kinds of items that could be studied for recommendation, such as books, music, movies, news, and videos; in this paper, however, we focus only on movies as the item to recommend. There are also many challenges for the CF task. The first is the "sparsity problem", which occurs when there is not enough user preference information. The resulting recommendation accuracy is lower than for neighbors with a large number of ratings. The second is the "cold-start problem", which occurs whenever new users or items are added to the system, each with no rating or only a few ratings. For instance, no personalized predictions can be made for a new user without any ratings on record. In this research, we propose a clustering method based on the users' genre interest extracted from a social network service (SNS) and the users' movie rating information to solve the "cold-start problem". Our proposed method clusters the target user together with other users by combining the user's genre interest and the rating information. Because Facebook Graph holds a large amount of interesting and useful user information, we extract information from the "Facebook Pages" that users have "Liked". Moreover, we use the Internet Movie Database (IMDb) as the main dataset. IMDb is an online database that contains a large amount of information related to movies and TV programs, including actors.
This dataset is used not only to provide movie information in our Movie Rating System, but also as a resource for the genre information of movies extracted from the "Facebook Page". First, the user must log in to the Movie Rating System with a Facebook account; at the same time, our system collects the genre interest from the "Facebook Page". We conduct many experiments to see how our method performs and compare it with other methods. First, we compare our proposed method in the case of normal recommendation to see how our system improves the recommendation result. Then we test the method in the case of the cold-start problem. Our experiments show that our method outperforms the other methods. In both cases, the proposed method produces better results.
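
To make the cold-start idea concrete, here is a minimal sketch in which a new user with no ratings is matched to existing users purely by genre interest. It is a simplified stand-in, not the paper's method: it uses a k-nearest-neighbour lookup with cosine similarity instead of the clustering the authors describe, and the genre order, user names, movie title, and ratings are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two genre-interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_for_new_user(
    genre_vec: List[float],
    others_genres: Dict[str, List[float]],
    others_ratings: Dict[str, Dict[str, float]],
    movie: str,
    k: int = 2,
) -> Optional[float]:
    """Similarity-weighted average rating of `movie` among the k users closest in genre interest."""
    nearest = sorted(
        ((cosine(genre_vec, g), user) for user, g in others_genres.items()),
        reverse=True,
    )[:k]
    rated = [(sim, others_ratings[user][movie]) for sim, user in nearest if movie in others_ratings[user]]
    weight = sum(sim for sim, _ in rated)
    return sum(sim * r for sim, r in rated) / weight if weight else None


# Genre order (hypothetical): [action, comedy, drama], e.g. counts of "Liked" pages per genre.
others_genres = {"alice": [1, 0, 0], "bob": [0, 1, 0], "carol": [1, 1, 0]}
others_ratings = {"alice": {"Movie X": 5}, "bob": {"Movie X": 1}, "carol": {"Movie X": 4}}

new_user_genres = [1, 0, 0]  # a new user who has only "Liked" action pages and rated nothing
prediction = predict_for_new_user(new_user_genres, others_genres, others_ratings, "Movie X")
```

Because the genre vector comes from outside the rating matrix, a prediction is possible even when the new user's rating history is empty, which is exactly the gap that plain CF cannot fill.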