Proceedings of the Korean Society of Broadcast Engineers Conference (한국방송∙미디어공학회:학술대회논문집)
The Korean Institute of Broadcast and Media Engineers
- Semi Annual
Domain
- Media/Communication/Library&Information > Media/Consumers
2009.01a
-
Due to the different camera properties of a multi-view camera system, the color properties of the captured images can be inconsistent. This inconsistency makes post-processing such as depth estimation, view synthesis, and compression difficult. In this paper, a method to correct the different color properties of multi-view images is proposed. We utilize a gray gradient bar on a display device to extract the color sensitivity property of each camera and calculate a look-up table based on that sensitivity property. The colors in the target image are then converted by a mapping technique that refers to the look-up table. In experimental results, the proposed algorithm shows good subjective quality and reduces the mean absolute error among the color values of multi-view images by 72% on average.
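As an illustration of the look-up-table step, the sketch below builds a per-channel LUT from the measured responses of a reference camera and a target camera to the displayed gray gradient, then remaps target pixels through it. The function names and the nearest-response inversion are illustrative assumptions, not the authors' exact procedure.

```python
def build_lut(target_resp, ref_resp):
    """Build a 256-entry LUT mapping target-camera values to
    reference-camera values via the shared gray-gradient input.
    target_resp / ref_resp: lists of 256 measured responses, one per
    displayed gray level, assumed monotonically non-decreasing."""
    lut = [0] * 256
    for v in range(256):
        # find the displayed gray level whose target response is closest to v
        g = min(range(256), key=lambda i: abs(target_resp[i] - v))
        lut[v] = ref_resp[g]
    return lut

def correct_image(pixels, lut):
    """Apply the LUT channel-wise to a flat list of (r, g, b) pixels."""
    return [(lut[r], lut[g], lut[b]) for r, g, b in pixels]
```

In practice one LUT per channel would be built from the camera's red, green, and blue responses to the gradient.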
-
In JPEG2000, the Cohen-Daubechies-Feauveau (CDF) 9/7-tap wavelet filter adopted in lossy compression is implemented by either the lifting scheme or the convolution scheme, while the LeGall 5/3-tap wavelet filter adopted in lossless compression is implemented only by the lifting scheme. However, these filters are not optimal in terms of Peak Signal-to-Noise Ratio (PSNR), and the irrational coefficients of the wavelet filters are complicated to implement. In this paper, we propose a method to optimize image quality based on wavelet filter design and wavelet decomposition. First, we propose a design of wavelet filters that selects the most appropriate rational coefficients. These filters are shown to perform better than previous wavelet filters. Then, we choose the most appropriate wavelet decomposition to obtain optimal PSNR values.
-
Multi-view video broadcasting has been conducted with a variety of different system architectures. However, previous systems have focused on wired broadcasting. In contrast, our system is designed to be applied to mobile 3DTV broadcasting. We present a service framework adaptable to mobile clients in which the load on the client is reduced as much as possible. MPEG-21 multi-view DIA description schemes are presented for efficient adaptation between the server and the client. Furthermore, we examine the feasibility of the proposed framework by analyzing the frame rate that a client device can process.
-
In this paper, we discuss two fundamental issues of hybrid scene representation: construction and rendering. A hybrid scene consists of triangular meshes and point-set models. Considering the maturity of modeling techniques for triangular meshes, we suggest that generating a point-set model from a triangular mesh might be an easier and more economical way. We improve stratified sampling by introducing the concept of priority. Our method has the flexibility that one may easily change the importance criteria by substituting priority functions. While many works have been devoted to blending the rendering results of points and triangles, our work renders point-set models and triangular meshes individually. We propose a novel way to eliminate depth occlusion artifacts and to texture a point-set model. Finally, we implement our rendering algorithm with the new features of shader model 4.0, and it turns out to be easily integrated with existing rendering techniques for triangular meshes.
-
A scalable video coding (SVC) extension to the H.264/AVC standard has been developed by the Joint Video Team (JVT). SVC provides spatial, temporal, and quality scalability with high coding efficiency and low complexity. An extension of the first version, including color format scalability, is now under development. This paper proposes to remove some luminance-related header fields and luminance coefficients when an enhancement layer adds only additional color information to its lower layer. Experimental results show a 0.6 dB PSNR gain on average in coding efficiency compared with an approach using the existing SVC standard.
-
In this paper, a new approach for solid texture synthesis from input volume data is presented. In the pre-process, feature vectors and a similarity set are constructed for the input volume data. The feature vectors are used to construct neighboring vectors for more accurate neighborhood matching. The similarity set, which records three candidates for each voxel, enables more effective neighborhood matching. In the synthesis process, a pyramid synthesis method is used to synthesize solid textures from coarse to fine levels. The results of the proposed approach are satisfactory.
-
Recently, electronic mail (e-mail) has become the most popular communication medium in our society. In this environment, spam increasingly congests the Internet. In this paper, Chinese spam is effectively detected using text and image features. For text features, keywords and reference templates in Chinese mails are automatically selected using a genetic algorithm (GA). In addition, spam containing a promotional image is also filtered out by detecting the text characters in images. Experimental results are given to show the effectiveness of the proposed method.
-
Chinese ink painting is an art with a long history in Chinese culture. Painters can obtain various kinds of scenery by properly mixing water and ink. This paper provides a colorization technique that can transfer gray-scale paintings to color paintings. Various colorization techniques for photorealistic images yield good results, but these techniques are not necessarily suitable for Chinese ink painting. In our method, users only provide a gray-scale Chinese ink painting and a subjectively similar color Chinese ink painting, and the system automatically transfers the color from the color painting to the gray-scale painting. We also provide a method for users to refine the automatically generated result.
-
This paper develops a methodology for resizing image resolutions in an arbitrary block transform domain. To accomplish this, we represent the procedures for resizing images in an arbitrary transform domain in the form of matrix multiplications, from which the matrix scaling the image resolution is produced. The experiments show that the proposed method yields reliable performance without increasing the computational complexity, compared to conventional methods, when applied to various transforms.
-
Synchronization between media is an important aspect of the design of a multimedia streaming system. This paper proposes a precise media synchronization mechanism for digital video and audio transport over IP networks. To support synchronization between video and audio bitstreams transported over IP networks, the RTP/RTCP protocol suite is usually employed. To provide a precise mechanism for media synchronization between video and audio, we suggest an efficient media synchronization algorithm based on NPT (Normal Play Time), which can be derived from the timestamp information in the header of the RTP packets generated for the transport of the video and audio streams. With the proposed method, we do not need to send and process the RTCP SR (sender report) packets required by conventional media synchronization schemes, and accordingly we can reduce the number of required UDP ports and the amount of control traffic injected into the network.
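A minimal sketch of the NPT derivation from RTP timestamps, assuming the standard 90 kHz video and 48 kHz audio RTP media clocks; the helper names are ours, not from the paper.

```python
def rtp_to_npt(timestamp, base_timestamp, clock_rate):
    """Derive Normal Play Time in seconds from a 32-bit RTP timestamp,
    relative to the first packet's timestamp; the modulo handles
    wraparound of the 32-bit counter."""
    return ((timestamp - base_timestamp) % (1 << 32)) / clock_rate

def av_skew(video_ts, video_base, audio_ts, audio_base):
    """Audio/video skew in seconds, computable without any RTCP SR,
    since each stream's NPT comes from its own timestamps."""
    return (rtp_to_npt(video_ts, video_base, 90000)
            - rtp_to_npt(audio_ts, audio_base, 48000))
```

A player would delay whichever stream is ahead until the skew falls under a presentation threshold.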
-
Automatic segmentation of brain MRI data usually leaves behind some segmentation errors that must subsequently be removed interactively using computer graphics tools. This interactive removal is normally performed by operating on individual 2D slices; it is very tedious and still leaves some segmentation errors that are not visible on the slices. We propose a novel 3D interactive correction of the brain segmentation errors introduced by fully automatic segmentation algorithms, and we have developed a tool based on a 3D semi-automatic propagation algorithm. The paper describes the implementation principles of the proposed tool and illustrates its application.
-
In general, as the dynamic range of a digital still camera is narrower than that of a real scene, it is hard to represent the shadow regions of a scene. Thus, the multi-scale retinex algorithm is used to improve the detail and local contrast of shadow regions in an image by dividing the image by its local average images obtained through Gaussian filtering. However, if the chromatic distribution of the original image is not uniform and is dominated by a certain chromaticity, the chromaticity of the local average images depends on the dominant chromaticity of the original image, so the colors of the resulting image are shifted toward the complement of the dominant chromaticity. In this paper, a modified multi-scale retinex method that reduces the influence of the dominant chromaticity is proposed. In the multi-scale retinex process, the local average images obtained by Gaussian filtering are divided by the average chromaticity values of the original image to reduce the influence of the dominant chromaticity. Next, the chromaticity of the illuminant is estimated in the highlight region, and the local average images are corrected by the estimated illuminant chromaticity. Experimental results show that the proposed method improves local contrast and detail without color distortion.
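The core division step can be sketched for a single channel and a single scale as below. A box blur stands in for the Gaussian filter, and the normalization constant `channel_mean / gray_mean` is our simplified reading of dividing the local averages by the image's average chromaticity; the paper's multi-scale weighting and illuminant correction are omitted.

```python
import math

def box_blur(img, radius):
    """Crude box blur (2-D list in, 2-D list out) standing in for the
    Gaussian filtering that produces the local average image."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx]
                        n += 1
            out[y][x] = acc / n
    return out

def retinex_channel(img, radius, channel_mean, gray_mean):
    """Single-scale retinex on one channel: log(image) minus log(local
    average), with the local average rescaled by the channel's share of
    the global mean to suppress the dominant-chromaticity shift."""
    local = box_blur(img, radius)
    k = channel_mean / gray_mean
    return [[math.log(img[y][x] + 1.0) - math.log(local[y][x] / k + 1.0)
             for x in range(len(img[0]))] for y in range(len(img))]
```

On a uniform image with no dominant chromaticity the output is zero everywhere, i.e. no contrast change is introduced.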
-
Automatic segmentation of foreground from background in video sequences has attracted much attention in computer vision. This paper proposes a novel framework for background subtraction in which the foreground is segmented by directly subtracting a background image from each frame. Most previous works focus on extracting more reliable seeds with thresholds, because errors are caused by noise, weak color differences, and so on. Our method obtains good segmentations from approximate seeds by using Random Walks with Restart (RWR). Experimental results with live videos demonstrate the relevance and accuracy of our algorithm.
-
This paper proposes an algorithm for creating stereoscopic video from a monoscopic video. A viewer uses a depth perception cue called the vanishing point, the point farthest from the viewer's viewpoint, to perceive the depth of objects and their surroundings. The viewer estimates the vanishing point from geometrical features in monoscopic images and can perceive depth from the relationship between the position of the vanishing point and the viewpoint. In this paper, we propose a method to estimate the vanishing point with an edge direction histogram in a general monoscopic image and to create a depth map depending on the position of the vanishing point. Experimental results show that the proposed conversion method achieves stable stereoscopic conversion of a given monoscopic video.
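As a sketch of the depth-map step, once the vanishing point has been estimated, depth can be assigned by distance from it. The linear falloff and the [0, 255] scale below are assumptions; the abstract does not fix the exact mapping.

```python
import math

def depth_from_vanishing_point(w, h, vp_x, vp_y):
    """Assign each pixel a depth in [0, 255]: 255 (farthest) at the
    vanishing point, falling off linearly with Euclidean distance,
    normalised so the most distant image corner gets depth 0."""
    max_d = max(math.hypot(x - vp_x, y - vp_y)
                for y in (0, h - 1) for x in (0, w - 1))
    depth = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = math.hypot(x - vp_x, y - vp_y)
            depth[y][x] = int(round(255 * (1 - d / max_d)))
    return depth
```

The stereoscopic pair is then rendered by shifting pixels horizontally in proportion to this depth.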
-
In this paper, we present a real-time method to detect moving objects from a rotating and zooming camera. It is useful for surveillance with a fixed but rotating camera, a camera on a moving car, and so on. We first compensate the global motion and then exploit the displaced frame difference (DFD) to find the block-wise boundary. For robust detection, we propose an image that combines the detections from consecutive frames. We use block-wise detection to achieve real-time speed, except for the pixel-wise DFD. In addition, a fast block-matching algorithm is proposed to obtain local motions and then the global affine motion. In the experimental results, we demonstrate that the proposed algorithm can handle real-time detection of common objects, small objects, multiple objects, objects in low-contrast environments, and objects seen from a zooming camera.
-
This paper investigates the performance of a vehicular rear-view camera through quantifying the image quality based on several objective criteria from the ISO (International Organization for Standardization). In addition, various experimental environments are defined considering the conditions under which a rear-view camera may need to operate. The process for evaluating the performance of a rear-view camera is composed of five objective criteria: noise test, resolution test, OECF (opto-electronic conversion function) test, color characterization test, and pincushion and barrel distortion tests. The proposed image quality quantification method then expresses the results of each test as a single value, allowing easy evaluation. In experiments, the performance evaluation results are analyzed and compared with those for a regular digital camera.
-
This paper investigates the relationship between the compression ratio and the gamut area for a reconstructed image when using JPEG and JPEG2000. Eighteen color samples from the Macbeth ColorChecker are initially used to analyze the relationship between the compression ratio and the color bleeding phenomenon, i.e. the hue and chroma shifts in the a*b* color plane. In addition, twelve natural color images, divided into two groups depending on four color attributes, are also used to investigate the relationship between the compression ratio and the variation in the gamut area. For each image group, the gamut area for the reconstructed image shows an overall tendency to increase when increasing the compression ratio, similar to the experimental results with the Macbeth ColorChecker samples. However, with a high compression ratio, the gamut area decreases due to the mixture of adjacent colors, resulting in more grey.
-
In this paper, we propose a DRF-based object detection method using object-adaptive patches in satellite imagery. The method is based on Discriminative Random Fields (DRF), so detection is done by labeling the possible patches in the image. For the feature information of each patch, we use multi-scale, object-adaptive patches and their texton histograms instead of single-scale, fixed-grid patches, so we can include the contextual layout of texture information around the object. To make the object-adaptive patches, we use the "superpixel lattice" scheme. As a result, each group of labeled patches represents the object or the region where the object is present. In the experiments, we compare the detection result with that of a fixed-grid scheme and show that our result is closer to the object shape.
-
Embedded systems are becoming more popular as many embedded platforms have become more affordable. They offer a compact solution for many different problems, including computer vision applications. Texture classification can be used to solve various problems, and implementing it on embedded platforms will help in deploying these applications to the market. This paper proposes to deploy texture classification algorithms onto an embedded computer vision (ECV) platform. Two algorithms are compared: grey level co-occurrence matrices (GLCM) and Gabor filters. Experimental results show that raw GLCM in MATLAB achieves 50 ms, making it the fastest algorithm on the PC platform. The classification speed achieved in C on the PC and on the ECV platform is 43 ms and 3708 ms, respectively. Raw GLCM achieves only 90.86% accuracy, compared to 91.06% for the combined feature (GLCM and Gabor filters). Overall, evaluating all results in terms of classification speed and accuracy, raw GLCM is more suitable for implementation on the ECV platform.
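To make the compared representation concrete, here is a minimal GLCM with one Haralick feature; this is a plain-Python sketch for illustration, not the paper's MATLAB or C implementation.

```python
def glcm(img, dx, dy, levels):
    """Grey level co-occurrence matrix for one displacement (dx, dy),
    normalised to sum to 1.  img is a 2-D list of grey levels in
    the range [0, levels)."""
    h, w = len(img), len(img[0])
    m = [[0.0] * levels for _ in range(levels)]
    count = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                m[img[y][x]][img[yy][xx]] += 1  # count the co-occurring pair
                count += 1
    return [[v / count for v in row] for row in m]

def contrast(m):
    """Haralick contrast feature from a normalised GLCM."""
    return sum(m[i][j] * (i - j) ** 2
               for i in range(len(m)) for j in range(len(m)))
```

A classifier would stack several such features (contrast, energy, homogeneity, ...) over a few displacements into a feature vector.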
-
In this paper, we propose a new channel estimation scheme for amplify-and-forward cooperative diversity with relay selection. In order to select the best relay, it is necessary to know the channel state information (CSI) at the destination. Most previous works, however, assume that perfect CSI is available at the destination. In addition, when the number of relays increases, it is difficult to estimate the CSI through all relays within the coherence time of the channel because of the large frame overhead for channel estimation. In the proposed channel estimation scheme, each terminal has a distinct pilot signal orthogonal to the others. By using the orthogonality of the pilot signals, the CSI is estimated over two pilot transmission phases, so that the frame overhead is reduced significantly. Due to the orthogonality among the pilot signals, the estimation error does not depend on the number of relays. Simulation results show that the proposed channel estimation scheme provides accurate CSI at the destination.
-
In this paper, a hierarchical stereo matching algorithm based on feature extraction is proposed. The boundary (edge), as the feature point in an image, is first obtained by segmenting the image into red, green, blue, and white regions. With the obtained boundary information, disparities are extracted by window matching on the image boundary, and the initial disparity map is generated by assigning the same disparity to neighboring pixels. The final disparity map is then created from the initial disparities: regions with the same initial disparity are classified into regions of the same color, and the disparity is searched again in each same-color region by changing the block size and search range. The experimental results are evaluated on the Middlebury data set and show that the proposed algorithm performs better than a phase-based algorithm, in the sense that only about 14% of the disparities for the entire image are inaccurate in the final disparity map. Furthermore, it is verified that the boundary of each region with the same disparity is clearly distinguished.
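The window-matching step can be sketched as a plain SAD search along a scanline, as below; the color-region classification and the per-region disparity refinement stages of the paper are omitted, and the helper names are ours.

```python
def sad(left, right, x, d, y, win):
    """Sum of absolute differences between a (2*win+1)^2 window centred
    at (x, y) in the left image and at (x - d, y) in the right image."""
    total = 0
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total

def best_disparity(left, right, x, y, win, max_d):
    """Winner-takes-all: the disparity in [0, max_d] with minimal SAD."""
    return min(range(max_d + 1), key=lambda d: sad(left, right, x, d, y, win))
```

Running this only on boundary pixels, then spreading the winning disparity to neighbours, mirrors the initial-disparity-map step described above.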
-
This paper presents how to minimize the second-order cone programming problem occurring in the 3D reconstruction of multiple views. The $L_{\infty}$-norm minimization is done by a series of minimizations of the maximum infeasibility. Since the problem has many inequality constraints, we adopt an interior point algorithm, in which the inequalities are sequentially approximated by log-barrier functions. An initial feasible solution is found easily by the construction of the problem. Actual computing is done by an iterative Newton-style update. When we apply the interior point method to the problem of reconstructing structure and motion, every Newton update requires solving a very large system of linear equations. We show that the sparse bundle-adjustment technique can be utilized in the same way during the Newton update, and we therefore obtain a very efficient computation.
-
Significant attention has recently been drawn to digital home photo albums that use face detection technology. A tendency can be found in home photo albums: when people take a picture, they prefer to place the objects of interest in the center of the image rather than near the boundary. To improve the detection performance and speed, which are important factors in the face detection task, this paper proposes a face detection method that takes spatial context information into consideration. Experiments were performed to verify the usefulness of the proposed method, and the results indicate that it can efficiently reduce the false positive rate as well as the runtime of face detection.
-
A new video inpainting algorithm is proposed for removing unwanted objects or source errors from video data. In the first step, block bundles are defined by the motion information of the video data to keep temporal consistency. Next, the block bundles are arranged in a 3-dimensional graph constructed from the spatial and temporal correlations. Finally, we pose the inpainting problem in the form of a discrete global optimization and minimize the objective function to find the best temporal bundles for the grid points. Extensive simulation results demonstrate that the proposed algorithm yields visually pleasing video inpainting results even in dynamic scenes.
-
In this paper, we propose a packet selection and significance-based interval allocation algorithm for real-time streaming services. In real-time streaming of inter-frame (and layer) coded video, minimizing packet loss does not imply maximizing QoS. It is true that packet loss adversely affects the QoS, but one single packet can have more impact than several other packets. We exploit the fact that the significance of each packet loss depends on the frame type the packet belongs to and its position within the GoP. Using the packet dependencies and the PSNR degradation imposed on the video by the corresponding packet loss, we find each packet's significance value. Based on the packet significance, the proposed algorithm determines which packets to send and when to send them. The proposed algorithm is tested using publicly available MPEG-4 video traces. Our scheduling algorithm brings a significant improvement in user-perceivable QoS. We foresee that the proposed algorithm is most effective in the last-mile connection of the network, where the intervals between successive packets from the source to the destination are well preserved.
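To make the selection idea concrete, here is a toy version with placeholder weights; the paper derives per-packet significance from actual decoding dependencies and measured PSNR degradation, not from these assumed numbers.

```python
# Illustrative weights only: I-frames break the most dependent frames
# when lost, B-frames the fewest.
FRAME_WEIGHT = {"I": 10.0, "P": 4.0, "B": 1.0}

def significance(frame_type, pos_in_gop, gop_len):
    """Earlier frames in a GoP affect more dependent frames when lost,
    so the weight decays with position in the GoP."""
    return FRAME_WEIGHT[frame_type] * (gop_len - pos_in_gop) / gop_len

def select_packets(packets, budget):
    """Greedily keep the most significant packets whose total size fits
    the rate budget.  Each packet: (size, frame_type, pos_in_gop, gop_len)."""
    order = sorted(packets,
                   key=lambda p: significance(p[1], p[2], p[3]),
                   reverse=True)
    sent, used = [], 0
    for p in order:
        if used + p[0] <= budget:
            sent.append(p)
            used += p[0]
    return sent
```

The interval-allocation half of the algorithm (when to send each selected packet) is not shown.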
-
In this paper, we propose an automatic video colorization method that starts from partial color sources in the first frame. The input color sources are propagated to the other gray pixels according to the correlation between pairs of pixels. To be robust against errors near weak boundaries, we calculate the correlation between two pixels using dual-path comparison. A video colorization method should maintain color connectivity between frames. Accordingly, we define the reliability of the preliminary color by comparing the colors of neighboring frames, and we perform color correction by blending neighboring colors when the reliability of the preliminary color is low. We formalize this premise with an energy function and find the color that minimizes it. In this way, using the properties of video, we reduce the error caused by propagation and obtain natural changes between frames. Simulation results show that the proposed method derives more natural results than previous methods.
-
In this paper, we propose an interactive GUI (Graphical User Interface) system to model buildings with an editable script. Our system also provides a probabilistic finite-state machine (PFSM) that defines the relationships of sub-models with transformation matrices and transition probabilities for automatically constructing new building models. Users can not only obtain various building models from the PFSM but also adjust the probabilities of the sub-models in the PFSM to get the desired building models. As shown in the results, varied and vivid building models can be constructed easily and quickly by non-expert users. Besides, users can also edit the script file provided by our system to modify the properties directly.
-
This paper proposes a framework for a handover method for continuously tracking a person of interest across cooperative pan-tilt-zoom (PTZ) cameras. The algorithm is based on the mean shift algorithm, a robust non-parametric technique that climbs density gradients to find the peak of a probability distribution. Most tracking algorithms use only one cue (such as color), but color features are not always discriminative enough for target localization because illumination and viewpoints tend to change; moreover, the background may have a color similar to that of the target. In our proposed system, the person is continuously tracked across cooperative PTZ cameras by mean shift tracking that uses color and shape histograms as feature distributions. The color and shape distributions of the person of interest are used to register the target person across cameras. For the first camera, we select the person of interest for tracking using skin color, clothing color, and the body boundary. To hand over the tracking process between two cameras, the second camera receives the color and shape cues of the target person from the first camera and uses linear color calibration to assist the handover. Our experimental results demonstrate that the color and shape features in the mean shift algorithm can continuously and accurately track the target person across cameras.
-
The recent H.264/AVC video coding standard provides higher coding efficiency than previous standards. H.264/AVC achieves a bit rate saving of more than 50% with many new technologies, but it is computationally complex. Most fast mode decision algorithms have focused on the Baseline profile of H.264/AVC. In this paper, a fast block mode decision scheme for P-slices in the High profile is proposed to reduce the computational complexity of H.264/AVC, because the High profile is useful for broadcasting and storage applications. To reduce the block mode decision complexity in P-pictures of the High profile, we use the SAD value after $16{\times}16$ block motion estimation. This SAD value is used as the classification feature to divide all block modes into a set of proper candidate block modes. The proposed algorithm shows average speed-up factors of 47.42~67.04% for IPPP sequences.
-
Few methods have dealt with segmenting multiple images with analogous content. Concurrent images of a scene and gathered images of a similar foreground are examples of such images, which we term consistent scene images. In this paper, we present a method to segment these images based on the manual segmentation of one image, by iteratively propagating information via multi-level cues with adaptive confidence. The cues are classified as low-, mid-, and high-level according to whether they pertain to pixels, patches, or shapes. The propagated cues are used to compute potentials in an MRF framework, and segmentation is done by energy minimization. Through this process, the proposed method attempts to maximize both the amount of extracted information and the consistency of the segmentation. We demonstrate the effectiveness of the proposed method on several sets of consistent scene images and provide a comparison with results based only on mid-level cues [1].
-
A high dynamic range (HDR) image can describe real-world scenes that have a wide range of luminance. To display an HDR image on conventional display devices such as monitors and printers, we propose a logarithm-based global mapping algorithm that considers the features of the image through mapping parameters. Based on the characteristics of the image, we first modify the input luminance values to reproduce perceptually tuned images, and the displayable output values are then obtained directly. The experimental results show that the proposed algorithm achieves good subjective quality while preserving the details of the image; furthermore, the algorithm has a fast, simple, and practical structure for implementation.
-
In recent years, there has been increased interest in characterizing and extracting 3D information from 2D images for human tracking and identification. In this paper, we propose a single-view-based framework for robust estimation of height and position. In the proposed method, 2D features of the target object are back-projected into the 3D scene space, whose coordinate system is given by a rectangular marker, and the position and height are estimated in that 3D space. In addition, the geometric error caused by inaccurate projective mapping is corrected using geometric constraints provided by the marker. The accuracy and robustness of our technique are verified by experimental results on several real video sequences from outdoor environments.
-
As one of the most interesting classes of scenes, landmarks constitute a large percentage of the vast number of scene images available on the web. On the other hand, a specific landmark usually has some characteristics that distinguish it from surrounding scenes and other landmarks. These two observations make the task of accurately estimating geographic information from a landmark image both necessary and feasible. In this paper, we propose a method to identify landmark locations by means of landmark recognition, in view of significant viewpoint, illumination, and temporal variations. We use GPS-based clustering to form groups for the different landmarks in the image dataset; the images in each group fully express the possible views of the corresponding landmark. We then use a combination of edge and color histograms to match query images to database images. Initial experiments with the Zubud database and our collected landmark images show that the approach is feasible.
-
Park, Sung-Jae;Lee, Yeo-Song;Sohn, Chae-Bong;Jeong, S.Y.;Chung, Kwang-Sue;Park, Ho-Chong;Ahn, Chang-Bum;Oh, Seoung-Jun
In this paper, we propose a fast intra prediction mode selection method for Scalable Video Coding (SVC), an emerging video coding standard developed as an extension of H.264/Advanced Video Coding (H.264/AVC). The proposed method decides a candidate intra prediction mode based on the smoothness of the macroblock; statistical analysis is applied to compute that smoothness in the spatial enhancement layer. We also propose an early termination scheme for the Intra_BL mode decision in which the RD cost value of Intra_BL is utilized. Compared with the JSVM software, our scheme can reduce about 55% of the computational complexity of intra prediction on average, while the performance degradation is negligible: for low QP values, the average PSNR loss is negligible, equivalent to a bit rate increase of 0.01%; for high QP values, the average PSNR loss is less than 0.01 dB, which equals a 0.25% increase in bit rate on average.
-
In object tracking, template matching methods have been developed and are frequently used. They are fast enough, but not robust to objects with variations in size and shape. To overcome this limitation of template matching, this paper proposes a template update technique. After finding the object position using a correlation-based adaptive predictive search, the proposed method selects the blocks that contain the object's boundary, estimates the motion of the boundary using block matching, and then updates the template. We applied it to IR image sequences including an approaching object. The experimental results show that the proposed method successfully tracks the object.
-
Online image dictionaries have become more and more popular for concept cognition. However, existing online systems manually pick only very few images to demonstrate each concept, and there is currently very little research on automatically choosing large-scale online images with the help of semantic analysis. In this paper, we propose a novel framework that utilizes community-generated online multimedia content to visually illustrate concepts. Our framework adopts various techniques, including correlation analysis and semantic and visual clustering, to produce sets of high-quality, precise, diverse, and representative images that visually translate a given concept. To make the best use of the results, a user interface is deployed that displays the representative images according to their latent semantic coherence. Objective and subjective evaluations show the feasibility and effectiveness of our approach.
-
To interactively share High Definition (HD)-quality visualization over emerging ultra-high-speed network infrastructure, several lossless and low-delay real-time media (i.e., uncompressed HD video and audio) transport systems are being designed and prototyped. However, most of them still rely on expensive hardware components. As an effort to reduce the building cost of such systems, in this paper we propose integrating the transmitter and receiver machines into a single bi-directional transport system. After a detailed bottleneck analysis and subsequent refinements of the embedded software components, the proposed integration can provide Real-time Transport Protocol (RTP)-based bi-directional transport of uncompressed HD video and audio from a single machine. We also explain how to interface the Gbps-bandwidth display output of the uncompressed HD media system to a networked tiled display of 10240$\times$3200 super-high resolution. Finally, to verify the feasibility of the proposed integration, several prototype systems are built and evaluated in several different experimental scenarios.
-
We present a novel head tracking system for stereoscopic displays that allows the viewer a high degree of movement. The tracker is capable of segmenting the viewer from background objects using their relative distance: a depth camera is used to generate a key signal for the head tracking application. A moving parallax barrier method is also introduced to overcome a drawback of the fixed parallax barrier, which permits observation only at specific locations.
-
Generally, a TV broadcast video of ball sports is composed from the feeds of multiple cameras strategically mounted around a stadium under the supervision of a master director, who decides which camera the current view should come from and how the camera work should proceed. In this paper, such decision rules are based on the 3D location of the ball obtained by multi-view tracking. While current TV sports broadcasts require professional cameramen and expensive equipment, our system requires only a few video cameras and no cameraman. The resulting videos were stable and informative enough to convey the flow of a match.
-
This paper proposes a novel adaptive algorithm for deinterlacing. The proposed algorithm builds on the previously developed Enhanced ELA [6], Chen [9], and Li [10] algorithms. Its fundamental mechanism is the selection and application of the most appropriate of these algorithms according to the temporal correlation with the previous and next fields. Extensive simulations on video sequences showed good performance in terms of peak signal-to-noise ratio (PSNR) and subjective quality.
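The selection step can be sketched as follows, assuming the temporal correlation is measured by the mean absolute difference between co-located previous and next fields; this criterion and the threshold are illustrative assumptions, not the paper's exact rule:

```python
def field_mad(prev_field, next_field):
    """Mean absolute difference between two co-located fields."""
    n = len(prev_field)
    return sum(abs(a - b) for a, b in zip(prev_field, next_field)) / n

def select_deinterlacer(prev_field, next_field, threshold=5.0):
    """Pick a temporal method when the fields are highly correlated
    (static content); otherwise fall back to a spatial (ELA-style) method."""
    return "temporal" if field_mad(prev_field, next_field) < threshold else "spatial"
```

In practice such a decision would be made per pixel or per block rather than per field, with the threshold tuned on training sequences.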
-
This paper presents an efficient algorithm to segment objects from the background using multiple images of distinct luminous intensities. The proposed algorithm obtains images with different luminous intensities using a camera flash. From the multiple intensities observed at a pixel, a saturated luminous intensity is estimated together with the slope of the intensity rate. We then measure the sensitivity of each pixel from its slope. The sensitivities show different patterns according to the distance from the light source; therefore, the proposed algorithm segments near objects using this sensitivity information by minimizing an energy function. Experimental results on various objects show that the proposed algorithm provides accurate results without any user interaction.
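The per-pixel sensitivity (slope) estimation can be sketched with an ordinary least-squares fit; the flash levels, the threshold value, and the subsequent energy-minimization step are assumptions outside this sketch:

```python
def intensity_slope(flash_levels, pixel_values):
    """Least-squares slope of observed pixel intensity versus flash power.
    Near objects receive more of the flash light, so their slope is steeper."""
    n = len(flash_levels)
    mx = sum(flash_levels) / n
    my = sum(pixel_values) / n
    num = sum((x - mx) * (y - my) for x, y in zip(flash_levels, pixel_values))
    den = sum((x - mx) ** 2 for x in flash_levels)
    return num / den

def is_near_object(flash_levels, pixel_values, slope_threshold=0.5):
    """Simple per-pixel decision; the paper replaces this hard threshold
    with an energy minimization over the whole image."""
    return intensity_slope(flash_levels, pixel_values) > slope_threshold
```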
-
An efficient algorithm to compress high dynamic range (HDR) videos is proposed in this work. We separate an HDR video sequence into a tone-mapped low dynamic range (LDR) sequence and a ratio sequence. Then, we encode those two sequences using the standard H.264/AVC codec. During the encoding, we allocate a limited amount of bit budget to the LDR sequence and the ratio sequence adaptively to maximize the qualities of both the LDR and HDR sequences. While a conventional LDR decoder uses only the LDR stream, an HDR decoder can reconstruct the HDR video using the LDR stream and the ratio stream. Simulation results demonstrate that the proposed algorithm provides higher performance than the conventional methods.
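The LDR/ratio decomposition above can be sketched in a few lines; the tone-mapping operator and the per-pixel ratio definition here are illustrative assumptions, not the paper's exact formulation:

```python
def split_hdr(hdr, tone_map):
    """Split an HDR frame into a tone-mapped LDR frame and a ratio frame.
    `tone_map` is any tone-mapping operator; LDR values are assumed
    positive here so the per-pixel ratio stays well defined."""
    ldr = [tone_map(v) for v in hdr]
    ratio = [h / l for h, l in zip(hdr, ldr)]
    return ldr, ratio

def reconstruct_hdr(ldr, ratio):
    """An HDR decoder multiplies the two decoded streams back together;
    a conventional LDR decoder simply ignores the ratio stream."""
    return [l * r for l, r in zip(ldr, ratio)]
```

In the paper both streams are then H.264/AVC-encoded, with the bit budget split adaptively between them.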
-
As demand for high-definition television (HDTV) increases, real-time decoding of high-definition (HD) video becomes an important issue. The data size of HD video is so large that real-time processing is difficult to implement, especially in software. In order to implement a fast MPEG-2 decoder for HDTV, we compose five scenarios that use parallel processing techniques such as data decomposition, task decomposition, and pipelining. Assuming a multi-digital-signal-processor environment, we analyze each scenario in three aspects: decoding speed, L1 memory size, and bandwidth. By comparing the scenarios, we determine the most suitable cases for different situations. We simulate the scenarios in a dual-core, dual-CPU environment using OpenMP and analyze the simulation results.
-
In this paper, a new approach is proposed for the segmentation of Computed Tomography (CT) head images. The approach consists of a two-stage segmentation, with each stage containing two different segmentation techniques. The ultimate aim is to segment the CT head images into three classes: abnormalities, cerebrospinal fluid (CSF), and brain matter. In the first stage, k-means and fuzzy c-means (FCM) segmentation are applied to extract the abnormalities, whereas in the second stage, modified FCM with population-diameter independence (PDI) and expectation-maximization (EM) segmentation are adopted to obtain the CSF and brain matter. The experimental results demonstrate that the proposed system is feasible and achieves satisfactory results.
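As a sketch of the first-stage clustering, plain 1-D k-means on pixel intensities looks as follows; the initialization, class count, and feature choice here are illustrative, not the paper's exact configuration:

```python
def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means: cluster pixel intensities into k tissue classes.
    Returns the sorted cluster centres."""
    # Spread the initial centres across the sorted value range.
    centres = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for v in values:
            i = min(range(len(centres)), key=lambda j: abs(v - centres[j]))
            clusters[i].append(v)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)
```

FCM replaces the hard assignment above with fuzzy memberships; EM additionally models each class with a Gaussian.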
-
We propose an on-line machine learning approach for object recognition, where new images are continuously added and the recognition decision is made without delay. The random forest (RF) classifier has been extensively used for classification and regression applications. We extend this technique to the task of building an incremental component-based detector. First, we employ an object descriptor model based on a bag of covariance matrices to represent an object region, then run our on-line RF learner to select object descriptors and learn an object classifier. Object recognition experiments verify the effectiveness of the proposed approach: the proposed model yields recognition performance comparable to the benchmark standard RF, AdaBoost, and SVM classifiers.
-
Shot change detection is an important technique for effective management of video data, and practical detection schemes require adaptive techniques that work across various videos. In this paper, we propose an adaptive shot change detection algorithm using the mean feature value over variable reference blocks. Our algorithm detects shot changes by defining adaptive threshold values from the feature values extracted from video frames and comparing each feature value against its threshold. In experiments on the same test sequences, we obtained a detection ratio up to 15% better than conventional methods. We also obtained good detection ratios with several other feature-extraction methods, and by implementing the algorithm on the TVUS model of HOMECAST, we confirmed that real-time shot change detection is possible on a low-performance hardware platform. Thus, our algorithm can be useful in PMPs and other portable players.
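A minimal sketch of adaptive-threshold shot change detection, assuming a histogram-difference feature and a threshold of `alpha` times the mean of recent feature values; both the feature and the rule are assumptions, not the paper's exact design:

```python
def hist_diff(frame_a, frame_b, bins=8, max_val=256):
    """Sum of absolute histogram-bin differences between two frames."""
    def hist(frame):
        h = [0] * bins
        for p in frame:
            h[p * bins // max_val] += 1
        return h
    return sum(abs(a - b) for a, b in zip(hist(frame_a), hist(frame_b)))

def detect_shot_changes(frames, alpha=3.0, window=5):
    """Flag a shot change where the feature value exceeds an adaptive
    threshold derived from the mean of the recent feature values."""
    diffs = [hist_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    changes = []
    for i, d in enumerate(diffs):
        recent = diffs[max(0, i - window):i]
        mean = sum(recent) / len(recent) if recent else 0.0
        if recent and d > alpha * mean:
            changes.append(i + 1)  # index of the first frame of the new shot
    return changes
```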
-
This paper presents a new approach that addresses the quality degradation of a synthesized view when a virtual camera moves forward. Generally, a virtual view is synthesized by interpolation using only the two neighboring views. Because the size of an object increases when the virtual camera moves forward, most methods handle this by interpolation, which produces degraded, blurred images. We instead prevent the synthesized view from being blurred by using more cameras of the multiview camera configuration; that is, we apply the super-resolution concept, which reconstructs a high-resolution image from several low-resolution images. Data fusion is performed by geometric warping using the disparities of the multiple images, followed by a deblurring operation. Experimental results show that the image quality can be further improved by reducing blur, in comparison with the interpolation method.
-
To design a good quantizer for an underlying distribution from a training sequence (TS), the traditional approach seeks the empirical minimum based on the empirical risk minimization principle. As the size of the TS increases, this yields a good quantizer for the true distribution. However, with a relatively small TS, searching for the empirical minimum causes overfitting, which can even worsen the performance of the trained codebook. In this paper, the performance of codebooks trained on small TSs is studied, and it is shown that a piecewise uniform codebook can outperform an empirically minimized codebook.
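For illustration, a uniform codebook and its distortion can be written as follows; the bin-midpoint construction is a generic textbook choice (and the building block of a piecewise uniform design), not the paper's exact codebook:

```python
def uniform_codebook(lo, hi, k):
    """A k-level uniform codebook over [lo, hi]: the bin midpoints."""
    step = (hi - lo) / k
    return [lo + step * (i + 0.5) for i in range(k)]

def quantize(value, codebook):
    """Map a value to its nearest codeword (nearest-neighbour rule)."""
    return min(codebook, key=lambda c: abs(value - c))

def mean_squared_distortion(samples, codebook):
    """Empirical distortion of a codebook on a sample set; comparing this
    on held-out data versus the TS exposes the overfitting effect."""
    return sum((v - quantize(v, codebook)) ** 2 for v in samples) / len(samples)
```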
-
This paper proposes an algorithm that accelerates the generation of a Fresnel hologram by using a recursive addition operation over the whole coordinate array of the digital hologram. The 3D object used to calculate the digital hologram is a depth-map image produced by computer graphics (CG). By analyzing the regularity between the 3D object coordinates and the digital hologram coordinates, the proposed technique performs the computer-generated hologram (CGH) operation using only recursive additions over the hologram's coordinates. The experimental results showed that the proposed algorithm increased the operation speed by 30% over the technique using the conventional CGH equation.
-
Images are often corrupted by additive white Gaussian noise (AWGN). Recently, Bayesian estimation techniques for recovering noisy images in the wavelet domain have been studied. The probability density function (PDF) of an image in the wavelet domain has a highly sharp peak and long tails, and better results can be obtained if an a priori PDF with these properties is applied adaptively. Frequently proposed PDFs include the Gaussian and Laplace distributions; these model the wavelet coefficients satisfactorily, and each has its own characteristics. In this paper, a mixture of Gaussian and Laplace distributions is proposed, which attempts to incorporate the merits of both. This mixture model is used to remove image noise via Maximum a Posteriori (MAP) estimation. The proposed technique gives better results with respect to visual quality, numerical performance, and computational complexity.
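As one concrete instance of MAP shrinkage, the estimate under a pure Laplacian prior with AWGN reduces to soft-thresholding with threshold $T = \sqrt{2}\,\sigma_n^2/\sigma_x$; the paper's Gaussian-Laplace mixture would blend this with a Wiener-like linear shrink, which is not reproduced in this sketch:

```python
def map_shrink(coeff, noise_var, signal_sigma):
    """MAP estimate of a wavelet coefficient under AWGN with a Laplacian
    prior: soft-thresholding with T = sqrt(2) * noise_var / signal_sigma
    (the BayesShrink threshold)."""
    t = (2 ** 0.5) * noise_var / signal_sigma
    if coeff > t:
        return coeff - t
    if coeff < -t:
        return coeff + t
    return 0.0
```

Applying `map_shrink` to every detail-band coefficient and inverse-transforming yields the denoised image.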
-
This paper presents a No-Reference (NR) quality assessment method for IPTV or mobile IPTV. Because an NR method does not access the original signal, it is suitable for real-time streaming services. Our proposed method uses decoding parameters, such as the quantization parameter and motion vectors, together with packet loss as the major network parameter. To evaluate the performance of the proposed algorithm, we carried out a subjective video quality test with the ITU-T P.910 ACR (Absolute Category Rating) method and obtained mean opinion score (MOS) values for a QVGA 180 video sequence coded by an H.264/AVC encoder. Experimental results show that the proposed quality metric has a high correlation (84%) with subjective quality.
-
In this paper, we propose a learning-based super-resolution algorithm. In the proposed algorithm, a multi-resolution wavelet approach is adopted to perform the synthesis of local high-frequency features. To obtain a high-resolution image, wavelet coefficients of two dominant LH- and HL-bands are estimated based on wavelet frames. In order to prepare more efficient training sets, the proposed algorithm utilizes the LH-band and transposed HL-band. The training sets are then used for the estimation of wavelet coefficients for both LH- and HL-bands. Using the estimated high frequency bands, a high resolution image is reconstructed via the wavelet transform. Experimental results demonstrate that the proposed scheme can synthesize high-quality images.
-
In this paper, we propose an effective memory reduction algorithm that reduces the reference frame buffer size and memory bandwidth in video encoders and decoders. In general video codecs, previously decoded frames are stored and referenced to reduce temporal redundancy. Recently, reference frames have been recompressed for memory efficiency and for bandwidth reduction between the main processor and external memory; however, such algorithms can hurt coding efficiency. Several algorithms have been proposed to reduce the amount of reference memory with minimal quality degradation, but they still suffer from quality degradation due to fixed bit allocation. In this paper, we propose an adaptive block-based min-max quantization that considers the local characteristics of the image: the basic processing unit is an $8{\times}8$ block for memory alignment, and an adaptive quantization is applied to each $4{\times}4$ sub-block to minimize quality degradation. We found that the proposed algorithm improves coding efficiency by approximately 37.5%, compared with an existing memory reduction algorithm, at the same memory reduction rate.
-
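The min-max quantization in the preceding abstract can be sketched as follows, in a simplified fixed-bit version: each block is stored as its minimum, maximum, and a per-pixel index of `bits` bits. The paper's adaptive bit allocation across $4{\times}4$ sub-blocks is not reproduced here:

```python
def minmax_quantize_block(block, bits):
    """Min-max (range) quantization of a small pixel block."""
    lo, hi = min(block), max(block)
    levels = (1 << bits) - 1
    if hi == lo:  # flat block: indices carry no information
        return lo, hi, [0] * len(block)
    idx = [round((p - lo) * levels / (hi - lo)) for p in block]
    return lo, hi, idx

def minmax_dequantize_block(lo, hi, idx, bits):
    """Reconstruct pixel values from the stored range and indices."""
    levels = (1 << bits) - 1
    if hi == lo:
        return [float(lo)] * len(idx)
    return [lo + i * (hi - lo) / levels for i in idx]
```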
When video packets are transmitted over error-prone networks, leaky prediction can be used to mitigate error propagation: the leaky factor provides a trade-off between coding efficiency and error resilience. In this paper, we propose an improved leaky prediction method in which the leaky factor is adaptively determined for each frame by minimizing the estimated end-to-end distortion at the encoder. Experimental results show that the proposed method with the adaptive leaky factor achieves better error robustness than the conventional method.
-
Video digests provide an effective way of checking video content rapidly due to their very compact form. By watching a digest, users can easily decide whether a specific item is worth seeing in full, so the impression created by the digest greatly influences the user's choice when selecting video content. We propose a novel method of automatic digest creation that evokes a joyful impression by exploiting smile/laughter facial expressions as emotional cues of joy in the video. We assume that a digest presenting smiling/laughing faces appeals to the user, since he/she is assured that the smile/laughter is caused by joyful events inside the video. For detecting smile/laughter faces, we developed a neural-network-based method for classifying facial expressions. Video segmentation is performed by automatic shot detection, and for creating joyful digests, appropriate shots are automatically selected by shot ranking based on the smile/laughter detection results. We report the results of user trials conducted to assess the visual impression of 'joyful' digests created automatically by our system. The results show that users tend to prefer emotional digests containing laughing faces, which suggests that the attractiveness of automatically created video digests can be improved by extracting emotional cues through automatic facial expression analysis as proposed in this paper.
-
A compressed video stream is very sensitive to transmission errors, which may severely degrade the reconstructed image, so error resilience is an essential problem in video communications. In this paper, we propose novel temporal error concealment techniques for recovering lost or erroneously received macroblocks (MBs). To reduce the computational complexity, the proposed method adaptively determines the search range for each lost MB when finding the best-matched block in the previous frame. The corrupted MB is split into four $8{\times}8$ sub-MBs, and the motion vector (MV) of each sub-MB is estimated using its boundary information; the estimated MVs are then used to reconstruct the damaged MB. In simulation results, the proposed method shows better performance than conventional methods in terms of PSNR.
-
In this paper, we propose a fast partial distortion algorithm using a normalized dithering matching scan that obtains a uniform distribution of partial distortion and thereby removes only unnecessary computation. Our algorithm is based on a normalized dithering-order matching scan and continuous calibration of the threshold error using a LOG value for each sub-block, enabling efficient elimination of unlikely candidate blocks while keeping the same prediction quality as the full search algorithm. Our algorithm reduces the block-matching computation by about 60% compared with the conventional PDE (partial distortion elimination) algorithm without any loss of prediction quality, and it will be useful for real-time video coding applications using MPEG-4 AVC or MPEG-2.
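Generic PDE early termination looks as follows (a plain row-order scan for simplicity; the paper's contributions, the normalized dithering scan order and the LOG-based threshold calibration, are not reproduced here):

```python
def sad_with_pde(block, candidate, best_so_far):
    """Row-wise partial SAD that aborts as soon as the running sum
    reaches the best distortion found so far (PDE early termination).
    Returns None when the candidate is eliminated early."""
    partial = 0
    for row_b, row_c in zip(block, candidate):
        partial += sum(abs(a - b) for a, b in zip(row_b, row_c))
        if partial >= best_so_far:
            return None  # cannot beat the current best match
    return partial
```

A dithering-order scan visits pixels in a spatially spread pattern instead of row by row, so the partial sum grows more uniformly and bad candidates are rejected even earlier.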
-
This paper proposes a flame verification algorithm using motion and spatial persistency. Most previous vision-based methods, which use color information and temporal variations of pixels, produce frequent false alarms due to their reliance on many heuristic features. To solve this problem, we use a Bayesian network. In addition, since the shape of a flame changes upward irregularly with the airflow caused by wind or the burning material, we distinguish real flames from moving objects by checking the motion orientation and the temporal persistency of flame regions, removing misclassifications. As a result, the use of two verification steps and Bayesian inference improved the detection performance and reduced the miss rate.
-
This paper presents an effective approach to minimizing the recursive computation needed to balance stereo pairs, based on disparity vector errors and their directional histogram. A stereo balancing function is computed from corresponding pixels between the two images, and a simple approach is to find the matching blocks of the two images. However, this procedure requires recursive operations, and its computational cost is very high. Therefore, we propose an efficient balancing method using a structural similarity index and a partial re-searching scheme that reduces the computational cost considerably: we determine whether re-searching is necessary for each block by using the errors and the directional histogram of the disparity vectors. Experimental results show that the proposed approach saves computation significantly, with negligible image quality degradation compared with the full re-search approach.
-
In this paper, we propose a method to estimate the pointed-at region in the real world from camera images. In general, an arm-pointing gesture encodes a direction extending from the user's fingertip to the target point. In the proposed work, we assume that the pointing ray can be approximated by a straight line passing through the user's face and fingertip. Therefore, the proposed method extracts two end points for the estimation of the pointing direction: one from the user's face and another from the user's fingertip region. The pointing direction and its target region are then estimated based on the 2D-3D projective mapping between the camera images and the real-world scene. To demonstrate an application of the proposed method, we constructed an ICGS (interactive cinema guiding system) that employs two CCD cameras and a monitor. The accuracy and robustness of the proposed method are verified on several real video sequences.
-
Multi-view video consists of a set of video sequences captured from multiple viewpoints or view directions of the same scene. It contains an extremely large amount of data, along with extra information to be stored or transmitted to the user. This paper exploits inter-view correlations among video objects and the background to reduce the prediction complexity while achieving high coding efficiency in multi-view video coding. Our proposed algorithm is based on an object-based segmentation scheme that utilizes video object information obtained from the coded base view. This information helps predict the disparity vectors and motion vectors of the enhancement views by employing object registration, leading to a high-compression, low-complexity coding scheme for the enhancement views. Experimental results show that the proposed scheme can provide a PSNR gain of 2.5-3 dB compared to the simulcast.
-
This paper proposes a single sign-on scheme, based on SAML (Security Assertion Markup Language), in which a mobile user offers his credential information to a home network running the OSGi (Open Service Gateway Initiative) service platform, obtains user authentication, and controls a remote device through a mobile device using this authentication. In particular, by defining a single sign-on profile that accommodates the low computing and memory capability of mobile devices, we provide a clue to applying automated user authentication to remote device control via mobile devices in distributed mobile environments such as an OSGi-based home network.
-
Registration of microscopic section images of an organism is important for analyzing and understanding the organism's function. Microscopes usually suffer from radial distortion due to spherical aberration. In this paper, a correction scheme for intra-section registration is proposed. The scheme uses two corresponding feature points under a radial distortion model. We propose several variations of the scheme and conduct extensive experiments on real microscopic images. Iterative versions of the correction using multiple feature points provide good performance for the registration of optical and scanning electron microscopic images.
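A common one-parameter radial distortion model and its iterative inversion can be sketched as follows; the single coefficient `k1` and the fixed-point inversion are generic assumptions, not the paper's exact model or two-point estimation procedure:

```python
def distort(point, k1):
    """Apply the one-parameter radial distortion model
    p_d = p_u * (1 + k1 * r^2), with r the undistorted radius
    (coordinates are relative to the distortion centre)."""
    x, y = point
    r2 = x * x + y * y
    s = 1.0 + k1 * r2
    return (x * s, y * s)

def undistort(point, k1, iters=10):
    """Invert the model by fixed-point iteration: repeatedly divide the
    distorted point by the current estimate of (1 + k1 * r^2)."""
    xd, yd = point
    xu, yu = xd, yd
    for _ in range(iters):
        r2 = xu * xu + yu * yu
        s = 1.0 + k1 * r2
        xu, yu = xd / s, yd / s
    return (xu, yu)
```

Given two corresponding feature points, `k1` can be solved for by requiring that their undistorted positions satisfy the registration constraint.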
-
In this paper, we present a robust shape matching approach based on bottom-up segmentation, and show how over-segmentation results can be used to overcome both the ambiguity of contour matching and occlusion. To measure the shape difference between a template and the object in the input image, we use oriented chamfer matching. In contrast to previous work, however, we use the over-segmentation results to eliminate the effect of background clutter before calculating the shape differences. This increases the matching-cost interval between true and false matches, which gives reliable results. Our experiments also demonstrate that our method is robust in the presence of occlusion.
-
The ubiquitous smart home is the home of the future: it takes advantage of context information from the user and the home environment and provides automatic home services for the user. The user's location and motion are the most important contexts in the ubiquitous smart home. This paper presents a method for estimating the user's location using four cameras together with home context parameters and user preferences. Some geometric problems must be solved to determine approximately which area is monitored by the cameras. The moving object is detected within the image frames and then rendered in a 2D window to visually present where the user is located and how he is moving. The movement paths are statistically recorded and used to predict the user's future movements.
-
In this paper, a reference-free perceptual quality metric is proposed for image assessment. It measures the amount of overall blockiness and blurring in the image, and also considers edge-oriented artifacts such as ringing, mosaic, and staircase noise. To give a single quality score, the individual artifact scores are adaptively combined according to the difference between the edge-oriented artifacts and the other artifacts. The quality score obtained by the proposed algorithm shows strong correlation with the MOS values provided by VQEG.
-
Sub-pel motion estimation contributes a significant increase in R-D performance for H.264|MPEG-4 Part 10 AVC. However, finding the best matching block at sub-pel accuracy requires several additional steps, such as interpolation, block matching, and the Hadamard transform, which entail a large computational complexity in the encoding process. In this paper, a fast sub-pel motion estimation scheme based on a parabolic model of the SAD is proposed to avoid this complexity. In the proposed scheme, motion estimation (ME) is performed only at integer-pel accuracy, and the sub-pel motion vectors are found from a parametric SAD model whose parameters are estimated from the SAD values obtained at the integer-pel positions. A fall-back check is performed to ensure the validity of the parabolic SAD model with the estimated parameters. The experimental results show that the proposed scheme can reduce the motion estimation time by up to about 30% of the total ME time on average, with a negligible PSNR drop (0.14 dB at maximum) and bit increment (2.54% at maximum).
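The parabolic sub-pel step can be sketched as follows, assuming SAD samples at integer offsets -1, 0, +1 along one axis around the best integer-pel position; the exact form of the paper's fall-back check is an assumption here:

```python
def subpel_offset(sad_left, sad_center, sad_right):
    """Fit a parabola through the SAD values at integer offsets -1, 0, +1
    and return the sub-pel position of its minimum.

    Falls back to 0.0 (the integer-pel result) when the parabola is not
    strictly convex or the minimum leaves the (-1, 1) interval -- one
    plausible form of the 'fall-back check' mentioned in the abstract."""
    denom = sad_left - 2.0 * sad_center + sad_right
    if denom <= 0.0:
        return 0.0
    offset = 0.5 * (sad_left - sad_right) / denom
    return offset if -1.0 < offset < 1.0 else 0.0
```

Applying this once per axis replaces the interpolation, sub-pel block matching, and Hadamard-transform stages of a conventional sub-pel search.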
-
This paper proposes a new video compression algorithm using an adaptive transform that is adjusted according to the frequency content of the input signals. The adaptive transform is based on the warped discrete cosine transform (WDCT), which has been shown to outperform the DCT at high bit rates when applied to the JPEG compression scheme [1, 2, 3]. In this paper, the WDCT is applied to video compression as a new feature in H.264/AVC. The proposed method shows a coding gain over H.264/AVC at high bit rates: the gain appears above 35 dB PSNR and increases with the bit rate, reaching about 1.0 dB at 45 dB PSNR.
-
This paper proposes a method to estimate the flow speed of pedestrians in surveillance videos. In the proposed method, the average moving speed of pedestrians is measured by estimating the size of the real-world motion from the observed motion vectors. For this purpose, pixel-to-meter conversion factors are calculated from the camera geometry, and the height information, which is lost through camera projection, is predicted statistically from simulation experiments. Compared to previous work on flow speed estimation, our method can be applied to various camera views because it separates the scene parameters explicitly. Experiments are performed on both simulated image sequences and real video: on the simulated videos, the proposed method estimated the flow speed with an average error of about 0.1 m/s, and it also showed promising results on the real video.
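The conversion from pixel-domain motion to real-world speed can be sketched as follows, with the pixel-to-meter factor treated as a given input (in the paper it is derived from the camera geometry, and a statistical height correction is applied on top):

```python
def flow_speed_mps(motion_vectors_px, meters_per_pixel, fps):
    """Average speed in m/s from per-frame motion vectors given in pixels,
    using a pixel-to-meter conversion factor and the frame rate."""
    mags = [(dx * dx + dy * dy) ** 0.5 for dx, dy in motion_vectors_px]
    mean_px_per_frame = sum(mags) / len(mags)
    return mean_px_per_frame * meters_per_pixel * fps
```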
-
Estimation and correction of the color temperature of digital images are the basis of white balance adjustment after the image acquisition stage. White balance is one of the most important image processing techniques for subjective image quality enhancement. Color temperature correction is applied for white balance adjustment or for changing the mood of a picture; for example, a picture taken under daylight can be changed to have the mood of a sunset or a cloudy day. We evaluate color temperature transformation of high dynamic range images in the linear and log domains, and conclude that transformation in the linear domain shows better results.
-
In general, the channel information of a received DTV signal is analyzed based on the symbol timing clock using only the in-phase information in the DTV receiver. This paper presents the technical requirements of a channel analysis system for DTV reception signals. To meet these requirements and measure the magnitude and phase of the channel information more accurately, a method for compensating the quadrature information from the measured in-phase data is proposed. The proposed channel analysis system is implemented with a commercial DTV chipset and provides fast data analysis with good connectivity to field test vehicles. Computer simulation and laboratory test results are provided to assess the performance of the proposed channel analysis system.
-
We present an augmented reality (AR) application for cell phones in which users put a virtual pet on their palms and play and interact with it by moving their hands and fingers naturally. The application is fundamentally based on hand/palm pose recognition and finger motion estimation, which are the main concerns of this paper. We propose a fast and efficient hand/palm pose recognition method that uses natural features extracted from a hand image (e.g., direction, width, and contour shape of the hand region) with prior knowledge of hand shape and geometry (e.g., its approximate shape when the palm is open, and the length ratio between palm width and palm height). We also propose a natural interaction method that recognizes natural finger motions, such as opening and closing the palm, based on fingertip tracking. Based on the proposed methods, we developed and tested the AR application on an ultra-mobile PC (UMPC).
-
Digital Multimedia Broadcasting (DMB) is a mobile TV service based on a digital radio transmission system that provides high-quality audio/video and other auxiliary data services. As users want to store DMB content on their devices for later consumption or for sharing, a standardized format is needed to guarantee the interoperability of DMB content across various devices. The DMB AF (Application Format) specification defines a file format for DMB contents and services: it specifies how to combine the variety of DMB contents with associated information into a well-defined format that facilitates storage, interchange, management, editing, and presentation of DMB contents in protected, governed, and interoperable ways. In this paper, we present our implementation of DMB AF as part of the development of the DMB AF reference software, which consists of three applications (a packager, a media player, and a metadata browser) and a collection of supporting libraries used by the applications.
-
Distributed video coding (DVC) is a new coding paradigm that exploits the statistics among sources only at the decoder, achieving very low-complexity video encoding without loss of coding efficiency. Wyner-Ziv coding, a particular implementation of DVC, reconstructs video by correcting the noise on side information using a channel code. Since good side information leaves less noise for the channel code to remove, generating good side information is very important for the overall coding efficiency. However, when there is complex motion between frames, it is very hard to generate good side information without any information about the original frame. In this paper, we propose a method to enhance the quality of the side information using a small amount of additional information about the original frame in the form of a hash. Because the decoder informs the encoder where the hash has to be transmitted, the side information can be improved considerably with only a small amount of hash data, and the proposed method therefore gains considerable coding efficiency. Our experiments have verified an average PSNR gain of up to 1 dB compared to the well-known DISCOVER DVC codec.
-
In this paper, we describe self-training super-resolution. Our approach is based on example-based algorithms, which need training images; the selection of these images changes the result, so choosing them well is important. We propose a self-training super-resolution algorithm that uses the input image itself as the training image. It resembles other example-based super-resolution methods, but we regard the training phase as a step that collects primitive information about the input image. Moreover, example-based algorithms tend to produce visible artifacts along edges; we reduce these artifacts by applying weights that take the edge direction into account. We demonstrate that the performance of our approach is reasonable on several synthetic and real images.
-
Variation of the observer's viewing position is one of the factors causing image distortion in stereoscopic displays. A rotational movement of the observer distorts the stereoscopic image because the two eyes of the observer end up at different horizontal positions; this differs from horizontal or depth-directional movements. In this paper, we present numerical simulation results for the analysis and correction of stereoscopic image distortion under rotational movement of the observer.
-
This paper presents a scheme that generates a caption file by extracting the Closed Caption stream from a DTV signal. Closed captioning helps bridge the "digital divide" by extending broadcasting accessibility to neglected groups such as hearing-impaired people and foreigners. In Korea, the DTV Closed Captioning standard was developed in June 2007, and closed captioning service is required by law in all broadcasting services from 2008. In this paper, we describe how to extract caption data from the MPEG-2 Transport Stream of an ATSC-based digital TV signal and how to generate caption files (SAMI and SRT) using the extracted caption data and time information. Experimental results verify the feasibility of the generated caption files using a widely used PC-based media player.
-
The pin-hole model has been widely used as a robust tool for easily understanding how a stereo image is obtained and how the depth cue is presented to an observer in stereoscopy. In practice, however, most depth-cue analyses in stereoscopy are inconsistent: the stereo image is taken with a camera model, but its depth cue is analyzed with the pin-hole model, so the resulting depth cues are incorrect. The error arises from the image distances of the camera model, which vary with the focused object distance, and it produces a depth distortion. In this paper, we demonstrate this contradiction, in which depth distortion occurs when the pin-hole model is used to analyze depth cues although a practical camera model is used in stereoscopy, and we present a method to overcome it.
-
This paper introduces the concept of Tangible Shopping conducted in a virtual world. The main idea is to combine Web 2.0 mashup concepts with shopping activities in virtual worlds; annotation and web browsing features are also included. This research aims to move web shopping from the conventional approach to a new one that delivers tangible shopping experiences to users. We first review the state of the art of virtual worlds and Web 2.0 mashups, then review our related work, and finally address the design and implementation of tangible shopping in virtual worlds.
-
A Methodology to Estimate the Unit Price of User Contribution in P2P Streaming System - A Case Study
Peer-to-Peer content delivery technology has recently begun to be used not only for file-sharing applications such as eDonkey but also for all kinds of content services. KBS, a broadcasting company in Korea, is aggressively driving the application of Peer-to-Peer technology to its commercial internet video service, but we found two big hurdles. First, end users may refuse to share their own resources merely to reduce KBS's costs. Second, the number of free-riders may increase, so that the efficiency of the overall system falls. From a commercial service provider's perspective, we must avoid end users forming unfavorable impressions of the service and the usefulness of Peer-to-Peer technology decreasing. To overcome these problems, we studied how to offer incentives to end users and how much incentive would be reasonable, and then applied the results to a real service for verification.
-
In Wyner-Ziv coding, compression performance depends strongly on the quality of the side information, since better side information means less channel noise and fewer parity bits. However, because the decoder generates side information without any knowledge of the current Wyner-Ziv frame, it has no optimal criterion for deciding which block configuration yields better side information. Hence, fixed-block-size motion estimation (ME) is generally used to generate side information. With fixed-block-size ME, the best coding performance cannot be attained, since some blocks are better motion-estimated at different block sizes. Therefore, if the appropriate ME block size of each block can be found, the quality of the side information may improve. In this paper, we investigate the effects of variable ME block sizes on side-information generation.
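The fixed-block-size ME baseline that the study varies can be sketched as a plain full-search SAD block matcher (a generic illustration; the function names and search range are our own, not the paper's):

```python
import numpy as np

def block_match(ref, cur, bx, by, bsize, search=4):
    """Full-search block matching for the block at (by, bx) of `cur`.

    Returns the displacement (dy, dx) into `ref` with the smallest
    sum of absolute differences (SAD), and that SAD.
    """
    h, w = ref.shape
    block = cur[by:by + bsize, bx:bx + bsize].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > h or x + bsize > w:
                continue  # candidate block would leave the frame
            cand = ref[y:y + bsize, x:x + bsize].astype(np.int64)
            sad = np.abs(block - cand).sum()
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```

Variable-block-size ME in the spirit of the paper would call such a matcher per block at several values of `bsize` and keep the configuration with the lowest cost.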
-
Multi-hop wireless mesh networks (WMNs) suffer significant packet losses due to insufficient available bandwidth and high channel-error probability. To overcome packet losses, end-to-end (E2E) error control schemes have been proposed; in WMNs, however, E2E schemes adapt poorly to the time-varying network condition because of large delays. Thus, in this paper, we propose a network-adaptive error control for video streaming over WMNs that flexibly combines E2E and hop-by-hop (HbH) error control according to network conditions. Moreover, to keep the HbH error control lightweight at intermediate nodes, we use path-partition-based adaptation. To verify the proposed scheme, we implement it and evaluate its transport performance through MPEG-2 video streaming over a real IEEE 802.11a-based WMN testbed.
-
The robustness of an audio fingerprinting system in noisy environments is a principal challenge in content-based audio retrieval. The feature selected for the audio fingerprint must be robust to noise, and the search algorithm must be computationally cheap enough to run in real time. The audio fingerprint proposed by Philips uses expanded hash-table lookup to compensate for errors introduced by noise; this expansion increases search complexity by a factor of 33 times the degree of expansion defined by the Hamming distance. We propose a new method that improves the noise robustness of audio fingerprinting by using the predominant pitch, which reduces the bit errors of the created hash values. In our approach, a sub-fingerprint is computed for each time frame of audio: the frame is transformed into the frequency domain with an FFT, the obtained spectrum is divided into 33 critical bands, and a 32-bit hash value is computed from the energy differences between adjacent bands. Only the bits near the predominant pitch are stored. Predominant pitches are extracted in each time frame by harmonic enhancement, harmonic summation, and selection of a band among the critical bands.
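The per-frame sub-fingerprint computation can be sketched as follows. This is a simplified illustration: the band edges are merely log-spaced between assumed limits of 300 Hz and 2000 Hz (standing in for the paper's 33 critical bands), the bits are differenced only within one frame, and the predominant-pitch bit selection is omitted.

```python
import numpy as np

def sub_fingerprint(frame, sr=5000, n_bands=33):
    """One 32-bit sub-fingerprint from an audio frame.

    Bit m is the sign of the energy difference between adjacent
    bands within the frame (the Philips scheme also differences
    across time).
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    edges = np.geomspace(300.0, 2000.0, n_bands + 1)   # assumed band limits
    energy = np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                       for lo, hi in zip(edges[:-1], edges[1:])])
    bits = (np.diff(energy) > 0).astype(np.uint64)     # 32 sign bits
    powers = np.left_shift(np.uint64(1), np.arange(32, dtype=np.uint64))
    return int(bits @ powers)
```

Matching then reduces to Hamming-distance comparison of such 32-bit values across frames.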
-
In this paper, we present a method for defining the color similarity between color images using octree-based quantization and similar-color integration. The proposed method extracts the major colors of each image by octree-based quantization. The two color palettes consisting of major colors are compared by Euclidean distance, and similar color bins between the palettes are matched; multiply matched color bins are integrated and the major colors adjusted. A color histogram based on the palette is then constructed for each image, and the difference between two histograms is computed as the weighted Euclidean distance between matched color bins, taking the frequency of each bin into account. As an experiment to validate its usefulness, we discriminated identical clothing in CCD camera images based on the proposed color similarity analysis, retrieving the same clothing images with a success rate of 88% using color analysis alone, without texture analysis.
-
EAP (Extensible Authentication Protocol) is an authentication framework for two-party protocols that supports multiple authentication algorithms, known as "EAP methods", and PKMv2 in 802.16e networks uses EAP as its authentication protocol. However, this framework is inefficient when the EAP peer executes a handover: the EAP peer and EAP server must re-run an EAP method each time to authenticate each other for a secure handover. This introduces delay, so a faster re-authentication method is needed. In this paper, we propose a new design of the PKMv2 framework that provides fast re-authentication. The new framework, together with keys used as short-term credentials, yields better performance during the handover process.
-
Bluetooth Video Distribution Profile (VDP) defines the protocol and procedures for distributing video content compressed in a specific format for efficient use of the limited bandwidth. In this paper, we describe the design of a VDP tester based on TTCN-2 (Tree and Tabular Combined Notation), a language standardized by ISO for specifying tests of real-time and communicating systems. Our work was carried out as part of adding a new profile-testing module for VDP to PTS (Profile Tuning Suite), a reference test system for Bluetooth interoperability testing. A test demonstration of interoperability with various VDP solutions at the PTS session of UPF30 (UnPlugFest) showed the validity of the developed tester. Finally, we introduce the PTS architecture and show the design and implementation of the VDP tester included in the released PTS 3.0.
-
In this paper, a new efficient algorithm for global motion estimation is proposed. It combines a 4-parameter-model global motion estimation algorithm with an M-estimator to improve the accuracy and robustness of the estimate. The first stage uses the block-based motion vector field to generate coarse global motion parameters; the second stage refines them with an M-estimator to obtain precise parameters. The technique provides good estimation accuracy without significantly increasing computational complexity. In this work, an initial estimate of the global motion parameters is obtained with the simple 4-parameter approach and then refined with the M-estimator. The combined algorithm shows a significant reduction in mean compensation error and a clear performance improvement over the simple 4-parameter global motion estimation approach.
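A minimal sketch of the two-stage idea, assuming a similarity (4-parameter) motion model fitted to a block motion-vector field and refined by iteratively reweighted least squares with Tukey's biweight as the M-estimator (the abstract does not name the specific M-estimator; Tukey's is a common choice):

```python
import numpy as np

def fit_global_motion(pts, mvs, iters=10, c=4.685):
    """Fit a 4-parameter (similarity) global motion model to a block
    motion-vector field: [x', y'] = [a*x - b*y + tx, b*x + a*y + ty].

    A plain least-squares fit is refined by iteratively reweighted
    least squares with Tukey's biweight, down-weighting outlier
    vectors (e.g. from foreground objects).
    """
    x, y = pts[:, 0], pts[:, 1]
    xp, yp = x + mvs[:, 0], y + mvs[:, 1]
    one, zero = np.ones_like(x), np.zeros_like(x)
    A = np.concatenate([np.stack([x, -y, one, zero], axis=1),
                        np.stack([y, x, zero, one], axis=1)])
    b = np.concatenate([xp, yp])
    w = np.ones(len(b))
    for _ in range(iters):
        theta, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
        r = b - A @ theta
        s = 1.4826 * np.median(np.abs(r)) + 1e-12   # robust scale (MAD)
        u = np.clip(np.abs(r) / (c * s), 0.0, 1.0)
        w = (1.0 - u ** 2) ** 2                     # Tukey biweight
    return theta                                    # (a, b, tx, ty)
```

The robust scale estimate from the median absolute deviation keeps the weights meaningful even when a sizable fraction of the vectors are outliers.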
-
Scalable video coding has been regarded as a promising solution for guaranteeing the quality of service of video streaming over the Internet, because it can adapt quality to network conditions. In this paper, we use a streaming model that transmits the base layer (BL) over TCP and the enhancement layers over DCCP, which provides transmission reliability for the BL together with TCP friendliness. Unlike previous work, the proposed algorithm performs rate adaptation based on playout buffer (PoB) status: the client's PoB status is periodically fed back to the server and serves as a network congestion indicator. Experimental results show that our scheme improves streaming quality compared with the previous scheme, not only for constant and dynamic background flows but also for VBR-encoded video sequences.
-
Generally, algorithms for generating disparity maps can be classified into two categories: region-based and feature-based methods. The main focus of this research is to generate a disparity map with accurate depth information for 3-dimensional reconstruction. The proposed algorithm includes both a region-based and a feature-based method, so that existing problems such as false matching and occlusion can be solved effectively. As the region-based part, regions of false matching are extracted by the proposed MMAD (Modified Mean of Absolute Differences) algorithm, a modification of the existing MAD (Mean of Absolute Differences) algorithm. As the feature-based part, the proposed method eliminates false matches by computing vectors with SIFT and compensates occluded regions using pairs of adjacent SIFT matching points, so that errors are reduced and the disparity map becomes more accurate.
-
We have researched and developed the caricature generation system PICASSO, which outputs a deformed facial caricature by comparing an input face with a prepared mean face. We specialized it as PICASSO-2 for a robot exhibited at Aichi EXPO 2005; driven by PICASSO-2, this robot drew facial caricatures on shrimp rice crackers with a laser pen. We have recently been exhibiting another revised robot characterized by brush drawing. The system takes a pair of facial images with a CCD camera, extracts facial features from the images, and generates the facial caricature in real time. We experimentally evaluated the performance of the caricatures using a large amount of data collected at Aichi EXPO 2005, and found that the system was not sufficiently accurate in eyebrow region extraction and mouth detection. In this paper, we propose improved methods for eyebrow region extraction and mouth detection.
-
We introduce an instant and intuitive shadowing technique for the CACAni system. In traditional 2D anime, all frames are drawn by hand, so creating an entire animation sequence takes a long time; the CACAni system reduces this production cost through automatic inbetweening and coloring. In this paper, we develop a shadowing technique that enables CACAni users to easily create shadows for characters or objects in an image. The only inputs required are sequences of character or object layers drawn in the CACAni system with alpha values. Shadows are automatically rendered on a virtual plane from these inputs and can then be edited by the user. In this way, the CACAni system can handle instant shadowing and intuitive editing in a short time.
-
The scalable vector graphics (SVG) standard allows complex bitmap images to be represented by vector-based graphics and provides advantages over raster-based graphics in applications where, for example, scalability is required. This paper presents an algorithm to convert bitmap images into SVG format. The algorithm integrates pixel-level triangulation, data-dependent triangulation, a new image-mesh simplification algorithm, and a polygonization process. The two triangulation techniques preserve image quality (especially edge features) well in the reconstructed image, while the simplification and polygonization procedures reduce the size of the SVG file. Experiments confirm the effectiveness of the proposed algorithm.
-
We have developed a real-time software tool that extracts a speech feature vector whose time sequences consist of three groups of components: phonetic/acoustic features such as formant frequencies, phonemic features given as neural network outputs, and distances to some Japanese phonemes. Since the phoneme distances for the five Japanese vowels can express vowel articulation, we have designed a switch, a volume control, and a color representation operated by pronouncing vowel sounds. As examples of this vowel interface, we have developed speech training tools for aurally or vocally handicapped children that display an image character or a rolling color ball and control cursor movement. In this paper, we introduce the functions and the principles of these systems.
-
In this paper, we present a novel method for reconstructing a super-resolution image from multi-view low-resolution images captured for a depth-varying scene, without requiring complex analysis such as depth estimation or feature matching. The proposed method is based on the iterative back projection technique, extended to the 3D volume domain (i.e., space + depth), unlike conventional super-resolution methods that handle only 2D translation among the captured images.
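The iterative back projection core, in its conventional single-image 2D form (the paper's contribution, the extension to a space+depth volume over multi-view inputs, is not reproduced here), can be sketched as:

```python
import numpy as np

def degrade(hr, scale):
    """Imaging model assumed in this sketch: box blur by `scale`, then decimate."""
    h, w = hr.shape
    return hr.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def ibp_super_resolve(lr, scale=2, iters=10, lam=1.0):
    """Iterative back projection: repeatedly simulate the LR image from
    the current HR estimate and back-project the residual."""
    hr = np.kron(lr, np.ones((scale, scale)))                  # initial upsample
    for _ in range(iters):
        err = lr - degrade(hr, scale)                          # LR-domain residual
        hr = hr + lam * np.kron(err, np.ones((scale, scale)))  # back-project error
    return hr
```

On convergence the HR estimate is consistent with the observed LR image under the assumed imaging model; with multiple observations, one back-projection step per observation is accumulated each iteration.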
-
PROPOSAL OF AMPLITUDE-ONLY LOGARITHMIC RADON DESCRIPTOR - A PERFORMANCE COMPARISON OF MATCHING SCORE -
The amplitude-only logarithmic Radon transform (ALR transform) for pattern matching is proposed. This method is robust to object translation, scaling, and rotation: an ALR image is invariant when objects are translated in a picture, and under object scaling and rotation the ALR image is merely translated. Objects are identified by applying a phase-only matched filter to the ALR image, which detects the size ratio, rotation-angle difference, and relative position between two objects. Our pattern matching procedure is described herein, and its simulation is executed. We compare matching scores with the Fourier-Mellin transform and the general phase-only matched filter.
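The phase-only matched filter used as the matching back end can be sketched as phase correlation; the log-Radon front end that converts scaling and rotation into translations is omitted here:

```python
import numpy as np

def phase_only_correlation(f, g):
    """Phase-only matched filtering of two equal-size images.

    The cross-power spectrum is normalized to unit magnitude, so its
    inverse FFT shows a sharp peak at the translation of g relative to f.
    """
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    cross = np.conj(F) * G
    cross /= np.abs(cross) + 1e-12      # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return peak, corr[peak]
```

Because only phase is kept, the correlation peak is close to a delta function, which is what makes the detected translation (and hence, through the ALR front end, scale and rotation) sharp.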
-
Sensor networks have been a hot research topic for the past decade and have moved into using multimedia sensors such as cameras and microphones [1]. Combining many types of sensor data leads to more accurate and precise information about the environment; however, the use of sensor network data is still limited to closed circumstances. Thus, in this paper, we propose a web-service-based framework for deploying multimedia sensor networks. In order to unify different types of sensor data and also to support heterogeneous client applications, we use ROA (Resource Oriented Architecture) [2].
-
This paper presents a new algorithm that includes a mechanism to avoid local solutions in a motion vector detection method that uses the steepest descent method. Two different implementations of the algorithm are demonstrated using two major search methods for tree structures, depth first search and breadth first search. Furthermore, it is shown that by avoiding local solutions, both of these implementations are able to obtain smaller prediction errors compared to conventional motion vector detection methods using the steepest descent method, and are able to perform motion vector detection within an arbitrary upper limit on the number of computations. The effects that differences in the search order have on the effectiveness of avoiding local solutions are also presented.
-
This paper presents a modified QIM-JPEG2000 steganography that improves upon the previous JPEG2000 steganography using quantization index modulation (QIM). Post-embedding changes in file size and PSNR with the modified QIM-JPEG2000 are smaller than those with the previous QIM-JPEG2000. Steganalysis experiments that test whether messages are embedded in given JPEG2000 images show that the modified QIM-JPEG2000 is also more secure than the previous one.
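The underlying QIM principle can be sketched as generic scalar QIM on transform coefficients (an illustration only, not the paper's specific JPEG2000 embedding):

```python
import numpy as np

def qim_embed(coeffs, bits, delta=8.0):
    """Embed one bit per coefficient: round each coefficient to the
    lattice of multiples of delta (bit 0) or to that lattice shifted
    by delta/2 (bit 1)."""
    coeffs = np.asarray(coeffs, dtype=float)
    shift = np.asarray(bits) * (delta / 2.0)
    return np.round((coeffs - shift) / delta) * delta + shift

def qim_extract(coeffs, delta=8.0):
    """Recover bits by checking which lattice each coefficient is nearer to."""
    coeffs = np.asarray(coeffs, dtype=float)
    d0 = np.abs(coeffs - np.round(coeffs / delta) * delta)
    shifted = coeffs - delta / 2.0
    d1 = np.abs(shifted - np.round(shifted / delta) * delta)
    return (d1 < d0).astype(int)
```

Extraction tolerates any per-coefficient distortion smaller than delta/4, which is the property that lets QIM survive subsequent quantization in the codec.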
-
Premachandra, H.Chinthaka N.;Yendo, Tomohiro;Yamasato, Takaya;Fujii, Toshiaki;Tanimoto, Masayuki;Kimura, Yoshikatsu 476
In this paper, we propose a visible-light road-to-vehicle communication system at intersections as an ITS technique. In this system, communication between a vehicle and an LED traffic light is achieved using the LED traffic light as the transmitter and an on-vehicle high-speed camera as the receiver. The LEDs in the transmitter are driven at 500 Hz, and the emitting LEDs are captured by the high-speed camera for communication. The images from the high-speed camera are processed to obtain the luminance value of each LED in the transmitter: the transmitter must first be found, then tracked in each frame, and the luminance value of each LED captured. In our previous work, the transmitter was found by taking the difference of two consecutive frames. In this paper, we mainly introduce an algorithm to detect the found transmitter in consecutive frames. Experimental results on appropriate images showed the effectiveness of the proposal.
-
Scale-invariant features are effective for retrieving and classifying images. In this study, we analyze scale-invariant planar curve features for describing 2D shapes. Scale-space filtering is used to determine contour structures at different scales, but it is difficult to track significant points across scales. In mathematics, curvature is a fundamental feature of a planar curve; the curvature of a digitized planar curve, however, depends on the scale, so automatic scale detection is required for practical curvature analysis. We propose a technique for automatic scale detection based on differences of curvature. Once the curvature values are normalized with regard to scale, we can compute the difference between curvature values at different scales; an appropriate scale and its position are then detected simultaneously and with high accuracy, avoiding the tracking problem. An advantage of the proposed method is that the detected significant points need not lie on the same contour. The validity of the proposed method is confirmed by experimental results.
-
Inverse halftoning restores a binarized image to its former continuous-tone image. Existing techniques include smoothing and Gaussian filtering, but they remain insufficient in many respects. Noise removal and edge enhancement are closely related in inverse halftoning, and it is difficult for existing techniques to achieve both with high accuracy at the same time; a technique that can do so has been an open task. We therefore tried applying Kalman filtering to inverse halftoning. Experiments showed the effectiveness of Kalman filtering for inverse halftoning in comparison with existing techniques.
-
A Computer-Generated Hologram (CGH) is made for the three-dimensional image of a virtual object. The error diffusion method is used for the phase quantization of the CGH and is known to improve the image quality of the reconstructed image. However, the quality of the image reconstructed from a CGH using error diffusion depends on the choice of error diffusion coefficient. In this paper, we derive a relational expression that obtains the error diffusion coefficient from both the position and the size of the input object for the CGH. As a result, the proposed method obtained a better reconstructed image than deriving the error diffusion coefficient from the position of the input image alone.
-
Because a naive Hadamard transform computes more slowly than the Discrete Cosine Transform (DCT) and the Fast Fourier Transform (FFT), its effectiveness and practicality are limited. Its computational complexity, however, can be reduced by using the same butterfly operation as the FFT. We constructed a fast complex Hadamard transform algorithm and compared its computation time with that of the FFT, considering both an indirect version using complex-number arithmetic routines and a direct calculation. As a result, the computation time of the complex Hadamard transform was reduced; in particular, the quadrinomial fast complex Hadamard transform that avoids the complex-number routines required less computation and less time than the FFT.
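The butterfly recursion is the one used by the fast Walsh-Hadamard transform; a real-valued sketch is below (the paper's complex variant follows the same recursion with complex entries in place of ±1):

```python
import numpy as np

def fwht(x):
    """Radix-2 fast Walsh-Hadamard transform of a length-2^k vector.

    The butterfly structure is the same one that makes the FFT fast;
    it needs only n*log2(n) additions/subtractions instead of the n^2
    operations of a direct matrix product.
    """
    x = np.asarray(x, dtype=float).copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b     # butterfly
        h *= 2
    return x
```

The result matches multiplication by the Sylvester-ordered Hadamard matrix, which is how the sketch can be verified against the direct definition.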
-
A color chart area is automatically extracted from an image of a crop, such as fruit, captured together with the color chart, and an approximation formula is obtained for the change in feature value of the color indexes. The color value of the crop area is then compared, and the growing degree is assessed according to the correlation. Using a compact PC equipped with the program, images of fruit are captured, and the output of the system is compared with ratings by experts. In automatic recognition of the color chart outdoors, the complete set of color indexes was correctly acquired in 22 of 29 images; indoors, it was correctly acquired in all 34 images. In the color value judgment of the Japanese pear, indoors, 32 of 34 images were within 1.0 of judgment error (compared with the value read off by experts), with an average error of about 0.5. These results indicate practical value.
-
The distribution of illegally copied digital contents causes copyright infringement. The digital watermark is expected to prevent unjust copying by embedding information in digital data such as images, animation, sound, TV, radio, and movies [1][2]. However, noise is included in the image reproduced from a digital watermark, so certifying the reproduced image can be difficult: if a computer cannot recognize the information reproduced from the watermark, the information is meaningless. This paper aims at improving the verifiability of a digital-watermark reproduction image, and verifies whether the form of a character affects the degree of correlation.
-
A 90-nm CMOS motion estimation (ME) processor was developed by employing dynamic voltage and frequency scaling (DVFS) to greatly reduce the dynamic power. To make full use of the advantages of DVFS, a fast ME algorithm and a small on-chip DC/DC converter were also developed. The fast ME algorithm can adaptively predict the optimum supply voltage (V_D) and the optimum clock frequency (f_c) before each block matching process starts. Power dissipation of the ME processor, which contained an absolute difference accumulator as well as the on-chip DC/DC converter and DVFS controller, was reduced to 31.5 μW, which was only 2.8% that of a conventional ME processor.
-
In printer and facsimile communication, digital halftoning is an extremely important technology, and the error diffusion method is easy to apply to color image halftoning. Its problem is that when an image containing a grayscale area within a color image is processed, quite unrelated colors are generated in that area even though it should be expressed only in black and white. To solve this, halftoning has been formulated as a combinatorial optimization problem and a method using SA (Simulated Annealing) was proposed; however, its processing time is enormous compared with error diffusion. We therefore propose a new error diffusion method.
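The baseline error diffusion method being improved here is, in its classic grayscale Floyd-Steinberg form, as follows (a sketch of the standard algorithm; the paper addresses its color extension):

```python
import numpy as np

def error_diffusion(gray):
    """Floyd-Steinberg error diffusion: binarize an image in [0, 1],
    pushing each pixel's quantization error onto unprocessed neighbours
    with the classic 7/16, 3/16, 5/16, 1/16 weights."""
    img = np.asarray(gray, dtype=float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - out[y, x]
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```

Because the error is diffused rather than discarded, the average gray level of the binary output stays close to that of the input, which is the property the color extension must preserve per channel.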
-
In compressed color images, colors are usually represented by luminance and chrominance (YCbCr) components. Considering the characteristics of the human visual system, the chrominance (CbCr) components are generally represented more coarsely than the luminance component. Aiming at recovery of the chrominance components, we propose a model-based chrominance estimation algorithm in which color images are modeled by a Markov random field (MRF). A simple MRF model is used whose local conditional probability density function (pdf) for the color vector of a pixel is a Gaussian pdf depending on the color vectors of its neighboring pixels. The chrominance components of a pixel are estimated by maximizing the conditional pdf given its luminance component and its neighboring color vectors. Experimental results show that the proposed algorithm is effective for quality improvement of compressed color images such as JPEG and JPEG2000.
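Under a pure Gaussian MRF prior, the most probable chroma value of a pixel given its 4-neighbours is their mean, so a crude sketch of the smoothing part of the estimation (omitting the paper's conditioning on the luminance component) is iterated neighbourhood averaging:

```python
import numpy as np

def refine_chroma(chroma, iters=10):
    """Gaussian-MRF sketch: each pass replaces every chroma value with
    the mean of its 4-neighbours (periodic boundaries), starting from
    the coarsely coded component. The luminance-conditioned data term
    of the paper's model is omitted in this illustration."""
    c = np.asarray(chroma, dtype=float)
    for _ in range(iters):
        c = 0.25 * (np.roll(c, 1, 0) + np.roll(c, -1, 0)
                    + np.roll(c, 1, 1) + np.roll(c, -1, 1))
    return c
```

Each pass reduces local chroma variation while preserving the mean, which is the qualitative behaviour the MRF prior contributes to the full estimator.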
-
Recently, image classification has become an important task in various fields. In general, classification performance is poor without adjustment of the image features, so automatic feature extraction is desirable. In this paper, we propose an image classification method that adjusts image features automatically. We assume that texture features are useful for image classification because natural images are composed of several types of texture; classification accuracy is thus improved by using the distribution of texture features. We obtain texture features by computing image features from each pixel and its neighborhood, and then compute image features from the distribution of texture features. These features are adjusted to the classification task using a genetic algorithm. We apply the proposed method to classifying images into "head" versus "non-head" and "male" versus "female".
-
This paper reports a non-photorealistic rendering method for creating a stream pattern from an input image. Our method extracts the potential stream pattern in the given image using a shock filter based on a partial differential equation (PDE), implemented by selective dilation and erosion. Unlike the traditional first-order solution of the PDE, we employ a second-order scheme and compensate for the undesired diffusive effects caused by its viscosity form. The choice of dilation or erosion for each pixel is based on an edge detector computed from a structure tensor. By adding noise to the input image, our method can also generate stream patterns in areas with little texture. The experimental results show that the stream pattern is extracted very well.
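The classic first-order shock filter that the paper builds on can be sketched as follows; the sign of the Laplacian selects dilation versus erosion per pixel (the paper's second-order scheme and structure-tensor edge detector are not reproduced):

```python
import numpy as np

def shock_filter(img, iters=20, dt=0.25):
    """First-order shock filter: where the Laplacian is negative
    (the bright side of an edge) the image is dilated, where positive
    it is eroded, so smooth transitions sharpen into steps."""
    u = np.asarray(img, dtype=float).copy()
    for _ in range(iters):
        ux = (np.roll(u, -1, 1) - np.roll(u, 1, 1)) / 2.0   # central diffs
        uy = (np.roll(u, -1, 0) - np.roll(u, 1, 0)) / 2.0
        lap = (np.roll(u, -1, 0) + np.roll(u, 1, 0)
               + np.roll(u, -1, 1) + np.roll(u, 1, 1) - 4.0 * u)
        u = u - dt * np.sign(lap) * np.hypot(ux, uy)
    return u
```

Run on a blurred edge, the filter progressively empties the mid-gray transition band, which is the sharpening behaviour the stream-pattern extraction relies on.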
-
Recently, autonomous robots that can achieve complex tasks have been required with the advance of robotics, and advanced robot vision for recognition is necessary to realize them. In this paper, we propose a method to recognize an object in an actual environment. The 3D object model used in the proposed method is voxel data whose interior is filled and whose surface carries color information. We define "recognition" as estimating the target object's state, i.e., its posture and position in the actual environment. The proposed method consists of three steps: in Step 1, we extract features from the 3D object model; in Step 2, we estimate the position of the target object; and in Step 3, we estimate its posture. We run experiments in an actual environment and confirm the performance of the proposed method from the results.
-
There are many sports videos, and a large need for content-based video retrieval. In sports videos, the motions and camera work carry much information about shots and plays. This paper proposes baseball game process understanding using similar-motion retrieval on videos. We retrieve similar motion parts using space-time images that describe the motions shown in the videos. Using a finite-state model of plays, we can determine the precise point of each pitch from the pattern of estimated typical motions alone. This paper describes the method and the experimental results.
-
The object recognition mechanism of human beings is not yet well understood. In animal experiments with apes, however, neurons responding to simple shapes (e.g., circles, triangles, squares) were found, and the hypothesis was formed that humans may recognize objects as combinations of such simple shapes. This is called the Figure Alphabet Hypothesis, and the simple shapes are called the Figure Alphabet. As one approach to an object recognition algorithm, we focused on this hypothesis and, taking our idea from it, proposed a feature extraction algorithm for object recognition. In this paper, we describe the recognition of binarized images of multi-font alphabet characters by a recognition model that combines the feature extraction algorithm with a three-layered neural network. First, we compute the difference between each image in the learning data set and the templates using the feature extraction algorithm; this difference is the feature quantity. The feature quantities are input to the neural network, which is trained by backpropagation (the BP method). We then have the recognition model recognize an unknown image data set and measure the correct answer rate. To estimate the performance of the proposed recognition model, the same unknown images were also recognized by a conventional neural network. The proposed model showed a higher correct answer rate than the conventional neural network model, which demonstrates its validity. In the future, we plan to study the recognition of natural images with the proposed model.
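The three-layered network trained with backpropagation can be sketched as a minimal numpy MLP; all names are our own, and it is illustrated on the XOR toy problem rather than the paper's character features:

```python
import numpy as np

def train_mlp(X, t, hidden=8, epochs=5000, lr=1.0, seed=0):
    """Three-layered perceptron trained with backpropagation (BP method):
    sigmoid units, squared-error loss, plain batch gradient descent.
    Returns a prediction function over the learned weights."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(X @ W1 + b1)                  # forward: hidden layer
        y = sig(h @ W2 + b2)                  # forward: output layer
        d2 = (y - t) * y * (1.0 - y)          # backward: output delta
        d1 = (d2 @ W2.T) * h * (1.0 - h)      # backward: hidden delta
        W2 -= lr * (h.T @ d2)
        b2 -= lr * d2.sum(axis=0)
        W1 -= lr * (X.T @ d1)
        b1 -= lr * d1.sum(axis=0)
    return lambda Z: sig(sig(Z @ W1 + b1) @ W2 + b2)
```

In the paper's setting the inputs would be the feature quantities from the feature extraction algorithm and the outputs the character classes.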
-
A 3-D CAD (Computer Aided Design) system is an indispensable tool for manufacturing, and methods to generate a curved surface on an N-sided shape, a basic technology of 3-D CAD systems, have been widely studied. Such surface generation, however, has three problems for long and narrow shapes: the resultant surface is distorted, the surface is not continuous with adjacent surfaces, or additional user input is required. Conventional methods have not solved all three problems at once. In this paper, we propose a method that generates internal curves dividing a long and narrow N-sided shape into a regular N-sided section and four-sided sections. By controlling the shape of the internal curves through this division, our method removes the distortion of the generated curved surface. In addition, each generated section is interpolated with G1-continuous surfaces, and the process requires no further user input. Therefore, the three problems mentioned above are solved at the same time.
-
This paper presents a prototype of a high-resolution 3D display based on a new principle. We have proposed a 3D display that combines the features of both Integral Imaging (II) and volumetric displays. The proposed display consists of two lens arrays and a thin volumetric display. When the viewer watches the thin volumetric display through the two lens arrays, he or she perceives a thick 3D image; in other words, the two lens arrays act as a large-diameter convex lens that amplifies depth. The advantage of the proposed display is that it offers higher resolution than II and is smaller than a volumetric display with a large convex lens. In this paper, we describe the prototype in detail. We took various errors into consideration when simulating the display and found suitable lens parameters from the simulation results. We confirm that the prototype should be able to reconstruct 3D images.
-
Aizawa, Mitsuhiro;Sasaki, Keita;Kobayashi, Norio;Yama, Mitsuru;Kakizawa, Takashi;Nishikawa, Keiichi;Sano, Tsukasa;Murakami, Shinichi 562
This paper describes an automatic 3-dimensional (3D) segmentation method for 3D CT (Computed Tomography) images using region growing (RG) and edge detection techniques. Specifically, an augmented RG method in which the contours of regions are extracted by a 3D digital edge detection filter is presented. The feature of this method is its ability to prevent the leakage of regions, a defect of the conventional RG method. Experimental results on the extraction of teeth from 3D CT data of jaw bones show that teeth are correctly extracted by the proposed method.
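The region-growing core of such a segmentation can be sketched as follows (a minimal 2-D sketch for brevity; the paper works in 3-D, and its edge-detection stopping criterion is omitted here):

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol):
    """Grow a region from `seed`, absorbing 4-connected pixels whose
    intensity is within `tol` of the seed. A precomputed edge map could
    additionally stop growth, as in the augmented method above."""
    img = np.asarray(image, dtype=float)
    grown = np.zeros(img.shape, dtype=bool)
    seed_val = img[seed]
    q = deque([seed])
    while q:
        y, x = q.popleft()
        if grown[y, x]:
            continue
        grown[y, x] = True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                    and not grown[ny, nx]
                    and abs(img[ny, nx] - seed_val) <= tol):
                q.append((ny, nx))
    return grown

img = np.array([[9, 9, 0], [9, 9, 0], [0, 0, 0]], dtype=float)
mask = region_grow(img, (0, 0), tol=1.0)
```

Without an edge constraint, a single bridging pixel lets the region leak into the background, which is exactly the defect the augmented method addresses.
-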
A calibration method for multiple sets of stereo vision cameras is proposed. To measure the three-dimensional shape of a very long object, the object must be measured from different viewpoints and the data registered. In this study, two laser beams generate two strings of calibration targets, which form straight lines in the world coordinate system. An evaluation function is defined as the sum of the squared distances between each transformed target and the fitted line representing its laser beam, plus the squared distances between points appearing in the data sets of two adjacent viewpoints. The calculation process for the approximation method based on data linearity is presented. The experimental results show the effectiveness of the method.
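The first term of such an evaluation function, the sum of squared point-to-line distances for one laser beam, might be computed as below (the PCA-based line fit is our assumption; the paper's exact fitting procedure is not specified in the abstract):

```python
import numpy as np

def line_residual(points):
    """Fit a straight 3-D line to target points (as generated by one
    laser beam) and return the sum of squared point-to-line distances.
    The line passes through the centroid along the principal direction."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # Principal direction = first right singular vector.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    proj = centered @ direction
    residual = centered - np.outer(proj, direction)
    return float((residual ** 2).sum())

collinear = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
```

Minimizing this residual over the rigid transforms of each viewpoint's data would align all data sets to the common laser lines.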
-
This paper proposes a novel reversible image authentication method that requires neither a location map nor memorization of parameters. The proposed method detects image tampering and further localizes tampered regions. Although the method initially distorts an image to hide data for tamper detection, it recovers the original image from the distorted one provided no tampering has been applied. The method extracts the hidden data and recovers the original image without memorizing any location map indicating the hiding places or any parameter used in the algorithm. This feature makes the proposed method practical. Simulation results show its effectiveness.
-
A Computer-Generated Hologram (CGH) is generally made with the Fourier transform and is intended for optical reconstruction. A Computer-Generated Pseudo Hologram (CGPH) is instead made with the Complex Hadamard Transform. CGPH differs from CGH in that optical reconstruction is impossible; this is an advantage, because physical leakage of the confidential information cannot occur. In this paper, a binary image was converted with the Complex Hadamard Transform to produce a CGPH. The image reconstructed from the CGPH is improved by the error diffusion method and an iterative method, and the resulting improvement is demonstrated.
-
In this paper we propose a new method of Depth-Image-Based Rendering (DIBR) for Free-viewpoint TV (FTV). In the proposed method, virtual viewpoint images are rendered with 3D warping instead of view-dependent depth estimation, since depth estimation is usually costly and it is desirable to eliminate it from the rendering process. However, 3D warping causes problems that do not occur with view-dependent depth estimation, such as holes in the rendered image and depth discontinuity on object surfaces at the virtual image plane, the latter causing artifacts in the rendered image. In this paper, these problems are solved by reconstructing disparity information at the virtual camera position from the two neighboring real cameras. In the experiments, high-quality arbitrary viewpoint images were obtained.
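The hole problem mentioned above can be seen in a toy 1-D forward warp (illustration only: real DIBR warps full images with z-buffering and the paper's disparity reconstruction; the disparity model d = f·b/z is the standard pinhole relation):

```python
import numpy as np

def forward_warp(scanline, depth, baseline, focal):
    """Shift each pixel by its disparity d = focal * baseline / depth
    toward the virtual viewpoint. Unassigned positions remain NaN,
    i.e. the holes discussed above."""
    out = np.full(len(scanline), np.nan)
    for x, (v, z) in enumerate(zip(scanline, depth)):
        d = int(round(focal * baseline / z))
        nx = x + d
        if 0 <= nx < len(out):
            out[nx] = v
    return out

line = np.array([10.0, 20.0, 30.0, 40.0])
# One near pixel (depth 1.0) shifts; far pixels (huge depth) stay put.
warped = forward_warp(line, depth=[1e9, 1e9, 1.0, 1e9], baseline=1.0, focal=1.0)
```

The near pixel vacates its position, leaving a hole that must be filled from neighboring views, as the paper proposes.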
-
This paper proposes a reversible information hiding method for binary images. Half of the pixels in noisy blocks of the cover image are candidates for embeddable pixels. Among the candidates, we select compressible pixels according to the bit patterns of their neighborhoods, so that the pixels compress effectively. Thus, the embeddable pixels in the proposed method are the compressible pixels in noisy blocks. We provide experimental results using several binary images binarized by different methods.
-
In this paper, we propose a face detection technique for still pictures that sequentially uses a skin-color model and a support vector machine (SVM). The SVM is a learning algorithm for solving classification problems, and some studies on face detection have reported that it outperforms neural networks. The SVM method searches for a face in a picture while changing the size of the search window; its detection accuracy and processing time vary greatly with the complexity of the background and the size of the face. We therefore apply face-candidate area detection using a skin-color model as a preprocessing step. We compared SVM alone with the proposed method with respect to face detection accuracy and processing time. As a result, the proposed method reduced processing time while maintaining a high recognition rate.
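A skin-color prefilter of this kind can be sketched as below (the RGB thresholds are a common rule-of-thumb, not the paper's actual skin-color model):

```python
import numpy as np

def skin_candidate_mask(rgb):
    """Mark pixels passing a simple RGB skin test; only the resulting
    candidate regions would then be scanned by the slower SVM search."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & (abs(r - g) > 15)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 120, 90)   # skin-like pixel
img[1, 1] = (30, 30, 200)    # background pixel
mask = skin_candidate_mask(img)
```

Restricting the window search to the masked area is what yields the reported processing-time improvement.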
-
We are developing a simple, low-cost wind velocity sensor based on small microphones. The sensor system consists of 4 microphones covered with specially shaped wind screens, 4 pre-amplifiers that respond to low frequencies, and a commercial multi-channel sound interface. In this paper, we first present the principle of the sensor, i.e., the technique for suppressing the influence of external environmental noise in order to determine the wind velocity and direction from the microphone outputs. We then present an application that generates realistic motions of a virtual tree swaying in real wind. Although the current sensor produces significant jumps in the measured sequence of directions, the interactive animations demonstrate that it is usable for such applications, provided the jumps can be reduced to some degree.
-
We have proposed a measurement system for easily measuring the whole shape of an object. The proposed system consists of a camera and a cylinder whose inside is coated with a mirror layer. A target object is placed inside the cylinder and an image is captured by the camera from directly above. The captured image includes sets of points observed from multiple viewpoints: some are observed directly, others via the mirror. Therefore, the whole shape of the object can be measured using stereo vision in a single shot. This paper shows that a prototype of the proposed system was implemented and an actual object was measured with it. A pattern-matching method based on the SSD (Sum of Squared Differences) and a method based on DP (Dynamic Programming) are employed to identify the sets of corresponding points in the warped captured images.
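SSD-based correspondence search of the kind used here can be sketched in one dimension (a simplified stand-in for 2-D image patches; candidate positions and data are toy values):

```python
import numpy as np

def ssd_match(patch, row, positions):
    """Slide `patch` over candidate `positions` in a 1-D signal and
    return the position with the minimum sum of squared differences,
    as used to pair directly-observed and mirror-reflected points."""
    patch = np.asarray(patch, dtype=float)
    row = np.asarray(row, dtype=float)
    ssds = [((row[p:p + len(patch)] - patch) ** 2).sum() for p in positions]
    return positions[int(np.argmin(ssds))]

signal = [0, 0, 5, 9, 5, 0, 0, 0]
best = ssd_match([5, 9, 5], signal, positions=[0, 1, 2, 3, 4])
```

DP matching (as also employed in the paper) additionally enforces ordering consistency along the whole scanline rather than matching each point independently.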
-
This paper describes a method for achieving a novel design within a class of 3D objects that would create a preferred impression on users. Physical parameters of the 3D objects that might strongly contribute to their visual impressions are sought through computational investigation of the impression ratings obtained for learning samples. "Car body" was selected as the class of 3D objects to be investigated. A morphable 3D model of car bodies that describes the variations in appearance using a smaller number of parameters was obtained. Based on each car body's rating for the impression of speediness obtained by paired comparison, the visual impression was transformed by manipulating the parameters defined in the morphable 3D model. The validity of the proposed method was confirmed by psychological experiments. A new scheme is also proposed to properly re-sample a novel object of a peculiar shape so that such an object could also be represented by the morphable 3D model.
-
In DCT coding, degradations called block artifacts and mosquito noise appear in reconstructed pictures. They should be reduced by post-processing after decoding, without excessive computation. However, estimation of mosquito noise is rarely attempted because of its difficulty. To estimate the mosquito noise level, we extract blocks in which mosquito noise is likely to occur. The mosquito noise level is calculated on selected sides of each block, using only the sides of high-probability blocks. A per-block value is then obtained by averaging, and the picture-level value is calculated by averaging the block values. The estimation method is evaluated using MPEG-4 decoded pictures, comparing the quantization scale used in coding with the estimated mosquito noise level. The results show that the proposed method identifies mosquito blocks and their absolute levels reasonably well. Furthermore, an adaptive filter is controlled by the estimated mosquito noise level; we confirm that the quality of the decoded picture is preserved while mosquito noise is reduced effectively in degraded pictures.
-
Improvement of standard encoders has recently saturated. However, new coding methods are not compatible with the conventional standards. To address this, a new coding concept that is semi-compatible with the standard may be considered. Meanwhile, cyclic intra-picture coding is used for random access and refresh, but I-pictures consume a large number of bits; enhancing I-picture efficiency while keeping the refresh capability is desirable. A further problem is the quality change at GOP boundaries caused by GOP independence. To address these issues, we propose a coding scheme that applies inter-frame processing at GOP boundaries, namely reduction of quantization error using motion-compensated inter-picture processing. In this report, we examine the efficiency improvement and the compatibility of the proposed method. The examination shows a gain of up to 1.2 dB in PSNR, while the degradation when decoding with a standard decoder is generally smaller than this gain. The refresh performance is also tested.
-
Recently, many picture formats and displays have appeared, and re-sizing (scaling) of pictures has become important. The quality of a re-sized picture depends on the re-sizing method, and several methods have been proposed to improve PSNR. However, subjective picture quality is more important; in particular, degradations caused by re-sizing, such as jaggies (aliasing) and ringing, should be reduced. To this end, we previously proposed a method using directional adaptive interpolation. To improve its performance, we here incorporate shape analysis. In the proposed method, directional adaptive processing is applied only to pure edges; in texture and flat areas, an 8-tap re-sampling filter is used. The results show reductions of jaggies and of incorrectly interpolated pixels. The subjective picture quality of the proposed method is significantly better than that of 8-tap re-sampling, which gives good PSNR.
-
In this paper, we propose a new authentication method using video taken while moving a hand-held camera in front of the face. The proposed method extracts individuality from the obtained image sequences using the parametric eigenspace scheme. Changes of facial appearance during authentication trials trace continuous tracks in the low-dimensional eigenspace. The similarity between these continuous tracks is calculated by DP matching to verify identities. Experimental results confirmed that different motions and different persons change the shapes of the continuous tracks, so the proposed method can identify the person.
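The track-matching step can be sketched with a minimal dynamic-programming (DTW-style) alignment between two trajectories in a low-dimensional eigenspace (the Euclidean local cost and toy 2-D tracks are our assumptions, not the paper's exact cost):

```python
import numpy as np

def dp_distance(track_a, track_b):
    """Accumulate the cheapest monotonic alignment cost between two
    point sequences; a low score means similar appearance tracks."""
    a = np.atleast_2d(np.asarray(track_a, dtype=float))
    b = np.atleast_2d(np.asarray(track_b, dtype=float))
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

same = dp_distance([[0, 0], [1, 1]], [[0, 0], [1, 1]])
```

Thresholding this distance against enrolled tracks would accept or reject an authentication trial.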
-
Recently, Renku (Haikai no Renga) has been growing in popularity in Japan, as has Haiku. It was established by Basho Matsuo, the most famous Haiku poet. It is said that Kyoshi Takahama proposed the name "Renku" in 1904 to distinguish it from "Renga" and "Haiku". Renku meetings, like Haiku meetings, are now held regularly in many places, and several universities run continuing class exercises in Renku. It is very important for several persons to work together cooperatively. Poetry, Tanka, Haiku and Renku are usually composed only of letters; pictures are sometimes added to make them more attractive and to aim at synergy through the collaboration of letters and pictures. However, few studies have produced 3DCG animations of Renku. We therefore studied the production of 3DCG animation works based on the rules of Renku, and their evaluation.
-
A computer-generated hologram (CGH) is made for three-dimensional image reconstruction of a virtual object that is difficult to irradiate directly with laser light. One adverse factor when a CGH is made is the quantization of the wavefront computed by the program. Since the amplitude component is not considered in a Kinoform, processing is needed to reduce noise and false images, and several investigations have reported improvements of the image reconstructed from a Kinoform. Methods for calculating the most suitable complex amplitude distribution include iterative algorithms, simulated annealing, and genetic algorithms. The error diffusion method separates, from the reconstructed object, the noise that originates in the quantization error, so it is an efficient way to obtain a high-quality image without much processing.
-
Accurate rendering of a virtual scene in real time has been an important issue for virtual reality (VR) technology. Specular reflection of light, long studied, is always seen on metallic objects and occasionally causes very strong brightness (highlights). Due to the restrictions of displays and projectors in the number of brightness gradations (usually 256), maximum brightness, and contrast ratio, highlights are rendered relatively weakly. In addition, specular reflection is influenced by binocular parallax and motion parallax, because the light reflects in a specific direction. Therefore, in this paper, an emphasized highlight model for metallic objects on the CAVE system is proposed. By slightly decreasing brightness in the area surrounding a highlight, the proposed method increases the contrast ratio between the highlighted area and its neighborhood. Furthermore, using features of the CAVE, the proposed method also represents glint (blinking): when a metallic object moves, the method alternately presents images with and without the highlight to the two eyes. Since this difference between the two eyes' images influences binocular parallax and motion parallax, the user perceives the glint more realistically.
-
In this paper, we propose "InFra-VC", an interactive fractal viewer for virtual reality systems. Fractals are visually striking and are expected to model some natural phenomena effectively, so their visualization helps us study their beauty as well as their structure. InFra-VC can display a cutting plane of a fractal figure so that the user can see and explore its internal structure; by moving the position of the cutting surface, we can easily understand that structure. Additionally, InFra-VC can store the current situation as a VRML-format file at any time, which enables the structure to be viewed with a VRML viewer on a common PC.
-
Fractal dimension has been used for texture analysis because it correlates highly with human perception of surface roughness, and it has been applied to quantifying the structures of a wide range of objects in biology and medicine. On the other hand, evaluation of the state of human skin is based solely on the subjective assessment of clinicians, which may vary from moment to moment and from rater to rater. We therefore attempt an analysis of skin texture images using fractal dimension and discuss its application to evaluating the state of human skin. This can help in extracting human features and in detecting many skin diseases. This paper presents a method for calculating the fractal dimension of skin using camera lens magnification: we take multiple pictures of the skin at different lens magnifications, treating magnification as the scaling factor of a fractal set, and count the number of objects (cells) in each picture as the number of self-similar pieces of the set.
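The underlying self-similarity estimate is the same as standard box counting, which can be sketched as follows (box counting over a binary mask is a stand-in for the paper's magnification-based counting):

```python
import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Count occupied boxes at several scales and fit the slope of
    log(count) versus log(1/size); the slope estimates the fractal
    dimension of the binary texture mask."""
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for s in sizes:
        h, w = mask.shape[0] // s, mask.shape[1] // s
        boxed = mask[:h * s, :w * s].reshape(h, s, w, s).any(axis=(1, 3))
        counts.append(boxed.sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

filled = np.ones((16, 16), dtype=bool)  # a filled square should give D ≈ 2
dim = box_counting_dimension(filled)
```

In the paper's setting, lens magnification plays the role of the box size and cell counts play the role of the box counts.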
-
In this paper, we propose a rectification method that converts ray-space data obtained by a controlled camera array into ideal data. Here, ideal data means data in which the epipolar lines between vertically and horizontally adjacent cameras are exactly vertical and horizontal. In practice, however, it is difficult to arrange the cameras strictly, because they are positioned by hand. The conventional approach is camera calibration, but it leaves residual errors in the output images, which is a critical problem when generating arbitrary viewpoint images. We instead focus on the ideal trajectories of feature points and minimize the error directly by making the real trajectories parallel to them. We demonstrated the usefulness of the proposed technique, successfully reducing the error to less than 0.5 pixels.
-
There is strong demand for wearable PC systems that can support the user outdoors. Outdoors, our movement makes it impossible to use traditional input devices such as keyboards and mice. We propose a hand gesture interface based on image processing for operating wearable PCs. A semi-transparent PC screen is displayed on the head-mounted display (HMD), and the user makes hand gestures to select icons on the screen. The user's hand is extracted from images captured by a color camera mounted above the HMD. Since skin color can vary widely with outdoor lighting, a key problem is accurately discriminating the hand from the background. The proposed method does not assume any fixed skin-color space. First, the image is divided into blocks, and blocks with similar average color are linked. Contiguous regions are then subjected to hand recognition, and blocks on the edges of the hand region are subdivided for more accurate finger discrimination. A change in hand shape is recognized as hand movement; our current input interface associates a hand grasp with a mouse click. Tests on a prototype system confirm that the proposed method recognizes hand gestures accurately at high speed. We intend to develop a wider range of recognizable gestures.
-
In a wireless ad-hoc network, knowing the available bandwidth of the time-varying channel is imperative for live video streaming applications, because the available bandwidth varies constantly and is strictly limited relative to the large data size of a video stream. Adapting the encoding rate to a bit-rate suitable for the network reduces loss and delay, since an excessive encoding rate induces congestion loss and playback delay. While some effective rate-control methods, such as VTP (Video Transport Protocol) [1], have been proposed and simulated well, implementing them in cooperation with the encoder and tuning their parameters remain challenging. In this paper, we report the results of an implementation experiment of a VTP-based encoding rate control method and introduce some of our parameter-tuning techniques for a video streaming application in a wireless environment.
-
Takano, Kunihiko;Kabutoya, Yuta;Noguchi, Mikihiro;Hochido, Syunsuke;Lan, Tian;Sato, Koki;Muto, Kenji 673
In this paper, a process for transmitting a sequence of holograms describing 3D moving objects over a wireless network is presented. The sequence of holograms is transformed into a bit stream and then transmitted over wireless LAN and Bluetooth. It is shown that with this technique, holographic data of 3D moving objects is transmitted in high quality and a relatively good reconstruction of the holographic images is achieved.
-
Much research [1] has been conducted on embedding digital watermarks in brightness. A prerequisite for a digital watermark is that the image quality not degrade even as the volume of embedded information increases. Generally, noise in complex image regions is perceived less readily than noise in flat regions. We therefore present a watermarking method that embeds into complex areas by priority. The proposed method achieves higher perceived image quality than methods that do not take block complexity into consideration, although its PSNR is lower than that of a method not based on block complexity.
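Complexity-first block selection of this kind can be sketched as below (local variance is a simple stand-in for the paper's complexity measure; block size and data are toy values):

```python
import numpy as np

def blocks_by_complexity(image, block=4):
    """Split the image into blocks, score each by local variance, and
    return block coordinates ordered most-complex first, so embedding
    distortion lands where it is least perceptible."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape[0] // block, img.shape[1] // block
    scores = {}
    for by in range(h):
        for bx in range(w):
            tile = img[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            scores[(by, bx)] = tile.var()
    return sorted(scores, key=scores.get, reverse=True)

rng = np.random.default_rng(0)
img = np.zeros((8, 8))
img[:4, :4] = rng.integers(0, 255, (4, 4))  # one noisy (complex) block
order = blocks_by_complexity(img)
```

Embedding bits into the head of this ordering trades a small PSNR loss in busy regions for better perceived quality, matching the trade-off reported above.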
-
In this paper, we propose a technique to restore a color image from a binary image. A color image is composed of three element images: red, green, and blue. Therefore, the color image is first divided into its red, green, and blue elements, and inverse halftoning [2]-[4] is applied to each element image; finally, the element images are displayed together. A Kalman filter was applied to the inverse halftoning to improve the restoration accuracy. As a result, color images could be restored as well as monochrome images previously were. Moreover, the restoration accuracy improved further when the technique was combined with the existing Kalman-filter-based inverse halftoning.
-
We developed a facial image generating technique that can manipulate facial impressions. The present study applied this impression transfer method to higher-order impressions such as "elegance" and "attractiveness" and confirmed the psychological validity of the method using the semantic differential method. Subsequently, we applied the method in two types of cognitive experiments. First, we examined the contributions of texture and shape to facial impressions using face images whose impressions had been quantitatively manipulated with the method. Second, we used such stimuli to examine the effect of facial impressions and attractiveness on the "mere exposure effect." We conclude that the impression transfer vector method is an effective tool for quantitatively manipulating facial impressions in various cognitive studies.
-
CONSIDERATION OF THE RELATION BETWEEN DISTANCE AND CHANGE OF PANEL COLOR BASED ON AERIAL PERSPECTIVE
Three-dimensional (3D) shape recognition and distance recognition methods using monocular camera systems are needed in the fields of virtual reality, computer graphics, measurement technology, and robotics. There have been many studies of 3D shape and distance recognition based on geometric and optical information, and it is now possible to accurately measure the geometric information of an object at short range. However, these methods cannot currently be applied to long-range objects. In virtual reality, all visual objects must be presented at widely varying ranges, even though some objects will be hazed over. To achieve distance recognition from a landscape image, we focused on the use of aerial perspective as a type of depth cue and investigated the relationship between distance and color perception. The applicability of our proposed method was demonstrated by experimental results.
-
This paper proposes gender identification using shoeprint images. Even if the shoeprint images used for identification leaked out, it would be difficult to identify an individual from them, because the proposed method identifies gender without images of faces, clothing, or hair styles; it can therefore be used safely in public places. In addition, the sensor mat we developed is inexpensive, using mechanical switches arranged in a matrix pattern rather than pressure sensors. We captured shoeprint images with this sensor mat and measured feature parameters from them: the length, width, and area of the shoeprint. Using these feature parameters, we identified gender. To verify the gender identification rate of the proposed method, we set up the sensor mat at building entrances and took shoeprint images of 100 men and 100 women. As a result, we achieved a gender identification rate of about 86 percent.
-
The present paper describes the application of an improved impression transfer vector method (Sakurai et al., 2007) to transform the three basic dimensions (Evaluation, Activity, and Potency) of higher-order impression. First, a set of shapes and surface textures of faces was represented by multi-dimensional vectors. Second, the variation among faces was coded in reduced parameters derived by applying principal component analysis. Third, a facial attribute along a given impression dimension was analyzed to select discriminative parameters from among principal components with higher sensitivity to impressions, and obtain an impression transfer vector. Finally, the parametric coordinates were changed by adding or subtracting the impression transfer vector and the image was manipulated so that its facial appearance clearly exhibits the transformed impression. A psychological rating experiment confirmed that the impression transfer vector modulated three dimensions of higher-order impression. We discussed the versatility of the impression transfer vector method.
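The core transformation (project into PCA coordinates, shift along a learned impression direction, reconstruct) can be sketched as follows; all data here are toy values with a trivial basis, not the paper's face model:

```python
import numpy as np

def transfer_impression(face_vec, mean, components, transfer_vec, strength):
    """Move a face vector along an impression transfer vector in PCA
    space. `components` rows are principal axes (assumed orthonormal)."""
    coords = components @ (face_vec - mean)      # project to PCA space
    coords = coords + strength * transfer_vec    # shift along impression axis
    return mean + components.T @ coords          # reconstruct in face space

mean = np.zeros(3)
components = np.eye(3)                 # trivial PCA basis for illustration
face = np.array([1.0, 0.0, 0.0])
shifted = transfer_impression(face, mean, components,
                              np.array([0.0, 2.0, 0.0]), strength=1.0)
```

Varying `strength` (positive or negative) would strengthen or weaken the target impression dimension, as in the paper's rating experiments.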
-
Recently, much research applying data acquired from devices such as cameras and RFIDs to context-aware services has been performed in the fields of Life-Log and sensor networks. A variety of analytical techniques have been proposed to recognize information in the raw data, because video and audio data include a larger volume of information than other sensor data. However, because these techniques generally use supervised learning, manually re-watching a huge amount of media data has been necessary to create supervised data whenever a class is updated or a new class is added; as a result, applications could mostly use only recognition functions based on fixed supervised data. We therefore propose a method of acquiring supervised data from a video sharing site where users comment on any video scene, since such sites are remarkably popular and generate many comments. In the first step of this method, words with high utility value are extracted by filtering the comments about a video. Second, sets of time-series feature data are calculated by applying various feature extraction functions to the media data. Finally, our learning system calculates the correlation coefficient between these two kinds of data and stores it in the system's database. Various applications can then obtain a recognition function that generates collective intelligence from Web comments by applying this correlation coefficient to new media data. In addition, flexible recognition that adapts to new objects becomes possible by regularly acquiring and learning both media data and comments from the video sharing site, while reducing manual work. As a result, recognition of not only the name of the seen object but also indirect information, e.g., the impression of, or action toward, the object, was enabled.
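The final learning step (correlating a word's occurrence over time with a media feature on the same timeline) might look like this, with the Pearson coefficient as the correlation measure and toy timeline data standing in for real comments and features:

```python
import numpy as np

def word_feature_correlation(word_series, feature_series):
    """Correlate a word's per-interval occurrence (from time-stamped
    viewer comments) with a media feature computed on the same timeline."""
    return float(np.corrcoef(word_series, feature_series)[0, 1])

# Toy timeline: the word co-occurs with high motion energy (assumed data).
word = [0, 0, 1, 1, 0, 1]
motion = [0.1, 0.2, 0.9, 0.8, 0.1, 0.95]
r = word_feature_correlation(word, motion)
```

Coefficients stored per (word, feature) pair could then score new media data against the crowd-sourced vocabulary, as the abstract describes.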
-
3D reconstruction of a human face from an image sequence remains an important problem in computer vision. We propose a method, based on a factorization algorithm, that reconstructs a 3D face model from short image sequences exhibiting rotational motion. Factorization algorithms can recover structure and motion simultaneously from one image sequence, but they usually require that all feature points be well tracked. Under rotational motion, however, feature tracking often fails due to occlusion and features moving out of frame. Additionally, the paucity of images may make feature tracking more difficult or decrease reconstruction accuracy. The proposed 3D reconstruction approach can handle short image sequences exhibiting rotational motion in which feature points are likely to be missing. Our implementation employs image sequence division and a feature tracking method using Active Appearance Models to avoid tracking failure. Experiments conducted on an image sequence of a human face demonstrate the effectiveness of the proposed method.
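The factorization core referred to above can be sketched in the standard Tomasi-Kanade form (the metric upgrade and the paper's sequence-division/AAM tracking are omitted; the data below are synthetic):

```python
import numpy as np

def factorize(measurements):
    """Split a 2F x P measurement matrix into motion and shape by a
    rank-3 SVD after registering each row to its centroid."""
    W = np.asarray(measurements, dtype=float)
    W = W - W.mean(axis=1, keepdims=True)      # register to centroids
    u, s, vt = np.linalg.svd(W, full_matrices=False)
    motion = u[:, :3] * np.sqrt(s[:3])
    shape = np.sqrt(s[:3])[:, None] * vt[:3]
    return motion, shape

# Synthetic rank-3 measurements: random motion times random 3-D shape.
rng = np.random.default_rng(1)
M = rng.normal(size=(6, 3))
S = rng.normal(size=(3, 8))
W = M @ S
motion, shape = factorize(W)
```

When feature points are missing, as under rotational motion, this full-matrix SVD is no longer directly applicable, which is what motivates the paper's sequence division.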
-
The purpose of this paper is to develop a robot that moves independently, communicates with a human, and explicitly extracts information from the human mind that is rarely expressed verbally. In a spoken dialog system for information collection, it is desirable to continue communicating with the user as long as possible, but not if the user does not wish to communicate. Therefore, the system should be able to terminate the communication before the user starts to object to using it. In this paper, to enable the construction of a decision model for a system to decide when to stop communicating with a human, we acquired speech and motion data from individuals who were asked many questions by another person. We then analyze their speech and body motion when they do not mind answering the questions, and also when they wish the questioning to cease. From the results, we can identify differences in speech power, length of pauses, speech rate, and body motion.
-
This paper describes a non-causal interpolative prediction method for B-picture encoding. Interpolative prediction uses correlations between neighboring pixels, including non-causal pixels, for high prediction performance, in contrast to conventional prediction, which uses only causal pixels. For interpolative prediction, an optimal quantizing scheme has been investigated to prevent coding error power from expanding in the decoding process. In this paper, we extend the optimal quantization scheme to inter-frame prediction in video coding. Unlike the H.264 scheme, our method uses non-causal frames adjacent to the predicted frame.
-
In this paper, we propose a method to generate a free viewpoint image using multi-viewpoint images taken by cameras arranged in a circle. We previously proposed a free viewpoint image generation method based on the Ray-Space method; however, that method cannot generate a walk-through view from a virtual viewpoint located among the objects. The method proposed in this paper realizes the generation of such views. Our method first obtains the positions of objects using the shape-from-silhouette method, and then selects the appropriate cameras that acquired the rays needed for generating a virtual image. A free viewpoint image can be generated by collecting the rays that pass through the focal point of the virtual camera; when a requested ray is not available, it must be interpolated from neighboring rays. We therefore estimate the depth of the objects from the virtual camera and interpolate the ray information to generate the image. In experiments with virtual sequences captured at every 6 degrees, we set the virtual camera at a position of the user's choice and successfully generated the image from that viewpoint.
-
In this paper we focus on the EPI (Epipolar-Plane Image), the horizontal cross section of Ray-Space, and we propose a novel method that selects desired objects and edits scenes using multi-view images. On an EPI acquired by a camera array uniformly distributed along a line, every object is represented as a straight line, and the slope of each line is determined by the distance between the object and the camera plane. Detecting a straight line of a specific slope and removing it therefore means that an object at a specific depth has been detected and removed. We propose a scheme in which a layer of a specific slope competes with the other layers, instead of extracting layers sequentially from front to back. This enables effective removal of obstacles, object manipulation, and the creation of a clearer 3D scene containing only what we want to see.
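The slope-to-depth relation above can be sketched numerically. Assuming a pinhole model with focal length f and camera baseline B (both illustrative values, not from the paper), a scene point at depth Z traces a line on the EPI whose per-camera displacement is the disparity d = f·B/Z:

```python
# Sketch (assumed pinhole parameters): on an EPI from a uniform linear
# camera array, a point at depth Z appears at
#   x_k = x_0 - k * f * B / Z     (k = camera index, B = baseline)
# so fitting the line's slope recovers the disparity d = f * B / Z.

def fit_disparity(xs):
    """Least-squares slope of image position vs. camera index (sign flipped)."""
    n = len(xs)
    ks = range(n)
    mean_k = sum(ks) / n
    mean_x = sum(xs) / n
    num = sum((k - mean_k) * (x - mean_x) for k, x in zip(ks, xs))
    den = sum((k - mean_k) ** 2 for k in ks)
    return -num / den  # x decreases as the camera moves right

def depth_from_slope(d, f=500.0, baseline=0.1):
    return f * baseline / d

# Synthetic track of a point at Z = 2 m seen from 8 cameras.
Z_true, f, B = 2.0, 500.0, 0.1
track = [100.0 - k * f * B / Z_true for k in range(8)]
d = fit_disparity(track)
assert abs(depth_from_slope(d, f, B) - Z_true) < 1e-9
```

Removing all EPI points lying on a line of a given slope then corresponds to removing the object at the corresponding depth.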
-
This paper addresses the factorization method for estimating the projective structure of a scene from feature-point correspondences over images with occlusions. We propose both column-space and row-space approaches to estimate the depth parameter using subspace constraints. The projective depth parameters are estimated by maximizing the projection onto the subspace based either on the Joint Projection Matrix (JPM) or on the Joint Structure Matrix (JSM). We perform the maximization over significant observations and employ Tardif's Camera Basis Constraints (CBC) method for the matrix factorization, so that the missing-data problem can be overcome. The depth estimation and the matrix factorization alternate until convergence is reached. Experiments on both real and synthetic image sequences confirm the effectiveness of the proposed method.
-
This paper proposes a wireless LAN antenna system that tracks an object automatically using image-based tracking. The proposed system consists of a camera and a pan-tilt unit in addition to a directional wireless LAN antenna. The camera and the directional antenna are mounted on the pan-tilt unit facing the same direction. A target object carrying a wireless LAN receiver is tracked using images captured by the camera, and the pan-tilt unit keeps the directional antenna facing the same direction as the camera. The directional antenna therefore keeps pointing at the receiver, and transmission efficiency is improved. In a fundamental experiment, a receiver attached to a flying airship was tracked by a prototype of the proposed antenna system: the airship flew about while the antenna system was set on the roof of a building. The experimental result indicates the effectiveness of the proposed system compared to a conventional directional LAN antenna.
-
In this paper we focus on Personal Space (PS) as a nonverbal communication concept for building a new form of Human Computer Interaction. Analyzing people's positions with respect to their PS gives an idea of the nature of their relationship. We propose to analyze and model the PS using Computer Vision (CV) and to visualize it using Computer Graphics. For this purpose, we define the PS based on four parameters: the distance between people, their face orientations, age, and gender. We automatically estimate the first two parameters from image sequences using CV technology, while the other two parameters are set manually. Finally, we calculate the two-dimensional relationship of multiple persons and visualize it as 3D contours in real time. Our method can sense and visualize invisible and unconscious PS distributions and convey the spatial relationship of users through an intuitive visual representation. The results of this paper can be applied to Human Computer Interaction in public spaces.
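One plausible way to encode the four parameters is an anisotropic 2-D field that extends further in the direction a person faces, scaled by an age/gender factor. The radii and scale factor below are illustrative assumptions, not the paper's calibrated model:

```python
import math

# A hedged sketch of a personal-space (PS) field: an anisotropic 2-D
# Gaussian elongated toward the facing direction. front_radius,
# back_radius and the "scale" (age/gender) factor are assumed values.

def ps_value(px, py, person, front_radius=1.2, back_radius=0.6):
    """PS intensity in (0, 1] at point (px, py) for one person."""
    dx, dy = px - person["x"], py - person["y"]
    # Rotate the offset into the person's facing frame.
    c, s = math.cos(-person["theta"]), math.sin(-person["theta"])
    fx, fy = c * dx - s * dy, s * dx + c * dy
    r = front_radius if fx >= 0 else back_radius  # longer reach in front
    scale = person.get("scale", 1.0)              # age/gender factor
    return math.exp(-(fx / (r * scale)) ** 2 - (fy / (0.8 * scale)) ** 2)

def combined_ps(px, py, people):
    """Combined field of several persons (max of the individual fields)."""
    return max(ps_value(px, py, p) for p in people)

p = {"x": 0.0, "y": 0.0, "theta": 0.0}
# The field reaches further in front of a person than behind.
assert ps_value(1.0, 0.0, p) > ps_value(-1.0, 0.0, p)
```

Evaluating `combined_ps` on a grid would yield the kind of contour map the paper visualizes in 3D.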
-
This paper deals with Cell-based distributed processing for generating free viewpoint images by merging multiple differently focused images. We previously proposed a method for generating free viewpoint images without any depth estimation. However, real-time image reconstruction is not easy to realize with that method. In this paper, we discuss how to reduce the processing time through dimension reduction of the image filtering and Cell-based distributed processing. In particular, high-speed image reconstruction on the Cell processor of the SONY PLAYSTATION3 (PS3) is described in detail. We show experimental results on real images and discuss the possibility of real-time free viewpoint image reconstruction.
-
This paper proposes an HDR display system that uses multiple projectors to present HDR contents to multiple users. An HDR image is decomposed by luminance, and the resulting images are assigned to several projectors. The proposed system projects the HDR contents onto a large screen, so it can display them to multiple users at once. A broad luminance range is achieved by superimposing the light of the multiple projectors on the same screen. In addition, the number of tonal steps increases, because the other projectors cover the tonal steps in the region where the dynamic range is expanded. In this paper, we demonstrate the effectiveness of the proposed system.
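The luminance decomposition can be sketched as a greedy split: each projector contributes up to its maximum output, and the superimposed layers reconstruct the target luminance. The per-projector maximum used here is an assumed figure, not a measured one:

```python
# Sketch of the luminance decomposition behind a multi-projector HDR
# display: a target luminance is split into per-projector layers, each
# clipped to that projector's maximum output (proj_max is assumed),
# so the superimposed light reconstructs the target.

def split_luminance(target, n_projectors, proj_max=100.0):
    """Greedy split: each projector contributes up to proj_max."""
    layers = []
    remaining = target
    for _ in range(n_projectors):
        contrib = min(remaining, proj_max)
        layers.append(contrib)
        remaining -= contrib
    return layers

layers = split_luminance(250.0, 3)
assert sum(layers) == 250.0             # superimposed light matches target
assert all(l <= 100.0 for l in layers)  # no projector exceeds its range
```

A real system would apply such a split per pixel and additionally redistribute tonal steps among projectors, as the abstract describes.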
-
"Watoji" is a Japanese traditional book binding technique. An aim of this research develops a support system which everyone can make Watoji easily by. This system uses a working situation by recognizing a position of a needle and annotating to a book directly by mixed reality technique. Additionally, a technique of recognizing a working situation at a book is sewn by a needle is proposed. The proposed system of recognizing a position of a needle is build, then we experiment recognizing of a needle from an image. Furthermore, setting up Watoji on the system, we experiment recognizing of a position of a needle. An experimental result shows recognizing a needle from an acquired image from a camera. Using this result, a working situation can be recognized. Then, suitable information to a working situation can be presented.
-
Motion capture systems can measure the precise positions of markers on the human body in real time. The captured motion data, i.e. the marker positions, must be fitted to a human skeleton model to represent the motion of the human. Typical human skeleton models approximate the joints as ball joints. However, because such a model cannot represent the human skeleton precisely, errors arise between the motion data and the movements of the simplified skeleton model. In this paper we propose a method for measuring the translational components of the wrist and elbow joints of the upper limb using an optical motion capture system. We then study the errors between the ball joint model and the acquired motion data. In addition, we discuss the problem of estimating the motion of human joints using an optical motion capture system.
-
This paper proposes a method for measuring the three-dimensional trajectories of bubbles generated around a swimmer's arms from stereo high-speed camera videos. The method is based on two techniques: two-dimensional trajectory estimation in single-camera images and trajectory pair matching in stereo-camera images. The two-dimensional trajectory is estimated by block matching using the similarity of bubble shapes and the probability of bubble displacement. The trajectory matching is achieved by a consistency test using the epipolar constraint over multiple frames. In two-dimensional trajectory estimation, the experiments showed an estimation accuracy of 47% with general optical flow estimation alone, versus 71% when bubble displacement was taken into consideration, which indicates that bubble displacement is an effective cue for this estimation. In three-dimensional trajectory estimation, bubbles were observed moving along the flow generated by an arm, suggesting that such trajectories are useful material for helping swimmers swim faster.
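The combination of shape similarity and a displacement prior can be sketched as a single matching score. The prior width and expected drift below are illustrative assumptions, and the 1-D blocks stand in for 2-D image patches:

```python
# A minimal sketch of the combined matching score: candidate
# displacements are ranked by block similarity (SSD) plus a penalty
# for deviating from the expected bubble motion. sigma is assumed.

def ssd(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_score(block, candidate, disp, expected_disp, sigma=2.0):
    """Lower is better: SSD penalised by deviation from expected motion."""
    prior = (disp - expected_disp) ** 2 / (2 * sigma ** 2)
    return ssd(block, candidate) + prior

# 1-D toy: two equally similar candidates; the displacement prior
# (expected upward drift of +3) breaks the tie, as pure optical flow
# could not.
template = [0, 5, 9, 5, 0]
candidates = {-3: [0, 5, 9, 5, 0], 3: [0, 5, 9, 5, 0]}
best = min(candidates, key=lambda d: match_score(template, candidates[d], d, 3))
assert best == 3
```

This tie-breaking behavior mirrors the reported accuracy gain from taking bubble displacement into account.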
-
In recent years, the field of voice synthesis has developed rapidly, and technologies such as reading e-mail aloud or the voice guidance of car navigation systems are used in many scenes of our lives. However, the synthesized voice is monotonous, like a newsreader's. A text such as a novel is preferably read by a voice that expresses emotions richly. Therefore, we have been developing a system that automatically reads aloud novels in which emotions are expressed comparatively clearly, such as juvenile literature. First, it is necessary to identify the emotion expressed in each sentence in order to make a computer read texts with an emotionally expressive voice. A method based on semantic interpretation using artificial intelligence technology is conceivable, but it is very difficult with current technology. We therefore propose a simpler method that determines a single emotion for each sentence in a novel. The method assigns an emotion to a sentence according to the emotion carried by particular words: the verb in a Japanese verb sentence, or the adjective or adverb in an adjective sentence. The emotional characteristics of these words are prepared in advance as an emotional-word dictionary. Seven emotions are used: "joy," "sorrow," "anger," "surprise," "terror," "aversion," and "neutral."
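The dictionary lookup can be sketched in a few lines. The tiny English word list below is purely illustrative (the authors' resource is a Japanese emotional-word dictionary), and sentences with no dictionary hit fall back to "neutral":

```python
# A minimal sketch of per-sentence emotion assignment by dictionary
# lookup. The word list is illustrative, not the authors' dictionary.

EMOTION_WORDS = {
    "laugh": "joy", "weep": "sorrow", "shout": "anger",
    "gasp": "surprise", "tremble": "terror", "shun": "aversion",
}

def sentence_emotion(words):
    """Return the emotion of the first dictionary word, else 'neutral'."""
    for w in words:
        if w in EMOTION_WORDS:
            return EMOTION_WORDS[w]
    return "neutral"

assert sentence_emotion(["she", "began", "to", "weep"]) == "sorrow"
assert sentence_emotion(["the", "door", "opened"]) == "neutral"
```

A fuller implementation would first identify the sentence type (verb sentence vs. adjective sentence) and look up only the governing word, as the abstract describes.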
-
In this research, we propose a method to extract scallop areas from seabed images in order to construct a system that can automatically measure the number, size, and state of fishery resources, especially scallops, by analyzing seabed images. Our algorithm is based on the hue, the characteristic pattern, and the shape of scallop shells. The effectiveness of the proposed method is illustrated through an experiment.
-
We propose a novel method of procedurally generating climbing plants using L-systems. The goal of this research is to generate geometry for 3D modelers, where procedurally generated content is used as a base for the final design. The algorithm is fast and efficiently simulates external tropisms such as gravitropism and heliotropism, as well as pseudo-tropisms. The structure of the generated climbing plants is discretized into strings of particles expressed using L-systems. The tips of the plant extend the branches by adding particles along their paths, forming internodes. A climbing heuristic has been developed that uses the environment as leverage when the plant is climbing and effectively covers the objects on which it grows. A fast method that sprouts leaves on the surface on which the plant is growing has also been developed, along with a heuristic that simulates the decrease in branch length, radius, and leaf size.
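The core L-system mechanism is standard parallel string rewriting. The branching rule below is a classic textbook production, not the authors' production set for climbing plants:

```python
# A standard L-system rewriting sketch (illustrative rules, not the
# authors' production set): each iteration rewrites every symbol in
# parallel; "F" grows a branch, "[" / "]" push and pop turtle state.

def rewrite(axiom, rules, iterations):
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(c, c) for c in s)
    return s

# A classic branching production: F -> F[+F]F[-F]F
rules = {"F": "F[+F]F[-F]F"}
out = rewrite("F", rules, 2)
assert out.count("[") == out.count("]")  # brackets stay balanced
assert len(out) > len("F[+F]F[-F]F")     # the structure keeps growing
```

In the paper's setting, each "F" would correspond to adding a particle at the branch tip, and the tropism and climbing heuristics would bias the turtle's direction at interpretation time.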
-
A projection-type display requires a screen in order to project images clearly with a wide viewing angle. We have been developing a step-in type display system using a smoke screen. However, the image on the smoke screen flickered because of gravity and air flow. We therefore investigated how to reduce the flicker, and found that it can be reduced while the viewing angle becomes larger. In this paper we report a large-viewing-angle step-in display system using a screen made of smoke with a very small particle size and flow-controlled nozzles. We first determined the most suitable particles for the screen, then the shape of the screen, and then constructed an array of flow-controlled smoke screens. From the experimental results, we obtained a considerably high-contrast, flicker-free image with a viewing angle of more than $60^{\circ}$ using this new smoke screen with flow-controlled nozzles, demonstrating the effectiveness of the method.
-
We introduce an implementation of plug-ins for PLUTO that discriminate inflammatory nodules from other types of nodules in chest X-ray CT images. PLUTO is a common platform for computer-aided diagnosis systems on the Microsoft Windows series, to which new functions can easily be added as plug-ins. We coded two plug-ins. One of them calculates features based on medical knowledge. The other calculates parameters for classifying the type of nodule, and also classifies nodules into inflammatory nodules and others using an SVM. These plug-ins are coded using the MIST library, which is produced at Nagoya University, Japan. In our previous study, the MIST library was parallelized, so that a number of CPUs can be utilized for feature calculation and SVM learning/classification depending on the amount of computation. Using these plug-ins, it became easy to extract features for discriminating inflammatory nodules from other types of nodules, and to change the parameters for feature extraction and SVM learning/classification through a GUI. The accuracy of the classification result is 100% on 78 solid nodules, which contain 43 inflammatory nodules and 35 nodules of other types.
-
We have been studying the next generation of video creation solutions based on TVML (TV program Making Language) technology. TVML is a well-known scripting language for computer animation; a TVML Player interprets the script to create video content using real-time 3DCG and synthesized voices. TVML has a long history, having been proposed by NHK back in 1996; however, for years the only available player has been the one made by NHK. We have developed a new TVML player from scratch and named it the T2V (Text-To-Vision) Player. Because it was developed from scratch, the code is compact, light, fast, extendable, and portable. Moreover, the new T2V Player not only plays back TVML scripts but also performs Text-To-Vision conversion from input written in XML format, or even plain text, to video, using 'text filters' that can be added to the Player as plug-ins. We plan to release it as freeware in early 2009 in order to stimulate User-Generated Content and various kinds of services running on the Internet and in the media industry. We believe our T2V Player will be a key technology for this upcoming movement.
-
The bobsleigh is a winter sport in which a sled slides down an ice-covered course. A training environment that allows year-round training is in great demand, since at present training is very limited by the season and course facilities. A variety of VR (Virtual Reality) equipment has been developed in recent years and is beginning to spread, and we have made our own contribution in bobsleigh simulation. However, the reactive force applied in our bobsleigh simulation is much smaller than that of a real bobsleigh. This paper proposes a method to enhance the reactive force of the bobsleigh simulation in real time. The reactive force is magnified instantaneously in the physically based simulation by applying a Laplacian filter, a technique often used in image processing, to the sequence of reactive forces. The simulator comprises four large-scale surround screens and a 6-D.O.F. (Degree Of Freedom) motion system. We also conducted an experiment with several motion patterns to evaluate the effectiveness of the enhancement. The experimental results proved it useful in some cases.
-
Robovie-R2 [1], developed by ATR, is a 110 cm tall, 60 kg, two-wheel-drive, human-like robot. It has two arms with dynamic fingers, as well as a position sensitive detector sensor and two cameras as eyes on its head for recognizing the surrounding environment. In recent years, we have carried out a project to integrate new functions into Robovie-R2 so that it can be used in the dining room of a healthcare center to help serve meals to the elderly. As a new function, we have developed a software system for adaptive movement control of Robovie-R2, which is of primary importance, since a robot that cannot autonomously control its movement would be a danger to the people in the dining room. We used the cameras on Robovie-R2's head to capture images of the environment and applied our original algorithm for recognizing obstacles such as furniture or people, in order to control Robovie-R2's movement. In this paper, we focus on our algorithm and its results.
-
This paper discusses a method for searching for special markers attached to persons in a surveillance video stream. The marker is a small plate with infrared LEDs, called a spatiotemporal marker because it shows a 2-D sequential pattern synchronized with the video frames. The search is based on motion vectors of the same kind used in video compression. Experiments using prototype markers show that the proposed method is practical. Although the method can be applied to a video stream independently, the total computation cost can be decreased if the motion vector analysis of video compression and that of the proposed method are unified.
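The temporal side of marker detection can be sketched as matching a per-frame on/off sequence, observed at a candidate image location, against the marker's known frame-synchronized code. The 8-bit code below is an assumption for illustration:

```python
# A hedged sketch of spatiotemporal-marker verification: a candidate
# location's per-frame LED on/off sequence is accepted when it matches
# the marker's known code at some phase. MARKER_CODE is assumed.

MARKER_CODE = [1, 0, 1, 1, 0, 0, 1, 0]

def matches_marker(observed, code=MARKER_CODE):
    """Accept if the observed sequence equals the code at any phase."""
    n = len(code)
    for shift in range(n):
        rotated = code[shift:] + code[:shift]
        if observed[:n] == rotated:
            return True
    return False

assert matches_marker([1, 1, 0, 0, 1, 0, 1, 0])      # code shifted by 2
assert not matches_marker([1, 1, 1, 1, 1, 1, 1, 1])  # steady light: rejected
```

In the paper's pipeline, candidate locations would first be found and tracked via the motion vectors before such a temporal check is applied.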
-
In landscape simulation, it is necessary to generate very realistic images by computer graphics. One solution is texture mapping; however, obtaining images for texture mapping takes a great deal of work and time, since there is a huge variety of images for buildings, roads, stations, and so on, and the appearance of a landscape varies with the weather and the time of day. In particular, weathered images, such as stains on walls or cracks in roads, are needed to make a landscape image truly realistic. These weathered images do not have to be exact, so much of the work and time spent obtaining texture images can be saved if a variety of weathered images can be generated automatically. Therefore, this paper describes how to generate a variety of weathered images automatically by changing the weathered shape of the original image.
-
A method to visualize the human body in various poses is proposed. The method offers three 3D styles of the same body: first, a body wearing clothes specified by dress patterns; second, the body shape; and last, the bone structure of the body. For this purpose, standard body data constructed from CT images are prepared, and individual bodies are measured with a 3D body scanner. The present status of our research is limited to still images, though we are working to accommodate various poses.
-
In this paper, we examine the fundamental performance of image coding schemes based on the multipulse model. First, we introduce several kinds of pulse search methods for the model (i.e., the correlation method, the pulse overlap search method, and the pulse amplitude optimization method). These pulse search methods are derived from the auto-correlation function of the impulse responses and the cross-correlation function between the host signals and the impulse responses. Next, we explain the basic procedure of the multipulse image coding scheme, which uses the above pulse search methods to encode the high-frequency component of an original image. Finally, by means of computer simulation on several test images, we examine the PSNR (Peak Signal-to-Noise Ratio) and computational complexity of these methods.
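A greedy correlation-based pulse search, a standard multipulse procedure that the paper's correlation method resembles (simplified here to 1-D), places pulses one at a time where the cross-correlation between the residual and the impulse response is largest:

```python
# Sketch of a greedy correlation-based multipulse search (a standard
# procedure, simplified to 1-D; not the paper's exact variants).

def convolve_at(h, pos, amp, n):
    """Contribution of a single pulse of amplitude amp at position pos."""
    out = [0.0] * n
    for i, hv in enumerate(h):
        if pos + i < n:
            out[pos + i] += amp * hv
    return out

def multipulse_search(target, h, n_pulses):
    n = len(target)
    residual = list(target)
    e_h = sum(v * v for v in h)  # impulse-response energy
    pulses = []
    for _ in range(n_pulses):
        # Cross-correlation of the residual with h at every position.
        best_pos, best_c = 0, 0.0
        for p in range(n):
            c = sum(residual[p + i] * h[i]
                    for i in range(len(h)) if p + i < n)
            if abs(c) > abs(best_c):
                best_pos, best_c = p, c
        amp = best_c / e_h  # optimal amplitude for this position
        pulses.append((best_pos, amp))
        contrib = convolve_at(h, best_pos, amp, n)
        residual = [r - c for r, c in zip(residual, contrib)]
    return pulses, residual

# The target is literally two scaled impulse responses, so two pulses
# should reduce the residual to (near) zero.
h = [1.0, 0.5, 0.25]
target = [0.0] * 12
for pos, amp in [(2, 3.0), (7, -2.0)]:
    for i, c in enumerate(convolve_at(h, pos, amp, 12)):
        target[i] += c
pulses, residual = multipulse_search(target, h, 2)
assert sum(r * r for r in residual) < 1e-9
```

The paper's overlap-search and amplitude-optimization methods refine this greedy baseline by accounting for interaction between overlapping pulses.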
-
The Deblocking Filter (DF) newly introduced in H.264/AVC removes blocky artifacts. It improves the picture quality, and the improved picture is stored in the frame buffer for motion compensation; as a result, DF achieves higher coding efficiency. However, if the original image has heavily slanted patterns, DF removes edges that should be kept, because it is applied only perpendicularly to the block boundaries. In this paper, we propose the Edge Adaptive Deblocking Filter (EADF), which is applied not only perpendicularly but also along several slanted directions to deal with this problem. Simulation results show that EADF is especially effective for the sequence "Foreman", with a PSNR gain of 0.04 dB.
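One plausible ingredient of such a filter is direction selection: among several candidate directions across a block boundary, pick the one with the least intensity variation, so slanted edges are smoothed along (not across) their orientation. The three candidate offsets below are illustrative, not the paper's filter definition:

```python
# A hedged sketch of direction selection for an edge-adaptive
# deblocking filter (illustrative, not the paper's EADF): pair pixels
# across a vertical block boundary with a row offset d and pick the
# offset with the least total variation.

def direction_cost(img, col, d):
    """Variation across the boundary between col-1 and col, pairing
    each left pixel with a right pixel offset by d rows."""
    h = len(img)
    return sum(abs(img[r][col - 1] - img[(r + d) % h][col]) for r in range(h))

def best_direction(img, col, candidates=(-1, 0, 1)):
    return min(candidates, key=lambda d: direction_cost(img, col, d))

# A 45-degree edge: each row's intensity step is shifted by one column,
# so the diagonal pairing (d = 1) varies least across the boundary.
img = [[10, 10, 90, 90],
       [10, 10, 10, 90],
       [90, 10, 10, 10],
       [90, 90, 10, 10]]
assert best_direction(img, 2) == 1
```

Filtering would then be applied along the selected direction, which is how a slanted edge avoids being blurred away.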
-
Augmented Reality (AR) is a useful technology for various industrial systems. This paper proposes a new playground system that uses markerless AR technology. We developed a virtual playground system with which people can learn physics and kinematics through physical play. The virtual playground is a space in which real scenes and CG are mixed. The CG objects obey the physics of the real world, which is realized by a physics engine; the information from the cameras must therefore be analyzed so that the CG reflects the real world. Various game options are possible in the virtual playground by combining real-world images and physics simulation. Because the CG behaves according to the physics simulation, users can learn physics and kinematics from the system, and we believe it can take its place in the field of education through entertainment.
-
This paper proposes a novel lossless coding scheme for Bayer color filter array (CFA) images, which are generally used as the internal data of color digital cameras having a single image sensor. The scheme employs a block-adaptive prediction method to exploit the spatial and spectral correlations in local areas containing different color signals. To allow adaptive prediction suitable for the respective color signals, four kinds of linear predictors corresponding to the 2${\times}$2 samples of the Bayer CFA are switched simultaneously block by block. Experimental results show that the proposed scheme outperforms other state-of-the-art lossless coding schemes in terms of coding efficiency for Bayer CFA images.
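The block-adaptive switching idea can be sketched in 1-D: because neighboring CFA pixels hold different colors, a pixel is better predicted from same-color samples two positions away, and per block the predictor with the least residual energy is kept. The candidate predictors below are simplified placeholders, not the paper's four linear predictors:

```python
# A hedged sketch of block-adaptive prediction on Bayer CFA data
# (simplified 1-D candidates, not the paper's predictor set).

def predict_left2(row, i):
    return row[i - 2]                      # nearest same-color sample

def predict_avg2(row, i):
    return (row[i - 2] + row[i - 4]) / 2   # two same-color samples

def predict_prev(row, i):
    return row[i - 1]                      # adjacent pixel (other color)

def residual_energy(row, start, end, p):
    return sum((row[i] - p(row, i)) ** 2 for i in range(start, end))

def best_predictor(row, start, end, predictors):
    """Per-block switching: keep the lowest-residual-energy predictor."""
    return min(predictors, key=lambda p: residual_energy(row, start, end, p))

# Interleaved G/R row of a smooth red gradient: same-color (stride-2)
# prediction clearly beats naive previous-pixel prediction.
row = [100, 10, 100, 12, 100, 14, 100, 16, 100, 18, 100, 20]
best = best_predictor(row, 4, len(row), [predict_left2, predict_avg2, predict_prev])
assert best is predict_left2
```

In the real scheme the four predictors are tied to the four positions of the 2x2 Bayer pattern and switched together per block, with the chosen predictor index signaled to the decoder.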