Integrated Search | Korea Science

Sentence design for speech recognition database

Zu Yiqing
- 대한음성학회:학술대회논문집 / 대한음성학회 1996년도 10월 학술대회지 / pp.472-472 / 1996
Material for a speech recognition database should cover as many phonetic phenomena as possible. At the same time, it should be phonetically compact, with low redundancy [1, 2]. The phonetic phenomena of continuous speech are the key problem in speech recognition. This paper describes the processing of a set of sentences collected from the 1993 and 1994 database of "People's Daily" (a Chinese newspaper), which covers news, politics, economics, arts, sports, etc. These sentences include both phonetic phenomena and sentence patterns. In continuous speech, phonemes always appear as allophones, which reflect co-articulatory effects. The design of a speech database should therefore address both intra-syllabic and inter-syllabic allophone structures. In our experiments, read speech contains 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures. Statistics on the "People's Daily" database give an evaluation of all the possible phonetic structures. In this sentence set, we first consider the phonetic balance among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. The syllabic balance covers intra-syllabic phenomena such as phonemes, initials/finals and consonants/vowels. The rest describe the inter-syllabic junctures. The 1560 sentences cover 96% of syllables without tones (the absent syllables are used only in spoken language), 100% of inter-syllabic diphones, and 67% of inter-syllabic triphones (87% of which appear in "People's Daily"). Roughly 17 kinds of sentence patterns appear in our sentence set. By taking the transitions between syllables into account, Chinese speech recognition systems have achieved significantly high recognition rates [3, 4]. The following figure shows the process of collecting sentences.
[People's Daily database] -> [segmentation of sentences] -> [segmentation of word groups] -> [translate the text into Pinyin] -> [compute statistics of phonetic phenomena & select useful paragraphs] -> [modify the selected sentences by hand] -> [phonetically compact sentence set]
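One common way to obtain a phonetically compact set is greedy coverage: repeatedly keep the sentence that adds the most phonetic units not yet covered. The sketch below is an illustration under assumptions, not the procedure of the paper. It treats each sentence as a list of toneless Pinyin syllables and, as a simplification, uses whole-syllable pairs as a stand-in for inter-syllabic units. The names `syllable_transitions` and `select_compact_set` are hypothetical.

```python
def syllable_transitions(syllables):
    """Return the set of adjacent syllable pairs in one sentence."""
    return {(a, b) for a, b in zip(syllables, syllables[1:])}


def select_compact_set(sentences):
    """Greedy coverage over syllables and syllable transitions.

    sentences: list of sentences, each a list of syllable strings.
    Returns (indices of chosen sentences, set of covered units).
    """
    # Units of a sentence: its syllables plus its syllable transitions.
    units = [set(s) | syllable_transitions(s) for s in sentences]
    covered, chosen = set(), []
    while len(chosen) < len(sentences):
        # Pick the sentence that contributes the most unseen units.
        best = max(range(len(sentences)), key=lambda i: len(units[i] - covered))
        gain = units[best] - covered
        if not gain:
            break  # nothing new can be added, so the set is as compact as greedy gets
        chosen.append(best)
        covered |= gain
    return chosen, covered


corpus = [["ni", "hao"], ["ni", "hao", "ma"], ["hao", "ma"]]
chosen, covered = select_compact_set(corpus)
print(chosen, len(covered))
```

In this toy corpus the second sentence alone covers every syllable and transition, so the other two are redundant. Greedy coverage does not guarantee a minimal set, and it does not replace the manual editing step in the pipeline above.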

도시농업 활동 유형화 연구 (Segmentation and Characteristic Analysis of Urban Farmers Behavior)

황정임;최윤지;장보경;이상영
- 한국지역사회생활과학회지 / Vol. 21, No. 4 / pp.619-631 / 2010
The purpose of this study is to segment and examine urban farmers' behavior by applying two-step cluster analysis and a multinomial logit model. To obtain representative data, the data were collected through a telephone survey using two-stage stratified random sampling in cities across the country. Respondents were asked to describe their awareness of urban agriculture, their agricultural activity, and their sociodemographic characteristics. Of 2,000 cases, the 381 cases (19.1%) that were participants in urban agriculture were analysed in SPSS. According to the findings, 27.3% of respondents had heard the term 'urban agriculture', and 25.5% of them regarded themselves as urban farmers. Two-step clustering based on motive, place, companion, area and hours yielded four clusters: 'Large scale hobby farming (cluster 1)', 'Weekend farm/hobby farming (cluster 2)', 'Land/Self-supporting farming (cluster 3)', and 'Small scale hobby farming (cluster 4)'. Multinomial logistic regression showed significant differences among these four segments in terms of age, city size and housing type. In other words, urbanites are quite likely to select different urban farming types according to their socio-demographic profiles. These profiles can therefore serve as a basis for policies promoting the various types of urban agriculture. Based on the results, policy directions for facilitating urban agriculture are presented.

음성인식 기반 응급상황관제 (Emergency dispatching based on automatic speech recognition)

이규환;정지오;신대진;정민화;강경희;장윤희;장경호
- 말소리와 음성과학 / Vol. 8, No. 2 / pp.31-39 / 2016
In emergency dispatching at the 119 Command & Dispatch Center, inconsistencies between the 'standard emergency aid system' and the 'dispatch protocol,' both of which are mandatory, make the dispatcher's work inefficient. If an emergency dispatch system uses automatic speech recognition (ASR) to process the dispatcher's protocol speech during case registration, it can instantly extract and provide the information required by the 'standard emergency aid system,' making the rescue command more efficient. For this purpose, we have developed a Korean large vocabulary continuous speech recognition system with a 400,000-word vocabulary for use in the emergency dispatch system. The 400,000 words include vocabulary from news, SNS, blogs and the emergency rescue domain. The acoustic model is built from 1,300 hours of telephone (8 kHz) speech, and the language model from a 13 GB text corpus. From the transcribed corpus of 6,600 real telephone calls, call logs with the emergency rescue command class and the identified major symptom are extracted in connection with the rescue activity log and the National Emergency Department Information System (NEDIS). ASR is applied to the emergency dispatcher's repetition utterances about the patient information. The emergency patient information is extracted based on the Levenshtein distance between the ASR result and the template information. Experimental results show a word error rate of 9.15% for speech recognition and 95.8% for emergency response detection performance.
https://doi.org/10.13064/KSSS.2016.8.2.031
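The abstract above relies on the Levenshtein distance twice: to match ASR output against template information, and implicitly in the word error rate. The following is a minimal sketch of both uses with the standard dynamic-programming recurrence. The function names and sample strings are hypothetical, and the paper's actual templates and matching rules are not given in the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def closest_template(asr_text, templates):
    """Return the template string with the smallest distance to the ASR text."""
    return min(templates, key=lambda t: levenshtein(asr_text, t))


def word_error_rate(reference_words, hypothesis_words):
    """WER = word-level edit distance divided by the reference length."""
    return levenshtein(reference_words, hypothesis_words) / len(reference_words)


print(closest_template("chest pian", ["chest pain", "headache"]))
print(word_error_rate("a b c d".split(), "a x c".split()))
```

Because `levenshtein` works on any sequence, the same routine serves character-level template matching and word-level error counting.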

SNS 이용자의 가치체계의 특징이 SNS 이용동기, 사회적 자본, 이용행위 등에 미치는 영향 분석 (Investigating the Effect of Value Characteristics of SNS Users on SNS Usage Motivation, Social Capital, and Usage Behavior)

조형오
- 디지털콘텐츠학회 논문지 / Vol. 19, No. 2 / pp.351-362 / 2018
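Drawing on Schwartz's theory of value systems (1992), this study analyzes how personal values affect the motivational characteristics and behavioral responses of SNS users. The survey identified five principal value dimensions: 'openness,' 'reciprocity,' 'self-enhancement,' 'norm conformity,' and 'security.' Each dimension had a distinct effect on SNS usage motivation, social capital, advertising response, and word-of-mouth intention. Clustering SNS users on these value dimensions yielded four value groups: 'experience seeking,' 'interdependent empathy,' 'self-enhancement,' and 'norm conformity.' The groups differed in their perceptions of SNS and in their behavioral characteristics, and they were also systematically related to SNS service type. The results show that Schwartz's value theory is a very useful framework for understanding the psychological mechanisms behind SNS use.
https://doi.org/10.9728/dcs.2018.19.2.351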

Synthetic data augmentation for pixel-wise steel fatigue crack identification using fully convolutional networks

Zhai, Guanghao;Narazaki, Yasutaka;Wang, Shuo;Shajihan, Shaik Althaf V.;Spencer, Billie F. Jr.
- Smart Structures and Systems / Vol. 29, No. 1 / pp.237-250 / 2022
Structural health monitoring (SHM) plays an important role in ensuring the safety and functionality of critical civil infrastructure. In recent years, numerous researchers have developed computer vision and machine learning techniques for SHM, offering the potential to make field inspections less laborious and more effective. However, high-quality vision data from various types of damaged structures is relatively difficult to obtain, because damaged structures are rare. The lack of data is particularly acute for fatigue cracks in steel bridge girders. As a result, the lack of training data is one of the main issues hindering wider application of these powerful techniques to SHM. To address this problem, this article proposes using synthetic data to augment the real-world datasets used for training neural networks that identify fatigue cracks in steel structures. First, random textures representing the surface of steel structures with fatigue cracks are created and mapped onto a 3D graphics model. Subsequently, this model is used to generate synthetic images for various lighting conditions and camera angles. A fully convolutional network is then trained for two cases: (1) using only real-world data, and (2) using both synthetic and real-world data. With synthetic data augmentation in the training process, the crack identification performance of the neural network on the test dataset improves from 35% to 40% for intersection over union (IoU) and from 49% to 62% for precision, demonstrating the efficacy of the proposed approach.
https://doi.org/10.12989/sss.2022.29.1.237
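The two metrics reported above can be stated concretely for a binary crack mask. The sketch below computes pixel-wise IoU and precision from a predicted mask and a ground-truth mask given as nested lists of 0/1 values. The function name is hypothetical, and returning 1.0 when a denominator is empty is an assumed convention, not something taken from the paper.

```python
def iou_and_precision(pred, truth):
    """Pixel-wise IoU and precision for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1   # crack pixel correctly predicted
            elif p:
                fp += 1   # predicted crack, actually background
            elif t:
                fn += 1   # missed crack pixel
    union = tp + fp + fn
    iou = tp / union if union else 1.0              # assumed convention for empty masks
    precision = tp / (tp + fp) if tp + fp else 1.0  # assumed convention for no predictions
    return iou, precision


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou_and_precision(pred, truth))
```

IoU penalizes both false positives and missed pixels, while precision ignores missed pixels, which is why the two figures in the abstract differ.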

한국어 음성의 스펙트럼 변화에 관한 연구 (A Study on the Spectrum Variation of Korean Speech)

이수길;송정영
- 인터넷정보학회논문지 / Vol. 6, No. 6 / pp.179-186 / 2005
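In phonetics, a spectrum can be extracted from the frequency characteristics of speech and used to analyze it. The spectrum of a simple vowel keeps a fairly constant shape, but it changes considerably when consonants and vowels combine, as in syllables and words. This is the biggest obstacle to phoneme-level speech recognition. This paper analyzes the spectrum of each consonant and vowel using the frequency domain, mel bands that account for auditory impression, and the mel cepstrum, and systematizes the changes in speech that reflect auditory characteristics, providing a basis for segmenting speech into phonemes.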
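For readers unfamiliar with mel bands, the sketch below shows one widely used Hz-to-mel mapping and how band edges spaced evenly on the mel scale become progressively wider in Hz. The abstract does not say which mel formula or band count the authors used, so the constants 2595 and 700, the function names, and the 0 to 8000 Hz range are all assumptions for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation: about 1000 mel at 1000 Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Edges (in Hz) of n_bands overlapping bands spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + k * step) for k in range(n_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, 10)
print([round(e) for e in edges])
```

A mel cepstrum is then typically obtained by taking the log energy in each band and applying a discrete cosine transform. That step is omitted here.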

동영상에서 모양 시퀀스를 이용한 동작 검색 방법 (Movement Search in Video Stream Using Shape Sequence)

최민석
- 한국멀티미디어학회논문지 / Vol. 12, No. 4 / pp.492-501 / 2009
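The movement of an object in a video can serve as important information for classifying and distinguishing scene content. This paper proposes a shape-based movement search method for efficiently finding object movements in video. An object's movement is represented as a sequence of two-dimensional shapes obtained by extracting the object region from each video frame, and each two-dimensional shape is converted into a one-dimensional shape feature value by a shape descriptor. Using the ordered sequence of shape descriptors, object movements can be searched in video the way words are searched in a document, without segmenting individual movements. Comparison experiments with the MPEG-7 shape variation descriptor show that the proposed method represents object movement more effectively and can be applied to movement search and analysis.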
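The analogy with searching for a word in a document can be illustrated with a sliding-window match over a sequence of per-frame feature values. This is a simplified sketch and not the paper's matching algorithm. It assumes one scalar feature per frame, a fixed per-frame tolerance `tol`, and no allowance for differences in movement speed. The name `find_movement` is hypothetical.

```python
def find_movement(video_seq, query_seq, tol):
    """Return start indices where query_seq matches video_seq within tol per frame."""
    n, m = len(video_seq), len(query_seq)
    if m == 0:
        return []  # an empty query matches nothing by convention here
    hits = []
    for start in range(n - m + 1):
        window = video_seq[start:start + m]
        # Accept the window only if every frame is close to the query frame.
        if all(abs(v - q) <= tol for v, q in zip(window, query_seq)):
            hits.append(start)
    return hits


video = [0.1, 0.5, 0.9, 0.5, 0.1, 0.5, 0.9]
query = [0.5, 0.9]
print(find_movement(video, query, tol=0.05))
```

No prior segmentation of the video into individual movements is needed, which mirrors the property claimed in the abstract. A practical system would probably need a more tolerant distance, for example one that permits time warping.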

연역적이고 국부적인 영문자의 폰트 분류법 (A Priori and the Local Font Classification)

정민철
- 한국산학기술학회논문지 / Vol. 3, No. 4 / pp.245-250 / 2002
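This study proposes an a priori, local font classification method for classifying fonts from English words. That is, the font of a word is classified before character recognition. The typographic features ascender, descender, and serif are used for classification. Ascenders, descenders, and serifs are extracted from the input word, gradient feature vectors are computed from them, and an artificial neural network uses these vectors to classify the font style, font group, and font name of the input word. The proposed method makes font information available to the character segmenter and the character recognizer. It further makes it possible to build an OCR system composed of mono-font character segmenters and mono-font character recognizers tailored to specific fonts.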

영상 대 영상 매칭을 이용한 한글 문서 영상에서의 단어 검색 (Keyword Spotting on Hangul Document Images Using Image-to-Image Matching)

박상철;손화정;김수형
- 정보처리학회논문지B / Vol. 12B, No. 3 / pp.357-364 / 2005
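This paper proposes a system that uses two-stage image matching to search for a user's query word quickly and accurately in Hangul document images. The system consists of character segmentation, query image generation, feature extraction, and image matching. The matching stage uses two feature vectors of different dimensionality. Eight pages of document images were downloaded from the website of 한국정보과학회 (Korean Institute of Information Scientists and Engineers), and 1,600 Hangul word images obtained from them were used as experimental data. The results show that the proposed system substantially outperforms previously proposed image-based Hangul word search systems.
https://doi.org/10.3745/KIPSTB.2005.12B.3.357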

소셜 뉴스를 위한 시간 종속적인 메타데이터 기반의 컨텍스트 공유 프레임워크 (Context Sharing Framework Based on Time Dependent Metadata for Social News Service)

가명현;오경진;홍명덕;조근식
- 지능정보연구 / Vol. 19, No. 4 / pp.39-53 / 2013
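The growth of the Internet and the emergence of SNS have greatly changed how information flows. With this change, social media has risen rapidly, and social TV and social news, which fuse social media with video content, are gaining importance. In this environment, users do not merely browse content; they share information and experiences about it with friends and acquaintances who are consuming the same content, and even create new content. Existing social news services, however, do not reflect these user characteristics. In particular, they consider only user participation, which makes services hard to differentiate and makes it hard to share context when users share information or experiences about news content. To address this, this paper proposes a framework that segments news by content and provides time-dependent metadata extracted from the segmented news. The framework segments the news script by content using a story segmentation method. It also provides time-dependent metadata: tags representing the whole news item, sub-tags representing each segment, and the position in the video at which each segment starts. Given time-dependent metadata, social news users can browse only the part of the news they want and leave opinions on that part. Passing the metadata along when forwarding news or sharing opinions gives direct access to the intended content. The performance of the framework depends on how well the extracted sub-tags represent the actual content of the news, and the sub-tags in turn depend on the accuracy of story segmentation and on the sub-tag extraction method. With this in mind, we applied a semantic-similarity-based story segmentation method to the framework, compared its performance experimentally with a benchmark algorithm, and analyzed the accuracy of the sub-tags by comparing those extracted from the segmented news with the actual news content. As a result, the story segmentation method that considers semantic similarity performed better, and the extracted sub-tags were words related to the context.
https://doi.org/10.13088/jiis.2013.19.4.039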
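Similarity-based story segmentation can be sketched as placing a boundary wherever adjacent sentences of the script stop resembling each other. The example below substitutes plain bag-of-words cosine similarity for the semantic similarity measure used in the paper, which the abstract does not specify. The function names and the `threshold` value are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_stories(sentences, threshold=0.1):
    """Group sentence indices into stories, splitting where similarity drops."""
    if not sentences:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]
    segments, current = [], [0]
    for i in range(1, len(sentences)):
        # A low similarity to the previous sentence starts a new story.
        if cosine(bags[i - 1], bags[i]) < threshold:
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments


script = [
    "the fire spread downtown",
    "firefighters contained the fire",
    "stocks rose sharply today",
    "stocks and bonds rose",
]
print(segment_stories(script))
```

If each sentence carries a timestamp from the video, the first index of each segment gives the start position that the framework exposes as time-dependent metadata.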
