• Title/Summary/Keyword: normalization text (정규화 텍스트)


Pet Behavior Detection through Sensor Data Synthesis (센서 데이터 합성을 통한 반려동물 행동 감지)

  • Kim, Hyungju;Park, Chan;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.606-608 / 2022
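  • Research on behavior detection using sensor data has built on prior work in human activity recognition, and studies that raise recognition accuracy through preprocessing, interpolation, and augmentation are actively under way. This paper proposes pet behavior detection based on time-series sensor data augmentation. A small device equipped with an ODROID single-board computer and a 6-axis sensor (accelerometer and gyroscope) collects the data, which is stored in a web server DB over Bluetooth. The stored data goes through a preprocessing stage in which outliers and missing values are handled and the data is normalized and arranged into sequences. GAN-based time-series data augmentation is then performed; the augmentation converts input text into sensor data. Behaviors are detected with the trained deep learning model, and model performance is verified against evaluation metrics.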
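
The preprocessing stage described in this abstract (outlier handling, missing-value handling, normalization, sequence construction) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the z-score outlier rule, linear interpolation, min-max scaling, the window and step sizes, and the function names `clean_channel` and `make_sequences` are all assumptions.

```python
from statistics import mean, pstdev

def clean_channel(values, z_clip=3.0):
    """Clean one sensor channel; None marks a missing reading (assumed convention)."""
    present = [v for v in values if v is not None]
    mu, sd = mean(present), pstdev(present)
    # Outlier handling (assumed rule): readings beyond z_clip standard
    # deviations from the mean are treated as missing.
    vals = [None if v is not None and sd > 0 and abs(v - mu) > z_clip * sd else v
            for v in values]
    # Missing-value handling (assumed rule): linear interpolation between the
    # nearest known neighbours; gaps at the edges copy the nearest known value.
    known = [i for i, v in enumerate(vals) if v is not None]
    filled = []
    for i, v in enumerate(vals):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                v = vals[right]
            elif right is None:
                v = vals[left]
            else:
                w = (i - left) / (right - left)
                v = vals[left] + w * (vals[right] - vals[left])
        filled.append(v)
    # Normalization (assumed rule): min-max scaling to [0, 1].
    lo, hi = min(filled), max(filled)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in filled]

def make_sequences(channels, window=4, step=2):
    """Slice cleaned channels into overlapping fixed-length sequences."""
    rows = list(zip(*channels))  # one tuple of channel values per time step
    return [rows[s:s + window] for s in range(0, len(rows) - window + 1, step)]
```

For a 6-axis device, `clean_channel` would be applied to each of the six channels before calling `make_sequences` on the list of cleaned channels.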

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms that share a concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by using Wikipedia to replace input terms that are absent from the training document set with the most similar term that occurs in the training documents. For the selection of classification features, experiments were performed in various settings composed of three conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35 to 1.85% in $F_1$ in all experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement is not as high as expected, several semantic and structural devices of Wikipedia could be used to select more effective classification features.
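
The feature-selection idea in this abstract (merge synonyms into one feature, then replace a non-training term with its most similar training term when the similarity exceeds a threshold) can be sketched as below. The `similarity` callable stands in for the Wikipedia-based measures used in the study; it, the synonym table, the threshold value, and the name `map_terms` are hypothetical.

```python
def map_terms(doc_terms, training_vocab, similarity, synonyms=None, threshold=0.5):
    """Map input-document terms onto features seen in the training set.

    similarity(a, b) is a hypothetical term-term similarity function; the
    study derives such scores from Wikipedia, which is not reproduced here.
    """
    synonyms = synonyms or {}
    vocab = sorted(training_vocab)  # fixed order so ties resolve deterministically
    features = []
    for term in doc_terms:
        term = synonyms.get(term, term)  # merge synonyms into one canonical feature
        if term in training_vocab:
            features.append(term)
            continue
        # Non-training term: look for the most similar training term.
        best = max(vocab, key=lambda t: similarity(term, t), default=None)
        if best is not None and similarity(term, best) > threshold:
            features.append(best)  # replace only when above the threshold
        # Otherwise the term is dropped, since no feature represents it.
    return features
```

With a toy similarity table in which "automobile" scores 0.9 against "car", the input `["auto", "automobile", "road", "zebra"]` maps to `["car", "car", "road"]` when "auto" is listed as a synonym of "car" and "zebra" has no sufficiently similar training term.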

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal / v.12 no.11 / pp.185-197 / 2023
  • This study used text mining and network theory to extract information useful for occupancy applications and permit processing from the permit register, which has so far been used only to record occupancy permit information. Using text mining, we applied normalization steps such as stopword removal and morpheme analysis, and then analyzed and compared vocabulary frequency and topic models across five regions (Seoul/Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon). By applying four centrality measures widely used in network theory (degree, closeness, betweenness, and eigenvector), we identified keywords that occupy a central position or act as intermediaries in the network. A comprehensive analysis of vocabulary frequency, topic modeling, and network centrality showed that the keyword 'installation' was the most influential in all regions. This is believed to result from the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, and the like were central or acted as intermediaries in the topic models and networks. Most keywords followed a Zipf's law distribution, with low frequency of occurrence and a low share of the total.
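
As a rough illustration of the network step in this abstract, the sketch below builds a keyword co-occurrence graph and computes degree centrality, the simplest of the four centrality measures mentioned. The toy permit records and the function names are hypothetical, and the way the study actually builds its network is not stated in the abstract.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(records):
    """Link every pair of keywords that appear in the same permit record.

    A record with a single keyword adds no node, since it has no pair.
    """
    graph = defaultdict(set)
    for keywords in records:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}

# Hypothetical, already normalized keyword lists (one list per permit record).
records = [
    ["installation", "road", "bridge"],
    ["installation", "park"],
    ["installation", "cable", "road"],
]
scores = degree_centrality(cooccurrence_graph(records))
top_keyword = max(scores, key=scores.get)  # "installation" in this toy data
```

Closeness, betweenness, and eigenvector centrality need shortest-path or iterative computations and are usually taken from a graph library rather than written by hand.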

A Study on Identifying Personal Information on Conversational Text Data (대화형 텍스트 데이터 내 개인정보 식별에 대한 연구)

  • Cha, Do Hyun;Kown, Bo Keun;Youn, Hee Chang;Lee, Gu Hyup;Joo, Jong Wha J.
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.11-13 / 2022
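  • Following Korea's three data laws (데이터 3법), companies must de-identify data that contains personal information before they can use it. Identifying personal information in unstructured text with regular expressions is clearly limited by the diversity of the data, and the conventional Named Entity Recognition (NER) task cannot resolve ambiguous expressions or determine whose personal information appears in a two-person conversation. To overcome these limitations, we train a BERT language model with speaker information and propose labeling each eojeol (space-delimited word unit) with two tags, in an attempt to identify personal information accurately.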
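
To make the limitation of the regular-expression baseline concrete, here is a minimal sketch of rule-based detection. The two patterns (a Korean-style mobile number and an e-mail address) and the name `find_pii` are assumptions for illustration; they are not taken from the paper.

```python
import re

# Hypothetical patterns; real rule sets are much larger.
PATTERNS = {
    "PHONE": re.compile(r"\b01[016789]-?\d{3,4}-?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def find_pii(text):
    """Return (label, matched text) pairs found by the fixed patterns."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits

# A well-formed number and address are caught.
found = find_pii("Call me at 010-1234-5678 or mail kim@example.com")
# A spelled-out number slips through, and even a match says nothing about
# which speaker the information belongs to.
missed = find_pii("my number is zero one zero, one two three four")
```

The second call returns an empty list, which is the kind of gap that motivates the model-based approach with speaker information described in the abstract.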

A Method for the Detection of an Open/Closed Eye and a Pupil using Black and White Bipolarization (흑백 양극화를 이용한 눈의 개폐 및 눈동자 검출 방법)

  • Moon, Bong-Hee
    • Journal of the Korea Society of Computer and Information / v.14 no.12 / pp.89-96 / 2009
  • An image or a movie often carries more information than text, and extracting context from it is important. In this study, we propose a method to detect whether an eye is open or closed and to determine the location of the pupil in an eye image extracted from a movie. The image is normalized by bipolarizing it into black and white and aligning it horizontally, and the width and height of the eye are then measured. With this information, we can determine whether the eye is open or closed and where the pupil is located. In experiments on 52 eye images taken from movies, the method achieved 98% accuracy in open/closed eye detection and 95% accuracy in pupil location detection.
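
The measurement idea in this abstract can be sketched as follows: binarize a grayscale eye image, take the bounding box of the black pixels, and compare its height with its width. The threshold of 128, the height-to-width decision rule, and the cut-off of 0.3 are assumptions, since the abstract does not give the actual criteria, and the horizontal alignment step is omitted.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white)."""
    return [[0 if p < threshold else 255 for p in row] for row in gray]

def dark_region_size(binary):
    """Width and height of the bounding box around all black pixels."""
    rows = [r for r, row in enumerate(binary) if 0 in row]
    if not rows:
        return 0, 0
    cols = [c for c in range(len(binary[0])) if any(row[c] == 0 for row in binary)]
    return cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1

def is_open(gray, min_ratio=0.3, threshold=128):
    """Assumed rule: the eye is open when height / width reaches min_ratio."""
    width, height = dark_region_size(binarize(gray, threshold))
    return width > 0 and height / width >= min_ratio
```

A dark region 6 pixels wide and 4 pixels high gives a ratio of about 0.67 and is classified as open, while a region 6 pixels wide and 1 pixel high gives about 0.17 and is classified as closed.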

A Study on the Visual System of Object - Oriented Based on Abstract Information (객체지향을 기반으로한 추상화 정보의 시각화 시스템에 대한 연구)

  • Kim, Haeng-Kon;Han, Eun-Ju;Chung, Youn-Ki
    • The Transactions of the Korea Information Processing Society / v.4 no.10 / pp.2434-2444 / 1997
  • As the software industry progresses, the need for visual information has grown relative to text-oriented information. Automatic tools are therefore required to satisfy users' demand for visual design representations of various real-world source information. In this paper, we discuss the methodology and tools for parsing abstract information through semantic analysis and extracting visual information through visual mapping. Specifically, abstract information is represented as a relational structure and then mapped into a visual structure using regular rules, so that the user can obtain visual information. We propose VOLS (Visual Object Layout System) to transform abstract information into visual information. It can improve user understanding and assist in the maintenance of existing source code.


Object Detection and Optical Character Recognition for Mobile-based Air Writing (모바일 기반 Air Writing을 위한 객체 탐지 및 광학 문자 인식 방법)

  • Kim, Tae-Il;Ko, Young-Jin;Kim, Tae-Young
    • The Journal of Korean Institute of Next Generation Computing / v.15 no.5 / pp.53-63 / 2019
  • To provide a hand gesture interface through deep learning in mobile environments, research on lightweight networks is essential for achieving high recognition rates without degrading execution speed. This paper proposes a method for real-time recognition of characters written in the air with a finger on mobile devices, using a lightweight deep-learning model. Based on the SSD (Single Shot Detector), an object detection model that uses MobileNet as a feature extractor, it detects the index finger and generates a result text image by following the fingertip path. The image is then sent to the server, which recognizes the characters with a trained OCR model. To verify our method, 12 users tested 1,000 words on a GALAXY S10+; the average recognition accuracy was 88.6%, and the recognized text was displayed within 124 ms, which indicates that the method can be used in real time. The results of this research can be used to send simple text messages, memos, and air signatures using a finger in mobile environments.

A Study on the Construction of Specialized NER Dataset for Personal Information Detection (개인정보 탐지를 위한 특화 개체명 주석 데이터셋 구축 및 분류 실험)

  • Hyerin Kang;Li Fei;Yejee Kang;Seoyoon Park;Yeseul Cho;Hyeonmin Seong;Sungsoon Jang;Hansaem Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.185-191 / 2022
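  • As awareness of and concern for personal information grow, the task of detecting personal information in text is attracting attention. In this study, we devised seven named-entity tags specialized for personal information detection and de-identification, and built a specialized named-entity dataset by substituting synthetic data into de-identified source data and annotating the named entities. KR-ELECTRA was used for the personal information classification experiments. The results confirm that deep learning-based detection using the specialized named entities outperforms both general named entities and rule-based detection built on regular expressions.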


A Study on Image Indexing Method based on Content (내용에 기반한 이미지 인덱싱 방법에 관한 연구)

  • Yu, Won-Gyeong;Jeong, Eul-Yun
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.903-917 / 1995
  • In most database systems, images have been indexed indirectly using related text such as captions, annotations, and image attributes. However, there is an increasing need for image database systems that support storing and retrieving images directly by content, using the information contained in the images. There have been a few content-based indexing methods. Among them, Pertains proposed an image indexing method that considers the spatial relationships and properties of the objects forming the images. This method extends other studies based on the 2-D string, but it needs too much storage space and lacks flexibility. In this paper, we propose a more flexible index structure based on a kd-tree that uses paging techniques. We show an example of extracting keys from the raw image using normalization. Simulation results show that our method is more flexible and needs much less storage space.


A Comparative Analysis of Content-based Music Retrieval Systems (내용기반 음악검색 시스템의 비교 분석)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Information Management / v.30 no.3 / pp.23-48 / 2013
  • This study compared and analyzed 15 CBMR (Content-based Music Retrieval) systems accessible on the web in terms of DB size and type, query type, access point, input and output type, and search functions. It also reviewed the features of music information and the techniques used in CBMR systems for transforming or transcribing music sources, extracting and segmenting melodies, extracting and indexing music features, and matching. The analysis found three things: text information retrieval techniques (inverted indexing, N-gram indexing, Boolean search, truncation, keyword and phrase search, normalization, filtering, browsing, exact matching, similarity measures using edit distance, sorting, etc.) were applied to enhance CBMR; efforts were made to increase DB size and usability; and problems remained in extracting melodies, deleting stop notes in queries, and using solfege as pitch information.
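
One of the text-retrieval techniques listed in this abstract, similarity matching by edit distance, can be sketched for melodies written as solfege syllables. The toy melody database and the function names are hypothetical; the surveyed systems may use different encodings and weighted variants of the distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

def rank_melodies(query, database):
    """Return melody names ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: edit_distance(query, database[name]))

# Hypothetical melodies encoded as solfege syllables.
database = {
    "A": ["do", "re", "mi", "sol"],
    "B": ["sol", "sol", "la", "sol", "mi"],
    "C": ["do", "re", "mi", "fa"],
}
ranking = rank_melodies(["do", "re", "mi", "fa"], database)  # ["C", "A", "B"]
```

Because the distance tolerates insertions, deletions, and substitutions, a hummed query with a wrong or missing note can still rank the intended melody near the top.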