• Title/Summary/Keyword: Text detection

Search Result 402, Processing Time 0.025 seconds

Design and Implementation of Automated Detection System of Personal Identification Information for Surgical Video De-Identification (수술 동영상의 비식별화를 위한 개인식별정보 자동 검출 시스템 설계 및 구현)

  • Cho, Youngtak;Ahn, Kiok
    • Convergence Security Journal
    • /
    • v.19 no.5
    • /
    • pp.75-84
    • /
    • 2019
  • Recently, the value of video as an important data of medical information technology is increasing due to the feature of rich clinical information. On the other hand, video is also required to be de-identified as a medical image, but the existing methods are mainly specialized in the stereotyped data and still images, which makes it difficult to apply the existing methods to the video data. In this paper, we propose an automated system to index candidate elements of personal identification information on a frame basis to solve this problem. The proposed system performs indexing process using text and person detection after preprocessing by scene segmentation and color knowledge based method. The generated index information is provided as metadata according to the purpose of use. In order to verify the effectiveness of the proposed system, the indexing speed was measured using prototype implementation and real surgical video. As a result, the work speed was more than twice as fast as the playing time of the input video, and it was confirmed that the decision making was possible through the case of the production of surgical education contents.

An Extension of Data Flow Analysis for Detecting Polymorphic Script Virus (다형성 스크립트 바이러스 탐지를 위한 자료 흐름 분석기법의 확장)

  • Kim, Chol-Min;Lee, Hyoung-Jun;Lee, Seong-Uck;Hong, Man-Pyo
    • The KIPS Transactions:PartC
    • /
    • v.10C no.7
    • /
    • pp.843-850
    • /
    • 2003
  • Script viruses are easy to make a variation because they can be built easily and be spread in text format. Thus signature-based method has a limitation in detecting script viruses. In a consequence, many researches suggest simple heuristic methods, but high false-positive error is always being an obstacle. In order to overcome this problem, our previous study concentrated on analyzing data flow of codes and has low-false positive error, but still could not detect a polymorphic virus because polymorphic virus loads self body and changes it before make a descendent. We suggest a heuristic detection method which expands the detection range of previous method to include polymorphic script viruses. Expanded data flow analysis heuristic has an expanded grammar to detect Polymorphic copy Propagation. Finally, we will show the experimental result for the effectiveness of suggested method.

Monitoring System for the Elderly Living Alone Using the RaspberryPi Sensor (라즈베리파이 센서를 활용한 독거노인 모니터링 시스템)

  • Lee, Sung-Hoon;Lee, June-Yeop;Kim, Jung-Sook
    • Journal of Digital Contents Society
    • /
    • v.18 no.8
    • /
    • pp.1661-1669
    • /
    • 2017
  • In 2017, Korea has reached 1.3 million elderly people living alone. The government is promoting the basic care service for the elderly by using care workers to check the security of the elderly living alone. However, due to lack of service personnel and service usage rate of elderly care workers, it is difficult to manage. To improve these environmental constraints, this study attempted to construct a monitoring system for elderly people living alone by using sensors such as temperature, humidity, motion detection, and gas leak detection. The sensor periodically collects the current status data of the elderly and sends them to the server, creates a real time graph based on the data, and monitors it through the web. In the monitoring process, when the sensor is out of the range of the specified value, it sends a warning text message to the guardian to inform the current situation, and is designed and implemented so as to support the safety life of the elderly living alone.

Seal Detection in Scanned Documents (스캔된 문서에서의 도장 검출)

  • Yu, Kyeonah;Kim, Kyung-Hye
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.12
    • /
    • pp.65-73
    • /
    • 2013
  • As the advent of the digital age, documents are often scanned to be archived or to be transmitted over the network. The largest proportion of documents is texts and the next is seal images indicating the author of the documents. While a lot of research has been conducted to recognize texts in scanned documents and commercialized text recognizing products are developed as highlighted the importance of the scanned document, information about seal images is discarded. In this paper, we study how to extract the seal image area from the color or black and white document containing the seal image and how to save the seal image. We propose a preprocessing step to remove other components except for the candidate outlines of the seal imprint from scanned documents and a method to select the final region of interest from these candidates by using the feature of seal images. Also in case of a seal imprint overlapped with texts, the most similar image among those stored in the database is selected through the template matching process. We verify the implemented system for a various type of documents produced in schools and analyze the results.

Development of Hand-drawn Clothing Matching System Based on Neural Network Learning (신경망 모델을 이용한 손그림 의류 매칭 시스템 개발)

  • Lim, Ho-Kyun;Moon, Mi-Kyeong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.16 no.6
    • /
    • pp.1231-1238
    • /
    • 2021
  • Recently, large online shopping malls are providing image search services as well as text or category searches. However, in the case of an image search service, there is a problem in that the search service cannot be used in the absence of an image. This paper describes the development of a system that allows users to find the clothes they want through hand-drawn images of the style of clothes when they search for clothes in an online clothing shopping mall. The hand-drawing data drawn by the user increases the accuracy of matching through neural network learning, and enables matching of clothes using various object detection algorithms. This is expected to increase customer satisfaction with online shopping by allowing users to quickly search for clothing they are looking for.

A Study on Fraud Detection in the C2C Used Trade Market Using Doc2vec

  • Lim, Do Hyun;Ahn, Hyunchul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.173-182
    • /
    • 2022
  • In this paper, we propose a machine learning model that can prevent fraudulent transactions in advance and interpret them using the XAI approach. For the experiment, we collected a real data set of 12,258 mobile phone sales posts from Joonggonara, a major domestic online C2C resale trading platform. Characteristics of the text corresponding to the post body were extracted using Doc2vec, dimensionality was reduced through PCA, and various derived variables were created based on previous research. To mitigate the data imbalance problem in the preprocessing stage, a complex sampling method that combines oversampling and undersampling was applied. Then, various machine learning models were built to detect fraudulent postings. As a result of the analysis, LightGBM showed the best performance compared to other machine learning models. And as a result of SHAP, if the price is unreasonably low compared to the market price and if there is no indication of the transaction area, there was a high probability that it was a fraudulent post. Also, high price, no safe transaction, the more the courier transaction, and the higher the ratio of 0 in the price also led to fraud.

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.131-150
    • /
    • 2015
  • Omission of noun phrases for obligatory cases is a common phenomenon in sentences of Korean and Japanese, which is not observed in English. When an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is more easily omitted in Encyclopedia texts. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are major source for information extraction by intelligent application systems such as information retrieval and question answering systems. However, omission of noun phrases makes the quality of information extraction poor. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem that our system deals with is almost similar to zero anaphora resolution which is one of the important problems in natural language processing. A noun phrase existing in the text that can be used for restoration is called an antecedent. An antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in case of zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage is in charge of detecting the zero anaphor. In the second stage, antecedent search is carried out by considering the candidates. If antecedent search fails, an attempt made, in the third stage, to use the title as the antecedent. The main characteristic of our system is to make use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of zero anaphor comprise the search space. The main technique used in the methods proposed in previous research works is to perform binary classification for all the noun phrases in the search space. The noun phrase classified to be an antecedent with highest confidence is selected as the antecedent. However, we propose in this paper that antecedent search is viewed as the problem of assigning the antecedent indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed in antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we suggest to use a structural SVM which receives a sequence of noun phrases as input and returns the sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm which exploits a subgradient descent methodology used for optimization problems. To train and test our system we selected a set of Wikipedia texts and constructed the annotated corpus in which gold-standard answers are provided such as zero anaphors and their possible antecedents. Training examples are prepared using the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer and subject or object cases omitted are identified. Thus performance of our system is dependent on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor. This is based on binary classification using the regular SVM. The experiment showed that our system's performance is F1 = 68.58%. This means that state-of-the-art system can be developed with our technique. It is expected that future work that enables the system to utilize semantic information can lead to a significant performance improvement.

A Study of the Improvement Method of I-pin Mass Illegal Issue Accident (아이핀 대량 부정발급 사고에 대한 개선방법 연구)

  • Lee, Younggyo;Ahn, Jeonghee
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.11 no.2
    • /
    • pp.11-22
    • /
    • 2015
  • The almost of Web page has been gathered the personal information(Korean resident registration number, name, cell-phone number, home telephone number, E-mail address, home address, etc.) using the membership and log-in. The all most user of Web page are concerned for gathering of the personal information. I-pin is the alternative means of resident registration number and has been used during the last ten-year period in the internet. The accident of I-pin mass illegal issue was happened by hacker at February, 2015. In this paper, we analysis the problems of I-pin system about I-pin mass illegal issue accident and propose a improvement method of it. First, I-pin issue must be processed by the off-line of face certification in spite of user's inconvenience. Second, I-pin use must be made up through second certification of password or OTP. The third, the notification of I-pin use must be sent to the user by the text messaging service of cell-phone or the E-mail. The forth, I-pin must be used an alternative means of Korean resident registration number in Internet. The methods can reduce the problems of I-pin system.

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services
    • /
    • v.13 no.2
    • /
    • pp.99-108
    • /
    • 2012
  • Many methods had been developed to improve the accuracy of extracting information from a vast amount of data. This paper combined a number of natural language processing methods such as NER (named entity recognition), sentence extraction, and part of speech tagging to carry out text analysis. The data source is comprised of texts obtained from the web using a domain-specific data extraction agent. A framework for the extraction of information from unstructured data was developed using the aforementioned natural language processing methods. We simulated the performance of our work in the extraction and analysis of texts for the detection of organizational structures. Simulation shows that our study outperformed other NER classifiers such as MUC and CoNLL on information extraction.

Continuous Issue Event Analysis in Social Media (소셜미디어에 나타난 연속성 이슈 이벤트 분석)

  • Oh, Hyo-Jung;Kim, Hyunki;Yun, Bo-Hyun
    • The Journal of Korean Association of Computer Education
    • /
    • v.17 no.2
    • /
    • pp.31-38
    • /
    • 2014
  • This paper reveals continuity of related events which are occurred and changing from moment to moment accident/events collected from various social media channels. Among them, we especially define the events which have big social influence as "issue event" and investigate the type and characteristics of continuous issue event for each domain. We also introduce a automatic issue detection system in social media text. Based on the extracted issue event results in a particular domain, we analyse the continuity of those events by illustrating in time and place-axis. Furthermore, we identify the relationship between social media in terms of issue events propagation.

  • PDF