• Title/Summary/Keyword: Text comparing


Text Region Detection using Edge and Regional Minima/Maxima Transformation from Natural Scene Images (에지 및 국부적 최소/최대 변환을 이용한 자연 이미지로부터 텍스트 영역 검출)

  • Park, Jong-Cheon;Lee, Keun-Wang
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.2 / pp.358-363 / 2009
  • Text region detection from natural scene images is used in a variety of applications, and much research is still needed in this field. Recent methods detect text regions using various algorithms that combine edge-based and connected-component-based approaches. This paper therefore proposes a text region detection method for natural scene images that uses edges and the regional minima/maxima transformation. The method detects the connected components of the edges and of the regional minima/maxima, and labels these connected components. The labeled regions are analyzed to detect text candidate regions, and the detected candidates are combined into a single text candidate image. The final text regions are then validated by comparing the similarity and adjacency of individual characters. Experimental results show that the proposed algorithm, by combining edge and regional minima/maxima connected-component detection, improves the correctness of text region detection.
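The connected-component labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the edge map (or the regional minima/maxima map) is already binarized into a list of lists of 0/1 values, and the `text_candidates` filter with its `min_area` and `max_aspect` thresholds is hypothetical.

```python
from collections import deque

def label_components(binary, connectivity=8):
    """Label foreground (1) pixels of a 2-D binary map; return (labels, count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    if connectivity == 8:
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not labels[r][c]:
                count += 1  # start a new component and flood-fill it breadth-first
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

def bounding_boxes(labels):
    """Return {label: (top, left, bottom, right)} for every labeled component."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                t, l, b, rt = boxes.get(lab, (r, c, r, c))
                boxes[lab] = (min(t, r), min(l, c), max(b, r), max(rt, c))
    return boxes

def text_candidates(boxes, min_area=2, max_aspect=5.0):
    """Hypothetical geometric filter: drop components that are too small or too elongated."""
    keep = []
    for lab, (t, l, b, r) in sorted(boxes.items()):
        h, w = b - t + 1, r - l + 1
        if h * w >= min_area and max(h, w) / min(h, w) <= max_aspect:
            keep.append(lab)
    return keep
```

Running the labeling separately on the edge map and on the minima/maxima map, then merging the surviving candidates, would mirror the combination step the abstract describes; the merge itself is not shown here.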

Is Text Mining on Trade Claim Studies Applicable? Focused on Chinese Cases of Arbitration and Litigation Applying the CISG

  • Yu, Cheon;Choi, DongOh;Hwang, Yun-Seop
    • Journal of Korea Trade / v.24 no.8 / pp.171-188 / 2020
  • Purpose - This exploratory study applies text mining techniques, which computationally extract words from large-scale text data, to legal documents in order to quantify the contents of trade claims and enable statistical analysis. Design/methodology - The study is designed to verify the validity of text mining as a quantitative methodology for trade claim studies, which have relied mainly on qualitative approaches. The subjects are 81 arbitration cases and court judgments from China, published on the UNCITRAL website, in which the CISG was applied. Validation is performed by comparing the manually analyzed result with the automatically analyzed result. The manual result is a cluster analysis in which the researcher reads and codes each case; the automatic result applies text mining techniques to the result of the cluster analysis. Topic modeling and semantic network analysis are used for the statistical approach. Findings - The results of the cluster analysis and of the text mining are consistent with each other, confirming internal validity. In addition, the degree centrality of words that play a key role in a topic is high, as are the betweenness centrality of words useful for grasping the topic and the eigenvector centrality of the topic's important words. This indicates that text mining techniques can be applied to content analysis of trade claims for statistical analysis. Originality/value - First, the validity of text mining in the study of trade claim cases is confirmed; prior studies on trade claims relied on traditional approaches. Second, the study is original in attempting a quantitative study of trade claim cases, whereas prior work studied them mainly through qualitative methods. Lastly, the study shows that text mining can lower the barrier to acquiring information from large amounts of digitized text.
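Degree centrality in a semantic network can be illustrated with a small word co-occurrence graph. This sketch assumes that two words are linked whenever they appear in the same document (a hypothetical co-occurrence window; the study's actual network construction is not stated) and uses the common normalization degree / (n - 1). Betweenness and eigenvector centrality are omitted; a library such as NetworkX provides them.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(documents):
    """Undirected word graph: an edge joins two distinct words that share a document."""
    graph = defaultdict(set)
    for doc in documents:
        words = sorted(set(doc))
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Normalized degree centrality: number of neighbours divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {w: 0.0 for w in graph}
    return {w: len(nbrs) / (n - 1) for w, nbrs in graph.items()}
```

With documents such as `[["seller", "buyer", "goods"], ["buyer", "payment"], ["seller", "buyer", "delivery"]]`, "buyer" is linked to every other word and gets centrality 1.0, while "payment" gets 0.25.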

New Text Steganography Technique Based on Part-of-Speech Tagging and Format-Preserving Encryption

  • Mohammed Abdul Majeed;Rossilawati Sulaiman;Zarina Shukur
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.170-191 / 2024
  • Steganography is the transmission of confidential data using cover media. Any effective steganography system must meet three requirements: high embedding capacity, security, and imperceptibility. Text files have poor imperceptibility because their structure makes syntax and grammar more visually obvious than in other media. Text is regarded as the most challenging carrier for hiding secret data because it contains less redundant data than other digital objects. Unicode characters, especially non-printing or invisible ones, are used to hide data by mapping a specific number of secret bits to each character and inserting the character into spaces in the cover text. However, these characters have only limited positions in which to embed secret data. Current studies that use Unicode characters in text steganography have focused on increasing data hiding capacity despite the limited redundancy of text files. They often select a sequential embedding pattern that fills all available positions in the cover text, which harms the imperceptibility and security of the system. This study addresses these limitations by combining Part-of-speech (POS) tagging with randomization in data hiding. Together, these techniques insert Unicode characters in randomized patterns at specific positions in the cover text, increasing data hiding capacity with minimal effect on imperceptibility and security. Format-preserving encryption (FPE) is also used to encrypt the secret message without changing its size before embedding. Comparison with existing techniques shows that the proposed technique fulfils the capacity, imperceptibility, and security requirements of the cover file.
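The idea of placing invisible Unicode characters at randomized rather than sequential positions can be sketched as below. This is only an illustration of keyed random placement: it does not implement POS tagging or FPE, it assumes the message is already an (encrypted) bit string, and the one-bit-per-character mapping (ZERO WIDTH SPACE for 0, ZERO WIDTH NON-JOINER for 1) is hypothetical. Python's `random` module is not cryptographically secure; it stands in for a keyed pseudo-random generator here.

```python
import random

# Hypothetical mapping: U+200B ZERO WIDTH SPACE = bit 0, U+200C ZERO WIDTH NON-JOINER = bit 1.
ZERO, ONE = "\u200b", "\u200c"

def _positions(cover, n_bits, key):
    """Pick n_bits space positions in a key-dependent random order (illustrative PRNG only)."""
    spaces = [i for i, ch in enumerate(cover) if ch == " "]
    if n_bits > len(spaces):
        raise ValueError("cover text has too few spaces for the payload")
    return random.Random(key).sample(spaces, n_bits)

def embed(cover, bits, key):
    """Insert one invisible character per bit directly after a randomly chosen space."""
    positions = _positions(cover, len(bits), key)
    marks = {pos: (ONE if b == "1" else ZERO) for pos, b in zip(positions, bits)}
    return "".join(ch + marks.get(i, "") for i, ch in enumerate(cover))

def extract(stego, n_bits, key):
    """Strip the invisible characters, then read them back in the key's order."""
    cover_chars, mark_after = [], {}
    for ch in stego:
        if ch in (ZERO, ONE):
            # Record which cover character this mark follows.
            mark_after[len(cover_chars) - 1] = "1" if ch == ONE else "0"
        else:
            cover_chars.append(ch)
    cover = "".join(cover_chars)
    bits = "".join(mark_after[p] for p in _positions(cover, n_bits, key))
    return cover, bits
```

Because the bit order follows the keyed sample rather than left-to-right position, a reader without the key sees the marks but not their order; the sketch assumes the cover text contains no zero-width characters beforehand.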

An Extracting Text Area Using Adaptive Edge Enhanced MSER in Real World Image (실세계 영상에서 적응적 에지 강화 기반의 MSER을 이용한 글자 영역 추출 기법)

  • Park, Youngmok;Park, Sunhwa;Seo, Yeong Geon
    • Journal of Digital Contents Society / v.17 no.4 / pp.219-226 / 2016
  • In everyday life, the information we recognize with our eyes and use is diverse and massive. Yet even current technologies, improved by artificial intelligence, fall far short of human visual processing ability. Nevertheless, many researchers are trying to obtain information from everyday scenes, concentrating in particular on recognizing information that consists of text. Extracting text from general documents is already used in some information processing fields, but extracting and recognizing text from real-world images remains far from adequate. This is because text in real images varies in many properties, such as color, size, and orientation. In this paper, we apply an adaptive edge-enhanced MSER (Maximally Stable Extremal Regions) method to extract text areas from such diverse environments and from scene text, and show through experiments that the proposed method performs comparatively well.

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in the consumer, business, and social sectors. Text analysis requires a corpus that reflects domain-specific knowledge; in the fashion field, however, professional corpora are insufficient. This study aims to develop a taxonomy and thesaurus that take into account the specialty of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WGSN, Pantone, and online platforms, and the text was then extracted through preprocessing in Python. The taxonomy is composed of seven clothing attributes: items, silhouettes, details, styles, colors, textiles, and patterns/prints. The corpus was completed by processing synonyms of terms drawn from fashion books such as dictionaries. In the end, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy, and all data was developed into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing its text mining results with those obtained using a previously developed corpus. This study contributes to a text data standard and enables meaningful text mining analysis in the fashion field.
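How a thesaurus and taxonomy are typically used in text mining can be sketched as a two-step lookup: map each token to its standard term, then to one of the taxonomy attributes. The dictionary entries below are hypothetical examples, not entries from the authors' web dictionary.

```python
from collections import Counter

# Hypothetical thesaurus (synonym -> standard term) and taxonomy (standard term -> attribute).
SYNONYMS = {"tee": "t-shirt", "tshirt": "t-shirt", "oversize": "oversized", "baggy": "oversized"}
TAXONOMY = {
    "t-shirt": "items",
    "oversized": "silhouettes",
    "navy": "colors",
    "denim": "textiles",
    "paisley": "patterns/prints",
}

def tag_terms(tokens):
    """Normalize tokens to standard terms, keep taxonomy terms, and count them by attribute."""
    counts = Counter()
    tagged = []
    for tok in tokens:
        term = SYNONYMS.get(tok.lower(), tok.lower())
        attr = TAXONOMY.get(term)
        if attr:
            tagged.append((term, attr))
            counts[attr] += 1
    return tagged, counts
```

Normalizing synonyms before counting is what lets "tee" and "tshirt" contribute to the same term frequency, which is the practical benefit of a domain thesaurus over raw tokens.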

Applying CPM-GOMS to Two-handed Korean Text Entry Task on Mobile Phone

  • Back, Ji-Seung;Myung, Ro-Hae
    • Journal of the Ergonomics Society of Korea / v.30 no.2 / pp.303-310 / 2011
  • In this study, we employ CPM-GOMS analysis to explain the physical and cognitive processes of users typing Korean text messages on mobile phones with both hands, and to quantitatively predict their performance. First, we observe the behavior of 10 subjects entering text on the keypad with both hands. Then, based on the Model Human Processor (MHP), we categorize the behaviors into perceptual, cognitive, and motor operators and analyze those operators. After that, we use the critical paths to model two task sentences. We also applied Fitts' law, which has often been used to predict text entry time on mobile phones, for comparison with the results of our CPM-GOMS model; we followed Lee's (2008) method, which is well suited to two-handed text entry, and calculated the total task time for each task sentence. To compare the actual data with the predictions of our CPM-GOMS model, we empirically tested 10 subjects and found no significant differences between the predicted values and the actual data. The CPM-GOMS model lets us observe the human information processes that make up the physical and cognitive processes. By comparing the predicted total task time with the real execution time, we also verified that the CPM-GOMS model predicts users' performance well when they input text messages on mobile phones with both hands.
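The two prediction approaches mentioned above can be sketched side by side. In CPM-GOMS the predicted task time is the length of the critical (longest) path through a network of perceptual, cognitive, and motor operators; Fitts' law instead predicts movement time from target distance and width. This sketch assumes the Shannon formulation of Fitts' law, and the coefficients `a`, `b`, the operator names, and the durations are hypothetical placeholders, not values from the study or from Lee (2008).

```python
import math

def fitts_time(distance, width, a=0.0, b=0.1):
    """Shannon form of Fitts' law: MT = a + b * log2(D / W + 1). a and b are hypothetical."""
    return a + b * math.log2(distance / width + 1)

def critical_path_time(durations, depends_on):
    """Length of the longest path through a DAG of operators (the CPM-GOMS critical path)."""
    finish = {}

    def finish_time(op):
        if op not in finish:
            # An operator can start only after all of its predecessors have finished.
            start = max((finish_time(p) for p in depends_on.get(op, ())), default=0.0)
            finish[op] = start + durations[op]
        return finish[op]

    return max(finish_time(op) for op in durations)

# Hypothetical two-handed keystroke: the two thumb movements run in parallel after one cognitive step.
durations = {"perceive": 100, "cognize": 50, "move_left": 200, "move_right": 250, "verify": 50}  # ms
depends_on = {
    "cognize": ["perceive"],
    "move_left": ["cognize"],
    "move_right": ["cognize"],
    "verify": ["move_left", "move_right"],
}
```

Here the parallel left-hand movement is hidden behind the longer right-hand movement, so the predicted time is 100 + 50 + 250 + 50 = 450 ms; modeling such overlap is the main reason CPM-GOMS suits two-handed entry.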

Development of Text Mining-Based Accounting Terminology Analyzer for Financial Information Utilization (재정정보 활용을 위한 텍스트 마이닝 기반 회계용어 형태소 분석기 구축)

  • Jung, Geon-Yong;Yoon, Seung-Sik;Kang, Ju-Young
    • The Journal of Information Systems / v.28 no.4 / pp.155-174 / 2019
  • Purpose Social interest in financial statement notes has recently increased. Despite this keen interest, however, there is no morphological analyzer for accounting terms, which makes research considerably difficult. In this study, we build a morphological analyzer for accounting-related text mining. The analyzer can handle accounting terms such as those in financial statements, and we expect it to serve as a springboard for growth in text mining research. Design/methodology/approach We build a customized Korean morphological analyzer to extract proper accounting terms. First, we collect companies' financial statement notes, financial information data published by KPFIS (Korea Public Finance Information Service), and K-IFRS accounting term data. Second, we clean and tokenize the text and remove stopwords. Third, we customize the morphological analyzer using an n-gram methodology. Findings Existing morphological analyzers cannot extract accounting terms because they split each term into many nouns. The new customized analyzer detects more appropriate accounting terms than the existing analyzer, including accounting words that existing analyzers failed to detect.
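The n-gram customization step can be illustrated by counting runs of adjacent nouns that an existing analyzer split apart and re-merging the frequent ones as compound terms. This is a sketch under stated assumptions: the input is already tokenized into nouns, and `min_count`, `max_n`, and the example tokens (e.g. 이연법인세자산, "deferred tax assets") are illustrative choices, not the paper's settings.

```python
from collections import Counter

def frequent_ngrams(sentences, n_values=(2, 3), min_count=2):
    """Count contiguous noun n-grams and keep those seen at least min_count times."""
    counts = Counter()
    for toks in sentences:
        for n in n_values:
            for i in range(len(toks) - n + 1):
                counts[tuple(toks[i:i + n])] += 1
    return {g for g, c in counts.items() if c >= min_count}

def merge_terms(tokens, terms, max_n=3):
    """Greedy left-to-right merge that prefers the longest known n-gram."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_n, 1, -1):
            if i + n <= len(tokens) and tuple(tokens[i:i + n]) in terms:
                out.append("".join(tokens[i:i + n]))  # Korean compounds are written without spaces
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Preferring the longest match keeps 이연법인세자산 whole instead of stopping at the shorter 이연법인세; the merged terms could then be registered in the analyzer's user dictionary.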

A Recognition Method for Korean Spatial Background in Historical Novels (한국어 역사 소설에서 공간적 배경 인식 기법)

  • Kim, Seo-Hee;Kim, Seung-Hoon
    • Journal of Information Technology Services / v.15 no.1 / pp.245-253 / 2016
  • Along with characters and events, background is one of the most important elements of a novel; it refers to the time, place, and situation in which the characters appear. Among these, the spatial background can help convey the topic of a novel, so it may help readers choose the novels they want to read. In this paper, we target Korean historical novels. In English text, the spatial background can be recognized easily because English distinguishes upper- and lower-case letters and uses words that accompany spatial information, such as Bank, University, and City. In Korean text, however, the spatial background is difficult to recognize because letter usage offers few such cues. Previous studies used machine learning, or dictionaries and rules, to recognize spatial information in texts such as news articles and text messages. In this paper, we build nation dictionaries that draw on sources such as 'Korean history' and 'Google Maps.' We also propose a method that, in contrast to previous works, recognizes the spatial background from patterns of postpositions in Korean sentences, since the characteristics of Korean make postpositions a useful clue to the spatial background. To raise the accuracy of spatial background recognition, we further propose a method based on morpheme analysis results and frequency in the novel text. The recognized spatial background can help readers grasp the atmosphere of a novel and understand its events through the setting of the scenes in which the characters appear.
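The postposition-pattern idea can be sketched as: find words that end in a locative postposition, keep the stems that appear in a place-name dictionary, and rank them by frequency. This is a simplified illustration, not the authors' method: the particle list (에서, 으로, 에, 로) is a partial, assumed set, the dictionary entries are hypothetical, and a regular expression stands in for the morpheme analysis a real system would use.

```python
import re
from collections import Counter

# Assumed locative postpositions attached to a noun (illustrative, not exhaustive).
LOCATIVE = re.compile(r"(\S+?)(에서|으로|에|로)(?=[\s.,!?]|$)")

def spatial_background(text, place_dict, top_k=3):
    """Rank dictionary place names that occur directly before a locative postposition."""
    counts = Counter(stem for stem, _ in LOCATIVE.findall(text) if stem in place_dict)
    return counts.most_common(top_k)
```

In a sentence such as "마음에 걸렸지만" the pattern also fires on 마음 ("heart"), which is why the dictionary filter is needed; frequency ranking then favors the setting that recurs across the novel.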

A novel, reversible, Chinese text information hiding scheme based on lookalike traditional and simplified Chinese characters

  • Feng, Bin;Wang, Zhi-Hui;Wang, Duo;Chang, Ching-Yun;Li, Ming-Chu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.1 / pp.269-281 / 2014
  • Compared with hiding information in digital images, hiding information in digital text files requires less storage space and less bandwidth for data transmission, and it has obvious universality and extensiveness. However, text files have low redundancy, which makes hiding information in them more difficult. To overcome this difficulty, Wang et al. proposed a reversible information hiding scheme using left-right and up-down representations of Chinese characters. When implemented, however, that scheme does not provide good visual steganographic effectiveness, and its embedding and extracting processes are too complicated to be performed with reasonable effort and cost. We observed that many traditional and simplified Chinese characters look somewhat alike (so-called lookalikes), and we use this feature to propose a novel scheme that hides secret data in lookalike Chinese characters. Compared with Wang et al.'s scheme, the proposed scheme significantly simplifies the embedding and extracting procedures and improves the visual effectiveness of the steganographic images. The experimental results demonstrate the advantages of the proposed scheme.
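The lookalike principle can be sketched by letting each character that has a traditional/simplified pair carry one bit: keep the simplified form for 0, swap in the traditional form for 1. The character pairs below are illustrative examples, not the paper's lookalike table, and the sketch assumes a cover text written entirely in simplified characters so that extraction can restore it exactly.

```python
# Illustrative simplified -> traditional pairs (NOT the paper's lookalike table).
PAIRS = {"说": "說", "话": "話", "语": "語", "们": "們", "请": "請"}
TRAD_TO_SIMP = {t: s for s, t in PAIRS.items()}

def capacity(cover):
    """Number of bits the cover can carry: one per character that has a pair."""
    return sum(ch in PAIRS for ch in cover)

def embed(cover, bits):
    """Assumes a simplified-Chinese cover: 0 keeps the simplified form, 1 uses the traditional form."""
    if len(bits) > capacity(cover):
        raise ValueError("not enough lookalike characters in the cover text")
    out, k = [], 0
    for ch in cover:
        if ch in PAIRS and k < len(bits):
            out.append(PAIRS[ch] if bits[k] == "1" else ch)
            k += 1
        else:
            out.append(ch)
    return "".join(out)

def extract(stego, n_bits):
    """Read bits from the first n_bits paired characters and restore the simplified cover."""
    bits, cover = [], []
    for ch in stego:
        if len(bits) < n_bits and (ch in PAIRS or ch in TRAD_TO_SIMP):
            bits.append("1" if ch in TRAD_TO_SIMP else "0")
        cover.append(TRAD_TO_SIMP.get(ch, ch))
    return "".join(cover), "".join(bits)
```

Both procedures are a single pass over the text with dictionary lookups, which illustrates why a lookalike-based scheme can be much simpler than one that rearranges character components.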

A Comparative Study on Korean Reading Comprehension by Adjusting Vocabulary Levels (수준별 어휘 조정에 따른 한국어 읽기 텍스트 이해도 비교 연구)

  • Ju, Jae-hwan
    • Journal of Korean language education / v.29 no.4 / pp.201-223 / 2018
  • The purpose of this study is to observe the effects of text modification by comparing differences in Korean reading comprehension that arise from differences in the vocabulary levels of texts. The study uses simplified texts, whose vocabulary difficulty has been adjusted from the original text, to measure the reading comprehension of Korean learners and analyzes the results. To measure reading comprehension, the researcher divided 55 intermediate-to-advanced Korean learners into two groups: the control group read the original text, and the treatment group read a simplified text in which complex vocabulary was replaced with easier words of medium difficulty. Both groups then took the same questionnaire to measure their comprehension. The group that read the simplified text scored higher than the control group, suggesting that reading comprehension increased in the treatment group. The experiment confirmed that the density of unknown vocabulary directly affects Korean reading comprehension, so the proportion of unknown vocabulary should be reduced for meaning-focused reading. It also shows that learners' comprehension was enhanced by lexical simplification rather than by structural simplification, i.e., simplification of grammar or sentences. Diverse reading materials adjusted to learners' fluency levels should therefore be developed to support reading for learning Korean. By reducing the burden of understanding the meaning of each word, learners will be able to achieve the initial goal of reading.
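Unknown-vocabulary density and lexical simplification can be made concrete with a small sketch: density is the share of tokens not in a learner's known vocabulary, and lexical simplification swaps difficult words for easier ones while leaving grammar untouched. The known-word set, the substitution table, and the pre-segmented example tokens are hypothetical; the study's own word lists and measure are not given in the abstract.

```python
def unknown_density(tokens, known):
    """Share of tokens that are not in the learner's known vocabulary."""
    if not tokens:
        return 0.0
    return sum(t not in known for t in tokens) / len(tokens)

def simplify(tokens, substitutions):
    """Lexical simplification: replace listed difficult words, keep everything else as-is."""
    return [substitutions.get(t, t) for t in tokens]

# Hypothetical example: "도시의 인구가 급증하다" segmented into tokens.
tokens = ["도시", "의", "인구", "가", "급증", "하다"]
known = {"도시", "의", "인구", "가", "증가", "하다"}
substitutions = {"급증": "증가"}  # hypothetical: "surge" -> "increase"
```

Replacing the single unknown word lowers the density from 1/6 to 0, without changing sentence structure, which is the contrast between lexical and structural simplification that the study draws.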