• Title/Summary/Keyword: Data dictionary

A Study on the Use of Criminal Justice Information Big Data in terms of the Structuralization and Categorization (형사사법정보의 빅데이터 활용방안 연구: 구조화 범주화 관점으로)

  • Kim, Mi Ryung;Roh, Yoon Ju;Kim, Seonghun
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.253-277 / 2019
  • In the era of the 4th Industrial Revolution, the importance of data is growing, but personal information protection often makes data difficult to use. Criminal justice information is expected to be valuable for crime prediction and prevention, scientific criminal investigation, and the rationalization of sentencing, yet its use is currently limited by the legal interpretation of privacy protection and criminal justice information laws. This study proposes converting criminal justice information into 'crime data' through structuralization and categorization so that it can be used as big data. Experts reviewed the legal issues, the value in use, and the considerations for generating and using 'crime data', and future strategic development plans were identified. The study found that 'crime data' appears to resolve the privacy problem, but that it needs to be specified in the laws governing criminal justice information and urgently needs to be organized in a standardized form suitable for big data analysis. Future directions are to derive data elements, construct a dictionary and thesaurus, define and classify sensitive personal information for data grading, and develop algorithms for structuring unstructured data.

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has generated new data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also growing demand for analyzing data that programs could not handle in the past. In this paper, an analysis model was designed and verified for the classification of unstructured data, which is often required in the era of big data. Thesis abstracts, main keywords, and sub-keywords were crawled from DBPia, a database was created using KoNLP's data dictionary, and words were tokenized through morpheme analysis. Nouns were then extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, decision tree) to the generated dataset. The proposed classification model can be applied in various fields beyond thesis classification, such as civil complaint classification and text-related analysis.
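The TF-IDF step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the nouns have already been extracted into token lists, and it uses one common weighting (relative term frequency times log(N / document frequency)); the abstract does not say which TF-IDF variant the authors used, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (nouns already extracted).
    Weight = (count / document length) * log(N / document frequency).
    This is one common TF-IDF variant, chosen here as an assumption.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized documents.
docs = [
    ["data", "dictionary", "database"],
    ["data", "classification", "forest"],
    ["data", "morpheme", "analysis", "morpheme"],
]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "data" here, receives a weight of zero, while a term concentrated in one document is weighted up. The resulting vectors would then be paired with class labels and passed to a classifier, a step not shown in this sketch.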

Fusion research on positive psychological capital (PPC) in accordance with physical disabilities participate in swimming classes for 10 weeks (10주간의 수영교실 참여에 따른 지체장애인의 긍정심리자본(PPC)에 미치는 융합 연구)

  • Kim, Dong-Won
    • Journal of the Korea Convergence Society / v.7 no.3 / pp.159-165 / 2016
  • The purpose of this study is to investigate the change in positive psychological capital (PPC) of people with physical disabilities who participated in swimming classes for 10 weeks. The subjects were 21 men with physical disabilities in their 30s and 40s, divided into a participation group (10) and a non-participation group (11); the classes were held three times a week for 10 weeks, 50 minutes per session. For data processing, means and standard deviations were calculated from the pre-test and post-test data, and a two-way repeated-measures analysis of variance (2-way RM ANOVA) for group (swimming participation group, non-participation group) and time (before, after) was performed using the SPSS 21.0 statistical program. The statistical significance level was set at .05. The results indicate that participation in the swimming classes had a positive effect on the positive psychological capital of people with physical disabilities.

Study on Automatic Mapping Method for Reference of Scholarly Papers (학술논문의 참고문헌 자동매핑 방법에 관한 연구)

  • Han, Jeong-Min;Jang, Hyun-Chul;Kim, Jin-Hyun;Yea, Sang-Jun;Kim, Sang-Kyun;Kim, Chul;Song, Mi-Young
    • Journal of Information Management / v.41 no.3 / pp.155-173 / 2010
  • As scholarship advances and topics diversify, researchers in every field keenly feel the need to find required information precisely and quickly at any time. This study presents a way of constructing an automatic mapping system that compares and analyzes duplicated data and reports the result, built on an effective reference extraction method, and a way of correcting wrongly used Chinese characters with a Traditional Korean Medicine dictionary. With this approach, duplicated reference data and Chinese character errors can be fixed, which matters in a situation where references from newly published papers are continuously extracted.

Study on Confucian Politics about the Annals of the Choson Dynasty through Big Data Analysis (조선왕조실록의 빅데이터분석을 통한 유교정치 연구)

  • Moon, HyeJung
    • The Journal of the Korea Contents Association / v.18 no.7 / pp.253-261 / 2018
  • The purpose of this study is to find the theories of public policy in Confucian politics during the Choson Dynasty. The analysis yields five implications. First, the Confucian policy of Choson consisted of authority, organization, financial policy, affection for the people, and li (ritual propriety). Second, in terms of the characteristics of each reign, the major political context was maintained from King Se-Jong through King Sung-Jong and King Yeong-Jo to King Jeong-Jo. Third, the major ideas were Confucius's idea of li in the early period, Zhūzǐ's idea of authority in the late period, and Mencius's idea of financial policy in situations of major risk. Fourth, public policy changed over five periods: establishment, foundation, crisis, restoration, and collapse. Fifth, Zhūzǐ and Chéngzi had a greater influence than Confucius as factors in policy making. This study complements context analysis and the understanding of semantic analysis by implementing a bilingual Korean-Chinese dictionary.

Database Management System Parameter Tuning Processes for Improving Database System Performance (데이터베이스 시스템 성능 향상을 위한 데이터베이스 관리 시스템 파라미터 튜닝 프로세스)

  • 최용락;윤병권;정기원
    • The Journal of Society for e-Business Studies / v.7 no.1 / pp.107-127 / 2002
  • Database system parameter tuning is one form of database system tuning that, together with application program tuning and data model tuning, improves database system performance. Parameter tuning adjusts the entry values stored in the parameter file of the data dictionary included in the database system so that the system delivers optimal performance. It is also a tuning method that can be carried out immediately, without additional expense, additional hardware, or recomposition of software. In practice, however, parameters are not managed systematically, and the default values supplied by the database management system are used as they are. This paper therefore presents a parameter tuning process for systematically tuning the parameters of a database system currently in operation, together with the important input elements and tuning deliverables of each activity in the process. It systematically classifies database system performance and the related parameters, identifies the relationships between parameters, and explains the effects and results of changing database system parameters. It also emphasizes that parameters should not stay fixed at the values specified when the database management system was installed, but should be adjusted continuously during operation. Given that administrators commonly keep using the parameters defined at installation without alteration, the paper shows how parameter entry values should be changed for various operating environments and summarizes the effect of such changes on performance.

Utilizing Local Bilingual Embeddings on Korean-English Law Data (한국어-영어 법률 말뭉치의 로컬 이중 언어 임베딩)

  • Choi, Soon-Young;Matteson, Andrew Stuart;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.45-53 / 2018
  • Recently, studies about bilingual word embedding have been gaining much attention. However, bilingual word embedding with Korean is not actively pursued due to the difficulty in obtaining a sizable, high quality corpus. Local embeddings that can be applied to specific domains are relatively rare. Additionally, multi-word vocabulary is problematic due to the lack of one-to-one word-level correspondence in translation pairs. In this paper, we crawl 868,163 paragraphs from a Korean-English law corpus and propose three mapping strategies for word embedding. These strategies address the aforementioned issues including multi-word translation and improve translation pair quality on paragraph-aligned data. We demonstrate a twofold increase in translation pair quality compared to the global bilingual word embedding baseline.

Automatic Construction of a Negative/positive Corpus and Emotional Classification using the Internet Emotional Sign (인터넷 감정기호를 이용한 긍정/부정 말뭉치 구축 및 감정분류 자동화)

  • Jang, Kyoungae;Park, Sanghyun;Kim, Woo-Je
    • Journal of KIISE / v.42 no.4 / pp.512-521 / 2015
  • Internet users purchase goods online and express positive or negative emotions about them in product reviews. Analysis of product reviews yields critical data both for potential consumers and for enterprise decision making. Opinion mining techniques, which derive opinions by analyzing meaningful data from large numbers of Internet reviews, are therefore increasingly important. Existing studies were mostly based on comments written in English, and analysis of Korean has not been actively pursued. Unlike English, Korean is characterized by complex adjectives and suffixes, and existing studies did not consider the characteristics of Internet language. This study proposes an emotional classification method that increases accuracy by analyzing the characteristics of Internet language that connote feelings. Using Internet emoticons, positive and negative comments about products can be classified automatically. The validity of the proposed algorithm was confirmed by the high precision, recall, and coverage obtained in the evaluation.
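The idea of building a positive/negative corpus from emoticons can be sketched as follows. This is only an illustration under stated assumptions: the two emoticon sets, the majority-count rule, and the function names `label_review` and `build_corpus` are hypothetical and are not taken from the paper.

```python
# Hypothetical emoticon lexicons; the paper's actual sign lists are not given.
POSITIVE_SIGNS = {"^^", ":)", "ㅎㅎ"}
NEGATIVE_SIGNS = {"ㅠㅠ", ":(", "-_-"}


def label_review(text):
    """Label a review by which kind of emoticon occurs more often.

    Returns "positive", "negative", or None when the review has no
    emoticons or the counts are tied (an assumed tie-handling rule).
    """
    pos = sum(text.count(sign) for sign in POSITIVE_SIGNS)
    neg = sum(text.count(sign) for sign in NEGATIVE_SIGNS)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


def build_corpus(reviews):
    """Group reviews into a positive and a negative corpus, skipping unlabeled ones."""
    corpus = {"positive": [], "negative": []}
    for review in reviews:
        label = label_review(review)
        if label is not None:
            corpus[label].append(review)
    return corpus


reviews = [
    "fast delivery ^^",
    "broke in a day ㅠㅠ",
    "arrived on Tuesday",
]
corpus = build_corpus(reviews)
```

Reviews without any emoticon are left out of the corpus rather than guessed at, so the automatically labeled data stays as clean as the emoticon signal allows.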

A Study on Small-sized Index Structure and Fast Retrieval Method Using The RCB trio (RCB트라이를 이용한 빠른 검색과 소용량 색인 구조에 관한 연구)

  • Jung, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information / v.12 no.4 / pp.11-19 / 2007
  • This paper proposes the RCB (Reduced Compact Binary) trie to correct the faults of both the CB (Compact Binary) trie and the HCB (Hierarchical Compact Binary) trie. The CB trie was the first attempt at a compact structure, but as the amount of input data grew, insertion became increasingly difficult because of the dummy nodes used to balance the trees. The HCB trie, on the other hand, is realized hierarchically: a depth limit is set to prevent the map from growing to the right, and when that depth is reached, a new tree is created and linked. This made insertion and search faster, but it had the disadvantage of requiring more storage space, because it uses dummy nodes like the CB trie and adds many tree links. With the RCB trie proposed in this paper, the tree map was reduced by about 35% by eliminating dummy nodes entirely, and the overall size was reduced by half, compared with the HCB trie.

Target Word Selection for English-Korean Machine Translation System using Multiple Knowledge (다양한 지식을 사용한 영한 기계번역에서의 대역어 선택)

  • Lee, Ki-Young;Kim, Han-Woo
    • Journal of the Korea Society of Computer and Information / v.11 no.5 s.43 / pp.75-86 / 2006
  • Target word selection is one of the most important and difficult tasks in English-Korean machine translation, and it affects the translation accuracy of machine translation systems. In this paper, we present a new approach to selecting the Korean target word for an English noun with translation ambiguities, using multiple knowledge sources: verb frame patterns, sense vectors based on collocations, statistical Korean local context information, and co-occurring POS information. Verb frame patterns constructed from a dictionary and a corpus play an important role in resolving the sparseness problem of collocation data. A sense vector is the set of collocation data observed when an ambiguous English word is translated to a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. Co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. The experiment showed promising results for diverse sentences from web documents.
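The role of collocation-based sense vectors can be illustrated with a minimal sketch. It covers only one of the four knowledge sources named in the abstract, and it assumes a simple scoring rule (count the overlap between the sentence context and each candidate's collocation set) that the abstract does not specify. The function name `select_target_word` and the sample data for "bank" are hypothetical.

```python
def select_target_word(context_words, sense_vectors):
    """Pick the Korean target word whose collocation set best matches the context.

    sense_vectors maps each candidate Korean target word to the set of
    English collocates seen with that translation. Scoring by overlap
    size is an assumption made for this sketch. On a tie, the candidate
    listed first in sense_vectors wins.
    """
    context = set(context_words)
    scores = {
        target: len(context & collocates)
        for target, collocates in sense_vectors.items()
    }
    return max(scores, key=scores.get)


# Hypothetical sense vectors for the ambiguous English noun "bank".
BANK_SENSES = {
    "은행": {"money", "account", "loan", "deposit"},  # financial institution
    "둑": {"river", "water", "flood", "shore"},       # river bank
}

financial = select_target_word(
    ["he", "opened", "an", "account", "at", "the", "bank"], BANK_SENSES
)
riverside = select_target_word(
    ["the", "river", "overflowed", "its", "bank"], BANK_SENSES
)
```

In a fuller system this score would be combined with the other knowledge sources, for example verb frame patterns when collocation data are sparse.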
