• Title/Summary/Keyword: BigData Analysis

A Methodology for Automatic Multi-Categorization of Single-Categorized Documents (단일 카테고리 문서의 다중 카테고리 자동확장 방법론)

  • Hong, Jin-Sung;Kim, Namgyu;Lee, Sangwon
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.77-92 / 2014
  • Recently, numerous documents containing unstructured data and text have been created owing to the rapid increase in the use of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. However, manual categorization not only fails to guarantee accuracy but also requires a large amount of time and cost. Many studies have therefore pursued automatic categorization to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents covering multiple topics, because they assume that each document belongs to exactly one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories. However, these methods are also limited in that their learning process requires a multi-categorized training set, so they cannot be applied to most documents unless such a training set is provided. To remove this requirement of traditional multi-categorization algorithms, we propose a new methodology that extends the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. First, we identify the relationship between documents and topics using the results of topic analysis on single-categorized documents. Second, we construct a correspondence table between topics and categories by investigating the relationship between them. Finally, we calculate a matching score between each document and each category. 
A document is then assigned to a category if and only if its matching score exceeds a predefined threshold. For example, a document can be classified into three categories if three of its matching scores exceed the threshold. The main contribution of our study is that the methodology improves the applicability of traditional multi-category classifiers by generating multi-categorized documents from single-categorized documents. Additionally, we propose a module for verifying the accuracy of the proposed methodology. For performance evaluation, we performed extensive experiments with news articles. News articles are clearly categorized by theme and contain less vulgar language and slang than other typical text documents. We collected news articles published from July 2012 to June 2013. The number of articles varies widely across categories, reflecting both readers' differing levels of interest in each category and differences in how frequently events occur in each category. To minimize distortion of the results caused by the uneven number of articles per category, we extracted 3,000 articles from each of the eight categories, for a total of 24,000 articles. The eight categories were "IT Science," "Economy," "Society," "Life and Culture," "World," "Sports," "Entertainment," and "Politics." Using the collected news articles, we calculated document/category correspondence scores from the topic/category and document/topic correspondence scores. The document/category correspondence score indicates the degree to which each document corresponds to a given category. As a result, we could present two additional categories for each of 23,089 documents. 
Precision, recall, and F-score were 0.605, 0.629, and 0.617, respectively, when only the top-1 predicted category was evaluated, and 0.838, 0.290, and 0.431 when the top 1-3 predicted categories were considered. Notably, precision, recall, and F-score varied considerably across the eight categories.
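The document/category scoring step can be pictured as a matrix product. This is a minimal sketch, assuming the document/category score is the sum over topics of the document/topic weight times the topic/category correspondence; the abstract does not give the exact formula, and the function names, threshold, and toy numbers below are hypothetical. The `f_score` helper is the standard harmonic mean of precision and recall, which reproduces the F-scores reported above from the reported precision and recall.

```python
from typing import Dict, List


def category_scores(
    doc_topic: List[float],
    topic_category: List[List[float]],
    categories: List[str],
) -> Dict[str, float]:
    """Score one document against every category.

    ASSUMPTION: score(d, c) = sum over topics t of w(d, t) * corr(t, c),
    i.e. the document/topic row multiplied by the topic/category table.
    """
    scores = {name: 0.0 for name in categories}
    for t, weight in enumerate(doc_topic):
        for c, name in enumerate(categories):
            scores[name] += weight * topic_category[t][c]
    return scores


def assign_categories(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep every category whose score exceeds the (hypothetical) threshold, best first."""
    passing = [name for name, s in scores.items() if s > threshold]
    return sorted(passing, key=lambda name: scores[name], reverse=True)


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    categories = ["Economy", "Politics", "Sports"]
    doc_topic = [0.6, 0.3, 0.1]  # toy topic weights for one article
    topic_category = [           # toy topic/category correspondence table
        [0.9, 0.1, 0.0],
        [0.2, 0.7, 0.1],
        [0.0, 0.2, 0.8],
    ]
    scores = category_scores(doc_topic, topic_category, categories)
    print(assign_categories(scores, threshold=0.25))  # ['Economy', 'Politics']
    print(round(f_score(0.605, 0.629), 3), round(f_score(0.838, 0.290), 3))  # 0.617 0.431
```

Because the threshold is applied per category, a single article can pass for zero, one, or several categories, which is what allows single-category labels to be extended.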

Analysis of the Weight of SWOT Factors of Korean Venture Companies Based on the Industry 4.0 (4차 산업혁명 기반 한국 벤처기업의 SWOT요인에 대한 중요도 분석)

  • Lee, Dongik;Lee, Sangsuk
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.16 no.4 / pp.115-133 / 2021
  • This study examines the concept of the 4th industrial revolution and its related technologies, which have so far been discussed in mixed and inconsistent ways, together with the resulting socio-economic changes and impacts and the responses of major countries to the 4th industrial revolution. On this basis, it derives SWOT factors for Korean venture companies preparing for the fourth industrial revolution and calculates the importance of each factor, with the aim of helping the government and policymakers set directions for related policies. A further purpose is to suggest directions for Korean venture entrepreneurs to secure global competitiveness and to provide a basic, systematic analysis for further in-depth academic research. A total of 21 items were derived through extensive literature and data research to identify the competency factors that Korean venture companies need, in response to internal and external environmental changes, to be globally competitive in the era of the 4th Industrial Revolution. After the SWOT factors were reviewed by three expert groups and confirmed through a Delphi survey, the importance of each item was analyzed using the Analytic Hierarchy Process (AHP), a systematic decision-making technique. The analysis ranked the categories in order of importance as Strength (48%), Opportunity (25%), Threat (16%), and Weakness (11%). Among the sub-items, 'quick and flexible commercialization capability', 'platform/big data/non-face-to-face service activation', and 'ICT infrastructure and its utilization' had comparatively high importance, whereas the three lowest-ranked items were 'macro-economic stability and social infrastructure', 'difficulty in entering overseas markets due to global protectionism', and 'absolutely inferior in foreign investment'. 
Item-by-item correlation analysis was conducted to examine differences in opinion among the industry, academic, and policy expert groups. Overall, there was no significant difference of opinion: industry and academic experts showed a high correlation, and industry and policy experts showed a moderate correlation. However, the correlation between academic and policy experts was not statistically significant at the p < 0.01 level, indicating a difference of opinion on importance. This was because policy experts highly valued 'quick and flexible commercialization', a strength, as well as 'excellent educational system and high-quality manpower' and 'creation of new markets', which are opportunity items, whereas academic experts placed great importance on 'support part of government policy', a strength. The implication of this study is that, for Korean venture companies to secure competitiveness in the field of the 4th industrial revolution, policies that preferentially support the relevant strength and opportunity items are needed. The differences in the detailed strength and opportunity factors, which showed high variability, suggest that these items should be actively reviewed and reflected in policy.
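The AHP weighting step can be illustrated with Saaty's principal-eigenvector method. This is a minimal sketch, not the authors' computation: the 4×4 pairwise comparison matrix for Strength, Weakness, Opportunity, and Threat below is hypothetical (chosen to be perfectly consistent so the result is easy to check), because the experts' actual judgments are not given in the abstract. The consistency ratio uses Saaty's random index values.

```python
from typing import List, Tuple

# Saaty's random consistency index (RI), keyed by matrix size n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix: List[List[float]], iterations: int = 100,
                tol: float = 1e-12) -> Tuple[List[float], float]:
    """Return (priority weights, consistency ratio) for a reciprocal pairwise matrix.

    The weights are the principal eigenvector, found by power iteration
    and normalized to sum to 1.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        updated = [x / total for x in product]
        converged = max(abs(a - b) for a, b in zip(updated, weights)) < tol
        weights = updated
        if converged:
            break
    # Estimate lambda_max from A w ~= lambda_max w, then the consistency index
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    return weights, (ci / ri if ri > 0 else 0.0)


if __name__ == "__main__":
    labels = ["Strength", "Weakness", "Opportunity", "Threat"]
    # HYPOTHETICAL judgments: entry [i][j] = how much more important factor i is than factor j
    pairwise = [
        [1.0, 4.0, 2.0, 4.0],
        [1 / 4, 1.0, 1 / 2, 1.0],
        [1 / 2, 2.0, 1.0, 2.0],
        [1 / 4, 1.0, 1 / 2, 1.0],
    ]
    weights, cr = ahp_weights(pairwise)
    for label, w in zip(labels, weights):
        print(f"{label}: {w:.3f}")
    print(f"CR = {cr:.3f}")
```

For this consistent matrix the weights are 0.5, 0.125, 0.25, and 0.125 with a consistency ratio of 0; with real expert judgments, a CR below 0.1 is the usual acceptance rule of thumb.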

Migrant Multi-Cultural Family Women's Life Quality Related to Oral Health: Survey in Dae-Gu (다문화가족 이주여성의 구강건강관련 삶의 질: 대구지역 조사)

  • Jeon, Eun-Suk;An, Seo-Young;Choi, Yeon-Hee
    • Journal of dental hygiene science / v.11 no.3 / pp.181-187 / 2011
  • This study conducted oral examinations and individual interviews with migrant women in multicultural families in Daegu, measuring their socio-demographic characteristics, oral health status, and OHIP-14 scores, in order to investigate the relationship between oral health and quality of life among migrant multicultural-family women living in large cities. Based on data finally collected from 189 women, t-tests, ANOVA, and binary logistic regression analysis were conducted, with the following conclusions. The mean numbers of decayed, missing, and filled teeth were 2.23, 1.48, and 5.58, respectively. Women from the Philippines had more missing teeth than those from other countries, and women from China had relatively few filled permanent teeth. Oral health-related quality of life worsened as the number of missing teeth increased. A comparison by number of missing teeth showed that oral health-related quality of life was lowest in the domains of psychological discomfort, physical disability, psychological disability, social disability, and handicap. Oral health-related quality of life was also lower with a greater number of permanent teeth with decay experience and with lower monthly household income, indicating that these two factors are most strongly related to oral health-related quality of life. Because oral health-related quality of life among these women decreases as the numbers of missing and decayed teeth increase, a program to improve their oral health-related quality of life should be developed, and follow-up research should verify its effect.
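OHIP-14 is commonly scored additively. The sketch below assumes the usual item order, in which the 14 items form seven two-item domains, and a 0 (never) to 4 (very often) response scale; the domain names are the standard English OHIP-14 labels. The function name and the sample responses are hypothetical, not data from this study.

```python
from typing import Dict, List

# ASSUMED standard OHIP-14 item order: items 1-2, 3-4, ... belong to these domains in turn
DOMAINS = [
    "functional limitation",
    "physical pain",
    "psychological discomfort",
    "physical disability",
    "psychological disability",
    "social disability",
    "handicap",
]


def score_ohip14(responses: List[int]) -> Dict[str, int]:
    """Additive OHIP-14 scoring.

    Returns the seven domain subtotals (0-8 each) plus the total (0-56).
    Higher scores mean WORSE oral health-related quality of life.
    """
    if len(responses) != 14:
        raise ValueError("OHIP-14 needs exactly 14 item responses")
    if any(r not in range(5) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    scores = {domain: responses[2 * i] + responses[2 * i + 1] for i, domain in enumerate(DOMAINS)}
    scores["total"] = sum(responses)
    return scores


if __name__ == "__main__":
    # Hypothetical respondent
    result = score_ohip14([0, 1, 2, 2, 3, 3, 1, 0, 0, 0, 1, 1, 0, 0])
    for name, value in result.items():
        print(f"{name}: {value}")
```

Note the direction of the scale: "lower quality of life" in the abstract corresponds to higher OHIP-14 scores.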

Enterprise Human Resource Management using Hybrid Recognition Technique (하이브리드 인식 기술을 이용한 전사적 인적자원관리)

  • Han, Jung-Soo;Lee, Jeong-Heon;Kim, Gui-Jung
    • Journal of Digital Convergence / v.10 no.10 / pp.333-338 / 2012
  • Human resource management (HRM) is undergoing various changes driven by IT. Traditional HRM relied on non-scientific methods such as group management, physical workplaces, fixed working hours, and personal contacts, whereas current enterprise human resource management (e-HRM) differs substantially: it emphasizes individual-level management, virtual workspaces (e.g., smart work centers and working from home), flexible and elastic working hours, and scientific analysis and management based on computerized statistical data. In response to these environmental changes, companies have introduced techniques such as RFID cards and fingerprint-based time-and-attendance systems to build more efficient and strategic HRM systems. In this paper, we developed a time-and-attendance and access-control management system based on 2D and 3D face recognition with multiple cameras for efficient enterprise HRM. Existing 2D face recognition is sensitive to lighting and pose, which degrades recognition; our system nonetheless achieved a recognition rate of over 90% under such poor conditions. In addition, because 3D face recognition is computationally complex, we improved recognition speed with a hybrid approach that runs 3D and 2D recognition in parallel.
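The hybrid idea of running 2D and 3D matching in parallel and combining their results can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: faces are assumed to be already reduced to feature vectors, matching is cosine similarity against enrolled templates, the two scores are fused by a weighted sum, and the names `hybrid_identify`, `weight_3d`, and `threshold` (and the employee IDs in the usage example) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_scores(probe: Vector, gallery: Dict[str, Vector]) -> Dict[str, float]:
    """Similarity of one probe to every enrolled template."""
    return {person: cosine(probe, template) for person, template in gallery.items()}


def hybrid_identify(probe_2d: Vector, probe_3d: Vector,
                    gallery_2d: Dict[str, Vector], gallery_3d: Dict[str, Vector],
                    weight_3d: float = 0.5, threshold: float = 0.9) -> Optional[str]:
    """Run 2D and 3D matching concurrently, fuse the scores, return the best ID or None.

    Both galleries are assumed to enroll the same people under the same IDs.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_2d = pool.submit(match_scores, probe_2d, gallery_2d)
        future_3d = pool.submit(match_scores, probe_3d, gallery_3d)
        scores_2d, scores_3d = future_2d.result(), future_3d.result()
    fused = {p: (1.0 - weight_3d) * scores_2d[p] + weight_3d * scores_3d[p] for p in scores_2d}
    best = max(fused, key=fused.get)
    return best if fused[best] >= threshold else None  # None means "reject as unknown"


if __name__ == "__main__":
    gallery_2d = {"emp-001": [1.0, 0.0, 0.0], "emp-002": [0.0, 1.0, 0.0]}
    gallery_3d = {"emp-001": [1.0, 0.0], "emp-002": [0.0, 1.0]}
    print(hybrid_identify([0.9, 0.1, 0.0], [1.0, 0.05], gallery_2d, gallery_3d))
```

Python threads only overlap work that releases the GIL (for example native face-recognition libraries or camera I/O); if both matchers were pure Python, a `ProcessPoolExecutor` would be needed for real parallel speed-up.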

Comparison Analysis for Using the Habitat Pattern Between the Korean Endangered Species, Mauremys reevesii, and the Exotic Species, Trachemys scripta elegans (한국산 남생이와 외래종 붉은귀거북의 서식지 이용 패턴 비교 분석)

  • Jo, Shin-il;Na, Sumi;An, Chi-Kyung;Kim, Hyun-jung;Jeong, Yu-Jeong;Lim, Yang-Mook;Kim, Seon Du;Song, Jae Yong;Yi, Hoonbok
    • Korean Journal of Environment and Ecology / v.31 no.4 / pp.397-408 / 2017
  • The purpose of this study is to identify the home ranges and habitat-use patterns of the native species Mauremys reevesii and the exotic species Trachemys scripta elegans, and to analyze the competitive relationship between the two species. The study was conducted from August 2, 2010 to January 30, 2011 at the Goldfish square pond, located in the upper valley of Cheonggye Mountain. For monitoring, we attached a transmitter to each of three artificially bred M. reevesii and three T. scripta elegans living in the ponds and reservoirs. We measured the home range and habitat-use radius of the three individuals of each species, as well as environmental factors around the Goldfish square pond such as temperature, humidity, and soil and water temperature. Our analysis showed that M. reevesii and T. scripta elegans overlap in their ecological niches in various respects, including limited basking sites, food resource use, and hibernation sites. We also found that the relatively small M. reevesii was being pushed out of the competition by the relatively large T. scripta elegans. Further investigation of the two species' home ranges, food competition, and habitat use in natural habitats is needed. The results of this study can serve as baseline data for an M. reevesii restoration project.
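The abstract does not say how home range was estimated. One common approach for radio-telemetry data is the minimum convex polygon (MCP), the area of the convex hull around all location fixes of an individual. The sketch below assumes MCP and fixes given as projected x/y coordinates in metres; the function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula; 0.0 for degenerate polygons (fewer than 3 vertices)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def mcp_home_range(fixes: List[Point]) -> float:
    """Minimum convex polygon home range (square metres if fixes are in metres)."""
    return polygon_area(convex_hull(fixes))


if __name__ == "__main__":
    # Hypothetical telemetry fixes for one turtle (metres, projected grid)
    fixes = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 5.0), (3.0, 7.0)]
    print(mcp_home_range(fixes))
```

MCP is simple but sensitive to occasional distant fixes; kernel density estimators are a common alternative when enough fixes are available.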

A Study on the Improvement Scheme of University's Software Education

  • Lee, Won Joo
    • Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.243-250 / 2020
  • In this paper, we propose an effective software education scheme for universities. The scheme is based on an analysis of the software curricula of the top 10 universities in the QS World University Rankings, SW-oriented universities, and major regional national universities. Based on the results, we propose five improvements for effective SW education at universities. First, industry adaptability should be enhanced by developing courses based on a job analysis of SW developers during the curriculum development process. Second, the curriculum on the core technologies of the 4th industrial revolution (cloud computing, big data, virtual/augmented reality, the Internet of Things, etc.) should be strengthened and integrated with various fields such as medicine, bio, sensors, human science, and cognitive science. Third, after basic syntax education, programming language education should continue in software convergence courses so that students can implement projects in various fields; in addition, the curriculum for training system programmers and back-end developers should be strengthened relative to that for application developers. Fourth, opportunities to participate in industrial projects should be offered by reinforcing courses such as capstone design and comprehensive design, which enable product-based, self-directed learning. Fifth, university-specific curricula based on local industry should be developed by reinforcing internships or industry-academia programs through which students can acquire skills in local industrial fields.

A Study of the Research the Right to be Forgotten from 2010 (잊힐 권리에 관한 연구동향 분석: 2010년 이후 국내 연구를 중심으로)

  • Shim, Mina
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.4 / pp.1073-1084 / 2016
  • The purpose of this study is to suggest the right direction for research in related fields by analyzing trends in domestic (Korean) research on the right to be forgotten. A final set of 80 research papers from various disciplines was selected and analyzed according to seven criteria and three research questions. The results show that research, centered on the social sciences, increased significantly after the draft EU regulation was published in 2012, and that law-oriented research exploring the problem was actively conducted through literature reviews and legal research methods. Over time, in-depth studies of the rights to be protected and of conflicts between rights also increased. Considering that the right to be forgotten involves diverse and complex technical issues (services) such as big data and digital information, research supporting the implementation of the right is still lacking, and future research on the scope of its realization and on suitable research methods is urgently needed. Although this study is limited to domestic research, it aims to present research directions for realizing an effective right to be forgotten, to be complemented in the future by comparative analysis of foreign research.

Concrete Mixture Design for RC Structures under Carbonation - Application of Genetic Algorithm Technique to Mixture Conditions (탄산화에 노출된 콘크리트 구조물의 배합설계에 대한 연구 - 유전자 알고리즘 적용성 평가)

  • Lee, Sung-Chil;Feng, Maria Q.;Kwon, Sung-Jun
    • Journal of the Korea Concrete Institute / v.22 no.3 / pp.335-343 / 2010
  • Steel corrosion in reinforced concrete (RC) structures is a critical problem for structural safety, and much research is being actively conducted on methods to maintain the required performance of RC structures throughout their intended service lives. This study investigates a concrete mixture-proportioning technique using a genetic algorithm (GA) for RC structures exposed to carbonation, which is considered serious at underground sites and in large cities. For this purpose, mixture proportions and $CO_2$ diffusion coefficients from previous studies were analyzed, and a fitness function for the $CO_2$ diffusion coefficient was derived through regression analysis. This function, based on 12 experimental results, consists of five variables: water-cement ratio (W/C), cement content, sand percentage, coarse aggregate content per unit volume of concrete, and relative humidity. Using the GA technique, simulated mixture proportions were proposed for three verification cases, and they showed reasonable results with relative errors of less than 10%. Finally, assuming an intended service life and different exposure conditions, the design parameters (the intended $CO_2$ diffusion coefficients and cement contents) were determined and the corresponding mixture proportions were simulated. The proposed technique can suggest reasonable mixture proportions and can be extended using experimental data that account for other mixture components such as mineral admixtures.
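The GA search can be sketched as a small real-coded genetic algorithm (tournament selection, arithmetic crossover, Gaussian mutation, elitism). This is not the authors' implementation: the paper's fitted regression for the $CO_2$ diffusion coefficient is not given in the abstract, so `predicted_log_d` below is a PLACEHOLDER with invented coefficients, and the variable bounds, target value, and GA settings are likewise hypothetical. Relative humidity is treated as a fixed exposure condition rather than a design variable.

```python
import random
from typing import List, Tuple

# HYPOTHETICAL search bounds: W/C, cement (kg/m3), sand percentage (%), coarse aggregate (kg/m3)
BOUNDS: List[Tuple[float, float]] = [(0.35, 0.65), (300.0, 500.0), (38.0, 48.0), (900.0, 1100.0)]


def predicted_log_d(mix: List[float], rh: float) -> float:
    """PLACEHOLDER for the regression of log10(CO2 diffusion coefficient).

    The coefficients are invented for illustration; they are NOT the paper's fitted function.
    """
    wc, cement, sand, coarse = mix
    return (-7.5 + 2.0 * (wc - 0.5) - 0.002 * (cement - 400.0)
            + 0.01 * (sand - 43.0) - 0.0005 * (coarse - 1000.0) - 1.0 * (rh - 0.6))


def genetic_mix_design(target_log_d: float, rh: float, pop_size: int = 40,
                       generations: int = 80, seed: int = 0) -> Tuple[List[float], float]:
    """Search for a mix whose predicted log10 D matches target_log_d; returns (mix, error)."""
    rng = random.Random(seed)

    def error(mix: List[float]) -> float:
        return abs(predicted_log_d(mix, rh) - target_log_d)

    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=error)
        next_gen = population[:2]  # elitism: carry the two best mixes over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the best of three random mixes becomes a parent
            parent_a = min(rng.sample(population, 3), key=error)
            parent_b = min(rng.sample(population, 3), key=error)
            child = []
            for (lo, hi), a, b in zip(BOUNDS, parent_a, parent_b):
                alpha = rng.random()
                gene = alpha * a + (1.0 - alpha) * b  # arithmetic crossover
                if rng.random() < 0.2:  # Gaussian mutation scaled to the variable's range
                    gene += rng.gauss(0.0, 0.05 * (hi - lo))
                child.append(min(hi, max(lo, gene)))  # clip back into bounds
            next_gen.append(child)
        population = next_gen
    best = min(population, key=error)
    return best, error(best)


if __name__ == "__main__":
    mix, err = genetic_mix_design(target_log_d=-7.6, rh=0.65)
    print("W/C=%.3f cement=%.0f sand=%.1f%% coarse=%.0f  |error|=%.4f" % (*mix, err))
```

Because many mixes can reach the same target coefficient, a practical version would add a secondary objective (for example, limiting cement content) or code-based mix constraints; the sketch only minimizes the gap to the target.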

A study on the user's emotional change when they are using a product by using emotional word logging software (감성어휘 로깅 소프트웨어를 이용한 제품 사용중 사용자의 감성변화 연구)

  • Jeong, Sang-Hoon;Lee, Kun-Pyo
    • Science of Emotion and Sensibility / v.9 no.spc3 / pp.167-177 / 2006
  • In this study, we developed a tool for measuring the emotions users express while using a product, in a natural setting and in a form accessible to the design field. Using the emotional word logging software VideoTAME, we measured the emotions users expressed while using a product. In VideoTAME's testing module, participants evaluate their own emotional changes by playing back and watching video clips of themselves performing tasks in the experiment room. In the analysis module, researchers replay the results created by participants during the experiment and analyze them using Microsoft Excel. In this research, we asked users to examine their emotional changes while watching recorded video clips of themselves performing a series of tasks with a cellular phone in the experiment room. In this experiment, there were no large differences in the representative emotions expressed across task characteristics. This is presumably because emotional changes arose from the specific situations encountered while performing a task rather than from the task itself. With more data and more rigorous statistical analysis, we expect to clarify how a product's usability affects users' emotions.
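The aggregation done in the analysis module can be sketched in a few lines. VideoTAME's actual log format is not described in the abstract, so the record fields below (`participant`, `task`, `video_seconds`, `emotion_word`) are hypothetical, and "representative emotion" is assumed here to mean the most frequently logged word per task.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Tuple


@dataclass
class EmotionLogEntry:
    """One HYPOTHETICAL log record: a participant tags a moment of their task video."""
    participant: str
    task: str
    video_seconds: float
    emotion_word: str


def representative_emotions(log: List[EmotionLogEntry]) -> Dict[str, str]:
    """Most frequently logged emotion word for each task."""
    by_task: DefaultDict[str, Counter] = defaultdict(Counter)
    for entry in log:
        by_task[entry.task][entry.emotion_word] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in by_task.items()}


def emotion_timeline(log: List[EmotionLogEntry], participant: str) -> List[Tuple[float, str]]:
    """One participant's emotional changes in video-time order."""
    entries = [e for e in log if e.participant == participant]
    return [(e.video_seconds, e.emotion_word) for e in sorted(entries, key=lambda e: e.video_seconds)]


if __name__ == "__main__":
    log = [
        EmotionLogEntry("P1", "send SMS", 12.0, "frustrated"),
        EmotionLogEntry("P1", "send SMS", 40.5, "relieved"),
        EmotionLogEntry("P2", "send SMS", 15.0, "frustrated"),
        EmotionLogEntry("P1", "set alarm", 5.0, "satisfied"),
    ]
    print(representative_emotions(log))
    print(emotion_timeline(log, "P1"))
```

The timeline view matters for the abstract's conclusion: emotions tied to specific moments within a task can differ even when the per-task representative word does not.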

The study on the breast types and characteristics of Chinese female adults. (Ver. 2) - Focused on the female college students in Shanghai - (중국 성인여성용 유방유형 및 특성에 관한 연구(제 II보) - 상해지역 20대 전반 여성을 중심으로 -)

  • Cha, Su-Joung;Sohn, Hee-Soon
    • Journal of Fashion Business / v.14 no.1 / pp.57-75 / 2010
  • This study was conducted in the Shanghai area through a sample survey of female college students in their early 20s. Through a direct-contact survey, it collected and analyzed body-shape and body-measurement data to understand the characteristics of the breasts and to provide baseline information for improving brassiere products for adult women in China. Data were analyzed using SPSSWIN 13.0 and SAS 9.0. 1. A factor analysis of 40 measurement items, performed to derive the components of breast shape, yielded six factors: bust obesity; the ratio of the upper to the lower area of the lower bust, droop, and volume; the internal shape and breadth of the bust; the location and vertical size of the bust; the protrusion of the bust; and the external shape of the bust. 2. Classifying the breast shapes of Chinese women in their early 20s yielded four types. Type 1 is the protruding type: the breast is broad and drooped owing to development of both the upper and lower chest, with the greatest height, a high degree of bust obesity, large volume, and marked protrusion. Type 2 is the hemispherical type: its degree of breast obesity is second only to Type 1, the breast is positioned higher than in Type 1, the lower bust is larger than in Type 1, and its breadth and droop are second only to Type 1. Type 3 is the conical type, with a slightly drooped and broad breast and moderate volume. Type 4 is the flat type, with the smallest values for bust obesity and for the Röhrer index, indicating a small, slender build around the bust; the bust is somewhat flat owing to a low slope on its inner side.
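For reference, the Röhrer (Rohrer) index is conventionally a weight-for-height index: weight in kilograms divided by the cube of height in centimetres, multiplied by 10^7. A minimal sketch follows; the abstract does not report the sample's values, so the 50 kg / 160 cm input is hypothetical.

```python
def rohrer_index(weight_kg: float, height_cm: float) -> float:
    """Rohrer index = weight (kg) / height (cm)**3 * 10**7."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_cm ** 3 * 1e7


if __name__ == "__main__":
    # Hypothetical subject: 50 kg, 160 cm -> about 122.07
    print(round(rohrer_index(50.0, 160.0), 2))
```

Because height enters cubed, the index is more sensitive to stature than BMI, and lower values indicate a slimmer build.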