• Title/Summary/Keyword: unstructured format


Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data is diverse in format, vast in volume, and generated very quickly, so it requires new management and analysis methods rather than traditional data processing. Text mining techniques can extract useful information from unstructured text written in natural language in online documents on social networks. Identifying trends in the political, economic, and cultural messages left on social media helps reveal which topics users are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and to analyze which subjects are associated with a given keyword and which core values those topics relate to.
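A minimal sketch of LDA topic extraction, not the authors' pipeline: it uses scikit-learn on a few placeholder English documents, whereas the paper applies LDA to online news collected for a given search keyword.

```python
# Minimal LDA topic-extraction sketch (illustrative only, not the paper's pipeline).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "government announces new economic stimulus package for small businesses",
    "stock market rallies as central bank holds interest rates steady",
    "museum opens exhibition on contemporary culture and digital art",
    "election campaign focuses on economy, jobs and cultural policy",
]

# Bag-of-words term counts; real news text would first need language-specific tokenization.
vectorizer = CountVectorizer(stop_words="english")
term_counts = vectorizer.fit_transform(docs)

# Fit a small LDA model and print the top words of each latent topic.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(term_counts)

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")
```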

AUTOMATIC GENERATION OF UNSTRUCTURED SURFACE GRID SYSTEM USING CAD SURFACE DATA (CAD 형상 데이터를 이용한 비정렬 표면 격자계의 자동 생성 기법)

  • Lee, B.J.;Kim, B.S.
    • Journal of Computational Fluids Engineering / v.12 no.4 / pp.68-73 / 2007
  • Computational Fluid Dynamics (CFD) now plays an important role in the engineering process. Generating a proper grid system for the region of interest in a timely manner is a prerequisite for efficient numerical calculation of flow physics with CFD. Grid generation, however, is usually considered a major obstacle to the routine and successful application of numerical approaches in the engineering process. CFD based on unstructured grid systems is gaining popularity because grid generation is simpler and more efficient than with structured grid approaches, especially for complex geometries. In this paper, an automated triangular surface grid generation method using CAD (Computer Aided Design) surface data is proposed. In the present method, the CAD surface data imported in the STL (stereolithography) format is first processed to identify the feature edges that define the topology and geometry of the surface shape. Once the feature edges are identified, node points are distributed along them. Initial fronts connecting those feature-edge nodes are constructed and then advanced inward along the CAD surface until the surface is fully covered by triangular grid cells, using the Advancing Front Method. This approach can be implemented in an automated way, saving man-hours and reducing human error in generating triangular surface grid systems.
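A sketch of one plausible feature-edge test on an STL-style triangle soup, assuming edges are flagged by the dihedral angle between adjacent facet normals; the triangle data and the 30-degree threshold are invented for illustration, and this is not the grid generator described in the paper.

```python
# Flag an edge as a feature edge when its two adjacent facets meet at more than angle_deg.
import numpy as np
from collections import defaultdict

def facet_normal(tri):
    a, b, c = tri
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

def feature_edges(triangles, angle_deg=30.0):
    """Return edges whose adjacent facets meet at more than angle_deg."""
    edge_to_facets = defaultdict(list)
    for fi, tri in enumerate(triangles):
        for i in range(3):
            edge = tuple(sorted((tuple(tri[i]), tuple(tri[(i + 1) % 3]))))
            edge_to_facets[edge].append(fi)
    normals = [facet_normal(np.asarray(t, dtype=float)) for t in triangles]
    cos_limit = np.cos(np.radians(angle_deg))
    result = []
    for edge, facets in edge_to_facets.items():
        if len(facets) == 2 and np.dot(normals[facets[0]], normals[facets[1]]) < cos_limit:
            result.append(edge)
    return result

# Two facets of a box meeting at 90 degrees share one feature edge.
tris = [
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 0, 1)],
]
print(feature_edges(tris))
```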

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong Soo
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.151-161 / 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, it is easy for internet users to produce and share data. The amount of unstructured data, which occupies most of the digital world's data, has increased exponentially. One kind of unstructured data, personal online product reviews, is valuable both for the companies that produce the products and for the potential customers interested in them. To extract useful information from large amounts of scattered review data, a process of collecting, storing, preprocessing, and analyzing the data and drawing conclusions is needed. We therefore introduce a text-mining methodology that applies natural language processing to text data such as product reviews in order to extract structured data using R programming. We also introduce data mining to derive purpose-specific customized information from the structured review information produced by the text mining.
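The paper performs this step in R; the Python sketch below only mirrors the general idea of turning free-text reviews into a structured term-weight representation with TF-IDF, using invented example reviews.

```python
# Illustrative sketch: free-text product reviews -> per-review top TF-IDF terms.
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "battery life is great but the screen scratches easily",
    "fast delivery, screen quality is excellent, battery could be better",
    "terrible battery, returned the product after two days",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(reviews)          # rows: reviews, cols: terms
terms = vectorizer.get_feature_names_out()

# Print the highest-weighted terms of each review as its structured summary.
for i, row in enumerate(tfidf.toarray()):
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(f"review {i}:", [term for term, w in top if w > 0])
```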

A Study on Collaborative Design System using Design Issue Modeling and Performance-oriented Design Service in CLOUD BIM based Design Process (CLOUD BIM 기반 설계 프로세스에서 설계정보의 구조화 및 성능지향적 설계서비스를 통한 협업설계 지원 방안)

  • Jung, Jae Hwan;Kim, Jin Wooung;Song, Yu Mi;Kim, Sung-Ah
    • Journal of KIBIM / v.6 no.1 / pp.9-17 / 2016
  • Building Information Modeling (BIM) refers to a combination of technologies and organizational solutions that are expected to increase collaboration in the construction industry and to improve the productivity and quality of the design, construction, and maintenance of buildings. To enhance communication among project participants, the various information a BIM model usually includes must be provided, and data that capture the exchange of unstructured information are also needed. If the BIM standard file format is extended so that design issue information can be used in practice throughout the collaborative design process, the productivity and quality of design will improve.

Development of the Unified Database Design Methodology for Big Data Applications - based on MongoDB -

  • Lee, Junho;Joo, Kyungsoo
    • Journal of the Korea Society of Computer and Information / v.23 no.3 / pp.41-48 / 2018
  • The recent rapid growth of big data is characterized by continuous data generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for handling such big data because of limited processing speed and the significant cost of expanding storage. Currently implemented solutions are mainly based on relational databases that are no longer suited to these data volumes. NoSQL solutions allow new approaches to data warehousing, especially from the multidimensional data management point of view. In this paper, we develop and propose an integrated design methodology based on MongoDB for big data applications. The proposed methodology is more scalable than existing methodologies, making it easier to handle big data.
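A minimal sketch of the general document-oriented idea behind the abstract, not the methodology proposed in the paper: a multidimensional "fact" is stored as one denormalized MongoDB document with its dimension data embedded. The collection and field names are hypothetical, and a MongoDB server on localhost:27017 is assumed.

```python
# Store a sales fact with embedded dimensions in one MongoDB document (hypothetical schema).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["warehouse_demo"]

sale = {
    "amount": 129.90,
    "quantity": 3,
    "date": {"year": 2018, "month": 3, "day": 14},        # time dimension
    "product": {"sku": "P-1001", "category": "books"},    # product dimension
    "store": {"id": "S-07", "city": "Seoul"},             # store dimension
}
db.sales.insert_one(sale)

# A multidimensional query then touches a single collection instead of joined tables.
march_book_sales = db.sales.count_documents(
    {"date.month": 3, "product.category": "books"}
)
print(march_book_sales)
```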

Cause Analysis of Accidents Associated with Industrial Machines and Devices (산업용 기계 및 기구 관련 산업재해 원인분석)

  • Choi, Gi Heung
    • Journal of the Korean Society of Safety / v.33 no.1 / pp.16-21 / 2018
  • Cause analysis of accidents associated with industrial machines and devices is essential to improve the effectiveness and efficiency of the industrial safety system in Korea. This study focuses on the cause analysis of such accidents. In particular, it proposes and tests the analysis of accident abstracts, which are written in a descriptive format, are therefore inherently unstructured, and exhibit the characteristics of big data. The automatic analysis of such big data performed in this study yields results consistent with the manual analyses of previous studies. The results also suggest that shifting from the current user-oriented indirect regulations to more manufacturer- and user-balanced direct regulations will more effectively prevent industrial accidents at the early stage in which danger arises.
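A very simple sketch of one automatic analysis that could be run over descriptive accident abstracts: counting mentions of machine-related keywords across records. The keyword list and sample texts are invented for illustration and do not reflect the study's data or its actual method.

```python
# Count machine-related keyword mentions across free-text accident descriptions.
from collections import Counter

abstracts = [
    "worker's hand caught in the roller of a conveyor during cleaning",
    "finger amputation while adjusting a press without stopping the machine",
    "fall from a ladder while repairing a conveyor drive motor",
]
keywords = ["conveyor", "press", "ladder", "roller", "crane"]

counts = Counter()
for text in abstracts:
    for kw in keywords:
        if kw in text.lower():
            counts[kw] += 1

print(counts.most_common())   # e.g. [('conveyor', 2), ('roller', 1), ...]
```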

Information Pollution, a Mounting Threat: Internet a Major Causality

  • Pandita, Ramesh
    • Journal of Information Science Theory and Practice / v.2 no.4 / pp.49-60 / 2014
  • The present discourse revolves around information pollution, its causes and concerns, the internet as a major casualty, and how it affects an individual's decision-making ability. Information producers, in order not to lose the readership of their content and to cater to the requirements of both electronic and print readers, reproduce almost all printed information in digital form as well. Abundant literature is also produced in electronic format only, and sharing this information on hundreds of social networking sites such as Facebook, Twitter, blogs, Flickr, Digg, and LinkedIn without attribution to the original authors has created a mess of the information produced and disseminated. Accordingly, the study discusses the sources of information pollution and the aspects of unstructured information, along with plagiarism. Toward the end of the paper, stress is laid on information literacy and how it can help address the issue through measures that regulate the behaviour of information producers.

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho;Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.57-63 / 2017
  • The recent rapid growth of big data is characterized by continuous data generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for handling such big data because of limited processing speed and the significant cost of expanding storage. Thus, big data processing technologies, normally based on distributed file systems, distributed database management, and parallel processing, have arisen as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB that extends the information engineering methodology based on the E-R data model.
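A sketch of one common decision when mapping an E-R model to MongoDB: a one-to-many relationship is embedded as an array inside the parent document rather than kept as a separate table. The Order/OrderLine entities and field names are hypothetical; the paper's information-engineering-based methodology is not reproduced here.

```python
# One-to-many E-R relationship (Order -> OrderLine) embedded in a single document (hypothetical).
order = {
    "_id": "ORD-2017-0001",
    "customer": {"id": "C-42", "name": "Hong Gil-dong"},
    "lines": [                                  # embedded OrderLine entities
        {"sku": "P-1001", "qty": 2, "unit_price": 12.5},
        {"sku": "P-2002", "qty": 1, "unit_price": 30.0},
    ],
}

# With the children embedded, the total is computed from a single document, no join needed.
total = sum(line["qty"] * line["unit_price"] for line in order["lines"])
print(total)   # 55.0
```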

A linked data system framework for sharing construction defect information

  • Lee, Doyeop;Park, Chansik
    • International Conference on Construction Engineering and Project Management / 2015.10a / pp.232-235 / 2015
  • Defect data contain experiential knowledge about specific work conditions, and the number of projects performed by a single company is too limited for an individual to experience the variety of defects that arise in today's complex construction environment. Therefore, a proper data feedback mechanism is required to manage defects and prevent their recurrence. However, most defect data are stored in unstructured ways, which is a fundamental obstacle to their utilization. In this paper, a new framework using linked data technologies is proposed to improve defect data utilization. The aim of the framework is to convert defect data into an ontology-based linked data format so that defect data from different sources can be shared. To demonstrate it, some technical solutions are implemented using real cases. The proposed approach can reduce data search time and improve the accuracy of search results. Moreover, it can be applied to other domains that need to refer to external sources, such as safety, specifications, products, and regulations.
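A minimal sketch, in the spirit of the linked-data framework described above, of converting a single defect record into RDF triples with rdflib. The namespace, class, and property names below are made up for illustration and are not the ontology defined in the paper.

```python
# Convert one defect record into RDF triples and serialize as Turtle (illustrative vocabulary).
from rdflib import Graph, Literal, Namespace, RDF, URIRef

DEFECT = Namespace("http://example.org/defect#")

g = Graph()
g.bind("defect", DEFECT)

record = URIRef("http://example.org/defect/case-001")
g.add((record, RDF.type, DEFECT.Defect))
g.add((record, DEFECT.workType, Literal("waterproofing")))
g.add((record, DEFECT.location, Literal("basement wall")))
g.add((record, DEFECT.cause, Literal("insufficient curing time")))

print(g.serialize(format="turtle"))
```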


TET2MCNP: A Conversion Program to Implement Tetrahedral-mesh Models in MCNP

  • Han, Min Cheol;Yeom, Yeon Soo;Nguyen, Thang Tat;Choi, Chansoo;Lee, Hyun Su;Kim, Chan Hyeong
    • Journal of Radiation Protection and Research / v.41 no.4 / pp.389-394 / 2016
  • Background: Tetrahedral-mesh geometries can be used in the MCNP code, but MCNP accepts geometry only in the Abaqus input file format; hence, existing tetrahedral-mesh models must first be converted to the Abaqus input file format before they can be used in MCNP. In the present study, we developed a simple but useful computer program, TET2MCNP, for converting TetGen-generated tetrahedral-mesh models to the Abaqus input file format. Materials and Methods: TET2MCNP is written in C++ and contains two components: one for converting a TetGen output file to an Abaqus input file and the other for the reverse conversion. The TET2MCNP program also produces an MCNP input file. In addition, the program provides some MCNP-specific functions: the maximum number of elements (i.e., tetrahedrons) per part can be limited, and the material density of each element can be transferred to the MCNP input file. Results and Discussion: To test the developed program, two tetrahedral-mesh models were generated using TetGen and converted to the Abaqus input file format using TET2MCNP. The converted files were then used in MCNP to calculate the object- and organ-averaged absorbed doses in a sphere and a phantom, respectively. The results show that, within statistical uncertainties, the converted models give dose values identical to those obtained with the PHITS code, which uses the original tetrahedral-mesh models produced by TetGen. The results confirm that the developed program can successfully convert TetGen tetrahedral-mesh models to Abaqus input files. Conclusion: In the present study, we developed a computer program, TET2MCNP, that converts TetGen-generated tetrahedral-mesh models to the Abaqus input file format for use in MCNP. We believe this program will be used by many MCNP users for implementing complex tetrahedral-mesh models, including computational human phantoms, in MCNP.
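TET2MCNP itself is a C++ program with many more features (MCNP input generation, per-part element limits, per-element densities). Below is a much-reduced Python sketch of only the central file-format translation, assuming the standard TetGen text layout for .node/.ele files; the file names at the bottom are placeholders.

```python
# Sketch: read TetGen .node/.ele files and write Abaqus-style *NODE / *ELEMENT records.
def read_tetgen(node_path, ele_path):
    with open(node_path) as f:
        rows = [ln.split() for ln in f if ln.strip() and not ln.lstrip().startswith("#")]
    n_nodes = int(rows[0][0])                      # header: <#points> <dim> <#attrs> <markers>
    nodes = [(int(r[0]), float(r[1]), float(r[2]), float(r[3]))
             for r in rows[1:1 + n_nodes]]
    with open(ele_path) as f:
        rows = [ln.split() for ln in f if ln.strip() and not ln.lstrip().startswith("#")]
    n_tets = int(rows[0][0])                       # header: <#tets> <nodes/tet> <#attrs>
    tets = [(int(r[0]), int(r[1]), int(r[2]), int(r[3]), int(r[4]))
            for r in rows[1:1 + n_tets]]
    return nodes, tets

def write_abaqus(inp_path, nodes, tets):
    with open(inp_path, "w") as f:
        f.write("*NODE\n")
        for nid, x, y, z in nodes:
            f.write(f"{nid}, {x}, {y}, {z}\n")
        f.write("*ELEMENT, TYPE=C3D4\n")           # 4-node linear tetrahedron
        for eid, a, b, c, d in tets:
            f.write(f"{eid}, {a}, {b}, {c}, {d}\n")

nodes, tets = read_tetgen("model.1.node", "model.1.ele")   # placeholder file names
write_abaqus("model.inp", nodes, tets)
```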