• Title/Summary/Keyword: Data Cleaning


Exposure assessment of musculoskeletal disorder risk factors in non routinized work: An application of PATH-KOSHA observational tool to hospital workers (비정형작업 근골격계질환 위험요인의 노출평가: 일부 병원근로자에 대한 PATH-KOSHA 관찰도구 적용사례)

  • Park, Jung-Keun;Han, Young-Sun
    • Journal of Korean Society of Occupational and Environmental Hygiene / v.19 no.4 / pp.412-422 / 2009
  • This study assessed exposure to musculoskeletal disorder (MSD) risk factors in hospital personnel who performed non-routinized work tasks. A tool (the "PATH-KOSHA" version) was newly revised from the PATH (Posture, Activity, Tools and Handling) method and loaded onto a personal digital assistant (PDA). This version was used, on the basis of direct observation, to collect PATH data at two hospital settings in different regions. Job analysis was also performed to obtain further information (e.g., work and rest time, task type). The collected data were visually checked as a data cleaning step and stored for later analysis. A total of 1,992 PATH observations were made for 37 hospital workers. Exposure levels varied across the 18 MSD risk factor items. The highest percent time spent in non-neutral postures was 53% for wrist deviation, followed by 47% (pinch grip), 35% (trunk posture), 23% (neck posture), and 20% (shoulder/arm posture). The highest percent time among the hand activity level (HAL) variables was 55% for HAL-cat2 (HAL: 3.3 - <6.7). The percent time for loads of more than 5 kg and for contact stress was less than 4% each. The study workers were not exposed to vibration. Several aspects of the findings are discussed. The results showed that wrist deviation had the highest percent time among awkward postures, while HAL-cat2 was highest for hand repetition. The study suggests that distal upper extremity posture and HAL should be the primary targets for control in non-routinized work, including hospital settings.

Estimation of Completeness of Cancer Registration for Patients Referred to Shiraz Selected Centers through a Two Source Capture Re-capture Method, 2009 Data

  • Sharifian, Roxana;SedaghatNia, Mohammad Hossein;Nematolahi, Mohtram;Zare, Najaf;Barzegari, Saeed
    • Asian Pacific Journal of Cancer Prevention / v.16 no.13 / pp.5549-5556 / 2015
  • Background: Cancer has important social consequences, and cancer registration is the basis for moving towards prevention. The present study aimed to estimate the completeness of registration of the ten most common cancers in patients referred to selected hospitals in Shiraz, Iran, using the capture-recapture method. Materials and Methods: This cross-sectional analytical study was performed in 2014 on 2009 data covering a total of 4,388 registered cancer patients. After the data from the two sources were cleaned, cases common to both sources were identified. The percentage completeness of cancer registration was then estimated using the Chapman and Chao methods. Finally, the effects of demographic and treatment variables on completeness were investigated. Results: The completeness of cancer registration in the selected hospitals of Shiraz was 58.6% and 58.4%, and it was influenced by several variables. The 40-49 age group had the highest registration level and the under-20 age group the lowest. Breast cancer had the highest registration level, followed by thyroid and lung cancers, while colorectal cancer had the lowest. Conclusions: According to the results, the proportion of cancers registered was low. Factors such as inadequate knowledge among some doctors, imprecise diagnosis of cancer type, incorrectly completed medical documents, and insufficient accuracy in computer data entry appear to cause errors and gaps in cancer registration. This points to a need to train doctors and other medical workers in documenting cancer-related information and to take additional measures to improve the cancer registration system.
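
The two-source estimate mentioned in this abstract can be illustrated with Chapman's estimator, N = (n1 + 1)(n2 + 1) / (m + 1) - 1, where n1 and n2 are the case counts in each source and m is the number of cases found in both. The following is a minimal sketch only; the function names and the example counts are hypothetical and are not taken from the study, and the Chao method that the authors also used is not shown.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's two-source capture-recapture estimate of the total case count.

    n1, n2: cases recorded in source 1 and source 2 (after cleaning)
    m:      cases matched in both sources
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("counts must be non-negative and m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def completeness(n1: int, n2: int, m: int) -> float:
    """Share of the estimated total that appears in at least one source."""
    observed = n1 + n2 - m  # distinct cases seen across both sources
    return observed / chapman_estimate(n1, n2, m)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    n1, n2, m = 300, 250, 150
    print(f"estimated total: {chapman_estimate(n1, n2, m):.1f}")
    print(f"completeness:    {completeness(n1, n2, m):.1%}")
```

With these made-up counts the estimated total is about 499 cases and completeness is roughly 80%. The estimate is sensitive to m, which is why the matching of records between sources has to follow data cleaning.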

Pre-Processing of Query Logs in Web Usage Mining

  • Abdullah, Norhaiza Ya;Husin, Husna Sarirah;Ramadhani, Herny;Nadarajan, Shanmuga Vivekanada
    • Industrial Engineering and Management Systems / v.11 no.1 / pp.82-86 / 2012
  • For the past few years, query log data has been collected to understand how users behave on a site. Many studies have used query logs to extract user preferences, recommend personalization, improve caching and pre-fetching of Web objects, build better adaptive user interfaces, and improve Web search in search engine applications. A query log contains data such as the client's IP address, the time and date of the request, the resource or page requested, the status of the request, the HTTP method used, and the type of browser and operating system. A query log can offer valuable insight into web site usage. Proper compilation and interpretation of query logs can provide baseline statistics that indicate the usage levels of a website and can serve as a tool to assist management decision making. In this paper we discuss the tasks performed on query logs during the pre-processing stage of web usage mining. We use query logs from an online newspaper company. The query logs undergo a pre-processing stage in which the clickstream data is cleaned and partitioned into a set of user interactions that represent the activities of each user during their visits to the site. The essential pre-processing tasks are data cleaning and user identification.
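
The two pre-processing tasks named in this abstract, data cleaning and user identification, can be sketched as follows. This is an illustrative sketch and not the authors' procedure: it assumes log lines in the common "combined" web server format, treats image, stylesheet, and script requests and non-2xx responses as noise, and identifies a user by the pair (IP address, user agent). All names in the code are hypothetical.

```python
import re
from collections import defaultdict

# Assumed "combined" log format: ip, identity, user, [time], "request", status, bytes,
# "referrer", "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<resource>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Requests for embedded objects are assumed not to reflect user intent.
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean(lines):
    """Data cleaning: keep only successful GET requests for pages."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # malformed line
        rec = match.groupdict()
        path = rec["resource"].split("?", 1)[0].lower()
        if rec["method"] != "GET":
            continue
        if not rec["status"].startswith("2"):
            continue  # failed or redirected request
        if path.endswith(STATIC_SUFFIXES):
            continue  # embedded object, not a page view
        records.append(rec)
    return records


def identify_users(records):
    """User identification: group page requests by (IP address, user agent)."""
    users = defaultdict(list)
    for rec in records:
        users[(rec["ip"], rec["agent"])].append(rec["resource"])
    return dict(users)
```

Grouping by IP address and user agent is a common heuristic when no login or cookie data is available; it can merge distinct users behind a shared proxy, so the result should be read as an approximation.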

Estimation of Water Quality of Fish Farms using Multivariate Statistical Analysis

  • Ceong, Hee-Taek;Kim, Hae-Ran
    • Journal of information and communication convergence engineering / v.9 no.4 / pp.475-482 / 2011
  • In this research, we attempted to estimate the water quality of fish farms in terms of parameters such as water temperature, dissolved oxygen, pH, and salinity by using observational data from a coastal ocean observatory of a national institution located close to the fish farm. We requested and received marine data comprising nine factors, including water temperature, from the Korea Hydrographic and Oceanographic Administration. To verify our results, we also established an experimental fish farm in which we placed the sensor module of an optical mode, YSI-6920V2, used for self-cleaning, directly inside the fish tanks, and we used the data measured and recorded by an environment monitoring system that communicated serially with the sensor module. We investigated the differences in water temperature and salinity among three areas: Goheung Balpo, Yeosu Odongdo, and the experimental fish farm, Keumho. Water temperature did not differ significantly, but salinity did (significance <5%). Further, multiple regression analysis was performed to estimate the water quality of the fish farm at Keumho based on the data from Goheung Balpo. The water temperature and dissolved-oxygen estimates had multiple linear regression relationships with coefficients of determination of 98% and 89%, respectively. However, for pH and salinity estimated from the nine oceanic environment factors, the adjusted coefficient of determination was below 10%, so these values were difficult to predict. We plotted the predicted and measured values using the estimated regression equation and found that they fit very well; the values were close to the regression line. We have demonstrated that if well-fitting statistical model equations are used, the expense of installing, maintaining, and repairing fish-farm sensors and systems, which is a major issue with existing environmental monitoring systems in marine farming areas, can be reduced, making it easier for fish farmers to monitor aquaculture and mariculture environments.

Screening of Workers with Presumed Occupational Methanol Poisoning: The Applicability of a National Active Occupational Disease Surveillance System

  • Eom, Huisu;Lee, Jihye;Kim, Eun-A
    • Safety and Health at Work / v.10 no.3 / pp.265-274 / 2019
  • Background: Methyl alcohol poisoning in mobile phone-manufacturing factories during 2015-2016 was caused by methyl alcohol use for cleaning in computerized numerical control (CNC) processes. To determine whether there were health complications in other workers involved in similar processes, the Occupational Safety and Health Research Institute conducted a survey. Methods: We established a national active surveillance system by collaborating with the Ministry of Employment and Labor and National Health Insurance Service. Employment and national health insurance data were used. Overall, 12,048 employees of major domestic mobile phone companies and CNC process dispatch workers were surveyed from 2016 to 2017. We investigated methyl alcohol poisoning by using the national health insurance data. Questionnaires were used to investigate diseases due to methyl alcohol poisoning. Results: Overall, 24.9% of dispatched workers were employed in at least five companies, and 23.9% of dispatched workers had missing employment insurance history data. The prevalence of blindness including visual impairment, optic neuritis, visual disturbances, and alcohol toxicity in the study participants was higher than that reported in the national health insurance database (0.02%, 0.07%, 0.23%, and 0.03% versus 0.01%, 0.07%, 0.13%, and 0.01%, respectively, in 2015). Moreover, 430 suspicious workers were identified; 415 of these provided an address and phone number, of whom 48 responded (response rate, 11.6%). Among the 48 workers, 10 had diseases at the time of the survey, of whom 3 workers were believed to have diseases related to methyl alcohol exposure. Conclusion: This study revealed that active surveillance data can be used to assess health problems related to methyl alcohol poisoning in CNC processes and dispatch workers.

A Study on Energy Usage Monitoring and Saving Method in the Sewage Treatment Plant (공공하수처리시설에서 에너지 사용현황 및 절감방안 연구)

  • Kim, Jongrack;Rhee, Gahee;You, Kwangtae;Kim, Dongyoun;Lee, Hosik
    • Journal of Korean Society on Water Environment / v.36 no.6 / pp.535-545 / 2020
  • This study aims to monitor and conserve energy use in public sewage treatment plants by utilizing data from the SCADA system and by controlling the aeration rate required to maintain effluent water quality. Power consumption in the sewage treatment process was predicted from each piece of equipment's uptime, efficiency, and inherent power consumption. The predicted energy consumption was calibrated against measured data. Additionally, energy efficiency indicators were proposed based on statistical data for energy use, capacity, and effluent quality. In one case study, a sewage treatment plant operated via the SBR process used ~30% of its energy in maintaining the bioreactors and treated water tanks (including decanting pumps and cleaning systems). Energy consumption analysis by unit process was conducted with the K-ECO Tool-kit. The results showed that about 58.7% of total energy was consumed by rotating equipment, such as blowers and pumps, in preliminary and biological treatment. The remaining consumption was, in descending order, 19.2% in the phosphorus removal process, 16.0% in sludge treatment, and 6.1% in disinfection and discharge. In terms of equipment energy usage, feeding and decanting pumps accounted for 40% of total energy consumed, followed by blowers at 27%. By controlling the aeration rate with the proposed feedback control system, the DO concentration was reduced by 56% compared with pre-control levels and the aeration amount decreased by 28%. The overall power consumption of the plant was reduced by 6% via aeration control.

Status and Quality Analysis on the Biodiversity Data of East Asian Vascular Plants Mobilized through the Global Biodiversity Information Facility (GBIF) (세계생물다양성정보기구(GBIF)에 출판된 동아시아 관속식물 생물다양성 정보 현황과 자료품질 분석)

  • Chang, Chin-Sung;Kwon, Shin-Young;Kim, Hui
    • Journal of Korean Society of Forest Science / v.110 no.2 / pp.179-188 / 2021
  • Biodiversity informatics applies information technology methods to organizing, accessing, visualizing, and analyzing primary biodiversity data, and to quantitative data management through scientific names, both accepted names and synonyms. We reviewed the GBIF data published by China, Japan, Taiwan, and internal institutes, such as NIBR, NIE, and KNA of the Republic of Korea, and assessed diverse aspects of data quality using BRAHMS software. Most data from the four Asian countries have quality problems, including a lack of data consistency, missing information on georeferenced data, collectors, collection dates, and place names (gazetteers), and other invalid data forms. The major problem is that biodiversity management institutions in East Asia are using unstructured databases and simple spreadsheet-type data. Owing to the nature of biodiversity information, if data relationships are not structured, it is impossible to secure the data integrity of scientific names, human names, geographical names, literature, and ecological information. For data quality, it is essential to build data integrity into database management and to establish training systems for taxonomists, who act as continuous data managers, so that they can correct errors. Thus, publishers in East Asia play an essential role not only in using specialized software to manage biodiversity data but also in developing structured databases and ensuring their integration and value within biodiversity publishing platforms.

Direction of Program Development for Supporting U-turn Farmers' Rural Settlement (귀농자들의 농촌정착지원을 위한 프로그램 개발 방향)

  • Kim, Sung-Soo;Cheong, Ji-Woong;Lim, Hyung-Baek;Koh, Woon-Mee;Kim, Jung-Tae;Lee, Sung
    • Journal of Agricultural Extension & Community Development / v.11 no.1 / pp.53-65 / 2004
  • The purpose of this study was to provide information for developing educational programs for U-turn farmers based on their needs in rural settlement. The specific objectives of the study were: 1) to survey the general characteristics of U-turn farmers, including motives, preparation, and education; 2) to investigate the problems and difficulties of U-turn farmers in rural settlement; 3) to identify the reasons for success and failure in U-turn farming; and 4) to provide information for developing programs for U-turn farmers. Data for the study were collected from 526 U-turn farmers throughout the country, and after data cleaning, 494 questionnaires were used for data analysis. Based on the results of this study, the following were recommended for further development of U-turn farming programs: 1) Continuous surveys on the motives, preparation, education, and information of U-turn farmers should be facilitated and expanded to keep important information on U-turn farming current. 2) Further examination of the problems and difficulties of U-turn farmers is needed to develop appropriate policies and educational programs for U-turn farming. 3) Continuous investigation of the reasons for success and failure in U-turn farming is needed to develop appropriate agricultural policies. 4) For more effective educational programs for U-turn farmers, the selection of educators, institutions, curricula, timing, etc. should be carefully designed to meet the practical needs of U-turn farmers. 5) More research activities should be encouraged to improve program development and the implementation of U-turn farming.


A Study on Analysis of the Trend of Blockchain by Key Words Network Analysis (키워드 네트워크 분석 방법을 활용한 블록체인 트렌드 분석에 관한 연구)

  • Cho, Seong-Hwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.550-555 / 2018
  • This study aims to identify and compare the contents and keywords used in articles about blockchain applications in various industries. Text mining and semantic network analysis, as methods of keyword network analysis, were used to analyze articles containing the terms 'finance', 'energy', and 'logistics', which the media and government frequently mention as areas where blockchain technologies can be applied. For this study, data were collected from 43,093 articles published from January 2017 through July 2018. Data crawling was carried out using Python BeautifulSoup, and data cleaning was performed to eliminate mutual redundancies among the three terms. After that, text mining and semantic network analysis were performed using Textom and UCInet to analyze the network between keywords. The results showed that the three terms were similar with respect to 'technology', but differed in the content of 'government policy' and 'industry' issues. In addition, there were differences in the frequencies and centralities of these terms.
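
The abstract does not say how the "mutual redundancies" among the three terms were removed. One plausible reading is that an article matching more than one search term should be counted only once. The sketch below illustrates that reading only; the `deduplicate` function, the dictionary layout, and the use of a normalized title as the matching key are all assumptions, not the author's method.

```python
def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(title.lower().split())


def deduplicate(articles_by_term: dict) -> dict:
    """Keep each article under the first search term where it appears.

    articles_by_term maps a search term to a list of article dicts, each with
    at least a "title" key (hypothetical layout for crawled results).
    """
    seen = set()
    unique = {}
    for term, articles in articles_by_term.items():
        kept = []
        for article in articles:
            key = normalize_title(article["title"])
            if key in seen:
                continue  # already collected under this or an earlier term
            seen.add(key)
            kept.append(article)
        unique[term] = kept
    return unique
```

Keeping the first occurrence depends on the order in which the terms are processed; dropping every article that matches more than one term would be an equally defensible choice under the same wording.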

Study on the Speed-Power Characteristics Through a Speed Trial of a Large Container Vessel During a Commercial Voyage Part I (상업 운항 중인 대형 컨테이너선의 항차 중 속력 시운전을 통한 선속-동력 특성 연구 Part I)

  • Kim, Ho;Lee, Joon-Hyoung;Jang, Jin-Ho;Ahn, Hae-Seong;Kang, Dae-Youl;Byeon, Sang-Su
    • Journal of the Society of Naval Architects of Korea / v.58 no.6 / pp.366-374 / 2021
  • This paper presents an analysis of speed-power performance in real seas using data from a large container vessel provided as a test bed by a shipping company. To perform a speed trial of the vessel during a commercial voyage, an on-board measuring device and various operation data acquisition systems were mounted on the vessel for long-term performance monitoring, and a voyage operated under a container loading condition close to the design draft was selected. The work consists of Part I and Part II. Part I, this paper, contains the speed trial method and the analysis results for the operating vessel. Part II contains the analysis of how the speed-power characteristics change over time and before and after hull cleaning, using operation data measured on voyages operated under conditions similar to the speed trial.