• Title/Summary/Keyword: Data Set Comparing

The Parallel Corpus Approach to Building the Syntactic Tree Transfer Set in the English-to-Vietnamese Machine Translation

  • Dien Dinh;Ngan Thuy;Quang Xuan;Nam Chi
    • Proceedings of the IEEK Conference / summer / pp.382-386 / 2004
  • Recently, following the machine learning trend, most machine translation systems around the world use two syntactic tree sets of the two relevant languages to learn syntactic tree transfer rules. For the English-Vietnamese language pair, however, this approach has not been possible, because until now no Vietnamese syntactic tree set corresponding to an English one has existed. Building a very large corresponding Vietnamese syntactic tree set (thousands of trees) requires a great deal of work and the involvement of linguistics specialists. To take advantage of our available English-Vietnamese Corpus (EVC), which is tagged with word alignments, we chose the SITG (Stochastic Inversion Transduction Grammar) model to construct English-Vietnamese syntactic tree sets automatically. This model parses both languages at the same time and then carries out the syntactic tree transfer. The resulting English-Vietnamese bilingual syntactic tree set is the basic training data for automatically transferring English syntactic trees to Vietnamese ones with machine learning models. We tested the syntactic analysis on over 10,000 of the 500,000 sentences in our English-Vietnamese bilingual corpus, and the first stage gave encouraging results (about 80% analyzed) [5]. We have used the TBL (Transformation-Based Learning) algorithm to carry out automatic transformations from English syntactic trees to Vietnamese ones based on that parallel syntactic tree transfer set [6].
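
The core idea of inversion transduction grammar, combining two sub-trees either in the same order on both sides (straight) or in swapped order (inverted), can be illustrated without probabilities. The sketch below is a hypothetical, deterministic simplification and not the SITG model itself: it assumes a one-to-one word alignment given as a list of target positions and greedily builds a binary tree, writing straight nodes as `[a b]` and inverted nodes as `<a b>`. English adjective-noun order, for example, is reversed in Vietnamese ("red house" vs. "nhà đỏ"), which shows up as an inverted node. The function name `itg_bracket` is illustrative only.

```python
from typing import List, Optional, Tuple

# A stack entry covers a contiguous block of target positions [lo, hi]
# together with the bracketed tree built so far for that block.
Entry = Tuple[int, int, str]


def itg_bracket(words: List[str], target_pos: List[int]) -> Optional[str]:
    """Build an ITG-style bracketing for a one-to-one word alignment.

    target_pos[i] is the 0-based target position aligned to words[i].
    Straight nodes are written [a b], inverted nodes <a b>.
    Returns None when the reordering cannot be expressed with binary
    straight/inverted combinations (e.g. the pattern 2-4-1-3).
    """
    stack: List[Entry] = []
    for word, pos in zip(words, target_pos):
        stack.append((pos, pos, word))
        # Merge the two topmost blocks while they are adjacent on the target side.
        while len(stack) >= 2:
            lo2, hi2, t2 = stack[-1]
            lo1, hi1, t1 = stack[-2]
            if hi1 + 1 == lo2:  # same order on both sides -> straight
                merged = (lo1, hi2, f"[{t1} {t2}]")
            elif hi2 + 1 == lo1:  # order swapped on the target side -> inverted
                merged = (lo2, hi1, f"<{t1} {t2}>")
            else:
                break
            stack[-2:] = [merged]
    return stack[0][2] if len(stack) == 1 else None


# "red house" -> "nhà đỏ": the adjective moves after the noun.
print(itg_bracket(["red", "house"], [1, 0]))
```

A stochastic ITG would instead score all possible bracketings with rule probabilities and pick the most likely one; the greedy version only shows which reorderings the grammar can express.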

The Integrated Methodology of Rough Set Theory and Artificial Neural Network for Business Failure Prediction (도산 예측을 위한 러프집합이론과 인공신경망 통합방법론)

  • Kim, Chang-Yun;Ahn, Byeong-Seok;Cho, Sung-Sik;Kim, Soung-Hie
    • Asia pacific journal of information systems / v.9 no.4 / pp.23-40 / 1999
  • This paper proposes a hybrid intelligent system that predicts the failure of firms from past financial performance data by combining a neural network with the rough set approach. Through the rough set approach we obtain a reduced information table, in which the number of evaluation criteria (such as financial ratios and qualitative variables) and of objects (i.e., firms) is reduced with no loss of information. This reduced information is then used to develop classification rules and to train the neural network to infer appropriate parameters. The reduction of the information table is expected to improve the performance of the neural network. The rules developed by rough sets show the best prediction accuracy when a case matches one of the rules. The rationale of our hybrid system is therefore to use the rough set rules for an object that matches any of them and the neural network for an object that matches none of them. The effectiveness of our methodology was verified by experiments comparing traditional discriminant analysis and a neural network approach with our hybrid approach. For the experiments, the financial data of 2,400 Korean firms from the period 1994-1996 were selected, and k-fold validation was used.
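
The routing rule described in this abstract (rough set rules first, neural network as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions: rules are hypothetical attribute-condition dictionaries of the kind a rough set reduct might yield, `fallback` is a placeholder for the trained neural network, and all attribute names and values are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# A rule maps attribute names to required (discretized) values, plus a class label.
Rule = Tuple[Dict[str, str], str]


def matches(conditions: Dict[str, str], case: Dict[str, str]) -> bool:
    """True if every condition of the rule holds for the case."""
    return all(case.get(attr) == value for attr, value in conditions.items())


def hybrid_predict(case: Dict[str, str], rules: List[Rule],
                   fallback: Callable[[Dict[str, str]], str]) -> Tuple[str, str]:
    """Predict with the first matching rule; otherwise defer to the fallback model.

    Returns (label, source) so callers can see which component decided.
    """
    for conditions, label in rules:
        if matches(conditions, case):
            return label, "rule"
    return fallback(case), "model"


# Hypothetical rules over discretized financial ratios.
rules: List[Rule] = [
    ({"debt_ratio": "high", "roa": "negative"}, "fail"),
    ({"debt_ratio": "low", "roa": "positive"}, "survive"),
]


def fallback(case: Dict[str, str]) -> str:
    """Placeholder for the trained neural network (hypothetical logic)."""
    return "survive" if case.get("current_ratio") == "high" else "fail"


print(hybrid_predict({"debt_ratio": "high", "roa": "negative"}, rules, fallback))
print(hybrid_predict({"debt_ratio": "medium", "current_ratio": "high"}, rules, fallback))
```

Taking the first matching rule is a simplifying choice; the abstract does not say how conflicting rules are resolved.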

A Study on Reliability Analysis According to the Number of Training Data and the Number of Training (훈련 데이터 개수와 훈련 횟수에 따른 과도학습과 신뢰도 분석에 대한 연구)

  • Kim, Sung Hyeock;Oh, Sang Jin;Yoon, Geun Young;Kim, Wan
    • Korean Journal of Artificial Intelligence / v.5 no.1 / pp.29-37 / 2017
  • The range of problems that can be handled has expanded rapidly with the growth of big data and the development of hardware, and machine learning such as deep learning has become a very versatile technology. In this paper, the MNIST data set is used as experimental data, and the cross-entropy function is used as the loss function for evaluating the efficiency of machine learning. We applied the Gradient Descent Optimize algorithm to minimize the value of the loss function by the steepest descent method, and updated the weights and biases via backpropagation. In this way, we analyze the optimal reliability value corresponding to the number of training iterations and the optimal reliability value without overfitting. Comparing when overfitting occurs as the amount of training data changes, relative to the number of training iterations, we obtained 92% as the optimal reliability value without overfitting at 1,110 training iterations.
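
The training loop described in this abstract (cross-entropy loss, gradient descent, and watching for the onset of overfitting) can be sketched in plain Python. This is a simplified stand-in rather than the authors' setup: it assumes binary logistic regression on a tiny hypothetical data set instead of MNIST, computes gradients analytically instead of through a deep learning framework, and treats the iteration with the lowest validation loss as the last point before overfitting.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy(w: List[float], b: float, xs: List[List[float]], ys: List[int]) -> float:
    """Mean binary cross-entropy loss."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def train(train_x, train_y, val_x, val_y, lr: float = 0.1,
          iterations: int = 200) -> Tuple[float, int, List[float]]:
    """Full-batch gradient descent that records validation loss per iteration.

    Returns (best_val_loss, best_iteration, history).
    """
    w = [0.0] * len(train_x[0])
    b = 0.0
    history: List[float] = []
    best_loss, best_iter = float("inf"), 0
    n = len(train_x)
    for it in range(1, iterations + 1):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(train_x, train_y):
            err = predict(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        val_loss = cross_entropy(w, b, val_x, val_y)
        history.append(val_loss)
        if val_loss < best_loss:  # a later rise in val_loss signals overfitting
            best_loss, best_iter = val_loss, it
    return best_loss, best_iter, history


best_loss, best_iter, history = train(
    [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1],
    [[0.5], [2.5]], [0, 1],
)
print(best_iter, round(best_loss, 4))
```

Repeating the run with different training-set sizes and comparing `best_iter` gives the kind of "overfitting time versus amount of data" comparison the abstract describes.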

Characteristics of the Extratropical Transition of Tropical Cyclones over the Western North Pacific using the Cyclone Phase Space (CPS) Diagram (북서태평양에서 저기압 위상 공간도법을 이용한 태풍의 온대저기압화 특성 분석)

  • Lee, Ji-Yun;Park, Jong-Suk;Kang, KiRyong;Chung, Kwan-Young
    • Atmosphere / v.18 no.3 / pp.159-169 / 2008
  • The characteristics of the extratropical transition (ET) of typhoons over the western North Pacific were investigated using the cyclone phase space (CPS) diagram method suggested by Hart (2003). The data used in this study were the Global Data Assimilation Prediction System (GDAPS) and NCEP data sets. A total of 75 typhoon cases from 2002 to 2007 were selected, and three parameters were analyzed: the storm-motion-relative thickness asymmetry (B), the upper thermal wind shear, and the lower thermal wind shear. Compared with the best-track data provided by the Regional Specialized Meteorological Center (RSMC) Tokyo, the ET time based on CPS was 2-6 hours earlier. It was also shown that, for the CPS method, a 400 km radius and the storm's 30 kt wind radius gave better agreement than the previously suggested 500 km radius.
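
The CPS parameters named in this abstract can be computed as sketched below, following the commonly cited Hart (2003) definitions as we understand them: B is the difference in 900-600 hPa thickness between the right and left semicircles relative to storm motion (sign flipped in the Southern Hemisphere), and each thermal wind parameter is the least-squares slope of ΔZ = Zmax - Zmin within the search radius against ln p (900-600 hPa for the lower layer, 600-300 hPa for the upper layer). The function names, and the assumption that semicircle means and ΔZ values have already been extracted from the gridded GDAPS or NCEP fields, belong to this sketch; the radius choice (500 km versus 400 km or the 30 kt wind radius) only affects that extraction step.

```python
import math
from typing import Dict


def thickness_asymmetry(z600_right: float, z900_right: float,
                        z600_left: float, z900_left: float,
                        northern_hemisphere: bool = True) -> float:
    """Storm-motion-relative 900-600 hPa thickness asymmetry B (m).

    Inputs are semicircle-mean geopotential heights (m) to the right and
    left of the storm's direction of motion.
    """
    h = 1.0 if northern_hemisphere else -1.0
    return h * ((z600_right - z900_right) - (z600_left - z900_left))


def thermal_wind(delta_z: Dict[float, float]) -> float:
    """Least-squares slope of ΔZ (m) against ln p over the given levels.

    delta_z maps pressure (hPa) -> Zmax - Zmin within the search radius.
    Positive values indicate a warm core in that layer, negative a cold core.
    """
    xs = [math.log(p) for p in delta_z]
    ys = list(delta_z.values())
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical lower layer (900-600 hPa) with ΔZ growing toward the surface.
lower = {900.0: 120.0, 800.0: 105.0, 700.0: 88.0, 600.0: 70.0}
print(thickness_asymmetry(5000.0, 1000.0, 4980.0, 1000.0))
print(thermal_wind(lower) > 0)
```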

Influence on overfitting and reliability due to change in training data

  • Kim, Sung-Hyeock;Oh, Sang-Jin;Yoon, Geun-Young;Jung, Yong-Gyu;Kang, Min-Soo
    • International Journal of Advanced Culture Technology / v.5 no.2 / pp.82-89 / 2017
  • The range of problems that can be handled has expanded rapidly with the growth of big data and the development of hardware, and machine learning such as deep learning has become a very versatile technology. In this paper, the MNIST data set is used as experimental data, and the cross-entropy function is used as the loss function for evaluating the efficiency of machine learning. We applied the GradientDescentOptimize algorithm to minimize the value of the loss function by the steepest descent method, and updated the weights and biases via backpropagation. In this way, we analyze the optimal reliability value corresponding to the number of training iterations and the optimal reliability value without overfitting. Comparing when overfitting occurs as the amount of training data changes, relative to the number of training iterations, we obtained 92% as the optimal reliability value without overfitting at 1,110 training iterations.

Vegetation Change Detection in the Sihwa Embankment using Multi-Temporal Satellite Data (다중시기 위성영상을 이용한 시화 방조제 내만 식생변화탐지)

  • Jeong, Jong-Chul;Suh, Young-Sang;Kim, Sang-Wook
    • Journal of Environmental Science International / v.15 no.4 / pp.373-378 / 2006
  • The western coast of South Korea is famous for its large, broad tidal flats. However, large-scale land reclamation, such as the Sihwa embankment construction project, has accelerated coastal environmental changes inside the embankment. To monitor these environmental changes, vegetation change detection was carried out inside the embankment, and field survey data were compared with Landsat TM, ETM+, IKONOS, and EOC satellite remotely sensed data. To use the multi-temporal remotely sensed images effectively, all data sets were processed with the same geometric correction method and pixel size. To detect tidal-flat vegetation change, the spectral characteristics and spatial resolution of the Landsat TM and ETM+ images were analyzed by SMA (spectral mixture analysis). We obtained a classification accuracy of 78.96% and a kappa index of 0.2376 using the March 2000 Landsat data. The SMA results were also compared with a seasonal vegetation change detection method.
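
Linear spectral mixture analysis models each pixel spectrum as a fraction-weighted sum of pure endmember spectra. The sketch below is a deliberately reduced, hypothetical case: two endmembers (for example vegetation and bare tidal flat) with fractions constrained to sum to one, which yields a closed-form least-squares solution. The endmember reflectances are invented, and an operational SMA on Landsat TM/ETM+ data would typically use more endmembers and all reflective bands.

```python
from typing import List, Tuple


def unmix_two(pixel: List[float], veg: List[float], flat: List[float]) -> Tuple[float, float]:
    """Two-endmember linear unmixing with fractions summing to one.

    Model (band by band): pixel ≈ f * veg + (1 - f) * flat.
    Returns (f, rmse): the least-squares vegetation fraction (not clipped
    to [0, 1]) and the root-mean-square residual over the bands.
    """
    d = [v - s for v, s in zip(veg, flat)]    # endmember difference
    r = [p - s for p, s in zip(pixel, flat)]  # pixel relative to the 'flat' endmember
    f = sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)
    residual = [ri - f * di for ri, di in zip(r, d)]
    rmse = (sum(e * e for e in residual) / len(residual)) ** 0.5
    return f, rmse


# Hypothetical reflectances in (red, near-infrared).
veg = [0.05, 0.45]
flat = [0.15, 0.20]
pixel = [0.25 * v + 0.75 * s for v, s in zip(veg, flat)]
print(unmix_two(pixel, veg, flat))
```

Mapping `f` for each date and differencing the fraction images is one simple way to turn unmixing results into a vegetation change map.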

Development of tool-life prediction program to determine the optimal machining conditions in mold machining (금형 가공 시 최적 가공조건을 결정하기 위한 공구수명 예측 프로그램 개발)

  • Soon-Ok Park;Min-Hak Kim;Sun-Kyung Lee;Sung-Taek Jung
    • Design & Manufacturing / v.17 no.1 / pp.7-12 / 2023
  • Recently, with the emergence of the 4th industrial revolution, demand for smart factories and factory automation has been increasing. In this study, a tool life prediction program was developed to select optimal machining conditions for CNC milling equipment, which is widely used in flexible production and automation. The experiments used Hwacheon Machine Tool's 5-axis machining equipment with a 17F2R tool. Down-milling was selected as the cutting method for the machining path, and long-term machining was performed. The criterion for side wear on the tool was set at 0.1 to 0.2 mm, and tool life data and wear data were obtained in the cutting experiments. The program was built from the data obtained in the experiments, and a prediction rate of over 90% was achieved when the experimental data were compared with the predicted data.
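
The abstract does not state which model the program uses. As a hedged illustration only, the sketch below shows two common building blocks for this kind of program: reading tool life off a measured wear curve as the time at which side wear first reaches a chosen criterion (for example 0.2 mm, the upper end of the range given above), and fitting Taylor's tool life equation V·T^n = C to (cutting speed, tool life) pairs by least squares in log space. The use of Taylor's equation, the function names, and all numbers are assumptions of this sketch, not the authors' method or data.

```python
import math
from typing import List, Optional, Tuple


def life_at_wear(times: List[float], wear: List[float], criterion: float) -> Optional[float]:
    """Machining time at which side wear first reaches `criterion` (mm).

    Uses linear interpolation between measurements; returns None if the
    criterion is never reached.
    """
    if wear and wear[0] >= criterion:
        return times[0]
    for i in range(1, len(times)):
        w0, w1 = wear[i - 1], wear[i]
        if w0 < criterion <= w1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (criterion - w0) * (t1 - t0) / (w1 - w0)
    return None


def fit_taylor(speeds: List[float], lives: List[float]) -> Tuple[float, float]:
    """Fit V * T**n = C, i.e. ln V = ln C - n ln T, by least squares.

    Returns (n, C).
    """
    xs = [math.log(t) for t in lives]
    ys = [math.log(v) for v in speeds]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope, math.exp(my - slope * mx)


# Hypothetical wear curve (minutes, mm) and speed/life pairs (m/min, minutes).
print(life_at_wear([0, 10, 20, 30], [0.0, 0.08, 0.15, 0.25], 0.2))
print(fit_taylor([100.0, 150.0, 200.0], [81.0, 16.0, 5.0625]))
```

Once `n` and `C` are known, the tool life at a candidate cutting speed follows from T = (C / V)^(1/n), which is the kind of quantity an optimal-condition search would compare.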

Predicting nutrient excretion from dairy cows on smallholder farms in Indonesia using readily available farm data

  • Al Zahra, Windi;van Middelaar, Corina E.;de Boer, Imke J.M;Oosting, Simon J.
    • Asian-Australasian Journal of Animal Sciences / v.33 no.12 / pp.2039-2049 / 2020
  • Objective: This study was conducted to provide models to accurately predict nitrogen (N) and phosphorus (P) excretion of dairy cows on smallholder farms in Indonesia based on readily available farm data. Methods: The generic model in this study is based on the principles of the Lucas equation, which describes the relation between dry matter intake (DMI) and faecal N excretion, to predict the quantity of faecal N (QFN). Excretion of urinary N and faecal P was calculated based on National Research Council recommendations for dairy cows. A farm survey was conducted to collect input parameters for the models. The data set was used to calibrate the model to predict QFN for this specific case. The model was validated by comparing the predicted quantity of faecal N with the actual quantity of faecal N (QFNACT) based on measurements, and the calibrated model was compared to the Lucas equation. The models were used to predict N and P excretion of all 144 dairy cows in the data set. Results: Our estimate of true N digestibility equalled the standard value of 92% in the original Lucas equation, whereas our estimate of metabolic faecal N was -0.60 g/100 g DMI, compared with the standard value of -0.61 g/100 g DMI. Model validation gave an R² of 0.63, an MAE of 15 g/animal/d (17% of QFNACT), and an RMSE of 20 g/animal/d (22% of QFNACT). We predicted that the total N excretion of dairy cows in Indonesia was on average 197 g/animal/d, whereas P excretion was on average 56 g/animal/d. Conclusion: The proposed models can be used with reasonable accuracy to predict N and P excretion of dairy cattle on smallholder farms in Indonesia, which can contribute to improving manure management and reducing environmental issues related to nutrient losses.
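
The Lucas-equation step can be made concrete. Assuming the usual Lucas form, apparently digested N (g/100 g DM) equals true N digestibility times dietary N content plus the (negative) metabolic faecal N term, so faecal N per 100 g DMI is the N content minus the apparently digested N. The sketch applies this with the standard values quoted above (0.92 and -0.61 g/100 g DMI) or the calibrated ones (0.92 and -0.60), and adds the validation metrics reported (R², MAE, RMSE). The function names and the example intake are hypothetical, and R² is computed here as 1 - SS_res/SS_tot, which may differ from how the authors computed it.

```python
import math
from typing import List, Tuple


def faecal_n(dmi_kg: float, n_pct: float, true_dig: float = 0.92, mfn: float = -0.61) -> float:
    """Predicted faecal N (g/animal/d) from DM intake (kg/d) and diet N (% of DM)."""
    digested_per_100g = true_dig * n_pct + mfn      # apparently digested N, g/100 g DM
    faecal_per_100g = n_pct - digested_per_100g     # faecal N, g/100 g DMI
    return faecal_per_100g * dmi_kg * 10.0          # 1 kg DM = 10 x 100 g


def validation_metrics(pred: List[float], actual: List[float]) -> Tuple[float, float, float]:
    """Return (R², MAE, RMSE) of predictions against measured values."""
    n = len(actual)
    errors = [p - a for p, a in zip(pred, actual)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return r2, mae, rmse


# Hypothetical cow: 12 kg DMI per day, diet with 2.0% N in DM.
print(round(faecal_n(12.0, 2.0), 1))              # standard Lucas values
print(round(faecal_n(12.0, 2.0, mfn=-0.60), 1))   # calibrated values from this study
```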

Effect of different arch widths on the accuracy of three intraoral scanners

  • Kaewbuasa, Narin;Ongthiemsak, Chakree
    • The Journal of Advanced Prosthodontics / v.13 no.4 / pp.205-215 / 2021
  • PURPOSE. The purpose of this study was to compare the accuracy of three intraoral scanner (IOS) systems across three different dental arch widths. MATERIALS AND METHODS. Three dental models with different intermolar widths (small, medium, and large) were attached to metal bars of different lengths (30, 40, and 50 mm). The bars were measured with a coordinate measuring machine and used as references. Three IOSs were compared: TRIOS 3 (TRI), True Definition (TD), and Dental Wings (DW). The relative length deviation and angular deviation of both ends of the metal bars were calculated from the scan data sets (n = 15) and analyzed. RESULTS. Comparing scanners in terms of trueness, the relative length deviation of DW in the small (1.28%) and medium (1.08%) arches was significantly higher than that of TRI (0.46% and 0.48%) and TD (0.33% and 0.18%). The angular deviation of DW in the small (1.75°) and medium (1.83°) arches was also significantly greater than that of TRI (0.63° and 0.40°) and TD (0.55° and 0.89°). Comparing within each scanner, the large arch of DW showed better accuracy than the other arch sizes (P < .05). In contrast, the larger arch of TD tended toward greater angular deviation in terms of trueness. No significant differences in trueness were found among the arch widths in the TRI group. CONCLUSION. Dental arch width can affect the accuracy of some intraoral scanners in full-arch scans.
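
The two deviation measures named in this abstract can be computed as follows. The abstract does not give the exact formulas, so this is a sketch under the assumption that relative length deviation is |L_scan - L_ref| / L_ref × 100 against the coordinate-measuring-machine reference length, and angular deviation is the angle between the scanned and reference bar direction vectors. The function names and coordinates are hypothetical.

```python
import math
from typing import Sequence


def relative_length_deviation(scan_len: float, ref_len: float) -> float:
    """Absolute length error as a percentage of the reference (CMM) length."""
    return abs(scan_len - ref_len) / ref_len * 100.0


def angular_deviation(scan_vec: Sequence[float], ref_vec: Sequence[float]) -> float:
    """Angle in degrees between the scanned and reference bar axes."""
    dot = sum(a * b for a, b in zip(scan_vec, ref_vec))
    norm = math.sqrt(sum(a * a for a in scan_vec)) * math.sqrt(sum(b * b for b in ref_vec))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_angle))


# Hypothetical 40 mm bar scanned as 40.2 mm and tilted by 1 degree.
tilted = [math.cos(math.radians(1.0)), math.sin(math.radians(1.0)), 0.0]
print(relative_length_deviation(40.2, 40.0))
print(angular_deviation(tilted, [1.0, 0.0, 0.0]))
```

Trueness would then be summarized per scanner and arch width by averaging these values over the 15 scans.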

Comparing Survival Functions with Doubly Interval-Censored Data: An Application to Diabetes Surveyed by Korean Cancer Prevention Study (이중구간중도절단된 생존자료의 생존함수 비교를 위한 검정: 한국인 암 예방연구 중 당뇨병에의 응용)

  • Jee, Sun-Ha;Nam, Chung-Mo;Kim, Jin-Heum
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.595-606 / 2009
  • Two tests were introduced for comparing several survival functions with doubly interval-censored data and illustrated with data surveyed by the Korean Cancer Prevention Study (Jee et al., 2005). The proposed test, which extends the test of Kim et al. (2006) to doubly interval-censored data, has an advantage over the test of Sun (2006) in computation time because it depends only on the size of the risk set; unlike Sun's test, it is also applicable to continuous as well as discrete failure time data. The incubation time of diabetes differed markedly between the male and female groups, with the female group showing longer survival than the male group. Regardless of gender, the difference in survival functions among the four age groups was highly significant (p-value < 0.001). This trend was more pronounced for the female group than for the male group. Simulation results showed that the significance level of both tests was well controlled and that the proposed test was more powerful than Sun's test.
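
Both tests in this paper compare groups through risk sets. As a point of reference only, the sketch below computes the ordinary two-sample log-rank statistic for right-censored data, the classical risk-set-based comparison of survival functions. It does not handle doubly interval-censored data and is neither the proposed test nor Sun's (2006) test; the data layout (times, event indicators, 0/1 group labels) is an assumption of the sketch.

```python
import math
from typing import List, Tuple


def logrank(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float, float]:
    """Two-sample log-rank test for right-censored data.

    events: 1 = failure observed, 0 = censored; groups coded 0/1.
    Returns (observed - expected failures in group 1, variance, p-value),
    with the p-value from a chi-square distribution on 1 degree of freedom.
    """
    o_minus_e = 0.0
    var = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]  # risk set at time t
        n = len(at_risk)
        n1 = sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; both groups must contribute to the risk sets")
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return o_minus_e, var, p_value


# Hypothetical data: group 1 fails early, group 0 later; one censored time in group 0.
print(logrank([1, 2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1, 0], [1, 1, 1, 0, 0, 0, 0]))
```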