1 |
S.E. Whang and J.-G. Lee, "Data Collection and Quality Challenges for Deep Learning," Proceedings of the VLDB Endowment 13.12, pp. 3429-3431, 2020.
|
2 |
E. Caveness, et al., "Tensorflow Data Validation: Data Analysis and Validation in Continuous ML Pipelines," Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. pp. 2793-2794, 2020.
|
3 |
A. Jain, et al., "Overview and Importance of Data Quality for Machine Learning Tasks," Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 3561-3562, 2020.
|
4 |
A. Paleyes, R.-G. Urma, and N.D. Lawrence, "Challenges in Deploying Machine Learning: A Survey of Case Studies," arXiv preprint, arXiv:2011.09926, pp. 1-3, pp. 15-16, 2020.
|
5 |
T. Rukat, et al., "Towards Automated ML Model Monitoring: Measure, Improve and Quantify Data Quality," ML Ops Workshop at the Conference on Machine Learning and Systems (MLSys), pp. 1-2, 2019.
|
6 |
S. Saria and A. Subbaswamy, "Tutorial: Safe and Reliable Machine Learning," ACM Conference on Fairness, Accountability, and Transparency, Atlanta, Ga. pp. 1-3, 2019.
|
7 |
M. Armbrust, et al., "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics," CIDR, pp. 1-4, 2021.
|
8 |
V. Shah, K. Yang, and K. Kumar, "Improving Feature Type Inference Accuracy of TFDV with SortingHat," Corpus ID: 235273771, pp. 1-7, 2020.
|
9 |
J.-C. Kim, et al., "A Study on Automatic Missing Value Imputation Replacement Method for Data Processing in Digital Data," Journal of Korea Multimedia Society, Vol. 24. No. 2, pp. 245-246, 2021.
|
10 |
N. Hynes, D. Sculley, and M. Terry, "The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets," NIPS MLSys Workshop. pp. 1-3, 2017.
|
11 |
TFDV(2021), https://www.tensorflow.org/tfx/guide/tfdv (accessed October 8, 2021).
|
12 |
Cerberus(2021), https://docs.python-cerberus.org/en/stable/ (accessed October 8, 2021).
|
13 |
Voluptuous(2021), https://github.com/alecthomas/voluptuous (accessed October 8, 2021).
|
14 |
Pandera(2021), https://pandera.readthedocs.io/en/stable/ (accessed October 8, 2021).
|
15 |
K. Kumar, New Trends in Data Warehousing Techniques, ResearchGate, 2020.
|
16 |
E. Breck, et al., "Data Validation for Machine Learning," MLSys. pp. 2-4, 2019.
|
17 |
Laparoscopic Endoscopy Open Dataset(2021), https://opencas.webarchiv.kit.edu/?q=node/30 (accessed October 8, 2021).
|
18 |
L. Ruff, et al., "Deep One-class Classification," International Conference on Machine Learning, PMLR 80, pp. 3-5, 2018.
|
19 |
M. Hulsebos, et al., "Sherlock: A Deep Learning Approach to Semantic Data Type Detection," Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1500-1504, 2019.
|
20 |
P. Marquez-Neila and R. Sznitman, "Image Data Validation for Medical Systems," International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 1-2, 2019.
|
21 |
Y. Roh, et al., "A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective," IEEE Transactions on Knowledge and Data Engineering, Vol. 33, No. 4, pp. 1328-1330. 2021.
DOI
|
22 |
S.-H. Jeong, et al., "A Study on Classification Evaluation Prediction Model by Cluster for Accuracy Measurement of Unsupervised Learning Data," Journal of Korea Multimedia Society, Vol. 21. No. 7, pp. 779-780, 2018.
DOI
|
23 |
G. Pang, et al., "Deep Learning for Anomaly Detection: A Review," ACM Computing Surveys, Vol. 54, Issue 2, pp. 1-8, 2021.
DOI
|