
Medical Big Data Cleaning and Visualization Research

Zhaoqiao Yuan

Abstract


This study focuses on medical big data and uses the Python programming language together with the Kettle ETL tool for data cleaning and processing. The cleaning process covers missing value imputation, outlier detection and correction, duplicate record removal, and integration of multiple data sources. Python offers a rich set of data processing libraries (such as Pandas and NumPy) that facilitate data quality evaluation and cleaning, while Kettle is well suited to large-scale batch processing and transformation and to managing complex data flows. By combining Python and Kettle, we developed a flexible and efficient data cleaning workflow intended to ensure data accuracy, consistency, and completeness.
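
The four cleaning steps named above can be illustrated with a minimal Pandas sketch. It is not the workflow developed in the study: the column names (`patient_id`, `heart_rate`, `sex`), the toy data, the 1.5 × IQR outlier rule, and median imputation are all assumptions chosen for illustration, and the Kettle side of the pipeline is not shown.

```python
import numpy as np
import pandas as pd


def clean_records(visits: pd.DataFrame, patients: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to hypothetical visit and patient tables."""
    # 1. Duplicate record removal: drop rows identical in every column.
    visits = visits.drop_duplicates().copy()

    # 2. Outlier detection and correction: values outside 1.5 * IQR are
    #    treated as missing (an assumed rule, one of several common choices).
    q1, q3 = visits["heart_rate"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = visits["heart_rate"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    visits["heart_rate"] = visits["heart_rate"].where(in_range, np.nan)

    # 3. Missing value imputation: fill gaps with the column median (assumed).
    median = visits["heart_rate"].median()
    visits["heart_rate"] = visits["heart_rate"].fillna(median)

    # 4. Integration of multiple sources: left join on the shared patient id.
    return visits.merge(patients, on="patient_id", how="left")


# Toy data: one exact duplicate, one missing value, one implausible reading.
visits = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 4, 5],
    "heart_rate": [72.0, 72.0, 68.0, np.nan, 75.0, 400.0],
})
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "sex": ["F", "M", "F", "M", "F"],
})

cleaned = clean_records(visits, patients)
print(cleaned)
```

Replacing outliers with a missing marker before imputing keeps the extreme reading from distorting the fill value. In a real clinical dataset the thresholds and imputation strategy would need to be chosen per variable.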

Keywords


Medical Big Data; Data Cleaning; Data Visualization; Python and Kettle Tools







DOI: http://dx.doi.org/10.70711/frim.v3i12.7870
