Methodologies for Designing Enterprise-Grade Data Pipelines for AI Agents in Regulated Industries

Authors

  • Shanmuka Siva Varma Chekuri

Keywords:

enterprise data pipelines, lakehouse architecture, ACID transactions, feature store engineering, AI agents, regulated industries, compliance automation, multi-region data governance, financial data reliability, payroll systems

Abstract

The study examines engineering methodologies for building enterprise-grade data pipelines that support AI agents under stringent regulatory constraints in the finance, insurance, healthcare, and payroll domains. The research addresses three challenges: fragmented multi-region data flows, heterogeneous legacy ERP environments, and the growing load from AI workloads that depend on reproducible, ACID-compliant lakehouse architectures and feature-store-centric design. The work consolidates recent advances in Delta Lake–based reliability patterns, AI/ML-optimized lakehouses, feature stores, and AI-driven compliance automation, integrating them into a unified blueprint for multi-tenant, audit-ready pipelines. The article aims to synthesize three artifacts: a reliability framework for financial data, a multi-tenant AI lakehouse model for payroll, and a novel multi-file validation and reconciliation pattern for high-risk financial ETL. The method applies comparative analysis, source criticism, and architectural synthesis to a curated set of recent scientific and professional publications. The conclusions describe how these patterns reduce reconciliation effort, strengthen regulatory assurance, and create AI-ready data foundations. The article targets data engineers, architects, and technical leaders who design AI-enabled systems in regulated industries.
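The abstract names a multi-file validation and reconciliation pattern for financial ETL but does not describe its mechanics. As a hedged illustration of the general idea only, and not the author's pattern, the sketch below reconciles a hypothetical payroll detail file against a hypothetical control file, batch by batch, before any load is allowed. It checks record counts, control totals (computed with `Decimal` to avoid floating-point drift), duplicate keys, and batches that appear in only one file. The file layouts, the column names (`batch_id`, `employee_id`, `amount`, `record_count`, `total_amount`), and the function names are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ReconciliationResult:
    ok: bool
    issues: list[str]


def load_detail(text: str) -> list[dict]:
    """Parse a hypothetical detail file with columns batch_id, employee_id, amount."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["amount"] = Decimal(row["amount"])  # exact decimal arithmetic for money
    return rows


def load_control(text: str) -> dict[str, tuple[int, Decimal]]:
    """Parse a hypothetical control file with columns batch_id, record_count, total_amount."""
    return {
        row["batch_id"]: (int(row["record_count"]), Decimal(row["total_amount"]))
        for row in csv.DictReader(io.StringIO(text))
    }


def reconcile(detail_rows: list[dict], control: dict[str, tuple[int, Decimal]]) -> ReconciliationResult:
    """Cross-check the detail file against the control file; any issue blocks the load."""
    issues: list[str] = []
    observed: dict[str, tuple[int, Decimal]] = {}
    seen_keys: set[tuple[str, str]] = set()

    for row in detail_rows:
        key = (row["batch_id"], row["employee_id"])
        if key in seen_keys:  # the same employee paid twice in one batch
            issues.append(f"{key[0]}: duplicate employee_id {key[1]}")
        seen_keys.add(key)
        count, total = observed.get(row["batch_id"], (0, Decimal("0")))
        observed[row["batch_id"]] = (count + 1, total + row["amount"])

    # Compare observed counts and totals with the declared control values.
    for batch_id, (exp_count, exp_total) in control.items():
        got_count, got_total = observed.get(batch_id, (0, Decimal("0")))
        if got_count != exp_count:
            issues.append(f"{batch_id}: record_count {got_count} != {exp_count}")
        if got_total != exp_total:
            issues.append(f"{batch_id}: total_amount {got_total} != {exp_total}")

    # Batches present in the detail file but never declared in the control file.
    for batch_id in sorted(observed.keys() - control.keys()):
        issues.append(f"{batch_id}: present in detail file but missing from control file")

    return ReconciliationResult(ok=not issues, issues=issues)


if __name__ == "__main__":
    detail = "batch_id,employee_id,amount\nB1,E1,1000.00\nB1,E2,2500.50\nB2,E3,800.00\n"
    control = "batch_id,record_count,total_amount\nB1,2,3500.50\nB2,1,750.00\n"
    result = reconcile(load_detail(detail), load_control(control))
    print(result.ok)      # False
    print(result.issues)  # ['B2: total_amount 800.00 != 750.00']
```

Treating every mismatch as a blocking issue, rather than auto-correcting it, keeps the check auditable: the returned issue list can be logged as evidence of why a batch was rejected.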

Author Biography

  • Shanmuka Siva Varma Chekuri

    Data Engineer, American Software Group (ASG), New Jersey, United States

Published

2026-01-26

Issue

Vol. 56, No. 1 (2026)

Section

Articles

How to Cite

Shanmuka Siva Varma Chekuri. (2026). Methodologies for Designing Enterprise-Grade Data Pipelines for AI Agents in Regulated Industries. International Journal of Computer (IJC), 56(1), 377-388. https://www.ijcjournal.org/InternationalJournalOfComputer/article/view/2487