AI Engineering is (still) Systems Engineering

There is a lot of hype around AI engineering, and, well, for good reason. The capabilities are real. New AI frameworks for agent orchestration, prompt management, and context management show up every week. From what I see, the pace is in equal measure exciting and irritating.

Here is what I have come to realize, and more and more people are having the same epiphany: AI engineering is still fundamentally systems engineering (or at least much closer to systems engineering than to machine learning engineering). If I had to put a number on it, I would say it is roughly 80 percent systems engineering and 20 percent machine learning infrastructure engineering.

Think back to microservices. You had meshes of services with clear APIs, a storage layer behind each one, business logic in the middle, and scaling policies to ride traffic waves. Today, we build meshes of agents, inference calls, and data sources, coordinated by orchestration. The nouns changed. The verbs really haven't.

The first principles have not moved. How we scale, isolate failure, and reason about latency and availability still applies. What has changed, perhaps, is the composition. We added dynamic, probabilistic components and control flows, so we design around that reality with guardrails, fallbacks, and strong observability. Also, much of the code that used to take hours now takes minutes. Pretrained models with long context windows can draft boilerplate and suggest nontrivial scaffolding. That does not erase software engineers, but it does change the economics. Teams ship more with fewer keystrokes. The center of gravity shifts toward systems thinking, orchestration, evaluation, and product judgment rather than raw line count.


So What Actually Stays the Same

First Principles

Separation of concerns, abstraction and encapsulation, modularity and composability, loose coupling with high cohesion, scalability, and reliability. None of these are invalidated by AI. If anything, they matter more now.

Architecture Patterns

MVC still makes sense. Three-tier architecture still works. Your microservices mesh is still a solid structure. You still need to:

  • Fetch data through APIs or through tools provided to LLMs
  • Store data in appropriate databases such as SQL, NoSQL, and Elasticsearch
  • Take actions through API calls, whether triggered by rules or by AI agents
  • Think about latency-sensitive paths
  • Design for availability and consistency
  • Build workflows and user flows

The fundamentals of how systems communicate, how data moves through layers, and how you architect for scale are still the fundamentals.


What's the Net New

Workflows Have Become More Dynamic

Traditional systems handled a small set of specific use cases with deterministic, rule-based logic. You could predict exactly what would happen for a given input.

AI-enabled systems can handle hundreds of use cases with more general and adaptive logic. The tradeoff is that success is no longer binary. You move from "it works or it does not" to "it works most of the time, with varying quality."

This mirrors natural intelligence. The result is more flexible and capable systems, with probabilistic outcomes instead of deterministic guarantees.

Implication: design hybrid systems. Use deterministic logic for mission-critical paths. Use AI for flexibility and adaptability. Build failsafes and fallbacks. Add explainability so you can trace how decisions were made.
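
As a minimal sketch of what that hybrid shape can look like (everything here, including the call_llm stub and the refund scenario, is hypothetical):

    # Hypothetical hybrid decision path: deterministic guardrails around a
    # probabilistic component, with a conservative rule-based fallback.

    def call_llm(prompt: str) -> str:
        """Stub for a real model call; swap in your provider's client."""
        raise NotImplementedError

    def approve_refund(amount_cents: int, account_in_good_standing: bool) -> dict:
        # Deterministic, mission-critical rules run first and can
        # short-circuit the AI entirely.
        if not account_in_good_standing:
            return {"approved": False, "reason": "account_hold", "source": "rule"}
        if amount_cents <= 1_000:
            return {"approved": True, "reason": "small_amount", "source": "rule"}

        # Probabilistic path: ask a model, but never trust it blindly.
        try:
            answer = call_llm(f"Should we refund {amount_cents} cents? Answer yes or no.")
            approved = answer.strip().lower().startswith("yes")
            return {"approved": approved, "reason": "model_judgment", "source": "llm"}
        except Exception:
            # Failsafe: if the model errors out, degrade to a conservative
            # deterministic default instead of failing the request.
            return {"approved": False, "reason": "llm_unavailable", "source": "fallback"}

Note the "source" field in every return value: that is the explainability breadcrumb that tells you, after the fact, which path made the call.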

Data Architecture Has Evolved

Your structured data stores still matter. SQL, NoSQL, and search indexes remain core. Two additions grow in importance:

Semantic and relevance-based matching. Because LLMs can serve diverse use cases, fetching contextually relevant information even when you cannot predict the exact query is essential. Vector databases and semantic search add to your data layer.
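
As a rough sketch of what relevance-based matching boils down to (the embed stub stands in for whatever embedding model or service you use; a vector database performs the same core operation at scale):

    # Cosine-similarity retrieval over embedding vectors.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Stub; in practice this calls an embedding model or service."""
        raise NotImplementedError

    def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
        # Normalize so the dot product equals cosine similarity.
        q = query_vec / np.linalg.norm(query_vec)
        d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
        scores = d @ q
        return np.argsort(scores)[::-1][:k]  # indices of the k best matches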

Relationship graphs. Knowledge graphs replace giant trees of if else logic. Instead of hardcoding every relationship, model entities and their connections. The LLM can navigate those connections to answer complex queries without you anticipating every question.
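
A toy illustration of the idea, with made-up entities (a real system would back this with a graph database):

    # A tiny knowledge graph as (entity, relation, entity) triples, plus a
    # traversal that an LLM tool call could invoke instead of if/else chains.
    from collections import deque

    TRIPLES = [
        ("OrderService", "depends_on", "PaymentService"),
        ("PaymentService", "depends_on", "FraudModel"),
        ("FraudModel", "owned_by", "RiskTeam"),
    ]

    GRAPH: dict[str, list[tuple[str, str]]] = {}
    for head, rel, tail in TRIPLES:
        GRAPH.setdefault(head, []).append((rel, tail))

    def connections(entity: str) -> list[tuple[str, str]]:
        """Breadth-first walk: every relation reachable from an entity."""
        seen, out, queue = {entity}, [], deque([entity])
        while queue:
            for rel, tail in GRAPH.get(queue.popleft(), []):
                out.append((rel, tail))
                if tail not in seen:
                    seen.add(tail)
                    queue.append(tail)
        return out

    # connections("OrderService") surfaces the whole dependency chain without
    # anyone writing a rule for "who owns the fraud model?"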

Intelligence Layer for Orchestration

Add a dynamic orchestration layer that makes context-aware decisions about when and how to use your existing components. Your system, however, still likely has:

  • A presentation layer, the view
  • Business logic, the controller
  • Data storage, the model

The new piece is an intelligent orchestration layer that can route based on context, compose multiple API calls, decide whether to retrieve context before inference or during inference, and choose actions without hardcoded rules. This layer does not replace your architecture. It sits above it and uses your existing primitives well.
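
To make that concrete, here is a hedged sketch of the routing half of such a layer. The handler names are invented; in practice the choice might come from a model, constrained to an allowlist:

    # Context-aware routing to existing primitives. The router composes
    # components you already have; it never invents new ones.
    ALLOWED_HANDLERS = {"faq_lookup", "sql_report", "agent_workflow"}

    def route(request: dict) -> str:
        if request.get("needs_live_data"):
            choice = "sql_report"       # compose existing API and DB calls
        elif request.get("open_ended"):
            choice = "agent_workflow"   # retrieve context, then run inference
        else:
            choice = "faq_lookup"       # cheap, deterministic path
        assert choice in ALLOWED_HANDLERS
        return choice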

Infrastructure Considerations

Another area that has arguably become more crucial is infrastructure: considerations around model hosting, distributed training, and training optimizations.

To be specific, this is machine learning engineering, not machine learning science. Call it machine learning engineering or machine learning infrastructure engineering, whichever you prefer.

A couple of years back, it was fine for software engineers or product engineers to build systems in parallel to the machine learning team. Today, the impact of how fast you can train your models and how fast you can run inference on different GPUs is too big to ignore. It affects both cost and customer experience directly. That is why, if you are operating at a principal or architect level, you need to understand these implications as well.

You're now hosting ML models alongside your services, which has implications at multiple levels:

  • Kernel level: GPU scheduling, memory management
  • Compiler level: model optimization, quantization
  • Chip level: GPU vs CPU trade-offs, specialized AI accelerators

Basics like model parallelism and data parallelism, asynchronous and synchronous training, and fine-tuning are really useful. But again, this is infrastructure engineering, not machine learning research.
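
As one small, concrete example of this kind of lever, here is post-training dynamic quantization in PyTorch, which converts Linear layers to int8 and can cut CPU inference cost. The model below is a stand-in, and quantization trades a little accuracy for cost, so re-run your evals before shipping:

    import torch

    # Stand-in model; imagine a real checkpoint loaded here.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    )

    # Convert Linear weights to int8 for cheaper CPU inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    out = quantized(torch.randn(1, 512))  # same interface as the original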


A Gentle but Practical Framework for AI System Design

1. Map Your Decision Points

Identify where your workflow makes decisions based on data or context. For each point, ask: does this need to be deterministic for mission-critical or safety-critical reasons, or can it be probabilistic for flexibility and adaptation? Design deterministic guardrails for critical paths. Use AI for flexible paths.
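
One lightweight way to make this mapping concrete is an explicit inventory that gets reviewed in design reviews (the decision points below are made up):

    from dataclasses import dataclass

    @dataclass
    class DecisionPoint:
        name: str
        mode: str        # "deterministic" or "probabilistic"
        rationale: str

    DECISIONS = [
        DecisionPoint("charge_card", "deterministic", "money movement; hard rules only"),
        DecisionPoint("draft_reply", "probabilistic", "quality varies; a human can edit"),
        DecisionPoint("route_ticket", "probabilistic", "a wrong route is recoverable"),
    ]

    # Rule of thumb: every probabilistic entry needs a guardrail or fallback
    # before it ships.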

2. Classify Your Data Sources

Understand what data you have and how to access it:

  • Structured data through SQL queries and API calls
  • Unstructured data through semantic search and embeddings
  • Relationships through knowledge graphs

Model your data so AI can access it effectively.

3. Categorize Your Problems

For each decision, identify the actual problem type:

  • Classification for choosing among discrete options
  • Regression for predicting a continuous value
  • Retrieval for finding relevant information
  • Search for exploring a space of possibilities
  • True generation for creating new content

Do not default to using an LLM for everything.
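
A sketch of that discipline as a lookup table; the mapping is illustrative, not a prescription:

    TECHNIQUE_FOR = {
        "classification": "small supervised model, e.g. logistic regression",
        "regression": "gradient-boosted trees or a linear model",
        "retrieval": "BM25 or embedding search over your corpus",
        "search": "a solver or heuristic search over the option space",
        "generation": "an LLM, with guardrails and evaluation",
    }

    def pick_technique(problem_type: str) -> str:
        # Reach for an LLM only when the problem is genuinely generative.
        return TECHNIQUE_FOR.get(problem_type, "clarify the problem first")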

4. Build Explainability In

Every decision should leave breadcrumbs. Keep audit trails that show what context was considered, what tools were called, what reasoning was applied, and why that path was chosen. This is how you debug, improve, and maintain trust.
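
A minimal version of those breadcrumbs can be a structured, append-only log (the field names here are a suggestion, not a standard):

    import json
    import time
    import uuid

    def record_decision(context: dict, tools_called: list[str],
                        reasoning: str, outcome: str) -> dict:
        entry = {
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "context": context,      # what the system saw
            "tools": tools_called,   # what it invoked
            "reasoning": reasoning,  # why it chose this path
            "outcome": outcome,      # what it decided
        }
        # Append-only; in production this goes to durable storage.
        with open("decision_audit.jsonl", "a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry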


My Two Cents

If you are a software engineer adapting to AI, here is what I have learned so far.

Do not abandon fundamentals. Systems design, architecture, APIs, databases, scaling, and reliability still matter. They may matter more because systems are now more complex and less predictable.

Think one abstraction higher and practice it. Focus on orchestration. Compose AI capabilities with traditional services. Design flows, not only code. You build this muscle through deliberate practice.

Know the new toolkit. Understand model types, when to use RAG versus fine-tuning, how to structure prompts, and how tools and function calling work. Treat these as new primitives, not replacements for everything you know.
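
For instance, function calling usually amounts to describing a tool with a JSON schema and then executing the model's structured request yourself. A hedged sketch, with a made-up tool:

    # Most function-calling APIs accept a schema shaped roughly like this.
    GET_WEATHER_TOOL = {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def dispatch(tool_call: dict) -> str:
        # The model proposes {"name": ..., "arguments": {...}}; your code
        # validates and executes. The model never runs anything itself.
        if tool_call["name"] == "get_weather":
            return f"weather in {tool_call['arguments']['city']}: sunny (stub)"
        raise ValueError("unknown tool")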

Build intuition with small projects. Start small. Add simple AI-enabled features. Learn where AI shines and where it struggles. Develop taste for when to use it and when not to.

Get comfortable with probabilistic systems. You are not in a binary world anymore. Think in terms of success rates, edge cases, and graceful degradation. Build systems that work most of the time and fail safely when they do not.


The hype will normalize. The dust will settle. AI did not replace software engineering. It expanded it. We are still building systems. We are still solving problems. We are still translating human needs into technical solutions. We simply have new and powerful tools. The engineers who thrive are likely the ones who stay grounded in first principles, integrate AI thoughtfully, and keep the discipline of good systems engineering.

Because at the end of the day, AI engineering is systems engineering. The fundamentals have not changed. You might be more prepared than you think.