Episode 22 — Large Language Models: What They Can and Can’t Do

Tool orchestration can be defined as the practice of managing when, how, and in what sequence multiple tools are invoked within a system so that complex tasks can be carried out reliably. Whereas basic tool use equips a model with the ability to call a single external function, orchestration involves coordinating many such calls together, often across heterogeneous services. The orchestrator acts as the conductor of a digital symphony, ensuring that each tool enters at the right moment, contributes its part, and exits without disrupting the whole performance. Without orchestration, tool use risks devolving into chaos, with overlapping requests, redundant calls, and cascading failures. Orchestration provides order and structure, giving developers and organizations confidence that their AI systems can execute workflows that are not only accurate but also predictable, safe, and efficient.

Routing is one of the central concepts in orchestration. It refers to the logic that determines which tool should be selected for a given task or user request. A routing system may be as simple as a set of rules—such as “if the query involves math, send it to the calculator”—or as sophisticated as a learned model that evaluates intent, context, and tool availability before deciding. Routing is critical because no single tool can solve every problem. A retrieval database, a summarization model, and a calculator each excel in different domains. Routing ensures that each is only used when appropriate, preventing misfires and wasted effort. In this way, routing acts like a traffic controller, guiding requests to the right destination so that the overall system operates smoothly.
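To make this concrete, here is a minimal sketch of a rule-based router. The tool names and keyword heuristics are hypothetical stand-ins; a production router would classify intent with richer signals than string matching.

```python
# Minimal rule-based router sketch (hypothetical tool names and heuristics).
def route(query: str) -> str:
    """Return the name of the tool that should handle this query."""
    text = query.lower()
    if any(token in text for token in ("+", "-", "*", "/", "calculate", "sum")):
        return "calculator"
    if any(word in text for word in ("find", "lookup", "search", "retrieve")):
        return "retrieval_db"
    # Default: hand the query to a general-purpose model.
    return "summarizer"

print(route("calculate 12 * 7"))                  # -> calculator
print(route("find the 2023 compliance report"))   # -> retrieval_db
```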

Decision-making in routing can be designed in different ways, each with its strengths and weaknesses. Rule-based routing provides predictability and simplicity, making it easy to understand why a given decision was made. Learned routing, by contrast, uses data to adapt dynamically, learning patterns of when tools are most useful. Hybrid approaches combine the two, relying on rules for safety-critical decisions and learned models for flexibility. Choosing the right routing strategy depends on context. Enterprises with strict compliance needs may prefer rule-based systems, while consumer applications that prioritize adaptability may benefit from learning-based strategies. In either case, decision-making in routing reflects the tension between control and adaptability, a theme that runs through all aspects of orchestration.

Retries play an important role in making orchestration resilient. Even the most reliable tools occasionally fail due to network glitches, temporary overloads, or transient errors. Without retries, these momentary failures would translate into incomplete or incorrect workflows. Retries provide a safety net: they allow the orchestrator to attempt the same tool call again, often with exponential back-off or adjusted parameters, to maximize the chance of success. This is similar to a person redialing a phone number when the first attempt does not connect. However, retries must be designed carefully. Excessive retries can increase latency and cost, while insufficient retries may reduce resilience. Striking the right balance is essential for building workflows that recover gracefully from inevitable imperfections in real-world systems.
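A sketch of that retry loop, assuming the tool call is any function that raises an exception on failure, might look like the following. The attempt count, base delay, and jitter values are illustrative defaults, not recommendations.

```python
import random
import time

def call_with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry a flaky tool call with exponential back-off plus a little jitter.

    `call` is any zero-argument function that raises on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:                  # in practice, catch only expected error types
            if attempt == max_attempts:
                raise                      # give up after the final attempt
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)              # back off before trying again
```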

Fallback mechanisms add another layer of protection by providing alternative strategies when a tool fails repeatedly. Instead of endlessly retrying the same service, the orchestrator can switch to a backup tool, provide a simplified answer, or escalate the request for human review. Consider a weather application: if the primary weather API is down, the system might fall back to a secondary provider, ensuring users still receive service, albeit with potentially reduced accuracy. Fallbacks prevent total failure, turning potential breakdowns into degraded but acceptable performance. Designing effective fallback strategies requires anticipating failure scenarios in advance and ensuring that alternatives are trustworthy. Fallbacks embody the principle of resilience: even if one pathway fails, another can carry the system forward.
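A simple fallback chain can be expressed as an ordered list of providers that are tried until one succeeds. The provider names below are hypothetical, and the error handling is deliberately coarse; the point is the shape of the control flow.

```python
def call_with_fallbacks(providers, request):
    """Try each provider in order; return the first successful result.

    `providers` is an ordered list of (name, callable) pairs, e.g. a primary
    weather API followed by a secondary one.
    """
    errors = {}
    for name, provider in providers:
        try:
            return {"source": name, "result": provider(request)}
        except Exception as exc:           # narrow to expected errors in practice
            errors[name] = str(exc)        # record why this provider was skipped
    # Every provider failed: degrade gracefully instead of crashing.
    return {"source": None, "result": None, "errors": errors}
```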

Circuit breakers are another safety feature that orchestration systems adopt from broader software engineering practices. A circuit breaker prevents repeated calls to a tool that is already failing, protecting the rest of the system from unnecessary load and preventing a cascading collapse. The metaphor comes from electrical engineering, where a breaker stops the flow of current when a fault occurs, preventing damage to the system. In orchestration, circuit breakers act similarly: they “trip” when error rates exceed a threshold, halting further calls until the tool recovers. This protects both the failing tool and the orchestrator itself from waste and instability. Circuit breakers thus embody the idea that sometimes the safest action is to stop trying and wait for recovery rather than persisting blindly.
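A minimal circuit breaker sketch, tripping after a fixed number of consecutive failures and cooling down for a fixed interval, is shown below. Real implementations track error rates over time windows and handle the half-open state more carefully; this version only illustrates the idea.

```python
import time

class CircuitBreaker:
    """Trip after `max_failures` consecutive errors; stay open for `reset_after` seconds."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        # While the breaker is open, refuse calls until the cool-down elapses.
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: tool is cooling down")
            self.opened_at = None          # simplified half-open state: allow a trial call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
            self.failures = 0              # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()   # trip the breaker
            raise
```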

Error handling is at the heart of orchestration because failures are inevitable in any multi-tool workflow. Users expect systems to provide helpful responses even when parts of the pipeline malfunction. Error handling ensures that when something goes wrong, the failure is caught, explained, and resolved rather than ignored. For example, if a summarization tool produces incomplete output, the orchestrator might retry with shorter text, fall back to another summarizer, or notify the user of partial results. Transparent error handling preserves trust: users are more forgiving of occasional errors if the system handles them gracefully. Poor error handling, by contrast, leads to frustration, confusion, and erosion of confidence. Orchestration succeeds not by avoiding all failures but by managing them well.
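The summarization example can be sketched as a small degradation ladder. The `primary` and `backup` summarizers here are hypothetical callables; what matters is that the caller always learns how the result was produced instead of receiving a silent failure.

```python
def summarize_with_degradation(text, primary, backup, max_chars=4000):
    """Try the primary summarizer, then a shorter input, then a backup tool."""
    try:
        return {"summary": primary(text), "note": None}
    except Exception:
        try:
            # Retry with a truncated input in case length caused the failure.
            return {"summary": primary(text[:max_chars]),
                    "note": "summary based on a truncated document"}
        except Exception:
            try:
                return {"summary": backup(text), "note": "produced by backup summarizer"}
            except Exception:
                return {"summary": None, "note": "summarization unavailable right now"}
```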

Workflows can be executed in either parallel or sequential fashion, and orchestration must decide which approach is best for each case. Parallel execution allows multiple tools to work simultaneously, reducing overall latency. For instance, a research assistant might query multiple databases at once, retrieving results more quickly than querying them one by one. Sequential execution, however, is necessary when later steps depend on earlier results. An orchestrator planning a financial report must first retrieve data, then calculate summaries, then format them for presentation. Orchestration thus requires not only choosing tools but also managing timing and dependencies, balancing speed with correctness. Choosing between parallel and sequential execution is a fundamental design decision that affects performance, cost, and user experience.
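The difference is easy to see in code. The sketch below uses asyncio with a stand-in `query_database` coroutine (hypothetical database names) to contrast concurrent lookups with a dependent, step-by-step pipeline.

```python
import asyncio

async def query_database(name: str, query: str) -> str:
    """Stand-in for a real async database client (hypothetical)."""
    await asyncio.sleep(0.1)               # simulate network latency
    return f"{name} results for {query!r}"

async def parallel_research(query: str):
    # Independent lookups can run concurrently to cut overall latency.
    return await asyncio.gather(
        query_database("papers_db", query),
        query_database("news_db", query),
        query_database("patents_db", query),
    )

async def sequential_report(query: str):
    # Later steps depend on earlier results, so these must run in order.
    raw = await query_database("finance_db", query)
    summary = f"summary of {raw}"           # stand-in for a summarization step
    return f"formatted report: {summary}"   # stand-in for a formatting step

print(asyncio.run(parallel_research("battery storage")))
print(asyncio.run(sequential_report("Q3 revenue")))
```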

Coordination overhead is an unavoidable consequence of orchestration. Every additional tool call, retry, or fallback introduces complexity and latency. While orchestration improves reliability and capability, it also adds layers of monitoring, decision-making, and integration that increase system load. This is similar to project management in human organizations: coordinating multiple teams produces better outcomes but requires meetings, schedules, and communication overhead. The challenge is to keep orchestration lean enough to be efficient while robust enough to handle failures gracefully. Overhead is not inherently bad, but it must be managed consciously, ensuring that the benefits of orchestration outweigh the costs.

Observability is a crucial part of orchestration because systems cannot be improved if they cannot be seen. Logs, traces, and monitoring dashboards provide the visibility needed to understand how tools are being used, where failures occur, and how performance can be improved. Observability allows developers to detect whether retries are working, whether fallbacks are being triggered too often, or whether circuit breakers are tripping unnecessarily. Without observability, orchestration is a black box, and errors may go unnoticed until users are affected. With observability, orchestration becomes transparent and manageable, allowing organizations to refine performance, reduce costs, and build trust.
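One low-cost way to get this visibility is to wrap every tool call in a logging shim that records the tool name, a correlation id, the outcome, and the latency. The sketch below uses Python's standard logging module; the field names are illustrative.

```python
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orchestrator")

def observed_call(tool_name, fn, *args, **kwargs):
    """Wrap a tool call with structured logging: one record per attempt."""
    call_id = str(uuid.uuid4())[:8]         # short correlation id for tracing
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        status = "ok"
        return result
    except Exception:
        status = "error"
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info("tool=%s call_id=%s status=%s latency_ms=%.1f",
                 tool_name, call_id, status, elapsed_ms)
```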

Industry use cases illustrate the value of orchestration in action. In document processing, retrieval, summarization, and translation tools may be orchestrated together to handle multilingual compliance reports. In customer service, a conversational model may orchestrate calculators, knowledge bases, and scheduling tools to resolve complex inquiries. In financial analysis, orchestration connects data retrieval, forecasting models, and visualization services into a coherent workflow. These examples show that orchestration is not theoretical; it is already shaping the way enterprises use AI to handle real-world tasks. Each case demonstrates how orchestration transforms isolated tools into end-to-end solutions.

Balancing flexibility and control is one of the great design challenges in orchestration. Too much flexibility can lead to unpredictable behavior, with tools being invoked in chaotic ways that confuse users or increase costs. Too much control can make the system rigid, unable to adapt to new tasks or unexpected conditions. Striking the right balance requires carefully defining boundaries—schemas, rules, and safeguards—while allowing enough freedom for adaptive routing and fallback strategies. This balance reflects a deeper truth: orchestration is not about choosing control or flexibility, but about weaving the two together so that systems are both adaptable and safe.

Scalability is a defining concern for large organizations that orchestrate hundreds of tools across thousands of workflows. Small-scale orchestration may be manageable with ad hoc scripts, but enterprises require formal frameworks that can monitor, govern, and coordinate at scale. Scalability means ensuring that workflows remain efficient even as more tools are added, that monitoring systems can track thousands of simultaneous calls, and that governance can enforce policies across the entire ecosystem. Without scalability, orchestration becomes fragile as it grows, collapsing under its own complexity. Designing for scalability from the outset ensures that orchestration remains an enabler of growth rather than a bottleneck.

Governance in orchestration ensures that tool use aligns with compliance requirements, ethical standards, and organizational policies. An orchestrator does not just decide which tool to call; it also enforces who is allowed to use each tool, under what conditions, and with what data. Governance ensures that sensitive information is not passed to untrusted services, that audit trails are maintained, and that systems meet regulatory obligations. For example, in healthcare, governance may enforce that patient data is only accessed through approved medical databases. In finance, it may ensure that trading tools are only invoked under authorized conditions. Governance transforms orchestration from a technical feature into a mechanism of organizational accountability.
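At its simplest, this kind of enforcement is a policy check that runs before any tool call. The policy table below is hypothetical and far smaller than anything a real organization would use, but it shows the deny-by-default shape such a check can take.

```python
# Hypothetical policy table: which roles may call which tools, and with what data.
POLICY = {
    "patient_db":  {"roles": {"clinician"},            "allow_pii": True},
    "trading_api": {"roles": {"authorized_trader"},    "allow_pii": False},
    "web_search":  {"roles": {"clinician", "analyst"}, "allow_pii": False},
}

def authorize(tool: str, role: str, contains_pii: bool) -> bool:
    """Return True only if this role may call this tool with this kind of data."""
    rule = POLICY.get(tool)
    if rule is None:
        return False                       # unknown tools are denied by default
    if role not in rule["roles"]:
        return False
    if contains_pii and not rule["allow_pii"]:
        return False
    return True

assert authorize("patient_db", "clinician", contains_pii=True)
assert not authorize("trading_api", "analyst", contains_pii=False)
```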

Orchestration leads directly into the topic of planning. Planning strategies are about deciding not just which tools to call but how to sequence them, how to branch workflows based on outcomes, and how to manage resources efficiently. Orchestration provides the framework, while planning provides the strategy that fills it. Together, they define the future of AI systems: not just reactive generators but active participants in workflows that span multiple tools, tasks, and domains.

Service reliability is one of the foundational concerns in tool orchestration, because even the most carefully designed workflows fail if the services they depend on are unreliable. Orchestrators must constantly measure whether tools are available, how quickly they respond, and whether the results they return are correct and usable. This is similar to how a logistics company tracks the reliability of its delivery trucks: if one breaks down too often, the entire supply chain is disrupted. In AI systems, tools that suffer frequent downtime or delays undermine user trust and reduce the overall value of orchestration. To mitigate this, orchestrators maintain records of service uptime, response times, and error rates, often preferring to route requests to more reliable tools when multiple options exist. Service reliability, therefore, is not a static property but an ongoing measurement, a living signal that orchestration uses to guide decision-making and ensure continuity of service for users.
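A rolling record of outcomes per tool is enough to start routing toward the healthiest option. The tracker below is a minimal sketch under that assumption; the tool names are hypothetical and a real system would also weigh latency and cost.

```python
from collections import deque

class ReliabilityTracker:
    """Keep a rolling window of outcomes per tool and expose a success rate."""

    def __init__(self, window=100):
        self.window = window
        self.history = {}                   # tool name -> recent outcomes

    def record(self, tool: str, success: bool):
        self.history.setdefault(tool, deque(maxlen=self.window)).append(success)

    def success_rate(self, tool: str) -> float:
        outcomes = self.history.get(tool)
        if not outcomes:
            return 1.0                      # no data yet: assume healthy
        return sum(outcomes) / len(outcomes)

    def best(self, candidates):
        """Prefer the candidate tool with the highest recent success rate."""
        return max(candidates, key=self.success_rate)

tracker = ReliabilityTracker()
tracker.record("weather_primary", False)
tracker.record("weather_primary", False)
tracker.record("weather_backup", True)
print(tracker.best(["weather_primary", "weather_backup"]))   # -> weather_backup
```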

Latency trade-offs emerge whenever retries, fallbacks, or verification steps are introduced into orchestration. On one hand, these mechanisms improve resilience by ensuring that workflows succeed even in the face of errors. On the other, each retry adds time, and each fallback may involve slower or less efficient tools. For example, if a primary weather service times out and the orchestrator falls back to a slower secondary provider, the user still gets an answer but waits longer. Balancing reliability and speed is therefore one of the most difficult aspects of orchestration design. Too much emphasis on speed risks fragile systems that break under minor failures. Too much emphasis on reliability risks frustrating delays. The trade-off cannot be eliminated, but it can be managed through careful design, such as limiting retries, prioritizing faster fallbacks, or setting timeouts that keep the system responsive without sacrificing too much resilience.
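One way to cap the cost of a slow primary tool is to give it a fixed time budget and switch to a faster fallback when the budget runs out. The sketch below uses a thread pool for the deadline; the two-second budget and the provider callables are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_with_deadline(primary, fallback, request, deadline_s=2.0):
    """Give the primary tool a fixed time budget; switch to the fallback if it runs out."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, request)
    try:
        return future.result(timeout=deadline_s)   # fast path: primary answers in time
    except TimeoutError:
        return fallback(request)                   # degraded but responsive answer
    finally:
        pool.shutdown(wait=False)                  # do not block on the abandoned call
```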

Dynamic orchestration represents an emerging shift from static rules toward adaptive decision-making. In static orchestration, routing, retries, and fallbacks are predetermined and unchanging, like a recipe followed step by step. In dynamic orchestration, the system learns from observed performance, adapting routing choices and retry strategies in real time. For instance, if one database has been responding slowly all day, the orchestrator may reduce its usage automatically, favoring a faster but slightly less comprehensive service. If a tool consistently fails in a certain type of workflow, the orchestrator may preemptively avoid it. Dynamic orchestration introduces flexibility and responsiveness, making systems feel more resilient and intelligent. However, it also introduces unpredictability, which must be balanced with transparency. Users and operators need to understand why certain tools are chosen, even as orchestration adapts dynamically to changing conditions. This combination of adaptability and accountability defines the cutting edge of orchestration design.

Testing orchestrated pipelines is critical because orchestration involves multiple moving parts that interact in complex ways. End-to-end testing ensures that workflows function correctly, not only when each tool is working individually but also when tools are combined. Consider a pipeline that retrieves documents, summarizes them, and then translates the results into another language. Each tool might perform well in isolation, but when chained together, subtle issues can arise—such as truncation of inputs, mismatched formats, or timeouts cascading through the system. Testing must therefore include both unit-level checks and full workflow simulations, verifying that the orchestration logic holds up under realistic conditions. Automated test suites, synthetic queries, and stress testing all play roles in validating pipelines. Without rigorous testing, orchestration risks becoming fragile, with errors only surfacing in production when users are directly affected.
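A small end-to-end test of that retrieve-summarize-translate chain, using stub tools in place of real services, might look like the sketch below. The pipeline function and stubs are hypothetical; the point is that orchestration logic can be exercised without network access.

```python
# End-to-end test of a retrieve -> summarize -> translate pipeline using stub tools.
def run_pipeline(query, retrieve, summarize, translate):
    docs = retrieve(query)
    summary = summarize(" ".join(docs))
    return translate(summary)

def test_pipeline_handles_empty_retrieval():
    # Stubs let us exercise the orchestration logic without real services.
    retrieve = lambda q: []                             # simulate "no documents found"
    summarize = lambda text: text[:50] if text else "nothing to summarize"
    translate = lambda text: f"[de] {text}"
    result = run_pipeline("missing topic", retrieve, summarize, translate)
    assert result == "[de] nothing to summarize"

test_pipeline_handles_empty_retrieval()
print("pipeline test passed")
```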

Human oversight in orchestration is often overlooked but remains vital, especially for sensitive tool calls that carry ethical, legal, or safety implications. For example, in financial systems, an orchestrator may coordinate tools that execute trades or calculate risk exposure. Blind automation without oversight could result in errors that cost millions or trigger regulatory violations. Similarly, in healthcare, orchestrators may invoke diagnostic tools whose results must be reviewed by a physician before being acted upon. Oversight introduces a checkpoint, ensuring that certain categories of tool calls are verified by human experts before final execution. This slows workflows but provides accountability and reduces the risk of catastrophic failure. In practice, orchestration frameworks often include configurable “approval layers,” where certain tools or tasks require review. This reflects the broader principle that AI orchestration is not only about efficiency but also about responsibility.
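An approval layer can be as simple as a gate that intercepts sensitive tool names before execution. Everything in the sketch below is hypothetical: the sensitive-tool list, the `execute` callable, and the `request_approval` callable that asks a human reviewer for a yes or no.

```python
# Hypothetical approval gate: sensitive tools require an explicit human decision.
SENSITIVE_TOOLS = {"execute_trade", "prescribe_medication"}

def run_tool(tool_name, payload, execute, request_approval):
    """Route sensitive calls through a human reviewer before execution.

    `execute` performs the tool call; `request_approval` asks a reviewer and
    returns True or False. Both are supplied by the surrounding system.
    """
    if tool_name in SENSITIVE_TOOLS:
        if not request_approval(tool_name, payload):
            return {"status": "rejected", "reason": "human reviewer declined"}
    return {"status": "ok", "result": execute(tool_name, payload)}
```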

Security implications in orchestration are especially significant because each tool call represents a potential point of vulnerability. Malicious inputs could attempt to exploit tools, injecting unintended commands or accessing sensitive information. Prompt injection is a particular risk: cleverly crafted inputs might trick a model into routing calls to tools in unsafe ways. Orchestration must therefore include validation layers that sanitize inputs, enforce schemas strictly, and block unsafe calls. Similarly, responses from tools must be validated to prevent poisoning of downstream workflows. Security in orchestration is not just a matter of protecting tools individually but of safeguarding the entire pipeline against cascading threats. Robust governance, access controls, and monitoring must be in place, ensuring that orchestration cannot be abused as a backdoor into systems that should remain protected.
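Strict schema enforcement often starts with rejecting anything the tool was not designed to accept. The sketch below validates arguments for a hypothetical weather tool: unexpected fields, wrong types, and out-of-range values are all refused before the call is made.

```python
# Minimal schema check before a tool call: reject unexpected fields and values.
WEATHER_SCHEMA = {"city": str, "units": str}           # hypothetical tool schema
ALLOWED_UNITS = {"metric", "imperial"}

def validate_weather_args(args: dict) -> dict:
    if set(args) != set(WEATHER_SCHEMA):
        raise ValueError(f"unexpected or missing fields: {sorted(set(args) ^ set(WEATHER_SCHEMA))}")
    for field, expected_type in WEATHER_SCHEMA.items():
        if not isinstance(args[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    if args["units"] not in ALLOWED_UNITS:
        raise ValueError("units must be 'metric' or 'imperial'")
    return args                                         # safe to pass to the tool

validate_weather_args({"city": "Oslo", "units": "metric"})   # passes
```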

Data consistency becomes a pressing concern when workflows involve retries, parallel calls, or multi-step processes. For example, if a financial tool is retried after an error, but the underlying data has changed between attempts, the results may be inconsistent or misleading. Similarly, parallel tool calls may retrieve different snapshots of data, creating contradictions. Orchestration must therefore ensure that workflows maintain logical consistency, often by timestamping inputs, enforcing transaction-like semantics, or reconciling outputs from multiple sources. Without consistency, users may lose trust in results, even if individual tools performed correctly. This highlights how orchestration is not just about connecting tools but about managing their interactions in ways that preserve coherence and reliability across entire workflows.

Integration with agent frameworks is one of the most visible applications of orchestration today. Agents are systems designed to reason about tasks, plan multi-step workflows, and invoke tools dynamically as needed. Orchestration is what makes this possible. It provides the underlying mechanism for routing, retries, fallbacks, and monitoring, enabling agents to operate as reliable multi-tool coordinators. Without orchestration, agents would be brittle, collapsing when tools failed or producing chaotic workflows with inconsistent results. With orchestration, agents gain resilience, transparency, and scalability. This makes orchestration not just a technical feature but a foundational capability for the agent paradigm, allowing AI to shift from passive responders to active collaborators that can plan, decide, and act across complex tasks.

Cost considerations play a growing role as orchestration scales. Each retry, fallback, or redundant tool call consumes computational resources and often incurs direct financial cost, especially when third-party APIs are involved. For enterprises running large volumes of orchestrated workflows, these costs can accumulate quickly, turning resilience into a budgetary challenge. Designers must weigh the benefits of retries and redundancy against the expense of wasted calls. Cost-aware orchestration strategies include limiting retries, prioritizing lower-cost tools, or caching results to avoid repeated queries. Cost does not reduce the need for resilience, but it forces careful prioritization. In effect, orchestration design becomes an exercise in economics as much as engineering, balancing reliability with financial sustainability.
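Caching is one of the simplest cost levers: identical requests within a short window reuse the previous response instead of paying for a new call. The wrapper below is a minimal in-memory sketch with a time-to-live; the underlying `call_tool` function is an assumption.

```python
import time

class CachingToolClient:
    """Cache tool responses for a short TTL to avoid paying for repeated identical calls."""

    def __init__(self, call_tool, ttl_s=300):
        self.call_tool = call_tool          # the underlying (possibly paid) API call
        self.ttl_s = ttl_s
        self.cache = {}                     # request key -> (timestamp, response)

    def get(self, tool_name, **params):
        key = (tool_name, tuple(sorted(params.items())))
        cached = self.cache.get(key)
        if cached and time.time() - cached[0] < self.ttl_s:
            return cached[1]                # cache hit: no cost, no added latency
        response = self.call_tool(tool_name, **params)
        self.cache[key] = (time.time(), response)
        return response
```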

Research trends point toward adaptive orchestrators that learn from past outcomes to improve future performance. Instead of static retry counts or fixed fallback hierarchies, these systems analyze logs of past workflows, identifying which tools succeed most often, which fail under certain conditions, and how best to allocate calls. Adaptive orchestrators may even predict failures in advance, avoiding tools likely to struggle under current load. This trend reflects a broader shift in AI from static systems to adaptive, self-optimizing ones. Orchestration is not just being built—it is being trained, refined, and improved over time. This creates systems that grow more reliable and efficient the longer they run, learning from experience much as humans do.

Tool priority rules provide another way to manage orchestration intelligently. Not all tools are equal in reliability, speed, or accuracy. By ranking tools based on these attributes, orchestrators can prioritize the best options while keeping backups in reserve. For instance, a translation orchestrator may favor a fast, high-quality service for most queries but keep a slower, more specialized service as a fallback for rare languages. Priority rules reduce randomness and ensure that users benefit from the best possible service in most cases while still preserving resilience. These rules must be updated regularly, as tool performance changes over time, but they remain a simple and powerful way to structure orchestration.

Multimodal orchestration introduces unique challenges because tools may operate across different modalities, such as text, vision, and audio. Coordinating these tools requires careful design to ensure that outputs align. For example, an educational system might retrieve an image, generate a caption, and produce an audio explanation, all within one workflow. Orchestration ensures that each step is executed in the right order and that errors in one modality do not derail the others. Multimodal orchestration expands the possibilities of AI applications but also multiplies the risks of failure. Each additional modality adds complexity, requiring orchestration systems that can manage heterogeneous tools seamlessly. This represents the frontier of orchestration, where AI systems begin to act across the full spectrum of human senses and communication channels.

Maintenance challenges are inevitable in orchestration systems because tools evolve over time. Some are updated with new features, others are retired, and new ones are added. Orchestration frameworks must adapt to these changes without disrupting workflows. This requires modular design, where tools can be swapped in or out with minimal reconfiguration. It also requires continuous monitoring to detect when tools begin to degrade in performance, prompting updates to routing and fallback strategies. Maintenance is not glamorous, but it is essential. Without it, orchestrators become outdated, brittle, and unable to manage the changing landscape of tools they are meant to coordinate.

The future of orchestration points toward more autonomous decision-making, where orchestrators not only execute predefined workflows but also learn to plan and adapt without constant human intervention. As systems grow more complex, manual orchestration becomes unsustainable. Autonomous orchestrators will decide which tools to use, how many times to retry, when to fall back, and when to stop entirely, all based on context and learned experience. This autonomy will make systems more efficient, but it also raises new challenges of transparency and governance. Users and organizations must be able to trust that orchestration decisions are safe, ethical, and aligned with policy. The evolution toward autonomy thus amplifies the importance of observability, governance, and accountability in orchestration frameworks.

Finally, orchestration connects naturally to planning strategies, the subject of the next episode. While orchestration ensures that multiple tools can be managed reliably, planning strategies determine how these tools are sequenced, prioritized, and adapted to user goals. Orchestration provides the mechanism, but planning provides the intelligence that drives it. Together, they transform AI systems from static pipelines into dynamic collaborators capable of complex, multi-step reasoning.
