Operations Agent in Microsoft Fabric — In-Depth Guide
- What Is an Operations Agent?
- Microsoft Fabric Real-Time Intelligence (RTI)
- Why Operations Agents Are Critical for Operations
- How to Create and Configure an Operations Agent
- Supported Data Sources
- Architecture & Data Flow
- Example Operations Agent Configuration
- Governance, Safety & Best Practices
- Operations Agent vs Data Agent
- Real-World Use Cases
- Common Challenges & Pitfalls
- Frequently Asked Questions (FAQ)
- Resources & Further Learning
What Is an Operations Agent?
The Operations Agent is a core feature of Microsoft Fabric’s Real-Time Intelligence (RTI) platform. It continuously monitors streaming telemetry data such as metrics, logs, and events, then applies configurable rules and AI-based reasoning to analyze this data in near real-time.
Unlike passive monitoring tools, the Operations Agent proactively recommends or triggers actions—such as sending alerts, initiating workflows, or scaling resources—to help operational teams detect and respond to issues faster. Importantly, actions can be gated behind human approval, striking a balance between automation and control.

Microsoft Fabric Real-Time Intelligence (RTI)
Operations Agents are built on the foundational Real-Time Intelligence (RTI) workload of Microsoft Fabric. RTI ingests high-velocity telemetry data from event sources like Azure Event Hubs or IoT Hub, storing it in Kusto (KQL) databases optimized for real-time analytics and detection at sub-second latencies.
RTI provides features such as Copilot for RTI, which enables natural language queries and low-code programming for analytic insights and operational reasoning. This integration empowers non-engineers to author detection logic and automate actions easily through the Fabric portal.
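To give a sense of the kind of real-time analytics this enables, here is a minimal KQL sketch that computes tail latency per service in small time bins. The ApiTelemetry table and its columns are assumptions for illustration, not part of any Fabric schema:

```kql
// Illustrative only: p95 API latency per service over the last hour, in 5-minute bins.
// ApiTelemetry, DurationMs, and Service are assumed names, not a built-in schema.
ApiTelemetry
| where Timestamp > ago(1h)
| summarize p95_latency = percentile(DurationMs, 95) by Service, bin(Timestamp, 5m)
| order by Timestamp desc
```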
Why Operations Agents Are Critical for Operations
Operations Agents offer several key benefits to modern operational teams:
- Rapid anomaly detection that identifies spikes, drops, and other irregular telemetry behavior before issues escalate.
- Automated recommendations and controls that propose or initiate actions like scaling resources or opening tickets, with configurable human approvals to maintain safety.
- Context-enriched insights that correlate alert data with deployment versions, service configurations, and historical trends for more actionable intelligence.
- Full audit and traceability through an automatically generated operations playbook capturing decisions, reasoning, and actions for compliance and incident review.
- Business-aligned monitoring by defining agent goals based on high-level company priorities and service-level objectives (SLOs).
How to Create and Configure an Operations Agent
Follow these steps to create and set up an Operations Agent within Microsoft Fabric’s Real-Time Intelligence environment:
- Open your Fabric workspace and choose Create → Real-Time Intelligence → Operations Agent.
- Assign a meaningful name that reflects the agent’s monitoring focus, such as API Latency Watcher.
- Define the business goals and monitoring objectives—for example, tracking latency, error rates, or resource utilization relevant to your service.
- Author agent instructions to interpret telemetry data in the context of these goals. Use natural language descriptions supplemented with Kusto Query Language (KQL) snippets and AI reasoning guidelines to shape how the agent evaluates data.
- Select your data source: Currently, this must be a KQL database table sourced from telemetry ingested into Fabric RTI.
- Set up detection rules using KQL queries or thresholds that trigger when telemetry metric conditions are met—these can include statistical baselines, anomalies, or hard limits (a sample KQL rule follows these steps).
- Define actions the agent can recommend or trigger, such as sending alerts to Microsoft Teams, invoking Power Automate workflows, calling webhooks, or opening service tickets.
- Optionally configure approval gates, requiring specific actions to be manually approved through the Teams interface before execution, ensuring safety and auditability.
- Save the operations agent configuration. Fabric generates an operations playbook encapsulating monitoring goals, detection logic, reasoning flow, and action sequences.
- Install the Fabric Operations Agent Teams app for real-time notifications and approval workflows integrated within your collaboration environment.
- Test the agent rigorously by replaying historical telemetry data to validate detection rules and action recommendations, tuning sensitivities as needed.
- Once tested, deploy the Operations Agent to begin live monitoring and operational automation on your streaming telemetry data.
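As noted in the detection-rules step above, a rule can be expressed as a KQL query that returns rows only when its condition is met. The following is a minimal sketch, assuming a hypothetical ApiTelemetry table with Service and StatusCode columns:

```kql
// Sketch of a detection rule: flag 5-minute windows in the last 15 minutes
// where a service's error rate exceeds 2%. Table and column names are assumed.
ApiTelemetry
| where Timestamp > ago(15m)
| summarize total = count(), errors = countif(StatusCode >= 500)
    by Service, bin(Timestamp, 5m)
| extend error_rate = todouble(errors) / total
| where error_rate > 0.02
```

Any rows returned represent breaching windows; results like these would drive the agent’s reasoning and recommended actions.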
Supported Data Sources
The Operations Agent currently supports monitoring only Kusto Query Language (KQL) databases within the Fabric Real-Time Intelligence platform. It cannot directly monitor data from lakehouse or warehouse tables.
Operational teams typically ingest telemetry and streaming data into KQL databases via Event Hubs or similar services, enabling the agent to perform low-latency detection and decision-making on that data.
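As a hypothetical illustration of such a target table, the KQL database receiving the stream could define a schema like the one below; the table name and columns are assumptions, not a required layout:

```kql
// Assumed schema for an API telemetry table fed from Event Hubs.
.create table ApiTelemetry (
    Timestamp: datetime,
    Service: string,
    StatusCode: int,
    DurationMs: real,
    DeploymentId: string
)
```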
Operations Agent Architecture & Data Flow
The architecture combines streaming data ingestion, real-time analytics, AI-based reasoning, and action orchestration:
- Event Ingestion: Applications, infrastructure, and IoT devices generate telemetry streamed into Fabric via Event Hubs or IoT Hubs.
- Real-Time Storage: Incoming telemetry is persisted into KQL databases optimized for fast querying and analysis.
- Contextual Enrichment: The agent enriches telemetry with metadata such as deployment versions, resource configurations, and historical baselines.
- Rule Evaluation: Detection rules and AI reasoning evaluate enriched telemetry for anomalies, threshold breaches, or pattern matches (see the KQL sketch after this list).
- Action Recommendation: Based on evaluation, the agent recommends or triggers actions with optional human approval gates.
- Logging and Playbook Persistence: All decisions, each step’s reasoning, and executed or recommended actions are logged into an operations playbook for audit and troubleshooting.
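The enrichment and rule-evaluation stages above can be pictured as a single KQL pass: breaching telemetry windows are joined against a deployment-metadata lookup so the resulting alert carries release context. The ApiTelemetry and deployment_versions tables and their columns are assumptions for illustration:

```kql
// Sketch: evaluate a latency/error-rate rule, then enrich hits with deployment metadata.
// deployment_versions is assumed to have Service, DeploymentVersion, and DeployedAt columns.
ApiTelemetry
| where Timestamp > ago(15m)
| summarize p95_latency = percentile(DurationMs, 95),
    error_rate = todouble(countif(StatusCode >= 500)) / count()
    by Service, bin(Timestamp, 5m)
| where p95_latency > 1200 and error_rate > 0.02
| lookup kind=leftouter deployment_versions on Service
| project Timestamp, Service, p95_latency, error_rate, DeploymentVersion, DeployedAt
```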

Example Operations Agent Configuration
Below is a high-level conceptual example showcasing the agent’s declarative configuration using YAML-style pseudocode:
```yaml
operations_agent: "api-latency-monitor"
data_source:
  kql_database: "prodTelemetry"
detection_rules:
  - id: "high_latency"
    condition: "p95_latency > 1200 AND error_rate > 0.02"
    severity: "critical"
    enrich_with:
      - lookup: "deployment_versions"
actions:
  - id: "alert_teams"
    type: "teams.send"
    message: "⚠️ API latency is too high — please review immediately."
  - id: "recommend_auto_scale"
    type: "powerautomate.invoke"
    flow_name: "ScaleUpApiInstances"
safety_limits:
  max_auto_actions_per_hour: 2
audit_log:
  table: "operations_agent_decision_log"
```
This specification illustrates how telemetry data, enriched with deployment metadata, drives alerts and recommended automation within safe operational limits, including auditing of decisions.

Governance, Safety & Best Practices
To ensure safe and trustworthy automation through Operations Agents, follow these best practices:
- Use Least-Privilege Identities: Create and run the agent under a dedicated service principal with just enough permissions necessary for the intended automated actions.
- Require Approval Gates for Critical Actions: Configure manual approval flows for sensitive or impactful actions to prevent unintended consequences.
- Implement Rate Limiting: Limit the frequency of automated actions such as scale-outs to avoid runaway or cascading effects.
- Comprehensive Audit Trails: Log every detection, reasoning path, recommendation, and action execution to facilitate incident reviews and compliance (a KQL sketch follows this list).
- Test Extensively: Use replayed or synthetic telemetry to validate detection logic and action workflows before live deployment.
- Explainability: Capture and surface underlying AI or rule-based reasoning for each agent decision to build operator confidence and enable iterative improvement.
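As a sketch of the rate-limiting and audit practices above, an operator could periodically query the decision-log table named in the earlier example configuration; the column names used here are assumptions:

```kql
// Count fully automated actions in the last hour and surface any action type
// that exceeds the example limit of 2 per hour. Column names are assumed.
operations_agent_decision_log
| where Timestamp > ago(1h)
| where ActionExecuted == true and RequiredApproval == false
| summarize auto_actions = count() by ActionId
| where auto_actions > 2
```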

Operations Agent vs Data Agent in Microsoft Fabric
Microsoft Fabric offers both Operations Agents and Data Agents, which serve distinct but complementary purposes in the Fabric ecosystem.
- Operations Agent: Built on the Real-Time Intelligence (RTI) platform, Operations Agents focus on monitoring operational telemetry such as logs, metrics, and streaming events in near real-time. They continuously observe and analyze system behavior and can recommend or automate responses like scaling resources, sending alerts, or triggering workflows. They require a Kusto (KQL) database as their data source and are designed primarily for DevOps, SRE, and operational automation scenarios.
- Data Agent: Data Agents provide AI-powered assistance across a broader range of data sources, including SQL databases, lakehouses, warehouses, and files. They help users by answering natural language queries, generating insights, and assisting with data exploration and business intelligence tasks. Data Agents emphasize conversational AI and self-service analytics, making them valuable for data professionals and business users.
In summary, while Operations Agents are specialized for near real-time operational monitoring and automating incident response, Data Agents focus on enhancing data interaction and analysis through AI-driven conversational interfaces across diverse data domains.
Real-World Use Cases
Operations Agents are highly valuable across multiple operational domains:
- SRE & DevOps Automation: Detect system degradations and recommend or automate remediation quickly.
- Data Pipeline Reliability: Monitor complex ETL and streaming workflows and automate recovery on failure.
- Cost Optimization: Identify and recommend scaling down of idle environments during off-peak hours to reduce cloud costs (see the KQL sketch after this list).
- Business Process Alerts: Alert business teams of anomalies in transaction or conversion metrics aligned with revenue impact.
- Incident Investigation: Enrich alerts with deployment and config context for faster triage and post-mortem analysis.
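For the cost-optimization case, a detection query could look for sustained idle capacity during off-peak hours; the ResourceUtilization table and its columns are hypothetical:

```kql
// Sketch: environments averaging under 5% CPU over the last 8 hours,
// as candidates for a scale-down recommendation. Names are assumed.
ResourceUtilization
| where Timestamp > ago(8h)
| summarize avg_cpu = avg(CpuPercent) by Environment
| where avg_cpu < 5
```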
Common Challenges & Pitfalls
Key challenges when adopting Operations Agents include:
- Overly Sensitive Detection Rules: Excessive alert noise can overwhelm and desensitize operational teams (see the baseline-based KQL sketch after this list).
- Insufficient Context: Alerts without enrichment lack critical business context, reducing actionability.
- Missing Rollback Paths: Automation must have safe undo or manual fallback strategies to handle errors.
- Limited Audit Visibility: Without full logging, debugging and compliance auditing become difficult.
- Skipping Pre-Deployment Testing: Unvalidated rules can cause unintended disruptions in production.
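One way to tame overly sensitive rules, as noted in the first pitfall above, is to alert on deviations from a learned baseline rather than on fixed thresholds. The sketch below uses KQL's built-in series_decompose_anomalies over an assumed ApiTelemetry table:

```kql
// Sketch: flag only points that deviate from the seasonal/trend baseline,
// instead of alerting on every breach of a static threshold. Names are assumed.
ApiTelemetry
| where Timestamp > ago(7d)
| make-series p95_latency = percentile(DurationMs, 95) on Timestamp step 15m by Service
| extend (anomalies, score, baseline) = series_decompose_anomalies(p95_latency, 2.5)
| mv-expand Timestamp to typeof(datetime), p95_latency to typeof(double), anomalies to typeof(double)
| where anomalies != 0
```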
Frequently Asked Questions (FAQ)
Q: Is the Operations Agent generally available for production?
A: The Operations Agent is in public preview; thorough validation is advisable before production deployment.
Q: Can Operations Agents monitor lakehouse or warehouse tables?
A: No, they currently only support KQL databases within the Real-Time Intelligence platform.
Q: What actions can an Operations Agent perform?
A: It can recommend or trigger sending Teams messages, invoking Power Automate flows, calling webhooks, or opening service tickets.
Q: How does the Operations Agent decide when to act?
A: Actions are based on user-defined detection rules applied to enriched telemetry, with execution gated by configured approval policies.
Q: Does the agent support human approval before actions?
A: Yes, approval gates can be set up to require manual review before critical automated actions execute.



