
Microsoft Fabric Data Warehouse Interview Questions

Microsoft Fabric Data Warehouse Interview Questions – Master T-SQL optimization, Synapse migration strategies, and Warehouse architecture for Senior SQL Developers.

What are the top Fabric Data Warehouse interview questions?

The most common Microsoft Fabric Data Warehouse interview questions focus on the architectural differences between the Fabric Warehouse (SaaS) and Azure Synapse Dedicated Pools (PaaS). Candidates are tested on T-SQL limitations (e.g., lack of Identity Columns), performance tuning using Result Set Caching, and cross-database querying strategies involving the Lakehouse SQL Endpoint.

If you are a SQL Developer or Architect transitioning to Microsoft Fabric, preparing for Microsoft Fabric Data Warehouse interview questions is essential for your success. Unlike traditional SQL Server environments, the Fabric Warehouse separates compute from storage completely, relying on open Delta Parquet files. Therefore, to succeed in senior interviews, you must demonstrate how to optimize T-SQL queries for this new architecture while managing migration challenges from legacy Synapse pools.

This comprehensive guide provides 40 deep-dive questions organized into 6 modules. Furthermore, we have integrated insights from our Synapse vs Fabric Comparison to help you articulate the platform differences clearly.

Module A: Data Warehouse Architecture

Understanding the fundamental shift from PaaS (Synapse) to SaaS (Fabric) is the first step. These Microsoft Fabric Data Warehouse interview questions cover the core design principles.

Core Warehouse Concepts

Beginner Q1: What is a Fabric Warehouse?

A Fabric Warehouse is a SaaS-based relational database that supports full T-SQL capabilities, including ACID transactions, DDL, and DML. Unlike traditional SQL Server, it completely separates compute from storage. Specifically, it stores data in the open Delta Parquet format in OneLake, but exposes it via a familiar T-SQL endpoint.

Intermediate Q2: Warehouse vs. Lakehouse SQL Endpoint?

This is a critical distinction. The Warehouse allows full read/write operations (INSERT, UPDATE, DELETE) using T-SQL. In contrast, the Lakehouse SQL Endpoint is Read-Only. Consequently, it allows you to query data ingested by Spark using T-SQL, but you cannot modify the data or schema from the SQL endpoint. See our Lakehouse vs Warehouse Guide.

Advanced Q3: Fabric Warehouse vs. Synapse Dedicated Pool?

Synapse Dedicated Pools use a proprietary distribution architecture (60 distributions) and storage format. Fabric Warehouse, however, uses a “Polaris” engine that is serverless and auto-scaling. Therefore, it does not require you to define distribution keys (Hash/Round-Robin) manually; the engine manages data placement automatically.

Storage Internals

Intermediate Q4: Does the Warehouse use Delta Lake?

Yes. Even though you interact with it using T-SQL, the underlying data is stored as Delta Parquet files in OneLake. This keeps the storage format open and interoperable. As a result, other engines (like Spark) can read Warehouse tables directly if permissions allow, effectively breaking down the traditional “SQL silo.”

Advanced Q5: How is transaction logging handled?

Fabric Warehouse uses a distributed transaction log. When you commit a transaction, it writes to the log and simultaneously updates the Delta Lake metadata (_delta_log). This mechanism ensures ACID compliance while maintaining the open format. Consequently, unlike SQL Server, there is no single `.ldf` file to manage or back up.

Advanced Q6: What is “Time Travel” in Warehouse?

Because the underlying storage is Delta Lake, the Warehouse natively supports Time Travel. You can query historical versions of data using the T-SQL OPTION (FOR TIMESTAMP AS OF ...) syntax. This feature is invaluable for auditing and recovering from accidental data deletions without restoring a full backup.
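As a quick illustration (table name is hypothetical), a point-in-time read looks like this; the timestamp is interpreted as UTC and must fall within the time-travel retention window:

```sql
-- Query dbo.FactSales as it existed at a specific UTC point in time.
-- (Hypothetical table name; the timestamp must be within the retention period.)
SELECT SaleId, SalesAmount
FROM dbo.FactSales
OPTION (FOR TIMESTAMP AS OF '2024-05-01T08:00:00.000');
```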

Module B: T-SQL Development & Limitations

Developers must know what T-SQL features are supported. These questions focus on the surface area and limitations.

DDL & Constraints

Intermediate Q7: Are Primary Keys enforced?

No. You can define Primary Keys and Foreign Keys in Fabric Warehouse DDL, but they must be declared NOT ENFORCED; they serve as documentation and as hints the optimizer can use when building plans. The engine does not enforce uniqueness or referential integrity, so you must handle data quality checks in your ETL pipeline.
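A minimal sketch of declaring unenforced constraints (hypothetical table and column names):

```sql
-- Declare a primary key purely as metadata; it must be NONCLUSTERED and NOT ENFORCED,
-- and the key column must already be defined NOT NULL.
ALTER TABLE dbo.DimCustomer
ADD CONSTRAINT PK_DimCustomer PRIMARY KEY NONCLUSTERED (CustomerKey) NOT ENFORCED;

-- Foreign keys follow the same pattern and are likewise not enforced.
ALTER TABLE dbo.FactSales
ADD CONSTRAINT FK_FactSales_DimCustomer
    FOREIGN KEY (CustomerKey) REFERENCES dbo.DimCustomer (CustomerKey) NOT ENFORCED;
```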

Intermediate Q8: Are Identity Columns supported?

Currently, IDENTITY columns (auto-increment) are not supported in Fabric Warehouse. This is a major migration blocker for legacy schemas. Consequently, you must generate surrogate keys using T-SQL logic (e.g., ROW_NUMBER()) or during the ETL ingestion phase.
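One common workaround is to derive the key during the insert, as in this sketch (hypothetical staging and dimension tables). Note that the pattern is not safe under concurrent loads, so it should run from a single ETL process:

```sql
-- Generate surrogate keys by offsetting ROW_NUMBER() with the current maximum key.
-- (Hypothetical tables: stg.Product is the staging source, dbo.DimProduct the target.)
INSERT INTO dbo.DimProduct (ProductKey, ProductName)
SELECT
    COALESCE((SELECT MAX(ProductKey) FROM dbo.DimProduct), 0)
        + ROW_NUMBER() OVER (ORDER BY s.ProductName) AS ProductKey,
    s.ProductName
FROM stg.Product AS s
WHERE NOT EXISTS (
    SELECT 1 FROM dbo.DimProduct AS d WHERE d.ProductName = s.ProductName
);
```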

Intermediate Q9: Does Fabric support Stored Procedures?

Yes, Fabric Warehouse fully supports Stored Procedures, Views, and Table-Valued Functions (TVFs). This capability allows you to migrate existing business logic from SQL Server or Synapse with minimal changes, provided you handle the unsupported T-SQL commands.

Ingestion Methods

Beginner Q10: What is the “COPY INTO” command?

COPY INTO is the primary high-performance command for loading data into the Warehouse from external files (CSV, Parquet in ADLS/Blob). It is significantly faster than INSERT INTO ... SELECT or bulk insert methods because it parallelizes the ingestion process across compute nodes.
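A hedged example of the shape of a COPY INTO statement (the storage URL and SAS token are placeholders):

```sql
-- Bulk-load Parquet files from ADLS into a staging table.
COPY INTO dbo.StageSales
FROM 'https://<storageaccount>.dfs.core.windows.net/raw/sales/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```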

Intermediate Q11: Can you use dbt with Fabric?

Yes. Fabric Warehouse exposes a T-SQL endpoint that dbt can connect to through the dbt-fabric adapter. You can run `dbt run` to execute transformations directly inside the Warehouse. See our guide on dbt Best Practices in Fabric.

Advanced Q12: How to handle Schema Drift?

Unlike Spark, the Warehouse enforces “Schema-on-Write.” If the incoming data structure changes, your COPY INTO statements or pipeline copy activities will fail. To handle drift effectively, use T-SQL (often dynamic SQL) to compare source metadata against the target and execute ALTER TABLE commands before loading.
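A minimal sketch of that pattern (hypothetical table and column names; assumes your Warehouse supports adding nullable columns via ALTER TABLE):

```sql
-- Add a column that appeared in the source but is missing from the target, before loading.
IF NOT EXISTS (
    SELECT 1
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME   = 'StageSales'
      AND COLUMN_NAME  = 'PromoCode'
)
BEGIN
    ALTER TABLE dbo.StageSales ADD PromoCode VARCHAR(50) NULL;
END;
```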

Module C: Warehouse Performance Tuning

Optimizing SQL queries in a serverless environment requires new strategies. These Microsoft Fabric Data Warehouse interview questions cover caching and stats.

Caching Strategies

Intermediate Q13: What is Result Set Caching?

Result Set Caching stores the output of a query. If a user runs the exact same query again and the underlying data hasn’t changed, Fabric returns the result instantly from the cache. Consequently, this consumes zero compute resources, which is vital for dashboard performance.

Advanced Q14: How does “V-Order” affect SQL?

V-Order optimizes the Parquet files for the VertiPaq engine (Direct Lake). While primarily for Power BI, the Warehouse engine can also benefit from the sorted nature of V-Ordered files. Specifically, this leads to faster IO operations during table scans. See Warehouse Optimization Guide.

Intermediate Q15: Statistics in Fabric Warehouse?

Fabric automatically creates statistics for columns used in joins and filters. However, for complex workloads, auto-stats might not be generated fast enough. Therefore, you can manually run CREATE STATISTICS to ensure the query optimizer has the most up-to-date histograms for execution planning.
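For example (hypothetical object names), you can create and refresh a column statistic explicitly:

```sql
-- Create a single-column statistic on a frequently joined key.
CREATE STATISTICS stats_FactSales_CustomerKey
ON dbo.FactSales (CustomerKey) WITH FULLSCAN;

-- Refresh it after a large load so the optimizer sees current histograms.
UPDATE STATISTICS dbo.FactSales (stats_FactSales_CustomerKey);
```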

Query Best Practices

Advanced Q16: Optimizing Joins in Fabric?

Since you don’t define distribution keys (Hash/Round-Robin), the engine handles shuffling. To optimize joins, ensure you join on columns with high cardinality and use standard star-schema designs. Furthermore, avoid joining on `VARCHAR(MAX)` columns as this severely degrades performance.

Advanced Q17: Managing TempDB contention?

Fabric handles TempDB automatically, but heavy usage of temp tables (`#temp`) can still cause spill-to-disk issues if the capacity is undersized. For complex intermediate logic, consider using CTEs (Common Table Expressions) or persisting intermediate results to permanent staging tables.

Intermediate Q18: Materialized Views support?

Currently, Fabric Warehouse supports standard Views but has limited support for Materialized Views compared to Synapse. You should check the latest roadmap. As a workaround, developers often use CTAS (Create Table As Select) to physically materialize complex aggregations for reporting.
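A sketch of the CTAS workaround (hypothetical names); refreshing it typically means dropping and re-creating the table on a schedule:

```sql
-- Physically materialize a daily aggregate for reporting.
CREATE TABLE dbo.SalesDailyAgg AS
SELECT
    OrderDate,
    SUM(SalesAmount) AS TotalSales,
    COUNT_BIG(*)     AS OrderCount
FROM dbo.FactSales
GROUP BY OrderDate;
```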

Advanced Q19: Scaling Compute Resources?

Compute in Fabric is governed by the F-SKU capacity. Unlike Synapse, where you scale DWUs, the Fabric Warehouse automatically draws on the available CUs (Capacity Units). Consequently, if queries are slow due to concurrency, you must scale up the F-SKU; the capacity’s built-in bursting and smoothing can absorb short spikes, but not sustained overload.

Intermediate Q20: Monitoring slow queries?

Use the Query Activity hub in the Fabric portal or query the dynamic management views (DMVs) like `sys.dm_exec_requests`. This allows you to identify long-running queries. Furthermore, you can analyze the query plan to spot missing stats or expensive shuffle operations.
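For instance, a quick look at the longest-running active requests via the DMV mentioned above:

```sql
-- List currently running requests, longest elapsed time first.
SELECT session_id, start_time, total_elapsed_time, command
FROM sys.dm_exec_requests
WHERE status = 'running'
ORDER BY total_elapsed_time DESC;
```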

Module D: Security & Governance

Security is non-negotiable. These questions cover Row-Level Security and masking.

Access Control

Beginner Q21: How do you secure data rows?

Fabric Warehouse supports standard T-SQL Row-Level Security (RLS). You define a Security Policy and a Predicate Function (e.g., `WHERE Region = @UserRegion`). This filters data dynamically based on the user executing the query.
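A minimal sketch, assuming a hypothetical dbo.UserRegion mapping table that associates login names with regions:

```sql
-- Inline TVF used as the RLS predicate: returns a row only when the caller
-- is mapped to the row's region in the (hypothetical) dbo.UserRegion table.
CREATE FUNCTION dbo.fn_RegionFilter (@Region AS VARCHAR(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_result
    FROM dbo.UserRegion AS ur
    WHERE ur.Region = @Region
      AND ur.UserName = USER_NAME();
GO

-- Bind the predicate to the fact table as a filter.
CREATE SECURITY POLICY dbo.RegionPolicy
ADD FILTER PREDICATE dbo.fn_RegionFilter(Region) ON dbo.FactSales
WITH (STATE = ON);
```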

Intermediate Q22: What is Object-Level Security (OLS)?

OLS allows you to restrict access to specific columns (Column-Level Security) or tables. You use the `GRANT` and `DENY` T-SQL commands. For example, you can `DENY SELECT` on the `Salary` column for the `Analyst` role. Consequently, they can query the table but not the sensitive column.
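A sketch of that pattern, assuming an existing database role named Analyst and a hypothetical dbo.Employee table:

```sql
-- Allow the role to read non-sensitive columns only.
GRANT SELECT ON dbo.Employee (EmployeeId, EmployeeName, Department) TO Analyst;

-- Explicitly block the sensitive column; DENY wins over any broader GRANT.
DENY SELECT ON dbo.Employee (Salary) TO Analyst;
```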

Advanced Q23: Dynamic Data Masking?

Dynamic Data Masking obfuscates sensitive data (like emails or credit cards) in the query result set without changing the data on disk. You can apply a masking rule (e.g., `MASKED WITH (FUNCTION = 'email()')`) to a column. As a result, privileged users can see the real data, while others see the mask.
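For example (hypothetical table and role names), applying a mask to an existing column and exempting a privileged role:

```sql
-- Mask the Email column for non-privileged readers (hypothetical dbo.Customer table).
ALTER TABLE dbo.Customer
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- Members of this role see the unmasked values.
GRANT UNMASK TO DataSteward;
```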

Governance

Intermediate Q24: Sharing a Warehouse?

You can share a Warehouse directly via the “Share” button without adding the user to the Workspace. This grants them permission to connect via the SQL Endpoint. Specifically, you can define “Read” permissions or limit them to specific schemas.

Intermediate Q25: Warehouse vs. Lakehouse Security?

Warehouse security is defined using T-SQL (GRANT/DENY/RLS). In contrast, Lakehouse security (when accessed via Spark) relies on OneLake roles. However, the Lakehouse SQL Endpoint honors the SQL-based security rules defined on it, effectively bridging the gap.

Advanced Q26: Auditing user activity?

Fabric integrates with Microsoft Purview for auditing. You can view audit logs to see who ran which SQL query, when, and what data was accessed. This visibility is crucial for compliance in regulated industries.

Module E: Cross-Querying Strategy

Fabric’s superpower is querying across items. These Microsoft Fabric Data Warehouse interview questions test your ability to break silos.

Data Virtualization

Advanced Q27: How to query a Lakehouse from a Warehouse?

You can use Cross-Database Queries. In T-SQL, you can reference a Lakehouse table using three-part naming: `[LakehouseName].[Schema].[Table]`. This works seamlessly because both items reside in OneLake and share the same SQL engine compute.
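A sketch with hypothetical names, joining a local Warehouse table to a table in a Lakehouse named SalesLakehouse in the same workspace:

```sql
-- Three-part naming lets the Warehouse read the Lakehouse table in place.
SELECT s.OrderId, s.SalesAmount, l.EventType
FROM dbo.FactSales AS s
JOIN SalesLakehouse.dbo.WebLogs AS l
    ON l.OrderId = s.OrderId;
```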

Advanced Q28: How to query data across Workspaces?

Currently, cross-database queries work best within the same workspace. To query data from a different workspace, you should create a Shortcut in your local Warehouse (or Lakehouse) pointing to the remote data. Then, you can query the shortcut as if it were a local table.

Intermediate Q29: Joining SQL and Spark data?

Consider a scenario where you have “Sales” in a Warehouse (SQL) and “Logs” in a Lakehouse (Spark). You can write a single T-SQL query in the Warehouse that joins the local Sales table with the remote Logs table (via 3-part naming). Consequently, no ETL is required to move the logs.

Limitations

Intermediate Q30: What limitations exist for Cross-Queries?

Cross-database queries are Read-Only for the remote source. You cannot perform an `UPDATE` on a Lakehouse table from a Warehouse query. Additionally, ensure both items are in the same region to avoid latency or cross-region data transfer limitations.

Advanced Q31: Performance of Cross-Queries?

Performance is generally high because data is not moved; the compute engine reads the Parquet files directly from OneLake. However, joining two massive tables from different sources may incur shuffling overhead. Therefore, ensure stats are up to date on both sides.

Intermediate Q32: Using Views for Abstraction?

Best Practice: Create a View in your Warehouse that encapsulates the 3-part name logic (`CREATE VIEW v_Logs AS SELECT * FROM Lakehouse.dbo.Logs`). This abstracts the physical location from your BI tools and users.

Module F: Migration Scenarios

Real-world migration strategies from Synapse and SQL Server.

Migration Strategy

Advanced Q33: Migrating Synapse Dedicated Pools?

You cannot simply backup/restore. You must export data (Parquet/CSV) and load it into OneLake (COPY INTO). Then, rewrite DDL to remove unsupported syntax (DISTRIBUTION, INDEX types). Use the Fabric Migration Assistant to identify gaps. See our Synapse Migration Guide.
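As an illustration of the DDL rewrite (hypothetical table), the Synapse-specific WITH clause simply disappears because the Polaris engine manages distribution and layout:

```sql
-- Synapse Dedicated Pool version (not valid in Fabric Warehouse):
-- CREATE TABLE dbo.FactSales (...)
-- WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);

-- Fabric Warehouse version: same columns, no distribution or index options.
CREATE TABLE dbo.FactSales
(
    SaleId      BIGINT         NOT NULL,
    CustomerKey INT            NOT NULL,
    SalesAmount DECIMAL(18, 2) NOT NULL,
    OrderDate   DATE           NOT NULL
);
```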

Intermediate Q34: Handling unsupported T-SQL?

Features like `MERGE`, `IDENTITY`, and certain cursor constructs are unsupported or behave differently, so you must refactor the affected stored procedures. For `IDENTITY`, implement custom sequence logic using `MAX(ID) + ROW_NUMBER()`.
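A hedged sketch of one such refactor, replacing a MERGE-based upsert with separate UPDATE and INSERT statements (hypothetical table names; if `UPDATE ... FROM` with a join is not available in your Warehouse, rewrite the update with a correlated subquery):

```sql
BEGIN TRANSACTION;

-- Update rows that already exist in the target.
UPDATE tgt
SET    tgt.CustomerName = src.CustomerName,
       tgt.City         = src.City
FROM   dbo.DimCustomer AS tgt
JOIN   stg.Customer    AS src ON src.CustomerId = tgt.CustomerId;

-- Insert rows that are new to the target.
INSERT INTO dbo.DimCustomer (CustomerId, CustomerName, City)
SELECT src.CustomerId, src.CustomerName, src.City
FROM   stg.Customer AS src
WHERE  NOT EXISTS (
    SELECT 1 FROM dbo.DimCustomer AS tgt WHERE tgt.CustomerId = src.CustomerId
);

COMMIT TRANSACTION;
```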

Execution & Testing

Advanced Q35: Migrating On-Prem SQL Server?

Use a Data Factory Pipeline with the On-Premises Data Gateway. For large migrations, consider dumping tables to local Parquet files, uploading to ADLS, and then using `COPY INTO` for maximum throughput.

Intermediate Q36: CI/CD for Warehouse?

Fabric Warehouse projects can be integrated with Azure DevOps. You can commit your SQL Database Projects (scripts) to Git. Use Deployment Pipelines to promote schema changes from Dev to Test to Prod.

Advanced Q37: Zero-Downtime Migration?

Use Mirroring for Azure SQL DB sources to replicate data to Fabric near-real-time. Once the mirror is synced, point your BI tools to the Fabric Warehouse endpoint. This strategy minimizes downtime compared to batch ETL migration.

Intermediate Q38: Testing Data Consistency?

After migration, run row-count and aggregation checksum queries on both source (Synapse) and target (Fabric). Automate this validation using a Notebook or Data Pipeline before switching over users.
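For example (hypothetical table and columns), run the same checks on both sides and diff the results:

```sql
-- Row count plus simple aggregate checksums; source and target should match exactly.
SELECT
    COUNT_BIG(*)                             AS row_count,
    SUM(CAST(SalesAmount AS DECIMAL(38, 2))) AS sales_total,
    COUNT(DISTINCT CustomerKey)              AS distinct_customers
FROM dbo.FactSales;
```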

Advanced Q39: Handling large historical loads?

For PB-scale history, do not load everything into the active Warehouse. Load active data (e.g., last 3 years) to the Warehouse. Leave older data in ADLS (Archive tier) and use Shortcuts to expose it only when needed. This approach significantly saves storage costs.

Advanced Q40: Capacity Planning for Migration?

Estimate the CU (Capacity Units) required based on your Synapse DWU usage. A rough rule of thumb is available, but you should run a pilot workload on a trial F-SKU and monitor the Capacity Metrics App to size correctly before full migration.

