2025/12/08

Unlocking the V-Model: A Blueprint for Agentic Automotive Development

Summary

This article outlines a strategic Open Agentic Workflow to fundamentally modernize the V-Model for automotive software development. The central goal is to integrate Generative AI into ISO 26262-compliant [1] safety-critical workflows by resolving the “binary wall” created by proprietary data formats. The solution is structured around four pillars:

  • Adopting Open Artifacts (e.g., SysML v2 [4])
  • Establishing tool access via the Model Context Protocol (MCP [5])
  • Enabling Agent-to-Agent Collaboration (A2A [6])
  • Enforcing verifiable results through Spec Driven Development (SDD [7]).

This framework enables agents to read, reason about, and modify the system’s core artifacts, promising robust, auditable automation across the entire V-Model, from initial requirements through SIL/HIL testing.

Introduction

The automotive software development lifecycle is based on the V-Model. It provides the necessary structure for safety-critical systems as mandated by ISO 26262 [1], ensuring that every requirement has a corresponding test. Although agile processes have been introduced into automotive software development in recent years, the V-Model remains relevant: its verification and validation activities are now executed in agile, incremental cycles.

 

When introducing generative Artificial Intelligence (GenAI) to automotive software development, we need to consider the complex processes and tooling that implement the V-Model. Several challenges stem from the fractured nature of the development environment:

  • Complex, Heterogeneous Tool Landscape: The development process requires dozens of specialized, often proprietary, tools across the V-cycle (from requirements and modeling to calibration and testing). This creates high overhead in managing the toolchain and dependencies.
  • Proprietary Data Formats: Many essential tools and systems use vendor-specific formats for models, measurement data, diagnostics, and calibration. This leads to data silos, complicates data exchange between different organizational units or suppliers, and hinders tool substitution (vendor lock-in).
  • Hardware Environments for Integration Testing (HiL): Testing relies heavily on expensive, complex, and time-consuming physical hardware setups (like Hardware-in-the-Loop simulators). This creates a bottleneck that limits the frequency and scale of testing, delaying defect discovery.
  • Traceability: Maintaining the mandatory, unbroken link between requirements, design, code, and test cases is inherently difficult when data is passed between disparate and often poorly integrated tools.

Regulatory requirements place heavy demands on documentation and verification:

  • Compliance Overload: Developers must satisfy demanding, parallel standards for Functional Safety (ISO 26262) and Cybersecurity (ISO/SAE 21434), among others. This results in massive documentation requirements and stringent process adherence (ASPICE), consuming significant development resources.
  • High Cost of Change Management: Due to the severe safety and security implications, any modification to certified software requires extensive re-verification and re-documentation, increasing the cost and time of maintenance and over-the-air (OTA) updates.

GenAI can help to better integrate the existing tools and support the development, test and documentation processes. To introduce GenAI in automotive software processes and toolchains, we propose four technical principles:

  1. Open Artifacts & Storage: Using open, text-based formats (like SysML v2) stored in Git-based repositories as the primary source of truth where possible.
  2. Enabling Tool Access for Agents: Enabling agents to access and modify data in proprietary tools via the Model Context Protocol (MCP) or native APIs.
  3. Agent-to-Agent (A2A) Collaboration: Orchestrating complex workflows by allowing specialized agents to communicate and delegate tasks.
  4. Spec Driven Development (SDD): Implementing formal contracts (specifications) derived from the artifacts to guide and verify AI-generated work.

 

These principles allow the industry to move beyond the limitations of closed formats and deploy robust, verifiable agentic automation across the entire V-Model.

Figure 1: The V-Model of Software Development supported by agentic workflows

 

 

The Roadblock: The “Binary” Wall

Current automotive toolchains are plagued by proprietary formats. This is the primary hurdle for introducing large-scale agentic workflows.

 

Why proprietary data blocks Agentic AI:

  1. Opaque to LLMs: Large Language Models (LLMs) thrive on text. They cannot effectively reason about binary files or closed databases.
  2. Context Fracturing: An agent cannot “trace” requirements held in proprietary models, which fragments its context and leads to hallucinations.
  3. Tool Lock-in: Limited or proprietary APIs prevent the seamless integration required for autonomous agentic loops.

The Solution: The “Open Agentic” Workflow

To enable true agentic automation in the V-Model, we adopt a strategy built on the four principles defined above.

 

1. The Foundation: Open Artifacts & Git Storage

The prerequisite for GenAI integration is ensuring the “source of truth” is readable and accessible to agents, regardless of its storage location. “Everything as Code” (EaC) would be ideal, but it cannot be fully achieved in practical engineering toolchains with many different tools and proprietary formats. The proposed architecture acknowledges that automotive engineering workflows require integration with proprietary tools. Artifact accessibility is therefore achieved via a layered approach using three strategic methods:

  • Direct Git Access for Text Artifacts (EaC Principle): Storing artifacts that inherently support text-based formats (like SysML v2 notation, source code (C/C++), Markdown documentation, JSON/YAML configuration) directly in Git-based repositories for direct agent interaction and version control.
  • API/Protocol Access to Tools (via MCP): Providing agents with programmatic access to complex or proprietary tools (e.g., PLM systems, simulation tools, requirements databases) via their native APIs or the Model Context Protocol (MCP).
  • Specialized Tool Agents (A2A Integration): Implementing a specialized agent to interface with a complex tool, translating instructions from other agents via Agent-to-Agent (A2A) protocols and brokering the data.

This strategic mix ensures artifacts from all tools involved in the V-Model are made accessible for GenAI.
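
To make the layered approach concrete, the following minimal sketch (in Python) shows how an agent-facing access layer could decide between direct Git access and an MCP round trip. The file-suffix heuristic, the call_mcp_tool placeholder, and the export_artifact tool name are illustrative assumptions, not part of any specific toolchain.

```python
# A minimal sketch of the layered artifact access, assuming artifacts are
# referenced by repository-relative paths. call_mcp_tool() and the
# "export_artifact" tool name are placeholders for a real MCP client call.
from pathlib import Path

TEXT_SUFFIXES = {".sysml", ".md", ".json", ".yaml", ".yml", ".c", ".cpp", ".h"}

def call_mcp_tool(name: str, arguments: dict) -> str:
    raise NotImplementedError("stand-in for an MCP tools/call round trip")

def read_artifact(ref: str, repo_root: Path = Path(".")) -> str:
    """Return artifact content, preferring direct Git (working tree) access."""
    path = repo_root / ref
    if path.suffix in TEXT_SUFFIXES and path.exists():
        # EaC principle: text artifacts are read straight from the repository.
        return path.read_text(encoding="utf-8")
    # Proprietary or tool-managed artifact: ask the owning tool via MCP.
    return call_mcp_tool("export_artifact", {"artifact_ref": ref})
```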

Assessing Data Formats for GenAI Suitability

The choice of format dictates an agent’s ability to read, reason about, and reliably modify engineering artifacts. Suitability varies widely between formats.

For formats not well suited for direct processing by LLMs, MCP solutions can be developed to make the data accessible to the agentic workflow.

SysML v2 – The Semantic Core for Artifacts

The core value of SysML v2 lies in its human-readable textual notation and its formal semantic foundation. SysML v2’s concise, programmatic structure is ideal for GenAI:

  • Structured Generation: LLMs can generate models that conform to this grammar using simple, direct text, resulting in high-fidelity output.
  • Direct Reasoning: The structured text enables agents to perform semantic diffing and reasoning effectively, allowing for rapid, accurate impact analysis and modification directly in the Git repository.

SysML v2 defines the explicit meaning (semantics) and relationships of system artifacts. Tools achieve tool-to-tool data exchange by interacting directly with this shared semantic model, enabling traceability agents to automatically link the requirement model to the architecture model, and the architecture model to the implementation (code/tests).
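
To illustrate why the textual notation matters for agents, the sketch below diffs two revisions of a requirement with plain text tooling, producing output an agent can reason about for impact analysis. The requirement snippets are simplified, illustrative examples and are not guaranteed to be valid SysML v2 notation.

```python
# A minimal sketch of text-based diffing of a SysML v2 artifact for impact
# analysis; the snippets are illustrative, not validated model excerpts.
import difflib

old_model = """requirement def MaxStoppingDistance {
    doc /* The vehicle shall stop within 42 m from 100 km/h. */
}"""

new_model = """requirement def MaxStoppingDistance {
    doc /* The vehicle shall stop within 40 m from 100 km/h. */
}"""

diff = "\n".join(difflib.unified_diff(
    old_model.splitlines(), new_model.splitlines(),
    fromfile="requirements.sysml@HEAD~1", tofile="requirements.sysml@HEAD",
    lineterm="",
))
print(diff)  # this textual diff can be handed to an agent for impact analysis
```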

SysML v2 is a relatively new standard, formally published in September 2025. At the time of this writing (November 2025), LLM support for SysML v2 is very limited due to missing training data, and LLMs often confuse the XML-based SysML v1 with the text-based SysML v2. Until frontier LLMs support SysML v2 natively, post-training and fine-tuning of LLMs will be necessary before SysML v2 can be used in projects.

2. Tool Access: Model Context Protocol (MCP)

The Model Context Protocol (MCP) enables agents (LLMs) to securely and reliably access proprietary and complex tools by establishing a standardized, two-way client-server architecture, acting as a universal adapter for external systems. MCP has been compared to a Universal Serial Bus (USB) port for LLM applications: it provides a standardized way to connect the fundamentally stateless model to external tools and data. Through MCP, an AI agent can safely request to read files, run tools (like a parser or compiler), or execute tests.

MCP standardizes the communication and data exchange necessary for an LLM agent to interact with external, real-world capabilities—whether they are internal enterprise databases, private APIs, or complex business logic.

This process is broken down into three key stages: Tool Discovery, Tool Invocation, and Result Integration.

  1. Tool Discovery: Exposing Proprietary Functions
    The MCP Server is the intermediary that wraps the proprietary tool or API. It’s the key to making a private function accessible to any LLM using the MCP standard.

    – Standardized Definition: The server exposes a list of available tools (via the tools/list request) where each tool is defined with a structured JSON Schema. This schema includes a unique name, a human-readable description, and the required input parameters. Based on this, the MCP Client can select the right tools for a task.

    – Decoupling: By requiring the server to provide this structured metadata, the complex, proprietary implementation details of the tool (e.g., specific API keys, database connection strings, or internal logic) are shielded from the LLM itself. The LLM only sees the function signature, not the underlying code.

    – Dynamic Capabilities: The list of tools can be dynamic. A server can add, remove, or modify tool definitions at runtime, allowing the LLM agent’s capabilities to evolve without needing a system-wide update.

  2. Tool Invocation: The LLM’s “Function Call”
    When a user’s request (in the Context Payload) requires an external action, the LLM utilizes its reasoning engine to determine the appropriate tool, its arguments, and the need for its execution.

    – Intent to Call:
    Based on the user’s prompt, the LLM generates a structured request that specifies the chosen tool’s name and the corresponding, schema-validated arguments.

    – Client Proxy:
     The MCP Client (running within the LLM’s host application) intercepts this structured request. It then acts as a proxy, transmitting a tools/call message—containing the tool name and parameters—to the designated MCP Server.

    – Secure Execution:
    The MCP Server executes the proprietary function (e.g., runs a complex SQL query on a private database, calls an internal REST endpoint). Because the server controls the execution environment, it can enforce security, access controls, and rate limits on the proprietary system. The server holds the necessary credentials, ensuring they are never exposed to the LLM or the client application.
  3. Result Integration: Context-Aware Feedback
    Once the proprietary tool has executed, the MCP governs how the result is packaged and returned to the LLM.

    – Structured Output: The tool’s output—which could be simple text, a large JSON object, or a resource link—is formatted according to the MCP standard and sent back to the client.

    – Context Injection:
     This result is then injected back into the LLM’s Context Window as part of the conversation history. This allows the LLM to use the real-time, proprietary data to formulate its final, accurate, and relevant answer to the user.

    – Human-in-the-Loop:
     For sensitive or transactional proprietary tools (like “Approve OTA update”), the MCP supports elicitation features, allowing the server to pause execution and prompt the user for explicit confirmation before the action is executed.

By standardizing the tool definition and the call/response mechanism, the MCP successfully isolates the LLM’s reasoning engine from the complexity and security requirements of proprietary systems, enabling agents to be both powerful and safe.
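
As a sketch of what such a server could look like in practice, the example below assumes the official MCP Python SDK (the mcp package) and its FastMCP helper; the server name, the tool, and the underlying ctest invocation are illustrative, not a specific toolchain.

```python
# A minimal sketch of an MCP server wrapping a proprietary capability,
# assuming the official MCP Python SDK ("mcp" package) and its FastMCP helper.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("verification-tools")  # hypothetical server name

@mcp.tool()
def run_unit_tests(test_target: str) -> str:
    """Run the unit-test suite for a given target and return the test log."""
    # The server owns the execution environment and any credentials;
    # the LLM only ever sees this function's name, description, and schema.
    result = subprocess.run(
        ["ctest", "--output-on-failure", "-R", test_target],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr

if __name__ == "__main__":
    mcp.run()  # serves tools/list and tools/call over stdio by default
```

A client would discover run_unit_tests via tools/list and invoke it with a schema-validated tools/call message, while credentials, access controls, and rate limits stay on the server side.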

 

3. Collaboration: Agent-to-Agent (A2A) Communication

The Agent-to-Agent (A2A) protocol [6] is an open standard that enables autonomous AI agents to communicate, collaborate, and coordinate seamlessly across different platforms and vendors. It represents a shift from isolated, monolithic agents toward multi-agent ecosystems capable of solving complex workflows together. Unlike the Model Context Protocol (MCP), which focuses on the agent-tool interface, A2A facilitates horizontal integration within a multi-agent system, allowing heterogeneous agents to operate as a cohesive, coordinated workflow.

The A2A protocol defines several aspects of the agent-to-agent interaction:

1. Discovery

  • Agents announce their presence and capabilities using standardized metadata.
  • Other agents can query this information to find suitable collaborators.
  • This is like how devices discover each other on a network.

2. Authentication & Trust

  • Each agent has an agent card (like a digital identity).
  • Secure authentication ensures that only trusted agents can exchange tasks.
  • This prevents malicious or unauthorized agents from joining the ecosystem.

3. Communication Layer

  • A2A defines a transport layer for reliable message passing.
  • Messages are structured in a common format so agents built in different languages or frameworks can understand each other.

4. Task Objects

  • Work is exchanged through task objects: standardized packets that describe what needs to be done.
  • These objects include inputs, expected outputs, and constraints.
  • Agents can accept, reject, or negotiate tasks.

5. Delegation & Coordination

  • Agents can delegate tasks to peers better suited for them.
  • They can split complex workflows into smaller subtasks and assign them across multiple agents.
  • Coordination ensures results are aggregated back into a coherent output.

6. Execution & RPC

  • Agents use remote procedure calls (RPC) to invoke functions on other agents.
  • This allows one agent to directly leverage another’s specialized capabilities.
  • Example: A research agent calls a data-cleaning agent before passing results to a reporting agent.

In the context of an agentic workflow for V-Model software engineering, A2A acts as the universal “language” and orchestration layer that connects the different autonomous agents responsible for each phase of development and testing. This collaboration is crucial because the V-Model requires a direct, traceable link between each development stage (left side of the V) and its corresponding validation stage (right side of the V).
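
As a rough illustration of these concepts, the Python structures below sketch an agent card and a task object exchanged between a QA agent and an implementation agent. The field names are simplified assumptions rather than the normative A2A schema.

```python
# Illustrative only: field names are simplified assumptions, not the normative
# A2A schema. They sketch an agent card (identity/capabilities) and a task
# object exchanged between agents in the V-Model workflow.
qa_agent_card = {
    "name": "qa-agent",
    "description": "Generates and executes unit and integration tests",
    "url": "https://agents.example.internal/qa",  # hypothetical endpoint
    "skills": [
        {"id": "run-unit-tests", "description": "Execute Google Test suites"},
        {"id": "run-integration-tests", "description": "Verify module interfaces"},
    ],
}

task = {
    "id": "task-0042",
    "skill": "run-unit-tests",
    "input": {"component": "park_brake", "design_ref": "sysml://PKB-DD-17"},  # hypothetical IDs
    "constraints": {"coverage_min": 0.9},
    "expected_output": "test report with pass/fail per requirement",
}
```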

 

4. Verification: Spec Driven Development (SDD)

Spec Driven Development (SDD) is a modern methodology that formalizes the use of a detailed, comprehensive specification as the primary source of truth for software creation. This “spec” is written first, driving every subsequent phase of the project, including implementation.

In the age of Agentic AI, SDD becomes essential for managing the inherent non-deterministic behavior of Generative AI (GenAI).

SDD as an AI Contract

To manage GenAI’s non-determinism, SDD is used to create a formal contract that the AI agent must satisfy.

  • The Specification: It acts as a formalized, machine-readable contract that clearly articulates the what (functional requirements) and the why (intended system behavior).
  • Enforcement: SDD ensures that the agent’s generated work (e.g., implementation code or test scripts) strictly adheres to the functional and non-functional requirements defined in the preceding specification.
  • Verification: Automated verification loops enforce this compliance, minimizing ambiguity, reducing costly rework, and ensuring the final implementation aligns perfectly with the intended system behavior, which is critical in regulated and safety-critical domains.

The Four-Phase SDD Workflow

The standard Spec Driven Development workflow is structured into four sequential phases, each serving as a critical checkpoint for human review and course correction. This process translates high-level intent into executable code through structured artifacts.

Phase 1: Specify

This is the initial and most critical phase. The human product owner or engineer defines the product or feature in natural language, focusing primarily on user needs and measurable success criteria.

  • Focus: The What and Why.
  • Artifact: The Specification Document (spec.md). This includes functional and non-functional requirements, acceptance criteria, user journeys, and explicit constraints. The resulting document is detailed enough that another developer (or an AI agent) could implement the feature without further clarification.

Phase 2: Plan

The specification is translated from a product-centric view into a technical architecture. The AI agent, or a specialized planning agent, reviews the specification and existing codebase (if applicable) to generate a detailed strategy for implementation.

  • Focus: The How.
  • Artifact: The Technical Plan (plan.md). This document defines the chosen architecture (e.g., AUTOSAR Adaptive, DDS), outlines API endpoints, specifies database schemas, identifies technical dependencies, and details integration points with existing systems.

Phase 3: Tasks

Once the technical plan is approved, the AI agent breaks the overall project down into small, actionable, and independently testable work units.

  • Focus: Actionable Steps.
  • Artifact: The Task List (tasks.md). This is typically an enumerated list with checkboxes, ensuring each item is minimal (e.g., “Implement the park brake function,” not “Build the whole system”). This segmentation is crucial for efficient execution and focused code review.

Phase 4: Implement & Test

This is the code generation phase. An AI coding agent executes the tasks sequentially, using the Specification, the Plan, and the current Task as its context. The agent generates the code and often includes the necessary unit tests to validate correctness against the acceptance criteria defined in the Specification. In the case of automotive software, integration testing and validation are also part of this phase.

  • Focus: Execution and Validation.
  • Output: Source Code, Unit Tests, and updated status in the tasks.md file (see the sketch below). The human role shifts entirely to validation: reviewing the generated code against the original specification.
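
The status update can be as simple as flipping checkboxes in tasks.md. A minimal sketch, assuming the common "- [ ]" / "- [x]" checkbox convention for task items:

```python
# A minimal sketch of the status update in tasks.md, assuming checkbox-style
# task items ("- [ ]" open, "- [x]" done).
from pathlib import Path

def mark_task_done(tasks_file: Path, task_text: str) -> None:
    """Flip the checkbox of a single task item from open to done."""
    content = tasks_file.read_text(encoding="utf-8")
    updated = content.replace(f"- [ ] {task_text}", f"- [x] {task_text}", 1)
    tasks_file.write_text(updated, encoding="utf-8")

mark_task_done(Path("tasks.md"), "Implement the park brake function")
```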

Example agentic workflow

In this section we showcase a future agentic workflow that illustrates how the four principles are integrated. The agentic workflow automates generation and verification but operates under the principle of Human-in-the-Loop. Human engineers maintain ultimate responsibility and approve critical transitions, such as design finalization and merging code into the main repository. The workflow relies on SysML v2 artifacts defining the system, which are continuously verified by Spec Driven Development (SDD) loops executed by the collaborative agent ecosystem.

 

The continuous SDD loop, orchestrated by the agents, follows a four-step cycle:

  1. Specify: The Systems Agent defines the system behavior and requirements using SysML v2 notation. This SysML v2 artifact is the formal, machine-readable specification contract; for SDD, it is translated into Markdown (spec.md).
  2. Plan: The Architecture Agent reads the SysML v2 specification. It generates a work plan (plan.md) outlining the design blocks, code modules, and verification steps required to satisfy the specification.
  3. Tasks: The Architecture Agent breaks the plan into granular, traceable tasks (tasks.md). These tasks are delegated to the Implementation Agent and QA Agent via A2A.
  4. Implement: The Implementation Agent executes the task, generating the code and tests, and then running the build/verification step. If verification fails, the cycle reverts to Specify for requirement/design refinement (see the orchestration sketch below).
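
A simplified orchestration sketch of this cycle follows. The agent objects and their methods are placeholders for the A2A interactions described above, not a real framework API.

```python
# A simplified orchestration sketch of the four-step SDD cycle. Agent objects
# and their methods are placeholders for A2A interactions.
def run_sdd_cycle(systems_agent, architecture_agent, implementation_agent,
                  max_iterations: int = 3):
    for _ in range(max_iterations):
        spec = systems_agent.specify()                   # 1. SysML v2 -> spec.md
        plan = architecture_agent.plan(spec)             # 2. plan.md
        tasks = architecture_agent.split_tasks(plan)     # 3. tasks.md
        results = [implementation_agent.implement(t) for t in tasks]  # 4. code + tests
        if all(r.verified for r in results):
            return results
        # Verification failed: revert to Specify for requirement/design refinement.
        systems_agent.refine([r for r in results if not r.verified])
    raise RuntimeError("SDD cycle did not converge within the iteration budget")
```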

 

Synchronization of SDD Markdown with SysML v2 Artifacts

SDD Markdown files are not a source of truth for the system design; they serve as a dynamic output and instruction layer derived from the SysML v2 model.

  1. SysML v2 is the Source of truth: The authoritative source of the design is the SysML v2 text artifact stored in Git.
  2. SDD Generation: When the Architecture Agent performs the Plan step (step 2 of the SDD cycle above), it uses the SysML v2 API/kernel to parse the model. It then generates the SDD Markdown files which contain human-readable summaries and the exact constraints (pre/post-conditions) to be enforced.
  3. Git Tagging & Traceability: The SDD Markdown file explicitly links back to the specific SysML v2 Git commit hash and element IDs that were used to generate it, ensuring full traceability (see the sketch below).
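
A minimal sketch of such a traceability header, assuming the SysML v2 text artifacts live in the same Git repository as the generated Markdown; the element IDs are hypothetical.

```python
# A minimal sketch of a traceability header linking an SDD Markdown file to
# its SysML v2 source commit; element IDs below are hypothetical.
import subprocess

def traceability_header(element_ids: list[str]) -> str:
    """Build a header linking an SDD Markdown file to its SysML v2 source commit."""
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return "\n".join([
        "<!-- generated from the SysML v2 model -->",
        f"<!-- source commit: {commit} -->",
        f"<!-- source elements: {', '.join(element_ids)} -->",
    ]) + "\n"

print(traceability_header(["REQ-PKB-001", "PKB-DD-17"]))
```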

 

Workflow Steps

1. Requirements Analysis

  • Systems Engineering: The Systems Agent assists the Requirements Engineer by generating, extending, and improving high-level system needs, functional requirements, and safety goals using SysML v2 notation.
  • Agent Action: The Systems Agent validates consistency and completeness against internal standards and delegates the initial architecture synthesis to the Architecture Agent via A2A.

2. High-Level Design (HLD)

  • Architecture & Design: The Architecture Agent assists the human IT Architect by synthesizing the system architecture, defining the main components, their interfaces, and high-level behavioral models (e.g., block definition diagrams, activity diagrams) in SysML v2.
  • A2A Handoff: The Architecture Agent submits the HLD draft to the Systems Agent for final review and approval, ensuring the design satisfies the initial requirements (Step 1).

3. Detailed Design (DD)

  • IT Design: The Architecture Agent assists the IT Architect by executing the Specify and Plan phases of SDD, refining the HLD into a Detailed Design (DD), including specific algorithms, data structures, and the final formal interface specifications (SDD contracts).
  • Human Approval Point (Design Sign-off): The Architecture Agent uses MCP to create a Pull Request (PR) containing the final DD (SysML v2 files). A human Design Reviewer must formally approve this PR in the Git system, providing essential engineering judgment and sign-off.
  • A2A Handoff: Once approved, the Architecture Agent communicates the final DD and the resulting SDD task list to the Implementation Agent via A2A.

4. Implementation

  • Coding: The Implementation Agent executes the Tasks and Implement phases of SDD, generating production C++ (C, Rust, …) code based on the DD specification.
  • Self-Healing Loop: The agent triggers a containerized build via MCP. If the compiler or static analysis reports a warning (e.g., a MISRA violation), the agent reads the finding, refactors the code to be compliant, and recompiles, as sketched below.
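
A minimal sketch of this self-healing loop: the containerized build command and the request_fix() callback stand in for the MCP build tool and the LLM round trip, and the image and paths are illustrative assumptions.

```python
# A minimal sketch of the self-healing build loop. "build-image" and the cmake
# invocation are illustrative; request_fix() stands in for the LLM round trip.
import subprocess

def build(workdir: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        ["docker", "run", "--rm", "-v", f"{workdir}:/src", "build-image",
         "cmake", "--build", "/src/build"],
        capture_output=True, text=True,
    )

def self_healing_build(workdir: str, request_fix, max_attempts: int = 5) -> bool:
    for _ in range(max_attempts):
        result = build(workdir)
        log = result.stdout + result.stderr
        if result.returncode == 0 and "warning" not in log.lower():
            return True   # clean build: no errors, no compiler/MISRA warnings
        request_fix(log)  # the agent refactors the code based on the findings
    return False
```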

5. Unit Testing (Low-Level Verification Loop)

  • SDD Role: The QA Agent uses SDD to generate and execute Unit Tests (e.g., Google Test) that verify individual component functions against the SysML v2 DD.
  • Verification Loop: If a Unit Test fails, the QA Agent reports the failure directly to the Implementation Agent via A2A. The Implementation Agent immediately modifies the generated code and re-runs the Unit Tests until they are green.

6. Integration Testing (Mid-Level Verification Loop)

  • SDD Role: The QA Agent executes Integration Tests to verify communication and data flow between connected modules, based on the SysML v2 HLD.
  • Verification Loop: If an Integration Test fails, the QA Agent determines the cause:
    • If it is a simple code bug, the loop goes back to the Implementation Agent (Step 4).
    • If it is an interface or architectural mismatch, the failure is reported to the Architecture Agent via A2A (Step 2/3) for design correction.

7. System Testing (Highest-Level Verification Loop)

  • SDD Role: The Validation Agent executes the system-level validation by triggering SiL / HiL tests. The SDD framework constantly verifies the runtime results against the high-level SDD contracts derived from the requirements (Step 1).
  • Verification Loop (Back to Left Side): If a System Validation test fails (e.g., performance violation in the HIL rig), the Validation Agent uses A2A to notify the Systems Agent (or a specialized Traceability Agent). This closes the V-loop by sending the issue back to the Requirements or HLD stage (Left Side) for specification analysis and iteration. The Systems Agent can define tasks for the different Agents to address the performance issue.
  • Human Approval Point (Feature Merge Sign-off): Only after all automated tests (Unit, Integration, System) are green does the Validation Agent use MCP to submit a final Pull Request (PR). A human Software Integrator reviews the traceability report and the green test results before granting the final approval to merge the code into the production (main) branch.

Conclusion

The proposed open agentic workflow provides a new, powerful and compliant path to GenAI assisted automotive software development. By addressing the challenge of proprietary data through the strategic adoption of open formats like SysML v2 and the use of bridging protocols like MCP, the industry can finally unlock the velocity and reliability promised by Generative AI. The four pillars—Open Artifacts, MCP, A2A Collaboration, and Spec Driven Development—together form a verifiable feedback loop that fully integrates AI agents into the mandatory V-Model process.

This transition to agentic workflows is a challenging and long journey. Further work and investigations are still needed in several areas, including the level of automation for code generation. A conservative approach would start with documentation and testing as the first candidates for the automated workflow. For safety-critical systems, it remains imperative that the human be kept firmly in the loop. While agents can handle generation, self-healing, and low-level verification, final architectural decisions, safety case sign-off, production readiness and the responsibility for system integrity must reside with the human engineer. The future of automotive engineering is agent-assisted, requiring us to continually refine the boundaries, controls, and trust mechanisms governing the relationship between human expertise and automated intelligence.

Contact


Any questions? Contact me


Gregor Resing
Executive IT Architect, IBM Germany
