

From Prototype to Production: Agentic AI Tools for Natural Language-Driven HPC Management
Wednesday, June 24, 2026 3:45 PM to 5:15 PM · 1 hr. 30 min. (Europe/Berlin)
Foyer D-G - 2nd Floor
Women in HPC Poster
Cybersecurity in HPC and AI · Large Language Models and Generative AI in HPC · Performance and Resource Modeling · Resource Management and Scheduling · System and Performance Monitoring
Information
Poster is on display and will be presented at the poster pitch session.
High-performance computing (HPC) and AI infrastructure are growing rapidly, placing increasing operational pressure on HPC service providers. In the UK, Isambard AI, one of the world’s fastest supercomputers, supports hundreds of projects and thousands of users, with frequent AI research calls and dynamic compute allocations. Managing access, allocations, and usage reporting at this scale presents significant challenges. While platforms such as Waldur manage users, projects, and compute resources, many reporting and administrative workflows remain manual, particularly when responding to external stakeholders such as UKRI and DSIT, who require periodic usage and consumption reports. These processes are time-consuming and error-prone, and they limit real-time visibility for both HPC staff and users.
This work explores how modern AI protocols can be used to automate and simplify HPC operations. We integrate large language models with live infrastructure data using the Model Context Protocol (MCP), a standardised client-server framework that enables AI assistants to securely query APIs, retrieve structured data, and execute controlled actions. By exposing Waldur’s APIs as MCP tools, users can issue natural-language queries to retrieve real-time project statistics, usage summaries, and administrative information, while complex requests are automatically decomposed into multiple tool calls with structured orchestration.
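The tool-call decomposition described above can be illustrated with a minimal sketch. It does not use an MCP SDK or the Waldur API: the registry, the tool names (`list_projects`, `get_project_usage`), and the in-memory `FAKE_USAGE` data are all hypothetical stand-ins, chosen only to show how one request ("total GPU hours across all projects") might become several tool calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory data standing in for responses from a management API.
FAKE_USAGE = {
    "proj-a": {"gpu_hours": 120.0, "users": 4},
    "proj-b": {"gpu_hours": 80.5, "users": 2},
}

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(name: str):
    """Register a function under a tool name so it can be called by name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("list_projects")
def list_projects() -> List[str]:
    return sorted(FAKE_USAGE)


@tool("get_project_usage")
def get_project_usage(project: str) -> dict:
    return FAKE_USAGE[project]


def run_plan(plan: List[Tuple[str, dict]]) -> list:
    """Execute a sequence of (tool_name, kwargs) calls in order."""
    return [TOOLS[name](**kwargs) for name, kwargs in plan]


def total_gpu_hours() -> float:
    """Answer one request with two rounds of tool calls."""
    # Round 1: discover which projects exist.
    projects = run_plan([("list_projects", {})])[0]
    # Round 2: one usage call per project, then aggregate.
    usages = run_plan([("get_project_usage", {"project": p}) for p in projects])
    return sum(u["gpu_hours"] for u in usages)
```

In a real deployment the plan would be produced by the language model and the callables would wrap authenticated API requests; here both are fixed so the control flow is easy to follow.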
A key contribution of this work is the secure transition from prototype to production. We implement OpenID Connect (OIDC) device authentication using Keycloak to support MCP clients operating in chat-based environments where traditional browser redirect flows are impractical. Users authenticate via a browser-based device flow, after which tokens are securely exchanged and introspected before Waldur API tokens are issued. Read-only and read-write MCP servers are strictly separated, and all write operations are permission-checked to prevent unauthorised or “silent” actions. No credentials are hard-coded, and each user authenticates independently.
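As a rough illustration of the device flow and the write-permission check described above, the sketch below follows the polling rules of the OAuth 2.0 Device Authorization Grant (RFC 8628): the grant-type URN and the `authorization_pending` and `slow_down` error codes come from that specification. Everything else is an assumption made for the example, including the function names, the injected `post_token` transport (so no real Keycloak endpoint is contacted), and the permission string format.

```python
import time
from typing import Callable, Dict, Set

# Grant type defined by RFC 8628 for polling the token endpoint.
DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def poll_for_token(
    post_token: Callable[[Dict[str, str]], Dict[str, str]],
    device_code: str,
    client_id: str,
    interval: float = 5.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until the user finishes logging in via the browser.

    `post_token` is a hypothetical transport: it takes the form fields and
    returns the decoded JSON response from the token endpoint.
    """
    for _ in range(max_attempts):
        resp = post_token({
            "grant_type": DEVICE_GRANT,
            "device_code": device_code,
            "client_id": client_id,
        })
        if "access_token" in resp:
            return resp
        error = resp.get("error")
        if error == "slow_down":
            # RFC 8628: increase the polling interval by 5 seconds.
            interval += 5.0
        elif error != "authorization_pending":
            # e.g. access_denied or expired_token: stop polling.
            raise PermissionError(f"device flow failed: {error}")
        sleep(interval)
    raise TimeoutError("user did not complete login in time")


def check_write_allowed(server_mode: str, permissions: Set[str], required: str) -> None:
    """Refuse writes on a read-only server or without the needed permission."""
    if server_mode != "read-write":
        raise PermissionError("write attempted on a read-only server")
    if required not in permissions:
        raise PermissionError(f"missing permission: {required}")
```

Injecting the transport and the sleep function keeps the polling logic testable without network access; a production client would also validate and introspect the returned token before using it.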
The resulting system enables automated reporting, real-time dashboards, and natural-language interaction with HPC management platforms, significantly reducing manual workload and improving operational visibility. Beyond raw data retrieval, the system generates narrative summaries, trends, dashboards, and recommendations to support decision-making, usage forecasting, and resource optimisation. This work demonstrates how AI combined with modern protocols such as MCP and OIDC can form a secure, scalable interface for managing complex HPC infrastructures.
Format
on-demand · on-site
