AI Ops Architect
Company
IBM
Location
CA Toronto
Type
Full Time
Job Description
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.
Your Role and Responsibilities
We are looking for an AIOps Architect to lead the development and deployment of AI-enhanced solutions for IT operations. In this role, you will architect cloud-native, compliant platforms that integrate AIOps, cognitive computing, and machine learning models to improve infrastructure performance, reduce downtime, and enhance system observability. You will design scalable, secure, and resilient systems, develop automated operations, and implement robust security practices to ensure compliance and operational excellence.
As an AIOps Architect, you will guide clients in their digital transformation, utilizing state-of-the-art technologies to build intelligent operations platforms that drive efficiency, enhance system reliability, and support business growth.
Core Responsibilities
- Architect and deploy hybrid, multi-cloud, and cloud-native solutions to support payments transformation, aligning infrastructure, systems, networking, and data center strategies
- Architect and implement comprehensive Solution Architectures, High-Level Designs (HLD), and Low-Level Designs (LLD) that ensure seamless integration of cloud-native technologies, AI-enhanced monitoring, and automation tools, adhering to best practices in security, compliance, and governance
- Develop and deploy strategies to enhance scalability, resilience, and operational efficiency across hybrid and multi-cloud environments, integrating automation, observability, and robust security protocols to support seamless, high-performing, and compliant systems
- Design and implement solutions that optimize cloud operations, infrastructure management, application performance, DevOps pipelines, security frameworks, network architecture, MLOps, and LLMOps
- Apply deep expertise in monitoring tools (AppDynamics, Dynatrace, Splunk, Instana, QRadar, AWS CloudWatch, Azure Monitor, Google Operations Suite), with a focus on LLM observability and security, for real-time analytics and anomaly detection
- Develop advanced monitoring and observability frameworks that leverage LLM observability and security to enable robust tracking of application performance, anomaly detection, and real-time analytics for Large Language Models and other AI/ML workloads
- Integrate supervised learning models for predictive analytics, employing techniques such as data cleaning, event correlation, and root cause analysis to generate actionable insights that drive proactive incident resolution and optimize system performance
- Design and implement IT Service Management (ITSM) and ITIL frameworks, encompassing incident management, problem management, change management, and service level management, to standardize operational workflows and enhance service reliability
- Utilize AI/ML models, including machine learning-based anomaly detection and reinforcement learning, to automate incident response, performance tuning, and infrastructure scaling, reducing Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR)
- Engineer robust security architectures that include Cloud Native Application Protection Platforms (CNAPP), Zero Trust Network Access (ZTNA), and fully automated DevSecOps pipelines, ensuring compliance with stringent regulatory requirements and maintaining security posture across multi-cloud ecosystems
- Design and deploy High Availability (HA) and Disaster Recovery (DR) solutions using distributed architectures, multi-zone redundancy, data replication, and automated failover, ensuring minimal service disruption and business continuity in multi-region deployments
- Implement chaos engineering practices, conducting FURPS (Functionality, Usability, Reliability, Performance, Supportability) testing to identify potential failure points, validate system resilience, and ensure seamless recovery under high-stress conditions
- Lead end-to-end project lifecycle management, including agile project methodologies, DevOps pipelines, resource allocation, risk management, and milestone tracking, to ensure the successful deployment of scalable, robust, and secure solutions aligned with client objectives
Required Technical and Professional Expertise
- 8+ years of experience in the design, delivery, and scaling of complex, large-scale IT projects, with a focus on cutting-edge technology solutions across hybrid, multi-cloud, and on-premises environments
- 3+ years of technical leadership as a solution architect, driving the design, integration, and management of hybrid cloud solutions, including seamless coordination across various cloud environments
- Demonstrated success in leading highly complex projects from initial solution design through to deployment, managing diverse teams, coordinating multiple vendors, and ensuring alignment with strategic business goals
- Strong background in architecting complex multi-cloud systems leveraging hyperscalers (AWS, Azure, IBM Cloud, Google Cloud), with experience in multi-region deployments, multi-cloud networking, and cross-cloud service integration
- Proven expertise in designing cloud-native solutions with microservices, containers (Docker, Podman), and orchestration platforms (Kubernetes, OpenShift), ensuring modular, scalable, and resilient deployments
- In-depth understanding of regulatory compliance, security frameworks, and best practices in designing secure, resilient architectures
- Familiarity with integrating AI/ML models to enhance monitoring, incident response, and predictive maintenance processes
- Expertise in emerging technologies such as AI-enhanced operations, automation frameworks, and cloud-native security to future-proof systems and improve operational efficiency
Preferred Technical and Professional Expertise
Same as above
Date Posted
11/26/2024