Looking for US-based candidates only.

Location: Remote (USA)

Key Responsibilities

·       Architect and deploy agentic multi-agent AI frameworks.
·       Develop scalable pipelines integrating LLMs → RAG → VectorDB → Agents.
·       Build and deploy MCP (Model Context Protocol) servers for agentic AI agents and integrations.
·       Build observability, latency optimization, and performance monitoring systems.
·       Implement self-refine / feedback-loop learning architectures.

1. Multi-Agent System Architecture & Deployment

·       Architect, design, and deploy agentic multi-agent frameworks where multiple AI agents collaborate autonomously.
·       Design and implement inter-agent communication protocols, coordination strategies, and workflow orchestration layers.
·       Integrate with frameworks such as LangGraph, CrewAI, AutoGen, or Swarm to develop distributed, event-driven agentic ecosystems.
·       Develop containerized deployments (Docker / Kubernetes) for multi-agent clusters running in hybrid or multi-cloud environments.
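
For illustration, the inter-agent coordination and communication work described above can be sketched as a toy in-process message bus (all class and function names here are hypothetical, not taken from any of the listed frameworks):

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipient: str
    content: str


class AgentBus:
    # Toy inter-agent message bus: agents register handlers and exchange
    # messages through a shared queue — a much-simplified stand-in for the
    # event-driven orchestration layers in LangGraph / AutoGen-style systems.
    def __init__(self) -> None:
        self.handlers = {}
        self.queue = deque()

    def register(self, name: str, handler) -> None:
        self.handlers[name] = handler

    def send(self, msg: Message) -> None:
        self.queue.append(msg)

    def run(self) -> None:
        # Deliver messages until the queue drains; a handler may reply
        # with a new Message, which is re-enqueued for routing.
        while self.queue:
            msg = self.queue.popleft()
            reply = self.handlers[msg.recipient](msg)
            if reply is not None:
                self.queue.append(reply)
```

In production the in-memory queue would be replaced by the framework's event-driven transport, with the coordination protocol and routing policy designed by this role.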

2. Intelligent Pipeline Development

·       Build end-to-end scalable pipelines integrating LLMs → RAG → VectorDB → Agents, ensuring optimal latency and retrieval quality.
·       Implement retrieval-augmented generation (RAG) architectures using FAISS, Chroma, Weaviate, Milvus, or Pinecone.
·       Develop embedding generation, storage, and query pipelines using OpenAI, Hugging Face, or local LLMs.
·       Orchestrate data movement, context caching, and memory persistence for agentic reasoning loops.
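
As a flavor of the pipeline work above, the LLM → RAG → VectorDB flow can be sketched end to end in a few functions. This is a toy: the bag-of-words "embedding" and in-memory ranking stand in for a real embedding model and vector database, and all names are illustrative:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model (OpenAI, Hugging Face, or a local LLM) instead.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Stand-in for a vector-DB similarity search (FAISS, Chroma, Weaviate, ...).
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    # Retrieved context is injected ahead of the question before the
    # assembled prompt is handed to the LLM / agent layer.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The role's latency and retrieval-quality concerns live exactly at these seams: embedding throughput, index recall at the `retrieve` step, and context-window budget at prompt assembly.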

3. Agentic Infrastructure & Orchestration

·       Build and maintain MCP (Model Context Protocol) servers for agentic AI agents and integrations.
·       Develop APIs, microservices, and serverless components for flexible integration with third-party systems.
·       Implement distributed task scheduling and event orchestration using Celery, Airflow, Temporal, or Prefect.
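
The retry-and-redelivery semantics expected from such schedulers can be sketched in-process (a simplified stand-in only; real deployments would use Celery, Temporal, or Prefect with a proper broker, and every name below is hypothetical):

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    fn: Callable[[], object]
    retries_left: int = 2


class MiniScheduler:
    # In-process stand-in for a distributed task queue: tasks are enqueued,
    # executed, and re-enqueued on failure until their retry budget is spent.
    def __init__(self) -> None:
        self._q: queue.Queue = queue.Queue()
        self.results: list = []
        self.failures: list = []

    def submit(self, fn: Callable[[], object], retries: int = 2) -> None:
        self._q.put(Task(fn, retries))

    def run(self) -> None:
        while not self._q.empty():
            task = self._q.get()
            try:
                self.results.append(task.fn())
            except Exception as exc:
                if task.retries_left > 0:
                    task.retries_left -= 1
                    self._q.put(task)  # re-enqueue, like broker redelivery
                else:
                    self.failures.append(str(exc))
```

In a real system the same policy questions — retry budgets, dead-lettering, idempotency of re-executed tasks — are what this role would configure in the orchestration layer.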

4. Observability, Performance, and Optimization

·       Build observability stacks for multi-agent systems with centralized logging, distributed tracing, and metrics visualization.
·       Optimize latency, throughput, and inference cost across LLM and RAG layers.
·       Implement performance benchmarking and automated regression testing for large-scale agent orchestration.
·       Monitor LLM response quality, drift, and fine-tuning performance through continuous feedback loops.
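
As a minimal flavor of the latency work, per-operation timing and percentile reporting can be sketched with the standard library (a toy stand-in for Prometheus histograms or OpenTelemetry spans; names are illustrative):

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyTracker:
    # Records wall-clock durations per named operation; a real stack would
    # export these as Prometheus histograms or OpenTelemetry spans instead.
    def __init__(self) -> None:
        self.samples = defaultdict(list)

    @contextmanager
    def track(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append(time.perf_counter() - start)

    def p95(self, name: str) -> float:
        # Nearest-rank 95th percentile over recorded samples.
        xs = sorted(self.samples[name])
        return xs[min(len(xs) - 1, int(0.95 * len(xs)))]
```

Wrapping each LLM call and retrieval step in `track(...)` is the kind of instrumentation seam from which tail-latency and cost-per-inference dashboards are built.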

5. Self-Refining & Feedback Loop Architectures

·       Implement self-refining / reinforcement learning feedback mechanisms for agents to iteratively improve their performance.
·       Integrate auto-evaluation agents to assess output correctness and reduce hallucination.
·       Design memory systems (episodic, semantic, long-term) for adaptive agent learning and contextual persistence.
·       Experiment with tool-use capabilities, chaining, and adaptive reasoning strategies to enhance autonomous capabilities.
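
The generate–critique–revise loop at the heart of these architectures can be sketched generically; here the `generate`, `critique`, and `revise` callables stand in for LLM or agent calls, and the function is illustrative rather than a reference implementation:

```python
from typing import Callable, Optional


def self_refine(
    generate: Callable[[str], str],
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    prompt: str,
    max_rounds: int = 3,
) -> str:
    # Generic self-refine loop: draft an answer, ask a critic for feedback,
    # and revise until the critic returns None (no issues found) or the
    # round budget is spent. In a real system each callable is an LLM or
    # agent invocation, and the critic doubles as a hallucination check.
    answer = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(answer)
        if feedback is None:
            break
        answer = revise(answer, feedback)
    return answer
```

The design questions this role owns are exactly the ones the sketch leaves open: how the critic is evaluated, how feedback is persisted into memory, and when the loop should terminate early.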

Technical Skills Required

·       Programming: Expert-level Python (async, multiprocessing, API design, performance tuning).
·       LLM Ecosystem: Familiarity with OpenAI, Anthropic, Hugging Face, Ollama, LangChain, LangGraph, CrewAI, or AutoGen.
·       Databases: VectorDBs (FAISS, Weaviate, Milvus, Pinecone), NoSQL (MongoDB, Redis), SQL (PostgreSQL, MySQL).
·       Cloud Platforms: AWS / Azure / GCP; experience with Kubernetes, Docker, Terraform, and serverless architecture.
·       Observability: Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog, or New Relic.
·       CI/CD & DevOps: GitHub Actions, Jenkins, ArgoCD, Cloud Build, and testing frameworks (PyTest, Locust, etc.).
·       Other Tools: FastAPI, gRPC, REST, Kafka, Redis Streams, or event-driven frameworks.

Preferred Experience

·       Experience designing agentic workflows or AI orchestration systems in production environments.
·       Background in applied AI infrastructure, ML Ops, or distributed system design.
·       Exposure to RAG-based conversational AI or autonomous task delegation frameworks.
·       Strong understanding of context management, caching, and inference optimization for large models.
·       Experience with multi-agent benchmarking or simulation environments.

Soft Skills

·       Ability to translate conceptual AI architectures into production-grade systems.
·       Strong problem-solving and debugging capabilities in distributed environments.
·       Collaboration mindset – working closely with AI researchers, data scientists, and backend teams.
·       Passion for innovation in agentic intelligence, orchestration systems, and AI autonomy.

Education & Experience

·       Bachelor’s or Master’s in Computer Science, AI/ML, or related technical field.
·       5+ years of experience in backend, cloud, or AI infrastructure engineering.
·       2+ years in applied AI or LLM-based system development preferred.

Optional Nice-to-Haves

·       Knowledge of Reinforcement Learning from Human Feedback (RLHF) or self-improving AI systems.
·       Experience deploying on-premise or private LLMs or integrating custom fine-tuned models.
·       Familiarity with graph-based reasoning or knowledge representation systems.
·       Understanding of AI safety, alignment, and autonomous agent governance.

Required Skills

Cloud Engineering, Python, AI