engblogs

Summaries of the latest blog articles from your favorite tech companies.
Lambda Labs

How to serve Kimi-K2-Instruct on Lambda with vLLM

A step-by-step guide to deploying and benchmarking the trillion-parameter Kimi-K2-Instruct MoE language model on Lambda's multi-GPU setup using vLLM for efficient large-scale inference.

12/22/2025
Cloudflare

How Workers powers our internal maintenance scheduling pipeline

Cloudflare uses a sophisticated maintenance scheduler powered by Cloudflare Workers and graph processing to automate and optimize global data center maintenance without disrupting customer traffic.

12/22/2025
Duolingo

Solving database contention with optimistic locking

Implementing optimistic locking at Duolingo dramatically reduced database lock contention, improving notification timeliness and resource efficiency through careful testing and monitoring.

12/22/2025
AWS ML

Build a multimodal generative AI assistant for root cause diagnosis in predictive maintenance using Amazon Bedrock

Build a multimodal generative AI assistant on Amazon Bedrock that combines advanced sensor data analysis, guided troubleshooting, and multimodal retrieval to enhance root cause diagnosis in predictive maintenance, reducing downtime and improving operational efficiency across industries.

12/22/2025
AWS ML

Enhance document analytics with Strands AI Agents for the GenAI IDP Accelerator

Leverage the Analytics Agent, powered by Strands AI Agents within the GenAI IDP Accelerator, to enable non-technical users to perform natural language document analytics and generate actionable visual insights at scale without SQL expertise.

12/22/2025
AWS ML

Deploy Mistral AI’s Voxtral on Amazon SageMaker AI

A guide to deploying Mistral AI’s multimodal Voxtral models on Amazon SageMaker AI using vLLM and BYOC for advanced audio-text processing and function calling capabilities.

12/22/2025
AWS ML

Move Beyond Chain-of-Thought with Chain-of-Draft on Amazon Bedrock

Explore how Chain-of-Draft prompting on Amazon Bedrock significantly reduces token usage and latency compared to Chain-of-Thought, optimizing large language model reasoning for cost-effective, high-performance AI applications.

12/22/2025
Meta

DrP: Meta’s Root Cause Analysis Platform at Scale

DrP is Meta’s scalable root cause analysis platform that automates incident investigations using a flexible SDK, ML algorithms, and seamless workflow integrations to significantly reduce MTTR and improve on-call productivity across large-scale systems.

12/19/2025
Google Cloud

From Code to Cloud: Three Labs for Deploying Your AI Agent

Explore three hands-on labs demonstrating how to deploy AI agents on Google Cloud using Vertex AI Agent Engine, Cloud Run, and Google Kubernetes Engine for scalable, secure, and production-ready applications.

12/19/2025
Google Cloud

Why Stochastic Rounding is Essential for Modern Generative AI

Stochastic rounding, a probabilistic rounding technique supported by modern hardware like Google Cloud TPUs and NVIDIA Blackwell GPUs, enables stable and efficient low-precision training for large generative AI models by preserving small gradient updates.

12/19/2025
Google Cloud

Agent Factory Recap: Supercharging Agents on GKE with Agent Sandbox and Pod Snapshots

Explore how Google Kubernetes Engine enhances agentic workloads with flexible ADK integration, secure Agent Sandbox isolation using gVisor, and accelerated Pod Snapshots for near-instant sandbox restoration.

12/19/2025
Google Cloud

Cloud CISO Perspectives: 2025 in review: Cloud security basics and evolving AI

A 2025 review of cloud security fundamentals and of how evolving AI technologies are being integrated to strengthen defense, threat intelligence, and trusted cloud services.

12/19/2025