Running an SMPP platform at scale requires predictable throughput, controlled latency, resilient message handling, and clean operational boundaries between SMPP ingress, routing, asynchronous processing, and observability. The most practical approach for many teams is a single installation: one AWS region, one Amazon EKS cluster, and a set of AWS managed services to reduce operational overhead.
This post outlines a clear, production-grade architecture where SMPP components are deployed as separate Kubernetes Deployments inside one EKS cluster, while state and messaging are handled by managed AWS services.

Traffic flow:
SMPP clients connect via TCP (commonly port 2775) to an AWS Network Load Balancer (NLB).
NLB forwards to Jasmin SMPP pods running in EKS.
Jasmin hands off work to router and worker services (also in EKS).
Messages, receipts, retries, and async jobs are buffered using Amazon SQS and/or Amazon MQ (managed).
Persistent storage and reporting live in Amazon Aurora (RDS).
Logs/metrics go to CloudWatch, long-term retention to S3.
This is not active-active and not multi-region. It is a single-cluster design with clear separation through Kubernetes objects.
A common failure mode in SMPP systems is mixing everything into one monolithic service. Instead, use separate Deployments, each with its own scaling, resource limits, and rollout strategy.
jasmin-smpp-gateway
Purpose: SMPP ingress and bind management
Why separate: SMPP TCP handling has distinct scaling and connection constraints
Exposed via Kubernetes Service of type LoadBalancer (backed by NLB)
Scaled by connection load and CPU
Can be isolated to dedicated nodes using taints/tolerations if needed
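If you define the gateway Service programmatically, a minimal sketch with the Python kubernetes client might look like the following; the name, namespace, labels, and the NLB annotation value are illustrative assumptions rather than fixed parts of the design.

```python
# Minimal sketch: expose the gateway as a TCP LoadBalancer Service backed by an
# AWS NLB. Names, namespace, and labels are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when run in-cluster

svc = client.V1Service(
    metadata=client.V1ObjectMeta(
        name="jasmin-smpp-gateway",
        namespace="smpp",
        annotations={"service.beta.kubernetes.io/aws-load-balancer-type": "nlb"},
    ),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "jasmin-smpp-gateway"},  # must match the Deployment's pod labels
        ports=[client.V1ServicePort(name="smpp", port=2775, target_port=2775, protocol="TCP")],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="smpp", body=svc)
```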
smpp-router
Purpose: Routing logic, throttling policies, connector selection
Why separate: Routing rules change often and should be deployable independently
Reads routing rules from Aurora (or config store)
Applies rate limits and policies per customer/connector
Sends async work to SQS/MQ
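As a rough illustration of the router's responsibilities, the sketch below applies a per-connector token-bucket throttle and then hands the job to SQS via boto3; the queue URL, connector IDs, and limits are made-up examples, and in practice the rules would be loaded from Aurora.

```python
# Illustrative router hand-off: per-connector token bucket, then enqueue to SQS.
import json
import time
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/smpp-submit-jobs"  # example

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.capacity = rate_per_sec, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# In practice these limits come from Aurora / the config store.
buckets = {"connector-a": TokenBucket(rate_per_sec=50, burst=100)}

def route_submit(customer_id: str, connector_id: str, pdu: dict) -> bool:
    """Apply the connector throttle, then enqueue the submit_sm job for workers."""
    if not buckets[connector_id].allow():
        return False  # throttled; caller decides whether to retry or reject
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"customer": customer_id, "connector": connector_id, "pdu": pdu}),
    )
    return True
```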
smpp-workers
Purpose: Async processing (submit_sm, retries, DLRs, filtering, enrichment)
Why separate: Workers scale horizontally and should not impact ingress stability
Consumes from Amazon SQS (recommended for elastic queueing) and/or Amazon MQ (when you require broker semantics)
Scales using KEDA (queue depth-driven autoscaling) or HPA
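A minimal worker consume loop against SQS, assuming the router enqueues submit_sm jobs as JSON; the queue URL and visibility timeout are illustrative.

```python
# Minimal worker loop: long-poll SQS, process, delete only on success so SQS
# redelivers failed messages after the visibility timeout.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/smpp-submit-jobs"  # example

def process(job: dict) -> None:
    # Hand the PDU to the SMPP connector / enrichment pipeline here.
    ...

def run_worker() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,     # long polling keeps API calls cheap
            VisibilityTimeout=60,   # must exceed worst-case processing time
        )
        for msg in resp.get("Messages", []):
            process(json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```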
dlr-processor
Purpose: Delivery receipts (DLRs), callback webhooks, final state updates
Why separate: DLR spikes can be large and must not backpressure ingress
Writes final message status to Aurora
Optionally pushes delivery updates to customer endpoints via async queue
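A rough sketch of the DLR path under the assumptions above: the final state is written to Aurora (PostgreSQL shown, via psycopg2) and the customer-facing callback is queued rather than called inline. Table, column, and queue names are invented for illustration.

```python
# Illustrative DLR handling: persist final state in Aurora, queue the webhook.
import json
import boto3
import psycopg2

sqs = boto3.client("sqs")
WEBHOOK_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/dlr-webhooks"  # example

conn = psycopg2.connect(host="aurora-writer.example.internal", dbname="smpp",
                        user="dlr_processor", password="***")

def handle_dlr(message_id: str, final_state: str, customer_id: str) -> None:
    # 1) Record the final state (delivered, expired, rejected, ...) in Aurora.
    with conn, conn.cursor() as cur:
        cur.execute(
            "UPDATE messages SET final_state = %s, finalized_at = now() WHERE message_id = %s",
            (final_state, message_id),
        )
    # 2) Queue the customer-facing delivery update for async fan-out.
    sqs.send_message(
        QueueUrl=WEBHOOK_QUEUE_URL,
        MessageBody=json.dumps({"customer": customer_id, "message_id": message_id,
                                "state": final_state}),
    )
```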
admin-api
Purpose: Management API for customers, binds, rules, reporting, provisioning
Why separate: Admin/API release cycles differ from SMPP runtime components
Auth via IAM Identity Center / Cognito (depending on your approach)
Talks to Aurora and publishes jobs/events
monitoring-exporters
Purpose: Metrics, probes, dashboards
Why separate: Keeps observability changes independent and stable
CloudWatch Container Insights / Prometheus exporters (your choice)
Alerting on bind failures, error rate, queue lag, DLR lag
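If you choose the Prometheus route, a minimal exporter sketch with prometheus_client could look like this; the metric names are illustrative, and real values would come from Jasmin or the router rather than being set by hand.

```python
# Illustrative custom exporter: expose SMPP health metrics for Prometheus.
from prometheus_client import Counter, Gauge, start_http_server

BOUND_SESSIONS = Gauge("smpp_bound_sessions", "Currently bound SMPP sessions", ["connector"])
BIND_FAILURES = Counter("smpp_bind_failures_total", "Failed SMPP bind attempts", ["connector"])
SUBMIT_ERRORS = Counter("smpp_submit_errors_total", "submit_sm errors", ["connector"])

def main() -> None:
    start_http_server(9100)  # Prometheus scrapes this port
    # A real exporter would poll Jasmin's management interface (or internal
    # counters) in a loop and update the metrics here.
    BOUND_SESSIONS.labels(connector="connector-a").set(0)

if __name__ == "__main__":
    main()
```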
Use Aurora PostgreSQL or Aurora MySQL for:
Customer accounts, routes, bind configs
Message metadata, reporting tables
DLR final state
Why Aurora:
Managed backups and maintenance
High read scalability (read replicas) when reporting grows
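As a sketch of the read path, the snippet below pulls credentials from Secrets Manager and runs a reporting query against the Aurora reader endpoint; the secret name, endpoint, and schema are assumptions for illustration.

```python
# Illustrative reporting query against the Aurora reader endpoint with
# credentials fetched from AWS Secrets Manager.
import json
import boto3
import psycopg2

secret = json.loads(
    boto3.client("secretsmanager").get_secret_value(SecretId="smpp/aurora-readonly")["SecretString"]
)

conn = psycopg2.connect(
    host="smpp-cluster.cluster-ro-abc123.eu-west-1.rds.amazonaws.com",  # reader endpoint (example)
    dbname=secret["dbname"],
    user=secret["username"],
    password=secret["password"],
)

with conn.cursor() as cur:
    cur.execute(
        "SELECT final_state, count(*) FROM messages "
        "WHERE finalized_at >= now() - interval '1 day' GROUP BY final_state"
    )
    for state, count in cur.fetchall():
        print(state, count)
```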
Use SQS for:
Message buffering between gateway/router and workers
Backpressure protection when downstream is slow
KEDA-based autoscaling using queue depth
Why SQS:
Simple, highly scalable queueing
Excellent fit for worker fan-out and retries
Use Amazon MQ when you need:
AMQP/JMS compatibility
Broker features your ecosystem depends on
If you do not explicitly require broker semantics, SQS is typically simpler at scale.
Use CloudWatch for:
Centralized logs for ingress/router/worker components
Alarms for bind disconnect spikes, error rates, queue backlog, and latency
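One way to implement the queue-backlog alarm above is a CloudWatch alarm on the SQS metric ApproximateNumberOfMessagesVisible; the queue name, threshold, and SNS topic are illustrative.

```python
# Illustrative backlog alarm: fire when the submit queue stays above the
# threshold for five consecutive one-minute periods.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="smpp-submit-jobs-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "smpp-submit-jobs"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=5,
    Threshold=10000,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:smpp-oncall"],  # example SNS topic
)
```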
Use S3 for:
Long-term log retention and audit trails
Exported reports, message archives, and compliance artifacts
Gateway scaling:
Scale based on CPU plus connection metrics (where available)
Keep ingress stable by limiting per-pod connections
Use PodDisruptionBudgets to avoid mass disconnects during upgrades (sketched below)
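A minimal PodDisruptionBudget sketch for the gateway, using the Python kubernetes client; names and the max_unavailable value are illustrative.

```python
# Illustrative PDB: allow at most one gateway pod to be evicted at a time
# during node drains and upgrades.
from kubernetes import client, config

config.load_kube_config()

pdb = client.V1PodDisruptionBudget(
    metadata=client.V1ObjectMeta(name="jasmin-smpp-gateway-pdb", namespace="smpp"),
    spec=client.V1PodDisruptionBudgetSpec(
        max_unavailable=1,
        selector=client.V1LabelSelector(match_labels={"app": "jasmin-smpp-gateway"}),
    ),
)

client.PolicyV1Api().create_namespaced_pod_disruption_budget(namespace="smpp", body=pdb)
```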
Worker scaling:
Use KEDA to scale workers from queue depth (SQS); a ScaledObject sketch follows below
Separate worker types (submit_sm vs DLR vs retries) into distinct Deployments if needed
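A KEDA ScaledObject is a CRD, so the sketch below creates it through the dynamic CustomObjectsApi; the queue URL, replica bounds, and per-replica target are illustrative, and the workers (or a KEDA TriggerAuthentication) still need IAM permissions to read queue attributes.

```python
# Illustrative KEDA ScaledObject: scale the worker Deployment on SQS depth.
from kubernetes import client, config

config.load_kube_config()

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "smpp-workers-scaler", "namespace": "smpp"},
    "spec": {
        "scaleTargetRef": {"name": "smpp-workers"},  # the worker Deployment
        "minReplicaCount": 2,
        "maxReplicaCount": 50,
        "triggers": [{
            "type": "aws-sqs-queue",
            "metadata": {
                "queueURL": "https://sqs.eu-west-1.amazonaws.com/123456789012/smpp-submit-jobs",
                "queueLength": "100",   # target messages per replica
                "awsRegion": "eu-west-1",
            },
        }],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh", version="v1alpha1", namespace="smpp",
    plural="scaledobjects", body=scaled_object,
)
```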
Router and admin-api scaling:
Usually lightweight; scale with CPU and request rate
Networking and security:
NLB for SMPP TCP ingress (stable, high throughput)
Security groups restrict inbound SMPP ports to allowed networks
IRSA (IAM Roles for Service Accounts) for AWS access (SQS, CloudWatch, S3, Secrets Manager)
Secrets stored in AWS Secrets Manager (recommended)
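A minimal IRSA sketch: the ServiceAccount below carries the eks.amazonaws.com/role-arn annotation that ties pods to an IAM role scoped to the services above; the role ARN and names are illustrative, and the IAM role plus the cluster's OIDC provider must already exist.

```python
# Illustrative IRSA-enabled ServiceAccount for the worker pods.
from kubernetes import client, config

config.load_kube_config()

sa = client.V1ServiceAccount(
    metadata=client.V1ObjectMeta(
        name="smpp-workers",
        namespace="smpp",
        annotations={
            "eks.amazonaws.com/role-arn": "arn:aws:iam::123456789012:role/smpp-workers-irsa"
        },
    )
)

client.CoreV1Api().create_namespaced_service_account(namespace="smpp", body=sa)
```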
This design is effective because it:
Keeps the platform operationally simple (one installation)
Provides clear service boundaries via Deployments
Offloads persistence, queue durability, and retention to AWS managed services
Enables independent rollouts and scaling for each SMPP component
Is this an active-active or multi-region setup?
No. This is a single EKS cluster installation in one region, with internal HA coming from Kubernetes replicas and AWS managed services.
Do I need Amazon MQ?
Only if your workloads require broker protocols/semantics (AMQP/JMS). For most SMPP async processing, SQS is sufficient and simpler.
Can the SMPP components scale independently?
Yes: by splitting gateway/router/workers into separate Deployments and using queue-driven autoscaling (KEDA) for workers.