Runtime security refers to the continuous, end-to-end monitoring and validation of all activity within containers, hosts, and serverless functions. It works by leveraging application control and allowlisting to establish a baseline of normal behavior for each host, container, serverless function, and other objects within a cloud-native environment. Through real-time observation of file systems, processes, and network activity, runtime security tools detect suspicious or anomalous activity and alert teams as needed.
Real-time security monitoring, of course, isn’t new. For almost two decades, security information and event management (SIEM) platforms have been monitoring application environments for anomalies. So what’s different about cloud-native runtime security?
The difference centers on the environment.
By automating security for fast-moving, dynamic applications like those that run in containers, runtime security addresses the unique security and compliance needs of cloud-native environments.
Cloud-native runtime security operates in environments moving so fast that baselines, in the traditional sense, don’t exist. When clusters change as nodes come offline, containers spin up and down, or load balancers redirect traffic between instances, tools that rely on conventional data sources like logs and network traffic struggle to detect anomalies.
Runtime security in the cloud-native environment works on a deeper level, establishing a dynamic baseline by interpreting how behavioral trends vary over time. From here, runtime security tools can detect changes in internal container processes, file system activity, and so on, that deviate from the norm — even within environments that rapidly scale.
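The following minimal Python sketch makes the idea of a dynamic baseline concrete. It tracks one behavioral metric with an exponentially weighted moving mean and variance and flags values that stray too far from the learned trend. This is a simple statistical stand-in, not how any particular product models behavior. The metric (new outbound connections per minute), the smoothing factor, the three-standard-deviation threshold, and the warm-up length are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class DynamicBaseline:
    """Adaptive baseline for one behavioral metric of one workload
    (hypothetical example: new outbound connections per minute)."""
    alpha: float = 0.1      # weight of the newest observation
    threshold: float = 3.0  # deviations beyond this many std devs are flagged
    warmup: int = 5         # observations to absorb before flagging anything
    mean: float = 0.0
    var: float = 0.0
    count: int = 0

    def observe(self, value: float) -> bool:
        """Score `value` against the current baseline, then fold it in.

        Returns True when the value deviates from the learned trend."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.var)
            # max() avoids a zero threshold when the history is perfectly flat.
            anomalous = abs(value - self.mean) > self.threshold * max(std, 1e-9)
        if self.count == 0:
            self.mean = value
        else:
            # Incremental exponentially weighted mean and variance update.
            diff = value - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return anomalous


baseline = DynamicBaseline()
readings = [10, 11, 10, 12, 11, 10, 11, 50]
print([baseline.observe(r) for r in readings])
# prints [False, False, False, False, False, False, False, True]
```

Every observation, including a flagged one, is folded back into the baseline, so the baseline keeps adapting as workloads scale. A production system would likely exclude confirmed anomalies from the update and track many metrics per workload.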
Put another way, runtime defense is the set of features that provide predictive and threat-based active protection for rapidly changing environments.
Runtime security focuses on safeguarding containers during execution, when they are active and operational and therefore most vulnerable to malicious activity. Traditional security tools weren’t designed to monitor running containers.
Using AI and machine learning, runtime security automates the process of modeling healthy activity.
Modeling refers to the process of creating a representation of normal, safe behavior for applications and services running in a cloud-native environment. This representation, or model, serves as a baseline to identify and detect deviations or anomalies that might indicate security threats.
By continuously monitoring and comparing the runtime activities of applications and services against the established model, teams can identify and respond to unauthorized actions, privilege escalations, and other potential incidents.
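At its simplest, a model can be a set of allowed values per sensor. The following Python sketch uses hypothetical class and field names. It represents an Apache-style model as allowed processes, listening ports, and writable path prefixes, then reports every observed event that falls outside it. Real solutions learn these sets automatically and work with far richer event data. Here they are filled in by hand for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    sensor: str  # "process", "network", or "file" (hypothetical sensor names)
    value: str   # process name, listening port, or path being written


@dataclass
class BehaviorModel:
    """Hypothetical per-image model: what normal looks like for each sensor."""
    processes: set[str] = field(default_factory=set)
    listening_ports: set[str] = field(default_factory=set)
    writable_prefixes: tuple[str, ...] = ()

    def allows(self, event: Event) -> bool:
        if event.sensor == "process":
            return event.value in self.processes
        if event.sensor == "network":
            return event.value in self.listening_ports
        if event.sensor == "file":
            return event.value.startswith(self.writable_prefixes)
        return False  # events from unknown sensors count as deviations


def deviations(model: BehaviorModel, events: list[Event]) -> list[Event]:
    """Return every observed event that the model does not allow."""
    return [e for e in events if not model.allows(e)]


apache_model = BehaviorModel(
    processes={"httpd"},
    listening_ports={"80", "443"},
    writable_prefixes=("/var/log/httpd/",),
)
observed = [
    Event("process", "httpd"),
    Event("network", "80"),
    Event("file", "/var/log/httpd/access_log"),
    Event("process", "sh"),        # unexpected shell
    Event("file", "/etc/passwd"),  # unexpected write outside the log directory
]
for e in deviations(apache_model, observed):
    print(f"deviation: {e.sensor} -> {e.value}")
```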
A runtime security solution like Prisma Cloud implements individual sensors for file system, network, and process activity, each with a unique set of rules and alerting. The unified runtime defense architecture simplifies the administrator experience and provides detail about what the solution learns from each image. Within this framework, runtime defense consists of two main object types — models and rules.
Models are generated from the autonomous learning of a container runtime security solution and represent the allowed activities for a given container image across all runtime sensors. They offer administrators an overview of what the system has learned about their images. An Apache image model, for example, would specify the processes that should run within the container and the exposed network sockets.
Models are built from static analysis, like hashing process maps based on Dockerfile ENTRYPOINT scripts, and dynamic behavioral analysis, like observing actual process activity during early container runtime. Models can be in active, archived, or learning mode.
Some containers, like Jenkins containers, are difficult to model due to their dynamic nature. A container runtime security solution can automatically detect known containers and enhance the model with capabilities, tuning runtime behaviors for specific apps and configurations without changing the learned model.
Learning mode is when the container runtime security solution performs static or dynamic analysis. Images stay in learning mode for one hour, followed by a 24-hour "dry run" period to ensure model completeness. If behavioral changes are observed during the dry run, the model returns to learning mode for an additional 24 hours. During learning mode, only threat-based runtime events are logged.
Active mode is when the container runtime security solution enforces the model and looks for anomalies that violate it. Active mode begins after the learning mode's 1-hour period. The system monitors for variances against the model, such as unexpected processes.
Archived mode applies once no running container uses a model anymore. Models persist in archived mode for 24 hours before removal. Archived mode serves as a recycle bin for models, ensuring that images that start and stop frequently don’t need to re-enter learning mode.
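The timings above can be read as a small state machine. The Python sketch below encodes the 1-hour initial learning period, the 24-hour dry run, the additional 24 hours of relearning, and the 24-hour archive retention. The durations come from the text. The class name, the method names, and the exact transition logic (for example, restoring an archived model to whatever mode it was in before archiving) are assumptions, not a description of any product's internals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Durations taken from the text; the transition logic below is an assumption.
INITIAL_LEARNING = timedelta(hours=1)
DRY_RUN = timedelta(hours=24)
RELEARNING = timedelta(hours=24)
ARCHIVE_RETENTION = timedelta(hours=24)


class Mode(Enum):
    LEARNING = "learning"
    ACTIVE = "active"
    ARCHIVED = "archived"
    REMOVED = "removed"


@dataclass
class ModelLifecycle:
    created: datetime
    mode: Mode = Mode.LEARNING
    learning_until: datetime = field(init=False)
    dry_run_until: datetime = field(init=False)
    archived_at: Optional[datetime] = None
    mode_before_archive: Optional[Mode] = None

    def __post_init__(self) -> None:
        self.learning_until = self.created + INITIAL_LEARNING
        # The dry run follows the initial hour of learning.
        self.dry_run_until = self.learning_until + DRY_RUN

    def tick(self, now: datetime) -> Mode:
        """Apply time-based transitions and return the current mode."""
        if self.mode is Mode.LEARNING and now >= self.learning_until:
            self.mode = Mode.ACTIVE  # enforcement starts; anomalies now alert
        elif (self.mode is Mode.ARCHIVED and self.archived_at is not None
              and now >= self.archived_at + ARCHIVE_RETENTION):
            self.mode = Mode.REMOVED  # the recycle bin is emptied
        return self.mode

    def on_new_behavior(self, now: datetime) -> None:
        """New behavior during the dry run sends the model back to learning."""
        if self.mode is Mode.ACTIVE and now < self.dry_run_until:
            self.mode = Mode.LEARNING
            self.learning_until = now + RELEARNING

    def on_container_stopped(self, now: datetime) -> None:
        if self.mode in (Mode.LEARNING, Mode.ACTIVE):
            self.mode_before_archive = self.mode
            self.mode = Mode.ARCHIVED
            self.archived_at = now

    def on_container_started(self, now: datetime) -> None:
        """Restoring from the archive skips relearning."""
        if self.mode is Mode.ARCHIVED and self.mode_before_archive is not None:
            self.mode = self.mode_before_archive
            self.archived_at = None
```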
Rules control how a container runtime security solution uses autonomously generated models to protect an environment. They allow or block activities by sensor and are evaluated together with models to create a resultant policy:
model + allowed activity from rule(s) - blocked activity from rule(s) = resultant policy
For example, if a model allows the httpd process and you want to ensure the bar process is allowed while the foo process is blocked, you can create a rule for all httpd images, add bar to the allowed process list, and add foo to the blocked process list.
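The resultant-policy formula maps directly onto set operations. The Python sketch below reproduces the httpd example for the process sensor, using a hypothetical `Rule` type and glob-style image patterns. The model and the allow rules are combined with a union, and blocked entries are subtracted last, so a block always wins. How real products match rules to images may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Rule:
    """Hypothetical process-sensor rule scoped to images by a glob pattern."""
    image_pattern: str
    allowed: set[str] = field(default_factory=set)
    blocked: set[str] = field(default_factory=set)


def resultant_processes(image: str, model: set[str], rules: list[Rule]) -> set[str]:
    """model + allowed activity from rule(s) - blocked activity from rule(s)."""
    applicable = [r for r in rules if fnmatch(image, r.image_pattern)]
    allowed = set().union(*(r.allowed for r in applicable))
    blocked = set().union(*(r.blocked for r in applicable))
    # Subtracting last means a blocked entry wins over the model and any allow rule.
    return (model | allowed) - blocked


rules = [Rule("*httpd*", allowed={"bar"}, blocked={"foo"})]
print(sorted(resultant_processes("library/httpd:2.4", {"httpd"}, rules)))
# prints ['bar', 'httpd']
```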
Via models and rules, a runtime protection solution automatically learns how applications behave under different conditions. Users can then distinguish normal shifts in application behavior from those that reflect a security problem.
Identifying new vulnerabilities in running containers relies on knowing what normal looks like — even in dynamic environments. With dozens of microservices to manage and hundreds of containers, serverless functions, and VMs hosting them, teams don’t have time to manually collect behavioral data and configure behavior models. Organizations must leverage enhanced runtime protection capable of identifying and investigating suspicious activities potentially indicating zero-day attacks.
In addition to modeling safe behavior, runtime defenses should automatically define and enforce allowed and disallowed actions for each container, serverless function, or other object in the environment. This includes determining which other containers a given container can communicate with and the type of communication allowed, as well as specifying which data storage volumes it can access. Enforcing these rules is essential for limiting the impact of a potential security breach.
Runtime security tools need to automate defenses and alert your team when manual intervention is required. To achieve this, they should monitor and send alerts for suspicious changes in processes, network connections, or file system reads and writes within cloud-native infrastructure. They must also be able to decide whether to send an alert based on dynamic alert rules. Static alerting rules are insufficient for addressing the evolving nature of cloud-native threats, given that activity appearing threatening at one moment may prove benign at another.
Runtime security represents only one layer of defense that should exist within your organization’s cloud-native security tech stack. Particularly when working with highly distributed, containerized microservices, you’ll want your runtime protection to integrate with security solutions addressing the additional layers of your ecosystem.
Automated data security protections, access control, auditing tools, container image scanners, and so on, are equally important. Your runtime security solution must be able to integrate with other security tools to provide full depth and context for incidents, as well as an understanding of how a threat at one layer of your tech stack (like the runtime environment) impacts another (like data at rest).
Although runtime security is capable of mitigating the impact of a breach after it occurs, your runtime solution will ideally allow you to find and remediate threats in real time, before they have an opportunity to escalate.
By delivering control over file systems, processes, and network activity for each container and serverless function, your runtime security solution should mitigate damage that could result if a security breach occurred within the environment. It should automatically model application-safe behavior and enforce rules that prevent dangerous activity on the container or host, ultimately preventing situations such as a compromised container executing processes that spread to other containers or the host.
Incident response hinges on the data collected by your runtime security solution. By capturing and storing audit data for cloud-native applications, it provides teams with the information needed to understand what went wrong in the wake of an incident, even if the cloud-native environment no longer exists in its earlier form when the investigation occurs.
Runtime security best practices serve to safeguard applications and infrastructure from runtime threats. By implementing proactive measures, organizations can minimize vulnerabilities, detect malicious activities, and limit the impact of security breaches.
Monitoring only part of your environment or focusing on only key services or infrastructure isn’t enough to detect all security threats. For optimum results, apply runtime security to all layers of your environment and use it to protect both development and production workloads.
Because every host, container instance, and serverless function in your cloud-native environment has a unique configuration and behavior, you should model each object separately. Don’t assume all containers will behave the same, not even those based on a common container image. Operating from a sweeping assumption will lead to a sampled approach that limits visibility into security incidents.
At the core of container runtime security is the monitoring and filtering of system calls made by processes within containers. System calls act as an interface between applications and the operating system kernel, allowing applications to request resources or services. By monitoring and controlling these calls, organizations can detect and prevent unauthorized actions, privilege escalations, and other malicious activities.
Falco is an open-source runtime security tool that monitors system calls and network activity, detecting and alerting on suspicious behavior. Seccomp (secure computing mode), a Linux kernel feature supported by Docker and Kubernetes, filters and restricts system calls, providing granular control over the actions of processes in containers.
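Seccomp profiles for Docker are JSON documents with a default action and a list of syscall rules. The Python sketch below builds an allowlist-style profile (deny by default, allow only the listed syscalls) and checks which observed syscalls it would reject. The short syscall list is purely illustrative. A real workload typically needs many more syscalls, which is why Docker ships a much larger default profile. A profile saved to a file can be applied with `docker run --security-opt seccomp=profile.json`.

```python
from __future__ import annotations

import json


def build_allowlist_profile(allowed_syscalls: list[str]) -> dict:
    """Docker-format seccomp profile: deny everything except the listed syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # unlisted syscalls fail with an error
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


def would_be_denied(profile: dict, observed: list[str]) -> list[str]:
    """Syscalls from `observed` that fall through to the deny-by-default action."""
    allowed = {
        name
        for rule in profile["syscalls"]
        if rule["action"] == "SCMP_ACT_ALLOW"
        for name in rule["names"]
    }
    return sorted(set(observed) - allowed)


# Illustrative only: far too small a list for a real container.
profile = build_allowlist_profile(["read", "write", "openat", "close", "exit_group"])
print(json.dumps(profile, indent=2))
print(would_be_denied(profile, ["read", "write", "ptrace", "mount"]))
# prints ['mount', 'ptrace']
```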
Regularly scanning containers for known vulnerabilities and malware during runtime is essential in identifying and addressing security risks. Continuous scanning ensures that organizations can detect newly discovered vulnerabilities and take appropriate action to secure their container environments.
Employ a runtime scanning solution that can detect unknown vulnerabilities and malicious code execution. Additionally, consider integrating threat intelligence feeds to stay updated on the latest threats and vulnerabilities affecting container environments.
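At its core, continuous scanning matches an inventory of installed packages against an advisory feed that keeps changing. The Python sketch below uses hypothetical advisory IDs and a naive dotted-number version comparison. Real scanners rely on distribution-specific version rules and curated vulnerability databases. Rerunning the same check whenever the feed updates is what surfaces newly disclosed vulnerabilities in containers that are already running.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # placeholder ID, not a real CVE
    package: str
    fixed_in: str     # first version that contains the fix


def parse_version(version: str) -> tuple[int, ...]:
    # Simplification: only plain dotted numeric versions such as "1.2.10".
    return tuple(int(part) for part in version.split("."))


def find_vulnerable(installed: dict[str, str], advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (advisory_id, package, installed_version) for each affected package."""
    findings = []
    for adv in advisories:
        version = installed.get(adv.package)
        if version is not None and parse_version(version) < parse_version(adv.fixed_in):
            findings.append((adv.advisory_id, adv.package, version))
    return findings


# Hypothetical inventory of one running container and a hypothetical feed.
installed = {"openssl": "3.0.1", "zlib": "1.3.1"}
feed = [
    Advisory("EXAMPLE-0001", "openssl", "3.0.7"),
    Advisory("EXAMPLE-0002", "zlib", "1.2.12"),
    Advisory("EXAMPLE-0003", "curl", "8.0.0"),
]
print(find_vulnerable(installed, feed))
# prints [('EXAMPLE-0001', 'openssl', '3.0.1')]
```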
Incorporate advanced network segmentation and traffic monitoring techniques by utilizing tools like Cilium or Calico to enforce network policies and enable microsegmentation. Leverage service mesh technologies, such as Istio or Linkerd, to encrypt container-to-container communication and implement fine-grained access controls. Use network monitoring and analysis tools to capture and analyze container traffic, facilitating the detection of anomalies and potential security threats.
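In Kubernetes, microsegmentation is commonly expressed as a NetworkPolicy, which a CNI plugin such as Cilium or Calico then enforces. The Python sketch below builds a `networking.k8s.io/v1` NetworkPolicy as a dictionary and prints it as JSON, a format `kubectl apply -f` accepts. The policy name, namespace, labels, and port are hypothetical. Once a pod is selected by a policy with an `Ingress` policy type, any ingress that no policy explicitly allows is denied.

```python
import json


def allow_only_from(name: str, namespace: str, target: dict, source: dict, port: int) -> dict:
    """NetworkPolicy letting only pods labeled `source` reach pods labeled `target` on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target},  # pods this policy protects
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # A bare podSelector matches pods in the same namespace only.
                    "from": [{"podSelector": {"matchLabels": source}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical names: only the frontend may call the API pods on port 8080.
policy = allow_only_from("api-allow-frontend", "shop", {"app": "api"}, {"app": "frontend"}, 8080)
print(json.dumps(policy, indent=2))
```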
Implement and maintain compliance with the Docker, Kubernetes, and Linux CIS Benchmarks, as well as external compliance regulations and custom requirements. Keep in mind that, by default, the Kubernetes API offers several easy routes to privilege escalation. In a multitenant cluster, certain features can introduce instability, so proceed cautiously when deploying them.
Policy engine management solutions like Kyverno and Open Policy Agent (OPA), or a CSPM like Prisma Cloud, help ensure that containers adhere to policies aligned with standards like PCI DSS, HIPAA, GDPR, ISO 27001:2013, and NIST. Custom policies can also be created to enforce organizational standards.
Policy use cases detect a myriad of activities, including account hijacking attempts, backdoor activity, network data exfiltration, unusual protocol usage, and DDoS activity. Once a threat is detected, an alert is generated, notifying administrators of the issue so that they can respond quickly. Many policies map to the MITRE ATT&CK Enterprise IaaS Matrix, providing a comprehensive roadmap for securing your cloud assets.
Implement a regular auditing process that scans all layers of your Kubernetes cluster and configurations to ensure they align with industry standards and best practices. Audits won’t necessarily detect threats in real time, but they will help you stay ahead of security problems or misconfigurations you may be overlooking that could give attackers an entry point to your cluster or applications.
Implementing monitoring and logging solutions for container activities enables organizations to detect and respond to security incidents in real time, mitigating potential threats and facilitating incident response. Tools like Grafana, Jaeger, Prisma Cloud, and Prometheus provide visibility into container performance and health, enabling proactive management. Key metrics include cluster state, node status, pod availability, memory, disk, and CPU utilization. Monitoring helps identify configuration issues and ensures that containers meet business needs.
| Monitoring Level | Metric | Description |
|---|---|---|
| Cluster | Cluster Nodes | Measure how many nodes are available, which helps determine the cloud resources required to run the cluster. |
| Cluster | Cluster Pods | Measure how many pods are running to help determine whether you have sufficient nodes available to handle your overall workload in the event of a node failure. |
| Cluster | Resource Utilization | Measure the computing resources utilized by your nodes, including memory, CPU, bandwidth, and disk utilization. |
| Pod | Container Metrics | Monitor network utilization, CPU, and memory usage. Compared against DevOps-prescribed maximum values, these metrics determine whether pods are running as designed. |
| Pod | Application Metrics | These metrics are application-specific and based on business use cases, for example, the number of concurrent users accessing the application, the number of entries published or purged, and user experience. |
| Pod | Kubernetes Scaling and Availability Metrics | By monitoring the orchestration tool and how it handles a specific pod, you can see the number of pod instances at a given moment compared with the expected number. These metrics provide health checks of pods and applications, network data, and in-progress deployments. |
Table 6: Strategic runtime metrics
With metrics, teams can understand whether microservices or individual container-based applications are running as expected and meeting desired business needs through scale-out or scale-in automation and analytics based on expected traffic.
Reviewing metrics also proves beneficial when considering horizontal scale-out approaches for container-based applications, microservices, and security-based products like Palo Alto Networks CN-Series firewalls. Having an effective monitoring strategy in place ensures higher uptime for services with minimal degradation and performance issues.
Additionally, understanding resource consumption, service configurations, and usage helps reduce operational and development costs. This insight can support day-to-day operations and help gauge CI/CD pipeline health.
When selecting a monitoring and logging solution, keep in mind the metrics you’d like to observe. Many tools have the capacity to address a range of reporting for a multitude of applications and integrations.
| Monitoring Area | Metric | Questions to Answer |
|---|---|---|
| Monitoring Kubernetes Clusters and Nodes | Cluster resource usage | Is the cluster infrastructure underutilized? Is the cluster infrastructure over capacity? |
| Monitoring Kubernetes Clusters and Nodes | Project and team chargeback | |
| Monitoring Kubernetes Clusters and Nodes | Node availability and health | Do we have enough nodes available to replicate the applications? Will we run out of resources? |
| Monitoring Kubernetes Deployments and Pods | Missing and failed pods | Are all the necessary pods running for each of the applications or microservices? How many pods are dead or crashing? |
| Monitoring Kubernetes Deployments and Pods | Running vs. desired instances | How many instances of each microservice are actually ready? How many instances are expected to be ready? |
| Monitoring Kubernetes Deployments and Pods | Pod resource usage against requests and limits | Is the pod's resource usage within the configured CPU and memory requests and limits? |
| Monitoring Kubernetes Applications | Application availability | Is the application responding? |
| Monitoring Kubernetes Applications | Application health and performance | How many requests are we seeing? What is the responsiveness or latency for this application? Do we have any errors? |
Table 7: Additional metrics to consider, depending on use cases aligning with your organizational needs.
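Once the data is collected, several of the questions in Tables 6 and 7 reduce to simple comparisons. The Python sketch below checks running vs. desired instances and pod resource usage against limits. The data classes are hypothetical. In practice the replica counts would come from a Deployment's `spec.replicas` and `status.readyReplicas` fields, and usage figures from a metrics pipeline such as Prometheus. The 90% threshold is an arbitrary example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeploymentStatus:
    name: str
    desired: int  # e.g. from a Deployment's spec.replicas
    ready: int    # e.g. from status.readyReplicas


@dataclass
class PodUsage:
    pod: str
    cpu_millicores: int
    cpu_limit_millicores: int
    memory_mib: int
    memory_limit_mib: int


def under_replicated(deployments: list[DeploymentStatus]) -> list[tuple[str, int, int]]:
    """Running vs. desired instances: (name, ready, desired) for each shortfall."""
    return [(d.name, d.ready, d.desired) for d in deployments if d.ready < d.desired]


def near_limits(pods: list[PodUsage], threshold: float = 0.9) -> list[str]:
    """Pods whose CPU or memory usage has reached `threshold` of the configured limit."""
    return [
        p.pod
        for p in pods
        if p.cpu_millicores >= threshold * p.cpu_limit_millicores
        or p.memory_mib >= threshold * p.memory_limit_mib
    ]


deployments = [DeploymentStatus("checkout", 3, 3), DeploymentStatus("cart", 3, 1)]
pods = [PodUsage("cart-1", 950, 1000, 200, 512), PodUsage("checkout-1", 100, 1000, 100, 512)]
print(under_replicated(deployments))  # prints [('cart', 1, 3)]
print(near_limits(pods))              # prints ['cart-1']
```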
In the event of a security incident, container runtime security tools can provide valuable data for investigation and remediation. This includes logs, system calls, and other forensic evidence that can help to identify the source of the attack and prevent future occurrences.
Container escape is a significant threat during runtime. It occurs when an attacker breaches a container's isolation and gains access to the host system. Preventing it requires minimizing container privileges and avoiding mounts of sensitive host paths, such as the container runtime socket. Following best practices like the CIS Benchmarks for Docker and Kubernetes is essential.
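A custom check for container escape risk can be sketched in a few lines. The Python function below inspects a Kubernetes Pod manifest (as a parsed dictionary) for host namespace sharing, sensitive `hostPath` mounts, privileged containers, privilege escalation, and an added `SYS_ADMIN` capability. It is a simplified stand-in for what a policy engine such as Kyverno or OPA would enforce at admission time. The list of sensitive paths is an illustrative assumption, not an exhaustive one.

```python
# Illustrative, not exhaustive; paths are matched exactly, not by prefix.
SENSITIVE_HOST_PATHS = {"/", "/etc", "/proc", "/var/run/docker.sock", "/run/containerd/containerd.sock"}


def escape_risks(pod: dict) -> list[str]:
    """Flag Pod settings that weaken container isolation."""
    spec = pod.get("spec", {})
    findings = []
    for flag in ("hostPID", "hostIPC", "hostNetwork"):
        if spec.get(flag):
            findings.append(f"pod shares a host namespace via {flag}")
    for vol in spec.get("volumes", []):
        path = vol.get("hostPath", {}).get("path")
        if path in SENSITIVE_HOST_PATHS:
            findings.append(f"volume {vol.get('name')} mounts host path {path}")
    for c in spec.get("containers", []):
        sc = c.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"container {c['name']} is privileged")
        # Unset is treated as allowed here, matching Kubernetes' default behavior.
        if sc.get("allowPrivilegeEscalation", True):
            findings.append(f"container {c['name']} allows privilege escalation")
        if "SYS_ADMIN" in sc.get("capabilities", {}).get("add", []):
            findings.append(f"container {c['name']} adds CAP_SYS_ADMIN")
    return findings


risky_pod = {
    "spec": {
        "hostPID": True,
        "volumes": [{"name": "sock", "hostPath": {"path": "/var/run/docker.sock"}}],
        "containers": [{"name": "app", "securityContext": {"privileged": True}}],
    }
}
for finding in escape_risks(risky_pod):
    print(finding)
```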
Employing a defense-in-depth approach to container security by implementing multiple layers of protection, including runtime security, image scanning, network segmentation, and host security helps organizations build a resilient security posture.
For instance, in addition to container network security via containerized next-generation firewalls, container runtime protection can serve as another layer of security to block malware. Runtime protection can also incorporate web application and API security to prevent HTTP-based Layer 7 attacks, such as those covered by the OWASP Top 10, denial-of-service (DoS) attacks, and malicious bots.
Container security must be addressed as part of a holistic enterprise cloud security strategy. While it’s tempting to add yet another security tool to the arsenal, addressing container and cloud security separately tends to leave organizations blind to risks that an otherwise integrated strategy would address. Mature organizations see containers as an essential component of their cloud infrastructure and address them with a centralized platform approach, typically leveraging a CNAPP.
If your security team is reactively focused on securing your applications during runtime, take a step back and consider the entire development and deployment process. While it's crucial to ensure the end state (runtime) is secure, concentrating solely on runtime security may cause you to overlook vulnerabilities or early-stage security issues, which a narrow approach will likely allow to recur.
By working backward, you can evaluate and address security concerns throughout the entire development lifecycle, from design and coding to testing and deployment. The holistic strategy will help you identify and fix issues before they become problems in the runtime environment, reducing the chances of repeating the same security issues.