Security Considerations for Data Stream Processing

It’s hard to overstate the importance of data. Every day, organizations around the world build complex systems to collect, analyze and derive insights from massive amounts of data in an effort to gain a competitive edge within their industry. It follows that technologies that accelerate data processing are valuable: the earlier an organization can obtain insights from its data, the earlier it can act on those insights to get a leg up on the competition.

Stream processing, and the tools that support it, have enabled organizations to put data to use far sooner after it is produced. But with data stream processing come security considerations that need to be addressed. Read on for an introduction to the basics of stream processing, an overview of related security challenges and a few tips for managing these challenges effectively.

About Stream Processing

Stream processing refers to the execution of a process on data from an event stream, often shortly after the event is created. In other words, as data is received, a stream processing application acts upon it in near real time. Data can therefore move through a pipeline and be analyzed or used with minimal delay. This is in contrast to batch processing, where data is collected and stored to be operated upon at a later time.

A stream processor can have many jobs. For instance, such an application may simply be responsible for creating more specific events, which are then picked up by another stream processing application further down the data pipeline. In other cases, a stream processor could be tasked with performing real-time operations on event data to raise an alert about a particular condition represented by a data point (e.g. alerting a stockbroker that a particular stock has hit an all-time high, possibly indicating a time to sell) or to provide context for data that has been processed over time (e.g. the real-time percentage change in a stock’s price over the past year).
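To make the stock example concrete, here is a minimal sketch of a stream processor in Python. It is not tied to any streaming platform: the `PriceEvent` and `Alert` types and the `process_stream` function are names invented for this illustration, and the "baseline" is simply the first price seen for each symbol rather than a true one-year lookback.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class PriceEvent:
    symbol: str
    price: float


@dataclass
class Alert:
    symbol: str
    kind: str      # "new_high" or "pct_change"
    value: float


def process_stream(events: Iterable[PriceEvent]) -> Iterator[Alert]:
    """Act on each event as it arrives instead of storing it for a later batch."""
    highs = {}      # highest price seen so far, per symbol
    baselines = {}  # first price seen, per symbol (stand-in for "a year ago")
    for event in events:
        baseline = baselines.setdefault(event.symbol, event.price)
        previous_high = highs.get(event.symbol)

        # Alert only when an existing high is exceeded, not on the first event.
        if previous_high is not None and event.price > previous_high:
            yield Alert(event.symbol, "new_high", event.price)
        if previous_high is None or event.price > previous_high:
            highs[event.symbol] = event.price

        # Context over time: percentage change relative to the baseline.
        change = (event.price - baseline) / baseline * 100
        yield Alert(event.symbol, "pct_change", round(change, 2))
```

Because `process_stream` is a generator, each alert is produced as soon as the event that triggers it is read, which is the essential difference from a batch job that waits for the full data set.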

Today, several platforms support the building of stream processing applications. One of the most popular is Apache Kafka. It’s hard to discuss stream processing platforms without discussing Kafka, just as it is hard to discuss data-driven applications without recognizing the importance of properly securing them. Implementations of big data processing and real-time analytics must therefore be configured with security in mind. Let’s take a look at some of the security challenges presented when dealing with stream processing.

Challenges in Securing Stream Processes

There are a few major security aspects to take into consideration when analyzing the soundness and reliability of a stream processing application.

Access Control – While access control is important in any system, it’s particularly important when being granted access means gaining a level of control over critical data. This scenario is not uncommon when dealing with big data processors such as those involved in stream processing applications. Let’s consider an organization working with Apache Kafka. Such an organization will have developed producers and consumers that communicate with a Kafka cluster on a continuous basis. Without access controls in place, any client could be configured to read from or write to any topic within the Kafka system. As your organization matures and begins to use the Kafka cluster on a larger scale and for various types of data (some more sensitive than others), an ad hoc approach to securing the platform will likely not suffice. In this instance, it’s critical that all clients attempting to read from or write to the cluster be properly authenticated and authorized for the topics with which they are attempting to interact, so that improper access to potentially sensitive data is prevented.

Fortunately, Kafka has solutions for this, including SSL/TLS for client authentication and access control lists (ACLs) that authorize particular applications to read from or write to particular topics, thus providing secure access on a more granular level.
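The idea behind topic-level ACLs can be sketched as a toy model in Python. This is not Kafka’s authorizer or its API; `AclEntry`, `is_authorized` and the principal and topic names are all hypothetical, and the only behavior illustrated is default-deny: an authenticated client may touch a topic only if an explicit entry grants that operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    principal: str  # identity established during authentication
    operation: str  # "READ" or "WRITE"
    topic: str


def is_authorized(acls, principal, operation, topic):
    """Default-deny: allow only when an explicit matching entry exists."""
    return AclEntry(principal, operation, topic) in acls


# Hypothetical policy: one service may write to "orders", another may read it.
acls = {
    AclEntry("User:orders-producer", "WRITE", "orders"),
    AclEntry("User:billing-consumer", "READ", "orders"),
}
```

With this policy, `is_authorized(acls, "User:orders-producer", "READ", "orders")` is false even though the producer is a legitimate, authenticated client: being known to the cluster does not imply access to every topic.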

Data Security – Just as authentication and authorization are critical when dealing with stream processing, so is the security of the data as it is transmitted over the network by such applications. Continuing to use Kafka as our example platform, data in transit, whether written by producers to Kafka brokers or read by consumers from those brokers, should be protected from access by unauthorized systems. As part of a more thorough security policy for such a stream processing implementation, this data should be encrypted when it is communicated between client applications and Kafka.
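As a rough illustration of what encrypting the client-to-broker connection looks like on the client side, the sketch below assembles a configuration dictionary in Python. The property names are the ones used by librdkafka-based clients such as confluent-kafka; the helper name `build_tls_client_config`, the broker address and the file paths are assumptions made up for this example, and your client library may use different settings.

```python
def build_tls_client_config(bootstrap_servers, ca_path, cert_path, key_path,
                            key_password=None):
    """Build client settings that encrypt traffic and present a client certificate."""
    config = {
        "bootstrap.servers": bootstrap_servers,
        # "SSL" tells the client to use TLS for every connection to the brokers.
        "security.protocol": "SSL",
        # CA certificate used to verify the identity of the brokers.
        "ssl.ca.location": ca_path,
        # Client certificate and private key, used when the brokers require
        # client authentication over TLS.
        "ssl.certificate.location": cert_path,
        "ssl.key.location": key_path,
    }
    if key_password is not None:
        # The key password is itself a secret and should not be hard-coded.
        config["ssl.key.password"] = key_password
    return config


# Hypothetical values for illustration only.
config = build_tls_client_config(
    "broker1.example.com:9093",
    "/etc/kafka/ca.pem",
    "/etc/kafka/client.pem",
    "/etc/kafka/client.key",
)
```

With confluent-kafka installed, a dictionary like this could be passed to `confluent_kafka.Producer(config)` or, with consumer-specific settings such as a group ID added, to a consumer.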

Effective Secrets Management for Stream Processing and More

Though security considerations vary depending upon the application being developed, one step toward proper application security remains constant: effective secrets management, beginning with the use of a secrets manager like Conjur from CyberArk. With a secrets manager, development teams can keep passwords, API keys and more in a secure, centralized location, with access limited to the applications that need them. No more hard-coding secrets within an application where they can be viewed by anyone with access to the source code.

Let’s revisit the scenario above, where we discussed securing data in transit when working with Kafka. In this instance, an encryption key is necessary to encrypt the data being transmitted to Kafka by a producer, and also to decrypt the data being read from Kafka by a consumer. An encryption key used for these types of operations is an excellent candidate for storage in a secrets manager, which gives authorized applications secure access to it from a centralized location.
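The pattern might look something like the following Python sketch, in which the application asks for the key at startup rather than embedding it in source code. Everything here is an assumption for illustration: `fetch_secret` stands in for whatever call your secrets manager’s client provides, the secret ID `kafka/payload-key` is made up, and the key is assumed to be stored as base64 text encoding 32 bytes. The encryption and decryption themselves are left to a vetted cryptography library.

```python
import base64
from typing import Callable


def load_encryption_key(fetch_secret: Callable[[str], str], secret_id: str) -> bytes:
    """Retrieve a base64-encoded 256-bit key by ID and return the raw bytes."""
    encoded = fetch_secret(secret_id)
    key = base64.b64decode(encoded)
    if len(key) != 32:
        # Fail fast rather than encrypt with a malformed or truncated key.
        raise ValueError("expected a 256-bit (32-byte) key")
    return key


# In-memory stand-in for a secrets manager, for demonstration only.
# A real application would call the secrets manager's client here instead.
fake_store = {"kafka/payload-key": base64.b64encode(bytes(range(32))).decode()}
key = load_encryption_key(fake_store.__getitem__, "kafka/payload-key")
```

Passing the lookup in as a function keeps the producer and consumer code independent of any one secrets manager, and means the key never has to appear in the repository or in a configuration file checked in alongside it.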

Join the Conversation on the CyberArk Commons

If you’re interested in this and other open-source content, join the conversation on the CyberArk Commons Community. Secretless Broker, Conjur and other open-source projects are a part of the CyberArk Commons Community, an open community dedicated to developers, engineers, cybersecurity researchers and other technically minded people. To discuss Kubernetes, Secretless Broker, Conjur or CyberArk Threat Research, join me on the CyberArk Commons discussion forum.

Scott Fitzpatrick

Scott Fitzpatrick is a Fixate IO Contributor and has 8 years of experience in software development. He has worked with many languages and frameworks, including Java, ColdFusion, HTML/CSS, JavaScript and SQL.