10 Ways to Secure Big Data at Rest and In-Transit

Cloud computing has made collecting, storing, and processing data easier and cheaper than ever. Agencies are increasingly leveraging big data to drive actionable insights and improve cyber threat detection. Furthermore, more than 60% of agencies are using big data to reduce costs and operating expenses, writes Information Week.

As the volume of big data and the number of platforms for accessing it grow, securing that data becomes more pressing. Unfortunately, today’s security systems were designed to protect data that resides on static servers and computer hard drives, and the distributed, in-transit nature of big data (in and out of clouds, mobile devices, etc.) creates a new challenge for InfoSec teams.

So how do you protect all that data – at rest and in-transit?

The Cloud Security Alliance (CSA), a leading organization dedicated to defining and raising awareness of best practices to secure cloud computing environments, recently released its 100 Best Practices in Big Data Security and Privacy handbook.

The handbook lists the 10 major challenges in big data security and privacy and suggests best practices for thwarting those threats. Below is a summary of each of the points and a snapshot of the recommended solutions:

1. Securing Computations in Distributed Programming Frameworks

To address this challenge, CSA recommends using authentication to establish initial trust. Periodically check the security properties of each team member, use role-based access controls, and authorize access to files with predefined security policies. Remove or mask all personally identifiable information (PII) from all data to prevent the identity of the data subject from being linked with external data. Use proper encryption to prevent data leakage. To avoid attacks in cloud and virtual environments, detect fake nodes and check for altered copies of data.
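As a rough illustration of the "remove or mask PII" step, the sketch below drops direct identifiers and replaces a join key with a keyed hash. The field names, the `mask_record` helper, and the hard-coded key are assumptions made for this example and do not come from the CSA handbook; a real deployment would load the key from a key management system.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a key management system.
PSEUDONYM_KEY = b"example-key-do-not-use"

# Hypothetical field classification for this sketch.
DROP_FIELDS = {"name", "email"}  # removed outright
MASK_FIELDS = {"user_id"}        # replaced by a keyed hash so joins still work


def mask_record(record):
    """Return a copy of record with PII fields removed or pseudonymized."""
    masked = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue
        if field in MASK_FIELDS:
            digest = hmac.new(PSEUDONYM_KEY, str(value).encode(), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]
        else:
            masked[field] = value
    return masked


record = {"user_id": 1042, "name": "Ada Example",
          "email": "ada@example.com", "bytes_sent": 5120}
print(mask_record(record))
```

Keyed hashing is pseudonymization rather than full anonymization: the same input always maps to the same token, so it reduces linkage risk but does not remove it.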

2. Securing Non-Relational Data Stores

Non-relational data stores such as NoSQL databases tend not to have robust security. Implement encryption to protect passwords and safeguard data while at rest and use transport layer security for in-transit data. To expose vulnerabilities caused by insufficient input validation in NoSQL, use invalid, unexpected, or random inputs by deploying dumb fuzzing and smart fuzzing strategies.
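A minimal sketch of dumb fuzzing might look like the following. The `validate_username` function is a hypothetical validator standing in for whatever sits in front of a NoSQL query, and the input generator is deliberately ignorant of the expected format; smart fuzzing would instead generate inputs from knowledge of the protocol or schema.

```python
import random
import string


def validate_username(value):
    """Hypothetical validator under test: 3-20 ASCII letters, digits, or underscores."""
    if not isinstance(value, str):
        return False
    if not 3 <= len(value) <= 20:
        return False
    allowed = set(string.ascii_letters + string.digits + "_")
    return all(ch in allowed for ch in value)


def random_input(rng):
    """Dumb fuzzing: produce a value with no knowledge of the expected format."""
    kind = rng.choice(["text", "operator", "number", "none", "nested"])
    if kind == "text":
        length = rng.randint(0, 40)
        return "".join(chr(rng.randint(0, 0x2FF)) for _ in range(length))
    if kind == "operator":
        # Operator-shaped payload of the kind seen in NoSQL injection attempts.
        return {"$ne": None}
    if kind == "number":
        return rng.randint(-10**9, 10**9)
    if kind == "none":
        return None
    return [rng.random() for _ in range(rng.randint(0, 5))]


def fuzz(validator, rounds=1000, seed=0):
    """Return inputs that crashed the validator or were accepted despite not being strings."""
    rng = random.Random(seed)
    findings = []
    for _ in range(rounds):
        candidate = random_input(rng)
        try:
            accepted = validator(candidate)
        except Exception as exc:  # a crash on bad input is itself a finding
            findings.append((candidate, repr(exc)))
            continue
        if accepted and not isinstance(candidate, str):
            findings.append((candidate, "accepted non-string input"))
    return findings


print(len(fuzz(validate_username)), "findings")
```

Pointing `fuzz` at a careless validator such as `lambda v: len(v) > 0` surfaces both crashes (on `None` and integers) and wrongly accepted operator payloads.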

3. Securing Data Storage and Transaction Logs

CSA recommends a number of practices, including exchanging signed message digests to address potential disputes. Employ SUNDR (secure untrusted data repository) to store data securely on untrusted servers. Improve scalability with broadcast encryption, lazy revocation, and key rotation. To reliably verify that data uploaded to the cloud is available and intact, implement proof of retrievability (POR) or provable data possession (PDP). To securely store sensitive data, implement a secure cloud storage system called cryptographic cloud storage.
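The digest exchange can be sketched roughly as follows. This example uses an HMAC from Python's standard library as a stand-in for a real signature; that is an assumption of the sketch, since a shared-key HMAC cannot settle disputes between the two key holders the way a public-key signature can. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_digest(key, payload):
    """Compute a SHA-256 digest of payload and an HMAC tag over that digest."""
    digest = hashlib.sha256(payload).hexdigest()
    tag = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return digest, tag


def verify(key, payload, digest, tag):
    """Recompute digest and tag from payload and compare in constant time."""
    expected_digest = hashlib.sha256(payload).hexdigest()
    expected_tag = hmac.new(key, expected_digest.encode(), hashlib.sha256).hexdigest()
    return (hmac.compare_digest(digest, expected_digest)
            and hmac.compare_digest(tag, expected_tag))


key = b"shared-secret-for-illustration"
digest, tag = sign_digest(key, b"transaction log segment 17")
print(verify(key, b"transaction log segment 17", digest, tag))
```

Each party keeps the digest and tag it received, so a later disagreement about what was stored can be checked against the recorded values.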

4. Achieving the Best Possible Input Validation/Filtering in BYOD Models

In addition to using traditional endpoint protection to detect, filter and block malicious inputs, CSA suggests generating models that represent “normal” behavior and then detect outliers or deviations from normal input. This model-based approach limits the amount of additional computation needed on resource-constrained endpoint devices.
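One simple reading of this model-based approach is a statistical profile that is fitted once and then checked cheaply on the endpoint. The sketch below assumes a single numeric feature (payload size) and a z-score threshold; both choices, and the `InputModel` class itself, are illustrative assumptions rather than the handbook's method.

```python
import statistics


class InputModel:
    """Profile of 'normal' input based on the mean and standard deviation of one feature."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # z-score above which input is treated as an outlier
        self.mean = 0.0
        self.stdev = 0.0

    def fit(self, samples):
        """Learn the profile from known-good samples (can run off-device)."""
        self.mean = statistics.fmean(samples)
        self.stdev = statistics.pstdev(samples)

    def is_outlier(self, value):
        """Cheap check suitable for a resource-constrained endpoint."""
        if self.stdev == 0:
            return value != self.mean
        return abs(value - self.mean) / self.stdev > self.threshold


model = InputModel()
model.fit([480, 510, 495, 505, 520, 490, 500, 515])  # hypothetical payload sizes in bytes
print(model.is_outlier(505), model.is_outlier(5000))
```

Fitting can happen centrally, so the device only stores two numbers and performs one subtraction and one division per input.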

5. Real-Time Security and Compliance Monitoring

Detect unauthorized connections to a cluster using big data analytics. To secure connections, CSA endorses solutions like TLS/SSL, Kerberos, SESAME, IPsec, or SSH. Security tools like a SIEM system can also be used to monitor anomalous connections. If your data lives in the public cloud, consider cloud-level security and look for CSA STAR-certified vendors. Other considerations include cluster- and application-level security.
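At its simplest, flagging unexpected connections means comparing observed traffic against what the cluster is supposed to accept. The sketch below assumes connection records are available as `(source_ip, port)` pairs and uses a hypothetical allowlist; a production setup would feed a SIEM or analytics pipeline rather than a Python list.

```python
import ipaddress

# Hypothetical policy for this sketch: one trusted subnet and two service ports.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/24")]
ALLOWED_PORTS = {8020, 8088}


def unauthorized(connections):
    """Return the (source_ip, port) pairs that fall outside the allowlist."""
    flagged = []
    for src, port in connections:
        addr = ipaddress.ip_address(src)
        in_network = any(addr in net for net in ALLOWED_NETWORKS)
        if not in_network or port not in ALLOWED_PORTS:
            flagged.append((src, port))
    return flagged


observed = [("10.0.0.5", 8020), ("203.0.113.9", 8020), ("10.0.0.7", 22)]
print(unauthorized(observed))
```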

6. Protecting Data Privacy During Analytics

Data anonymization, which is designed to sanitize information (usually through encryption or removing PII), is inadequate for ensuring privacy in a big data environment, says CSA, since attackers can easily link two or more databases to identify PII. To combat this problem, CSA recommends implementing differential privacy and homomorphic encryption, among other things.
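To make differential privacy a little more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The `private_count` helper and the sample data are assumptions for illustration, and Python's general-purpose `random` module is used only for brevity; it is not suitable for production privacy guarantees.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching predicate under privacy budget epsilon."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [23, 35, 41, 29, 52, 38, 61, 19]  # hypothetical data set
print(private_count(ages, lambda age: age >= 40, epsilon=0.5))
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives more accurate answers at the cost of privacy.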

7. Exploring Cryptographic Technologies for Big Data

Computational tasks can’t be performed on conventionally encrypted sensitive data sent to the cloud, and this is where homomorphic encryption can help: it allows computation directly on encrypted data. That being said, the computation cost of homomorphic encryption is high. CSA suggests limiting its features to overcome this challenge. If you’re looking to compare encrypted data without sharing encryption keys, apply relational encryption to match IDs, attribute values, etc. among data encrypted with different keys. Check out the handbook for more best practices for implementing identity-based encryption, attribute-based encryption, and convergent encryption.
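The idea of computing on encrypted data can be shown with a toy version of the Paillier cryptosystem, which supports addition only (one example of a scheme with limited features). The sketch below is an assumption-laden illustration: the primes are far too small to be secure, the randomness is not cryptographic, and the function names are made up for this example. It requires Python 3.8 or later for the modular inverse.

```python
import math
import random


def keygen(p=293, q=433):
    """Build a toy key pair from two small primes (insecure, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam modulo n
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) using generator g = n + 1."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
total = add_encrypted(n, encrypt(n, 1200), encrypt(n, 345))
print(decrypt(n, private_key, total))  # 1545
```

The party doing the addition never sees 1200, 345, or the private key; only the key holder can read the sum.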

8. Reduce Data Restriction without Violating Policies with Granular Access Control

Granular access control broadens data sharing without a large administrative overhead. Single sign-on (SSO) solutions can offload user authentication tasks to enterprise or cloud systems. If your data analysis spans multiple providers, federating the authorization space can help you maintain control of data access.
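Granularity here can mean deciding access per field rather than per data set. The following sketch assumes a hypothetical policy table mapping field names to the roles allowed to read them, with unknown fields denied by default; the field and role names are invented for the example.

```python
# Hypothetical policy: which roles may read which fields of a record.
FIELD_POLICY = {
    "patient_id": {"clinician", "billing"},
    "diagnosis": {"clinician"},
    "invoice_total": {"billing"},
    "region": {"clinician", "billing", "analyst"},
}


def visible_fields(record, roles):
    """Return only the fields that at least one of the caller's roles may read."""
    roles = set(roles)
    return {
        field: value
        for field, value in record.items()
        # Fields missing from the policy are denied by default.
        if FIELD_POLICY.get(field, set()) & roles
    }


record = {"patient_id": "P-77", "diagnosis": "J45", "invoice_total": 310.0, "region": "NW"}
print(visible_fields(record, ["analyst"]))
print(visible_fields(record, ["billing"]))
```

With this shape, an analyst can still work with the shared data set without anyone having to carve out a separate sanitized copy.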

9. Establishing Granular Audits

Granular audits can pick up on true positives that your real-time security monitoring system may have missed. Use a SIEM solution and audit and forensic tools to process information collected during an audit, and leverage that information to provide a full audit trail. Be sure to store audit data separately from other big data, which enforces separation of duties, and implement access control to safeguard the information the audit has exposed.
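One way to help safeguard audit records, offered here as an assumption rather than something the handbook prescribes, is to chain each entry to the previous one with a hash so that later edits become detectable. The helper names and the event format below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def append_entry(log, event):
    """Append event to log, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash})


def verify_log(log):
    """Recompute every hash in order; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"user": "alice", "action": "read", "object": "table_a"})
append_entry(audit_log, {"user": "bob", "action": "delete", "object": "table_b"})
print(verify_log(audit_log))
```

A hash chain only makes tampering evident; it does not replace storing the log separately or restricting who can write to it.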

10. Securing Provenance Data

Provenance metadata describes the steps by which the data was derived, and its volume grows rapidly as the volume of big data grows. Securing this data is of great importance, says CSA. To prevent unauthorized access to and misuse of provenance data, develop an infrastructure authentication protocol. Protect data across mobile devices with trust and reputation systems. Implement encryption as data is transmitted to cloud servers and limit access to shared data in the cloud with role-based access control (RBAC).

Read the full handbook here.