This article was originally posted on Nightfall.ai
Unstructured data is projected to account for approximately 80% of the data that enterprises will process on a daily basis by 2025. Data breaches and other security issues get a lot of attention in the media, but all businesses working with data, especially data in the cloud, are at risk of data loss. Preventing data loss can be difficult for a number of reasons.
IDG projects that by 2026, there will be 163 zettabytes of data in the world. To put that in context, one zettabyte is equal to a thousand exabytes, a billion terabytes, or a trillion gigabytes. The sheer volume of data moving through and stored in the cloud is just one of the complications that make securing data a tough task for businesses. Most of the world’s unstructured data goes completely unused: according to industry analyst firm IDC, more than 90% of unstructured data is never examined. As a result, large portions of data sit unsecured and underutilized at many businesses.
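The unit comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where each step up is a factor of 1,000; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21


def zettabytes_to(unit_bytes: int, zettabytes: int = 1) -> int:
    """Convert a whole number of zettabytes into the given smaller unit."""
    return zettabytes * ZETTABYTE // unit_bytes


print(zettabytes_to(EXABYTE))        # 1000 (a thousand exabytes)
print(zettabytes_to(TERABYTE))       # 1000000000 (a billion terabytes)
print(zettabytes_to(GIGABYTE))       # 1000000000000 (a trillion gigabytes)
print(zettabytes_to(TERABYTE, 163))  # 163000000000 terabytes in 163 zettabytes
```

In other words, the projected 163 zettabytes works out to 163 billion terabytes.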
That’s why it’s important to understand where unstructured data comes from, why it’s so hard to pin down, the risks of not securing unstructured data, and the rewards of bringing that data into a structured environment.
Hiding in plain sight
Unstructured data can come from almost any source. Nearly every asset or piece of content created or shared by a device in the cloud carries unstructured data. This can include:
- Product demo videos on your website
- QR codes for discounts and deals on an e-commerce app
- Podcasts and other audio blogging files hosted on your website’s blog page
- Social media messages on platforms like Facebook, Twitter, and LinkedIn
Internal communications and collaboration platforms are major sources of unstructured data. Think Slack, Confluence, and other SaaS applications where many people do their daily work and communicate with colleagues. Most cloud-based applications like these allow unstructured data to pass through massive networks, where it is shared, copied, accessed, and stored unprotected.
In 2018, IDG Communications published an article by Andy Berry, then Vice President of Pitney Bowes Software. Berry commented on how the modern workplace approaches data and why these norms contribute to the data loss problem, citing one study that found enterprises using almost 500 unique business applications. SaaS applications generate data that can quickly become obsolete, unusable, and eventually inaccessible.
Data powers everything we do in our professional and personal lives, but with little to no oversight of data hygiene, we often miss key opportunities to close security blind spots and maximize data performance.
A complex problem
The various sources of unstructured data show how complex data loss can be. Many problems with data loss prevention (DLP) start with the three V’s of data: volume, velocity, and variety. Manual review by humans cannot keep up with the staggering amount of data, the speed at which it proliferates, and the many different sources it comes from.
Adding to the problem is the fact that unstructured data is very difficult to organize. It’s impossible to dump every piece of unstructured information into a database or spreadsheet, because that data comes from myriad different sources and likely doesn’t follow similar formatting rules. On top of that, finding unstructured data through manual processes would take more time than there are hours in the day. It’s not a job for humans.
Other roadblocks to unstructured data collection include increasingly stringent privacy regimes, laws that protect intellectual property (IP) and other confidential or proprietary information like trade secrets, and businesses communicating across different security domains between the cloud and traditional hard-drive-based storage systems. Information security is evolving at lightning speed, but some schools of thought are still based on older priorities that focus on preventing outsider threats. It’s important to protect an organization from malicious actors, but what about well-meaning, everyday workers who don’t know what they don’t know? Their mistakes can still hurt an organization in tremendous ways.
Unstructured data isn’t all bad news. It can also be an opportunity for organizations that can recognize two main ideas. First, that this data must be gathered, protected, and understood. Second, that there’s value in all the data that is currently going unused. Computer Weekly cited sources that estimate modern businesses are utilizing as little as 1% of their unstructured data.
Our world runs on data, and each person interacting with apps, platforms, and devices contributes to the growing data reserves. When organizations think about gathering data to help with marketing, business intelligence, and other key functions, they must also factor in the impact of unstructured data. Unstructured data presents equal risk and opportunity for business leaders. When that data lives in the darkness, its only impacts are negative. But when data is brought into the light, we can use that data to be smarter and better at work.
Solving the unstructured data problem
Unstructured data is a major concern for organizations using cloud-based collaboration and communications platforms. Productivity relies on environments where co-workers can share ideas and messages quickly, without fear of exposing sensitive data. Nightfall, a data loss prevention (DLP) solution, provides much-needed security for today’s most-used communications and collaboration platforms like Slack, Confluence, and many other popular SaaS and data infrastructure products.
Since these applications lack an internal DLP function, and each allows for the lightning-fast transmission of massive amounts of data, Nightfall’s machine-learning-based platform is an essential partner for many organizations handling sensitive information like PII (personally identifiable information), PHI (protected health information), and other business-critical secrets. Nightfall’s three-step approach allows businesses to discover, classify, and protect unstructured data through artificial intelligence (AI) and machine learning (ML). Our solution makes sense of unstructured data, while traditional security solutions rely solely on users to help categorize data through methods like regular expressions (regex), which have limited accuracy in unstructured environments.
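To see why pattern matching alone has limited accuracy in free-form text, consider a minimal Python sketch. It is a generic illustration and not Nightfall’s method: the pattern, function names, and sample messages are all assumptions made for this example. A naive regex for card numbers flags any long run of digits, so the sketch adds a Luhn checksum (a standard validity check for payment card numbers) to show how extra validation separates a plausible card number from an unrelated number.

```python
import re

# Naive pattern (illustrative): 13 to 16 digits, optionally separated
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list[str]:
    """Return every regex match with separators stripped."""
    return [re.sub(r"[ -]", "", m.group()) for m in CARD_PATTERN.finditer(text)]


# Hypothetical chat messages; 4111 1111 1111 1111 is a common test card number.
messages = [
    "Customer card is 4111 1111 1111 1111, please refund.",
    "Tracking number 1234567890123456 shipped today.",
]

for msg in messages:
    for candidate in find_card_candidates(msg):
        print(candidate, luhn_valid(candidate))
# 4111111111111111 True
# 1234567890123456 False
```

The regex flags both messages, but only the first contains a number that passes the checksum. A checksum helps for card numbers specifically; most sensitive data has no such built-in check, so detection has to rely on surrounding context instead.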
Each step of Nightfall’s ML solution is critical to the process of DLP. Discover means continuously monitoring the sensitive data flowing into and out of all the services you use. Classify means ML classifies your sensitive data and PII automatically, so nothing gets missed. Protect means businesses can set up automated workflows for quarantines, deletions, alerts, and more. These three arms of DLP save you time and keep your business safe, all with minimal manual processing or review from you or your staff.
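The discover, classify, and protect steps can be pictured as a small pipeline. The following Python sketch is a toy illustration under stated assumptions and does not represent Nightfall’s product or API: every name here is hypothetical, the two regex detectors are simple stand-ins for a trained ML classifier, and "protect" is reduced to redaction rather than quarantines or alerts.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    label: str  # kind of sensitive data detected
    start: int  # offset where the match begins
    end: int    # offset just past the match


# Stand-in detectors (hypothetical). A real system would use trained models.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text: str) -> list[Finding]:
    """Classify: label each span of sensitive data found in the text."""
    findings = [
        Finding(label, m.start(), m.end())
        for label, pattern in DETECTORS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def protect(text: str, findings: list[Finding]) -> str:
    """Protect: redact each finding (assumes findings do not overlap)."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"[{f.label} REDACTED]" + text[f.end:]
    return text


def discover(messages):
    """Discover: walk a stream of messages and yield those with findings."""
    for msg in messages:
        findings = classify(msg)
        if findings:
            yield msg, findings


# Hypothetical message stream from a collaboration tool.
stream = [
    "Lunch at noon?",
    "New hire SSN is 123-45-6789, email jane.doe@example.com",
]

for original, findings in discover(stream):
    print(protect(original, findings))
# New hire SSN is [US_SSN REDACTED], email [EMAIL REDACTED]
```

Keeping the three steps as separate functions means the classifier can be replaced without touching the monitoring or remediation code.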
Helping businesses identify and access unstructured data
Data is a part of life, especially as remote work becomes an essential function for productivity and collaboration. Business leaders must understand the risk of ignoring unstructured data and the value of making that data work for the business. It’s a tall order to identify a mass of unknown data and bring it into the cloud, but the rewards come with a better understanding of your organization, your industry, and your customers. Good things can come from unstructured data, as long as you’re ready to approach the issue with a solid data strategy and a knowledgeable DLP partner like Nightfall.