Before You Deploy AI, Fix Unstructured Data Security

Lynne Murray | Director of Product Marketing for Data Security

CYBERSEC GLOSSARY

What is Unstructured Data Security?

Unstructured Data Security is the practice of discovering, classifying, and protecting sensitive data in unstructured formats—such as emails, documents, chat logs, and AI-generated content—to maintain visibility, reduce risk, and address gaps where traditional security tools fail to identify and secure hidden data.

Today’s reality is tomorrow’s risk. Unstructured data—documents, emails, messages, images, logs, and AI-generated content—is growing exponentially and spreading across every environment organizations depend on. It now represents the majority of the data enterprises create and store, including their most sensitive information.

Unlike structured data, unstructured data is highly distributed, constantly changing, and difficult to systematically classify and govern. As a result, sensitive information frequently ends up in places it shouldn’t be: shared too broadly, stored in cloud platforms without adequate controls, or unintentionally exposed through misconfigurations. What begins as everyday data sprawl quickly becomes a widening exposure gap that traditional security models struggle to manage.

Attackers understand this shift. They no longer focus solely on databases; instead, they target documents, file shares, inboxes, and collaboration platforms where sensitive content, business logic, and personal data are less protected. These exposed assets can be exploited for extortion, fraud, credential abuse, or long-term infiltration.

Report: The Rise in Unstructured Data and AI Security Risks

A recently released survey report by the Cloud Security Alliance (CSA), The Rise in Unstructured Data and AI Security Risks, reveals that traditional security and governance practices are struggling to keep pace with the rapid growth of unstructured data. The survey, commissioned by Thales, found that while three-quarters of organizations are confident in their ability to secure unstructured data, more than two-thirds (68%) report that less than 80% of their unstructured data is protected. This security gap is exacerbated by limited visibility and distributed environments.

Today, sensitive enterprise data is increasingly commingled with ungoverned and high-risk content (aka “toxic” data) across file shares and platforms, creating material exposure that automation and AI can unintentionally amplify.

When sensitive or regulated data (e.g., PII, PHI, PCI, export-controlled data) is exposed, misclassified, or uncontrolled beyond defined thresholds, it triggers breach response protocols. The result is a time-consuming, enterprise-level incident, not a localized technical issue.

At a Glance

Unstructured data is the fastest-growing and least controlled part of the enterprise data footprint
Limited visibility and inconsistent classification leave sensitive data exposed
Traditional security models don’t scale to distributed, dynamic data environments
Attackers increasingly target unstructured data where controls are weakest
AI amplifies these gaps—accelerating risk without strong data foundations
Organizations must prioritize visibility, classification, and governance before scaling AI

Security is the #1 unstructured data concern for organizations

Although awareness of unstructured data risk is high, research from the Cloud Security Alliance shows many organizations struggle to translate that awareness into scalable, effective security outcomes—particularly as cloud adoption, automation, and AI accelerate.

Key Findings: Exposure Gaps Undermining Compliance, Security, and AI Readiness

Enterprises are creating and sharing more unstructured data than ever, across on-premises, cloud, SaaS, and collaboration platforms. These files and communications often contain sensitive information—PII, financial data, healthcare data, source code, and IP—yet the controls for finding, classifying, protecting, and monitoring them haven’t kept pace. The result is a widening exposure gap: more data, in more places, with less certainty about what is sensitive and how well it is protected.

1. Higher long-term risks as unstructured data becomes the growth engine of the enterprise

As unstructured data becomes a larger share of enterprise data and the primary driver of growth, organizations face a rapidly expanding attack surface. This growth is not incremental; it is structural.

Sensitive unstructured data is widely distributed: cloud apps (58%), file servers (57%), public cloud (47%), on-premises databases (46%), and cloud collaboration (45%).

As data sprawls across hybrid and multi-cloud environments, many existing security models struggle to keep up. The result is rising storage, security, and compliance costs, combined with reduced visibility into where sensitive or regulated data resides. The business impact is higher long-term risk and diminishing control efficiency.

2. Gaps in visibility and classification limit timely response to data exposure

Organizations consistently rank security, governance, privacy, and compliance as top concerns, yet struggle to execute them at scale. The most difficult capabilities—protecting sensitive data, detecting risky behavior or unauthorized access, maintaining privacy controls, and performing effective classification—highlight gaps in foundational controls.

Only 35% report full visibility into where unstructured data resides; more than half (56%) of those surveyed indicated that they have only partial visibility into where data is stored.

These challenges are amplified in cloud and SaaS environments, where data leakage risk is higher, and organizations face growing difficulty identifying, masking, and even locating sensitive data across distributed systems.

Organizations reported that their toughest functions are sensitive data protection (44%), detecting bad practices (42%), privacy (39%), detecting unauthorized access (39%), and classification scanning (36%).

Limited visibility into data location, sensitivity, and access prevents leaders from accurately assessing exposure or prioritizing remediation. Without reliable discovery and consistent classification, risk triage and control enforcement are uneven by default—especially in cloud-first collaboration and AI-heavy workflows.
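To make "discovery and classification" concrete, here is a minimal sketch of pattern-based classification over a block of text. The patterns, label names, and the `classify_text` function are illustrative assumptions, not something described in the CSA report. Production classifiers typically combine many more signals (context, validation, machine learning) than a few regular expressions.

```python
from __future__ import annotations

import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def classify_text(text: str) -> set[str]:
    """Return the set of sensitive-data labels detected in the text."""
    labels = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Reduce false positives: digit runs must also pass the checksum.
            if label == "card_number" and not luhn_valid(match.group()):
                continue
            labels.add(label)
    return labels


sample = "Contact jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(sorted(classify_text(sample)))
```

The checksum step shows why consistent classification matters: a bare "16 digits" rule would flag order numbers and IDs, flooding triage with noise.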

Top Unstructured Data Security Priorities

Key Findings from The Rise in Unstructured Data and AI Security Risks

3. Overconfidence in security posture delays risk response

When organizations believe their unstructured data is protected despite significant coverage gaps, risks remain hidden until thresholds are exceeded.

Of the organizations surveyed, 75% said they were moderately to highly confident in securing unstructured data, while 68% admitted that less than 80% of their unstructured data is protected.

This false confidence delays action, allowing minor control failures to grow into enterprise-level incidents that demand breach response, legal review, and executive involvement. Confidence may be based on compliance or partial visibility—not on comprehensive, measurable coverage. This creates blind spots adversaries exploit first.
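One way to replace confidence with measurable coverage is to compute it from an inventory. The sketch below assumes a hypothetical per-repository inventory (`Repository`, with file counts you would have to source from your own discovery tooling) and reports what share of files has been scanned and protected.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    total_files: int
    scanned_files: int     # files covered by discovery/classification
    protected_files: int   # files with a control applied (assumed definition)


def coverage(repos):
    """Return scanned and protected percentages across all repositories."""
    total = sum(r.total_files for r in repos)
    if total == 0:
        return {"scanned_pct": 0.0, "protected_pct": 0.0}
    scanned = sum(r.scanned_files for r in repos)
    protected = sum(r.protected_files for r in repos)
    return {
        "scanned_pct": round(100 * scanned / total, 1),
        "protected_pct": round(100 * protected / total, 1),
    }


# Illustrative numbers only, not survey data.
inventory = [
    Repository("file-server", 1000, 800, 600),
    Repository("cloud-collab", 1000, 400, 200),
]
print(coverage(inventory))
```

Tracking a number like this over time gives leaders a coverage figure to compare against their stated confidence.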

4. Tool sprawl and fragmented ownership cause higher operational costs

Excessive tools and shared accountability create inefficiencies, manual processes, and gaps between teams.

According to the survey, organizations rely on a fragmented and complex tool landscape to manage unstructured data, with nearly one-third using 11 or more tools and a notable share relying on more than 20. This tool sprawl extends across encryption, cloud security, application security, and identity controls, adding operational friction. More than a third of respondents (38%) say fragmented tools make securing cloud data more difficult.

The business impact includes longer detection and response timelines, duplicated spend, difficulty in demonstrating compliance, and reduced scalability as data volumes continue to grow. The problem is not the absence of tools; it is the absence of integration, automation, and clear ownership. Operational drag increases as data scales.

5. AI is both an accelerator and a risk multiplier

AI is now both a force multiplier and a risk amplifier, and its value depends on the strength of the foundations beneath it. While AI promises improved detection and automation, deploying it without baseline visibility and scanning can amplify existing blind spots.

As the report underscores, organizations must address these foundational pillars first to ensure AI enhances data protection rather than magnifying existing exposure.

Organizations view AI as both a top future threat (47%) and a core security capability (40%) for unstructured data. While many said they plan to rely on AI for detection, classification, and automation, foundational gaps remain. Only 9% of organizations reported having real-time scanning capabilities, and 23% cannot scan at all, raising concerns that AI may amplify existing blind spots rather than improving security outcomes.

Business risk increases as AI systems act on incomplete data, potentially accelerating data exposure, misclassification, or unauthorized access faster than organizations can respond. Deploying AI atop incomplete inventories and delayed detection will amplify blind spots. AI delivers value only when discovery, classification, and governance are strong.

Most Businesses Lack Unstructured Data Visibility

23% of Organizations Can't Scan Unstructured Data

AI Threats are the Top Security Risk to Unstructured Data

Operationalize the Foundations at Scale

Unstructured data security is now a systems challenge, not a point problem. The organizations that operationalize the basics at scale (visibility, classification, governance, and streamlined operations) will unlock effective automation and AI-driven security. This approach will accelerate the transition from reactive breach response to a proactive security posture that systematically reduces exposure before incidents escalate to enterprise-level consequences.

Organizations can quickly gain control and reduce unstructured data risk while establishing durable governance for regulatory compliance and AI adoption. To do so, you must begin by clearly defining sensitive data, standardizing a limited set of enforceable enterprise policies, and assigning accountable ownership across Security, IT, Data Governance, and the business.

Here is an example of a 90-day action plan:

30 Days

In the first 30 days, the organization gains visibility into where sensitive data resides and immediately reduces risk by remediating open access, stale sharing, and weak authentication.
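As a rough illustration of the 30-day remediation sweep, the sketch below flags the three conditions named above in a list of sharing records. The `Share` record, its field values, and the 180-day staleness threshold are assumptions for the example; real sharing metadata would come from your file and collaboration platforms.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Share:
    path: str
    audience: str            # assumed values: "private", "internal", "public"
    last_accessed: date
    owner_mfa_enabled: bool


def find_exposures(shares, today, stale_after_days=180):
    """Return (path, finding) pairs for open access, stale sharing, weak auth."""
    cutoff = today - timedelta(days=stale_after_days)
    findings = []
    for s in shares:
        if s.audience == "public":
            findings.append((s.path, "open_access"))
        if s.audience != "private" and s.last_accessed < cutoff:
            findings.append((s.path, "stale_sharing"))
        if not s.owner_mfa_enabled:
            findings.append((s.path, "weak_authentication"))
    return findings


shares = [
    Share("/hr/payroll.xlsx", "public", date(2025, 1, 1), True),
    Share("/eng/design.docx", "internal", date(2026, 7, 1), False),
    Share("/notes.txt", "private", date(2020, 1, 1), True),
]
for path, finding in find_exposures(shares, today=date(2026, 7, 9)):
    print(path, finding)
```

A single file can produce several findings, which is useful when deciding what to fix first.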

Discovery alone isn’t enough: you must also understand how risky each data set is. Review your data governance practices, applying classification best practices and risk assessments to put the data in context. This step determines severity and priority. You must be able to answer these key questions:

What type of sensitive data is it? (PII, PCI, IP, source code, contracts, health data)
Who owns it and who currently has access?
Where is it stored and how exposed is it?
Does it violate internal policies or regulations if breached?
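The four questions above can be captured as fields on a record and combined into a simple priority score. This is only a sketch: the `DataAsset` fields, the weights, and the additive scoring are hypothetical choices made for illustration, and any real scheme should reflect your own policies and regulatory obligations.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights; tune these to your own policies.
TYPE_WEIGHT = {"PII": 3, "PCI": 4, "health": 4, "IP": 3, "source_code": 3, "contracts": 2}
EXPOSURE_WEIGHT = {"private": 0, "internal": 1, "external": 2, "public": 3}


@dataclass
class DataAsset:
    location: str            # where is it stored?
    data_types: List[str]    # what type of sensitive data is it?
    owner: Optional[str]     # who owns it? None means no accountable owner
    exposure: str            # how exposed is it?
    regulated: bool          # would a breach violate policy or regulation?


def risk_score(asset: DataAsset) -> int:
    """Combine the answers to the four questions into one sortable number."""
    score = max((TYPE_WEIGHT.get(t, 1) for t in asset.data_types), default=0)
    # Unknown exposure is treated as the worst case.
    score += EXPOSURE_WEIGHT.get(asset.exposure, 3)
    if asset.owner is None:
        score += 1
    if asset.regulated:
        score += 2
    return score


def prioritize(assets):
    """Return assets ordered from highest to lowest risk score."""
    return sorted(assets, key=risk_score, reverse=True)


assets = [
    DataAsset("fileserver/handbook.pdf", ["contracts"], "legal", "internal", False),
    DataAsset("cloud/export.csv", ["PII", "PCI"], None, "public", True),
]
for a in prioritize(assets):
    print(risk_score(a), a.location)
```

Treating unknown exposure as the worst case is a deliberate choice here: data you cannot characterize should rise to the top of the queue, not sink to the bottom.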

60 Days

By 60 days, teams apply protections to the most critical datasets, automate common exposure scenarios, and rationalize redundant tooling to improve efficiency and control costs.

90 Days

At 90 days, continuous monitoring, standardized incident response playbooks, and governed AI access are in place—positioning the company to sustain lower data risk, respond faster to incidents, and enable AI use without compromising compliance, trust, or shareholder value.
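"Governed AI access" can be pictured as a gate in front of whatever feeds documents to an AI system. The sketch below is one assumed design: it admits only documents that have been classified and carry no blocked label, and it holds everything else for review. The `Document` type, the label names, and the `BLOCKED_LABELS` policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical policy: labels that must never reach an AI pipeline.
BLOCKED_LABELS = {"PII", "PHI", "PCI", "export_controlled"}


@dataclass
class Document:
    doc_id: str
    labels: Optional[Set[str]] = None  # None means the document was never classified


def ai_eligible(doc: Document) -> bool:
    """Allow only classified documents that carry no blocked label."""
    if doc.labels is None:
        return False  # unclassified content is held back by default
    return not (doc.labels & BLOCKED_LABELS)


def select_for_ai(docs):
    """Split documents into those allowed into AI workflows and those held."""
    allowed = [d for d in docs if ai_eligible(d)]
    held = [d for d in docs if not ai_eligible(d)]
    return allowed, held


docs = [
    Document("press-release", set()),
    Document("patient-notes", {"PHI"}),
    Document("old-share-dump"),  # never scanned
]
allowed, held = select_for_ai(docs)
print([d.doc_id for d in allowed], [d.doc_id for d in held])
```

Defaulting unclassified documents to "held" mirrors the report's point that AI acting on incomplete inventories amplifies blind spots.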

By prioritizing these practical steps to address the highest-impact risks first, your organization can reduce today’s material exposure while establishing consistent guardrails that scale. This approach balances immediate protection with long-term readiness—lowering current risk, improving accountability, and building a durable foundation for future growth, regulatory resilience, and AI adoption.

The Imperative is Clear: Foundations Decide Outcomes

While many leaders are confident in the security controls they have in place, those measures were designed for a different data era. With coverage and investment increasingly lagging, unstructured data is in the cross-hairs for consequential incidents. By securing sensitive digital assets, organizations materially reduce the risk of breaches, regulatory violations, intellectual property loss, and operational disruption, while also enabling the safe and responsible use of generative AI and automation.

AI is now both a force multiplier and a risk amplifier, and its value depends on the strength of the foundations beneath it. As the report underscores, organizations must address these foundational pillars before deploying AI; ultimately, without strong foundations, growth and innovation will outpace control—widening exposure and eroding trust.

To explore the full findings, global insights, and expert analysis, download the CSA report, The Rise in Unstructured Data and AI Security Risks.

For an expert analysis of the report that touches on all the key findings, check out our joint webinar with Jon-Rav Shende, Chief Technology Officer at Thales, and Hillary Baron, AVP of Research at CSA.

July 9, 2026

THALES BLOG

Before You Deploy AI, Fix Your Unstructured Data Security — New CSA Research Explains Why
