Enterprises are contending with unprecedented data growth, rising storage costs, and evolving expectations for how unstructured data can drive business value. The latest findings from Komprise shed light on where organizations stand in 2022 and how they’re shaping their data strategies in response. More than half of large enterprises report managing five petabytes or more of data, up from a minority in the previous year, signaling a rapid scale-up in data sprawl and the corresponding need for smarter data management. Meanwhile, about 68% of organizations now spend more than 30% of their IT budget on data storage, backups, and disaster recovery, underscoring the cost pressures that accompany ever-expanding data estates. Against this backdrop, cloud storage remains the dominant approach, with almost half of organizations (47%) planning investment in cloud-based network attached storage and a similar share (43%) targeting cloud object storage. At the same time, ongoing efforts to extract value from unstructured data are driving a shift toward cloud-enabled data services and analytics, as enterprises seek competitive differentiation and operational resilience. The following sections unpack these themes in depth, drawing on the Komprise 2022 Unstructured Data Management Report and related insights to illuminate the challenges, priorities, and paths forward for modern data leadership.
Data Growth, Budgets, and Strategic Imperatives
The central takeaway from the Komprise 2022 Unstructured Data Management Report is the accelerating scale of data and the accompanying financial commitments that organizations are making to store, protect, and manage that data. More than half of surveyed enterprises report managing 5 petabytes (PB) or more of unstructured data, a striking increase relative to 2021, when the share at this scale was notably lower. This growth is not merely a matter of volume; it reflects increasing diversity in data types, sources, and use cases, all of which compound the complexity of storage, governance, and utilization. Unstructured data (documents, emails, images, videos, sensor feeds, and other non-tabular data) remains inherently harder to catalog, classify, and derive actionable insights from than structured data, yet it often contains the most valuable signals for business decisions, risk assessment, and customer understanding. The rapid rise in data volume intensifies the need for scalable architectures, robust data lifecycle management, and disciplined data governance to contain sprawl and its costs and to unlock analytics potential.
The budgetary picture accompanying this data growth is telling. A sizable majority of organizations—approximately 68%—report spending more than 30% of their IT budget on data storage, backups, and disaster recovery (DR). This allocation highlights several dynamics: the ongoing demand for durable, redundant storage to ensure resilience; the importance of maintaining data protection and business continuity; and the reality that data management is increasingly a core, ongoing expense rather than a one-off capital outlay. The high percentage of IT budgets devoted to storage and DR also implies pressure on organizations to optimize costs through smarter storage tiers, data deduplication, compression, long-term archival strategies, and the automation of data retention and deletion policies. It also underscores the need for cost-conscious architectures that do not sacrifice accessibility or recoverability.
In practical terms, these trends translate into a few critical imperatives for IT leadership. First, there is a clear driver to adopt scalable, interoperable storage architectures that can accommodate exponential growth without excessive cost inflation. This often means a mix of on-premises storage for high-value, frequently accessed data and cloud-based storage for bottomless capacity, long-term retention, and disaster recovery copies. Second, the data management strategy must address the full data lifecycle—acquisition, organization, usage, retention, and deletion—with explicit policies that balance regulatory compliance, security, and operational needs. Third, IT leaders must invest in automation and orchestration that can streamline data movement, ensure consistent data protection, and reduce manual overhead. Finally, the emphasis on unstructured data analytics signals a demand for tools and platforms that empower end users—analysts, data scientists, and business teams alike—to access, explore, and derive insights from unstructured data with minimal friction, thereby accelerating decision-making and time-to-insight.
The Komprise study surveyed enterprise IT directors, VPs, and C-level executives at companies with more than 1,000 employees across the United States and the United Kingdom, offering a lens into the large-enterprise practices that are shaping industry benchmarks. While the report highlights substantial progress in data management maturity, it also reveals persistent challenges that organizations must address to sustain growth and extract enduring value from their unstructured data estates. The breadth of responses underscores that the data management agenda is not a niche IT concern but a strategic pillar that intersects security, analytics, governance, and digital transformation initiatives across the organization.
Looking ahead, organizations are likely to pursue an integrated approach that combines smarter storage economics with data services that enable analytics at scale. This includes leveraging cloud-based file and object data in analytics workflows, enabling more direct access to data for analytics platforms, and reducing the latency and friction involved in moving data between storage, processing, and insights generation. The overarching objective is to transform petabytes of unstructured data from a stored asset into an active, governed, and highly usable resource that supports informed decision-making, risk mitigation, and competitive differentiation.
Key takeaways from this section include:
- Unstructured data volumes are growing quickly, with more than half of organizations now managing 5PB or more.
- A significant portion of IT budgets is dedicated to storage, backups, and DR, signaling tight cost constraints and the need for efficient data architectures.
- There is a broad move toward cloud-based storage options, driven by capacity, scalability, and resilience considerations.
- Investments in unstructured data analytics are rising, reflecting a demand for user empowerment and faster time-to-insight.
These points set the stage for a deeper exploration of how cloud storage is shaping data strategy and why analytics and data services in the cloud have become central to modern data management.
Cloud Storage Dominance and the Rise of Cloud Data Services
The Komprise findings highlight cloud storage as the dominant storage medium in today’s enterprise landscape. Nearly half of respondents—about 47%—indicate plans to invest in cloud-based network attached storage (NAS), while a closely related share—43%—are targeting cloud object storage. This tilt toward cloud storage is consistent with broader industry shifts toward scalable, cost-efficient, and globally accessible data repositories. Cloud NAS offers file-level access with the elastic scalability that organizations require for evolving workflows, collaboration across geographically dispersed teams, and seamless integration with enterprise software that expects centralized file availability. Cloud object storage, meanwhile, provides highly durable, cost-effective long-term storage for unstructured data, multimedia assets, and archived datasets. Taken together, these two cloud-based modalities provide a complementary foundation for managing vast, diverse data assets while enabling new usage patterns.
Beyond the storage layer, enterprises are increasingly looking to deliver data services in the cloud that go beyond preservation and basic access. A notable shift is from solely pursuing storage efficiency toward enabling data services that empower analytics and operational capabilities. Specifically, 65% of respondents indicated investments in unstructured data analytics with the aim of “putting more power in the hands of end users.” This emphasis on analytics signals a broader transformation: data is not merely being stored; it is being prepared, integrated, and served in ways that facilitate rapid analysis and decision-making. Cloud analytics tools and platforms can ingest diverse forms of unstructured data, provide search and discovery capabilities, support machine learning pipelines, and enable collaboration across business functions. The result is a more dynamic data ecosystem where end users—data professionals and business users alike—can access meaningful insights more quickly, driving competitive advantage and operational responsiveness.
Several implications emerge from this cloud-centric orientation. First, organizations must design storage and data access architectures that maximize interoperability between on-premises systems and cloud services. Data gravity—the concept that large data stores are costly to move—needs to be considered in choosing where to host data and how to integrate it with cloud analytics platforms. A robust data movement strategy, including automated tiering, lifecycle policies, and cross-cloud governance, becomes essential to balance latency, cost, and accessibility. Second, security and governance gain heightened importance in cloud contexts. As data moves to the cloud and becomes more widely accessible for analytics, organizations must implement stringent access controls, encryption, auditing, and data classification to protect sensitive data while enabling legitimate use. Third, cloud-native data services must be chosen and configured with an eye toward scalability, reliability, and cost-effectiveness. This involves evaluating service levels, compatibility with existing tools, and the ability to automate data workflows, metadata management, and policy enforcement at scale.
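The automated tiering and lifecycle policies mentioned above can be illustrated with a minimal sketch. It assumes that tier placement is driven only by time since last access. The thresholds, tier names, and the `FileRecord` type are hypothetical and do not come from the Komprise report or any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record describing one file in the data estate."""
    path: str
    size_bytes: int
    last_accessed: datetime


# Assumed thresholds: real policies depend on workload, cost, and compliance.
TIER_RULES = [
    (timedelta(days=30), "hot"),    # e.g. on-premises or cloud NAS
    (timedelta(days=365), "cool"),  # e.g. lower-cost cloud object storage
]
ARCHIVE_TIER = "archive"            # anything older than the last threshold


def choose_tier(record: FileRecord, now: datetime) -> str:
    """Return the first tier whose age limit covers the file's idle time."""
    idle = now - record.last_accessed
    for max_idle, tier in TIER_RULES:
        if idle <= max_idle:
            return tier
    return ARCHIVE_TIER


def plan_moves(records, now: datetime) -> dict:
    """Group file paths by their target tier."""
    plan = {}
    for record in records:
        plan.setdefault(choose_tier(record, now), []).append(record.path)
    return plan
```

A production policy would typically weigh more than access age, for example retrieval fees, data gravity, and governance constraints, which is why the rules are kept in a single table that can be replaced.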
The push toward cloud-based analytics also invites consideration of the data preparation and curation steps required for effective analytics on unstructured data. Files, images, videos, emails, and other unstructured formats often require preprocessing—such as OCR for text in images, metadata extraction, or content tagging—to unlock their analytical value. Cloud platforms that offer integrated pipelines for data cleansing, enrichment, and feature extraction can reduce time-to-insight and lower the total cost of ownership by consolidating tools within a single ecosystem. Enterprises that master these capabilities can accelerate the deployment of analytics use cases across customer insights, fraud detection, supply chain optimization, and product development.
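As a rough illustration of the metadata extraction step described above, the sketch below derives a small catalog entry from a file name and its raw bytes using only the Python standard library. The function name `describe_object` and the field names are assumptions made for this example. Richer enrichment such as OCR or content tagging would need dedicated tools and is not shown.

```python
import hashlib
import mimetypes


def describe_object(name: str, content: bytes) -> dict:
    """Build a minimal, hypothetical catalog entry for one unstructured object."""
    # Guess the MIME type from the file extension; None if unknown.
    mime, _encoding = mimetypes.guess_type(name)
    mime = mime or "application/octet-stream"
    return {
        "name": name,
        "size_bytes": len(content),
        "mime_type": mime,
        # Coarse category ("text", "image", "video", ...) for search facets.
        "category": mime.split("/")[0],
        # Content hash supports integrity checks and duplicate detection.
        "sha256": hashlib.sha256(content).hexdigest(),
    }
```

Entries like this can be written to a catalog so that analytics tools filter on metadata first and read the underlying objects only when needed.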
From a strategic standpoint, the cloud-centric model also supports experimentation and agility. Organizations can pilot new analytics workloads, scale them up or down based on demand, and iterate rapidly on data-driven initiatives without the friction of procuring and provisioning on-premises hardware. This operational flexibility aligns with broader digital transformation goals, enabling teams to respond to regulatory changes, evolving market conditions, and new business models with greater speed and resilience.
Yet the shift to cloud data services is not without challenges. Organizations must manage data sovereignty concerns, regulatory compliance, and potential vendor lock-in as cloud ecosystems expand. Performance considerations—such as network bandwidth, access latency, and data retrieval costs—also influence design choices and ongoing cost management. Consequently, governance frameworks that define data ownership, lineage, retention periods, and access policies become central to realizing the benefits of cloud storage and cloud-native analytics.
In sum, the cloud storage landscape—comprising cloud NAS, cloud object storage, and cloud-enabled data services for analytics—represents a core strategic pivot for contemporary enterprises. It enables scalable storage, broader data accessibility, and stronger analytics capabilities, while demanding careful attention to data governance, security, interoperability, and cost management. As organizations continue to scale their unstructured data assets, the cloud is likely to remain a central enabler of both storage efficiency and data-driven value creation.
Key points from this section include:
- Cloud NAS and cloud object storage are the leading cloud storage investments, signaling a diversified cloud strategy for file and object data.
- Investments in unstructured data analytics are rising, with a clear objective of enabling end users to access and act on insights more rapidly.
- The transition to cloud-based data services emphasizes analytics readiness, governance, and cross-functional usability, not just storage capacity.
- Effective cloud strategies require attention to data movement, interoperability, security, and cost control.
AI Scaling: Limits, Efficiency, and Automated Data Workflows
Enterprise AI initiatives are increasingly encountering practical limits as they scale. Power caps, rising token costs, and inference delays are among the factors reshaping how organizations deploy and operate AI models at scale. In this context, “AI scaling” refers not only to training ever larger models but also to the broader pipeline of using AI and machine learning across business processes, from data preparation to decision support. The combination of energy constraints, computational costs, and latency considerations prompts a rethinking of how to structure AI workloads, how to allocate compute resources, and how to optimize data flows to maximize value while containing expenses.
One of the notable responses to these pressures is a shift toward efficient inference architectures and optimized data workflows. Enterprises are exploring approaches such as model quantization, pruning, and distillation to reduce the computational load and energy consumption required for running AI in production. At the same time, there is interest in architectural patterns that balance throughput and latency, such as deploying inference services closer to data sources or integrating edge and cloud components to distribute workloads intelligently. These strategies aim to achieve tangible throughput gains and more predictable performance, which are critical for business applications that rely on real-time or near-real-time AI-driven insights.
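To make the idea of quantization concrete, here is a minimal sketch of symmetric 8-bit quantization applied to a plain list of weights. It is a toy illustration under simplifying assumptions (one scale for the whole list, no calibration data) and does not reflect how any particular framework implements it. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid division by zero when all weights are zero (or the list is empty).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]
```

Each weight is stored in one byte instead of four (for 32-bit floats), at the cost of a rounding error of about half the scale per value. Whether that trade-off is acceptable depends on the model and the task.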
A parallel trend involves optimizing the data ecosystem to support AI more effectively. The ability to initiate and execute automated data workflows across diverse use cases emerges as a central lever for AI enablement. Automated workflows can streamline data ingestion, cleansing, transformation, annotation, and routing to analytics or model deployment pipelines. By automating repetitive data preparation steps, organizations free up data scientists and analysts to focus on higher-value activities such as feature engineering, model evaluation, and experimentation. In practice, this means establishing end-to-end pipelines that integrate cloud storage, data catalogs, metadata management, and workflow orchestration tools to ensure data is discoverable, trustworthy, and readily usable by AI tools.
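The ingestion, cleansing, annotation, and routing steps described above can be sketched as a chain of small functions. This is an illustrative outline only: the keyword table, tag names, and destinations are hypothetical, and a real deployment would use a workflow orchestrator rather than an in-process loop.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Step = Callable[[Record], Record]

# Hypothetical keyword table used for naive content tagging.
KEYWORDS = {
    "finance": ("invoice", "payment"),
    "legal": ("contract",),
}


def clean(record: Record) -> Record:
    """Collapse runs of whitespace in the text field."""
    return {**record, "text": " ".join(str(record["text"]).split())}


def tag(record: Record) -> Record:
    """Attach tags whose keywords appear in the text."""
    text = str(record["text"]).lower()
    tags = sorted(k for k, words in KEYWORDS.items() if any(w in text for w in words))
    return {**record, "tags": tags}


def route(record: Record) -> Record:
    """Send tagged records onward; untagged ones go to archive."""
    destination = "ml_pipeline" if record["tags"] else "archive"
    return {**record, "destination": destination}


def run_pipeline(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Apply each step in order to every record."""
    for record in records:
        for step in steps:
            record = step(record)
        yield record
```

For example, `list(run_pipeline([{"text": "  Invoice   for  payment "}], [clean, tag, route]))` yields one record tagged `finance` and routed to `ml_pipeline`. Because each step takes and returns a record, steps can be added, removed, or reordered without changing the runner.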
From a strategic perspective, aligning AI initiatives with unstructured data management is essential. Unstructured data often requires specialized processing to extract meaningful signals for AI tasks—text extraction from documents, image and video feature extraction, audio transcription, and more. The ability to automate these steps within a coherent workflow, and to do so in a scalable, governable manner, is a key differentiator for organizations seeking sustainable AI ROI. This alignment also supports governance objectives, including data privacy, retention, and deletion policies, ensuring that AI deployments do not inadvertently propagate sensitive information or retain data beyond its required scope.
The Komprise report suggests that IT leaders view unstructured data management as a strategic lever for both cost optimization and risk management. By moving data to the right place in the cloud and applying consistent, automated workflows, organizations can mitigate storage and compute costs while maintaining access to critical data for analytics and AI initiatives. In such a framework, cloud-based data services become not just a storage tier but an integral component of the AI infrastructure, enabling scalable pipelines for data preparation, feature extraction, and model deployment.
Practical implications for enterprises seeking to harness AI at scale include:
- Emphasizing data workflows and automation to reduce manual data preparation and accelerate AI projects.
- Adopting efficient inference techniques and hardware strategies to lower energy usage and operational costs.
- Designing cloud-augmented architectures that balance data locality with centralized processing to optimize latency and cost.
- Implementing comprehensive governance to manage sensitive data, support privacy compliance, and enable safe data deletion when appropriate.
- Integrating unstructured data analytics into AI workflows to unlock value from text, images, videos, and other non-tabular data types.
These considerations reflect a broader trend toward treating data management and AI as an integrated ecosystem. The most successful organizations will be the ones that harmonize cloud storage strategies, automated data workflows, secure governance, and efficient AI execution to deliver measurable business outcomes. As AI scales, the role of unstructured data management becomes even more central: it is the backbone that enables AI to operate on diverse data assets at scale, with governance and cost discipline that sustain long-term value.
Unstructured Data Management: Governance, Security, and Analytics as a Priority
A core finding of the Komprise report is that IT leaders are prioritizing unstructured data management as a strategic means to reduce costs, protect sensitive data, improve big data analytics, support data-centric M&A activities, and implement data deletion policies. This multi-faceted approach reflects a maturing view of unstructured data—not simply as a storage challenge but as a strategic asset that touches nearly every aspect of enterprise risk management, analytics capability, and governance.
Cost reduction remains a central driver. Unstructured data typically occupies the largest share of a company’s storage footprint and tends to be the most costly to store and protect, particularly when regulatory requirements demand multiple copies, long retention periods, or high availability. By applying disciplined data management practices—such as identifying data that can be archived, tiering data to cheaper storage tiers, and eliminating redundant or obsolete data—organizations can achieve meaningful savings. This is not solely about hardware or cloud expenditure; it also relates to the operational costs of data management, including the manpower required to maintain complex storage environments and to enforce policies across dispersed data sets. The report underscores that executives recognize unstructured data management as a lever to optimize total cost of ownership (TCO) and to redirect resources toward higher-value initiatives.
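One of the practices named above, eliminating redundant data, can be sketched with content hashing. The example assumes that exact byte-for-byte duplicates are the target and takes an in-memory mapping of path to content for simplicity. The function names are hypothetical, and a real scan would stream files from storage instead of holding them in memory.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path, content in files.items():
        by_hash[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def reclaimable_bytes(files: Dict[str, bytes]) -> int:
    """Estimate bytes freed if only one copy per duplicate group is kept."""
    total = 0
    for group in find_duplicates(files):
        size = len(files[group[0]])
        total += size * (len(group) - 1)
    return total
```

The estimate covers capacity only. It does not account for backup copies, replication, or retention rules that may require some duplicates to be kept.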
Security and protection of sensitive data are equally critical. Unstructured data often contains personally identifiable information (PII), regulated content, or trade secrets that require robust controls. The emphasis on data protection in the report signals that IT leaders are integrating data governance into the broader data strategy, ensuring that data is classified, access-controlled, encrypted, and auditable throughout its lifecycle. Effective governance means not only preventing data loss or exposure but also enabling responsible data usage for analytics and innovation. In practice, this requires comprehensive metadata management, data lineage tracking, and policy enforcement that spans on-premises environments, cloud repositories, and hybrid configurations.
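As a simplified illustration of the classification step described above, the sketch below flags text that appears to contain an email address or a US Social Security number-style pattern. The patterns, label names, and sensitivity levels are assumptions for this example. Regular expressions alone produce both false positives and misses, so this is not a substitute for a dedicated data classification tool.

```python
import re

# Hypothetical detectors: deliberately simple and far from exhaustive.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> list:
    """Return the sorted labels of every pattern found in the text."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))


def sensitivity(text: str) -> str:
    """Assign a coarse sensitivity level that access policies can key on."""
    return "restricted" if detect_pii(text) else "general"
```

The resulting label can be stored as metadata alongside the file so that access controls, encryption, and audit rules apply consistently wherever the data is moved.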
Analytics readiness and optimization form the third pillar. The ability to extract value from unstructured data hinges on the availability of analytics-ready data and the tools to analyze it. IT leaders are seeking to improve big data analytics by making diverse data types easier to access and by enabling end users to perform self-service analytics where appropriate. This often involves enriching unstructured data with structured metadata, building catalogs that facilitate discovery, and integrating analytics platforms with data storage in a way that minimizes data movement and latency. The ultimate objective is to unlock insights hidden in unstructured sources while maintaining governance, security, and efficiency.
Mergers and acquisitions (M&A) present a distinct use case for data segmentation and policy enforcement. In such scenarios, distinguishing and isolating relevant data segments, while ensuring appropriate retention and deletion policies, becomes essential for regulatory compliance and risk management. A well-governed unstructured data estate can streamline due diligence, support post-merger integration, and mitigate integration-related data risks. This facet of unstructured data management emphasizes that governance is not a constraint but a differentiator that enables faster, safer strategic moves.
The Komprise report’s scope—encompassing 300 enterprise IT leaders across the United States and the United Kingdom—provides a cross-section of perspectives at large organizations. The insights reflect a shared understanding that data management is a strategic function that intersects with IT operations, security, analytics, and corporate strategy. They also suggest a path forward: investments in cloud-based data management services, governance-enabled analytics pipelines, and automated workflows can deliver both cost efficiencies and enhanced data value across the enterprise.
Key takeaways from this section include:
- Unstructured data management is increasingly viewed as a strategic priority for cost optimization, data protection, and analytics capability.
- Governance and security are central to unlocking the value of unstructured data while mitigating risk.
- Analytics readiness and data cataloging are essential for enabling end-user access to meaningful insights.
- Data segmentation and policy enforcement support critical business activities like M&A and regulatory compliance.
Implications for Enterprises: Strategy, Governance, and the Path Forward
These evolving trends carry substantial implications for how enterprises design their data architectures, govern their data assets, and prioritize investments in storage, analytics, and AI. The convergence of growing data volumes, cloud-oriented storage strategies, and AI-enabled analytics creates a landscape in which data management is a strategic differentiator rather than a technical afterthought.
First, organizations must adopt a holistic data architecture that harmonizes on-premises and cloud environments. The cloud’s role as a dominant storage medium and as a platform for analytics necessitates interoperable data pipelines, standardized metadata, and unified governance policies. A well-orchestrated hybrid approach can balance latency, cost, and control, enabling rapid data access for analytics while preserving the ability to store large data assets cost-effectively in the cloud. This requires investments in data catalogs, metadata management, data lineage, and automation that can operate consistently across multi-cloud and hybrid contexts.
Second, there is a clear emphasis on empowering users with secure, governed access to unstructured data for analytics. Organizations should focus on feature-rich analytics platforms, search capabilities, and self-service data exploration that respect governance constraints. The goal is to democratize data insights while maintaining accountability and traceability. Achieving this balance demands robust access controls, auditing, and transparent data lineage that makes it possible to explain how data is produced, transformed, and used in insights and decisions.
Third, data privacy and compliance must be integral to any data strategy. As unstructured data grows in scope and is moved across storage platforms, there is an increased risk of mishandling sensitive information. Enterprises should implement explicit retention schedules, deletion policies, and data minimization practices to prevent unnecessary data retention while preserving the ability to meet regulatory obligations. This is particularly important for industries with stringent data protection requirements, where the consequences of data breaches or policy violations can be severe.
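The retention schedules and deletion policies described above can be expressed as a simple rule check. In this sketch the retention periods, category names, and the `Document` type are hypothetical placeholders. Actual periods are set by legal and regulatory requirements, not by the values shown here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one governed document."""
    doc_id: str
    category: str
    created: date
    legal_hold: bool = False


# Assumed retention periods in days, for illustration only.
RETENTION_DAYS = {"invoice": 7 * 365, "marketing": 2 * 365}
DEFAULT_RETENTION_DAYS = 3 * 365


def is_deletable(doc: Document, today: date) -> bool:
    """A document may be deleted once its retention period has passed,
    unless it is under legal hold."""
    if doc.legal_hold:
        return False
    days = RETENTION_DAYS.get(doc.category, DEFAULT_RETENTION_DAYS)
    return today > doc.created + timedelta(days=days)


def deletion_candidates(docs, today: date) -> list:
    """Return the IDs of documents eligible for deletion."""
    return [d.doc_id for d in docs if is_deletable(d, today)]
```

Keeping the legal-hold check first means that a hold always overrides the schedule, which matches the data minimization goal without risking deletion of data that must be preserved.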
Fourth, the analytics and AI components of the strategy require careful alignment with data management capabilities. Businesses that want to realize ROI from AI investments must ensure that data is accessible, clean, well-tagged, and readily usable by AI pipelines. Automated data workflows can streamline the preprocessing, feature extraction, and data routing steps that feed analytics and ML models. By integrating data preparation with storage and governance, organizations can shorten the cycle from data acquisition to insight and model deployment, while maintaining cost discipline and risk controls.
Fifth, the report’s sample of 300 enterprise leaders is drawn only from the US and UK, which gives the findings a regional dimension. While they reflect a cross-section of large organizations, regional regulatory differences, workforce skills, and cloud adoption maturity will influence how individual enterprises implement the lessons. In practice, this means tailoring data architectures to local regulatory environments, investing in upskilling for data teams, and building a phased roadmap that scales from pilot projects to enterprise-wide implementation.
Sixth, leadership and organizational alignment are critical. Data strategy cannot be siloed within IT; it must be integrated with business units, security teams, legal, and executive leadership. Clear accountability for data stewardship, governance, and policy enforcement helps ensure that data initiatives deliver measurable business value, reduce risk, and bolster trust in analytics outcomes. Organizations that foster cross-functional collaboration around data governance, analytics, and AI are more likely to realize sustained benefits as their data estates grow.
In sum, the strategic implications of Komprise’s findings point toward a data-management-centric path to digital maturity. Enterprises that invest in cloud-supported data services, automated data workflows, robust governance, and analytics-ready data architectures will be better positioned to control costs, safeguard data, accelerate insights, and compete in data-driven markets. The synthesis of storage efficiency, cloud-driven analytics, AI scalability, and governance discipline forms a cohesive blueprint for navigating the complexities of unstructured data in the modern enterprise.
Key takeaways from this section include:
- Embrace an integrated cloud-on-premises data architecture to balance scalability, cost, and control.
- Prioritize automated data workflows to unlock AI and analytics value while reducing manual, error-prone processes.
- Invest in governance, security, and data cataloging to enable safe, scalable access to unstructured data.
- Align AI and analytics initiatives with a disciplined data management program to maximize ROI and minimize risk.
The Road Ahead: Future Trends and Strategic Considerations
As enterprises continue to contend with expanding unstructured data volumes, evolving storage paradigms, and the expanding role of AI in business operations, several forward-looking trends are likely to shape the next era of data management. First, the convergence of data governance and AI governance will become more pronounced. As AI systems increasingly rely on diverse data sources, including unstructured data, organizations will need integrated policies that address both data privacy and model governance. This includes tracking data lineages, ensuring explainability where possible, and defining limits on what data can be used for training and inference. Effective governance frameworks will be central to maintaining trust in AI systems and preserving regulatory compliance.
Second, there will be greater emphasis on data value maximization through data partnerships, data marketplaces, and shared analytics ecosystems. As cloud-native data services mature, organizations may explore curated datasets, standardized schemas for unstructured data, and interoperable APIs that facilitate data exchange across platforms and partnerships. By enabling secure data sharing while maintaining governance, companies can unlock new revenue streams, accelerate collaboration with external partners, and drive cross-organizational insights that would be difficult to achieve in isolation.
Third, energy efficiency and sustainability will increasingly influence AI and data infrastructure decisions. With power caps and cost constraints cited as AI scaling factors, enterprises will seek greener options for training, inference, and data processing. This could involve adopting more energy-efficient hardware, optimizing software stacks for power efficiency, and selecting cloud regions that balance proximity, performance, and environmental considerations. Sustainability will shape both whether and how organizations deploy models at scale, aligning profitability with responsible use of resources.
Fourth, the ongoing evolution of cloud storage and data services will yield richer capabilities for unstructured data management. Advances in metadata extraction, semantic tagging, and intelligent data categorization will improve searchability and analytics readiness. More sophisticated data catalogs and discovery tools will make it easier for end users to locate relevant unstructured data, while automated metadata generation will reduce the burden on data stewards and accelerate data preparation for analytics.
Fifth, talent and skill development will be a differentiator. The success of data strategies hinges on the capabilities of the people who design, implement, and operate data systems. Organizations will invest in training and recruitment to build teams proficient in data engineering, data governance, data science, and security. A culture of data literacy across the organization will help ensure that analytics and AI initiatives deliver tangible business value and are embraced across departments.
As these trends unfold, the central role of unstructured data management will become even more pronounced. The ability to store, govern, and analyze large volumes of unstructured data in a secure and cost-effective manner will be a defining factor in enterprise competitiveness. The Komprise findings, while focused on a particular year and a specific set of respondents, point to broader trajectories that are likely to shape the strategic decisions of CIOs, CTOs, and data leaders for years to come.
Conclusion
The Komprise 2022 Unstructured Data Management Report underscores a pivotal moment for enterprises navigating the surge of unstructured data. Multi-petabyte data volumes, substantial portions of IT budgets directed toward storage and DR, and cloud storage as the dominant storage paradigm collectively define a landscape in which data management is a strategic imperative. Investments in cloud-based storage options, combined with a clear push toward analytics-ready unstructured data and automated data workflows, signal a movement from storage optimization to value-driven data services. The focus on AI scaling and the need for efficient, governance-enabled data ecosystems highlight the essential link between data architecture, analytics capability, and responsible AI deployment.
For organizations seeking to remain competitive, the message is clear: build a cohesive strategy that integrates scalable cloud storage, automated data workflows, robust governance, and accessible analytics. The future belongs to those who treat unstructured data not as a problem to be managed, but as a strategic asset to be leveraged—through disciplined governance, scalable data services, and AI-enabled insights that empower business decision-makers. By aligning technology choices with the broader objectives of resilience, innovation, and sustainability, enterprises can unlock the full potential of their unstructured data estates and transform petabytes of data into durable business value.