Data: The Intangible Asset of the AI Age
Databases and data collections have always been valuable, but the AI revolution has elevated data from a supporting operational resource to a primary strategic asset. Under IFRS 3, databases are classified as technology-based intangible assets, recognised separately from goodwill when they are identifiable (separable or arising from contractual or legal rights) and their fair value can be measured reliably.
The category encompasses structured databases (customer records, transaction histories, product catalogues), unstructured data collections (documents, images, sensor readings), and curated datasets (training data, benchmark data, reference data). In data-intensive acquisitions — adtech, healthtech, fintech, and increasingly any AI-related business — databases can represent the most valuable identifiable intangible asset.
- $274B: global big data and analytics market (2024)
- Cost / Income: primary valuation approaches
- 3-10 yrs: typical useful life range for databases
When Databases Are Identifiable
Databases satisfy the identifiability criteria through:
Separability: Databases can be sold, licensed, or transferred independently. The existence of data marketplaces, data-as-a-service businesses, and data licensing agreements demonstrates clear separability.
Contractual rights: Database protection laws (the EU Database Directive, for instance) grant the maker of a database sui generis rights based on the substantial investment in obtaining, verifying, or presenting its contents.
However, not all data is a recognisable asset. Internal operational data that is not separately exploitable — generic transaction logs, routine system records — may not meet the identifiability threshold.
★ Key Takeaway
A database is a recognisable intangible asset when it has been systematically compiled, curated, and maintained, and when it can be separated from the business (sold, licensed, or transferred). Raw, uncurated data that has not been organised into a usable collection is unlikely to qualify.
Data Quality and Value Drivers
The value of a database is driven primarily by its quality. A comprehensive quality assessment is essential before valuation:
| Quality Dimension | Description | Impact on Value |
| --- | --- | --- |
| Accuracy | Data correctly represents reality | Fundamental — inaccurate data has negative value |
| Completeness | Coverage of the relevant domain | Higher completeness = broader applicability |
| Recency | How current the data is | Critical for time-sensitive domains |
| Uniqueness | Data not available from other sources | Scarcity premium for unique datasets |
| Volume | Size of the dataset | Matters for AI training; less so for reference data |
| Structure | Organisation, schema quality, metadata | Affects usability and integration cost |
| Permissions | Legal right to use, sell, and license | GDPR compliance essential for personal data |
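The dimensions in the table above are sometimes combined into a weighted scorecard during due diligence. A minimal sketch follows; the weights and the 0-5 scores are illustrative assumptions, not a standard methodology, and a valuer would set them per engagement.

```python
# Hedged sketch: weighted quality scorecard over the dimensions in the
# table above. Weights and example scores are assumptions.

DIMENSIONS = {
    # dimension: weight (assumed; weights sum to 1)
    "accuracy": 0.25,
    "completeness": 0.15,
    "recency": 0.15,
    "uniqueness": 0.20,
    "volume": 0.05,
    "structure": 0.10,
    "permissions": 0.10,
}

def quality_score(scores: dict[str, float]) -> float:
    """Weighted average of 0-5 scores for each quality dimension."""
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)

# Hypothetical subject database: unique and accurate, but with
# weak usage permissions.
example = {
    "accuracy": 4, "completeness": 3, "recency": 4, "uniqueness": 5,
    "volume": 3, "structure": 4, "permissions": 2,
}
print(f"Quality score: {quality_score(example):.2f} / 5")
```

A low score on a gating dimension such as permissions can cap value outright regardless of the weighted total, so a scorecard like this is a screening tool rather than a valuation input on its own.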
Valuation Approaches
Cost Approach
The cost approach estimates the investment required to recreate an equivalent database:
- Data acquisition costs — purchasing, collecting, or generating the raw data
- Data processing — cleaning, normalising, deduplicating, and structuring
- Verification and quality assurance — validating accuracy and completeness
- Infrastructure — database design, hosting, and management systems
- Curation and maintenance — ongoing updates and quality maintenance
✔ Example
A healthcare analytics company is acquired with a clinical outcomes database containing 15 million patient records collected over 8 years through partnerships with 200 hospitals. The cost to recreate this database — establishing equivalent partnerships, negotiating data sharing agreements, ingesting and normalising the data — is estimated at £35 million over 5-6 years. The replacement cost approach values the database at approximately £35 million, with an additional time-value adjustment for the competitive advantage of having the data available today rather than in 5 years.
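The replacement-cost arithmetic in the example above can be sketched as follows. The component breakdown, the database-attributable annual profit, and the discount rate are illustrative assumptions; only the £35 million total and the 5-6 year rebuild period come from the example.

```python
# Hedged sketch: replacement-cost valuation of a database, using a
# hypothetical breakdown of the £35m total from the example above.

def replacement_cost(components: dict[str, float]) -> float:
    """Sum the estimated costs of recreating an equivalent database."""
    return sum(components.values())

components = {  # £m, assumed breakdown
    "data_acquisition": 14.0,   # hospital partnerships, data sharing deals
    "processing": 8.0,          # ingestion, cleaning, normalising
    "verification": 5.0,        # accuracy and completeness QA
    "infrastructure": 4.0,      # schema design, hosting
    "curation": 4.0,            # maintenance during the build period
}

base = replacement_cost(components)

# Time-value adjustment: a buyer pays a premium for having the data
# today rather than after a multi-year rebuild. One simple proxy is
# the PV of profit the asset earns during the rebuild period.
annual_profit = 4.0     # £m attributable to the database (assumption)
rebuild_years = 5       # lower end of the 5-6 year estimate
discount_rate = 0.12    # assumed data-appropriate rate

opportunity_premium = sum(
    annual_profit / (1 + discount_rate) ** t
    for t in range(1, rebuild_years + 1)
)

fair_value = base + opportunity_premium
print(f"Replacement cost £{base:.1f}m; with time-value premium £{fair_value:.1f}m")
```

The premium proxy is one of several possible; some valuers instead discount the rebuild cost stream and take the difference, which gives a similar directional result.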
Income Approach
Where the database generates direct revenue — through data licensing, analytics products, or advertising targeting — the income approach captures the cash flows:
Identify revenue streams
Map all revenue generated by or dependent on the database: data licensing fees, analytics product subscriptions, advertising revenue enabled by targeting data, and insights products.
Attribute revenue to the data
Separate the database's contribution from the analytics software, the sales team, and the brand. The database is one contributing asset among several.
Project data-attributed revenue
Forecast the revenue stream over the useful life of the database, accounting for data decay, competitive entry, and market growth.
Deduct costs and discount
Subtract data maintenance costs, contributory asset charges, and taxes. Discount at a data-appropriate risk rate.
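The four steps above can be sketched as a simple excess-earnings style discounted cash flow. Every input here (revenue, attribution share, decay rate, margin, contributory asset charge, tax and discount rates, useful life) is an illustrative assumption, not a prescribed parameter.

```python
# Hedged sketch of the income approach: attribute revenue to the data,
# project it with decay, deduct costs and CACs, tax, and discount.

def database_income_value(
    revenue_year1: float,   # revenue dependent on the database (£m)
    data_share: float,      # share attributable to the data itself
    decay: float,           # annual decline from data decay / competition
    margin: float,          # operating margin on data-attributed revenue
    cac_rate: float,        # contributory asset charge, % of revenue
    tax_rate: float,
    discount_rate: float,   # data-appropriate risk rate
    useful_life: int,       # years
) -> float:
    value = 0.0
    revenue = revenue_year1
    for t in range(1, useful_life + 1):
        attributed = revenue * data_share
        pre_tax = attributed * margin - attributed * cac_rate
        after_tax = pre_tax * (1 - tax_rate)
        value += after_tax / (1 + discount_rate) ** t
        revenue *= (1 - decay)
    return value

v = database_income_value(
    revenue_year1=20.0, data_share=0.6, decay=0.10,
    margin=0.45, cac_rate=0.08, tax_rate=0.25,
    discount_rate=0.14, useful_life=7,
)
print(f"Income-approach value: £{v:.1f}m")
```

The contributory asset charge is what separates the database's contribution from the software, sales team, and brand that also generate the revenue; omitting it overstates the data's standalone value.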
Market Approach
Where comparable data transactions exist, market pricing provides direct evidence:
- Data licensing benchmarks — price per record, per query, or per seat for comparable datasets
- Data acquisition transactions — prices paid for similar databases in M&A transactions
- Data marketplace pricing — prices observed on data exchanges for comparable data types
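A per-record benchmark from the evidence sources above can serve as a quick cross-check on the cost and income indications. The benchmark prices and the quality adjustment below are hypothetical; in practice they would come from observed licensing deals and marketplace listings for comparable data.

```python
# Hedged sketch: market-approach cross-check using assumed
# price-per-record benchmarks from comparable transactions.

records = 15_000_000    # records in the subject database

# Per-record prices (£) from hypothetical observed deals
benchmarks = [2.10, 1.80, 2.60]
median_price = sorted(benchmarks)[len(benchmarks) // 2]

# Adjust for the subject data's quality relative to the comparables
# (uniqueness, recency, permissions); >1 means the subject is better.
quality_adjustment = 1.1

indicated_value = records * median_price * quality_adjustment
print(f"Market-approach indication: £{indicated_value / 1e6:.1f}m")
```

Per-record pricing works best when the comparables share the subject's domain and permission profile; a record with full licensing rights is not comparable to one restricted to internal use.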
Revenue-Generating Databases
- Data licensing, analytics products, adtech targeting
- Valued using income approach
- Value reflects market demand and uniqueness
- Often the primary intangible in data businesses
Internal-Use Databases
- CRM data, operational data, internal analytics
- Valued using cost approach
- Value reflects compilation investment
- May overlap with customer list valuation
The AI Training Data Dimension
The explosion in AI and machine learning has created a new and significant value dimension for databases: their utility as training data. High-quality, labelled datasets are essential inputs for building AI models, and their scarcity in specialised domains commands substantial premiums.
Databases valuable for AI training include:
- Labelled image datasets — medical imaging, autonomous driving, industrial inspection
- Text corpora — legal documents, financial reports, scientific papers
- Behavioural data — user interactions, recommendation engine training sets
- Sensor data — IoT readings, manufacturing process data, environmental monitoring
⚠ Warning
The use of personal data for AI training is increasingly regulated. GDPR and equivalent regulations require explicit consent for new processing purposes. A database collected for one purpose (customer service) may not be legally usable for AI training without additional consent. Valuation of AI training data must account for the legal permissions embedded in the data collection agreements.
Useful Life
Database useful lives depend on data decay rates and the domain:
- Financial market data: 1-3 years (rapidly changing)
- Customer and behavioural data: 2-5 years (moderate decay)
- Healthcare and clinical data: 5-10 years (slower decay for structured clinical outcomes)
- Geographic and mapping data: 3-7 years (requires continuous updating)
- Historical reference data: 10+ years (historical data retains value if properly maintained)
- AI training data: 3-7 years (model architectures and requirements evolve)
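The decay rates implied by the ranges above can be made concrete with a half-life model: the number of years in which a dataset loses half its economic value. The half-lives below are assumptions chosen to sit within the domain ranges listed, purely for illustration.

```python
# Hedged sketch: translating an assumed value half-life into the
# fraction of economic value remaining, useful when sense-checking
# an amortisation period.
import math

def remaining_value_fraction(years: float, half_life: float) -> float:
    """Fraction of a dataset's economic value left after `years`,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (years / half_life)

# Fast-decaying financial market data vs. slow clinical outcomes data
for label, half_life in [("market data", 1.5), ("clinical data", 7.0)]:
    frac = remaining_value_fraction(5, half_life)
    print(f"{label}: {frac:.0%} of value remains after 5 years")
```

Under these assumptions market data is nearly worthless after five years while clinical data retains most of its value, which is why the useful lives above differ so widely by domain.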
Data as Competitive Moat
In the AI age, proprietary data is arguably the most defensible competitive advantage a company can build. Algorithms can be replicated; data cannot. A company with a decade of proprietary customer behaviour data, clinical outcomes, or industrial sensor readings has an asset that no competitor can quickly recreate. For founders, systematically building and maintaining proprietary datasets is an intangible asset investment with compounding returns.
Databases and data collections are one of ten technology-based intangible assets under IFRS 3. For the full taxonomy, see 35 types of intangible assets. For more on data as a balance sheet asset, read data on the balance sheet.
Ivan Gowan is the Founder and CEO of Opagio. He brings 25 years of experience building and scaling technology platforms in financial services. Meet the team.