Databases and Data Collections: Valuation Methods

Data: The Intangible Asset of the AI Age

Databases and data collections have always been valuable, but the AI revolution has elevated data from a supporting operational resource to a primary strategic asset. Under IFRS 3, databases are classified as technology-based intangible assets, recognised separately from goodwill when they are identifiable (separable or arising from contractual or legal rights) and their fair value can be measured reliably.

The category encompasses structured databases (customer records, transaction histories, product catalogues), unstructured data collections (documents, images, sensor readings), and curated datasets (training data, benchmark data, reference data). In data-intensive acquisitions — adtech, healthtech, fintech, and increasingly any AI-related business — databases can represent the most valuable identifiable intangible asset.

  • $274B — global big data and analytics market (2024)
  • Cost and income — primary valuation approaches
  • 3-10 years — typical useful life range for databases

When Databases Are Identifiable

Databases satisfy the identifiability criteria through:

Separability: Databases can be sold, licensed, or transferred independently. The existence of data marketplaces, data-as-a-service businesses, and data licensing agreements demonstrates clear separability.

Contractual rights: Database protection laws (the EU Database Directive, for instance) grant specific rights to the creator of a database based on the investment in obtaining, verifying, and presenting the data.

However, not all data is a recognisable asset. Internal operational data that is not separately exploitable — generic transaction logs, routine system records — may not meet the identifiability threshold.

★ Key Takeaway

A database is a recognisable intangible asset when it has been systematically compiled, curated, and maintained, and when it can be separated from the business (sold, licensed, or transferred). Raw, uncurated data that has not been organised into a usable collection is unlikely to qualify.

Data Quality and Value Drivers

The value of a database depends heavily on its quality. A comprehensive quality assessment is essential before valuation:

| Quality Dimension | Description | Impact on Value |
| --- | --- | --- |
| Accuracy | Data correctly represents reality | Fundamental — inaccurate data has negative value |
| Completeness | Coverage of the relevant domain | Higher completeness = broader applicability |
| Recency | How current the data is | Critical for time-sensitive domains |
| Uniqueness | Data not available from other sources | Scarcity premium for unique datasets |
| Volume | Size of the dataset | Matters for AI training; less so for reference data |
| Structure | Organisation, schema quality, metadata | Affects usability and integration cost |
| Permissions | Legal right to use, sell, and license | GDPR compliance essential for personal data |
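One way to make such an assessment operational is a weighted score across the dimensions above. The weights and per-dimension scores below are purely illustrative assumptions; in practice they would come from a due-diligence review of the specific database.

```python
# Illustrative weighted quality score across the dimensions above.
# Weights and scores are hypothetical, not a standard scheme.
WEIGHTS = {
    "accuracy": 0.25,
    "completeness": 0.15,
    "recency": 0.15,
    "uniqueness": 0.20,
    "volume": 0.05,
    "structure": 0.10,
    "permissions": 0.10,
}

def quality_score(scores):
    """Weighted average of per-dimension scores, each in 0.0-1.0."""
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

example = {
    "accuracy": 0.9, "completeness": 0.8, "recency": 0.7,
    "uniqueness": 0.95, "volume": 0.6, "structure": 0.85,
    "permissions": 1.0,
}
print(round(quality_score(example), 3))  # → 0.855
```

A low score on a heavily weighted dimension (accuracy, uniqueness) drags the overall score down sharply, which mirrors how acquirers discount otherwise large datasets.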

Valuation Approaches

Cost Approach

The cost approach estimates the investment required to recreate an equivalent database:

  • Data acquisition costs — purchasing, collecting, or generating the raw data
  • Data processing — cleaning, normalising, deduplicating, and structuring
  • Verification and quality assurance — validating accuracy and completeness
  • Infrastructure — database design, hosting, and management systems
  • Curation and maintenance — ongoing updates and quality maintenance
✔ Example

A healthcare analytics company is acquired with a clinical outcomes database containing 15 million patient records collected over 8 years through partnerships with 200 hospitals. The cost to recreate this database — establishing equivalent partnerships, negotiating data sharing agreements, ingesting and normalising the data — is estimated at £35 million over 5-6 years. The replacement cost approach values the database at approximately £35 million, with an additional time-value adjustment for the competitive advantage of having the data available today rather than in 5 years.
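The arithmetic of the example can be sketched as follows. The cost components and the 10% rate are hypothetical figures chosen to echo the £35 million case, and the time-value premium shown is one simple proxy (discounting the replacement cost over the rebuild period), not a prescribed method.

```python
# Hedged sketch of the replacement cost approach described above.
# All figures are illustrative (£ millions), loosely mirroring the example.
components = {
    "data_acquisition": 14.0,
    "processing": 8.0,
    "verification_qa": 5.0,
    "infrastructure": 4.0,
    "curation_maintenance": 4.0,
}
replacement_cost = sum(components.values())  # 35.0

# Time-value uplift: the buyer gets the database today rather than after
# `rebuild_years` of recreation. As a simple proxy (an assumption, not a
# standard), treat the discount lost over the rebuild period as a premium.
rate, rebuild_years = 0.10, 5
time_value_premium = replacement_cost * (1 - 1 / (1 + rate) ** rebuild_years)
fair_value = replacement_cost + time_value_premium
print(round(replacement_cost, 1), round(fair_value, 1))  # → 35.0 48.3
```

The size of the premium is sensitive to the discount rate and rebuild horizon, so both assumptions should be documented in the valuation file.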

Income Approach

Where the database generates direct revenue — through data licensing, analytics products, or advertising targeting — the income approach captures the cash flows:

  1. Identify revenue streams — map all revenue generated by or dependent on the database: data licensing fees, analytics product subscriptions, advertising revenue enabled by targeting data, and insights products.

  2. Attribute revenue to the data — separate the database's contribution from the analytics software, the sales team, and the brand. The database is one contributing asset among several.

  3. Project data-attributed revenue — forecast the revenue stream over the useful life of the database, accounting for data decay, competitive entry, and market growth.

  4. Deduct costs and discount — subtract data maintenance costs, contributory asset charges, and taxes. Discount at a data-appropriate risk rate.
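The four steps above can be sketched as a simple discounted cash flow. The attribution percentage, cost ratios, tax rate, and 18% discount rate are all illustrative assumptions, not benchmarks.

```python
# Minimal sketch of the income approach steps above: attribute revenue
# to the data, deduct costs and charges, tax, and discount.
def data_dcf(revenues, attribution, maintenance_pct, cac_pct, tax, rate):
    value = 0.0
    for t, revenue in enumerate(revenues, start=1):
        attributed = revenue * attribution          # step 2: attribution
        margin = attributed * (1 - maintenance_pct) # less maintenance costs
        margin -= attributed * cac_pct              # less contributory asset charges
        cash_flow = margin * (1 - tax)              # after tax
        value += cash_flow / (1 + rate) ** t        # step 4: discount
    return value

# Step 3: projected revenue (£m) with growth then decay over a 5-year life.
revenues = [10.0, 11.0, 12.0, 12.5, 12.0]
value = data_dcf(revenues, attribution=0.6, maintenance_pct=0.25,
                 cac_pct=0.15, tax=0.25, rate=0.18)
print(round(value, 1))  # → 9.6
```

Note how heavily the result leans on the attribution percentage: doubling it doubles the indicated value, which is why attribution is usually the most contested input.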

Market Approach

Where comparable data transactions exist, market pricing provides direct evidence:

  • Data licensing benchmarks — price per record, per query, or per seat for comparable datasets
  • Data acquisition transactions — prices paid for similar databases in M&A transactions
  • Data marketplace pricing — prices observed on data exchanges for comparable data types
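A price-per-record benchmark from the first bullet can be sketched as below. The comparable prices, record count, and the 0.85 quality adjustment are hypothetical; real comparables would need adjustment for permissions, recency, and coverage.

```python
# Sketch of a market-approach benchmark: apply an observed
# price-per-record multiple, adjusted for relative quality.
comparables = [0.40, 0.55, 0.48]   # £ per record in comparable transactions
records = 15_000_000               # records in the subject database
quality_adjustment = 0.85          # subject quality relative to comparables

median_price = sorted(comparables)[len(comparables) // 2]
indicated_value = records * median_price * quality_adjustment
print(f"£{indicated_value / 1e6:.1f}m")  # → £6.1m
```

Where only a handful of comparables exist, the median is usually preferred over the mean to limit the influence of outlier deals.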

Revenue-Generating Databases

  • Data licensing, analytics products, adtech targeting
  • Valued using income approach
  • Value reflects market demand and uniqueness
  • Often the primary intangible in data businesses

Internal-Use Databases

  • CRM data, operational data, internal analytics
  • Valued using cost approach
  • Value reflects compilation investment
  • May overlap with customer list valuation

The AI Training Data Dimension

The explosion in AI and machine learning has created a new and significant value dimension for databases: their utility as training data. High-quality, labelled datasets are essential inputs for building AI models, and their scarcity in specialised domains commands substantial premiums.

Databases valuable for AI training include:

  • Labelled image datasets — medical imaging, autonomous driving, industrial inspection
  • Text corpora — legal documents, financial reports, scientific papers
  • Behavioural data — user interactions, recommendation engine training sets
  • Sensor data — IoT readings, manufacturing process data, environmental monitoring
⚠ Warning

The use of personal data for AI training is increasingly regulated. GDPR and equivalent regulations require explicit consent for new processing purposes. A database collected for one purpose (customer service) may not be legally usable for AI training without additional consent. Valuation of AI training data must account for the legal permissions embedded in the data collection agreements.

Useful Life

Database useful lives depend on data decay rates and the domain:

  • Financial market data: 1-3 years (rapidly changing)
  • Customer and behavioural data: 2-5 years (moderate decay)
  • Healthcare and clinical data: 5-10 years (slower decay for structured clinical outcomes)
  • Geographic and mapping data: 3-7 years (requires continuous updating)
  • Historical reference data: 10+ years (historical data retains value if properly maintained)
  • AI training data: 3-7 years (model architectures and requirements evolve)
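Data decay is often modelled as exponential, with a half-life set by the domain. The half-lives below are assumptions chosen to echo the ranges above, not empirical estimates.

```python
# Illustrative exponential decay of data value over its useful life.
def remaining_value(initial, half_life_years, years_elapsed):
    """Value after `years_elapsed` if value halves every `half_life_years`."""
    return initial * 0.5 ** (years_elapsed / half_life_years)

# After 3 years: behavioural data (~3-year half-life, assumption)
# vs structured clinical data (~8-year half-life, assumption).
print(round(remaining_value(100.0, 3, 3), 1),   # → 50.0
      round(remaining_value(100.0, 8, 3), 1))   # → 77.1
```

The same model implies that ongoing curation effectively resets the clock for the refreshed portion of the dataset, which is why maintenance spend appears as a deduction in the income approach.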

Data as Competitive Moat

In the AI age, proprietary data is arguably the most defensible competitive advantage a company can build. Algorithms can be replicated; data cannot. A company with a decade of proprietary customer behaviour data, clinical outcomes, or industrial sensor readings has an asset that no competitor can quickly recreate. For founders, systematically building and maintaining proprietary datasets is an intangible asset investment with compounding returns.


Databases and data collections are one of ten technology-based intangible assets under IFRS 3. For the full taxonomy, see 35 types of intangible assets. For more on data as a balance sheet asset, read data on the balance sheet.


Ivan Gowan is the Founder and CEO of Opagio. He brings 25 years of experience building and scaling technology platforms in financial services. Meet the team.

