Data: The Intangible Asset of the AI Age
Databases and data collections have always been valuable, but the AI revolution has elevated data from a supporting operational resource to a primary strategic asset. Under IFRS 3, databases are classified as technology-based intangible assets, recognised separately from goodwill when they are identifiable (separable or arising from contractual or legal rights) and their fair value can be measured reliably.
The category encompasses structured databases (customer records, transaction histories, product catalogues), unstructured data collections (documents, images, sensor readings), and curated datasets (training data, benchmark data, reference data). In data-intensive acquisitions — adtech, healthtech, fintech, and increasingly any AI-related business — databases can represent the most valuable identifiable intangible asset.
- $274B: global big data and analytics market (2024)
- Cost / Income: primary valuation approaches
- 3-10 yrs: typical useful life range for databases
When Databases Are Identifiable
Databases satisfy the identifiability criteria through:
Separability: Databases can be sold, licensed, or transferred independently. The existence of data marketplaces, data-as-a-service businesses, and data licensing agreements demonstrates clear separability.
Contractual rights: Database protection laws (the EU Database Directive, for instance) grant the maker of a database sui generis rights based on the substantial investment in obtaining, verifying, or presenting its contents.
However, not all data is a recognisable asset. Internal operational data that is not separately exploitable — generic transaction logs, routine system records — may not meet the identifiability threshold.
★ Key Takeaway
A database is a recognisable intangible asset when it has been systematically compiled, curated, and maintained, and when it can be separated from the business (sold, licensed, or transferred). Raw, uncurated data that has not been organised into a usable collection is unlikely to qualify.
Data Quality and Value Drivers
The value of a database is driven primarily by its quality. A comprehensive quality assessment is essential before valuation:
| Quality Dimension | Description | Impact on Value |
| --- | --- | --- |
| Accuracy | Data correctly represents reality | Fundamental — inaccurate data has negative value |
| Completeness | Coverage of the relevant domain | Higher completeness = broader applicability |
| Recency | How current the data is | Critical for time-sensitive domains |
| Uniqueness | Data not available from other sources | Scarcity premium for unique datasets |
| Volume | Size of the dataset | Matters for AI training; less so for reference data |
| Structure | Organisation, schema quality, metadata | Affects usability and integration cost |
| Permissions | Legal right to use, sell, and license | GDPR compliance essential for personal data |
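The dimensions in the table above are sometimes combined into a weighted scorecard during due diligence. A minimal sketch follows; the weights and the 0-5 scores are illustrative assumptions, not a standard methodology, and a valuer would set them per engagement.

```python
# Hedged sketch: weighted quality scorecard over the dimensions in the
# table above. Weights and example scores are assumptions.

DIMENSIONS = {
    # dimension: weight (assumed; weights sum to 1)
    "accuracy": 0.25,
    "completeness": 0.15,
    "recency": 0.15,
    "uniqueness": 0.20,
    "volume": 0.05,
    "structure": 0.10,
    "permissions": 0.10,
}

def quality_score(scores: dict[str, float]) -> float:
    """Weighted average of 0-5 scores for each quality dimension."""
    return sum(DIMENSIONS[d] * scores[d] for d in DIMENSIONS)

# Hypothetical subject database: unique and accurate, but with
# weak usage permissions.
example = {
    "accuracy": 4, "completeness": 3, "recency": 4, "uniqueness": 5,
    "volume": 3, "structure": 4, "permissions": 2,
}
print(f"Quality score: {quality_score(example):.2f} / 5")
```

A low score on a gating dimension such as permissions can cap value outright regardless of the weighted total, so a scorecard like this is a screening tool rather than a valuation input on its own.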
Valuation Approaches
Cost Approach
The cost approach estimates the investment required to recreate an equivalent database:
- Data acquisition costs — purchasing, collecting, or generating the raw data
- Data processing — cleaning, normalising, deduplicating, and structuring
- Verification and quality assurance — validating accuracy and completeness
- Infrastructure — database design, hosting, and management systems
- Curation and maintenance — ongoing updates and quality maintenance
✔ Example
A healthcare analytics company is acquired with a clinical outcomes database containing 15 million patient records collected over 8 years through partnerships with 200 hospitals. The cost to recreate this database — establishing equivalent partnerships, negotiating data sharing agreements, ingesting and normalising the data — is estimated at £35 million over 5-6 years. The replacement cost approach values the database at approximately £35 million, with an additional time-value adjustment for the competitive advantage of having the data available today rather than in 5 years.
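The replacement-cost arithmetic in the example above can be sketched as follows. The component breakdown, the database-attributable annual profit, and the discount rate are illustrative assumptions; only the £35 million total and the 5-6 year rebuild period come from the example.

```python
# Hedged sketch: replacement-cost valuation of a database, using a
# hypothetical breakdown of the £35m total from the example above.

def replacement_cost(components: dict[str, float]) -> float:
    """Sum the estimated costs of recreating an equivalent database."""
    return sum(components.values())

components = {  # £m, assumed breakdown
    "data_acquisition": 14.0,   # hospital partnerships, data sharing deals
    "processing": 8.0,          # ingestion, cleaning, normalising
    "verification": 5.0,        # accuracy and completeness QA
    "infrastructure": 4.0,      # schema design, hosting
    "curation": 4.0,            # maintenance during the build period
}

base = replacement_cost(components)

# Time-value adjustment: a buyer pays a premium for having the data
# today rather than after a multi-year rebuild. One simple proxy is
# the PV of profit the asset earns during the rebuild period.
annual_profit = 4.0     # £m attributable to the database (assumption)
rebuild_years = 5       # lower end of the 5-6 year estimate
discount_rate = 0.12    # assumed data-appropriate rate

opportunity_premium = sum(
    annual_profit / (1 + discount_rate) ** t
    for t in range(1, rebuild_years + 1)
)

fair_value = base + opportunity_premium
print(f"Replacement cost £{base:.1f}m; with time-value premium £{fair_value:.1f}m")
```

The premium proxy is one of several possible; some valuers instead discount the rebuild cost stream and take the difference, which gives a similar directional result.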
Income Approach
Where the database generates direct revenue — through data licensing, analytics products, or advertising targeting — the income approach captures the cash flows:
Identify revenue streams
Map all revenue generated by or dependent on the database: data licensing fees, analytics product subscriptions, advertising revenue enabled by targeting data, and insights products.
Attribute revenue to the data
Separate the database's contribution from the analytics software, the sales team, and the brand. The database is one contributing asset among several.
Project data-attributed revenue
Forecast the revenue stream over the useful life of the database, accounting for data decay, competitive entry, and market growth.
Deduct costs and discount
Subtract data maintenance costs, contributory asset charges, and taxes. Discount at a data-appropriate risk rate.
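The four steps above can be sketched as a simple excess-earnings style discounted cash flow. Every input here (revenue, attribution share, decay rate, margin, contributory asset charge, tax and discount rates, useful life) is an illustrative assumption, not a prescribed parameter.

```python
# Hedged sketch of the income approach: attribute revenue to the data,
# project it with decay, deduct costs and CACs, tax, and discount.

def database_income_value(
    revenue_year1: float,   # revenue dependent on the database (£m)
    data_share: float,      # share attributable to the data itself
    decay: float,           # annual decline from data decay / competition
    margin: float,          # operating margin on data-attributed revenue
    cac_rate: float,        # contributory asset charge, % of revenue
    tax_rate: float,
    discount_rate: float,   # data-appropriate risk rate
    useful_life: int,       # years
) -> float:
    value = 0.0
    revenue = revenue_year1
    for t in range(1, useful_life + 1):
        attributed = revenue * data_share
        pre_tax = attributed * margin - attributed * cac_rate
        after_tax = pre_tax * (1 - tax_rate)
        value += after_tax / (1 + discount_rate) ** t
        revenue *= (1 - decay)
    return value

v = database_income_value(
    revenue_year1=20.0, data_share=0.6, decay=0.10,
    margin=0.45, cac_rate=0.08, tax_rate=0.25,
    discount_rate=0.14, useful_life=7,
)
print(f"Income-approach value: £{v:.1f}m")
```

The contributory asset charge is what separates the database's contribution from the software, sales team, and brand that also generate the revenue; omitting it overstates the data's standalone value.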
Market Approach
Where comparable data transactions exist, market pricing provides direct evidence:
- Data licensing benchmarks — price per record, per query, or per seat for comparable datasets
- Data acquisition transactions — prices paid for similar databases in M&A transactions
- Data marketplace pricing — prices observed on data exchanges for comparable data types
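A per-record benchmark from the evidence sources above can serve as a quick cross-check on the cost and income indications. The benchmark prices and the quality adjustment below are hypothetical; in practice they would come from observed licensing deals and marketplace listings for comparable data.

```python
# Hedged sketch: market-approach cross-check using assumed
# price-per-record benchmarks from comparable transactions.

records = 15_000_000    # records in the subject database

# Per-record prices (£) from hypothetical observed deals
benchmarks = [2.10, 1.80, 2.60]
median_price = sorted(benchmarks)[len(benchmarks) // 2]

# Adjust for the subject data's quality relative to the comparables
# (uniqueness, recency, permissions); >1 means the subject is better.
quality_adjustment = 1.1

indicated_value = records * median_price * quality_adjustment
print(f"Market-approach indication: £{indicated_value / 1e6:.1f}m")
```

Per-record pricing works best when the comparables share the subject's domain and permission profile; a record with full licensing rights is not comparable to one restricted to internal use.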
Revenue-Generating Databases
- Data licensing, analytics products, adtech targeting
- Valued using income approach
- Value reflects market demand and uniqueness
- Often the primary intangible in data businesses
Internal-Use Databases
- CRM data, operational data, internal analytics
- Valued using cost approach
- Value reflects compilation investment
- May overlap with customer list valuation
The AI Training Data Dimension
The explosion in AI and machine learning has created a new and significant value dimension for databases: their utility as training data. High-quality, labelled datasets are essential inputs for building AI models, and their scarcity in specialised domains commands substantial premiums.
Databases valuable for AI training include:
- Labelled image datasets — medical imaging, autonomous driving, industrial inspection
- Text corpora — legal documents, financial reports, scientific papers
- Behavioural data — user interactions, recommendation engine training sets
- Sensor data — IoT readings, manufacturing process data, environmental monitoring
⚠ Warning
The use of personal data for AI training is increasingly regulated. GDPR and equivalent regulations require explicit consent for new processing purposes. A database collected for one purpose (customer service) may not be legally usable for AI training without additional consent. Valuation of AI training data must account for the legal permissions embedded in the data collection agreements.
Useful Life
Database useful lives depend on data decay rates and the domain:
- Financial market data: 1-3 years (rapidly changing)
- Customer and behavioural data: 2-5 years (moderate decay)
- Healthcare and clinical data: 5-10 years (slower decay for structured clinical outcomes)
- Geographic and mapping data: 3-7 years (requires continuous updating)
- Historical reference data: 10+ years (historical data retains value if properly maintained)
- AI training data: 3-7 years (model architectures and requirements evolve)
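The decay rates implied by the ranges above can be made concrete with a half-life model: the number of years in which a dataset loses half its economic value. The half-lives below are assumptions chosen to sit within the domain ranges listed, purely for illustration.

```python
# Hedged sketch: translating an assumed value half-life into the
# fraction of economic value remaining, useful when sense-checking
# an amortisation period.
import math

def remaining_value_fraction(years: float, half_life: float) -> float:
    """Fraction of a dataset's economic value left after `years`,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (years / half_life)

# Fast-decaying financial market data vs. slow clinical outcomes data
for label, half_life in [("market data", 1.5), ("clinical data", 7.0)]:
    frac = remaining_value_fraction(5, half_life)
    print(f"{label}: {frac:.0%} of value remains after 5 years")
```

Under these assumptions market data is nearly worthless after five years while clinical data retains most of its value, which is why the useful lives above differ so widely by domain.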
Data as Competitive Moat
In the AI age, proprietary data is arguably the most defensible competitive advantage a company can build. Algorithms can be replicated; data cannot. A company with a decade of proprietary customer behaviour data, clinical outcomes, or industrial sensor readings has an asset that no competitor can quickly recreate. For founders, systematically building and maintaining proprietary datasets is an intangible asset investment with compounding returns.
Databases and data collections are one of ten technology-based intangible assets under IFRS 3. For the full taxonomy, see 35 types of intangible assets. For more on data as a balance sheet asset, read data on the balance sheet.
Ivan Gowan is the Founder and CEO of Opagio. He brings 25 years of experience building and scaling technology platforms in financial services. Meet the team.