Content Classification

What is Content Classification?

Content Classification is an SDK feature that analyzes the entire query file to determine whether its segments contain MUSIC, SPEECH, or SILENCE. It works by creating a classification fingerprint - similar to an audio or melody fingerprint - that enables the system to label each detected segment and provide a confidence score. This feature can be run on its own or combined with audio/melody fingerprint searches in a single request.

How is Content Classification different from audio or melody/phonetic matching?

Audio and melody/phonetic matching are designed to identify a piece of media by comparing its fingerprint to Pex’s database of known assets. This results in matches tied to specific tracks or works.

Content Classification, by contrast, does not attempt to identify an asset. Instead, it analyzes the audio content itself to label what kind of sound is present - even if that content is not in Pex’s database. This makes classification useful for understanding the nature of the audio regardless of whether a commercial match is found.

For example, if classification shows a MUSIC segment but there’s no identification match, that music could be non-commercial or from a sound library.
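
One way to act on that observation is to cross-reference classification segments against identification results. The sketch below is plain illustrative Python, not part of the Pex SDK: `unmatched_music` and its inputs are hypothetical, assuming you have MUSIC segments (as returned in the classification response) and a list of matched time ranges from an identification search.

```python
def unmatched_music(music_segments, match_ranges):
    """Return MUSIC segments that do not overlap any identification match.

    music_segments: list of {"start": ..., "end": ...} dicts (seconds),
                    as found in the classification response.
    match_ranges:   list of (start, end) tuples from an identification search.
    """
    def overlaps(seg, rng):
        # Two intervals overlap if each starts before the other ends.
        return seg["start"] < rng[1] and rng[0] < seg["end"]

    return [seg for seg in music_segments
            if not any(overlaps(seg, rng) for rng in match_ranges)]

music = [{"start": 2, "end": 60}, {"start": 90, "end": 230}]
matches = [(0, 65)]  # identification match covers only the first segment

# The 90-230s segment is classified as MUSIC but produced no match,
# so it may be non-commercial or library music.
leftover = unmatched_music(music, matches)
```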

What categories does Content Classification return?

Currently, Content Classification returns one or more of the following top-level categories:

  • MUSIC – Any type of music, instrumental or with vocals.

  • SPEECH – Spoken voice segments.

  • SILENCE – Periods with no significant audio.

Each category may include additional subclasses that provide more detail (e.g., instrumental, vocal style, or genre indicators).

What does a typical classification response look like?

A Content Classification response includes each detected category, its time range in seconds, a confidence score (0–100), and any subcategories with their own confidence values. Example:

"content_classification": {
  "silence": [
    {
      "start": 230,
      "end": 240,
      "confidence": 100.0,
      "subclasses": []
    }
  ],
  "music": [
    {
      "start": 2,
      "end": 230,
      "confidence": 98.4,
      "subclasses": [
        { "name": "singing", "confidence": 82.1 },
        { "name": "happy music", "confidence": 37.2 },
        { "name": "pop music", "confidence": 35.2 }
      ]
    }
  ],
  "speech": []
}

Can I combine Content Classification with other Pex Search types?

Yes. You can run Content Classification independently or alongside audio, melody, video, or phonetic fingerprint searches in a single request. This lets you both classify and identify content without making multiple API calls.

How should I use confidence scores?

Confidence scores range from 0 to 100, with higher values indicating stronger certainty that the segment matches the given category or subcategory. Many applications choose a threshold (e.g., ≥80) for deciding whether to treat a classification as definitive.
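
As a sketch of that thresholding step (plain Python operating on the response shape shown earlier; `confident_segments` is a hypothetical helper, not an SDK function), filtering both segments and their subclasses against a single threshold might look like:

```python
def confident_segments(classification, threshold=80.0):
    """Keep only segments at or above the confidence threshold; within each
    kept segment, keep only subclasses that also clear the threshold."""
    result = {}
    for category, segments in classification.items():
        kept = []
        for seg in segments:
            if seg["confidence"] >= threshold:
                kept.append({
                    **seg,
                    "subclasses": [s for s in seg["subclasses"]
                                   if s["confidence"] >= threshold],
                })
        result[category] = kept
    return result

sample = {
    "music": [{"start": 2, "end": 230, "confidence": 98.4,
               "subclasses": [{"name": "singing", "confidence": 82.1},
                              {"name": "pop music", "confidence": 35.2}]}],
    "speech": [{"start": 240, "end": 250, "confidence": 61.0,
                "subclasses": []}],
}

# With the default threshold of 80: the MUSIC segment survives with only
# its "singing" subclass, and the low-confidence SPEECH segment is dropped.
filtered = confident_segments(sample)
```

The right threshold depends on your tolerance for false positives; start around 80 and tune against your own content.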