Overview and details
NewsCodes Q & A
This page explains NewsCodes in detail through questions and answers. See also the NewsCodes Glossary in the column to the right for definitions of specific terms.
What does NewsCodes mean?
A NewsCode is a single code representing a concept which is used to categorize news content. Many of these codes can form a set for a specific use. The IPTC brands such a set as NewsCodes; more generic terms are controlled vocabulary, taxonomy, or list of values.
What can NewsCodes be used for?
For the news industry, and far beyond it, it is a strict requirement to be able to assert something about the content of a news item, that is, to apply so-called metadata. This can be achieved either by free text in human language (e.g. a headline or a caption) or by codes, the NewsCodes. Codes have the advantage that they can be shared easily among users. Because each code requires an explicit and comprehensive definition, not only the codes but also their semantics can be shared. Furthermore, NewsCodes are language agnostic: the code is the same for describing content in different languages, and only the definition of the code needs to be translated to help users understand its semantics.
Why so many different sets of NewsCodes?
As the world is one single big object, all codes could in principle be put into a single vocabulary, but such a vocabulary could hardly be managed. For that reason the IPTC decided to group concepts of the same type into their own controlled vocabulary. Each vocabulary has a description of its scope.
Can NewsCodes be used free of charge?
Yes, they can. Any NewsCode provided by the IPTC can be used at any stage of a news workflow without any royalty fee. However, anyone who includes IPTC NewsCodes in an application must explicitly include a statement of the IPTC's intellectual property and copyright.
How to view and retrieve NewsCodes?
NewsCodes are available in different formats:
- in a human-readable format as a web page. Go to the View NewsCodes page and select a vocabulary to see its members.
- in machine-readable formats: either as a static file in the NewsML 1 TopicSet format, or as a dynamic download from the IPTC CV server as a NewsML-G2 Knowledge Item, as RDF/XML with SKOS, or as RDF/Turtle with SKOS. More about how to select these formats can be found on a special web page.
How are NewsCodes maintained?
The sets of NewsCodes are updated from time to time. Only IPTC members are eligible to formally propose a new NewsCode, as each code has to be approved by the IPTC membership. However, any user is invited to suggest a code on the NewsCodes Yahoo group.
Once a new NewsCode has been approved, it is published on this IPTC web site.
NewsCodes Glossary
This is a short glossary of terms which are often used in the context of NewsCodes.
Code:
A character sequence which forms a member of a controlled vocabulary. Each code represents a concept.
Concept:
Anything that one may wish to refer to, e.g. Diplomacy, Paris, the Euro, OECD, the Japanese language, the IMF, Oil, Madonna, Olympic Games. Thus concept here has a broader meaning than is usual. This is because we are dealing with the idea of Paris, rather than with Paris itself, the idea of Oil, rather than Oil itself, and so on. Concepts fall into two broad categories: named entities and generic (or abstract) concepts. A concept may be represented by one or more codes.
Controlled vocabulary:
A set of code(s), managed by some authority (e.g. a person or an organisation), employing some mechanism (e.g. an XML Schema, a Web page, an RFC, or an IPTC G2 KnowledgeItem) to maintain this set. Each code in a controlled vocabulary represents a concept.
Generic (or abstract) concept:
Any concept which represents not a named entity but a generic topic such as Diplomacy, Art, Science, Country Music, Forest, or Global Warming.
Globally Unique Identifier (GUID):
An identifier that is unique, unambiguous, and persistent. Being unique and unambiguous means that there is a 1:1 relationship between the identifier and the identified object. Being persistent means that the identifier never changes as time passes, and that it is never reused as an identifier for another concept even if the original concept disappears.
Knowledge Item (G2 Knowledge Item):
An IPTC XML format for exchanging one or more controlled vocabularies. The outer wrapper is a knowledgeItem instance, which delivers a set of concept elements in a conceptSet as the inner wrapper. Each concept delivers the NewsCode in the qcode attribute of its conceptId element; the concept's name and definition are delivered by the correspondingly named elements.
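The structure described above can be read with a few lines of Python. This is a minimal sketch rather than a reference implementation: the sample document is hand-written and heavily simplified, its name and definition texts are placeholders, and the function name `read_concepts` is hypothetical. The namespace URI is assumed to be the NewsML-G2 one; check it against the file you actually download.

```python
import xml.etree.ElementTree as ET

# Assumption: the NewsML-G2 namespace; verify against a real Knowledge Item.
NAR_NS = "http://iptc.org/std/nar/2006-10-01/"

# Hand-written, simplified sample. Name and definition are placeholders,
# not the real IPTC texts for this code.
SAMPLE_KNOWLEDGE_ITEM = """<knowledgeItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <conceptSet>
    <concept>
      <conceptId qcode="subj:06011000"/>
      <name xml:lang="en-GB">Placeholder name</name>
      <definition xml:lang="en-GB">Placeholder definition.</definition>
    </concept>
  </conceptSet>
</knowledgeItem>"""


def read_concepts(xml_text):
    """Return one dict per concept found in the conceptSet of a Knowledge Item."""
    ns = {"nar": NAR_NS}
    root = ET.fromstring(xml_text)
    concepts = []
    for concept in root.findall("nar:conceptSet/nar:concept", ns):
        concept_id = concept.find("nar:conceptId", ns)
        concepts.append({
            # The NewsCode itself is carried by the qcode attribute.
            "qcode": concept_id.get("qcode") if concept_id is not None else None,
            "name": concept.findtext("nar:name", default="", namespaces=ns),
            "definition": concept.findtext("nar:definition", default="", namespaces=ns),
        })
    return concepts
```

Calling `read_concepts(SAMPLE_KNOWLEDGE_ITEM)` yields a list with a single dictionary whose `qcode` entry is `subj:06011000`. A real Knowledge Item typically carries names and definitions in several languages, which this sketch does not distinguish.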
Metadata:
Data which asserts something about some other data.
Named entity:
A named entity may be a person, place, event, organization, product name, object name or any other news-related real life entity.
QCode:
A special IPTC format for expressing the code of a concept, introduced with the family of G2-Standards. The format consists of a string, then a colon, and finally another string. As the G2-Standards require globally unique identifiers, which are potentially long strings, the major goal of QCodes is to shorten them and to make visible the controlled vocabulary to which the code pertains. In short, the format of a QCode is "short name for the controlled vocabulary":"code of the concept", for example subj:06011000
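The QCode format can be illustrated with a small Python sketch. The helper names `parse_qcode` and `expand_qcode` are hypothetical, and the catalog that maps the short name `subj` to a long scheme URI is an assumption made for this example; in practice the mapping comes from the scheme declarations of the document or provider you work with.

```python
from typing import NamedTuple

# Assumption: an illustrative catalog mapping a vocabulary's short name
# to its long URI. Real mappings are declared by the provider.
SCHEME_CATALOG = {
    "subj": "http://cv.iptc.org/newscodes/subjectcode/",
}


class QCode(NamedTuple):
    scheme_alias: str  # short name for the controlled vocabulary
    code: str          # code of the concept


def parse_qcode(value: str) -> QCode:
    """Split a QCode at its first colon into vocabulary short name and code."""
    alias, sep, code = value.partition(":")
    if not sep or not alias or not code:
        raise ValueError(f"not a QCode: {value!r}")
    return QCode(alias, code)


def expand_qcode(value: str, catalog: dict = SCHEME_CATALOG) -> str:
    """Rebuild the long identifier by appending the code to the scheme URI."""
    qcode = parse_qcode(value)
    if qcode.scheme_alias not in catalog:
        raise KeyError(f"unknown scheme alias: {qcode.scheme_alias!r}")
    return catalog[qcode.scheme_alias] + qcode.code
```

With the assumed catalog, `expand_qcode("subj:06011000")` returns `http://cv.iptc.org/newscodes/subjectcode/06011000`, which shows how the short form stands in for a much longer globally unique identifier.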
Taxonomy:
In a broad sense, taxonomy is the science of classification, but is often taken to mean a particular classification. In the context of the NewsCodes, a taxonomy is a collection of concept(s), with associated code(s). A taxonomy may support typed relationships between concepts. Such a taxonomy is sometimes known as an ontology or thesaurus.
Topicset (NewsML 1.x Topicset):
An IPTC XML format for exchanging controlled vocabularies. The outer wrapper is a NewsML 1.x instance, which delivers a set of Topic elements in a TopicSet as the inner wrapper. Each Topic delivers the NewsCode in its FormalName element, together with the name of the concept, its definition as "Explanation", and some further administrative attributes.
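A TopicSet can be read in much the same way. The following Python sketch works on a hand-written, simplified sample with placeholder texts. It assumes that the name and the definition are carried by Description elements distinguished by a Variant attribute ("Name" and "Explanation"); that layout, like the function name `read_topics`, is an assumption of this example and should be checked against a real TopicSet file.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample. The texts are placeholders and the
# Description/Variant layout is an assumption for illustration.
SAMPLE_TOPICSET = """<NewsML>
  <TopicSet FormalName="ExampleCodes">
    <Topic>
      <FormalName>01000000</FormalName>
      <Description Variant="Name">Placeholder name</Description>
      <Description Variant="Explanation">Placeholder definition.</Description>
    </Topic>
  </TopicSet>
</NewsML>"""


def read_topics(xml_text):
    """Return one dict per Topic with its code, name and definition."""
    root = ET.fromstring(xml_text)
    topics = []
    for topic in root.iter("Topic"):
        # Index the Description elements by their Variant attribute.
        descriptions = {
            d.get("Variant"): (d.text or "")
            for d in topic.findall("Description")
        }
        topics.append({
            # The NewsCode is the text of the FormalName element.
            "code": topic.findtext("FormalName", default=""),
            "name": descriptions.get("Name", ""),
            "definition": descriptions.get("Explanation", ""),
        })
    return topics
```

Calling `read_topics(SAMPLE_TOPICSET)` yields a single dictionary whose `code` entry is `01000000`. The administrative attributes mentioned above are ignored in this sketch.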
Type (of a concept):
A concept type allows the logical grouping of all similar concept(s), regardless of the vocabulary the concepts belong to. Examples of concept types are Person, Organisation, Language, Business Sector, News Subject and Geography. A concept type is itself a concept and, as such, is represented by a code in a scheme.
Vocabulary:
A set of codes. It can be either controlled (see Controlled Vocabulary) or uncontrolled, which means that terms are added and deleted at random.
NewsCodes is ...
... the brand name for IPTC's controlled vocabularies/taxonomies.
Go and see the group of
- Descriptive NewsCodes
- NewsML 1 NewsCodes
- Photo Metadata NewsCodes
A controlled vocabulary
... or taxonomy or scheme is a set of terms to express a facet of news content. Facets could be, for example, the subject, the genre, or the urgency. A controlled vocabulary can be a flat list of terms or a hierarchical structure. In the context of the G2-Standards a vocabulary is called a 'scheme'.