Import newest zbmath data when doing the other import #200

@LizzAlice

zbMATH Import Documentation

This describes the properties imported from zbMATH into the MaRDI Portal Wikibase and how the data is processed.


Entity Types

The importer creates three entity types:

  • Publications (scholarly articles) — the main entity
  • Authors (humans) — linked from publications
  • Journals (scientific journals) — linked from publications

Publication Properties

| Property | ID | Source & Processing |
| --- | --- | --- |
| Label | | From title.title; falls back to Crossref DOI lookup if license-restricted |
| Description | | Set to "scientific article; zbMATH DE number YOUR_DENUMBER" |
| Instance of | P31 | Set to scholarly article (Q56887) |
| Title | P159 | From title.title; falls back to Crossref DOI lookup if license-restricted |
| zbMATH Document ID | P225 | The identifier field (zbl_id) |
| DOI | P27 | From links array where type='doi', converted to uppercase |
| Author | P16 | Links to author entities from contributors.authors |
| Published in | P200 | Links to journal entity from series.title |
| Publication date | P28 | Formatted as +YYYY-00-00T00:00:00Z from year field |
| Full work available at URL | P205 | HTTP/HTTPS links from links array |
| Review text | P1448 | From editorial_contributions where contribution_type='review' |
| Reviewed by | P1447 | Links to reviewer's author entity |
| MSC classification | P226 | Codes from msc[].code array |
| zbMATH DE Number | P1451 | The id field; used for duplicate detection |
| zbMATH Keywords | P1450 | From keywords array |
| arXiv ID | P21 | Extracted from URLs containing arxiv.org/abs/ |
| MaRDI profile type | P1460 | Set to "Publication" item (Q5976449) |
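
A few of the rows above are simple value transformations. The following is a minimal sketch of how they could look, not the importer's actual code. The function names are hypothetical, and the link field names `identifier` and `url` are assumptions about the shape of the entries in the links array.

```python
import re

# Matches the part after arxiv.org/abs/ up to whitespace, '?' or '#'.
ARXIV_ABS = re.compile(r"arxiv\.org/abs/([^\s?#]+)")


def format_publication_date(year):
    """Build the year-precision time string +YYYY-00-00T00:00:00Z."""
    return f"+{int(year):04d}-00-00T00:00:00Z"


def extract_doi(links):
    """Return the first link of type 'doi', uppercased, or None."""
    for link in links:
        # 'identifier' is an assumed field name for the DOI string.
        if link.get("type") == "doi" and link.get("identifier"):
            return link["identifier"].upper()
    return None


def extract_full_text_urls(links):
    """Keep only HTTP/HTTPS URLs ('url' is an assumed field name)."""
    urls = (str(link.get("url") or "") for link in links)
    return [u for u in urls if u.startswith(("http://", "https://"))]


def extract_arxiv_id(urls):
    """Return the arXiv ID from the first URL containing arxiv.org/abs/."""
    for url in urls:
        match = ARXIV_ABS.search(url)
        if match:
            return match.group(1)
    return None
```

For example, `format_publication_date("1999")` yields `+1999-00-00T00:00:00Z`, which is the year-precision form named in the table.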

Author Properties

| Property | ID | Source & Processing |
| --- | --- | --- |
| Instance of | P31 | Set to human (Q57162) |
| Label | | Name reformatted from "Last, First" → "First Last" |
| zbMATH author ID | P676 | From contributors.authors[].codes[0] |
| MaRDI profile type | P1460 | Set to "Person" item (Q5976445) |

Journal Properties

| Property | ID | Source & Processing |
| --- | --- | --- |
| Instance of | P31 | Set to scientific journal (Q56973) |
| Label | | From source.series[0].title |
| Description | | Set to "scientific journal" |

Processing Logic

License Conflict Handling

When a field contains the license conflict message instead of the actual value, the importer queries Crossref with the publication's DOI to retrieve:

  • Document title
  • Journal name

Field Transformations

| Field | Transformation |
| --- | --- |
| Author names | "Mustermann, Max" → "Max Mustermann" |
| Reviewer names | Strips suffixes after / and ( characters |
| Special characters | Tabs → \T, newlines → \N, carriage returns → \R |
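
As an illustration of the two name transformations, here is a small sketch with hypothetical function names. It assumes that a name without a comma is already in "First Last" order and that everything from the first / or ( onward is a suffix.

```python
import re


def reformat_author_name(name):
    """Turn "Last, First" into "First Last"; leave comma-free names as they are."""
    if "," not in name:
        return name.strip()
    last, first = name.split(",", 1)
    return f"{first.strip()} {last.strip()}".strip()


def clean_reviewer_name(name):
    """Drop everything from the first '/' or '(' onward."""
    return re.split(r"[/(]", name, maxsplit=1)[0].strip()
```

With these definitions, `reformat_author_name("Mustermann, Max")` gives `"Max Mustermann"`, matching the example in the table.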

arXiv Handling

Publications with arXiv in the zbl_id receive:

  • P21 — arXiv ID (from identifier suffix)
  • P22 — arXiv classification (extracted from source field pattern [classification])
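
The two arXiv values could be derived roughly as follows. This sketch assumes the identifier looks like `arXiv:2101.12345` and that the source field contains a bracketed classification such as `[math.AG]`; both formats, and the function name, are assumptions made for illustration.

```python
import re

# A bracketed classification such as [math.AG] or [hep-th].
CLASSIFICATION = re.compile(r"\[([A-Za-z\-]+(?:\.[A-Za-z\-]+)?)\]")


def arxiv_claims(zbl_id, source):
    """Return {"P21": ..., "P22": ...} for arXiv records, else an empty dict."""
    if "arxiv" not in zbl_id.lower():
        return {}
    # Assumed format: the arXiv ID is the part after the first colon.
    claims = {"P21": zbl_id.split(":", 1)[-1].strip()}
    match = CLASSIFICATION.search(source or "")
    if match:
        claims["P22"] = match.group(1)
    return claims
```

If no bracketed classification is found, only P21 is returned.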

Duplicate Detection

At most one item should exist for each identifying value. Existing items are matched as follows:

| Entity | Matched by |
| --- | --- |
| Publications | DE Number or arXiv ID |
| Authors | zbMATH author ID (P676) |
| Journals | Label + instance type |
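
The matching rules can be pictured with a small in-memory index. This is only a sketch: the real importer presumably looks items up in the Wikibase, and the class, the helper functions, and the key tuples are hypothetical.

```python
class EntityIndex:
    """Maps identifying keys to item IDs so each entity is created once."""

    def __init__(self):
        self._by_key = {}

    def find_or_create(self, keys, create):
        """Return the item matching any key, calling create() only if none match."""
        for key in keys:
            if key in self._by_key:
                item_id = self._by_key[key]
                break
        else:
            item_id = create()
        # Register all keys so later lookups by any of them hit this item.
        for key in keys:
            self._by_key.setdefault(key, item_id)
        return item_id


def publication_keys(de_number=None, arxiv_id=None):
    """Publications match on DE number (P1451) or arXiv ID (P21)."""
    keys = []
    if de_number is not None:
        keys.append(("P1451", str(de_number)))
    if arxiv_id:
        keys.append(("P21", arxiv_id))
    return keys


def author_keys(zbmath_author_id):
    """Authors match on the zbMATH author ID (P676)."""
    return [("P676", zbmath_author_id)]


def journal_keys(label):
    """Journals match on label plus instance type (scientific journal, Q56973)."""
    return [("label", label, "Q56973")]
```

A publication first seen with both a DE number and an arXiv ID is then found again when a later record carries only one of the two.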

Labels

enhancement: New feature or request
