acdh-oeaw / uri-norm-rules
Set of static assets used (mainly) for ARCHE data preprocessing
Requires
- php: ^8.1
Requires (Dev)
- phpunit/phpunit: ^9.5
- dev-master
- 3.21.1
- 3.21.0
- 3.20.1
- 3.20.0
- 3.19
- 3.18.1
- 3.18.0
- v3.17.1
- 3.17.0
- 3.16.0
- 3.15.2
- 3.15.1
- 3.15.0
- 3.14.1
- 3.14.0
- 3.13.0
- 3.12.2
- 3.12.1
- 3.12.0
- 3.11.0
- 3.10.0
- 3.9.7
- 3.9.6
- 3.9.5
- 3.9.4
- 3.9.3
- 3.9.2
- 3.9.1
- 3.8.1
- 3.8.0
- 3.7.0
- v3.6
- 3.5.0
- 3.4.0
- 3.3.1
- 3.3.0
- 3.2.3
- 3.2.2
- 3.2.1
- 3.2.0
- v3.1
- v3.0
- 2.0.0
- 1.1.0
- 1.0.0
- 0.0.2
- dev-fix-for-lobid-URL-Regex-pattern
- dev-dev
This package is auto-updated.
Last update: 2025-01-10 17:18:56 UTC
README
Set of static assets used (mainly) for ARCHE data preprocessing or ARCHE information pages:
- URI normalization rules used within the ACDH-CH (stored in AcdhArcheAssets/uriNormRules.json).
- Description of input data formats accepted by ARCHE (stored in AcdhArcheAssets/formats.json).
The repository also provides Python 3 and PHP bindings for accessing those assets.
Installation & usage
Python
- Install using pip3:
pip3 install acdh-arche-assets
- Use with
from AcdhArcheAssets.uri_norm_rules import get_rules, get_normalized_uri, get_norm_id

print(f"{get_rules()}")

wrong_id = "http://sws.geonames.org/1232324343/linz.html"
good_id = get_normalized_uri(wrong_id)
print(good_id)
# "https://sws.geonames.org/1232324343/"

# extract ID from URL
norm_id = get_norm_id("http://sws.geonames.org/1232324343/linz.html")
print(norm_id)
# "1232324343"

from AcdhArcheAssets.file_formats import get_formats, get_by_mtype, get_by_extension

formats = get_formats()
matching_mapping = get_by_mtype('image/png')
matching_mapping = get_by_extension('png')
PHP
- Install using composer:
composer require acdh-oeaw/arche-assets
- Use with
require_once 'vendor/autoload.php';
print_r(acdhOeaw\UriNormRules::getRules());
print_r(acdhOeaw\UriNormRules::getRules(['viaf', 'gnd']));
print_r(acdhOeaw\ArcheFileFormats::getAll());
print_r(acdhOeaw\ArcheFileFormats::getByMime('application/json'));
print_r(acdhOeaw\ArcheFileFormats::getByExtension('json'));
Description of assets
URI normalization rules
Each rule consists of five properties:
- name: a rule name
- match: a regular expression matching a given URI namespace
- replace: a regular expression replace expression normalizing a URI in a given namespace
- resolve: a regular expression replace expression transforming a URI in a given namespace to a URL fetching RDF data
- format: an RDF serialization format to be requested while resolving the URL produced using the resolve field
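To illustrate how the five properties fit together, here is a minimal Python sketch that applies a single rule with the standard re module. The rule shown is hypothetical and written only for this example; it is not copied from uriNormRules.json, and the \1 backreference syntax and the about.rdf resolve URL are assumptions. The helper apply_rule is likewise not part of the package bindings.

```python
import re

# Hypothetical rule for illustration only; the real rules are stored in
# AcdhArcheAssets/uriNormRules.json and may differ in pattern and syntax.
example_rule = {
    "name": "geonames",
    "match": r"^https?://(?:www\.|sws\.)?geonames\.org/([0-9]+)(?:/.*)?$",
    "replace": r"https://sws.geonames.org/\1/",
    "resolve": r"https://sws.geonames.org/\1/about.rdf",
    "format": "application/rdf+xml",
}


def apply_rule(uri, rule):
    """Return (normalized URI, resolve URL, format), or None if the rule does not match."""
    if re.match(rule["match"], uri) is None:
        return None
    # "replace" rewrites the URI into its canonical form.
    normalized = re.sub(rule["match"], rule["replace"], uri)
    # "resolve" rewrites the same URI into a URL from which RDF data can be fetched.
    resolve_url = re.sub(rule["match"], rule["resolve"], uri)
    return normalized, resolve_url, rule["format"]


result = apply_rule("http://sws.geonames.org/1232324343/linz.html", example_rule)
print(result)
```

A client would then request the resolve URL with the value of format in its Accept header; URIs that match no rule are left untouched in this sketch.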
Formats
A curated and growing list of file extensions. For each file extension, mappings to the respective ARCHE Resource Type Category (stored in acdh:hasCategory) and Media Type (MIME type) (stored in acdh:hasFormat) are given. The indicated Media Type should only be used as a fallback; it is best practice to rely on automated Media Type detection based on file signatures.
Further information is provided as well.
- fileExtension: File extension to be mapped.
- name: Name(s) the format is known by
- archeCategory: The corresponding URI of the ARCHE Resource Type Category Vocabulary
- dataType: A broad category to group formats in; mainly intended for visualisation purposes.
- pronomID: ID(s) assigned by PRONOM
- mimeType: Official Media Type(s) (formerly known as MIME types) registered at IANA.
- informalMimeType: Other MIME types known for the format
- magicNumber: A constant numerical or text value used to identify a file format (see e.g. the Wikipedia list of file signatures)
- ianaTemplate: Link to template at IANA
- reference: Link(s) to format specifications referenced by IANA and others
- longTerm: Indicates if a format is suitable for long-term preservation.
  Possible values and their meaning:
  - yes - long-term format
  - no - not suitable, another format should be used
  - restricted - can be used for long-term preservation in some cases (see comment)
  - unsure - status remains to be evaluated
- archeDocs: Link to a place with more information for the format.
- comment: Any other noteworthy information not stated elsewhere.
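The fallback advice above (detect the Media Type from the file signature first, use the extension mapping only if that fails) can be sketched in Python as follows. The two entries are hypothetical and reduced to three of the fields listed above; storing magicNumber as a hex string and mimeType as a single string are assumptions made for this example, and formats.json may represent them differently. The function guess_media_type is not part of the package bindings.

```python
from pathlib import PurePath

# Hypothetical, simplified entries; field names follow the list above,
# but the value encoding is an assumption made for this sketch.
FORMATS = [
    {"fileExtension": "png", "mimeType": "image/png", "magicNumber": "89504E470D0A1A0A"},
    {"fileExtension": "json", "mimeType": "application/json", "magicNumber": ""},
]


def guess_media_type(filename, head):
    """Guess a Media Type from the first bytes of a file, falling back to its extension."""
    # 1. Preferred: compare the file signature with the known magic numbers.
    for entry in FORMATS:
        magic = entry["magicNumber"]
        if magic and head.startswith(bytes.fromhex(magic)):
            return entry["mimeType"]
    # 2. Fallback: look the file extension up in the mapping.
    extension = PurePath(filename).suffix.lstrip(".").lower()
    for entry in FORMATS:
        if entry["fileExtension"] == extension:
            return entry["mimeType"]
    return None


png_head = bytes.fromhex("89504E470D0A1A0A") + b"\x00\x00\x00\rIHDR"
print(guess_media_type("mislabelled.json", png_head))
print(guess_media_type("data.json", b'{"a": 1}'))
```

In the first call the signature wins over the misleading extension; in the second no signature matches, so the extension mapping is used.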
Development (Python)
Install the needed development packages:
pip install -r requirements_dev.txt
Linting, tests and test coverage
- run the tests:
tox
- check coverage and create report:
coverage run setup.py test
coverage html
- check linting:
flake8