geonode.harvesting.harvesters.base ================================== .. py:module:: geonode.harvesting.harvesters.base Attributes ---------- .. autoapisummary:: geonode.harvesting.harvesters.base.logger Exceptions ---------- .. autoapisummary:: geonode.harvesting.harvesters.base.HarvestingException Classes ------- .. autoapisummary:: geonode.harvesting.harvesters.base.BriefRemoteResource geonode.harvesting.harvesters.base.HarvestedResourceInfo geonode.harvesting.harvesters.base.BaseHarvesterWorker Functions --------- .. autoapisummary:: geonode.harvesting.harvesters.base.download_resource_file geonode.harvesting.harvesters.base._get_file_name geonode.harvesting.harvesters.base._consolidate_resource_keywords Module Contents --------------- .. py:data:: logger .. py:exception:: HarvestingException Bases: :py:obj:`Exception` Common base class for all non-exit exceptions. .. py:class:: BriefRemoteResource .. py:attribute:: unique_identifier :type: str .. py:attribute:: title :type: str .. py:attribute:: resource_type :type: str .. py:attribute:: abstract :type: Optional[str] :value: '' .. py:attribute:: should_be_harvested :type: bool :value: False .. py:class:: HarvestedResourceInfo .. py:attribute:: resource_descriptor :type: geonode.harvesting.resourcedescriptor.RecordDescription .. py:attribute:: additional_information :type: Optional[Any] .. py:attribute:: copied_resources :type: Optional[List] .. py:class:: BaseHarvesterWorker(remote_url: str, harvester_id: int) Bases: :py:obj:`abc.ABC` Base class for harvesters. This provides two relevant things: - An interface that all custom GeoNode harvesting classes must implement; - Default implementation for common functionality. .. py:attribute:: remote_url :type: str .. py:attribute:: harvester_id :type: int .. py:property:: allows_copying_resources :type: bool :abstractmethod: Whether copying remote resources is implemented by this worker .. py:method:: from_django_record(harvester: Harvester) :classmethod: :abstractmethod: Return a new instance of the worker from the django harvester .. py:method:: get_num_available_resources() -> int :abstractmethod: Return the number of available resources on the remote service. If there is a problem retrieving the number of available resource, this method shall raise `HarvestingException`. .. py:method:: list_resources(offset: Optional[int] = 0) -> List[BriefRemoteResource] :abstractmethod: Return a list of resources from the remote service. If there is a problem listing resource, this method shall raise `HarvestingException`. .. py:method:: check_availability(timeout_seconds: Optional[int] = 5) -> bool :abstractmethod: Check whether the remote service is online .. py:method:: get_geonode_resource_type(remote_resource_type: str) -> geonode.base.models.ResourceBase :abstractmethod: Return the GeoNode type that should be created from the remote resource type .. py:method:: get_resource(harvestable_resource: HarvestableResource) -> Optional[HarvestedResourceInfo] :abstractmethod: Harvest a single resource from the remote service. The return value is an instance of `HarvestedResourceInfo`. It stores an instance of `RecordDescription` and additionally whatever type is required by child classes to be able to customize resource creation/update on the local GeoNode. Note that the default implementation of `update_geonode_resource()` only needs the `RecordDescription`. The possibility to return additional information exists solely for extensibility purposes and can be left as None in the simple cases. .. py:method:: get_extra_config_schema() -> Optional[Dict] :classmethod: Return a jsonschema schema to be used to validate models.Harvester objects .. py:method:: get_current_config() -> Dict Return a dict with the current configuration. .. py:method:: finalize_resource_update(geonode_resource: geonode.base.models.ResourceBase, harvested_info: HarvestedResourceInfo, harvestable_resource: HarvestableResource) -> geonode.base.models.ResourceBase Perform additional actions just after having created/updated a local GeoNode resource. This method can be used to further manipulate the relevant GeoNode Resource that is being created/updated in the context of a harvesting operation. It is typically called from within `base.BaseHarvesterWorker.update_geonode_resource` as the last step, after having already acted upon the GeoNode resource. The default implementation does nothing. .. py:method:: finalize_harvestable_resource_deletion(harvestable_resource: HarvestableResource) -> bool Perform additional actions just before deleting a harvestable resource. This method is typically called from within `models.HarvestableResource.delete()`, just before deleting the actual harvestable resource. It can be useful for child classes that customize resource creation in order to also customize the deletion of harvestable resources. The default implementation does nothing. .. py:method:: should_copy_resource(harvestable_resource: HarvestableResource) -> bool Return True if the worker is able to copy the remote resource. The base implementation just returns False. Subclasses must re-implement this method if they support copying remote resources onto the local GeoNode. .. py:method:: copy_resource(harvestable_resource: HarvestableResource, harvested_resource_info: HarvestedResourceInfo) -> Optional[pathlib.Path] Copy a remote resource's data to the local GeoNode. The base implementation provides a generic copy using GeoNode's `storage_manager`. Subclasses may need to re-implement this method if they require specialized behavior. .. py:method:: get_geonode_resource_defaults(harvested_info: HarvestedResourceInfo, harvestable_resource: HarvestableResource) -> Dict Extract default values to be used by resource manager when updating a resource .. py:method:: update_geonode_resource(harvested_info: HarvestedResourceInfo, harvestable_resource: HarvestableResource) Create or update a local GeoNode resource with the input harvested information. If the underlying harvestable resource already exists as a local GeoNode resource, then it is updated. Otherwise it is created locally. If something goes wrong with the update of the geonode resource this method should raise a `RuntimeError`. This will be caught by the harvesting task and handled appropriately. .. py:method:: _create_new_geonode_resource(geonode_resource_type, defaults: Dict) .. py:method:: _update_existing_geonode_resource(geonode_resource: geonode.base.models.ResourceBase, defaults: Dict) .. py:function:: download_resource_file(url: str, target_name: str) -> pathlib.Path Download a resource file and store it using GeoNode's `storage_manager`. Downloads use the django `UploadedFile` helper classes. Depending on the size of the remote resource, we may download it into an in-memory buffer or store it on a temporary location on disk. After having downloaded the file, we use `storage_manager` to save it in the appropriate location. .. py:function:: _get_file_name(resource_info: HarvestedResourceInfo) -> Optional[str] .. py:function:: _consolidate_resource_keywords(resource_descriptor: geonode.harvesting.resourcedescriptor.RecordDescription, geonode_resource, harvester_id: int) -> List[str]