geonode.harvesting.tasks
Functions
harvesting_scheduler: Check whether any of the configured harvesters needs to be run or not.
harvesting_dispatcher: Perform harvesting asynchronously.
harvest_resources: Harvest a list of remote resources that all belong to the same harvester.
_harvest_resource: Harvest a single resource from the input harvestable resource id.
_handle_harvesting_error: Ensure harvester status is set back to ready in case of an error.
_delete_stale_harvestable_resources: Delete harvestable resources that haven't been found on the current refresh.
Finish the asynchronous session and also reset the harvester status.
Module Contents
- geonode.harvesting.tasks.harvesting_scheduler(self)
Check whether any of the configured harvesters needs to be run or not.
Celery beat calls this function periodically, once every HARVESTER_SCHEDULER_FREQUENCY_MINUTES minutes. The interval can be changed in the GeoNode settings. The default value is 0.5, so the function runs every thirty seconds.
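As a rough illustration of how the setting translates into a schedule interval, the sketch below converts a frequency in minutes into seconds and builds a dictionary shaped like a celery `beat_schedule` entry. The helper `build_beat_entry` and the entry name `harvesting-scheduler` are hypothetical, and the task name registered by GeoNode may differ from the module path used here.

```python
from datetime import timedelta

# Default described above; a real deployment would read this from the GeoNode settings.
HARVESTER_SCHEDULER_FREQUENCY_MINUTES = 0.5


def build_beat_entry(frequency_minutes: float) -> dict:
    """Return a dict shaped like a celery beat_schedule entry (hypothetical helper)."""
    interval_seconds = timedelta(minutes=frequency_minutes).total_seconds()
    return {
        "harvesting-scheduler": {  # hypothetical entry name
            # Assumed task name, taken from the module path shown on this page
            "task": "geonode.harvesting.tasks.harvesting_scheduler",
            # celery beat accepts a plain number of seconds as a schedule
            "schedule": interval_seconds,
        }
    }


entry = build_beat_entry(HARVESTER_SCHEDULER_FREQUENCY_MINUTES)
print(entry["harvesting-scheduler"]["schedule"])  # 30.0
```

Fractional values are what make sub-minute intervals possible here: 0.5 minutes becomes 30.0 seconds.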
- geonode.harvesting.tasks.harvesting_dispatcher(self, harvesting_session_id: int)
Perform harvesting asynchronously.
This function kicks off a harvesting session for the input harvester id. It defines an async workflow in which the harvestable resources are checked and then harvested onto the local GeoNode.
In brief, the implementation performs these steps in order:
- start a harvesting session
- determine which of the known harvestable resources have to be harvested
- schedule each harvestable resource to be harvested asynchronously
- when all resources have been harvested, finish the harvesting session
Internally, the code uses celery’s chord workflow primitive. This is used to allow the harvesting of individual resources to be done in parallel (as much as the celery worker config allows for) and still have a final synchronization step, once all resources are harvested, that finalizes the harvesting session.
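The fan-out and fan-in shape described above can be sketched without celery at all. The example below is only an analogue built on the standard library's `concurrent.futures`, not the GeoNode implementation: the parallel calls play the role of a chord's header and the final call plays the role of its callback. The names `harvest_one`, `finish_session` and `dispatch` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def harvest_one(resource_id: int) -> bool:
    """Stand-in for harvesting a single resource; always succeeds in this sketch."""
    return True


def finish_session(session_id: int, results: List[bool]) -> dict:
    """Stand-in for the final synchronization step that closes the session."""
    return {"session_id": session_id, "harvested": sum(results), "total": len(results)}


def dispatch(session_id: int, resource_ids: List[int]) -> dict:
    # Fan out: harvest resources in parallel, bounded by the pool size
    # (comparable to the limit imposed by the celery worker configuration).
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(harvest_one, resource_ids))
    # Fan in: this runs only after every harvest call has returned.
    return finish_session(session_id, results)


summary = dispatch(session_id=1, resource_ids=[10, 11, 12])
print(summary)  # {'session_id': 1, 'harvested': 3, 'total': 3}
```

In celery itself the same shape is written as `chord(header_tasks)(callback)`, where the callback receives the list of header results once they have all completed.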
- geonode.harvesting.tasks.harvest_resources(self, harvestable_resource_ids: List[int], harvesting_session_id: int)
Harvest a list of remote resources that all belong to the same harvester.
- geonode.harvesting.tasks._harvest_resource(self, harvestable_resource_id: int, harvesting_session_id: int)
Harvest the single resource identified by the input harvestable resource id.
- geonode.harvesting.tasks._handle_harvesting_error(self, task_id, *args, **kwargs)
Ensure harvester status is set back to ready in case of an error.
- geonode.harvesting.tasks._update_harvestable_resources_batch(self, refresh_session_id: int, page: int, page_size: int)
- geonode.harvesting.tasks._finish_harvestable_resources_update(self, refresh_session_id: int)
- geonode.harvesting.tasks._handle_harvestable_resources_update_error(self, task_id, *args, **kwargs)
- geonode.harvesting.tasks._delete_stale_harvestable_resources(harvester: geonode.harvesting.models.Harvester)
Delete harvestable resources that haven’t been found on the current refresh.
If a harvestable resource was not refreshed since the last check, it was not found this time around. Maybe it is no longer available, or maybe the user changed the harvester parameters. Either way, the harvestable resource is no longer relevant and can be deleted.
Deleting a harvestable resource may leave its corresponding GeoNode resource orphaned. The corresponding GeoNode resource is the one that the harvester worker created locally by harvesting the HarvestableResource from the remote server. Depending on the value of the corresponding models.Harvester.delete_orphan_resources_automatically, the GeoNode resource may also be deleted as a result of running this function.
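The cleanup rule can be sketched with plain Python objects. This is a minimal illustration under stated assumptions, not the GeoNode code: it assumes staleness is decided by comparing a per-resource timestamp against the time of the current refresh, and the `Resource` class, its fields, and the `delete_stale` helper are hypothetical. Only the flag name `delete_orphan_resources_automatically` comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Resource:
    """Hypothetical stand-in for a harvestable resource record."""
    unique_id: str
    last_refreshed: datetime
    geonode_resource: Optional[str] = None  # local resource created by harvesting, if any


def delete_stale(
    resources: List[Resource],
    refresh_started: datetime,
    delete_orphan_resources_automatically: bool,
) -> Tuple[List[Resource], List[str]]:
    """Return (kept harvestable resources, local resources deleted as orphans)."""
    # A resource not refreshed during the current refresh is considered stale.
    kept = [r for r in resources if r.last_refreshed >= refresh_started]
    stale = [r for r in resources if r.last_refreshed < refresh_started]
    # Stale entries that were already harvested leave a local resource behind.
    orphans = [r.geonode_resource for r in stale if r.geonode_resource is not None]
    # The orphans are removed only when the harvester opts in.
    deleted_local = orphans if delete_orphan_resources_automatically else []
    return kept, deleted_local


refresh_started = datetime(2024, 1, 2)
resources = [
    Resource("a", datetime(2024, 1, 2, 0, 5)),
    Resource("b", datetime(2024, 1, 1), geonode_resource="layer-b"),
    Resource("c", datetime(2024, 1, 1)),
]
kept, deleted_local = delete_stale(resources, refresh_started, True)
print([r.unique_id for r in kept], deleted_local)  # ['a'] ['layer-b']
```

With the flag set to `False`, the same call still drops the stale entries `b` and `c` but returns an empty list of deleted local resources, leaving `layer-b` in place as an orphan.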