geonode.harvesting.tasks
========================

.. py:module:: geonode.harvesting.tasks


Attributes
----------

.. autoapisummary::

   geonode.harvesting.tasks.logger


Functions
---------

.. autoapisummary::

   geonode.harvesting.tasks.harvesting_scheduler
   geonode.harvesting.tasks.harvesting_dispatcher
   geonode.harvesting.tasks.harvest_resources
   geonode.harvesting.tasks._harvest_resource
   geonode.harvesting.tasks._finish_harvesting
   geonode.harvesting.tasks._handle_harvesting_error
   geonode.harvesting.tasks.check_harvester_available
   geonode.harvesting.tasks.update_harvestable_resources
   geonode.harvesting.tasks._update_harvestable_resources_batch
   geonode.harvesting.tasks._finish_harvestable_resources_update
   geonode.harvesting.tasks._handle_harvestable_resources_update_error
   geonode.harvesting.tasks._delete_stale_harvestable_resources
   geonode.harvesting.tasks.finish_asynchronous_session
   geonode.harvesting.tasks.update_asynchronous_session


Module Contents
---------------

.. py:data:: logger

.. py:function:: harvesting_scheduler(self)

   Check whether any of the configured harvesters needs to be run or not.

   This function is called periodically by celery beat. It is configured to
   run every `HARVESTER_SCHEDULER_FREQUENCY_MINUTES` minutes. This can be
   configured in the GeoNode settings. The default value is 0.5, which means
   that this function is called every thirty seconds.


.. py:function:: harvesting_dispatcher(self, harvesting_session_id: int)

   Perform harvesting asynchronously.

   This function kicks off a harvesting session for the input harvester id.
   It defines an async workflow that results in the harvestable resources
   being checked and harvested onto the local GeoNode.

   Briefly, the implementation consists of:

   - start a harvesting session
   - determine which of the known harvestable resources have to be harvested
   - schedule each harvestable resource to be harvested asynchronously
   - when all resources have been harvested, finish the harvesting session

   Internally, the code uses celery's `chord` workflow primitive. This allows
   the harvesting of individual resources to be done in parallel (as much as
   the celery worker config allows for) while still having a final
   synchronization step, once all resources are harvested, that finalizes
   the harvesting session.


.. py:function:: harvest_resources(self, harvestable_resource_ids: List[int], harvesting_session_id: int)

   Harvest a list of remote resources that all belong to the same harvester.


.. py:function:: _harvest_resource(self, harvestable_resource_id: int, harvesting_session_id: int)

   Harvest a single resource from the input harvestable resource id.


.. py:function:: _finish_harvesting(self, harvesting_session_id: int)

.. py:function:: _handle_harvesting_error(self, task_id, *args, **kwargs)

   Ensure harvester status is set back to `ready` in case of an error.


.. py:function:: check_harvester_available(self, harvester_id: int)

.. py:function:: update_harvestable_resources(self, refresh_session_id: int)

.. py:function:: _update_harvestable_resources_batch(self, refresh_session_id: int, page: int, page_size: int)

.. py:function:: _finish_harvestable_resources_update(self, refresh_session_id: int)

.. py:function:: _handle_harvestable_resources_update_error(self, task_id, *args, **kwargs)

.. py:function:: _delete_stale_harvestable_resources(harvester: geonode.harvesting.models.Harvester)

   Delete harvestable resources that haven't been found on the current refresh.

   If a harvestable resource was not refreshed since the last check, this
   means it has not been found this time around - maybe it is not available
   anymore, or maybe the user just changed the harvester parameters. Either
   way, the harvestable resource is no longer relevant, and therefore can be
   deleted.

   When a harvestable resource is deleted, it may leave a corresponding
   GeoNode resource orphaned. The corresponding GeoNode resource is the one
   that had been created locally as a result of harvesting information of
   the HarvestableResource on the remote server by the harvester worker.
   Depending on the value of the corresponding
   `models.Harvester.delete_orphan_resources_automatically`, the GeoNode
   resource may also be deleted as a result of running this function.


.. py:function:: finish_asynchronous_session(session_id: int, final_status: str, final_details: Optional[str] = None, additional_processed_records: Optional[int] = None) -> None

   Finish the asynchronous session and also reset the harvester status.


.. py:function:: update_asynchronous_session(session_id: int, total_records_to_process: Optional[int] = None, additional_processed_records: Optional[int] = None, additional_details: Optional[str] = None) -> None
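
The chord-style workflow described for ``harvesting_dispatcher`` (harvest many
resources in parallel, then run one finalization step) can be illustrated with
a minimal standard-library sketch. This is an analogy only, not the GeoNode
implementation: it uses ``concurrent.futures`` rather than Celery, and
``harvest_one``, ``finish_session`` and ``run_session`` are hypothetical names
introduced for illustration.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor
   from typing import Dict, List


   def harvest_one(resource_id: int, session_id: int) -> Dict[str, object]:
       """Hypothetical stand-in for harvesting a single resource."""
       return {"resource_id": resource_id, "session_id": session_id, "ok": True}


   def finish_session(results: List[Dict[str, object]], session_id: int) -> Dict[str, object]:
       """Hypothetical final step; runs once, after every result is available."""
       return {
           "session_id": session_id,
           "processed": len(results),
           "failed": sum(1 for r in results if not r["ok"]),
       }


   def run_session(resource_ids: List[int], session_id: int, workers: int = 4) -> Dict[str, object]:
       # Fan out: one task per resource, run in parallel up to `workers`.
       with ThreadPoolExecutor(max_workers=workers) as pool:
           futures = [pool.submit(harvest_one, rid, session_id) for rid in resource_ids]
           # Synchronize: block until every task has produced a result.
           results = [f.result() for f in futures]
       # Callback: finalize exactly once, with all results in hand.
       return finish_session(results, session_id)


   summary = run_session([1, 2, 3], session_id=42)

In Celery itself the same shape is written as ``chord(header)(callback)``,
where the header is the group of per-resource tasks and the callback receives
their collected results.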