Implementing a custom file fetcher

DKAN uses a library called getdkan/file-fetcher. This library allows developers to extend the file transfer functionality for their specialized needs.

This library is used to download a resource, such as a CSV file, so that it can be loaded into the database and presented through the UI and API. This process is called “localization,” because the source resource is copied to the local file system. Usually, this downloaded copy is temporary and is eventually removed.

The standard file fetcher processors will probably be adequate for most uses, but there could be other use cases, such as needing to authenticate, or getting a file from S3 instead of HTTP.

In cases such as these, we might want to add our own processor class to extend the file fetcher functionality.

How-to:

Note that a code example can be found in the custom_processor_test module, which is used to test this functionality.

Create a file processor class

To implement a new file processor, a create a custom file fetcher processor class. This class could extend FileFetcher\Processor\Remote or FileFetcher\Processor\Local, or be a totally new implementation of FileFetcher\Processor\ProcessorInterface.

Create a FileFetcherFactory

Next, create a new file fetcher factory class. This class should emulate Drupal\common\FileFetcher\FileFetcherFactory. There is example code in the custom_processor_test module which demonstrates how to do this.

The new factory should create and configure a FileFetcher\FileFetcher object to use your new custom processor. Do this by merging configuration for your new processor into the $config['processors'] array that is passed to FileFetcherFactory::getInstance():

public function getInstance(string $identifier, array $config = []) {
  // Add OurProcessor as a custom processor.
  $config['processors'] = array_merge(
    [OurProcessor::class],
    $config['processors'] ?? []
  );
  // Get the instance from the decorated factory, using our modified config.
  return $this->decoratedFactory->getInstance($identifier, $config);
}

Declare your factory as a service

It is also very important to declare your new factory class as a service. You accomplish this by decorating dkan.common.file_fetcher in your module’s *.services.yml file, something like this:

our_module.file_fetcher:
  class: Drupal\our_module\FileFetcher\FileFetcherFactory
  decorates: dkan.common.file_fetcher
  arguments: ['@our_module.file_fetcher.inner']

Now whenever DKAN uses the dkan.common.file_fetcher service, your file fetcher factory will be used instead, and your new processor will find its way into use.

Processor negotiation

It’s important to know how FileFetcher goes about choosing a processor.

File fetcher knows about two processors by default: FileFetcher\Processor\Local and FileFetcher\Processor\Remote. It also knows about whichever custom processor class names you configured in the processors array in configuration.

When you ask a file fetcher object to perform the transfer (using FileFetcher::run()), it will instantiate all the different types of processors it knows about.

Then it will loop through them and use the ProcessorInterface::isServerCompatible() method to determine if the given source is suitable for use with that processor object. The file fetcher will use the first processor that answers true.

You can look at the implementations of FileFetcher\Processor\Local::isServerCompatible() or FileFetcher\Processor\Remote::isServerCompatible() to see how they each handle the question of whether they’re suitable for the source.