HTTP To HDFS Action
The HTTP to HDFS action plugin is available in the Hub.
Plugin version: 1.3.0
An action that fetches data from an external HTTP endpoint and writes it to a file in HDFS.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
URL | Yes | Required. The URL to fetch data from. |
HDFS File Path | Yes | Required. The location to write the data in HDFS. If the file already exists, it will be overwritten. |
HTTP Method | No | Required. The HTTP request method. GET and POST are the allowed methods. Default is GET. |
Request Body | Yes | Optional. The body to send with the request. |
Request Headers | Yes | Optional. Header values to send with each request. Keys and values are delimited by a colon (“:”), and each pair is delimited by a newline (“\n”). |
Output File Format | No | Required. Whether the output data is written as Text (for example, JSON, XML, or plain-text files) or Binary (for example, zip or gzip archives and images). Default is Text. |
Charset for Text | No | Required. If Text is selected as the output file format, the charset of the text returned by the endpoint. Default is UTF-8. |
Should Follow Redirects ? | No | Required. Whether to automatically follow redirects. Default is true. |
Disable SSL Validation | No | Required. Whether to disable SSL certificate validation. If SSL validation is enabled, the certificate must be added to the trustStore on each machine. Default is true. |
Number of Retries | No | Required. The number of times the request should be retried if the request fails. Default is 3. |
Connection Timeout (milliseconds) | Yes | Optional. The time in milliseconds to wait for a connection. Set to 0 for infinite. Default is 60000 (1 minute). |
Read Timeout (milliseconds) | Yes | Optional. The time in milliseconds to wait for a read. Set to 0 for infinite. Default is 60000 (1 minute). |
Token Key for HDFS File Path | Yes | Optional. The key used to store the file path of the written data so that a file source can read from it. Plugins that run at later stages in the pipeline can retrieve the file path with this key through macro substitution: ${filePath}, where “filePath” is the key specified. Default is filePath. |
Token Key for Response Headers | Yes | Optional. The key used to store the response headers so that they are available to plugins later in the pipeline. Plugins that run at later stages in the pipeline can retrieve the response headers with this key through macro substitution: ${responseHeaders}, where “responseHeaders” is the key specified. Default is responseHeaders. |
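
As an illustration of the Request Headers format, the following Java sketch splits a property value into a header map. The `parseHeaders` method is hypothetical and is not the plugin's actual parser; it assumes that each line is split on its first colon, so values that contain colons stay intact, and that blank lines are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin: parses a Request Headers
// value in the documented "key:value" per line format.
class RequestHeaderParser {

    static Map<String, String> parseHeaders(String raw) {
        // LinkedHashMap keeps headers in the order they were written.
        Map<String, String> headers = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return headers;
        }
        for (String line : raw.split("\n")) {
            // Assumption: blank lines are skipped rather than rejected.
            if (line.trim().isEmpty()) {
                continue;
            }
            // Assumption: split on the first colon only, so values that
            // themselves contain colons (times, URLs) are kept whole.
            int idx = line.indexOf(':');
            if (idx <= 0) {
                throw new IllegalArgumentException("Malformed header line: " + line);
            }
            headers.put(line.substring(0, idx).trim(), line.substring(idx + 1).trim());
        }
        return headers;
    }
}
```

For example, `parseHeaders("Accept:application/json\nX-Request-Id:42")` returns a map with the entries `Accept=application/json` and `X-Request-Id=42`.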
Example
This example performs an HTTP GET request to http://example.com/data and downloads the CSV file to /tmp/data.csv.
Property | Value |
---|---|
URL | http://example.com/data |
HDFS File Path | /tmp/data.csv |
HTTP Method | GET |
Output File Format | |
Charset for Text | |
Should Follow Redirects ? | |
Disable SSL Validation | |
Number of Retries | |
Connection Timeout (milliseconds) | |
Read Timeout (milliseconds) | |
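
The example can be approximated outside a pipeline with the following Java sketch, which issues the GET request with a redirect policy, timeouts, and a retry count, then writes the response body to a file. It is a hypothetical illustration, not the plugin's source: `fetchToFile` is an invented name, a local `java.nio.file.Path` stands in for the HDFS file path, the retry count is assumed to mean retries after the first attempt, any non-2xx status is assumed to be a failure, and the Disable SSL Validation and Output File Format options are not modeled (the body is copied as raw bytes).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the plugin's implementation. A local Path stands
// in for the HDFS file path, and the SSL validation option is not modeled.
class HttpFetchSketch {

    static void fetchToFile(String url, Path target, boolean followRedirects,
                            int numRetries, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        if (numRetries < 0) {
            throw new IllegalArgumentException("numRetries must be >= 0");
        }
        IOException lastError = null;
        // Assumption: one initial attempt plus up to numRetries retries.
        for (int attempt = 0; attempt <= numRetries; attempt++) {
            HttpURLConnection conn = null;
            try {
                conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
                conn.setRequestMethod("GET");
                conn.setInstanceFollowRedirects(followRedirects);
                conn.setConnectTimeout(connectTimeoutMs); // 0 waits indefinitely
                conn.setReadTimeout(readTimeoutMs);       // 0 waits indefinitely
                int status = conn.getResponseCode();
                // Assumption: any status outside 2xx counts as a failed request.
                if (status < 200 || status >= 300) {
                    throw new IOException("HTTP status " + status + " from " + url);
                }
                try (InputStream in = conn.getInputStream()) {
                    // Raw bytes are copied as-is; an existing file is overwritten.
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                return;
            } catch (IOException e) {
                lastError = e; // remember the failure and try again
            } finally {
                if (conn != null) {
                    conn.disconnect();
                }
            }
        }
        throw lastError;
    }
}
```

A call for this example might look like `fetchToFile("http://example.com/data", Path.of("/tmp/data.csv"), true, 3, 60000, 60000)`; the redirect, retry, and timeout arguments there are the documented defaults, not values taken from the example table.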
Created in 2020 by Google Inc.