APEXMALHAR-2427 Kinesis Input Operator documentation.#564
APEXMALHAR-2427 Kinesis Input Operator documentation.#564deepak-narkhede wants to merge 1 commit intoapache:masterfrom
Conversation
|
@chaithu14 Please review. |
a12bd87 to
039bead
Compare
|
|
||
| ### Pre-requisites | ||
|
|
||
| This operator uses the AWS kinesis java sdk version 1.9.10. |
There was a problem hiding this comment.
Is it a pre-requisite ? You may remove the "pre-requisites" and add this information after Maven Aritifiact
| ### Introduction | ||
|
|
||
| The kinesis input operator consumes data from the partitions of a Kinesis shard(s) for processing in Apex. | ||
| The operator is fault-tolerant, scalable and supports input from multiple shards of AWS kinesis streams in a single operator instance. |
There was a problem hiding this comment.
A suggestion : you may highlight "AWS Kinesis Streams" and provide link to https://aws.amazon.com/kinesis/streams/
|  | ||
|
|
||
| #### Configuration properties | ||
|
|
There was a problem hiding this comment.
Add a table with columns "Parameter" , Type, Description, Mandatory , Default Value etc
This will make it readable and consistent with other operator documentation. You may refer to KafkaInputOperator documentation
| #### Abstract Methods | ||
|
|
||
| `T getTuple(Record record)`: Converts the Kinesis Stream record(s) to tuple. | ||
| `void emitTuple(Pair<String, Record> data)`: This method emits tuples extracted from AWS kinesis streams. |
There was a problem hiding this comment.
This should be on another line. Do verify it by viewing the doc in mkdocs on chrome
|
|
||
| - ***streamName*** - String[] | ||
| - Mandatory Parameter. | ||
| - Specifies the name of the stream from where the records to be accessed. |
There was a problem hiding this comment.
records are accessed / records are to be accessed
| - ***strategy*** - PartitionStrategy | ||
| - Operator supports two types of partitioning strategies, `ONE_TO_ONE` and `MANY_TO_ONE`. | ||
|
|
||
| `ONE_TO_ONE`: If this is enabled, the AppMaster creates one input operator instance per kinesis shard partition. So the number of shard partitions equals the number of operator instances. |
There was a problem hiding this comment.
number of operator instances equals the number of shard partitions
| Default Value = 30 Seconds. | ||
|
|
||
| - ***repartitionCheckInterval*** - Long | ||
| - Interval specified in milliseconds. This value specifies the minimum interval between two stat checks. |
| Default value = 1024. | ||
|
|
||
| - ***windowDataManager*** - WindowDataManager | ||
| - If set to a value other than the default, such as `FSWindowDataManager`, specifies that the operator will process the same set of messages in a window before and after a failure. This is important but it comes with higher cost because at the end of each window the operator needs to persist some state with respect to that window. |
There was a problem hiding this comment.
This could be rephrased. The description should convey following .
- What is the purpose of this property ?
- How does the default work ?
- What are my choices if i want to specify any other windowDataManager and how each of those choices will affect the behaviour?
| returned by processStats(...) method, the application master invokes | ||
| definePartitions(...) on the logical instance which is also the | ||
| partitioner instance. Dynamic partition can be disabled by setting the | ||
| parameter repartitionInterval value to a negative value. |
There was a problem hiding this comment.
highlight repartitionInterval
| } | ||
| } | ||
| ``` | ||
| Below is the configuration for “KINESIS-STREAM-NAME” Kinesis stream name: |
There was a problem hiding this comment.
Did not quite understand the meaning of this line. Are you referring to configuration for the app ?
No description provided.