
feature: Read input from an argv argument instead of stdin using --input flag #3293

@etosan

Description


The problem: no way to read input from an argument

I am again dealing with a lot of jq-ing and I keep running into this issue:

I want to send data directly into jq as an argv[] argument rather than via the standard input descriptor. But so far (and please correct me if I am wrong) this seems to be impossible.

While some more regular/traditional jq users might oppose the idea, it would be an extremely powerful feature that would come in handy in many specialty situations: subshells, parallel, xargs, execline, socat, and countless other such cases.

More than five years ago I requested something like this and was redirected to --arg and --argjson. While I was thankful, and while those are infinitely useful, they are not the same thing! I had not been jq-ing much since then, but I am once again, and the inability to do this is killing me.
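For the record, the closest approximation I know of today combines -n (null input) with --argjson, but then the whole program has to dereference a variable instead of plain ., which is exactly why it is not the same thing:

  # existing workaround: parse the argument into $d and suppress stdin with -n
  JSON_DATA='{"somefield":"value"}'
  jq -rn --argjson d "$JSON_DATA" '$d.somefield'   # prints: value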

For example: more modern GNU xargs versions have grown support for -o, --open-tty. This lets the xargs process consume data from its own /dev/stdin, yet for each record/line-handling "execution" ([command], -exec {} ;) xargs rebinds the child's /dev/stdin to its original controlling terminal. This allows you to process entries from a file/pipe as usual, while each record "handler" can still communicate with the user on the tty (for example, for password entry). It is nigh impossible to use jq efficiently in this setup without mucking around with subshell idioms like X="$(echo "${json_data}" | jq -r '.somefield')", which also requires spawning sh -c for each xargs "record" just to be able to do the subshelling.
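A sketch of what that looks like today, assuming a hypothetical list-hosts producer that emits NUL-delimited JSON records; ssh stands in for any handler that needs the tty:

  list-hosts \
    | xargs -0 --open-tty -I'%j' \
        sh -c "ssh \$(echo '%j' | jq -r '.host')"   # ssh can still prompt on the tty
  # note: same fragile sh -c quoting as the idiom criticized above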

For example: when using JSON as a "binary-safer" (and structured) string-processing format, especially in shell scripts, which is a very convenient and powerful ability, one often ends up mucking around with VAR="$(echo "${JSON_DATA}" | jq -r '.somefield')" again, just to extract the value of .somefield from a specific ${JSON_DATA}. Despite various modern shell optimizations, this can sometimes (in certain setups) spawn three sub-processes, a subshell, echo, and jq(!), and construct a pipeline as well. All just to "lift" a single field (or field chain) from the input JSON.
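To make the cost difference concrete, here is that idiom next to what the proposed (still hypothetical) flag would allow:

  # today: a subshell, an echo, and jq, plus a pipe, for one field
  VAR="$(echo "${JSON_DATA}" | jq -r '.somefield')"

  # with the proposed --input: a single jq process, stdin left alone
  VAR="$(jq --input "${JSON_DATA}" -r '.somefield')"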

For example: in the execline language (a case very similar to socat), one would benefit greatly from the ability to access fields of structured input data directly, making these tools much more powerful. But because jq cannot read its input from an argument, one has to wrap the "input sending part" in a pipeline command (in the execline case) or in sh -c (in the socat case) just to get access to the fields.

In a nutshell, this feature would come in incredibly handy for ad-hoc API exploration and quick one-off jobs that iterate over larger datasets using any OS-level iterators or executors that fork a child, where the user would also benefit from /dev/stdin being left alone, or left open for other uses.

While some might argue that for such jobs one should use something like Python, that language is not concise enough to cut through the large swaths of data being pumped through command lines and pipelines, especially ad hoc. The jq language, on the other hand, is sufficiently terse and syntax-efficient for exactly this kind of work.

Suggested solution

Thus I propose introducing an --input / -i option that would take the next string argument as input and make jq consume it verbatim as its input buffer, preferably skipping /dev/stdin handling entirely. Whether --input should exist in argv[] as a singleton, similarly to the "jq program" argument, is probably best left to the jq maintainers to decide. But to maintain parity with the "jq program" argument handling, and to decrease implementation complexity, I suggest the singleton approach: exactly one --input allowed, i.e. jq reads either from stdin or from the --input argument, never both.
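Under the singleton reading, invocations would behave roughly like this (hypothetical semantics, sketched only for illustration):

  jq --input '{"a":1}' '.a'                    # => 1, /dev/stdin never consulted
  jq --input '{"a":1}' --input '{"a":2}' '.'   # => usage error, --input given twice
  echo '{"a":1}' | jq --input '{"a":2}' '.a'   # => 2, stdin ignored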

Usage example

This is a little bit contrived, but I hope it illustrates the point well, so please bear with me.

Let's say one needs to perform some specific ad-hoc action for each container managed by cri-o on a k8s node. With --input I can get the .name field for each record directly (as if I were using the Unix-native cut(1)):

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' jq --input '%j' -r '"name:" + .name'

Annotation (careful, invalid shell code!):

  # gets JSON data from some data producer
  crictl ps -o json

  # "slice" and massage the dataset for our needs, ie select specific fields
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]'

  # now apply resulting fields as "named columns" in "subcomand"
  # - we can "reference" fields directly from argv
  | xargs -0 -I'%j' jq --input '%j' -r '"name:" + .name' 

Without --input, it has to be done like this instead:

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' \
       sh -c "printf 'name:%s\n' \$(echo '%j' | jq -r '.name')"

Observe that in the first case the %j "variable" is just a raw string. The "expansion" is handled implicitly by xargs: it searches for the literal string '%j' in its own argument vector and copy-pastes the replacement into its child's argument vector, jq --input '%j' -r '"name:" + .name', i.e. the jq subprocess literally becomes:

From:

['jq', '--input', '%j', '-r', '"name:" + .name' ]

to

['jq', '--input', '{"name":"kube-proxy","id":"c31ef8zzssddrrtyt"}', '-r', '"name": + .name' ]

after each "line expansion", at the execve level.

Combined with -0, this makes such executions very safe, with no worry that an in-between shell will somehow mangle the data. And we are not even talking about the reduction in the number of sub-forks, pipes, file descriptors, etc.

Because the maximum length of each argv element is quite big these days, this allows one to do expansions like these:

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' \
       sh -c "cmd-do-something-cmd-somewhere --name \$(jq -i '%j' -r '.name') --id \$(jq -i '%j' -r '.id')"

Here we save one echo and two file descriptors (per pipeline) for each JSON data field dereference. If we want to be extra explicit:

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'%j' \
       sh -c "exec cmd-do-something-cmd-somewhere --name \$(exec jq -i '%j' -r '.name') --id \$(exec jq -i '%j' -r '.id')"

But without the --input provision, the most concise form I have arrived at is this (abusing inline shell functions, which removes a lot of safety):

crictl ps -o json \
  | jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
    | xargs -0 -I'$j' \
       sh -c "j(){ jq -r \$@;};d(){ echo '\$j';}; cmd-do-something-cmd-somewhere --name \$(d|j '.name') --id \$(d|j '.id')"

While this might seem more compact (because of the shell "hacks"), it is a lot worse from the standpoint of both execution complexity and string safety.

I believe you can infer much more advanced, even nested, usage from here, especially when taking into account more complex jq programs (loaded from files) for the initial jq "selector" part of the pipeline (the jq --raw-output0 -c '[.contain... part).

I hope what I wrote makes sense, and that it will make you consider this feature.
