The problem: No way to read input from argument
I am again dealing with a lot of jq-ing and I keep running into this issue: I want to send data directly into jq via an argv[] argument, not via the standard input descriptor. But so far, and please correct me if I am wrong, this seems to be impossible.
While some more regular/traditional jq users might oppose the idea, it would be an extremely powerful feature that would come in handy in many specialty situations: a subshell, parallel, xargs, execline, socat, or countless other such specialty cases.
More than five years ago I requested something like this and was redirected to --arg and --argjson. While I was thankful, and while those are infinitely useful, they are not the same thing! I haven't been jq-ing that much since then, but now I am once again, and the inability to do this is killing me.
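For context, the closest approximation available today binds the JSON with --argjson and suppresses stdin with -n, but the program must then reference the bound variable instead of operating on . directly (a minimal sketch with made-up sample data):
# works today, but the filter must address $j rather than the usual "."
jq -n --argjson j '{"name":"kube-proxy"}' -r '$j.name'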
For example: more modern GNU xargs versions have grown support for -o, --open-tty. This makes the xargs process able to consume data from its own /dev/stdin, but still, for each record/line handling "execution", xargs rebinds the /dev/stdin of the child ([command] ... {}) to its original controlling terminal again. This allows you to process entries from a file/pipe as usual, yet each record "handler" can still communicate with the user on the tty (for example, for password entry). It is nigh impossible to use jq efficiently in this setup without mucking around with subshelling idioms like X="$(echo "${json_data}" | jq -r '.somefield')". This also requires one to spawn sh -c for each xargs "record", just to be able to do the subshelling.
For example: when using JSON as a "binary-safer" (and structured) string-processing format, especially in shell scripts, which is a very convenient and powerful ability, one often ends up mucking around with
VAR="$(echo "${JSON_DATA}" | jq -r '.somefield')"
again, just to extract the value of .somefield from a specific ${JSON_DATA}. Similarly, despite various modern shell optimizations, this can sometimes (and in certain setups) spawn three sub-processes: a subshell, echo, and jq (!), and it also constructs a pipeline. All just to "lift" a single field (or field chain) from the input JSON.
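With the proposed flag, the same extraction would need neither the echo nor the pipe, only the jq fork itself (a sketch, since --input does not exist yet):
# one fork, no pipeline construction, no stdin plumbing
VAR="$(jq --input "${JSON_DATA}" -r '.somefield')"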
For example: in the execline language (and the socat case is very similar) one would benefit greatly from the ability to access fields of structured input data directly, which would make these tools much more powerful. But because jq cannot read its input from an argument, one has to wrap the "input sending part" into the pipeline command in the case of execline, or into sh -c in the case of socat, to get access to the fields, again.
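To illustrate the execline case (a sketch, assuming execlineb is installed; the sample data is made up):
# without --input, the producer has to be wrapped in execline's
# `pipeline` command just to feed a known string to jq
execlineb -c 'pipeline { echo "{ \"somefield\": 42 }" } jq -r .somefield'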
In a nutshell, this feature would come in incredibly handy in ad-hoc API explorations and quick one-off jobs that iterate over larger datasets using any OS-level iterators or executors that fork a child, where the user would also benefit from /dev/stdin being left alone, or left open for other uses.
While some might argue that for such jobs one should use something like python, that language is not concise enough to cut through large swaths of data being pumped through command lines and pipelines, especially ad hoc. The jq language, on the other hand, is sufficiently terse and syntax-efficient for exactly that kind of work.
Suggested solution
Thus I propose the introduction of an --input / -i argument that would take the next string argument as input and make jq consume it verbatim as its input buffer, preferably ignoring /dev/stdin handling completely. Whether --input should exist in argv[] as a singleton, similarly to the "jq program" argument, is probably best left to the jq maintainers to decide. But to maintain parity with the "jq program" argument handling, and to decrease implementation complexity, I suggest the singleton approach, i.e. exactly one --input allowed: jq would read either from stdin or from the --input argument.
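In other words, the proposed semantics would be (a sketch; neither --input nor -i exists in jq today):
# proposed: the input comes from argv, stdin is never touched
jq --input '{"name":"kube-proxy"}' -r '.name'
# the exact equivalent today costs a pipe and an extra process
echo '{"name":"kube-proxy"}' | jq -r '.name'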
Usage example
This is a little bit contrived, but I hope it illustrates the point well, so please bear with me.
Let's say one needs to perform some specific ad-hoc action for each container managed by cri-o on a k8s node. With --input I could get the .name field of each record directly (as if I were using the Unix-native cut(1)):
crictl ps -o json \
| jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
| xargs -0 -I'%j' jq --input '%j' -r '"name:" + .name'
Annotation (careful, invalid shell code!):
# gets JSON data from some data producer
crictl ps -o json
# "slice" and massage the dataset for our needs, ie select specific fields
| jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]'
# now apply the resulting fields as "named columns" in the subcommand
# - we can "reference" fields directly from argv
| xargs -0 -I'%j' jq --input '%j' -r '"name:" + .name'
Without --input, one has to do this instead:
crictl ps -o json \
| jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
| xargs -0 -I'%j' \
sh -c "printf 'name:%s\n' \$(echo '%j' | jq -r '.name')"
Observe that in the first case, the %j "variable" is just a raw string. The "expansion" is handled by xargs implicitly: it searches for the literal string %j in its own argument vector and simply copy-pastes each replacement into its child's argument vector (jq --input '%j' -r '"name:" + .name'), i.e. the jq subprocess argv literally goes:
From:
['jq', '--input', '%j', '-r', '"name:" + .name']
to
['jq', '--input', '{"name":"kube-proxy","id":"c31ef8zzssddrrtyt"}', '-r', '"name": + .name' ]
after each "line expansion", at the execve
level.
When combined with -0, this makes such executions very safe, with no worry that an in-between shell will somehow mangle the data. And we are not even talking about the reduction in the number of sub-forks, pipes, file descriptors, etc.
Because the maximum length of a single argv element is quite big these days, this allows one to do expansions like these:
crictl ps -o json \
| jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
| xargs -0 -I'%j' \
sh -c "cmd-do-something-cmd-somewhere --name \$(jq -i '%j' -r '.name') --id \$(jq -i '%j' -r '.id')"
Here we save one echo and two file descriptors (one pipeline) per JSON data field dereference. If we want to be extra explicit:
crictl ps -o json \
| jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
| xargs -0 -I'%j' \
sh -c "exec cmd-do-something-cmd-somewhere --name \$(exec jq -i '%j' -r '.name') --id \$(exec jq -i '%j' -r '.id')"
But without the --input provision, the most concise form I got to is this (abusing inline shell functions, which removes a lot of safety):
crictl ps -o json \
| jq --raw-output0 -c '[.containers[]|{name:.metadata.name,id:.id}]|.[]' \
| xargs -0 -I'$j' \
sh -c "j(){ jq -r \$@;};d(){ echo '\$j';}; cmd-do-something-cmd-somewhere --name \$(d|j '.name') --id \$(d|j '.id')"
While this might seem more compact (because of the shell "hacks"), it is a lot worse from the standpoint of both execution complexity and string safety.
I believe you can infer much more advanced and even nested usage from here, especially when taking into account more complex jq programs (loaded from files) for the initial jq "selector" part of the pipeline (the jq --raw-output0 -c '[.contain... part).
I hope what I wrote makes sense, and will make you consider this feature.