Skip to content

Conversation

@niklasmohrin
Copy link
Contributor

@niklasmohrin niklasmohrin commented Sep 24, 2020

Fixes #1193

TODO:

  • Add something like cfg(target = linux) where needed
  • Find out how to do it on Windows
  • Double check all calls to winapi::*
  • More research on GetFileInformationByHandle vs GetFinalPathNameByHandleW (+ Remove the TODO in the function)
  • Check for possible error of nixs fstat
  • Use error handler instead of just eprintln!
  • Add tests?

@niklasmohrin
Copy link
Contributor Author

niklasmohrin commented Sep 24, 2020

Right now, this is just sketched up based on @sharkdp's proposal in the issue thread and has some unwraps and stuff everywhere. I encountered some questions so far:

  1. Do you want the diff of Cargo.lock? I saw you have the file, so I just included it for now, but I can remove it again if you want.
  2. Is there any sense in trying to find out the stats of an InputKind::CustomReader(Box<dyn Read>)? If so, what is it?
  3. I think there is nothing that can be done to detect something like bat a b c | less > b. This is a somewhat stupid case, but it would still result in the infinite loop.
  4. Do you know a good way to detect this on Windows? I did not do anything Windows related yet and don't know what can and cannot be done (I am going to try to reproduce the bug on my Windows machine in a couple of days, maybe it is not even a problem there).
  5. Although cat detects these infinite loops, it still behaves strangely. Consider this:
$ cat --version
cat (GNU coreutils) 8.32
...
$ echo A > a
$ echo B > b
$ cat a b > a
$ cat a
$

$ echo A > a
$ echo B > b
$ cat a b > b
cat: b: input file is output file
$ cat b
A
$

In my opinion, it would be best if no files are changed if a loop is detected, but that would mean to be inconsistent with the behavior of cat and also writing out the contents of the file again (since it is cleared when redirecting stdout to it, even if there is no output). What is the right thing to do here?

  1. ... (to be continued some other time maybe)

@niklasmohrin niklasmohrin force-pushed the detect-same-files branch 2 times, most recently from e6747d9 to f4e0d05 Compare September 24, 2020 21:29
@niklasmohrin
Copy link
Contributor Author

Okay, I found more weird stuff. The files d and e are both 91 lines long.

Behavior on both version 0.15.4 (my normal installation) and the master branch (e4df564):

  • The command bat d e > d only writes the content of e to d (not both files, contents of d are missing).
  • The command bat d e > e is an infinite loop, repeatedly writing the contents of d into e, the file d is untouched.

@niklasmohrin
Copy link
Contributor Author

niklasmohrin commented Sep 28, 2020

I did some testing on my windows machine. Lines with $ are from WSL and lines with (PS) are from MS Powershell.

$ echo A > a
$ echo B > b
$ echo C > c
(PS) .\target\debug\bat.exe a b c > abc
(PS) .\target\debug\bat.exe a b c > b
$ xxd abc
00000000: fffe 4100 0d00 0a00 4200 0d00 0a00 4300  ..A.....B.....C.
00000010: 0d00 0a00
$ xxd b
00000000: fffe 4100 0d00 0a00 4100 0d00 0a00 430a  ..A.....A.....C.
00000010: 0d00 0a00

As you can see, first the file a is written into b, then the file b (which contains the contents of a at this point) is written and then c. Therefore, the original content of b is lost and the content a is written twice, instead of the the expected behavior. However, it is not stuck in a loop, so that's good 🤔.


I was a bit confused about the other stuff: The trailing \n was automaticlly replaced with \r\n and the Unicode Special FFFE was added to the front. According to Wikipedia (see introduction, after the list), this signals the endianness of the file, so I guess this is good to have here. Still, I am not sure whether bat should add bytes in any way, since cat never does (?) on Linux.

As a site note: When concatenating multiple files that were created on Windows (therefore having the leading FFFE), the FFFE is also in the middle of the file:

[Powershell]
$ echo A > a
$ echo B > b
$ echo C > c
$ .\target\debug\bat.exe a b c > abc
[Linux]
$ xxd a
00000000: fffe 4100 0d00 0a00                      ..A.....
$ xxd abc
00000000: fffe 4100 0d00 0a00 fffe 4200 0d00 0a00  ..A.......B.....
00000010: fffe 4300 0d00 0a00                      ..C.....

I don't know whether this is a problem, but according to the Wikipedia article from above, these chararcters are never valid. In VS Code, this file looks like this:
image

The Powershell tool Get-Content (what cat is aliased to by default) produces the file without the FFFE in beetween:

(PS) cat a, b, c > abc
$ xxd abc
00000000: fffe 4100 0d00 0a00 4200 0d00 0a00 4300  ..A.....B.....C.
00000010: 0d00 0a00                                ....

Conclusion

Running bat a b c > b on Windows is not behaving properly, but does not end up in an infinite loop. I will still do some research, how to detect this.

@sharkdp I am not sure what to do with the rest of my findings. Should I open a tracking issue for the weird behaviors, or is none of this really a problem?

@niklasmohrin
Copy link
Contributor Author

I spent some time with the Windows implementation. Firstly, I found that it is impossible to find out to which file stdout is being redirected to in PowerShell, since > is effectively an alias to | Out-File . Thus, stdout is a handle to a pipe, not to a file. See https://stackoverflow.com/questions/64144883/finding-out-where-powershell-stdout-is-redirected-to-as-opposed-to-cmd-exe

In the command prompt however, it works as aspected and all direct conflicts are detectable (bat a b c > b or bat a - c > b < b, ...). Like on Linux, something like bat a b c | less > b is not detectable.

The code has some unsafe blocks to call into the Windows APIs, I consulted the MSDN for all calls and made sure that everything is safe. There are still some TODOs, but they are minor.

Could someone take an early look at the code, even though I don't mark the PR as ready yet @sharkdp @eth-p? I am going to mark it ready when everything is settled and I squashed the commits into useful snippets.

@TheHardew
Copy link

TheHardew commented Oct 3, 2020

Okay, I found more weird stuff. The files d and e are both 91 lines long.

Behavior on both version 0.15.4 (my normal installation) and the master branch (e4df564):

  • The command bat d e > d only writes the content of e to d (not both files, contents of d are missing).
  • The command bat d e > e is an infinite loop, repeatedly writing the contents of d into e, the file d is untouched.

When a shell (at least most shells on linux) sees a redirection ('>' redirection, not '>>', which only appends to the file), it first clears the file that the output is being redirected to.

bat d e > d clears d, then bat starts, it reads d, finds nothing and it can move to reading e. It reads e, prints it to stdout, the shell redirects it do d and the execution stops.

On my linux machine bat d e >> d also created an infinite loop, as expected.

When bat d e > e is invoked, bat reads d, shell redirects it to e, bat now reads e, which has contents of d in it, shell puts it in e, so now e is d concatenated with itself, bat sees that e is not finished, it continues reading e, which is 2 * d, but bat has already read half of it, so it prints only another copy of d, shell redirects it again and now e is 3 * d, an infinite loop is created. There is really no reason to ever do bat d e > e, I discovered this bug by chance because I forgot I should have printed to a temporary file first or used sponge from moreutils, bat d e | sponge e. Still, it can fill up the whole disk, so it'd be good to prevent that from happening, if it's possible. Also, bufferring can cause some issues, like with the bat d e | less > e example, where and infinite loop is only created, if e is sufficently long.

I'm not sure why bat.exe a b c > b doesn't create an infinite loop on windows though, it does for me on linux. Maybe that's also because of buffering? Maybe bat reads the whole file before it is being written to by the shell, it sees the EOF, quits and after that shell outputs it to b, but if a was really big, an infinite loop would be created? Or maybe windows is inefficent and requires bat to quit before it starts writing to b, and if a was gigabytes long swap would be used or maybe even an out of memory crash would occur?

@niklasmohrin
Copy link
Contributor Author

Alright, I tidied up everything and rebased. Like I said, I am gonna squash the commits into 2 or 3 once reviews are done.

@TheHardew Yup, that explanation checks out, nice 👍

For the remaining TODOs from the first post, I could use some input from you guys.

@niklasmohrin niklasmohrin marked this pull request as ready for review October 11, 2020 21:28
@sharkdp
Copy link
Owner

sharkdp commented Oct 29, 2020

@niklasmohrin Thank you very much for your work on this! I haven't forgotten it, but didn't have the time to dig into this myself, so far.

There is quite a lot if platform-specific (and unsafe) code here which might be worth moving to a standalone crate. What do you think?

@niklasmohrin
Copy link
Contributor Author

@sharkdp I guess I could move this code to another repo and publish it on crates.io. By the way, I only recently saw that there is std::os::...::MetadataExt for unix and windows, but a) it is unstable on windows (at least the attributes that we need here) and b) I couldn't see a way to get a Metadata from stdin or stdout without using unsafe to convert fds to Rust Files. Therefore, I don't think it is an appropriate solution here, however it might become one in the future.

Should I start a crate on my personal account or should it be hosted in this repo?

@sharkdp
Copy link
Owner

sharkdp commented Oct 29, 2020

Should I start a crate on my personal account or should it be hosted in this repo?

If you would be up to that, I think that would be great. It sounds like something that could be used in other tools as well.

But please note that I haven't reviewed the changes here so far, so I cannot "promise" that we are actually going to integrate that new crate.

@niklasmohrin niklasmohrin marked this pull request as draft October 31, 2020 17:37
@niklasmohrin
Copy link
Contributor Author

Alright, I moved all the code into a crate on my profile, clircle. This makes the diff here pretty short, let me know what you think!

PS: Also tell me if you have suggestions for the clircle crate or any complaints about it

@niklasmohrin niklasmohrin marked this pull request as ready for review November 1, 2020 21:47
@niklasmohrin niklasmohrin marked this pull request as draft November 1, 2020 22:34
@niklasmohrin
Copy link
Contributor Author

Now everything is cross-platform and Rust 1.40 safe, please go ahead 😄

@niklasmohrin niklasmohrin marked this pull request as ready for review November 1, 2020 23:07
@sharkdp
Copy link
Owner

sharkdp commented Nov 6, 2020

Awesome! Thank you very much for creating a crate for this and updating the PR!

Before we integrate this, I would like to see a new integration test that checks the basic functionality (and also ensures that it works cross-platform). This could be added to tests/integration_tests.rs with a test similar to the other ´.failure()` tests, e.g.:

#[test]
fn fail_non_existing() {
    bat().arg("non-existing-file").assert().failure();
}

I hope that this would be possible with assert_cmd. There is Command::pipe_stdin to pipe the input from a file. But I'm not sure if we can set stdout to a file somehow.

@niklasmohrin
Copy link
Contributor Author

It doesn't look like assert_cmd can do stdout, but I think I could use assert_cmd::cargo::CommandCargoExt::cargo_bin to get a Command from the standard library. Then, I could do something like in the end at https://stackoverflow.com/questions/42726095/how-to-implement-redirection-of-stdout-of-a-child-process-to-a-file to redirect to a file and simulate bat a b c > b.

@sharkdp
Copy link
Owner

sharkdp commented Nov 7, 2020

Sounds great - thank you for looking into this.

@niklasmohrin
Copy link
Contributor Author

Alright, I released another minor version of clircle to sync up with the main branch of the repo, bumped the version here, added the tests and rebased onto master to squash everything into two commits. This should be ready @sharkdp 🎉

@sharkdp
Copy link
Owner

sharkdp commented Nov 15, 2020

@niklasmohrin Thank you very much!

I just tested this locally, and it seems like the output file will still be overwritten - even if bat exits with an error:

▶ echo a > a
▶ echo b > b
▶ bat a b > b
[bat error]: The output file is also an input!
▶ bat b
───────┬─────────────────────────────────────────
       │ File: b   <EMPTY>
───────┴─────────────────────────────────────────

I'm not sure if this has to be fixed (after all, the user definitely asked for b to be overwritten). What do you think?

@niklasmohrin
Copy link
Contributor Author

@sharkdp Yeah, that cannot be mitigated, because the file is cleared before bat even starts if you redirect stdout to the file (see the above comment from @TheHardew). It is also not possible to have bat recover the overwritten contents as I imagined earlier. The good news is, that cat cannot do this either, it just leaves everything that it has written so far in the file, but without the original content.

Overall, there is nothing that can be done to save the contents of the overwritten file when redirecting. We could introduce a -o flag that only opens the file when everything is fine and recommend that over redirection, but I think this is out of the scope of this issue.

@sharkdp
Copy link
Owner

sharkdp commented Nov 23, 2020

Yeah, that cannot be mitigated, because the file is cleared before bat even starts if you redirect stdout to the file (see the above comment from @TheHardew).

Oh, right. Of course. Thanks for the explanation!

Overall, there is nothing that can be done to save the contents of the overwritten file when redirecting. We could introduce a -o flag that only opens the file when everything is fine and recommend that over redirection, but I think this is out of the scope of this issue.

Yes, definitely. I think this new functionality here is already a great improvement.

@sharkdp sharkdp merged commit 4f0116b into sharkdp:master Nov 23, 2020
@sharkdp
Copy link
Owner

sharkdp commented Nov 23, 2020

@niklasmohrin Awesome work on this PR. Thank you very much for the dedication!

@niklasmohrin niklasmohrin mentioned this pull request Nov 24, 2020
@niklasmohrin niklasmohrin deleted the detect-same-files branch January 14, 2021 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Infinite loop when "input file is output file"

3 participants