Skip to content

Conversation

@RaymiiOrg
Copy link
Collaborator

@RaymiiOrg RaymiiOrg commented Nov 6, 2020

I saw the bug (#4) which seems simple enough for me to look at. I choose to assume default values (for name and autocreate size) if none were given, I hope you agree with that (the bug itself has little context, just, subject says enough). I also used std::filesystem and an std::fstream because I'm used to those (I haven't worked with FILE* much), plus split up the code a bit in separate small functions.

Default filename like: pci0.15(ali_ide).disk0.0(file).default.img

Output when nothing is specified for the disk:

pci0.15(ali_ide).disk0.0(file): Disk has no filename attached! Assuming default: pci0.15(ali_ide).disk0.0(file).default.img
pci0.15(ali_ide).disk0.0(file): Could not open diskfile pci0.15(ali_ide).disk0.0(file).default.img
File pci0.15(ali_ide).disk0.0(file).default.img does not exist and no autocreate_size set. Assuming default size: 2000MB
pci0.15(ali_ide).disk0.0(file) 1907MB file pci0.15(ali_ide).disk0.0(file).default.img created
pci0.15(ali_ide).disk0.0(file): Mounted file pci0.15(ali_ide).disk0.0(file).default.img, 3906250 512-byte blocks, 15625/10/25.

If it's a CD drive, autocreation is not done and we just exit (no segfault).

I also considered a sort of configuration file validator before starting the emulator, but that would require quite a bit more work. And, then we'd have a validated config file but in all code would validate again, that seems double work ...

Let me know what you think and if you like the default value idea. My reasoning was that in the config file there already is a disk section, so the user does seem to want a disk, lets do a default that is good enough to install OpenVMS on.

@lenticularis39
Copy link
Owner

lenticularis39 commented Nov 6, 2020

Thank you for submitting this.

  • I like the idea of assigning a default disk image filename, but I believe this is not a good idea for the autocreate size. The fact that the user specified an non-existing file doesn't have to mean they want to create a new one, it may mean there is a typo in the filename, and in that case AXPbox shouldn't create a new file - it should rather exit with an error. Also consider a situation where disk space is limited - the partition being filled with the disk image to 100 % would not be a pleasant surprise for the user.
  • Quitting the application by calling std::exit is definitely not good - even when startup fails cleanup should be done. I'd rather keep the old way of error handling through an exception (see below).
  • I would prefer not introducing experimental/C++17 features into the code without a strong reason, since it can make portability much worse (consider a system unsupported by GCC). I understand you being more comfortable with working with what you know, especially in your first code contribution to a new project, so you can definitely continue using whatever API you want and I'll rewrite the code if needed.
  • There is a .clang_format file you can use with clang-format to format the code (specifically here isn't doesn't really matter, since the formatting is broken at a few more places outside of your changes, so there will be a style fixup commit, but in the future all PRs should conform to the style).

Also this doesn't solve #4, only implements a workaround preventing the situation where the bug triggers from happening in most cases (try setting the disk image filename to an unreadable file - you get the same segfault). The cause of the segfault is probably the cleanup trying to free the disk object, which wasn't created because of the exception being triggered inside its construct (definitely not the exception itself).

Edit: Sorry, I used the wrong executable for the debugger - the segfault in your code after setting the disk file to an unreadable file is a different one:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7865f13 in fseeko64 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7865f13 in fseeko64 () from /lib64/libc.so.6
#1  0x00005555555a06cd in CDiskFile::CDiskFile (this=0x7fffe69c1010, cfg=<optimized out>, sys=<optimized out>, c=<optimized out>, 
    idebus=<optimized out>, idedev=<optimized out>) at /data/dev/axpbox/src/DiskFile.cpp:154
#2  0x0000555555595415 in CConfigurator::initialize (this=0x555555645030) at /data/dev/axpbox/src/Configurator.h:146
#3  0x0000555555595068 in CConfigurator::initialize (this=0x555555644bc0) at /data/dev/axpbox/src/Configurator.cpp:918
#4  0x0000555555595068 in CConfigurator::initialize (this=0x555555643250) at /data/dev/axpbox/src/Configurator.cpp:918
#5  0x00005555555968e9 in CConfigurator::CConfigurator (this=<optimized out>, parent=0x0, name=<optimized out>, 
    value=<optimized out>, 
    text=0x555555642130 "gui=sdl{keyboard.map=\"keys.map\";keyboard.use_mapping=false;}sys0=tsunami{memory.bits=28;rom.srm=\"rom/cl67srmrom.exe\";rom.decompressed=\"rom/decompressed.rom\";rom.flash=\"rom/flash.rom\";rom.dpr=\"rom/dpr."..., textlen=622)
    at /data/dev/axpbox/src/Configurator.cpp:445
#6  0x000055555558e8b2 in main_sim (argc=<optimized out>, argv=<optimized out>) at /data/dev/axpbox/src/AlphaSim.cpp:313
#7  0x00005555555a4708 in main (argc=2, argv=0x7fffffffdcc0) at /data/dev/axpbox/src/Main.cpp:21

However this problem will still persist when using the exception instead of std::exit.

(the original segfault)

Program received signal SIGSEGV, Segmentation fault.
0x00005555555b7ee1 in CSystem::stop_threads (this=0x5555556489a0) at /data/dev/axpbox/src/System.cpp:2391
warning: Source file is more recent than executable.
2391	  for (i = 0; i < iNumCPUs; i++)
(gdb) bt
#0  0x00005555555b7ee1 in CSystem::stop_threads (this=0x5555556489a0) at /data/dev/axpbox/src/System.cpp:2391
#1  0x000055555558e975 in main_sim (argc=<optimized out>, argv=<optimized out>) at /data/dev/axpbox/src/AlphaSim.cpp:385
#2  0x00005555555a3e90 in main (argc=2, argv=0x7fffffffdcc0) at /data/dev/axpbox/src/Main.cpp:21

Let me know if you have any questions related to this - I'm aware it is a little bit confusing.

@lenticularis39
Copy link
Owner

And another comment - please rebase your branch instead of merging the main branch into it, the commit history will look better. (I'll include this and some other things into a guide for contributors.)

@RaymiiOrg
Copy link
Collaborator Author

First of all, thank you for taking the time to look at the pr and write such a detailed response. Helps me a lot and I appreciate it.

Ooh, merge vs. rebase, almost as big a holy war as vim/emacs. I'll undo that commit, so master isn't merged in.

After reading your response I realized indeed that the segfault wasn't caused by the FAILURE_ macro throwing an exception, my assumption was wrong. Thus most of the code I wrote was also incorrect in solving the issue since I didn't understood the cause of the issue.

I've updated the pr, removed the c++17 and std::filesystem, since I only wanted to use that to check if the file exists. stat is not cross platform, but fstream's are, looking at .good()in this case if fine enough.

What are the platforms axpbox targets? compiler wise, gcc, clang, msvc I assume, OS wise Windows, Linux and OS X where-ever those compilers run? Or are *BSDs included as well? C++ 11?

I also moved the clang format file to the root of the project, my IDE (clion) otherwise doesn't want to use it.

(I'm used to a linux/nucleus and GCC specific codebase which recently we've upgraded to support C++ 17, no cross platform required, mostly business logic, might explain some of my code)

I'm now working on a correct solution for the segfault.

@RaymiiOrg RaymiiOrg force-pushed the segfault_on_nonexisting_disk_image branch from 11c40df to e387224 Compare November 6, 2020 19:32
R. van Elst added 3 commits November 6, 2020 21:01
…s[] only

has 7 items. Delete-ing number 8 segfaults, which means that memory wasn't new-ed.
During construction of components, they are registered and counted in a counter.

I first tried to change the array to a std::vector, but that left a half constructed item,
failed as well. Adding an unregister function, and calling that in the destructor,
assures that the component that failed to construct (due to disk throwing an error inside its constructor)
is not used later on in other destructors.
@RaymiiOrg
Copy link
Collaborator Author

RaymiiOrg commented Nov 6, 2020

I think I've found the issue. During construction, the disk component throws an exception, which in main is handled as the last generic exception. There is a counter for the components, which is incremented in the base class. The component never constructs, so the destructor accesses invalid memory. I've added a counter decrement to the base class destructor.

In the image you see iNumComponents = 8, but the eight component in that array is not yet complete (it's blue instead of yellow).

afbeelding

Let me know what you think!

@lenticularis39
Copy link
Owner

First of all, thank you for taking the time to look at the pr and write such a detailed response. Helps me a lot and I appreciate it.

Ooh, merge vs. rebase, almost as big a holy war as vim/emacs. I'll undo that commit, so master isn't merged in.

I certainly prefer merge when adding changes from a PR, I only don't like merging the main branch into a working branch, since there would be multiple merges in the main branch after merging the working branch.

After reading your response I realized indeed that the segfault wasn't caused by the FAILURE_ macro throwing an exception, my assumption was wrong. Thus most of the code I wrote was also incorrect in solving the issue since I didn't understood the cause of the issue.

Yeah, this issue is a tricky one - if it was that easy I would have solved it already.

I've updated the pr, removed the c++17 and std::filesystem, since I only wanted to use that to check if the file exists. stat is not cross platform, but fstream's are, looking at .good()in this case if fine enough.

What are the platforms axpbox targets? compiler wise, gcc, clang, msvc I assume, OS wise Windows, Linux and OS X where-ever those compilers run? Or are *BSDs included as well? C++ 11?

es40 aimed to support a wide range of operating system, including Windows and OpenVMS. I currently aim to target Unix-like systems in general (which are targeted by POCO POSIX support), but in the future I may change that to whatever Qt Core 5 runs on (I'm currently working on this in the qtcore branch, but I may decide not to do this - also see #7). As of the C++ version - es40 is written in very generic C++, esentially using it only as "C with classes"; for AXPbox C++11 is fine (and in fact I would like to modify it to include C++11 features as unique_ptr and so on). It comes with the cost of dropping support of some platforms that don't have a C++11 compiler, but I consider this less important.

(Another fact is the version of CMake used doesn't build without C++11, so I can't build it for example on my Mac OS X Leopard machine, so one could say it's kind of required even now.)

I also moved the clang format file to the root of the project, my IDE (clion) otherwise doesn't want to use it.

Good idea - I didn't know that and set the CLion style manually.

(I'm used to a linux/nucleus and GCC specific codebase which recently we've upgraded to support C++ 17, no cross platform required, mostly business logic, might explain some of my code)

I'm now working on a correct solution for the segfault.

OK :)

@lenticularis39
Copy link
Owner

I think I've found the issue. During construction, the disk component throws an exception, which in main is handled as the last generic exception. There is a counter for the components, which is incremented in the base class. The component never constructs, so the destructor accesses invalid memory. I've added a counter decrement to the base class destructor.

In the image you see iNumComponents = 8, but the eight component in that array is not yet complete (it's blue instead of yellow).

Let me know what you think!

I also noticed that yesterday. The cause is the CSystemComponent parent class registering the component to the system (see CSystemComponent.cpp) in its constructor, and that is of course called before the CDiskFile constructor, so the component object is added, but never finalized since the exception is thrown.

@RaymiiOrg
Copy link
Collaborator Author

Yesterday I added some more commits to handle the issue, (unwinding in the correct order) but not sure if that's the correct way. The segfault for me at least doesn't occur.

I'll try to work the comments you made on platform support and development contributions into the wiki (or the GitHub document which is shown when doing a merge request). Should save you some work.

Copy link
Owner

@lenticularis39 lenticularis39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise this looks nice. I also like the fact you used C++ functions instead of C ones (as in es40) - in the future I would like to rewrite the entire code to be like that :)

src/System.cpp Outdated
}

void CSystem::UnregisterComponent(CSystemComponent *component) {
if (acComponents[iNumComponents] == component)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if unregistering a component this way is a good idea - the destructors may be called in a different order that the constructors.

Maybe the component storing its number and checking for that would help with this?

(I don't generally like the component managing its registration in CSystem, but that's another issue.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think comparing by memory address is less likely to cause race condition when components are destructed i another order. A counter would have to be atomic, and a for loop to compare all items would have the same crash behavior.

Catching the exception at creation and then handling it would be another solution but that would also require some refactoring of the component registration (it happens so early in the base class, it be better if it was registering after constructing for this bug).

I could try to just not set it to a nullptr because it will be deleted right after the call to stop threads? Or try how a std::atomic counter would perform? What has your preference?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - I missed that the component number couldn't be read from an uninitialized component. I take my comment back, you can keep the code as it is.

Copy link
Owner

@lenticularis39 lenticularis39 Nov 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try to just not set it to a nullptr because it will be deleted right after the call to stop threads? Or try how a std::atomic counter would perform? What has your preference?

I might be missing something, but doesn't the if before the nullptr setting check the wrong index? iNumComponents is increased in CSystem::RegisterComponent, so it should point to the next (empty) element, not to the last one. If that's the case then the setting to nullptr isn't needed, because it clearly works without it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, it checks the wrong component. The entire check can be removed, I'll do that right away.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's done, now it just decrements the counter.

@RaymiiOrg
Copy link
Collaborator Author

Otherwise this looks nice. I also like the fact you used C++ functions instead of C ones (as in es40) - in the future I would like to rewrite the entire code to be like that :)

Which specific c++ functions vs c functions do you refer to?

@lenticularis39
Copy link
Owner

Otherwise this looks nice. I also like the fact you used C++ functions instead of C ones (as in es40) - in the future I would like to rewrite the entire code to be like that :)

Which specific c++ functions vs c functions do you refer to?

std:cerr instead of fprintf and ifstream instead of FILE *.

@lenticularis39 lenticularis39 merged commit 21b5824 into lenticularis39:main Nov 7, 2020
@RaymiiOrg RaymiiOrg deleted the segfault_on_nonexisting_disk_image branch November 17, 2020 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants