wafer.space Community
Information / general / Test Submission Platform Issues & Successes?
Between 11/30/2025 23:59 and 01/01/2026 00:00
Tim 'mithro' Ansell 12/01/2025 04:22
Dunno how time keeps disappearing.
04:23
I'm just in the process of updating https://test-platform.wafer.space and also looking at deploying the "real" https://platform.wafer.space
04:29
The platform should be able to run 3 checks in parallel
Tim 'mithro' Ansell 12/01/2025 04:48
Noritsuna Imamura 12/01/2025 05:01
"Permission denied"...
Tim 'mithro' Ansell 12/01/2025 05:20
Poke me again in about 30m
ReJ aka Renaldas Zioma 12/01/2025 09:24
Mine is stuck between "Check queued" and "Checking..." going back and forth.
09:24
cc @Tholin
Noritsuna Imamura 12/01/2025 10:19
I'm in the same situation.
Tim 'mithro' Ansell 12/01/2025 10:37
Deploying a new version right now which should hopefully fix that.
ReJ aka Renaldas Zioma 12/01/2025 10:40
@Tim 'mithro' Ansell do we need to resubmit?
Tim 'mithro' Ansell 12/01/2025 10:41
It might start automatically.
10:42
It currently takes ~20 minutes to do a deployment. I need to figure out why it takes so long, but that is someone's future problem.
Tim 'mithro' Ansell 12/01/2025 11:35
Well, that didn't work - trying again now.
urish
Do you need help with this?
Tim 'mithro' Ansell 12/01/2025 11:53
Running into differences between my local workstation and the remote VM.
11:54
Also regretting the decision not to make manufacturing checks work like downloads, with separate download attempts. One of the things to fix after we get past this bit.
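A minimal sketch of what "work like downloads, with separate attempts" could look like, assuming each retry gets its own attempt record instead of overwriting the previous run. `ManufacturingCheck` and `CheckAttempt` are hypothetical names; in the real Django app this would presumably be two models joined by a foreign key.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class CheckAttempt:
    """One run of a manufacturing check; a retry never overwrites it."""
    number: int
    started_at: datetime
    status: str = "running"  # hypothetical states: "running" | "passed" | "failed"
    log: str = ""


@dataclass
class ManufacturingCheck:
    """Hypothetical model: a check owns an ordered list of attempts."""
    project_id: str
    attempts: List[CheckAttempt] = field(default_factory=list)

    def start_attempt(self) -> CheckAttempt:
        """Append a fresh attempt; earlier attempts (and their logs) are kept."""
        attempt = CheckAttempt(
            number=len(self.attempts) + 1,
            started_at=datetime.now(timezone.utc),
        )
        self.attempts.append(attempt)
        return attempt

    @property
    def latest(self) -> Optional[CheckAttempt]:
        """The most recent attempt, or None if the check never ran."""
        return self.attempts[-1] if self.attempts else None
```

A retry then just calls `start_attempt()` again, and the log of a failed run (for example one whose container was cleaned up mid-check) stays readable next to the new one.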
IMHO it makes sense to just fire up VMs on demand for this (I think that's what ChipFoundry is doing?)
11:58
For TT submissions we use fly.io
Tim 'mithro' Ansell 12/01/2025 11:58
I am firing up docker containers on demand
Their advantage is that they can launch a VM in ~100 ms
11:59
(using firecracker, so it's actually some hybrid between VM and container)
Tim 'mithro' Ansell 12/01/2025 11:59
The problem is mostly the Django/Celery code for managing the VM lifecycle stuff.
With Fly they manage the lifecycle for you (but their compute is probably 10x more expensive compared with Hetzner)
Tim 'mithro' Ansell 12/01/2025 12:09
Things will be down for a little bit.
Tim 'mithro' Ansell 12/01/2025 12:24
Should be back now.
Dispatched!
Tim 'mithro' Ansell 12/01/2025 12:26
Dispatched means it's been sent to Celery to run the docker command...
Tim 'mithro' Ansell 12/01/2025 12:38
Well, that is new......
12:41
@Leo Moser (mole99) - Something seems to have gone weird with nix in the docker container...
12:48
The error is:
> error: unable to download 'https://www.python.org/ftp/python/3.12.10/Python-3.12.10.tar.xz': Could not resolve hostname (6) Could not resolve host: www.python.org
12:49
looks like an intermittent DNS error to me?
Tim 'mithro' Ansell 12/01/2025 12:49
The docker container doesn't have network access.
12:50
And nix should be running with --offline too...
The logs don't show the actual nix command, so unfortunately I can't help further
Tim 'mithro' Ansell 12/01/2025 12:55
Precheck for wafer.space MPW runs using the gf180mcu PDK - build: add precheck --help validation step to Dockerfile · wafer-space/gf180mcu-precheck@bf512cd
12:58
docker run --rm --network=none -e COLUMNS=200 -e TERM=xterm-256color -v /home/django/platform.wafer.space/wafer_space/media/projects/562a942d-68e0-4370-8573-9cc36ffafd79/wafer-space.gf180mcu-project-template.r19704603402-a4686 122452.0p5x0p5_gds.chip_top.gds:/input/design.gds:ro -w /workspace --memory 64g ghcr.io/wafer-space/gf180mcu-precheck:latest python3 precheck.py --input /input/design.gds --top "chip_top" --slot 0p5x0p5
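The same invocation can be built as an argument list rather than a shell string, which sidesteps quoting problems with paths that contain spaces. This is a sketch that assumes the command is launched from Python via `subprocess`; `precheck_argv` is a hypothetical helper, and its flags are copied from the command above rather than taken from the precheck's documentation.

```python
import os
import shlex
from typing import List

IMAGE = "ghcr.io/wafer-space/gf180mcu-precheck:latest"


def precheck_argv(gds_path: str, top: str, slot: str) -> List[str]:
    """Build the docker command shown above as an argv list (no shell)."""
    host_gds = os.path.abspath(gds_path)  # bind-mount source as an absolute path
    return [
        "docker", "run", "--rm",
        "--network=none",                 # container gets no network access
        "-e", "COLUMNS=200",
        "-e", "TERM=xterm-256color",
        "-v", f"{host_gds}:/input/design.gds:ro",
        "-w", "/workspace",
        "--memory", "64g",
        IMAGE,
        "python3", "precheck.py",
        "--input", "/input/design.gds",
        "--top", top,
        "--slot", slot,
    ]


if __name__ == "__main__":
    argv = precheck_argv("chip_top.gds", "chip_top", "0p5x0p5")
    print(shlex.join(argv))  # printable form, for copy-pasting into a terminal
    # To actually run it: subprocess.run(argv, check=True)
```

Passing the list straight to `subprocess.run` means each element reaches Docker as exactly one argument, whatever characters the file name contains.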
I guess I have to re-submit the file to retry?
Tim 'mithro' Ansell 12/01/2025 13:27
I can make it retry in a moment
Noritsuna Imamura 12/01/2025 13:54
When an error occurs, this screen appears and the processing logs disappear. How can I check the error logs?
Tim 'mithro' Ansell 12/01/2025 14:09
I just fixed the issue with the logs getting overwritten; it's in the process of deploying.
Tim 'mithro' Ansell 12/01/2025 14:09
The real issue in your case is that the job which cleans up orphan docker containers cleaned up your docker container.
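One way to stop an orphan-cleanup job from removing containers that are still in use is to only collect containers whose job is no longer active and that are older than a grace period. This minimal sketch assumes each container can be mapped to a job ID (for example through Docker's real `docker run --label` option, though whether the platform tags containers that way is an assumption). `Container`, `orphaned`, and the 10-minute grace period are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class Container:
    name: str
    job_id: str          # hypothetical link from container to precheck job
    started_at: datetime


def orphaned(
    containers: Iterable[Container],
    active_job_ids: Set[str],
    now: datetime,
    grace: timedelta = timedelta(minutes=10),
) -> List[str]:
    """Return names of containers that look safe to remove.

    A container counts as orphaned only if its job is no longer active AND it
    has existed longer than the grace period, so a container whose job record
    has not caught up yet is left alone.
    """
    return [
        c.name
        for c in containers
        if c.job_id not in active_job_ids and now - c.started_at > grace
    ]
```

The grace period covers the window between a container starting and the job being marked active, which is exactly when a naive cleanup can race with a freshly dispatched check.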
ReJ aka Renaldas Zioma 12/01/2025 14:10
@Tim 'mithro' Ansell is /home/django/platform… the correct path? We submit to test-platform…
Tim 'mithro' Ansell 12/01/2025 14:13
Yeah - test-platform is supposed to just be the production deployment with a different name.
ReJ aka Renaldas Zioma 12/01/2025 14:16
That's what I get if I try to run the docker command on my Ubuntu machine
Tim 'mithro' Ansell 12/01/2025 14:20
Did you change the bit after the -v to match where your GDS file is?
Leo Moser (mole99) 12/01/2025 14:26
Is this still an issue?
Tim 'mithro' Ansell 12/01/2025 14:27
@Leo Moser (mole99) - no, it seems to have gone away after I deleted the container and re-pulled.
Tim 'mithro' Ansell 12/01/2025 16:14
Check seems to be running for your design now.
ReJ aka Renaldas Zioma 12/01/2025 16:25
Ah, turns out I need to provide the full path from ~/, not just a local path (I was trying to run docker from the folder with the GDS files). Works now!
16:27
works:
docker run --rm --network=none -v ~/z80-open-silicon-tapeout/z80_quarter.gds:/input/design.gds:ro ...
doesn't:
docker run --rm --network=none -v z80_quarter.gds:/input/design.gds:ro ...
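As far as I understand Docker's `-v SRC:DST` parsing, a source that is not an absolute path, such as `z80_quarter.gds`, is read as the name of a Docker-managed volume, so the container sees an empty volume at `/input/design.gds` instead of the GDS file. The `~/...` form works because the shell expands `~` before Docker sees the argument. A minimal sketch of resolving and validating the path first; `gds_bind_mount` is a hypothetical helper.

```python
import os


def gds_bind_mount(gds_path: str, target: str = "/input/design.gds") -> str:
    """Validate a local GDS file and return a read-only `-v` bind-mount spec.

    Resolving to an absolute path avoids Docker treating a bare file name as a
    volume name; checking the file exists turns a silent empty mount into an
    immediate, readable error.
    """
    host_path = os.path.abspath(os.path.expanduser(gds_path))
    if not os.path.isfile(host_path):
        raise FileNotFoundError(f"GDS file not found: {host_path}")
    return f"{host_path}:{target}:ro"
```

Run from `~/z80-open-silicon-tapeout`, `gds_bind_mount("z80_quarter.gds")` would return the absolute path of the file followed by `:/input/design.gds:ro`, ready to pass after `-v`.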
Tim 'mithro' Ansell 12/01/2025 16:41
Looks like it might finally be running
ReJ aka Renaldas Zioma 12/01/2025 17:32
It is GREEN!
17:33
🛳️ ship it! 🚢
Noritsuna Imamura 12/01/2025 20:50
It's GREEN!
Tim 'mithro' Ansell 12/01/2025 23:43
Looks like a bunch of the prechecks were churned through last night...
Tim 'mithro' Ansell 12/02/2025 00:29
The permission issues are because I went a little overboard with the privilege separation. The website can't write any files, only the download workers can. Only the docker workers can start/stop docker containers, etc.
00:32
And of course when running everything locally it all just runs as your user.
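With that kind of privilege separation, the same code behaves differently depending on which user runs it, and the first symptom is often an `[Errno 13] Permission denied` deep inside a worker. A minimal sketch of checking up front whether the current process can write to a directory, assuming a Unix host; `ensure_writable` is a hypothetical helper, not part of the platform.

```python
import os


def ensure_writable(directory: str) -> None:
    """Fail early, with a clear message, if this process cannot write here.

    Creating files in a directory needs both write and search (execute)
    permission on it, so both are checked.
    """
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"{directory} does not exist")
    if not os.access(directory, os.W_OK | os.X_OK):
        raise PermissionError(
            f"uid={os.getuid()} cannot write to {directory}; "
            "is this running as the user that owns it (e.g. a download worker)?"
        )
```

Calling this at worker start-up (or in a deploy smoke test) for each directory a role is meant to own would surface a wrong user or wrong ownership before any job runs.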
00:34
I do regret not putting the webapp and workers in their own docker containers; then a deploy would just be a few docker commands, rather than waiting for Ansible to run a whole bunch of SSH commands that each seem to take 1.5 minutes instead of the 1 second they should.
We can finally join the "it's green" party too!
05:06
1.5 minutes to establish the connection, or to push the container?
05:08
PDK_ROOT = /workspace/gf180mcu
PDK = gf180mcuD
Top cell: tt_gf_wrapper
Die ID: FFFFFFFF
I guess the die ID is not yet finalized?
Tim 'mithro' Ansell 12/02/2025 06:08
1.5 minutes to run a command on the server over SSH (like cat /etc/hostname)
06:10
Seems like some type of weird interaction between SSH jump hosts, ansible gather facts and other things I don't quite understand yet.
Long shot, but could be a DNS issue?
06:12
I've seen cases where failed DNS lookups really slow down systems
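A quick way to test the DNS theory is to time name lookups on the jump host and on the target, since a resolver that times out shows up as a multi-second delay even when the lookup ultimately fails. A minimal sketch, assuming Python is available on those machines; `time_lookup` is a hypothetical helper, and port 22 is passed only so the lookup resembles what an SSH client does.

```python
import socket
import time


def time_lookup(hostname: str) -> float:
    """Return how many seconds a forward lookup of `hostname` takes.

    A failed lookup is still timed, because slow failures (e.g. a resolver
    that times out) are exactly what would stall each SSH connection.
    """
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, 22)
    except socket.gaierror:
        pass  # the duration is what matters here, not the answer
    return time.monotonic() - start
```

Timing the hostnames from the SSH/Ansible inventory this way on each hop would show quickly whether resolution accounts for the delay. If lookups of the client's address are the slow part, sshd's `UseDNS` setting (when enabled) is one known place where reverse DNS adds latency to every new connection.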
Shall we also submit on platform, or is test-platform enough?
Tim 'mithro' Ansell 12/02/2025 06:23
test-platform should be enough
Leo Moser (mole99) 12/02/2025 07:54
The die ID needs to be passed to the precheck by the online platform (#72). After this has been implemented, the precheck needs to run again. (@Tim 'mithro' Ansell)
Leo Moser (mole99) 12/02/2025 07:57
https://platform.wafer.space failed to download the file because of:
> Download Error: Download failed: [Errno 13] Permission denied: '/home/django/platform.wafer.space/wafer_space/media/projects/91a731b3-d0a9-4cc3-bb72-dbc57afca703'
Is this because of the above?
Tim 'mithro' Ansell 12/02/2025 08:58
Seems like that is fixed now, but yes.
Leo Moser (mole99) 12/02/2025 08:59
Yes, precheck is running now 👍
Tim 'mithro' Ansell 12/02/2025 09:01
I could probably up the number of concurrent prechecks on the platform server. 4 isn't giving test-platform any trouble and it is slower than the primary machine.
Noritsuna Imamura 12/02/2025 18:34
Our GDS was all green in the production platform too.
Exported 80 message(s)
Timezone: UTC+0