How to modify the BeagleBone-AI64 to run memtester on as much memory as possible
What are we talking about? github.com/jna... for the very excellent memtester - the friend of all embedded engineers.
Quick video and write-up, since someone was asking me how to make sure we can run memtester. Feel free to comment and improve. Memory corruption can happen for many, many, many reasons - only rarely, very rarely, do real problems in the physical DDR (layout, PHY or the DDR memory itself) show up - so hold on to your socks the first time you see a kernel oops on a NULL pointer: you most probably have a driver bug. Anyway, for the handful of other instances, maybe this will help:
* Make sure that there is no rproc firmware (like the j7 firmware on the AI64) in /lib/firmware (usually soft links and so on) that will get loaded by remoteproc and reserve DDR. GPU, display and camera are a few other examples of memory hogs. Good drivers see that the device is not physically connected and won't allocate, but you never know - just get rid of them (reduce the variables in what you are debugging).
* Also see how much memory you can free up to run memtester with. You'd rather not have the GUI, systemd launching stuff and so on - essentially drop straight to a "no frills" shell. So modify the bootargs (the example in this video is extlinux.conf - if you use uEnv.txt or whatever the flavor is, there is always some means to set kernel bootargs): just add init=/bin/sh to drop straight to a shell.
* You also want to log the terminal. memtester does try to continue after seeing a failure, so make sure to log the console - especially if your console has (most default consoles do) a scrollback (history) limit on how much old data it will let you scroll back through.
* Scope of test - you want to maximize how much memory you test and how many boards you test on, and, if possible, the conditions you test under (if that is a need).
* Duration of test - make sure you run multiple loops, on as many boards (good and bad) as you can, for a long duration - we typically do runs over weeks - just in case.
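The bootargs step above can be sketched as a small Python helper, run on whatever machine has the boot partition mounted. This is only a sketch under assumptions: the function name add_bootarg and the sample config below (console, root device, kernel path) are made up for illustration, and the real extlinux.conf on your image will look different. It appends init=/bin/sh to every "append" line that does not already carry it.

```python
def add_bootarg(conf_text: str, arg: str = "init=/bin/sh") -> str:
    """Return conf_text with `arg` added to every 'append' line lacking it."""
    out = []
    for line in conf_text.splitlines():
        words = line.split()
        # extlinux keywords are case-insensitive, so accept APPEND as well.
        if words and words[0].lower() == "append" and arg not in words[1:]:
            line = line.rstrip() + " " + arg
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample, NOT the actual BeagleBone-AI64 config.
SAMPLE = """\
label Linux
    kernel /Image
    append console=ttyS2,115200n8 root=/dev/mmcblk0p2 ro rootwait
"""

if __name__ == "__main__":
    print(add_bootarg(SAMPLE), end="")
```

Running the helper twice leaves the file unchanged, so it is safe to re-run. Keep a copy of the original file so you can boot normally again after the test.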
NOTES
1. There is usually an OOM (out of memory) killer in the kernel that goes around and kills "rogue" processes that get greedy with DDR, so you also need to factor in that what free -h says is not necessarily something the Linux kernel can actually commit (aka what memtester can use).
2. This video was a quick little way to get going. If you are really hitting this, there is more you should be doing if you are on your own board - for example: drop as many carve-outs and as much of the CMA pool as you can and boot a bare kernel (just CPU, UART, RAM, storage). Better to run memtester from MMC or USB or whatever, since the last thing you want is a cramfs poking 100MB out of the DDR you are able to test.
3. Reminder again - this is not foolproof either. Dry solder joints in production have happened, and they have created issues running memtester (warm package expands - dry solder joint makes contact): power off the board, start it up again and, oops, memtester will fail if you run it fast enough. memtester runs on userspace memory and exercises everything on the way there, so a failure can also come from the processor: cache line issues, over/undervoltage problems, or a DVFS transition done with the frequency vs voltage sequencing wrong (real world is always the best test, but very hard to reproduce to the point of being able to track it down). Meh... tons of possibilities can escape and tons can get caught - start from controlled configurations that you have control over and grow from there.
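Note 1 above (what free -h shows is not what memtester can actually get) can be turned into a rough sizing helper. This is a minimal sketch, not a rule: the name memtester_size_mb, the 64 MB headroom and the loop count of 10 are assumptions chosen for illustration, and the sample meminfo numbers are made up. It reads MemAvailable from /proc/meminfo, leaves some headroom so the OOM killer stays away, and prints a memtester command line. If you booted with init=/bin/sh, /proc is probably not mounted yet (mount -t proc proc /proc first).

```python
def memtester_size_mb(meminfo_text: str, headroom_mb: int = 64) -> int:
    """Suggest a memtester size in MB from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            available_kb = int(line.split()[1])  # value is reported in kB
            return max(available_kb // 1024 - headroom_mb, 0)
    raise ValueError("MemAvailable not found in meminfo text")


# Made-up numbers, used only when /proc/meminfo cannot be read.
SAMPLE_MEMINFO = """\
MemTotal:        3884000 kB
MemFree:         3700000 kB
MemAvailable:    3686400 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE_MEMINFO
    # memtester takes <mem>[B|K|M|G] followed by an optional loop count.
    print(f"memtester {memtester_size_mb(text)}M 10")
```

If memtester still fails to allocate or lock that much, increase the headroom and try again; the right number depends on what else is still running on the board.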
Anyway, good luck ... but remember, you aren't alone in this journey, and if this is your first attempt at memtester, if you love embedded, this won't be your last time either.