abelio
Valued Contributor

FAZ 1000D - diskIO 100%

Hello everybody

 

A customer's FAZ 1000D running 6.0.8 has been showing a constant 100% disk I/O usage for a couple of days, which is slowing down the whole operation.

(widget screenshot attached)

 

Could anyone please share some strategies to attack the issue?

What would be good candidates to look into?

 

The output of the 'exec top' CLI command shows the usual postgres process at the top, as on other FAZes, but we couldn't find any disk-related process causing this behaviour.

 

Thanks in advance for your advice,

regards


__ Abel

4 REPLIES
ede_pfau
Esteemed Contributor III

hi abelio,

 

The FAZ might be stuck rebuilding the index files. Would you dare to run

execute sql-local rebuild-db

in an attempt to rebuild the SQL DB from scratch? It's nearly like formatting the disks in boot mode, but it preserves firmware and settings.

 

BTW, the widget just looks like my own workload these days...


Ede

"Kernel panic: Aiee, killing interrupt handler!"
abelio
Valued Contributor

Hello Ede,

Thanks for your comment.

Indeed, but I noticed that the rebuild finished a couple of weeks ago:

# diagnose sql status rebuild-db 
Rebuilding log SQL database accomplished on Sat May 23 07:20:53 2020
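
For anyone who wants to check this from a saved CLI capture, here is a minimal sketch that pulls the completion time out of the status line quoted above. It assumes the line always ends with a timestamp in exactly this format; the function name `rebuild_finished_at` is made up for illustration and is not part of any Fortinet tooling.

```python
import re
from datetime import datetime

# Status line copied verbatim from the CLI output quoted in this post.
STATUS_LINE = "Rebuilding log SQL database accomplished on Sat May 23 07:20:53 2020"

def rebuild_finished_at(line):
    """Return the completion time as a datetime, or None if the line
    does not look like a finished-rebuild message (assumed format)."""
    match = re.search(r"accomplished on (.+)$", line.strip())
    if match is None:
        return None
    # Assumed timestamp layout: weekday, month, day, time, year.
    return datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")

finished = rebuild_finished_at(STATUS_LINE)
print(finished)
```

A `None` result only means the line did not match this one pattern; it says nothing about whether a rebuild is still running.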

 

On the other hand, there are almost 50 devices, close to 10 GB of logs per day, and no ADOMs, which is not a big deal for a 1000D. Everything seems normal, except for this widget and slow reports that use a week or a month of logs.
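
To put the log volume in perspective, here is a rough back-of-the-envelope calculation of the average raw ingest rate. It assumes decimal gigabytes and a perfectly even spread over the day, and it covers only the incoming logs, not the extra writes from SQL insertion or indexing, so treat it as a lower bound rather than a measurement.

```python
# Figures from the post: about 10 GB of logs per day from about 50 devices.
LOG_BYTES_PER_DAY = 10 * 10**9   # assumption: decimal GB
SECONDS_PER_DAY = 24 * 60 * 60
DEVICES = 50

def average_rate_kb_per_s(bytes_per_day):
    """Average rate in KB/s if the daily volume were spread evenly."""
    return bytes_per_day / SECONDS_PER_DAY / 1000

total = average_rate_kb_per_s(LOG_BYTES_PER_DAY)
per_device = total / DEVICES
print(f"{total:.1f} KB/s total, {per_device:.2f} KB/s per device")
# prints "115.7 KB/s total, 2.31 KB/s per device"
```

An average in the low hundreds of KB/s is small for a disk array, which is why a constant 100% I/O reading looks out of proportion to the raw log volume.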

 

Is there some daemon to restart or look into?

regards


__ Abel

ede_pfau
Esteemed Contributor III

Hi Abelio,

 

I guess a disk error would lead to indefinite attempts to read or write, causing a deadlock.

When you reboot the FAZ, get into the boot menu and have a look at which options you have for formatting the log disks, or all disks, in which case you would have to reload the firmware via TFTP.

Hopefully, all settings have been saved previously.

 

If even that measure does not bring the disk load back to normal, it might be time to open a service ticket for an RMA.


Ede

"Kernel panic: Aiee, killing interrupt handler!"
abelio
Valued Contributor

Hi Ede, thanks once again

If so, shouldn't there be other evidence of such a hardware failure elsewhere?

 

I mean (I'm not arguing), I'm just exploring the scenario before taking more drastic action.

 

diag sys raid hwinfo

diag sys raid status

 

Shouldn't these show things like 'errors' or 'degraded'?
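
As a sketch of that check, the output of the two commands could be saved to text and scanned for suspicious keywords. The real output format of these commands is not shown in this thread, so the sample text below is an invented placeholder and not actual FAZ output; the keyword list and the function name `suspicious_lines` are assumptions as well.

```python
# Keywords that would hint at a RAID or disk problem (assumed list).
SUSPICIOUS = ("error", "degraded", "fail")

def suspicious_lines(cli_output, keywords=SUSPICIOUS):
    """Return the lines of a saved CLI capture that contain any keyword,
    compared case-insensitively."""
    hits = []
    for line in cli_output.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line.strip())
    return hits

# Invented placeholder text, NOT real 'diag sys raid status' output.
sample = """placeholder line A: OK
placeholder line B: DEGRADED
placeholder line C: OK"""

for hit in suspicious_lines(sample):
    print(hit)
```

An empty result would not rule out a failing disk; it would only mean that none of these keywords appear in the captured text.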

 

best regards


__ Abel