Just found another VM where this happened: liwa3-2.linkwatcher.eqiad1.wikimedia.cloud
Sat, Nov 23
Fri, Nov 22
Change #1091733 merged by Majavah:
[operations/puppet@production] keepalived::failover: Support IPv6
Change #1091732 merged by Majavah:
[operations/puppet@production] keepalived: Split failover config template to new class
Tue, Nov 19
With BBU:
root@backup1012:~$ ./storcli64 show all J
{
  "Controllers" : [
    {
      "Command Status" : {
        "CLI Version" : "007.3103.0000.0000 Aug 22, 2024",
        "Operating system" : "Linux 6.1.0-26-amd64",
        "Status Code" : 0,
        "Status" : "Success",
        "Description" : "None"
      },
      "Response Data" : {
        "Number of Controllers" : 1,
        "Host Name" : "backup1012",
        "Operating System " : "Linux 6.1.0-26-amd64",
        "System Overview" : [
          {
            "Ctl" : 0,
            "Model" : "SAS3908",
            "Ports" : 8,
            "PDs" : 24,
            "DGs" : 1,
            "DNOpt" : 0,
            "VDs" : 1,
            "VNOpt" : 0,
            "BBU" : "Opt",
            "sPR" : "On",
            "DS" : "1&2",
            "EHS" : "Y",
            "ASOs" : 4,
            "Hlth" : "Opt"
          }
        ],
        "ASO" : [
          {
            "Ctl" : 0,
            "Cl" : "X",
            "SAS" : "U",
            "MD" : "U",
            "R6" : "U",
            "WC" : "U",
            "R5" : "U",
            "SS" : "U",
            "FP" : "U",
            "Re" : "X",
            "CR" : "X",
            "RF" : "X",
            "CO" : "X",
            "CW" : "X",
            "HA" : "X",
            "SSHA" : "X"
          }
        ]
      }
    }
  ]
}
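The `storcli64 show all J` output above is plain JSON, so Python's standard `json` module can read it directly. The sketch below is a minimal, hypothetical summarizer (the function name and the fields it keeps are illustrative choices, not the existing check script); it uses `.get()` everywhere so a missing section does not raise.

```python
import json

# Abbreviated copy of the `storcli64 show all J` output above
# (only the fields used below are kept). On a host, the raw text would
# come from running ./storcli64 show all J and capturing stdout.
SAMPLE = """
{"Controllers": [{
  "Command Status": {"Status Code": 0, "Status": "Success"},
  "Response Data": {
    "Number of Controllers": 1,
    "System Overview": [
      {"Ctl": 0, "Model": "SAS3908", "PDs": 24, "VDs": 1,
       "VNOpt": 0, "BBU": "Opt", "Hlth": "Opt"}
    ]
  }
}]}
"""


def controller_summary(raw: str) -> list[dict]:
    """Return one summary dict per controller listed in "System Overview"."""
    data = json.loads(raw)
    summaries = []
    for controller in data.get("Controllers", []):
        status = controller.get("Command Status", {})
        if status.get("Status") != "Success":
            # Skip controllers whose command did not succeed.
            continue
        response = controller.get("Response Data", {})
        for ctl in response.get("System Overview", []):
            summaries.append({
                "ctl": ctl.get("Ctl"),
                "model": ctl.get("Model"),
                "bbu": ctl.get("BBU", "unknown"),
                "health": ctl.get("Hlth", "unknown"),
                "non_optimal_vds": ctl.get("VNOpt", 0),
            })
    return summaries


for summary in controller_summary(SAMPLE):
    print(summary)
```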
Currently not supported by PyYAML: https://github.com/yaml/pyyaml/issues/90
Sat, Nov 16
Fri, Nov 15
Change #1091733 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] keepalived::failover: Support IPv6
Change #1091732 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] keepalived: Split failover config template to new class
Change #1091731 merged by Jcrespo:
[operations/puppet@production] backup: Move Dell bacula hosts to mediabackups
Change #1091731 had a related patch set uploaded (by Jcrespo; author: Jcrespo):
[operations/puppet@production] backup: Move Dell bacula hosts to mediabackups
Thu, Nov 14
Change #1091249 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] resolvconf: don't update resolv.conf with 0 nameservers
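The idea behind this change, refusing to rewrite resolv.conf when the nameserver list is empty, can be sketched in Python as below. This is only an illustration of the guard under assumed inputs; the function names are hypothetical and this is not the code in the Puppet patch.

```python
import tempfile
from pathlib import Path


def render_resolv_conf(nameservers: list[str], search: list[str]) -> str:
    """Render resolv.conf text from a search list and nameserver list."""
    lines = []
    if search:
        lines.append("search " + " ".join(search))
    lines += [f"nameserver {ns}" for ns in nameservers]
    return "\n".join(lines) + "\n"


def update_resolv_conf(path: Path, nameservers: list[str], search: list[str]) -> bool:
    """Write resolv.conf only if there is at least one nameserver.

    Returns True if the file was written, False if the update was skipped
    and the existing file was left untouched.
    """
    if not nameservers:
        return False
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(render_resolv_conf(nameservers, search))
    tmp.replace(path)  # rename over the old file (atomic on one filesystem)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conf = Path(d) / "resolv.conf"
        conf.write_text("nameserver 192.0.2.53\n")
        print(update_resolv_conf(conf, [], ["example.org"]))  # skipped
        print(conf.read_text(), end="")  # previous content is kept
```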
The issue is resolved; I created this task to track it in case it happens again.
Tue, Nov 5
Mon, Oct 28
Sat, Oct 26
As a workaround, is it possible to use a Greasemonkey script (browser add-on) on Android to remove the "m." from the URL when the user agent is set to desktop?
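A Greasemonkey user script would itself be JavaScript; the rewrite rule it would apply is sketched here in Python only to show the logic. It assumes mobile hosts follow the `<lang>.m.<project>.<tld>` pattern (e.g. `en.m.wikipedia.org`) and deliberately leaves other hosts, such as `m.mediawiki.org`, unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def desktop_url(url: str) -> str:
    """Turn <lang>.m.<project>.<tld> URLs into their desktop form.

    Only an "m" label in the second position is dropped, e.g.
    en.m.wikipedia.org -> en.wikipedia.org; any other URL is returned
    unchanged. User info in the netloc is not preserved.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    if len(labels) < 3 or labels[1] != "m":
        return url
    del labels[1]
    netloc = ".".join(labels)
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Path, query and fragment are kept as they were.
    return urlunsplit(parts._replace(netloc=netloc))


print(desktop_url("https://en.m.wikipedia.org/wiki/Android?action=history"))
```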
Oct 25 2024
Summary of an IRC chat between me, Jayme and Matthew:
Oct 23 2024
In T377853#10251006, @jcrespo wrote: perccli and storcli are not exactly the same either; the existing script fails with:
Failed to execute ['/usr/local/lib/nagios/plugins/get-raid-status-perccli']: KeyError 'BBU_Info'
Nope, megacli doesn't work. That's the one option I tried first, before going into this rabbit hole. 0:-)
Does megacli work? That's the tool (from the megacli package) that I use to interact with the RAID controller on the existing config-J systems (Dell).
Oct 22 2024
perccli and storcli are not exactly the same either; the existing script fails with:
Failed to execute ['/usr/local/lib/nagios/plugins/get-raid-status-perccli']: KeyError 'BBU_Info'
After testing, storcli seems to work on older hosts from a different vendor. So either there is something broken with this particular host (I would need a second new host with a RAID controller to check), or we should update the check to use this other CLI tool, which seems to work for both vendors.
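One way to avoid the `KeyError 'BBU_Info'` is to look the key up defensively instead of indexing it directly. The sketch below is hypothetical: it assumes perccli-style output carries a `"BBU_Info"` list whose entries have a `"State"` field, and falls back to the `"BBU"` column of `"System Overview"`, which is what the storcli `show all J` output above provides (that output has no `"BBU_Info"` key).

```python
def bbu_state(response_data: dict) -> str:
    """Best-effort BBU state from one controller's "Response Data".

    Assumption: "BBU_Info" (perccli-style) is a list of dicts with a
    "State" field; when it is absent, use the "BBU" field of the first
    "System Overview" row instead of raising KeyError.
    """
    bbu_info = response_data.get("BBU_Info")
    if bbu_info:
        return bbu_info[0].get("State", "unknown")
    overview = response_data.get("System Overview") or [{}]
    return overview[0].get("BBU", "absent")


print(bbu_state({"System Overview": [{"Ctl": 0, "BBU": "Opt"}]}))
```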
Oct 11 2024
Haven't seen widespread problems with export_smart_data_dump.service in the last 90d
Sep 24 2024
Change #719100 abandoned by Muehlenhoff:
[operations/puppet@production] puppetmaster: puppet prometheus reporting
Reason:
No longer needed
Sep 23 2024
Change #1074358 merged by Muehlenhoff:
[operations/puppet@production] icinga: Enable profile::auto_restarts::service for keyholder-proxy
Sep 20 2024
Change #1074358 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):
[operations/puppet@production] icinga: Enable profile::auto_restarts::service for keyholder-proxy
Sep 18 2024
I had a chat with Filippo: keyholder-proxy is not the daemon that needs re-arming after a restart, so it can be restarted at any time without extra manual steps.
Sep 16 2024
Sep 13 2024
Sep 10 2024
Change #724733 merged by JHathaway:
[operations/puppet@production] P:tlsproxy::instance: Drop numa_networking global
Sep 4 2024
Aug 26 2024
Aug 7 2024
This series of patches has led to the removal of umask from git::clone.
Change #927986 merged by Elukey:
[operations/puppet@production] git: remove umask from git::clone
Change #1056985 merged by Elukey:
[operations/puppet@production] cumin: set git::clone umask to match requested file mode
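The "set umask to match requested file mode" changes rely on simple bit arithmetic: new directories are created with `0o777 & ~umask` and regular (non-executable) files with `0o666 & ~umask`. A minimal sketch of one way to derive the umask from a requested directory mode (the helper names are hypothetical, not taken from the patches):

```python
def umask_for_mode(mode: int) -> int:
    """Umask that makes newly created directories get exactly `mode`.

    Directories are created as 0o777 & ~umask, so the bits to mask are
    exactly those missing from `mode`.
    """
    return 0o777 & ~mode


def resulting_modes(umask: int) -> tuple[int, int]:
    """(directory mode, regular file mode) produced under `umask`."""
    return 0o777 & ~umask, 0o666 & ~umask


for requested in (0o755, 0o775, 0o750):
    u = umask_for_mode(requested)
    d, f = resulting_modes(u)
    print(f"mode {requested:o} -> umask {u:03o}: dirs {d:o}, files {f:o}")
```

For example, a requested mode of 755 gives umask 022, so directories end up 755 and regular files 644.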
Jul 29 2024
Change #1057815 merged by Slyngshede:
[operations/puppet@production] P:openldap::management Unbreak cross-validate-accounts script.
Change #1057815 had a related patch set uploaded (by Slyngshede; author: Slyngshede):
[operations/puppet@production] data.yaml Unbreak cross-validate-accounts script.
Jul 25 2024
Change #1056985 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] cumin: set git::clone umask to match requested file mode
Change #1056981 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] cumin: clone homer public repo with default parameters
Change #927986 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] git: remove umask from git::clone
Thank you @Clement_Goubert ! Makes sense to me, I'll adjust the task accordingly
I think the mediawiki-config check is still needed since we're building the image with that repo copy on the deployment server.
Change #1056201 merged by Elukey:
[operations/puppet@production] puppetmaster: set git::clone umask to match requested file mode
Jul 23 2024
Change #1056201 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] puppetmaster: set git::clone umask to match requested file mode
Jul 22 2024
Change #1054892 merged by Andrea Denisse:
[operations/puppet@production] grafana: clone grafana-grizzly with default parameters
Change #1054890 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: remove OpenTofu git::clone file mode
Jul 19 2024
Jul 18 2024
Change #1054889 merged by Btullis:
[operations/puppet@production] statistics: remove git::clone file mode
Jul 17 2024
Change #1054892 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] grafana: clone grafana-grizzly with default parameters
Change #1054890 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] openstack: remove OpenTofu git::clone file mode
Change #1054889 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] statistics: remove git::clone file mode
Jun 27 2024
I got another error at backup2002 (es5):
2024-06-26 17:07:31 [ERROR] - Could not read data from enwiki.blobs_cluster27: Lost connection to MySQL server during query
Jun 26 2024
This does not seem to be reproducible; maybe it was related to cold caches after the reboot? Lowering the priority since it has not happened again, but I still want to trace it at some point.
Jun 18 2024
How about adding a MAILTO to the timer and mailing a specific list/team/group (or even creating a ticket automatically)? I think alerting via IRC is becoming less reliable, and direct email would be more effective.
I think the last step to do here is to validate that any rsync failures get reported on IRC. Then we can consider all the immediate follow-ups of this incident done, and continue more slowly with the larger work at T367119: Install a default timeout for systemd::timer::jobs.
Jun 17 2024
One other option: add a separate wrapper define, systemd::timer::job_capped, which takes the timeout as a mandatory argument (with no default), and then reach out to SRE teams to migrate their jobs based on what their respective use cases need.
Alternatives to consider:
- Make this a required field instead of adding a default [harder up-front but potentially safer]
- Make omitting this field a violation of the WMF Puppet style guide [slower version of the above]
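The "mandatory timeout, no default" option can be illustrated outside Puppet with a Python sketch: a keyword-only `timeout` parameter without a default forces every caller to pick a value, much like a required argument on a hypothetical systemd::timer::job_capped define would. The function name is hypothetical; the timeout handling uses `subprocess.run`, which kills the child before raising `TimeoutExpired`.

```python
import subprocess
import sys


def run_capped_job(argv: list[str], *, timeout: float) -> int:
    """Run a job, killing it if it exceeds `timeout` seconds.

    `timeout` is keyword-only with no default, so omitting it is a
    TypeError at the call site rather than a silently uncapped job.
    """
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return 124  # same exit code GNU timeout(1) uses on timeout


if __name__ == "__main__":
    print(run_capped_job([sys.executable, "-c", "print('hi')"], timeout=5))
```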
Jun 11 2024
Change #1041760 merged by CDanis:
[operations/puppet@production] puppetserver syncs: also add monitoring + timeout
Change #1041760 had a related patch set uploaded (by CDanis; author: CDanis):
[operations/puppet@production] puppetserver syncs: also add monitoring + timeout
Change #1041217 merged by CDanis:
[operations/puppet@production] enable monitoring+logging for puppetmaster syncs
Jun 10 2024
Change #1041217 had a related patch set uploaded (by CDanis; author: CDanis):
[operations/puppet@production] enable monitoring+logging for puppetmaster syncs