iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: http://phabricator.wikimedia.org/feed/?projectPHIDs=PHID-PROJ-533xcl4be2dyrmenjftl

Sat, Nov 23

Andrew added a comment to T379927: Puppet removed "nameserver" line from /etc/resolv.conf.

Just found another VM where this happened: liwa3-2.linkwatcher.eqiad1.wikimedia.cloud

Sat, Nov 23, 1:39 PM · Patch-For-Review, Puppet, Infrastructure-Foundations, Cloud-VPS, cloud-services-team

Fri, Nov 22

Maintenance_bot removed a project from T380057: Keepalived Puppet module: Support IPv6: Patch-For-Review.
Fri, Nov 22, 4:31 PM · IPv6, Puppet
taavi closed T380057: Keepalived Puppet module: Support IPv6 as Resolved.
Fri, Nov 22, 4:26 PM · IPv6, Puppet
gerritbot added a comment to T380057: Keepalived Puppet module: Support IPv6.

Change #1091733 merged by Majavah:

[operations/puppet@production] keepalived::failover: Support IPv6

https://gerrit.wikimedia.org/r/1091733

Fri, Nov 22, 4:16 PM · IPv6, Puppet
gerritbot added a comment to T380057: Keepalived Puppet module: Support IPv6.

Change #1091732 merged by Majavah:

[operations/puppet@production] keepalived: Split failover config template to new class

https://gerrit.wikimedia.org/r/1091732

Fri, Nov 22, 4:16 PM · IPv6, Puppet
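
The feed does not show how keepalived::failover handles IPv6 internally. As a hedged illustration of one sub-problem such a change has to solve, here is a minimal Python sketch that groups a list of virtual IPs by address family using the standard `ipaddress` module. The function name `split_vips_by_family` and the input format are hypothetical, and the addresses are documentation prefixes:

```python
import ipaddress
from typing import Dict, Iterable, List


def split_vips_by_family(vips: Iterable[str]) -> Dict[int, List[str]]:
    """Group virtual IPs (with or without a /prefix) by IP version, 4 or 6."""
    groups: Dict[int, List[str]] = {4: [], 6: []}
    for vip in vips:
        # ip_interface accepts both "192.0.2.10" and "2001:db8::10/64";
        # a bare address gets a host-length prefix (/32 or /128).
        iface = ipaddress.ip_interface(vip)
        groups[iface.version].append(str(iface))
    return groups
```

A template could then emit one block per family; whether the real module does this is not stated in the feed.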

Tue, Nov 19

jcrespo added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

With BBU:

root@backup1012:~$ ./storcli64 show all J
{
"Controllers":[
{
        "Command Status" : {
                "CLI Version" : "007.3103.0000.0000 Aug 22, 2024",
                "Operating system" : "Linux 6.1.0-26-amd64",
                "Status Code" : 0,
                "Status" : "Success",
                "Description" : "None"
        },
        "Response Data" : {
                "Number of Controllers" : 1,
                "Host Name" : "backup1012",
                "Operating System " : "Linux 6.1.0-26-amd64",
                "System Overview" : [
                        {
                                "Ctl" : 0,
                                "Model" : "SAS3908",
                                "Ports" : 8,
                                "PDs" : 24,
                                "DGs" : 1,
                                "DNOpt" : 0,
                                "VDs" : 1,
                                "VNOpt" : 0,
                                "BBU" : "Opt",
                                "sPR" : "On",
                                "DS" : "1&2",
                                "EHS" : "Y",
                                "ASOs" : 4,
                                "Hlth" : "Opt"
                        }
                ],
                "ASO" : [
                        {
                                "Ctl" : 0,
                                "Cl" : "X",
                                "SAS" : "U",
                                "MD" : "U",
                                "R6" : "U",
                                "WC" : "U",
                                "R5" : "U",
                                "SS" : "U",
                                "FP" : "U",
                                "Re" : "X",
                                "CR" : "X",
                                "RF" : "X",
                                "CO" : "X",
                                "CW" : "X",
                                "HA" : "X",
                                "SSHA" : "X"
                        }
                ]
        }
}
]
}
Tue, Nov 19, 1:35 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
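
The JSON above can be consumed by a monitoring check. A minimal Python sketch, assuming only the field names visible in that output ("Controllers", "Command Status", "Response Data", "System Overview", "Hlth", "BBU"); the function name `controller_health` and the chosen summary fields are hypothetical and this is not the existing WMF check:

```python
import json
from typing import Any, Dict, List


def controller_health(storcli_json: str) -> List[Dict[str, Any]]:
    """Summarise each controller in `storcli64 show all J` output.

    Missing fields are reported as None instead of raising KeyError.
    """
    data = json.loads(storcli_json)
    summary = []
    for controller in data.get("Controllers", []):
        status = controller.get("Command Status", {}).get("Status")
        if status != "Success":
            # The CLI itself failed, so there is no trustworthy data to parse.
            raise RuntimeError(f"storcli reported status {status!r}")
        response = controller.get("Response Data", {})
        for ctl in response.get("System Overview", []):
            summary.append({
                "ctl": ctl.get("Ctl"),
                "model": ctl.get("Model"),
                "health": ctl.get("Hlth"),
                "bbu": ctl.get("BBU"),
            })
    return summary
```

Raising on a failed command status, rather than returning an empty list, keeps a broken CLI from looking like a healthy host.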
dcaro added a comment to T250622: Preserve formatting and comments etc. in ENC Hiera.

Currently not supported by PyYAML: https://github.com/yaml/pyyaml/issues/90

Tue, Nov 19, 10:31 AM · cloud-services-team, Cloud-VPS, Puppet

Sat, Nov 16

taavi renamed T250622: Preserve formatting and comments etc. in ENC Hiera from Preserve formatting etc. in horizon hiera editor to Preserve formatting and comments etc. in ENC Hiera.
Sat, Nov 16, 12:38 PM · cloud-services-team, Cloud-VPS, Puppet
taavi merged T287395: Feature request: persistent comments in Cloud Horizon hiera configuration into T250622: Preserve formatting and comments etc. in ENC Hiera.
Sat, Nov 16, 12:37 PM · cloud-services-team, Cloud-VPS, Puppet

Fri, Nov 15

taavi added a project to T380057: Keepalived Puppet module: Support IPv6: IPv6.
Fri, Nov 15, 2:32 PM · IPv6, Puppet
Maintenance_bot removed a project from T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool: Patch-For-Review.
Fri, Nov 15, 2:31 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
gerritbot added a comment to T380057: Keepalived Puppet module: Support IPv6.

Change #1091733 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] keepalived::failover: Support IPv6

https://gerrit.wikimedia.org/r/1091733

Fri, Nov 15, 2:15 PM · IPv6, Puppet
gerritbot added a project to T380057: Keepalived Puppet module: Support IPv6: Patch-For-Review.
Fri, Nov 15, 2:14 PM · IPv6, Puppet
gerritbot added a comment to T380057: Keepalived Puppet module: Support IPv6.

Change #1091732 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] keepalived: Split failover config template to new class

https://gerrit.wikimedia.org/r/1091732

Fri, Nov 15, 2:14 PM · IPv6, Puppet
gerritbot added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

Change #1091731 merged by Jcrespo:

[operations/puppet@production] backup: Move Dell bacula hosts to mediabackups

https://gerrit.wikimedia.org/r/1091731

Fri, Nov 15, 2:10 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
gerritbot added a project to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool: Patch-For-Review.
Fri, Nov 15, 2:06 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
gerritbot added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

Change #1091731 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] backup: Move Dell bacula hosts to mediabackups

https://gerrit.wikimedia.org/r/1091731

Fri, Nov 15, 2:06 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
taavi created T380057: Keepalived Puppet module: Support IPv6.
Fri, Nov 15, 2:04 PM · IPv6, Puppet

Thu, Nov 14

gerritbot added a project to T379927: Puppet removed "nameserver" line from /etc/resolv.conf: Patch-For-Review.
Thu, Nov 14, 3:30 PM · Patch-For-Review, Puppet, Infrastructure-Foundations, Cloud-VPS, cloud-services-team
gerritbot added a comment to T379927: Puppet removed "nameserver" line from /etc/resolv.conf.

Change #1091249 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] resolvconf: don't update resolv.conf with 0 nameservers

https://gerrit.wikimedia.org/r/1091249

Thu, Nov 14, 3:30 PM · Patch-For-Review, Puppet, Infrastructure-Foundations, Cloud-VPS, cloud-services-team
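
The patch title states the intended behaviour: don't update resolv.conf if the result would contain zero nameservers. A minimal Python sketch of such a guard; the function names are hypothetical and this is not the code in change #1091249:

```python
def count_nameservers(resolv_conf: str) -> int:
    """Count active `nameserver` lines; comments and blank lines are ignored."""
    count = 0
    for line in resolv_conf.splitlines():
        fields = line.split()
        # "# nameserver ..." splits to ["#", ...], so commented lines never match.
        if fields and fields[0] == "nameserver":
            count += 1
    return count


def safe_to_write(new_content: str) -> bool:
    """Refuse to replace resolv.conf with content that has no nameservers."""
    return count_nameservers(new_content) > 0
```
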
fnegri closed T379927: Puppet removed "nameserver" line from /etc/resolv.conf as Resolved.

The issue is resolved; I created this task to track it in case it happens again.

Thu, Nov 14, 3:29 PM · Patch-For-Review, Puppet, Infrastructure-Foundations, Cloud-VPS, cloud-services-team
fnegri created T379927: Puppet removed "nameserver" line from /etc/resolv.conf.
Thu, Nov 14, 3:28 PM · Patch-For-Review, Puppet, Infrastructure-Foundations, Cloud-VPS, cloud-services-team

Tue, Nov 5

lmata edited projects for T370530: Clean up "git repo needs merge" checks, added: SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1).
Tue, Nov 5, 5:11 PM · SRE Observability (FY2024/2025-Q2), Puppet, MW-on-K8s, Observability-Alerting

Mon, Oct 28

elukey triaged T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool as Medium priority.
Mon, Oct 28, 2:23 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations

Sat, Oct 26

D4n2016 added a comment to T60425: Mobile site does not automatically redirect to desktop version (and not possible to use browser "use desktop view").

As a workaround, would it be possible to use a Greasemonkey script (browser add-on) on Android to remove the "m." from the URL when the user agent is set to desktop?

Sat, Oct 26, 10:34 AM · MobileFrontend (Tracking), Puppet, User-Jdlrobson
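
A real Greasemonkey userscript would be written in JavaScript; the URL rewrite it would need is sketched here in Python, assuming hosts of the form `<lang>.m.<project>.org` (for example en.m.wikipedia.org). The function name is hypothetical, and detecting the browser's desktop-mode user agent is not covered:

```python
from urllib.parse import urlsplit, urlunsplit


def mobile_to_desktop(url: str) -> str:
    """Drop the "m" host label: en.m.wikipedia.org -> en.wikipedia.org.

    Only the <lang>.m.<project>.org pattern is handled; other URLs are
    returned unchanged.
    """
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    if len(labels) > 2 and labels[1] == "m":
        del labels[1]
    return urlunsplit(parts._replace(netloc=".".join(labels)))
```
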

Oct 25 2024

Izno merged T378193: enabling a mobile browser's desktop mode should redirect user from mdot to desktop site into T60425: Mobile site does not automatically redirect to desktop version (and not possible to use browser "use desktop view").
Oct 25 2024, 7:09 PM · MobileFrontend (Tracking), Puppet, User-Jdlrobson
elukey added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

Summary of an IRC chat between me, Jayme and Matthew:

Oct 25 2024, 9:45 AM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations

Oct 23 2024

lmata moved T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool from Inbox to Radar on the observability board.
Oct 23 2024, 2:11 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
MatthewVernon added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

Perhaps relevantly, I was screenshotting the BMC storage page on another one of these hosts:

ms-be2082-disks.png (455×394 px, 34 KB)

Oct 23 2024, 9:01 AM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
jcrespo added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

perccli and storcli are not exactly the same either; the existing script fails with:

Failed to execute ['/usr/local/lib/nagios/plugins/get-raid-status-perccli']: KeyError 'BBU_Info'
Oct 23 2024, 8:55 AM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
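
The error shows the existing perccli-based script indexing a `BBU_Info` key that is absent for this host. A hedged Python sketch of a tolerant lookup that reports a missing section instead of crashing. Where `BBU_Info` sits in real perccli output is not shown in the feed, so treating it as a key of "Response Data" is an assumption; the fallback uses the per-controller "BBU" field from the storcli JSON posted on Nov 19. The function name is hypothetical:

```python
from typing import Any, Dict


def bbu_status(response_data: Dict[str, Any]) -> str:
    """Describe the BBU without raising KeyError when "BBU_Info" is absent."""
    if "BBU_Info" in response_data:
        # Assumed perccli-style layout (not confirmed by the feed).
        return "BBU_Info section present"
    # storcli-style summary: each controller entry may carry a "BBU" field.
    overview = response_data.get("System Overview", [])
    states = [str(ctl["BBU"]) for ctl in overview if "BBU" in ctl]
    if states:
        return "BBU summary: " + ", ".join(states)
    return "no BBU information reported"
```
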
jcrespo added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

Nope, megacli doesn't work. That's the one option I tried first, before going into this rabbit hole. 0:-)

Oct 23 2024, 8:51 AM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
MatthewVernon added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

Does megacli work? That's the tool (from the megacli package) that I use to interact with the RAID controller on the existing config-J systems (Dell).

Oct 23 2024, 8:41 AM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations

Oct 22 2024

jcrespo added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

perccli and storcli are not exactly the same either; the existing script fails with:

Failed to execute ['/usr/local/lib/nagios/plugins/get-raid-status-perccli']: KeyError 'BBU_Info'
Oct 22 2024, 3:37 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
jcrespo added a comment to T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.

After testing on older hosts, storcli seems to work on hosts from a different vendor, so either something is broken with this host in particular (I would need a second new host with a RAID controller to check), or we should switch the check to this other tool, which seems to work for both vendors.

Oct 22 2024, 3:11 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
jcrespo updated the task description for T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.
Oct 22 2024, 2:46 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations
jcrespo created T377853: RAID monitoring on new hardware spec requires new or updated user space cli tool.
Oct 22 2024, 2:42 PM · DC-Ops, SRE-tools, observability, Puppet, Infrastructure-Foundations

Oct 11 2024

colewhite closed T251293: Facter is slow on a few hosts as Resolved.

Haven't seen widespread problems with export_smart_data_dump.service in the last 90 days.

Oct 11 2024, 6:28 PM · Infrastructure-Foundations, Puppet, SRE

Sep 24 2024

Maintenance_bot removed a project from T283585: Add additional prometheus metrics to puppet runs: Patch-For-Review.
Sep 24 2024, 12:31 PM · Infrastructure-Foundations, User-jbond, observability, Puppet
gerritbot added a comment to T283585: Add additional prometheus metrics to puppet runs.

Change #719100 abandoned by Muehlenhoff:

[operations/puppet@production] puppetmaster: puppet prometheus reporting

Reason:

No longer needed

https://gerrit.wikimedia.org/r/719100

Sep 24 2024, 12:22 PM · Infrastructure-Foundations, User-jbond, observability, Puppet

Sep 23 2024

Maintenance_bot removed a project from T374711: keyholder-proxy doesn't restart on config change: Patch-For-Review.
Sep 23 2024, 7:30 AM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations
gerritbot added a comment to T374711: keyholder-proxy doesn't restart on config change.

Change #1074358 merged by Muehlenhoff:

[operations/puppet@production] icinga: Enable profile::auto_restarts::service for keyholder-proxy

https://gerrit.wikimedia.org/r/1074358

Sep 23 2024, 6:55 AM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations

Sep 20 2024

gerritbot added a project to T374711: keyholder-proxy doesn't restart on config change: Patch-For-Review.
Sep 20 2024, 7:30 AM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations
gerritbot added a comment to T374711: keyholder-proxy doesn't restart on config change.

Change #1074358 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] icinga: Enable profile::auto_restarts::service for keyholder-proxy

https://gerrit.wikimedia.org/r/1074358

Sep 20 2024, 7:30 AM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations

Sep 18 2024

elukey added a comment to T374711: keyholder-proxy doesn't restart on config change.

I had a chat with Filippo: keyholder-proxy is not the daemon that needs re-arming after a restart, so it can be restarted at any time without extra manual steps.

Sep 18 2024, 7:43 AM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations

Sep 16 2024

elukey added a project to T374711: keyholder-proxy doesn't restart on config change: User-Elukey.
Sep 16 2024, 2:12 PM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations
joanna_borun triaged T374711: keyholder-proxy doesn't restart on config change as Low priority.
Sep 16 2024, 2:11 PM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations

Sep 13 2024

fgiunchedi created T374711: keyholder-proxy doesn't restart on config change.
Sep 13 2024, 12:33 PM · User-Elukey, Puppet, Keyholder, SRE, Infrastructure-Foundations

Sep 10 2024

Maintenance_bot removed a project from T263578: puppetdb seems to be slow on host reimage: Patch-For-Review.
Sep 10 2024, 7:30 PM · User-jbond, Infrastructure-Foundations, Puppet
gerritbot added a comment to T263578: puppetdb seems to be slow on host reimage.

Change #724733 merged by JHathaway:

[operations/puppet@production] P:tlsproxy::instance: Drop numa_networking global

https://gerrit.wikimedia.org/r/724733

Sep 10 2024, 6:58 PM · User-jbond, Infrastructure-Foundations, Puppet

Sep 4 2024

joanna_borun triaged T309281: Remove prod-specific bits from cloud puppetmasters as Low priority.
Sep 4 2024, 2:19 PM · cloud-services-team, Puppet, Cloud-VPS

Aug 26 2024

joanna_borun triaged T371980: Puppet git::clone should default mode to 0644 (read-only) instead of 0755 as Low priority.
Aug 26 2024, 2:44 PM · Infrastructure-Foundations, Puppet, Release-Engineering-Team

Aug 7 2024

hashar created T371980: Puppet git::clone should default mode to 0644 (read-only) instead of 0755.
Aug 7 2024, 2:22 PM · Infrastructure-Foundations, Puppet, Release-Engineering-Team
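
For a regular file, the two modes in the task title differ only in the execute bits. A small Python illustration using the standard `stat.filemode` helper; the wrapper name is hypothetical:

```python
import stat


def describe_file_mode(mode: int) -> str:
    """Render permission bits for a regular file the way `ls -l` does."""
    return stat.filemode(stat.S_IFREG | mode)
```

Here 0644 gives `-rw-r--r--` and 0755 gives `-rwxr-xr-x`: both allow writes by the owner only, and 0755 adds execute permission for everyone.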
hashar closed T338277: Puppet git::clone probably does not need `umask` parameter as Resolved.

This series of patches led to the removal of umask from git::clone.

Aug 7 2024, 1:44 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #927986 merged by Elukey:

[operations/puppet@production] git: remove umask from git::clone

https://gerrit.wikimedia.org/r/927986

Aug 7 2024, 12:56 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1056985 merged by Elukey:

[operations/puppet@production] cumin: set git::clone umask to match requested file mode

https://gerrit.wikimedia.org/r/1056985

Aug 7 2024, 8:35 AM · Patch-For-Review, Puppet, Release-Engineering-Team
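
The "set git::clone umask to match requested file mode" changes rely on the POSIX rule that a newly created file or directory gets the requested mode with the umask bits cleared. A hedged Python sketch of that arithmetic; the helper names are hypothetical and this is not the Puppet code from the changes:

```python
def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created path receives: requested & ~umask."""
    return requested & ~umask & 0o777


def umask_for_mode(target: int) -> int:
    """Umask that turns a fully open request (0o777) into `target`."""
    return ~target & 0o777
```

For example, a requested mode of 0755 corresponds to umask 0022, under which a typical 0666 file request yields 0644 and a 0777 directory request yields 0755.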

Jul 29 2024

Maintenance_bot removed a project from T371221: Single member group breaks cross validation script: Patch-For-Review.
Jul 29 2024, 10:30 AM · Puppet
SLyngshede-WMF closed T371221: Single member group breaks cross validation script as Resolved.
Jul 29 2024, 9:55 AM · Puppet
gerritbot added a comment to T371221: Single member group breaks cross validation script.

Change #1057815 merged by Slyngshede:

[operations/puppet@production] P:openldap::management Unbreak cross-validate-accounts script.

https://gerrit.wikimedia.org/r/1057815

Jul 29 2024, 9:54 AM · Puppet
gerritbot added a project to T371221: Single member group breaks cross validation script: Patch-For-Review.
Jul 29 2024, 8:55 AM · Puppet
gerritbot added a comment to T371221: Single member group breaks cross validation script.

Change #1057815 had a related patch set uploaded (by Slyngshede; author: Slyngshede):

[operations/puppet@production] data.yaml Unbreak cross-validate-accounts script.

https://gerrit.wikimedia.org/r/1057815

Jul 29 2024, 8:54 AM · Puppet
SLyngshede-WMF created T371221: Single member group breaks cross validation script.
Jul 29 2024, 8:52 AM · Puppet

Jul 25 2024

gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1056985 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] cumin: set git::clone umask to match requested file mode

https://gerrit.wikimedia.org/r/1056985

Jul 25 2024, 4:53 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1056981 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] cumin: clone homer public repo with default parameters

https://gerrit.wikimedia.org/r/1056981

Jul 25 2024, 4:31 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #927986 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] git: remove umask from git::clone

https://gerrit.wikimedia.org/r/927986

Jul 25 2024, 3:49 PM · Patch-For-Review, Puppet, Release-Engineering-Team
fgiunchedi renamed T370530: Clean up "git repo needs merge" checks from Port or delete "git repo needs merge" icinga check to Clean up "git repo needs merge" checks.
Jul 25 2024, 12:21 PM · SRE Observability (FY2024/2025-Q2), Puppet, MW-on-K8s, Observability-Alerting
fgiunchedi added a comment to T370530: Clean up "git repo needs merge" checks.

Thank you @Clement_Goubert! Makes sense to me; I'll adjust the task accordingly.

Jul 25 2024, 12:14 PM · SRE Observability (FY2024/2025-Q2), Puppet, MW-on-K8s, Observability-Alerting
Clement_Goubert added a comment to T370530: Clean up "git repo needs merge" checks.

I think the mediawiki-config check is still needed since we're building the image with that repo copy on the deployment server.

Jul 25 2024, 10:35 AM · SRE Observability (FY2024/2025-Q2), Puppet, MW-on-K8s, Observability-Alerting
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1056201 merged by Elukey:

[operations/puppet@production] puppetmaster: set git::clone umask to match requested file mode

https://gerrit.wikimedia.org/r/1056201

Jul 25 2024, 9:10 AM · Patch-For-Review, Puppet, Release-Engineering-Team

Jul 23 2024

gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1056201 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] puppetmaster: set git::clone umask to match requested file mode

https://gerrit.wikimedia.org/r/1056201

Jul 23 2024, 4:30 PM · Patch-For-Review, Puppet, Release-Engineering-Team

Jul 22 2024

gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1054892 merged by Andrea Denisse:

[operations/puppet@production] grafana: clone grafana-grizzly with default parameters

https://gerrit.wikimedia.org/r/1054892

Jul 22 2024, 7:14 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1054890 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openstack: remove OpenTofu git::clone file mode

https://gerrit.wikimedia.org/r/1054890

Jul 22 2024, 9:56 AM · Patch-For-Review, Puppet, Release-Engineering-Team

Jul 19 2024

fgiunchedi created T370530: Clean up "git repo needs merge" checks.
Jul 19 2024, 2:29 PM · SRE Observability (FY2024/2025-Q2), Puppet, MW-on-K8s, Observability-Alerting

Jul 18 2024

gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1054889 merged by Btullis:

[operations/puppet@production] statistics: remove git::clone file mode

https://gerrit.wikimedia.org/r/1054889

Jul 18 2024, 9:23 AM · Patch-For-Review, Puppet, Release-Engineering-Team

Jul 17 2024

gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1054892 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] grafana: clone grafana-grizzly with default parameters

https://gerrit.wikimedia.org/r/1054892

Jul 17 2024, 2:40 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1054890 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] openstack: remove OpenTofu git::clone file mode

https://gerrit.wikimedia.org/r/1054890

Jul 17 2024, 2:26 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a project to T338277: Puppet git::clone probably does not need `umask` parameter: Patch-For-Review.
Jul 17 2024, 2:18 PM · Patch-For-Review, Puppet, Release-Engineering-Team
gerritbot added a comment to T338277: Puppet git::clone probably does not need `umask` parameter.

Change #1054889 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] statistics: remove git::clone file mode

https://gerrit.wikimedia.org/r/1054889

Jul 17 2024, 2:18 PM · Patch-For-Review, Puppet, Release-Engineering-Team

Jun 27 2024

jcrespo raised the priority of T367882: Possible weird interaction between es backups and puppet runs leading to failures from Low to Medium.

I got another error on backup2002 (es5):

2024-06-26 17:07:31 [ERROR] - Could not read data from enwiki.blobs_cluster27: Lost connection to MySQL server during query
Jun 27 2024, 7:23 AM · Data-Persistence, database-backups, Puppet

Jun 26 2024

jcrespo triaged T367882: Possible weird interaction between es backups and puppet runs leading to failures as Low priority.

This does not seem to be reproducible; maybe it was related to cold caches after a reboot? Lowering the priority since it has not happened again, but I want to trace it at some point.

Jun 26 2024, 3:18 PM · Data-Persistence, database-backups, Puppet

Jun 18 2024

Dzahn added a comment to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).

How about adding a MAILTO to the timer and mailing a specific list/team/group? I think alerting via IRC is becoming less reliable, and direct email (or even automatic ticket creation) would be more effective.

Jun 18 2024, 3:47 PM · Infrastructure-Foundations, Puppet
jcrespo updated the task description for T367882: Possible weird interaction between es backups and puppet runs leading to failures.
Jun 18 2024, 2:24 PM · Data-Persistence, database-backups, Puppet
jcrespo added projects to T367882: Possible weird interaction between es backups and puppet runs leading to failures: Puppet, database-backups, Data-Persistence.
Jun 18 2024, 2:22 PM · Data-Persistence, database-backups, Puppet
CDanis added a comment to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).

I think the last step to do here is to validate that any rsync failures will get reported on IRC. Then we can consider all the immediate followups of this incident done, and more slowly continue on with the larger work at T367119: Install a default timeout for systemd::timer::jobs.

Jun 18 2024, 1:52 PM · Infrastructure-Foundations, Puppet

Jun 17 2024

CDanis triaged T367119: Install a default timeout for systemd::timer::jobs as Low priority.
Jun 17 2024, 3:09 PM · Infrastructure-Foundations, Puppet
joanna_borun assigned T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10) to CDanis.
Jun 17 2024, 3:08 PM · Infrastructure-Foundations, Puppet
joanna_borun edited projects for T367028: systemd-timer-mail-wrapper should not send mail as root@wikimedia.org from Cloud VPS, added: Puppet; removed Puppet-Core, Infrastructure-Foundations.
Jun 17 2024, 3:05 PM · Puppet, Cloud-VPS
MoritzMuehlenhoff added a comment to T367119: Install a default timeout for systemd::timer::jobs.

One other option: add a separate wrapper define, systemd::timer::job_capped, which takes the timeout as a mandatory argument (with no default). Then reach out to SRE teams to migrate their jobs based on what each use case needs.

Jun 17 2024, 2:57 PM · Infrastructure-Foundations, Puppet
CDanis added a comment to T367119: Install a default timeout for systemd::timer::jobs.

Alternatives to consider:

  • Make this a required field instead of adding a default [harder up front but potentially safer]
  • Make omitting this field a WMF Puppet style guide violation [slower version of the above]
Jun 17 2024, 2:28 PM · Infrastructure-Foundations, Puppet
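
The "mandatory timeout with no default" idea discussed above can be illustrated outside Puppet. A hedged Python sketch, where `run_capped_job` is a hypothetical name and not the systemd::timer::job interface: the keyword-only `timeout_s` parameter has no default, so every caller must pick one, and an overrun is reported instead of the job hanging indefinitely:

```python
import subprocess
from typing import Sequence


def run_capped_job(cmd: Sequence[str], *, timeout_s: float) -> str:
    """Run a job, killing it if it runs longer than `timeout_s` seconds.

    `timeout_s` is keyword-only with no default, so omitting it is a
    TypeError at the call site rather than a silently unbounded job.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "timeout"
    return "ok" if result.returncode == 0 else f"failed ({result.returncode})"
```

Returning a status string keeps the sketch small; a real job wrapper would more likely log the outcome and exit non-zero so that monitoring picks it up.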

Jun 11 2024

Maintenance_bot removed a project from T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10): Patch-For-Review.
Jun 11 2024, 8:30 PM · Infrastructure-Foundations, Puppet
gerritbot added a comment to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).

Change #1041760 merged by CDanis:

[operations/puppet@production] puppetserver syncs: also add monitoring + timeout

https://gerrit.wikimedia.org/r/1041760

Jun 11 2024, 8:07 PM · Infrastructure-Foundations, Puppet
gerritbot added a project to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10): Patch-For-Review.
Jun 11 2024, 7:57 PM · Infrastructure-Foundations, Puppet
gerritbot added a comment to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).

Change #1041760 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] puppetserver syncs: also add monitoring + timeout

https://gerrit.wikimedia.org/r/1041760

Jun 11 2024, 7:57 PM · Infrastructure-Foundations, Puppet
Maintenance_bot removed a project from T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10): Patch-For-Review.
Jun 11 2024, 7:31 PM · Infrastructure-Foundations, Puppet
gerritbot added a comment to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).

Change #1041217 merged by CDanis:

[operations/puppet@production] enable monitoring+logging for puppetmaster syncs

https://gerrit.wikimedia.org/r/1041217

Jun 11 2024, 7:30 PM · Infrastructure-Foundations, Puppet

Jun 10 2024

CDanis created T367119: Install a default timeout for systemd::timer::jobs.
Jun 10 2024, 8:56 PM · Infrastructure-Foundations, Puppet
gerritbot added a project to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10): Patch-For-Review.
Jun 10 2024, 8:21 PM · Infrastructure-Foundations, Puppet
gerritbot added a comment to T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).

Change #1041217 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] enable monitoring+logging for puppetmaster syncs

https://gerrit.wikimedia.org/r/1041217

Jun 10 2024, 8:21 PM · Infrastructure-Foundations, Puppet
CDanis updated subscribers of T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).
Jun 10 2024, 8:19 PM · Infrastructure-Foundations, Puppet
CDanis updated the task description for T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).
Jun 10 2024, 8:17 PM · Infrastructure-Foundations, Puppet
CDanis created T367113: Puppetmaster volatile data not synced to all puppet frontends for a month and a half (2024-04-27 to 2024-06-10).
Jun 10 2024, 7:25 PM · Infrastructure-Foundations, Puppet