
Case study: monitoring a 48 V rectifier in a Tier III edge data center

How a regional operator of Tier III edge data centers instrumented 48 V rectifiers and a LiFePO4 bank with AEM-60DC8, retired phantom alarms and avoided a battery crash — anonymized illustrative case study.

LRI Engineering, May 25, 2026

Transparency notice: this is an anonymized illustrative case study, built from recurring market patterns observed across edge data center operators in Brazil. It does not correspond to a specific customer. The MTBF, MTTR, avoided cost numbers and the 50 mV cell drift cited are illustrative values based on typical orders of magnitude in the sector; they must be replaced with real field data when applied to an identifiable customer. The goal of this material is to demonstrate the technical architecture and engineering reasoning, not to claim results from a real-world case.

A Brazilian regional operator running 12 Tier III edge sites in the Southeast lived with a generic 48 V rectifier alarm that came back every six months with no identified root cause. The on-call engineer would drive to the site, see the panel blinking OK, do a precautionary power-cycle and drive back. The internal hypothesis was cell drift in the LiFePO4 bank, but nobody had granular data to prove it. This case study reconstructs, in a Situation–Task–Action–Result format, how two AEM-60DC8 units per technical room, integrated into the existing DCIM, turned a blind alarm into planned predictive maintenance.

Situation

The operator runs 12 Tier III edge sites across inland São Paulo, the Paraíba Valley and southern Minas Gerais. Each site has one to three technical rooms, all built to the same template:

4 racks of critical equipment (edge routers, ToR switches, CDN cache servers, security appliances).
Two 48 V rectifiers in parallel redundancy (N+1), each rated 200 A at -48 V DC.
LiFePO4 battery bank sized for 30 min autonomy at full load, 16 cells in series (51.2 V nominal, 48.0–57.6 V range) and a proprietary BMS.
Existing DCIM (Schneider EcoStruxure IT or Vertiv LIFE Services) with a Modbus TCP gateway per room.
24/7 remote NOC in the city of São Paulo, a regional on-call squad, and a contractual 4-hour SLA for on-site arrival.

The rectifiers expose only binary alarms to the DCIM through dry contacts (AC fail, DC low, overtemp, general fault), with no granular reading of bus voltage or current. The bank's BMS exposes SOC and aggregated pack voltage over Modbus, but no per-cell real-time voltage.

The problem

Every five to seven months, in three of the twelve sites, the DCIM fired the alarm "General Fault — Rectifier". No further context. The on-call playbook said:

Connect via VPN and check DCIM telemetry.
The alarm was binary — nothing beyond the bit being on.
Try remote reset. In half the cases, the bit cleared within 30 to 90 minutes. In the other half, it stayed active.
If persistent, the on-call drove to the site — average travel time 2.5 hours.
On site, the technician found a normal panel: bus voltage OK, balanced current, healthy SOC. Rectifier power-cycle, alarm gone.

Over two years, the team logged 37 occurrences across the 12 sites. In 23 of them the technician went on site and found no fault: a wasted trip. In 4, a manual sweep identified a cell out of range. In 1, the bank crashed under discharge during an 18-minute outage, taking the site down for 4 minutes until AC came back: a customer-reportable incident.

Suspicion pointed to cell drift: cells losing capacity or showing transient imbalance during fast recharge after an AC cut. But the BMS did not expose per-cell readings in real time, and its internal log was hard to extract. With no granular data, there was no way to prove the hypothesis.

The task

Scope validated with operations and finance:

Instrument the DC bus and critical cells without replacing rectifier or bank — room shutdown vetoed by SLA.
Integrate with the existing DCIM (EcoStruxure or Vertiv LIFE), without a second screen for the NOC.
Target cost per room: up to BRL 8,000 (~USD 1,500) in hardware (illustrative).
Timeline: pilot in 30 days, rollout to the remaining 11 sites within 6 months.
Success criterion: cut wasted trips by at least 60% in the first half-year after full rollout.

The hardest constraint was integration without rip-and-replace. Replacing the rectifier meant 4 to 6 hours of shutdown and SLA renegotiation. Replacing the bank meant six-figure CAPEX per site. Engineering needed a parallel instrumentation layer, reading what the rectifier and BMS already delivered to the physical bus, without entering the power path.

The solution

The approved architecture was: two AEM-60DC8 units per room, in complementary roles.

Unit A: output bus of rectifiers A+B. Connected to the main -48 V bus that feeds the DC strip of the racks. It measures rectifier A output (channel 1), rectifier B output (channel 2), the common bus after the OR diode (channel 3), and the voltage before and after the main DC breaker (channels 4 and 5). Three channels remain free for future expansion.

Unit B: LiFePO4 battery bank. Connected to eight tap points sampling strategic cells: 1, 2, 3, 4 (negative end, historically more drift-prone), 8, 9 (center, average baseline) and 15, 16 (positive end). This is a pragmatic compromise that covers both ends and the center. All taps stay within the 0–60 V input range: each LiFePO4 cell sits between 2.5 V and 3.65 V, so even the full 16-cell stack peaks at 16 × 3.65 V = 58.4 V.
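
The range check for Unit B can be reproduced with a few lines of arithmetic. The sketch below assumes each tap is measured against the bank's negative terminal, so the tap on cell k sees the cumulative voltage of cells 1 through k. That wiring, and the names `tap_voltage_max` and `CHANNEL_MAX_V`, are illustrative assumptions and are not taken from the AEM-60DC8 documentation.

```python
# Worst-case tap voltages for the sampled cells, assuming (hypothetically)
# that every tap is referenced to the bank's negative terminal.
CELL_MAX_V = 3.65       # upper cell limit quoted in the text
CHANNEL_MAX_V = 60.0    # upper end of the 0-60 V input range
SAMPLED_CELLS = (1, 2, 3, 4, 8, 9, 15, 16)


def tap_voltage_max(cell_index: int, cell_max_v: float = CELL_MAX_V) -> float:
    """Highest voltage at the positive terminal of cell `cell_index`."""
    return cell_index * cell_max_v


# Remaining margin per sampled tap before the channel limit is reached.
headroom_v = {c: CHANNEL_MAX_V - tap_voltage_max(c) for c in SAMPLED_CELLS}

for cell, margin in headroom_v.items():
    print(f"cell {cell:2d}: tap <= {tap_voltage_max(cell):5.2f} V, headroom {margin:5.2f} V")
```

Under that assumption the tap on cell 16 is the tightest, with roughly 1.6 V of margin, which is why charging transients and equalization voltages deserve a check before installation.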

Communication topology: the two AEM-60DC8 units sit on their own RS-485 bus, dedicated to instrumentation, physically separated from the BMS bus. Modbus addresses 1 (bus) and 2 (bank), 19200 bps, even parity. A Modbus RTU to Modbus TCP gateway (industrial, DIN, 24 V auxiliary supply) bridges to the management LAN. The DCIM consumes via Modbus TCP — on EcoStruxure through a generic driver, on Vertiv LIFE through the existing OPC UA integrator.
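
To make the serial side concrete, the sketch below builds the kind of Modbus RTU request a gateway would put on this bus. The CRC-16 routine and frame layout follow the public Modbus RTU specification. The function code (0x03, read holding registers), start address 0 and register count 8 are placeholders, since the AEM-60DC8 register map is not reproduced in this article.

```python
import struct

# Serial parameters from the text; the dictionary keys are illustrative names.
SERIAL = {"baudrate": 19200, "bytesize": 8, "parity": "E", "stopbits": 1}
BUS_UNIT, BANK_UNIT = 1, 2  # Modbus addresses: 1 = bus, 2 = bank


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_request(unit: int, start: int, count: int) -> bytes:
    """Build a function-0x03 request; the CRC is appended low byte first."""
    body = struct.pack(">BBHH", unit, 0x03, start, count)
    return body + struct.pack("<H", crc16_modbus(body))


# Hypothetical read: 8 registers starting at address 0 on the bank unit.
frame = read_holding_request(BANK_UNIT, start=0, count=8)
print(frame.hex(" "))
```

Keeping the two units on fixed addresses means the same request builder works in every room; only the unit argument changes.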

Why two and not one? Failure domain separation. A short in the bank during a measurement must not bring down the main bus measurement. Channel-to-channel isolation provides electrical safety; physically splitting into two units adds architectural safety.

Polling and alarms configured

Polling tuned to extract relevant signal without saturating the RS-485 or DCIM history:

Variable	Polling rate	Local retention	Alarm
-48 V bus voltage (channels 1–3 of Unit A)	1 Hz	30 days	±2% of nominal (47.0 V to 49.0 V in float)
Voltage before/after DC breaker (channels 4–5 of Unit A)	0.2 Hz (5 s)	30 days	Differential > 0.5 V indicates degraded breaker
Sampled per-cell voltage (8 channels of Unit B)	0.5 Hz (2 s)	30 days	Drift > 30 mV from the median of sampled cells
Drift trend (aggregate calculation)	1 sample/min	90 days in DCIM	Sustained drift > 20 mV for 10 minutes
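
A rough estimate shows how far this schedule is from saturating the RS-485 bus. The sketch below assumes one 16-bit register per channel and one read request per table row; both are assumptions, as is ignoring device turnaround time and retries. Frame sizes and the 11-bit character follow the Modbus RTU specification.

```python
# Rough RS-485 load estimate for the polling table above.
# Assumption (not from the device manual): one 16-bit register per channel,
# read with a single request per row of the table.
BAUD = 19200
BITS_PER_CHAR = 11  # 1 start + 8 data + 1 even-parity + 1 stop bit


def transaction_bits(n_registers: int) -> float:
    """Bits on the wire for one read request plus its response."""
    request_chars = 8                      # addr, function, start(2), count(2), CRC(2)
    response_chars = 5 + 2 * n_registers   # addr, function, byte count, data, CRC(2)
    silence_chars = 2 * 3.5                # 3.5-character idle gap after each frame
    return (request_chars + response_chars + silence_chars) * BITS_PER_CHAR


# (label, registers per read, reads per second)
POLLS = [
    ("Unit A, channels 1-3", 3, 1.0),
    ("Unit A, channels 4-5", 2, 0.2),
    ("Unit B, 8 cell taps", 8, 0.5),
]


def bus_utilization(polls=POLLS, baud: int = BAUD) -> float:
    """Fraction of the bus bit rate consumed by the polling schedule."""
    bits_per_second = sum(transaction_bits(n) * hz for _, n, hz in polls)
    return bits_per_second / baud


print(f"estimated bus utilization: {bus_utilization():.1%}")
```

Treat the result as a lower bound. Even so, it suggests the schedule uses only a few percent of a 19200 bps bus, leaving ample room for retries and for the three spare channels on Unit A.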

The 30-day persistent log on the AEM-60DC8 itself (firmware v1.03) keeps history available even with the TCP gateway down. Escalation in three levels:

Warning (amber): drift 20–30 mV on one cell. Notifies the shift, does not page.
Alarm (red): drift > 30 mV sustained for 10 min, or bus outside ±2%. Pages on-call.
Critical (red + supervisor page): drift > 50 mV or rectifier differential > 0.8 V. Pages on-call and supervisor.

Rule of thumb: do not wake anyone for data that can wait until tomorrow. Drift evolves over minutes to hours, not seconds.
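
The three escalation levels can be expressed as a small classifier. This is a minimal sketch, not the DCIM's actual rule engine: the class `Escalator`, its argument names, and the choice to treat an unsustained drift above 30 mV as a warning are assumptions. "Rectifier differential" is read here as the voltage difference between the two rectifier outputs, which is also an assumption. Cell voltages are passed in millivolts and the bus voltage as a magnitude.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence

WARNING_MV, ALARM_MV, CRITICAL_MV = 20.0, 30.0, 50.0
SUSTAIN_S = 600.0              # 10 minutes
BUS_NOMINAL_V, BUS_TOL = 48.0, 0.02
RECT_DIFF_CRIT_V = 0.8


def max_drift_mv(cells_mv: Sequence[float]) -> float:
    """Largest absolute deviation from the median of the sampled cells."""
    mid = median(cells_mv)
    return max(abs(v - mid) for v in cells_mv)


@dataclass
class Escalator:
    over_since: Optional[float] = None  # time drift first exceeded ALARM_MV

    def classify(self, now_s: float, cells_mv: Sequence[float],
                 bus_v: float, rect_diff_v: float) -> str:
        drift = max_drift_mv(cells_mv)
        # Track how long the drift has stayed above the alarm threshold.
        if drift > ALARM_MV:
            if self.over_since is None:
                self.over_since = now_s
        else:
            self.over_since = None
        sustained = (self.over_since is not None
                     and now_s - self.over_since >= SUSTAIN_S)
        bus_out = abs(bus_v - BUS_NOMINAL_V) > BUS_NOMINAL_V * BUS_TOL

        if drift > CRITICAL_MV or rect_diff_v > RECT_DIFF_CRIT_V:
            return "critical"   # pages on-call and supervisor
        if sustained or bus_out:
            return "alarm"      # pages on-call
        if drift >= WARNING_MV:
            return "warning"    # notifies the shift only
        return "ok"


esc = Escalator()
drifting = [3350] * 7 + [3385]              # one cell 35 mV above the median
print(esc.classify(0, drifting, 48.0, 0.0))    # drift just started
print(esc.classify(600, drifting, 48.0, 0.0))  # same drift, 10 minutes later
```

The timer is what keeps a short recharge transient from paging anyone: the same 35 mV reading is only a warning at first and becomes an alarm once it has persisted for ten minutes.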

What the monitoring revealed

The first result came within a month of the pilot instrumentation. During the second fast-recharge event after an AC cut (an 8-minute outage caused by utility maintenance), cell 16 showed a transient drift of approximately 50 mV relative to the median of the other sampled cells (illustrative value; in a real deployment, record the measured value).

The behavior was specific: near-zero drift in normal float, growing drift during the first 4 minutes of high-current recharge, partial recovery upon entering float. This is the classic signature of a cell with elevated internal resistance, and it was invisible in BMS aggregate readings, in binary rectifier alarms, and to visual inspection.
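
A drift that appears only under charge current points to resistance because of Ohm's law: the extra voltage across the cell is the current times the extra series resistance. The sketch below turns that into a first-order estimate. The recharge current is not given in this case study, so the 100 A figure is purely hypothetical, and the function name is illustrative.

```python
def excess_resistance_mohm(drift_mv: float, charge_current_a: float) -> float:
    """Extra series resistance implied by a drift seen only under charge current.

    Ohm's law: delta_V = I * delta_R, so delta_R = delta_V / I.
    Millivolts divided by amperes gives milliohms directly.
    """
    if charge_current_a <= 0:
        raise ValueError("charge current must be positive")
    return drift_mv / charge_current_a


# Hypothetical numbers: the 50 mV illustrative drift at an assumed 100 A recharge.
print(excess_resistance_mohm(50.0, 100.0), "milliohm")
```

With those assumed inputs the estimate is about half a milliohm of excess resistance. A real diagnosis would pair the measured drift with the measured recharge current and rule out a plain state-of-charge imbalance first.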

With the data, engineering opened an RMA before the cell failed under real discharge. Replacement was scheduled in a planned maintenance window, with the site running on a single rectifier momentarily. No on-call paged, no wasted trip, no end customer impacted.

In the following months, the system identified equivalent patterns in 2 of the 12 sites, both resolved during scheduled maintenance (illustrative numbers).

Lessons learned

Five lessons applicable to any similar project:

Separate the measurement bus from the power bus. Keeping instrumentation RS-485 on its own bus simplifies troubleshooting and isolates cabling faults.
Redundancy in critical spots. Two units with separate domains (bus × bank) cost more than one, but prevent one fault from collapsing observability of the entire site.
Do not poll too aggressively. Bus voltage at 10 Hz adds nothing that 1 Hz cannot provide; LiFePO4 drift evolves on a minute scale.
Graduated alarms save nights of sleep. Three-level escalation reduces on-call fatigue and keeps focus on what matters.
Local persistence matters. Thirty days of log on the AEM-60DC8 itself guarantees that a gateway failure does not erase history. The log is the source of truth.

A sixth lesson, cultural: a rectifier's binary alarm is technical debt disguised as a finished product. Parallel instrumentation complements what the rectifier should always have reported.

KPIs before and after

Numbers are illustrative and compare 18 months before rollout with 12 months after it, across the 12 sites. In a real project, draw them from the customer's DCIM history and on-call tickets.

KPI	Before (baseline)	After (12 months)	Change
Mean time between "General Fault" alarms	~6 months per site	no occurrences without a root cause	n/a (alarm retired)
MTTR for critical DC event	4 h 20 min (travel + diagnosis)	35 min (remote diagnosis, planned maintenance)	-87%
Wasted trips/year (12 sites)	23	2	-91%
Customer-reportable incidents caused by the bank	1 (discharge crash)	0	-100%
Recovered energy (kWh/year via crash prevention)	n/a	~180 kWh (illustrative)	—
Avoided travel cost (BRL/year, fleet)	—	~BRL 95,000 (illustrative)	—
Total instrumentation CAPEX (12 sites)	—	~BRL 95,000 (illustrative)	payback ~12 months

Important: BRL 95,000 and 180 kWh are illustrative orders of magnitude. In a real disclosure with an identified customer, replace them with the customer's actual figures.
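
The percentage changes and the payback figure in the table follow from simple arithmetic, sketched below so that real field data can be substituted later. The inputs are the illustrative values from the table; the function names are hypothetical.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def payback_months(capex: float, annual_savings: float) -> float:
    """Months needed for yearly savings to cover the initial spend."""
    return capex / annual_savings * 12.0


mttr_change = pct_change(4 * 60 + 20, 35)   # minutes: 4 h 20 min down to 35 min
trips_change = pct_change(23, 2)            # wasted trips, before and after
payback = payback_months(95_000, 95_000)    # BRL CAPEX vs. avoided travel cost

print(f"MTTR {mttr_change:.0f}%, wasted trips {trips_change:.0f}%, "
      f"payback {payback:.0f} months")
```

The payback here counts only avoided travel cost. Adding the value of avoided customer-reportable incidents would shorten it, but that figure is not quantified in this study.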

Replicability

Consolidated checklist to replicate the architecture:

Survey electrical topology: rectifier nominal, bank chemistry and capacity, BMS Modbus exposure.
Confirm measured voltages stay within 0–60 V on all channels (including equalization and transients).
Verify the existing DCIM: native Modbus TCP driver, OPC UA gateway need, simultaneous tag limit.
Define measurement points: main bus, each rectifier output, strategic cells (ends + center).
Specify industrial Modbus RTU→TCP gateway with 24 V auxiliary supply and redundant power.
Reserve a dedicated RS-485 bus for instrumentation, separated from the BMS bus.
Configure Modbus addresses before physical installation (1 = bus, 2 = bank — standardize across the fleet).
Validate galvanic isolation suitable for the environment (5 kV for sites exposed to atmospheric surges).
Define polling rates: bus at 1 Hz, cells at 0.5 Hz, 30-day persistent log.
Define three alarm levels (amber warning / red alarm / critical page) calibrated with the NOC team.
Document the runbook: what to do on drift of 20, 30, 50 mV on a cell.
Installation window: 2 to 4 hours, with the room temporarily running on a single rectifier and no load drop.
Post-installation validation: 72 hours of continuous monitoring before retiring the old generic alarm.
Periodic calibration: yearly field offset verification against a reference multimeter.
Bank end-of-life: replacement trigger at sustained drift > 40 mV or loss of usable SOC > 20%.
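
The fleet-wide settings in this checklist can be captured in a single template so that every room is configured the same way. The sketch below is one possible encoding: the class `RoomTemplate` and its field names are hypothetical, and only the values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomTemplate:
    """Per-room defaults taken from the checklist; field names are illustrative."""
    bus_unit_addr: int = 1            # Modbus address of the bus unit
    bank_unit_addr: int = 2           # Modbus address of the bank unit
    baud: int = 19200
    parity: str = "E"                 # even parity
    bus_poll_hz: float = 1.0
    cell_poll_hz: float = 0.5
    log_days: int = 30
    drift_levels_mv: tuple = (20, 30, 50)   # warning / alarm / critical
    eol_drift_mv: float = 40.0
    eol_soc_loss_pct: float = 20.0

    def bank_end_of_life(self, sustained_drift_mv: float,
                         usable_soc_loss_pct: float) -> bool:
        """Replacement trigger: either limit exceeded is enough."""
        return (sustained_drift_mv > self.eol_drift_mv
                or usable_soc_loss_pct > self.eol_soc_loss_pct)


template = RoomTemplate()
print(template.bank_end_of_life(sustained_drift_mv=45, usable_soc_loss_pct=5))
```

Freezing the dataclass makes accidental per-site edits fail loudly, which supports the checklist's goal of standardizing addresses and thresholds across the fleet.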

FAQ

Why two AEM-60DC8 units per room instead of a single 16-channel device?

Failure domain separation. A single device would work electrically, but concentrates risk: one communication or power fault collapses observability for the whole room. Two devices with distinct Modbus addresses preserve partial observability even if one unit fails.

Does the measurement interfere with the power bus?

No. The AEM-60DC8 measures at high impedance (megaohms), drawing negligible current. It installs as a parallel tap, without interrupting the power path.

Does this monitoring replace the bank's BMS?

No. The BMS remains responsible for protection, active balancing and critical-condition shutdown. The AEM-60DC8 acts as an independent observability layer, exposing granular data to the DCIM in real time — something many proprietary BMS units simply do not provide.

What happens if the Modbus TCP gateway goes down?

The 30-day persistent log inside the AEM-60DC8 itself (firmware v1.03) preserves history. When the gateway returns, the DCIM retrieves retroactive data via the history block in the Modbus map.

Are these numbers (MTBF, MTTR, avoided cost) real?

They are illustrative values. In a real replication, draw them from the customer's DCIM history and on-call tickets. The direction (fewer wasted trips, lower MTTR, retired binary alarms) tends to repeat; magnitude varies.