Matin Zarei

Data Detective

Coffee-Fueled Coder

Insight Hunter

Cloud-Ready Analyst

Storyteller with Data

Big Data Whisperer

Refactor Survivor

Signal Name Standardization

This project improves SCADA signal naming consistency by combining semantic similarity detection with structured parsing. First, we use Sentence-BERT to generate embeddings of the raw signal names and calculate pairwise similarity scores. Pairs that score above a chosen threshold (e.g., 0.90) are flagged as likely duplicates or inconsistently named signals, so analysts review only the flagged pairs rather than the full list.
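The flagging step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes cosine similarity as the similarity measure, and the function names (`cosine_similarity`, `flag_similar_pairs`, `embed_with_sbert`) and the model name `all-MiniLM-L6-v2` are assumptions chosen for the example. The pairwise comparison is written in plain Python so it works with any list of embedding vectors.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def flag_similar_pairs(names, embeddings, threshold=0.90):
    """Return (name_a, name_b, score) for every pair scoring above threshold.

    Results are sorted from most to least similar so the strongest
    duplicate candidates are reviewed first.
    """
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if score > threshold:
            flagged.append((names[i], names[j], score))
    return sorted(flagged, key=lambda pair: pair[2], reverse=True)


def embed_with_sbert(names, model_name="all-MiniLM-L6-v2"):
    """Embed names with a Sentence-BERT model.

    Assumption: the sentence-transformers package is installed and the
    model name is only an example; the project may use a different one.
    """
    from sentence_transformers import SentenceTransformer  # lazy import

    model = SentenceTransformer(model_name)
    return model.encode(names).tolist()
```

With real data, the call would look like `flag_similar_pairs(names, embed_with_sbert(names))`. The all-pairs loop is quadratic in the number of names, so for very large signal lists a batched matrix computation or an approximate nearest-neighbour index would likely be a better fit.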

In parallel, we developed a custom parser that breaks each raw name into meaningful components (such as Region, Site Type, Asset Levels, and Signal Type) based on known patterns and domain-specific keywords. This structured breakdown helps analysts understand each signal's context and supports further standardization efforts across large datasets.
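A keyword-driven parser of this kind might look something like the sketch below. Everything specific in it is hypothetical: the delimiter set, the keyword tables (`REGIONS`, `SITE_TYPES`, `SIGNAL_TYPES`), and the example name format are invented for illustration and do not reflect the project's real naming conventions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword tables; a real project would load its own
# domain-specific vocabulary here.
REGIONS = {"NE": "North East", "SW": "South West"}
SITE_TYPES = {"WF": "Wind Farm", "SS": "Substation"}
SIGNAL_TYPES = {"PWR": "Active Power", "TEMP": "Temperature", "VOLT": "Voltage"}


@dataclass
class ParsedSignal:
    raw: str
    region: Optional[str] = None
    site_type: Optional[str] = None
    asset_levels: List[str] = field(default_factory=list)
    signal_type: Optional[str] = None


def parse_signal_name(raw: str) -> ParsedSignal:
    """Split a raw name on common delimiters and classify each token.

    Tokens matching a keyword table fill the corresponding field (first
    match wins); all remaining tokens are kept, in order, as asset levels.
    """
    parsed = ParsedSignal(raw=raw)
    tokens = [t for t in re.split(r"[_\-.\s/]+", raw.strip().upper()) if t]
    for token in tokens:
        if token in REGIONS and parsed.region is None:
            parsed.region = REGIONS[token]
        elif token in SITE_TYPES and parsed.site_type is None:
            parsed.site_type = SITE_TYPES[token]
        elif token in SIGNAL_TYPES and parsed.signal_type is None:
            parsed.signal_type = SIGNAL_TYPES[token]
        else:
            parsed.asset_levels.append(token)
    return parsed
```

For a made-up name such as `ne_wf.T01-GEN_pwr`, this sketch would fill in the region, site type, and signal type from the tables and keep `T01` and `GEN` as asset levels. Fields left as `None` mark names that did not match any known keyword, which makes them easy to route to manual review.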