Instance Overlap Measures

Instance-level measures quantify class overlap at the level of individual data points.

N3 - Error Rate of Nearest Neighbor

Measures how often an instance's nearest neighbor has a different class label, i.e. the leave-one-out error rate of a 1-nearest-neighbor classifier.

cm = ComplexityMeasures(X, y)
n3 = cm.calculate_N3()
print(f"N3: {n3:.4f}")

Interpretation:
- 0.0: No overlap, perfect separation
- 0.1-0.3: Moderate overlap
- > 0.3: High overlap

Best for: Quick assessment of overall overlap
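To make the idea behind N3 concrete, here is a minimal NumPy sketch of a leave-one-out 1-nearest-neighbor error rate. The function name n3_sketch and the toy data are hypothetical and are not part of ComplexityMeasures; calculate_N3() may differ in distance metric, tie handling, or normalization.

```python
import numpy as np

def n3_sketch(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all instances.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Mask the diagonal so a point cannot be its own nearest neighbor.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    # Fraction of instances whose nearest neighbor carries a different label.
    return float(np.mean(y[nearest] != y))

# Hypothetical toy data: two tight, well-separated groups on a line.
X_toy = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
print(n3_sketch(X_toy, [0, 0, 0, 1, 1, 1]))  # 0.0: labels follow the groups
print(n3_sketch(X_toy, [0, 1, 0, 1, 0, 1]))  # 1.0: every nearest neighbor disagrees
```

The full distance matrix keeps the sketch short but needs O(n^2) memory, so it is only suitable for small datasets.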

N4 - Non-linearity of Nearest Neighbor

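Measures the non-linearity of the nearest-neighbor classifier. In the standard definition, synthetic points are created by interpolating between pairs of same-class instances, and N4 is the 1-NN error rate on those points.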

n4 = cm.calculate_N4()

Interpretation:
- Low: Linear boundary
- High: Non-linear, complex boundary
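The interpolation idea can be sketched as follows, assuming the commonly cited formulation of N4: draw random pairs of same-class instances, interpolate between them, and check whether a 1-NN classifier trained on the original data assigns the synthetic point to that class. The name n4_sketch, the n_synthetic and seed parameters, and the toy data are hypothetical; calculate_N4() may sample or interpolate differently.

```python
import numpy as np

def n4_sketch(X, y, n_synthetic=200, seed=0):
    """1-NN error rate on points interpolated between same-class pairs (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    errors = 0
    for _ in range(n_synthetic):
        c = rng.choice(classes)                  # class of the synthetic point
        idx = np.flatnonzero(y == c)
        a, b = rng.choice(idx, size=2)           # two instances of that class
        t = rng.random()                         # interpolation weight in [0, 1)
        point = X[a] + t * (X[b] - X[a])
        nearest = np.linalg.norm(X - point, axis=1).argmin()
        errors += int(y[nearest] != c)           # 1-NN misclassifies the point
    return errors / n_synthetic

# Each class forms one compact group: interpolated points stay inside their group.
X_convex = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
print(n4_sketch(X_convex, [0, 0, 0, 1, 1, 1]))  # 0.0

# Class 0 sits on both sides of class 1: many interpolated points land near class 1.
X_split = np.array([[0.0], [10.0], [5.0]])
print(n4_sketch(X_split, [0, 0, 1]))  # greater than 0; exact value depends on the seed
```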

kDN - k-Disagreeing Neighbors

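For each instance, the fraction of its k nearest neighbors that belong to a different class. The dataset-level score is the average over all instances.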

kdn = cm.calculate_kDN(k=5)

Parameters:
- k: Number of neighbors (default: 5)

Interpretation:
- Low: Clear class regions
- High: Mixed neighborhoods
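As an illustration of how k affects the score, the sketch below computes a per-instance disagreement fraction and its mean with plain NumPy. It assumes the usual kDN definition (share of the k nearest neighbors with a different label); kdn_sketch and the toy data are hypothetical, and calculate_kDN() may return only the aggregate value or handle ties differently.

```python
import numpy as np

def kdn_sketch(X, y, k=5):
    """Per-instance and mean fraction of k nearest neighbors with a different label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # exclude each point from its own neighbors
    neighbors = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbors
    per_instance = (y[neighbors] != y[:, None]).mean(axis=1)
    return per_instance, float(per_instance.mean())

# Hypothetical toy data: two groups on a line, with the third point labelled
# like the far group so that it disagrees with its surroundings.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
per_instance, mean_kdn = kdn_sketch(X_toy, [0, 0, 1, 1, 1, 1], k=2)
print(per_instance.tolist())  # [0.5, 0.5, 1.0, 0.0, 0.0, 0.0]
print(round(mean_kdn, 4))     # 0.3333
```

The per-instance values are often more useful than the mean, because they point at the specific instances that sit in mixed neighborhoods.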

CM - Class Imbalance Metric

Measures class imbalance combined with overlap.

cm_score = cm.calculate_CM()

Interpretation:
- Higher values indicate more severe imbalance with overlap

R-value - Overlap Region Size

Estimates the size of the overlap region.

r_value = cm.calculate_R_value()

Interpretation:
- 0.0: No overlap
- 1.0: Complete overlap

D3 - Disjunct Class Percentage

Percentage of instances in disjunct regions.

d3 = cm.calculate_D3()

Interpretation:
- High: Many isolated instances
- Low: Cohesive class regions

SI - Silhouette Index

Measures how well instances fit their class cluster.

si = cm.calculate_SI()

Interpretation:
- Near 1.0: Instances sit well inside their own class (perfect clustering)
- Near 0.0: Overlapping clusters
- Near -1.0: Instances lie closer to another class than to their own
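The sketch below shows one way a silhouette score can be computed when the class labels are treated as cluster assignments. It follows the standard silhouette definition, s = (b - a) / max(a, b), where a is the mean distance to the instance's own class and b is the mean distance to the closest other class. The name silhouette_by_class and the toy data are hypothetical, the sketch assumes at least two classes, and calculate_SI() may use a different metric or aggregation.

```python
import numpy as np

def silhouette_by_class(X, y):
    """Mean silhouette coefficient with class labels used as clusters (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    classes = np.unique(y)
    scores = np.zeros(len(y))
    for i in range(len(y)):
        same = y == y[i]
        same[i] = False                          # exclude the instance itself
        if not same.any():
            continue                             # singleton class: score stays 0
        a = dist[i, same].mean()                 # mean distance to own class
        b = min(dist[i, y == c].mean() for c in classes if c != y[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())

# Hypothetical toy data: two pairs of points far apart on a line.
X_toy = np.array([[0.0], [1.0], [10.0], [11.0]])
print(silhouette_by_class(X_toy, [0, 0, 1, 1]))  # close to 1: labels match the groups
print(silhouette_by_class(X_toy, [0, 1, 0, 1]))  # negative: labels cut across the groups
```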

Borderline - Borderline Instance Ratio

Fraction of instances near the class boundary.

borderline = cm.calculate_borderline()

Interpretation:
- High: Many boundary instances (difficult)
- Low: Clear separation
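The source does not say how calculate_borderline() decides that an instance is near the boundary. One common convention, assumed here purely for illustration, labels an instance as borderline when 2 or 3 of its 5 nearest neighbors share its class (4 to 5 would be safe, 1 rare, 0 an outlier). The name borderline_ratio_sketch and the toy data are hypothetical.

```python
import numpy as np

def borderline_ratio_sketch(X, y):
    """Fraction of instances with 2 or 3 same-class points among their 5 nearest neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :5]  # assumed neighborhood size of 5
    same = (y[neighbors] == y[:, None]).sum(axis=1)
    is_borderline = (same >= 2) & (same <= 3)    # assumed borderline rule
    return float(is_borderline.mean())

# Hypothetical toy data: two groups of six points each, far apart on a line.
group = np.arange(6) * 0.1
X_toy = np.concatenate([group, 10.0 + group]).reshape(-1, 1)

y_clean = [0] * 6 + [1] * 6            # labels follow the groups
y_mixed = [0, 0, 0, 1, 1, 1] + [1] * 6  # first group is split between both classes
print(borderline_ratio_sketch(X_toy, y_clean))  # 0.0
print(borderline_ratio_sketch(X_toy, y_mixed))  # 0.5
```

In the mixed case every point in the first group sees exactly two same-class neighbors, so all six count as borderline, while the second group stays safe.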

Degree of Overlap

Overall degree of class overlap.

overlap = cm.calculate_degree_of_overlap()