"""Hierarchical Agglomerative Clustering

These routines perform hierarchical agglomerative clustering of input data.
"""

import warnings
from heapq import heapify, heappop, heappush, heappushpop
from numbers import Integral, Real

import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components

from ..base import (
    BaseEstimator,
    ClassNamePrefixFeaturesOutMixin,
    ClusterMixin,
    _fit_context,
)
from ..metrics import DistanceMetric
from ..metrics._dist_metrics import METRIC_MAPPING64
from ..metrics.pairwise import _VALID_METRICS, paired_distances
from ..utils import check_array
from ..utils._fast_dict import IntFloatDict
from ..utils._param_validation import (
    HasMethods,
    Interval,
    StrOptions,
    validate_params,
)
from ..utils.graph import _fix_connected_components
from ..utils.validation import check_memory, validate_data

from . import _hierarchical_fast as _hierarchical
from ._feature_agglomeration import AgglomerationTransform


def _fix_connectivity(X, connectivity, affinity):
    """
    Fixes the connectivity matrix.

    The different steps are:

    - copies it
    - makes it symmetric
    - converts it to LIL if necessary
    - completes it if necessary.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix representing `n_samples` samples to be clustered.

    connectivity : sparse matrix, default=None
        Connectivity matrix. Defines for each sample the neighboring samples
        following a given structure of the data. The matrix is assumed to
        be symmetric and only the upper triangular half is used.
        Default is `None`, i.e., the Ward algorithm is unstructured.

    affinity : {"euclidean", "precomputed"}, default="euclidean"
        Which affinity to use. At the moment `precomputed` and
        ``euclidean`` are supported. `euclidean` uses the
        negative squared Euclidean distance between points.

    Returns
    -------
    connectivity : sparse matrix
        The fixed connectivity matrix.

    n_connected_components : int
        The number of connected components in the graph.
    """
    n_samples = X.shape[0]
    if connectivity.shape[0] != n_samples or connectivity.shape[1] != n_samples:
        raise ValueError(
            f"Wrong shape for connectivity matrix: {connectivity.shape} "
            f"when X is {X.shape}"
        )

    # Make the connectivity matrix symmetric (this also creates a copy).
    connectivity = connectivity + connectivity.T

    # Convert the connectivity matrix to LIL.
    if not sparse.issparse(connectivity):
        connectivity = sparse.lil_matrix(connectivity)

    # `connectivity` is a sparse matrix at this point.
    if connectivity.format != "lil":
        connectivity = connectivity.tolil()

    # Compute the number of connected components.
    n_connected_components, labels = connected_components(connectivity)

    if n_connected_components > 1:
        warnings.warn(
            "the number of connected components of the "
            "connectivity matrix is %d > 1. "
            "Completing it to avoid stopping the tree early." % n_connected_components,
            stacklevel=2,
        )
        # Link the components so that the full tree can be built.
        connectivity = _fix_connected_components(
            X=X,
            graph=connectivity,
            n_connected_components=n_connected_components,
            component_labels=labels,
            metric=affinity,
            mode="connectivity",
        )

    return connectivity, n_connected_components


def _single_linkage_tree(
    connectivity,
    n_samples,
    n_nodes,
    n_clusters,
    n_connected_components,
    return_distance,
):
    """
    Perform single linkage clustering on sparse data via the minimum
    spanning tree from scipy.sparse.csgraph, then using union-find to label.
    The parent array is then generated by walking through the tree.
    """
    from scipy.sparse.csgraph import minimum_spanning_tree

    # Explicitly cast connectivity to float64.
    connectivity = connectivity.astype(np.float64, copy=False)

    # Ensure zero distances are not ignored by setting them to "epsilon".
    epsilon_value = np.finfo(connectivity.data.dtype).eps
    connectivity.data[connectivity.data == 0] = epsilon_value

    # Use scipy.sparse.csgraph to generate a minimum spanning tree.
    mst = minimum_spanning_tree(connectivity.tocsr())

    # Convert the graph to the scipy.cluster.hierarchy array format.
    mst = mst.tocoo()

    # Undo the epsilon values.
    mst.data[mst.data == epsilon_value] = 0

    mst_array = np.vstack([mst.row, mst.col, mst.data]).T

    # Sort the edges of the minimum spanning tree by weight.
    mst_array = mst_array[np.argsort(mst_array.T[2], kind="mergesort"), :]

    # Convert the edge list into the standard hierarchical clustering format.
    single_linkage_tree = _hierarchical.single_linkage_label(mst_array)
    children_ = single_linkage_tree[:, :2].astype(int)

    # Compute the parent of each node.
    parent = np.arange(n_nodes, dtype=np.intp)
    for i, (left, right) in enumerate(children_, n_samples):
        if n_clusters is not None and i >= n_nodes:
            break
        if left < n_nodes:
            parent[left] = i
        if right < n_nodes:
            parent[right] = i

    if return_distance:
        distances = single_linkage_tree[:, 2]
        return children_, n_connected_components, n_samples, parent, distances
    return children_, n_connected_components, n_samples, parent


@validate_params(
    {
        "X": ["array-like"],
        "connectivity": ["array-like", "sparse matrix", None],
        "n_clusters": [Interval(Integral, 1, None, closed="left"), None],
        "return_distance": ["boolean"],
    },
    prefer_skip_nested_validation=True,
)
def ward_tree(X, *, connectivity=None, n_clusters=None, return_distance=False):
    """Ward clustering based on a Feature
    matrix.

    Recursively merges the pair of clusters that minimally increases
    within-cluster variance.

    The inertia matrix uses a Heapq-based representation.

    This is the structured version, which takes into account some topological
    structure between samples.

    Read more in the :ref:`User Guide `.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix representing `n_samples` samples to be clustered.

    connectivity : {array-like, sparse matrix}, default=None
        Connectivity matrix. Defines for each sample the neighboring samples
        following a given structure of the data. The matrix is assumed to
        be symmetric and only the upper triangular half is used.
        Default is None, i.e., the Ward algorithm is unstructured.

    n_clusters : int, default=None
        `n_clusters` must not exceed `n_samples`. Stop early the
        construction of the tree at `n_clusters`. This is useful to decrease
        computation time if the number of clusters is not small compared to
        the number of samples. In this case, the complete tree is not computed,
        thus the 'children' output is of limited use, and the 'parents' output
        should rather be used. This option is valid only when specifying a
        connectivity matrix.

    return_distance : bool, default=False
        If `True`, return the distance between the clusters.

    Returns
    -------
    children : ndarray of shape (n_nodes-1, 2)
        The children of each non-leaf node. Values less than `n_samples`
        correspond to leaves of the tree which are the original samples.
        A node `i` greater than or equal to `n_samples` is a non-leaf
        node and has children `children_[i - n_samples]`. Alternatively
        at the i-th iteration, children[i][0] and children[i][1]
        are merged to form node `n_samples + i`.

    n_connected_components : int
        The number of connected components in the graph.

    n_leaves : int
        The number of leaves in the tree.

    parents : ndarray of shape (n_nodes,) or None
        The parent of each node. Only returned when a connectivity matrix
        is specified, otherwise 'None' is returned.
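
    distances : ndarray of shape (n_nodes-1,)
        Only returned if `return_distance` is set to `True` (for compatibility).
        The distances between the centers of the nodes. `distances[i]`
        corresponds to a weighted Euclidean distance between
        the nodes `children[i, 0]` and `children[i, 1]`. If the nodes refer to
        leaves of the tree, then `distances[i]` is their unweighted Euclidean
        distance. Distances are updated in the following way
        (from scipy.cluster.hierarchy.linkage):

        The new entry :math:`d(u,v)` is computed as follows:

        .. math::

           d(u,v) = \\sqrt{\\frac{|v|+|s|}{T}d(v,s)^2
                        + \\frac{|v|+|t|}{T}d(v,t)^2
                        - \\frac{|v|}{T}d(s,t)^2}

        where :math:`u` is the newly joined cluster consisting of
        clusters :math:`s` and :math:`t`, :math:`v` is an unused
        cluster in the forest, :math:`T=|v|+|s|+|t|`, and
        :math:`|*|` is the cardinality of its argument. This is also
        known as the incremental algorithm.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.cluster import ward_tree
    >>> X = np.array([[1, 2], [1, 4], [1, 0],
    ...               [4, 2], [4, 4], [4, 0]])
    >>> children, n_connected_components, n_leaves, parents = ward_tree(X)
    >>> children
    array([[0, 1],
           [3, 5],
           [2, 6],
           [4, 7],
           [8, 9]])
    >>> n_connected_components
    1
    >>> n_leaves
    6
    """
    X = np.asarray(X)
    if X.ndim == 1:
        X = np.reshape(X, (-1, 1))
    n_samples, n_features = X.shape

    if connectivity is None:
        from scipy.cluster import hierarchy

        if n_clusters is not None:
            warnings.warn(
                (
                    "Partial build of the tree is implemented "
                    "only for structured clustering (i.e. with "
                    "explicit connectivity). The algorithm "
                    "will build the full tree and only "
                    "retain the lower branches required "
                    "for the specified number of clusters"
                ),
                stacklevel=2,
            )
        X = np.require(X, requirements="W")
        out = hierarchy.ward(X)
        children_ = out[:, :2].astype(np.intp)

        if return_distance:
            distances = out[:, 2]
            return children_, 1, n_samples, None, distances
        else:
            return children_, 1, n_samples, None

    connectivity, n_connected_components = _fix_connectivity(
        X, connectivity, affinity="euclidean"
    )
    if n_clusters is None:
        n_nodes = 2 * n_samples - 1
    else:
        if n_clusters > n_samples:
            raise ValueError(
                "Cannot provide more clusters than samples. "
                "%i n_clusters was asked, and there are %i "
                "samples." % (n_clusters, n_samples)
            )
        n_nodes = 2 * n_samples - n_clusters

    # Create the inertia matrix.
    coord_row = []
    coord_col = []
    A = []
    for ind, row in enumerate(connectivity.rows):
        A.append(row)
        # Keep only the upper triangular part for the moments.
        row = [i for i in row if i < ind]
        coord_row.extend(len(row) * [ind])
        coord_col.extend(row)

    coord_row = np.array(coord_row, dtype=np.intp, order="C")
    coord_col = np.array(coord_col, dtype=np.intp, order="C")

    # Build the moments: cluster sizes (order 1) and coordinate sums (order 2).
    moments_1 = np.zeros(n_nodes, order="C")
    moments_1[:n_samples] = 1
    moments_2 = np.zeros((n_nodes, n_features), order="C")
    moments_2[:n_samples] = X
    inertia = np.empty(len(coord_row), dtype=np.float64, order="C")
    _hierarchical.compute_ward_dist(moments_1, moments_2, coord_row, coord_col, inertia)
    inertia = list(zip(inertia, coord_row, coord_col))
    heapify(inertia)

    # Prepare the main fields.
    parent = np.arange(n_nodes, dtype=np.intp)
    used_node = np.ones(n_nodes, dtype=bool)
    children = []
    if return_distance:
        distances = np.empty(n_nodes - n_samples)

    not_visited = np.empty(n_nodes, dtype=bool, order="C")

    # Recursive merge loop.
    for k in range(n_samples, n_nodes):
        # Identify the merge: pop until both endpoints are still unmerged.
        while True:
            inert, i, j = heappop(inertia)
            if used_node[i] and used_node[j]:
                break
        parent[i], parent[j] = k, k
        children.append((i, j))
        used_node[i] = used_node[j] = False
        if return_distance:  # store the inertia value
            distances[k - n_samples] = inert

        # Update the moments.
        moments_1[k] = moments_1[i] + moments_1[j]
        moments_2[k] = moments_2[i] + moments_2[j]

        # Update the structure matrix A and the inertia matrix.
        coord_col = []
        not_visited.fill(1)
        not_visited[k] = 0
        _hierarchical._get_parents(A[i], coord_col, parent, not_visited)
        _hierarchical._get_parents(A[j], coord_col, parent, not_visited)
        for col in coord_col:
            A[col].append(k)
        A.append(coord_col)
        coord_col = np.array(coord_col, dtype=np.intp, order="C")
        coord_row = np.empty(coord_col.shape, dtype=np.intp, order="C")
        coord_row.fill(k)
        n_additions = len(coord_row)
        ini = np.empty(n_additions, dtype=np.float64, order="C")

        _hierarchical.compute_ward_dist(moments_1, moments_2, coord_row, coord_col, ini)

        for idx in range(n_additions):
            heappush(inertia, (ini[idx], k, coord_col[idx]))

    n_leaves = n_samples
    # Reverse each pair for output consistent with the unstructured version.
    children = [c[::-1] for c in children]
    children = np.array(children)  # numpy array for efficient caching

    if return_distance:
        # 2 is a scaling factor to compare with the unstructured version.
        distances = np.sqrt(2.0 * distances)
        return children, n_connected_components, n_leaves, parent, distances
    else:
        return children, n_connected_components, n_leaves, parent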
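

# Illustrative sketch (hypothetical helpers, not part of the scikit-learn
# API): the incremental (Lance-Williams style) Ward update quoted in the
# ``ward_tree`` docstring, written out for plain floats. This sketch assumes
# Ward distances in the scipy convention, where two singletons are at their
# Euclidean distance and, in general,
# d(A, B) = sqrt(2 |A| |B| / (|A| + |B|)) * ||c_A - c_B||.
# The second helper computes that definition directly for 1-D centroids, so
# the two can be compared.
import math


def _example_ward_update(d_vs, d_vt, d_st, n_v, n_s, n_t):
    """Return d(u, v), where u is the union of clusters s and t."""
    total = n_v + n_s + n_t  # T = |v| + |s| + |t|
    squared = (
        (n_v + n_s) / total * d_vs**2
        + (n_v + n_t) / total * d_vt**2
        - n_v / total * d_st**2
    )
    return math.sqrt(squared)


def _example_ward_direct(centroid_a, size_a, centroid_b, size_b):
    """Ward distance between two clusters of 1-D points, from their centroids."""
    scale = math.sqrt(2.0 * size_a * size_b / (size_a + size_b))
    return scale * abs(centroid_a - centroid_b)


# Usage: the points 0, 2 and 5 on a line, each starting as a singleton.
# Merging s={0} and t={2} gives u with centroid 1 and size 2, so
#   _example_ward_update(5.0, 3.0, 2.0, 1, 1, 1)
# and
#   _example_ward_direct(1.0, 2, 5.0, 1)
# both evaluate to sqrt(64 / 3).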


def linkage_tree(
    X,
    connectivity=None,
    n_clusters=None,
    linkage="complete",
    affinity="euclidean",
    return_distance=False,
):
    """Linkage agglomerative clustering based on a Feature matrix.

    The inertia matrix uses a Heapq-based representation.

    This is the structured version, which takes into account some topological
    structure between samples.

    Read more in the :ref:`User Guide `.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix representing `n_samples` samples to be clustered.

    connectivity : sparse matrix, default=None
        Connectivity matrix. Defines for each sample the neighboring samples
        following a given structure of the data. The matrix is assumed to
        be symmetric and only the upper triangular half is used.
        Default is `None`, i.e., the hierarchical clustering algorithm is
        unstructured.

    n_clusters : int, default=None
        Stop early the construction of the tree at `n_clusters`. This is
        useful to decrease computation time if the number of clusters is
        not small compared to the number of samples.
        In this case, the complete tree is not computed, thus the 'children'
        output is of limited use, and the 'parents' output should rather be
        used. This option is valid only when specifying a connectivity matrix.

    linkage : {"average", "complete", "single"}, default="complete"
        Which linkage criterion to use. The linkage criterion determines which
        distance to use between sets of observations.

        - "average" uses the average of the distances of each observation of
          the two sets.
        - "complete" or maximum linkage uses the maximum distances between
          all observations of the two sets.
        - "single" uses the minimum of the distances between all
          observations of the two sets.

    affinity : str or callable, default='euclidean'
        Which metric to use. Can be 'euclidean', 'manhattan', or any
        distance supported by :func:`~sklearn.metrics.pairwise.paired_distances`.

    return_distance : bool, default=False
        Whether or not to return the distances between the clusters.

    Returns
    -------
    children : ndarray of shape (n_nodes-1, 2)
        The children of each non-leaf node. Values less than `n_samples`
        correspond to leaves of the tree which are the original samples.
        A node `i` greater than or equal to `n_samples` is a non-leaf
        node and has children `children_[i - n_samples]`. Alternatively
        at the i-th iteration, children[i][0] and children[i][1]
        are merged to form node `n_samples + i`.

    n_connected_components : int
        The number of connected components in the graph.

    n_leaves : int
        The number of leaves in the tree.

    parents : ndarray of shape (n_nodes, ) or None
        The parent of each node. Only returned when a connectivity matrix
        is specified, otherwise 'None' is returned.

    distances : ndarray of shape (n_nodes-1,)
        Returned when `return_distance` is set to `True`.

        distances[i] refers to the distance between children[i][0] and
        children[i][1] when they are merged.

    See Also
    --------
    ward_tree : Hierarchical clustering with ward linkage.
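    """
    X = np.asarray(X)
    if X.ndim == 1:
        X = np.reshape(X, (-1, 1))
    n_samples, n_features = X.shape

    # Single linkage is handled separately, hence no merge function.
    linkage_choices = {
        "complete": _hierarchical.max_merge,
        "average": _hierarchical.average_merge,
        "single": None,
    }
    try:
        join_func = linkage_choices[linkage]
    except KeyError as e:
        raise ValueError(
            "Unknown linkage option, linkage should be one "
            f"of {linkage_choices.keys()}, but {linkage} was given"
        ) from e

    if affinity == "cosine" and np.any(~np.any(X, axis=1)):
        raise ValueError("Cosine affinity cannot be used when X contains zero vectors")

    if connectivity is None:
        from scipy.cluster import hierarchy

        if n_clusters is not None:
            warnings.warn(
                (
                    "Partial build of the tree is implemented "
                    "only for structured clustering (i.e. with "
                    "explicit connectivity). The algorithm "
                    "will build the full tree and only "
                    "retain the lower branches required "
                    "for the specified number of clusters"
                ),
                stacklevel=2,
            )

        if affinity == "precomputed":
            # scipy's linkage expects a condensed distance matrix, i.e. the
            # upper triangle of the square matrix.
            if X.shape[0] != X.shape[1]:
                raise ValueError(
                    f"Distance matrix should be square, got matrix of shape {X.shape}"
                )
            i, j = np.triu_indices(X.shape[0], k=1)
            X = X[i, j]
        elif affinity == "l2":
            # Translate to a name understood by scipy.
            affinity = "euclidean"
        elif affinity in ("l1", "manhattan"):
            affinity = "cityblock"
        elif callable(affinity):
            X = affinity(X)
            i, j = np.triu_indices(X.shape[0], k=1)
            X = X[i, j]
        if (
            linkage == "single"
            and affinity != "precomputed"
            and not callable(affinity)
            and affinity in METRIC_MAPPING64
        ):
            # Use the fast Cython distance metric.
            dist_metric = DistanceMetric.get_metric(affinity)

            # The Cython routines require contiguous arrays.
            X = np.ascontiguousarray(X, dtype=np.double)

            mst = _hierarchical.mst_linkage_core(X, dist_metric)
            # Sort the edges of the minimum spanning tree by weight.
            mst = mst[np.argsort(mst.T[2], kind="mergesort"), :]

            # Convert the edge list into the standard hierarchical format.
            out = _hierarchical.single_linkage_label(mst)
        else:
            out = hierarchy.linkage(X, method=linkage, metric=affinity)
        children_ = out[:, :2].astype(int, copy=False)

        if return_distance:
            distances = out[:, 2]
            return children_, 1, n_samples, None, distances
        return children_, 1, n_samples, None

    connectivity, n_connected_components = _fix_connectivity(
        X, connectivity, affinity=affinity
    )
    connectivity = connectivity.tocoo()
    # Remove the diagonal entries.
    diag_mask = connectivity.row != connectivity.col
    connectivity.row = connectivity.row[diag_mask]
    connectivity.col = connectivity.col[diag_mask]
    connectivity.data = connectivity.data[diag_mask]
    del diag_mask

    if affinity == "precomputed":
        distances = X[connectivity.row, connectivity.col].astype(np.float64, copy=False)
    else:
        # All connected pairwise distances are computed here.
        distances = paired_distances(
            X[connectivity.row], X[connectivity.col], metric=affinity
        )
    connectivity.data = distances

    if n_clusters is None:
        n_nodes = 2 * n_samples - 1
    else:
        assert n_clusters <= n_samples
        n_nodes = 2 * n_samples - n_clusters

    if linkage == "single":
        return _single_linkage_tree(
            connectivity,
            n_samples,
            n_nodes,
            n_clusters,
            n_connected_components,
            return_distance,
        )

    if return_distance:
        distances = np.empty(n_nodes - n_samples)
    # Create the inertia heap and the connection matrix.
    A = np.empty(n_nodes, dtype=object)
    inertia = list()

    # LIL gives fast row access without the overhead of slicing CSR arrays.
    connectivity = connectivity.tolil()
    # Store the graph as a list of IntFloatDict.
    for ind, (data, row) in enumerate(zip(connectivity.data, connectivity.rows)):
        A[ind] = IntFloatDict(
            np.asarray(row, dtype=np.intp), np.asarray(data, dtype=np.float64)
        )
        # Keep only the upper triangular part for the heap.
        inertia.extend(
            _hierarchical.WeightedEdge(d, ind, r) for r, d in zip(row, data) if r < ind
        )
    del connectivity

    heapify(inertia)

    # Prepare the main fields.
    parent = np.arange(n_nodes, dtype=np.intp)
    used_node = np.ones(n_nodes, dtype=np.intp)
    children = []

    # Recursive merge loop.
    for k in range(n_samples, n_nodes):
        # Identify the merge: pop until both endpoints are still unmerged.
        while True:
            edge = heappop(inertia)
            if used_node[edge.a] and used_node[edge.b]:
                break
        i = edge.a
        j = edge.b

        if return_distance:
            distances[k - n_samples] = edge.weight

        parent[i] = parent[j] = k
        children.append((i, j))
        # Keep track of the number of elements per cluster.
        n_i = used_node[i]
        n_j = used_node[j]
        used_node[k] = n_i + n_j
        used_node[i] = used_node[j] = False

        # Update the structure matrix A and the inertia heap with a
        # 'min' or 'max' style merge of A[i] and A[j].
        coord_col = join_func(A[i], A[j], used_node, n_i, n_j)
        for col, d in coord_col:
            A[col].append(k, d)
            heappush(inertia, _hierarchical.WeightedEdge(d, k, col))
        A[k] = coord_col
        # Clear A[i] and A[j] to save memory.
        A[i] = A[j] = 0

    n_leaves = n_samples

    # Numpy array for efficient caching.
    children = np.array(children)[:, ::-1]

    if return_distance:
        return children, n_connected_components, n_leaves, parent, distances
    return children, n_connected_components, n_leaves, parent


# Matching names to tree-building strategies.
def _complete_linkage(*args, **kwargs):
    kwargs["linkage"] = "complete"
    return linkage_tree(*args, **kwargs)


def _average_linkage(*args, **kwargs):
    kwargs["linkage"] = "average"
    return linkage_tree(*args, **kwargs)


def _single_linkage(*args, **kwargs):
    kwargs["linkage"] = "single"
    return linkage_tree(*args, **kwargs)


_TREE_BUILDERS = dict(
    ward=ward_tree,
    complete=_complete_linkage,
    average=_average_linkage,
    single=_single_linkage,
)


def _hc_cut(n_clusters, children, n_leaves):
    """Cut the hierarchical tree for a given number of clusters.

    Parameters
    ----------
    n_clusters : int or ndarray
        The number of clusters to form.

    children : ndarray of shape (n_nodes-1, 2)
        The children of each non-leaf node. Values less than `n_samples`
        correspond to leaves of the tree which are the original samples.
        A node `i` greater than or equal to `n_samples` is a non-leaf
        node and has children `children_[i - n_samples]`. Alternatively
        at the i-th iteration, children[i][0] and children[i][1]
        are merged to form node `n_samples + i`.

    n_leaves : int
        Number of leaves of the tree.

    Returns
    -------
    labels : ndarray of shape (n_leaves,)
        Cluster labels for each point.
    """
    if n_clusters > n_leaves:
        raise ValueError(
            "Cannot extract more clusters than samples: "
            f"{n_clusters} clusters were given for a tree with {n_leaves} leaves."
        )
    # Nodes are stored as a heap of negated indices so that nodes[0] is
    # always the most recently created (largest) node. children[-1] is the
    # root of the tree.
    nodes = [-(max(children[-1]) + 1)]
    for _ in range(n_clusters - 1):
        these_children = children[-nodes[0] - n_leaves]
        # Insert the two children and remove the largest node.
        heappush(nodes, -these_children[0])
        heappushpop(nodes, -these_children[1])
    label = np.zeros(n_leaves, dtype=np.intp)
    for i, node in enumerate(nodes):
        label[_hierarchical._hc_get_descendent(-node, children, n_leaves)] = i
    return label


class AgglomerativeClustering(ClusterMixin, BaseEstimator):
    """
    Agglomerative Clustering.

    Recursively merges pairs of clusters of sample data; uses linkage distance.

    Read more in the :ref:`User Guide `.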

    Parameters
    ----------
    n_clusters : int or None, default=2
        The number of clusters to find. It must be ``None`` if
        ``distance_threshold`` is not ``None``.

    metric : str or callable, default="euclidean"
        Metric used to compute the linkage. Can be "euclidean", "l1", "l2",
        "manhattan", "cosine", or "precomputed". If linkage is "ward", only
        "euclidean" is accepted. If "precomputed", a distance matrix is needed
        as input for the fit method. If connectivity is None, linkage is
        "single" and metric is not "precomputed", any valid pairwise distance
        metric can be assigned.

        For an example of agglomerative clustering with different metrics, see
        :ref:`sphx_glr_auto_examples_cluster_plot_agglomerative_clustering_metrics.py`.

        .. versionadded:: 1.2

    memory : str or object with the joblib.Memory interface, default=None
        Used to cache the output of the computation of the tree.
        By default, no caching is done. If a string is given, it is the
        path to the caching directory.

    connectivity : array-like, sparse matrix, or callable, default=None
        Connectivity matrix. Defines for each sample the neighboring
        samples following a given structure of the data.
        This can be a connectivity matrix itself or a callable that transforms
        the data into a connectivity matrix, such as derived from
        `kneighbors_graph`. Default is ``None``, i.e., the
        hierarchical clustering algorithm is unstructured.

        For an example of connectivity matrix using
        :func:`~sklearn.neighbors.kneighbors_graph`, see
        :ref:`sphx_glr_auto_examples_cluster_plot_agglomerative_clustering.py`.

    compute_full_tree : 'auto' or bool, default='auto'
        Stop early the construction of the tree at ``n_clusters``. This is
        useful to decrease computation time if the number of clusters is not
        small compared to the number of samples. This option is useful only
        when specifying a connectivity matrix. Note also that when varying the
        number of clusters and using caching, it may be advantageous to compute
        the full tree.
        It must be ``True`` if ``distance_threshold`` is not ``None``.
        By default `compute_full_tree` is "auto", which is equivalent
        to `True` when `distance_threshold` is not `None` or when `n_clusters`
        is less than the maximum of 100 and `0.02 * n_samples`.
        Otherwise, "auto" is equivalent to `False`.

    linkage : {'ward', 'complete', 'average', 'single'}, default='ward'
        Which linkage criterion to use. The linkage criterion determines which
        distance to use between sets of observations. The algorithm will merge
        the pairs of clusters that minimize this criterion.

        - 'ward' minimizes the variance of the clusters being merged.
        - 'average' uses the average of the distances of each observation of
          the two sets.
        - 'complete' or 'maximum' linkage uses the maximum distances between
          all observations of the two sets.
        - 'single' uses the minimum of the distances between all observations
          of the two sets.

        .. versionadded:: 0.20
            Added the 'single' option

        For examples comparing different `linkage` criteria, see
        :ref:`sphx_glr_auto_examples_cluster_plot_linkage_comparison.py`.

    distance_threshold : float, default=None
        The linkage distance threshold at or above which clusters will not be
        merged. If not ``None``, ``n_clusters`` must be ``None`` and
        ``compute_full_tree`` must be ``True``.

        .. versionadded:: 0.21

    compute_distances : bool, default=False
        Computes distances between clusters even if `distance_threshold` is not
        used. This can be used for dendrogram visualization, but introduces
        a computational and memory overhead.

        .. versionadded:: 0.24

        For an example of dendrogram visualization, see
        :ref:`sphx_glr_auto_examples_cluster_plot_agglomerative_dendrogram.py`.

    Attributes
    ----------
    n_clusters_ : int
        The number of clusters found by the algorithm. If
        ``distance_threshold=None``, it will be equal to the given
        ``n_clusters``.

    labels_ : ndarray of shape (n_samples,)
        Cluster labels for each point.

    n_leaves_ : int
        Number of leaves in the hierarchical tree.

    n_connected_components_ : int
        The estimated number of connected components in the graph.

        .. versionadded:: 0.21
            ``n_connected_components_`` was added to replace ``n_components_``.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    children_ : array-like of shape (n_samples-1, 2)
        The children of each non-leaf node. Values less than `n_samples`
        correspond to leaves of the tree which are the original samples.
        A node `i` greater than or equal to `n_samples` is a non-leaf
        node and has children `children_[i - n_samples]`. Alternatively
        at the i-th iteration, children[i][0] and children[i][1]
        are merged to form node `n_samples + i`.

    distances_ : array-like of shape (n_nodes-1,)
        Distances between nodes in the corresponding place in `children_`.
        Only computed if `distance_threshold` is used or `compute_distances`
        is set to `True`.

    See Also
    --------
    FeatureAgglomeration : Agglomerative clustering but for features instead of
        samples.
    ward_tree : Hierarchical clustering with ward linkage.

    Examples
    --------
    >>> from sklearn.cluster import AgglomerativeClustering
    >>> import numpy as np
    >>> X = np.array([[1, 2], [1, 4], [1, 0],
    ...               [4, 2], [4, 4], [4, 0]])
    >>> clustering = AgglomerativeClustering().fit(X)
    >>> clustering
    AgglomerativeClustering()
    >>> clustering.labels_
    array([1, 1, 1, 0, 0, 0])

    For a comparison of Agglomerative clustering with other clustering
    algorithms, see
    :ref:`sphx_glr_auto_examples_cluster_plot_cluster_comparison.py`
    """

    _parameter_constraints: dict = {
        "n_clusters": [Interval(Integral, 1, None, closed="left"), None],
        "metric": [
            StrOptions(set(_VALID_METRICS) | {"precomputed"}),
            callable,
        ],
        "memory": [str, HasMethods("cache"), None],
        "connectivity": ["array-like", "sparse matrix", callable, None],
        "compute_full_tree": [StrOptions({"auto"}), "boolean"],
        "linkage": [StrOptions(set(_TREE_BUILDERS.keys()))],
        "distance_threshold": [Interval(Real, 0, None, closed="left"), None],
        "compute_distances": ["boolean"],
    }

    def __init__(
        self,
        n_clusters=2,
        *,
        metric="euclidean",
        memory=None,
        connectivity=None,
        compute_full_tree="auto",
        linkage="ward",
        distance_threshold=None,
        compute_distances=False,
    ):
        self.n_clusters = n_clusters
        self.distance_threshold = distance_threshold
        self.memory = memory
        self.connectivity = connectivity
        self.compute_full_tree = compute_full_tree
        self.linkage = linkage
        self.metric = metric
        self.compute_distances = compute_distances

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X, y=None):
        """Fit the hierarchical clustering from features, or distance matrix.
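
        Parameters
        ----------
        X : array-like, shape (n_samples, n_features) or (n_samples, n_samples)
            Training instances to cluster, or distances between instances if
            ``metric='precomputed'``.

        y : Ignored
            Not used, present here for API consistency by convention.

        Returns
        -------
        self : object
            Returns the fitted instance.
        """
        X = validate_data(self, X, ensure_min_samples=2)
        return self._fit(X)

    def _fit(self, X):
        """Fit without validation.

        Parameters
        ----------
        X : ndarray of shape (n_samples, n_features) or (n_samples, n_samples)
            Training instances to cluster, or distances between instances if
            ``metric='precomputed'``.

        Returns
        -------
        self : object
            Returns the fitted instance.
        """
        memory = check_memory(self.memory)

        if not ((self.n_clusters is None) ^ (self.distance_threshold is None)):
            raise ValueError(
                "Exactly one of n_clusters and "
                "distance_threshold has to be set, and the other "
                "needs to be None."
            )

        if self.distance_threshold is not None and not self.compute_full_tree:
            raise ValueError(
                "compute_full_tree must be True if distance_threshold is set."
            )

        if self.linkage == "ward" and self.metric != "euclidean":
            raise ValueError(
                f"{self.metric} was provided as metric. Ward can only "
                "work with euclidean distances."
            )

        tree_builder = _TREE_BUILDERS[self.linkage]

        connectivity = self.connectivity
        if self.connectivity is not None:
            if callable(self.connectivity):
                connectivity = self.connectivity(X)
            connectivity = check_array(
                connectivity, accept_sparse=["csr", "coo", "lil"]
            )

        n_samples = len(X)
        compute_full_tree = self.compute_full_tree
        if self.connectivity is None:
            compute_full_tree = True
        if compute_full_tree == "auto":
            if self.distance_threshold is not None:
                compute_full_tree = True
            else:
                # Early stopping is likely to speed things up only for a
                # large number of clusters; this threshold is heuristic.
                compute_full_tree = self.n_clusters < max(100, 0.02 * n_samples)
        n_clusters = self.n_clusters
        if compute_full_tree:
            n_clusters = None

        # Construct the tree.
        kwargs = {}
        if self.linkage != "ward":
            kwargs["linkage"] = self.linkage
            kwargs["affinity"] = self.metric

        distance_threshold = self.distance_threshold

        return_distance = (distance_threshold is not None) or self.compute_distances

        out = memory.cache(tree_builder)(
            X,
            connectivity=connectivity,
            n_clusters=n_clusters,
            return_distance=return_distance,
            **kwargs,
        )
        (self.children_, self.n_connected_components_, self.n_leaves_, parents) = out[
            :4
        ]

        if return_distance:
            self.distances_ = out[-1]

        if self.distance_threshold is not None:  # distance_threshold is used
            self.n_clusters_ = (
                np.count_nonzero(self.distances_ >= distance_threshold) + 1
            )
        else:  # n_clusters is used
            self.n_clusters_ = self.n_clusters

        # Cut the tree.
        if compute_full_tree:
            self.labels_ = _hc_cut(self.n_clusters_, self.children_, self.n_leaves_)
        else:
            labels = _hierarchical.hc_get_heads(parents, copy=False)
            # Copy to avoid holding a reference to the original array.
            labels = np.copy(labels[:n_samples])
            # Reassign cluster numbers.
            self.labels_ = np.searchsorted(np.unique(labels), labels)
        return self


class FeatureAgglomeration(
    ClassNamePrefixFeaturesOutMixin, AgglomerationTransform, AgglomerativeClustering
):
    """Agglomerate features.

    Recursively merges pairs of clusters of features.

    Refer to
    :ref:`sphx_glr_auto_examples_cluster_plot_feature_agglomeration_vs_univariate_selection.py`
    for an example comparison of :class:`FeatureAgglomeration` strategy with a
    univariate feature selection strategy (based on ANOVA).

    Read more in the :ref:`User Guide `.

    Parameters
    ----------
    n_clusters : int or None, default=2
        The number of clusters to find. It must be ``None`` if
        ``distance_threshold`` is not ``None``.

    metric : str or callable, default="euclidean"
        Metric used to compute the linkage. Can be "euclidean", "l1", "l2",
        "manhattan", "cosine", or "precomputed".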
        If linkage is "ward", only "euclidean" is accepted.
        If "precomputed", a distance matrix is needed as input for the
        fit method.

        .. versionadded:: 1.2

    memory : str or object with the joblib.Memory interface, default=None
        Used to cache the output of the computation of the tree.
        By default, no caching is done. If a string is given, it is the
        path to the caching directory.

    connectivity : array-like, sparse matrix, or callable, default=None
        Connectivity matrix. Defines for each feature the neighboring
        features following a given structure of the data.
        This can be a connectivity matrix itself or a callable that transforms
        the data into a connectivity matrix, such as derived from
        `kneighbors_graph`. Default is `None`, i.e., the
        hierarchical clustering algorithm is unstructured.

    compute_full_tree : 'auto' or bool, default='auto'
        Stop early the construction of the tree at `n_clusters`. This is useful
        to decrease computation time if the number of clusters is not small
        compared to the number of features. This option is useful only when
        specifying a connectivity matrix. Note also that when varying the
        number of clusters and using caching, it may be advantageous to compute
        the full tree. It must be ``True`` if ``distance_threshold`` is not
        ``None``. By default `compute_full_tree` is "auto", which is equivalent
        to `True` when `distance_threshold` is not `None` or when `n_clusters`
        is less than the maximum of 100 and `0.02 * n_features`.
        Otherwise, "auto" is equivalent to `False`.

    linkage : {"ward", "complete", "average", "single"}, default="ward"
        Which linkage criterion to use. The linkage criterion determines which
        distance to use between sets of features. The algorithm will merge
        the pairs of clusters that minimize this criterion.

        - "ward" minimizes the variance of the clusters being merged.
        - "complete" or maximum linkage uses the maximum distances between
          all features of the two sets.
        - "average" uses the average of the distances of each feature of
          the two sets.
        - "single" uses the minimum of the distances between all features
          of the two sets.

    pooling_func : callable, default=np.mean
        This combines the values of agglomerated features into a single
        value, and should accept an array of shape [M, N] and the keyword
        argument `axis=1`, and reduce it to an array of size [M].

    distance_threshold : float, default=None
        The linkage distance threshold at or above which clusters will not be
        merged. If not ``None``, ``n_clusters`` must be ``None`` and
        ``compute_full_tree`` must be ``True``.

        .. versionadded:: 0.21

    compute_distances : bool, default=False
        Computes distances between clusters even if `distance_threshold` is not
        used. This can be used for dendrogram visualization, but introduces
        a computational and memory overhead.

        .. versionadded:: 0.24

    Attributes
    ----------
    n_clusters_ : int
        The number of clusters found by the algorithm. If
        ``distance_threshold=None``, it will be equal to the given
        ``n_clusters``.

    labels_ : ndarray of shape (n_features,)
        Cluster labels for each feature.

    n_leaves_ : int
        Number of leaves in the hierarchical tree.

    n_connected_components_ : int
        The estimated number of connected components in the graph.

        .. versionadded:: 0.21
            ``n_connected_components_`` was added to replace ``n_components_``.

    n_features_in_ : int
        Number of features seen during :term:`fit`.

        .. versionadded:: 0.24

    feature_names_in_ : ndarray of shape (`n_features_in_`,)
        Names of features seen during :term:`fit`. Defined only when `X`
        has feature names that are all strings.

        .. versionadded:: 1.0

    children_ : array-like of shape (n_nodes-1, 2)
        The children of each non-leaf node. Values less than `n_features`
        correspond to leaves of the tree which are the original features.
        A node `i` greater than or equal to `n_features` is a non-leaf
        node and has children `children_[i - n_features]`. Alternatively
        at the i-th iteration, children[i][0] and children[i][1]
        are merged to form node `n_features + i`.

    distances_ : array-like of shape (n_nodes-1,)
        Distances between nodes in the corresponding place in `children_`.
        Only computed if `distance_threshold` is used or `compute_distances`
        is set to `True`.

    See Also
    --------
    AgglomerativeClustering : Agglomerative clustering of samples instead of
        features.
    ward_tree : Hierarchical clustering with ward linkage.

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn import datasets, cluster
    >>> digits = datasets.load_digits()
    >>> images = digits.images
    >>> X = np.reshape(images, (len(images), -1))
    >>> agglo = cluster.FeatureAgglomeration(n_clusters=32)
    >>> agglo.fit(X)
    FeatureAgglomeration(n_clusters=32)
    >>> X_reduced = agglo.transform(X)
    >>> X_reduced.shape
    (1797, 32)
    """

    _parameter_constraints: dict = {
        "n_clusters": [Interval(Integral, 1, None, closed="left"), None],
        "metric": [
            StrOptions(set(_VALID_METRICS) | {"precomputed"}),
            callable,
        ],
        "memory": [str, HasMethods("cache"), None],
        "connectivity": ["array-like", "sparse matrix", callable, None],
        "compute_full_tree": [StrOptions({"auto"}), "boolean"],
        "linkage": [StrOptions(set(_TREE_BUILDERS.keys()))],
        "pooling_func": [callable],
        "distance_threshold": [Interval(Real, 0, None, closed="left"), None],
        "compute_distances": ["boolean"],
    }

    def __init__(
        self,
        n_clusters=2,
        *,
        metric="euclidean",
        memory=None,
        connectivity=None,
        compute_full_tree="auto",
        linkage="ward",
        pooling_func=np.mean,
        distance_threshold=None,
        compute_distances=False,
    ):
        super().__init__(
            n_clusters=n_clusters,
            memory=memory,
            connectivity=connectivity,
            compute_full_tree=compute_full_tree,
            linkage=linkage,
            metric=metric,
            distance_threshold=distance_threshold,
            compute_distances=compute_distances,
        )
        self.pooling_func = pooling_func

    @_fit_context(prefer_skip_nested_validation=True)
    def fit(self, X, y=None):
        """Fit the hierarchical clustering on the data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The data.

        y : Ignored
            Not used, present here for API consistency by convention.

        Returns
        -------
        self : object
            Returns the transformer.
        """
        X = validate_data(self, X, ensure_min_features=2)
        # Features are clustered, so the tree is built on the transpose.
        super()._fit(X.T)
        self._n_features_out = self.n_clusters_
        return self

    @property
    def fit_predict(self):
        """Fit and return the result of each sample's clustering assignment."""
        raise AttributeError
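

# Illustrative sketch (hypothetical helper, not part of the scikit-learn
# API): flat labels can also be read off a ``children_`` array by replaying
# only the first ``n_leaves - n_clusters`` merges with a union-find
# structure. Undoing the last ``n_clusters - 1`` merges leaves the same
# partition that ``_hc_cut`` extracts, but the label numbering may differ
# (here labels follow order of first appearance). This sketch assumes
# ``children`` is listed in merge order, as documented for ``children_``.


def _example_cut_by_replaying_merges(children, n_leaves, n_clusters):
    """Return flat cluster labels, numbered by first appearance."""
    n_nodes = 2 * n_leaves - 1
    parent = list(range(n_nodes))  # union-find parent pointers

    def find(node):
        # Walk to the root, halving the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge number m creates node n_leaves + m from its two children.
    for m in range(n_leaves - n_clusters):
        left, right = children[m]
        new_node = n_leaves + m
        parent[find(int(left))] = new_node
        parent[find(int(right))] = new_node

    # Number the roots 0, 1, 2, ... in order of first appearance.
    root_to_label = {}
    labels = []
    for leaf in range(n_leaves):
        root = find(leaf)
        if root not in root_to_label:
            root_to_label[root] = len(root_to_label)
        labels.append(root_to_label[root])
    return labels


# Usage with the tree from the ``ward_tree`` docstring example:
#   _example_cut_by_replaying_merges(
#       [[0, 1], [3, 5], [2, 6], [4, 7], [8, 9]], n_leaves=6, n_clusters=2
#   )
# groups samples {0, 1, 2} and {3, 4, 5}, the same partition as
# AgglomerativeClustering().fit(X).labels_ in the class docstring.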