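try:
    import numpy
except ImportError:
    pass

from nltk.cluster.util import Dendrogram, VectorSpaceClusterer, cosine_distance


class GAAClusterer(VectorSpaceClusterer):
    """
    The Group Average Agglomerative clusterer starts with each of the N
    vectors as singleton clusters. It then iteratively merges pairs of
    clusters which have the closest centroids. This continues until there
    is only one cluster. The order of merges gives rise to a dendrogram:
    a tree with the earlier merges lower than later merges. The membership
    of a given number of clusters c, 1 <= c <= N, can be found by cutting
    the dendrogram at depth c.

    This clusterer uses the cosine similarity metric only, which allows for
    efficient speed-up in the clustering process.
    """

    def __init__(self, num_clusters=1, normalise=True, svd_dimensions=None):
        VectorSpaceClusterer.__init__(self, normalise, svd_dimensions)
        self._num_clusters = num_clusters
        self._dendrogram = None
        self._groups_values = None

    def cluster(self, vectors, assign_clusters=False, trace=False):
        # The dendrogram records the order in which clusters are merged.
        self._dendrogram = Dendrogram(
            [numpy.array(vector, numpy.float64) for vector in vectors]
        )
        return VectorSpaceClusterer.cluster(self, vectors, assign_clusters, trace)

    def cluster_vectorspace(self, vectors, trace=False):
        # Initially every vector is its own cluster.
        N = len(vectors)
        cluster_len = [1] * N
        cluster_count = N
        index_map = numpy.arange(N)

        # Build the upper-triangular pairwise distance matrix; unused
        # entries stay at infinity so argmin never selects them.
        dims = (N, N)
        dist = numpy.ones(dims, dtype=float) * numpy.inf
        for i in range(N):
            for j in range(i + 1, N):
                dist[i, j] = cosine_distance(vectors[i], vectors[j])

        while cluster_count > max(self._num_clusters, 1):
            i, j = numpy.unravel_index(dist.argmin(), dims)
            if trace:
                print("merging %d and %d" % (i, j))

            # Update the distances for the merged cluster, stored at i.
            self._merge_similarities(dist, cluster_len, i, j)

            # Remove j from further consideration.
            dist[:, j] = numpy.inf
            dist[j, :] = numpy.inf

            # Merge the clusters.
            cluster_len[i] = cluster_len[i] + cluster_len[j]
            self._dendrogram.merge(index_map[i], index_map[j])
            cluster_count -= 1

            # Update the index map as if j had been removed.
            index_map[j + 1 :] -= 1
            index_map[j] = N

        self.update_clusters(self._num_clusters)

    def _merge_similarities(self, dist, cluster_len, i, j):
        # The merged cluster adopts the average of i's and j's distance to
        # every other cluster, weighted by the sizes of i and j.
        i_weight = cluster_len[i]
        j_weight = cluster_len[j]
        weight_sum = i_weight + j_weight

        # Update for x < i.
        dist[:i, i] = dist[:i, i] * i_weight + dist[:i, j] * j_weight
        dist[:i, i] /= weight_sum
        # Update for i < x < j.
        dist[i, i + 1 : j] = (
            dist[i, i + 1 : j] * i_weight + dist[i + 1 : j, j] * j_weight
        )
        # Update for i < j < x.
        dist[i, j + 1 :] = dist[i, j + 1 :] * i_weight + dist[j, j + 1 :] * j_weight
        dist[i, i + 1 :] /= weight_sum

    def update_clusters(self, num_clusters):
        clusters = self._dendrogram.groups(num_clusters)
        self._centroids = []
        for cluster in clusters:
            assert len(cluster) > 0
            if self._should_normalise:
                centroid = self._normalise(cluster[0])
            else:
                centroid = numpy.array(cluster[0])
            for vector in cluster[1:]:
                if self._should_normalise:
                    centroid += self._normalise(vector)
                else:
                    centroid += vector
            centroid /= len(cluster)
            self._centroids.append(centroid)
        self._num_clusters = len(self._centroids)

    def classify_vectorspace(self, vector):
        # Return the index of the cluster whose centroid is closest.
        best = None
        for i in range(self._num_clusters):
            centroid = self._centroids[i]
            dist = cosine_distance(vector, centroid)
            if not best or dist < best[0]:
                best = (dist, i)
        return best[1]

    def dendrogram(self):
        """
        :return: The dendrogram representing the current clustering
        :rtype:  Dendrogram
        """
        return self._dendrogram

    def num_clusters(self):
        return self._num_clusters

    def __repr__(self):
        return "<GroupAverageAgglomerative Clusterer n=%d>" % self._num_clusters


def demo():
    """
    Non-interactive demonstration of the clusterers with simple 2-D data.
    """

    from nltk.cluster import GAAClusterer

    # The numeric literals in this function did not survive in the dump;
    # the values below follow the upstream NLTK demo.
    vectors = [numpy.array(f) for f in [[3, 3], [1, 2], [4, 2], [4, 0], [2, 3], [3, 1]]]

    # Cluster the vectors and assign each one to a cluster.
    clusterer = GAAClusterer(4)
    clusters = clusterer.cluster(vectors, True)

    print("Clusterer:", clusterer)
    print("Clustered:", vectors)
    print("As:", clusters)
    print()

    # Show the dendrogram.
    clusterer.dendrogram().show()

    # Classify a new vector.
    vector = numpy.array([3, 3])
    print("classify(%s):" % vector, end=" ")
    print(clusterer.classify(vector))
    print()


if __name__ == "__main__":
    demo()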