"""Notations in this Gaussian process implementation

X_train: Observed parameter values with the shape of (len(trials), len(params)).
y_train: Observed objective values with the shape of (len(trials), ).
x: (Possibly batched) parameter value(s) to evaluate with the shape of (..., len(params)).
cov_fX_fX: Kernel matrix of X = V[f(X)] with the shape of (len(trials), len(trials)).
cov_fx_fX: Kernel matrix Cov[f(x), f(X)] with the shape of (..., len(trials)).
cov_fx_fx: Kernel scalar value of x = V[f(x)]. This value is constant for the Matern 5/2 kernel.
cov_Y_Y_inv: The inverse of the covariance matrix (V[f(X)] + noise_var * I)^-1 with the shape of
    (len(trials), len(trials)).
cov_Y_Y_inv_Y: `cov_Y_Y_inv @ y` with the shape of (len(trials), ).
max_Y: The maximum of Y (Note that we transform the objective values such that it is maximized.)
sqd: The squared differences of each dimension between two points.
is_categorical: A boolean array with the shape of (len(params), ). If is_categorical[i] is True,
    the i-th parameter is categorical.
"""

from __future__ import annotations

import math
from typing import Any
from typing import TYPE_CHECKING
import warnings

import numpy as np

from optuna._gp.scipy_blas_thread_patch import single_blas_thread_if_scipy_v1_15_or_newer
from optuna.logging import get_logger


if TYPE_CHECKING:
    from collections.abc import Callable

    import scipy
    import torch
else:
    from optuna._imports import _LazyImport

    scipy = _LazyImport("scipy")
    torch = _LazyImport("torch")


logger = get_logger(__name__)


def warn_and_convert_inf(values: np.ndarray) -> np.ndarray:
    is_values_finite = np.isfinite(values)
    if np.all(is_values_finite):
        return values

    warnings.warn("Clip non-finite values to the min/max finite values for GP fittings.")
    is_any_finite = is_values_finite.any(axis=0)
    return np.clip(
        values,
        np.where(is_any_finite, np.min(np.where(is_values_finite, values, np.inf), axis=0), 0.0),
        np.where(is_any_finite, np.max(np.where(is_values_finite, values, -np.inf), axis=0), 0.0),
    )


class Matern52Kernel(torch.autograd.Function):
    @staticmethod
    def forward(ctx: Any, squared_distance: torch.Tensor) -> torch.Tensor:
        """
        This method calculates `exp(-sqrt5d) * (1/3 * sqrt5d ** 2 + sqrt5d + 1)`
        where `sqrt5d = sqrt(5 * squared_distance)`.
        Please note that automatic differentiation by PyTorch does not work well
        at `squared_distance = 0` due to zero division, so we manually save the
        derivative, i.e., `-5/6 * (1 + sqrt5d) * exp(-sqrt5d)`, for the exact
        derivative calculation.
        Notice that the derivative of this function is taken w.r.t. d**2, but not w.r.t. d.
        """
        sqrt5d = torch.sqrt(5 * squared_distance)
        exp_part = torch.exp(-sqrt5d)
        val = exp_part * (sqrt5d * sqrt5d / 3 + sqrt5d + 1)
        deriv = -5 / 6 * (1 + sqrt5d) * exp_part
        ctx.save_for_backward(deriv)
        return val

    @staticmethod
    def backward(ctx: Any, grad: torch.Tensor) -> torch.Tensor:
        """
        Let x be squared_distance, f(x) be forward(ctx, x), and g(f) be a provided function,
        then deriv := df/dx, grad := dg/df, and deriv * grad = df/dx * dg/df = dg/dx.
        """
        (deriv,) = ctx.saved_tensors
        return deriv * grad
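
# The hand-written derivative above can be sanity-checked against finite
# differences with `torch.autograd.gradcheck`. This is an illustrative snippet,
# not part of the module; the 0.1 offset keeps the check away from the
# non-smooth point at squared_distance = 0:
#
#     d2 = (torch.rand(4, dtype=torch.float64) + 0.1).requires_grad_()
#     assert torch.autograd.gradcheck(Matern52Kernel.apply, (d2,))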
d&:&:!: ;3737,H)("rc~dtj|jjj z S)Ng?)rr%rIdetachnumpy)rLs r length_scaleszGPRegressor.length_scalesvs.RWWT>>EEGMMOPPPrc|j |jJdtj5|j j j }dddtj|jjdxx|jjz cc<tjj|}tjj!|j"tjj!||j$j dd}tj&||_tj&||_|j(j |_d|j(_|j,j |_d|j,_|jj |_ d|j_y#1swYxYw)Nz(Cannot call cache_matrix more than once.rT)lowerF)rGrHr no_gradkernelrRrSr diag_indicesr@shaperKitemlinalgcholeskyr solve_triangularTrA from_numpyrIr1rJ)rLcov_Y_Y cov_Y_Y_chol cov_Y_Y_inv_Ys r _cache_matrixzGPRegressor._cache_matrixzs    &4+>+>+F 6 5 6 F ]]_ 5kkm**,224G 5  3 3A 678DNN.r)rDr@ndimrBrCr?rrEr rFmatmulrIr!applyrJ)rLX1X2sqdsqdists rrXzGPRegressor.kernels :: :&&Cz]] ggl27 R0@2<##F+d.?.???rc Z|j |jJd|jdk(}|s|n|jd}tj j |j|x}|j}tj j|jtj j|jj|dddd}|rb|rJd|j||}||j|jdd z } | jd d jd n@|j}|tj j ||z } | jd |r"|jd| jdfS|| fS) a) This method computes the posterior mean and variance given the points `x` where both mean and variance tensors will have the shape of x.shape[:-1]. If ``joint=True``, the joint posterior will be computed. The posterior mean and variance are computed as: mean = cov_fx_fX @ inv(cov_fX_fX + noise_var * I) @ y, and var = cov_fx_fx - cov_fx_fX @ inv(cov_fX_fX + noise_var * I) @ cov_fx_fX.T. Please note that we clamp the variance to avoid negative values due to numerical errors. z+Call cache_matrix before calling posterior.r$rTF)upperleftz3Call posterior with joint=False for a single point.r=)dim1dim2r)rGrHrfrBr r\vecdotrXr^r_rg transposediagonal clamp_min_rJsqueeze) rLxjointis_single_pointx_ cov_fx_fXmeanV cov_fx_fxvar_s r posteriorzGPRegressor.posteriors{    *t/B/B/N 9 8 9 N&&A+%Q1;;q>||"" B#?9ATATU LL ) )    LL ) )$*<*<*>*> QU\a ) b *  & ](] ]& B+Iqxx (;(;B(CDDD MMrM + 6 6s ;))Iu||229a@@D OOC 5D Qa1V4QU,Vrc>|jjd}d|ztjdtjzz}|j |j tj|tjzz}tjj|}|jjj }tjj||jdddfddddf}d||zz}||z|zS)a This method computes the marginal log-likelihood of the kernel hyperparameters given the training dataset (X, y). Assume that N = len(X) in this method. Mathematically, the closed form is given as: -0.5 * log((2*pi)**N * det(C)) - 0.5 * y.T @ inv(C) @ y = -0.5 * log(det(C)) - 0.5 * y.T @ inv(C) @ y + const, where C = cov_Y_Y = cov_fX_fX + noise_var * I and inv(...) is the inverse operator. We exploit the full advantages of the Cholesky decomposition (C = L @ L.T) in this method: 1. The determinant of a lower triangular matrix is the diagonal product, which can be computed with N flops where log(det(C)) = log(det(L.T @ L)) = 2 * log(det(L)). 2. Solving linear system L @ u = y, which yields u = inv(L) @ y, costs N**2 flops. Note that given `u = inv(L) @ y` and `inv(C) = inv(L @ L.T) = inv(L).T @ inv(L)`, y.T @ inv(C) @ y is calculated as (inv(L) @ y) @ (inv(L) @ y). In principle, we could invert the matrix C first, but in this case, it costs: 1. 1/3*N**3 flops for the determinant of inv(C). 2. 2*N**2-N flops to solve C @ alpha = y, which is alpha = inv(C) @ y. Since the Cholesky decomposition costs 1/3*N**3 flops and the matrix inversion costs 2/3*N**3 flops, the overall cost for the former is 1/3*N**3+N**2+N flops and that for the latter is N**3+2*N**2-N flops. rgdtypeNF)rn)r@rZmathlogpirXrKr eyerFr\r]rusumr^rA)rLn_pointsconstraL logdet_partinv_L_y quad_parts rmarginal_log_likelihoodz#GPRegressor.marginal_log_likelihoods4==&&q)x$((1tww;"77++-$..599XU]]3["[[ LL ! !' *zz|'')--// ,,//4==D3IQV/WXY[\X\]Gg-. U"Y..rc  jjd tjtjj j jtjjjtjjjdzz gg}d  fd }t5tjj||ddd|i}dddjst!d|j"t%j&|j(}t%j*|d _t%j*| _r%t%j,t$j. 


def fit_kernel_params(
    X: np.ndarray,
    Y: np.ndarray,
    is_categorical: np.ndarray,
    log_prior: Callable[[GPRegressor], torch.Tensor],
    minimum_noise: float,
    deterministic_objective: bool,
    gpr_cache: GPRegressor | None = None,
    gtol: float = 1e-2,
) -> GPRegressor:
    default_kernel_params = torch.ones(X.shape[1] + 2, dtype=torch.float64)

    def _default_gpr() -> GPRegressor:
        return GPRegressor(
            is_categorical=torch.from_numpy(is_categorical),
            X_train=torch.from_numpy(X),
            y_train=torch.from_numpy(Y),
            inverse_squared_lengthscales=default_kernel_params[:-2].clone(),
            kernel_scale=default_kernel_params[-2].clone(),
            noise_var=default_kernel_params[-1].clone(),
        )

    default_gpr_cache = _default_gpr()
    gpr_cache = gpr_cache if gpr_cache is not None else default_gpr_cache
    error = None
    # Try the cached kernel parameters first and fall back to the default
    # initialization if the optimization fails.
    for gpr_cache_to_use in [gpr_cache, default_gpr_cache]:
        try:
            gpr = GPRegressor(
                is_categorical=torch.from_numpy(is_categorical),
                X_train=torch.from_numpy(X),
                y_train=torch.from_numpy(Y),
                inverse_squared_lengthscales=gpr_cache_to_use.inverse_squared_lengthscales.clone(),
                kernel_scale=gpr_cache_to_use.kernel_scale.clone(),
                noise_var=gpr_cache_to_use.noise_var.clone(),
            )
            return gpr._fit_kernel_params(
                log_prior=log_prior,
                minimum_noise=minimum_noise,
                deterministic_objective=deterministic_objective,
                gtol=gtol,
            )
        except RuntimeError as e:
            error = e

    logger.warning(
        f"The optimization of kernel parameters failed: \n{error}\n"
        "The default initial kernel parameters will be used instead."
    )
    default_gpr = _default_gpr()
    default_gpr._cache_matrix()
    return default_gpr
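
# A minimal end-to-end usage sketch, assuming made-up data and a flat log-prior
# (optuna normally supplies its own prior); illustrative only, not part of the
# public API of this module.
if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = rng.rand(10, 2)  # 10 trials, 2 numerical parameters scaled to [0, 1].
    Y = np.sin(3 * X[:, 0]) + 0.1 * rng.rand(10)
    gpr = fit_kernel_params(
        X=X,
        Y=Y,
        is_categorical=np.zeros(2, dtype=bool),
        log_prior=lambda g: torch.tensor(0.0, dtype=torch.float64),  # flat prior (assumption)
        minimum_noise=1e-6,
        deterministic_objective=False,
    )
    print("length scales:", gpr.length_scales)
    mean, var = gpr.posterior(torch.from_numpy(rng.rand(2)))
    print("posterior mean/variance:", mean.item(), var.item())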