[L iHUddlmZdZddgZddlmZmZmZmZm Z m Z m Z m Z m Z mZddlmZddlmZddlmZdd lmZdd lmZmZmZmZmZmZmZdd lmZm Z m!Z!m"Z"m#Z#m$Z$m%Z%dd l&m'Z'dd l(m)Z)e rddl*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1ddl2m3Z3dZ4de5d<ddZ6eejnejpfZ9de5d<ee9e ejne ejpfZ:de5d<Gdde$Z;Gdde"e;Zz_invert..Fs341aA3s)dictlistitems)ds r7_invertr=Ds 34 ?3 33r _LXMLParser_ParserOrParserClassceZdZUejZded<dZded<ded<dZd ed <d gZ d ed <ee e e e gZd ed<dZded<edZded<eeZded<ded<ded<ded<ded<d1fd Zd2d!Zd3d"Zd4d#Z d5 d6fd$ Zd7d%Z d8 d9d&Zd:d'Zd;d(Zif dd+Zd?d,Z d@d-Z!dAd.Z"dBd/Z#dCd0Z$xZ%S)DrzType[etree.XMLParser]DEFAULT_PARSER_CLASSTboolis_xmlzType[ProcessingInstruction]processing_instruction_classzlxml-xmlr.NAMExml Iterable[str]ALTERNATE_NAMESfeaturesiint CHUNK_SIZEz$http://www.w3.org/XML/1998/namespace)rGr)DEFAULT_NSMAPSr*DEFAULT_NSMAPS_INVERTEDz)List[Optional[_InvertedNamespaceMapping]]nsmapsOptional[Set[str]]empty_element_tagsrparserOptional[etree.XMLParser]_default_parsercbtt| ||j|jy)zLet the BeautifulSoup object know about the standard namespace mapping. :param soup: A `BeautifulSoup`. N)superrinitialize_soup_register_namespacesrM)selfsoup __class__s r7rWz%LXMLTreeBuilderForXML.initialize_soupps) #T:4@ !!$"5"56r>c|jJt|jD]:\}}|s ||jjvs"||jj|<<y)aLet the BeautifulSoup object know about namespaces encountered while parsing the document. This might be useful later on when creating CSS selectors. This will track (almost) all namespaces, even ones that were only in scope for part of the document. If two namespaces have the same prefix, only the first one encountered will be tracked. Un-prefixed namespaces are not tracked. :param mapping: A dictionary mapping namespace prefixes to URIs. N)rZr:r; _namespaces)rYmappingkeyvalues r7rXz*LXMLTreeBuilderForXML._register_namespaces{s^yy$$$w}}/ 3JC s$))"7"77.3 %%c* 3r>cZ|j |jS|j|d|S)zFind the default parser for the given encoding. :return: Either a parser object or a class, which will be instantiated with default arguments. Ttargetrecoverencoding)rTrBrYres r7default_parserz$LXMLTreeBuilderForXML.default_parsers4    +'' '((dX(VVr>cT|j|}t|r ||d|}|S)zInstantiate an appropriate parser for the given encoding. :param encoding: A string. :return: A parser object such as an `etree.XMLParser`. Trb)rgcallable)rYrerRs r7 parser_forz LXMLTreeBuilderForXML.parser_fors/$$X. F 4IF r>c ||_d|_|jg|_t |j g|_|jr t|_ n t|_ d|vr t|d<tt|:di|y)Nattribute_dict_classr3)rTrZrNrOr9rMactive_namespace_prefixesrDrrErrrVr__init__)rYrRrQkwargsr[s r7rnzLXMLTreeBuilderForXML.__init__sw & 334 *.t/B/B*C)D& ;;0HD -0ED - ! /-=F) * #T3=f=r>cZ|ddk(r d|vr|ddjdd\}}||fSd|fS)Nr{})split)rYtag namespacenames r7 _getNsTagzLXMLTreeBuilderForXML._getNsTagsD q6S=SCZ!!"gmmC3OItt$ $c{r>c#K|jstj|dt|tr#t |dkDr |ddk(r|dd}|d|dft|tr|j dd|dfyg}|r|j|g}|r|j|t||||j | }|jD]}|j||dfyw) aBRun any preliminary steps necessary to make incoming markup acceptable to the parser. lxml really wants to get a bytestring and convert it to Unicode itself. So instead of using UnicodeDammit to convert the bytestring to Unicode using different encodings, this implementation uses EncodingDetector to iterate over the encodings, and tell lxml to try to parse the document as each one in turn. :param markup: Some markup -- hopefully a bytestring. :param user_specified_encoding: The user asked to try this encoding. :param document_declared_encoding: The markup itself claims to be in this encoding. :param exclude_encodings: The user asked _not_ to try any of these encodings. :yield: A series of 4-tuples: (markup, encoding, declared encoding, has undergone character replacement) Each 4-tuple represents a strategy for converting the document to Unicode and parsing it. Each strategy will be tried in turn. ) stacklevelrursNFutf8)known_definite_encodingsuser_encodingsis_htmlexclude_encodings) rDrwarn_if_markup_looks_like_xml isinstancer.lenencodeappendr# encodingsmarkup) rYruser_specified_encodingdocument_declared_encodingrr}r~detectorres r7prepare_markupz$LXMLTreeBuilderForXML.prepare_markups B{{ # @ @TU V fc "6{Q6!90E#E$ :EA A fc "==(&2LeT T 46 " % + +,C D*, %  ! !"< =# %=) O/  !** QH??H.H%P P QsC(C*ct|tr t|}nt|tr t |}|j Jj |j} |j|j j|_ |jj|t|dk7rS|j |j}t|dk7r|jj|t|dk7rS|jjy#ttt j"f$r}t%|d}~wwxYw)Nr)rbytesrr.rrZreadrLrjoriginal_encodingrRfeedrcloseUnicodeDecodeError LookupErrorr ParserErrorr$)rYriodataes r7rzLXMLTreeBuilderForXML.feeds fe $B  $&!Byy$$$wwt' *//$))*E*EFDK KK  T "d)q.wwt/t9>KK$$T* d)q. KK   "K1B1BC *&q) ) *s"B%D#D##E E  Ec(|jg|_yr2)rNrO)rYs r7rzLXMLTreeBuilderForXML.close7s334 r>cx|jJt|tsJ|j}|j D].\}}t|tsJt|tsJ|||<0d}d}t |dk(r4t |j dkDr|j jdnt |dkDr|j||j jt|t|jd} | j|d| vr| d=|jj| t|j D]\} }td| d} ||| <|j} t|j D]D\} }|j| \}} ||| | <"|j!|}t|| |} || | <F|j|\}}|j!|}|jj#|||| |jdy)Nrrsxmlnszhttp://www.w3.org/2000/xmlns/) namespaces)rZrr.rlr;rrOrrXr=r9rmupdater:rrx_prefix_for_namespacehandle_starttag)rYruattribnsmap new_attribr5r6nsprefixrvcurrent_mappingprefix attribute final_attribattrr`s r7startzLXMLTreeBuilderForXML.start:sCyy$$$#s###  % % ' LLN DAqa% %%a% %%JqM  04-1 u:?s4;;/!3 KK  t $ Z!^  % %e , KK  wu~ . #4#A#A"#EFO  " "5 ) _$#B'  * * 1 1/ B&*%++-%8 2! /V%D )2 9%  2'+&?&?&A  0 0 23 +KD%"nnT2OIt %* T"55i@*8T9E%* T" +, 3--i8 !!    55b9 " r>cZ|yt|jD]}|||vs ||cSy)z9Find the currently active prefix for the given namespace.N)reversedrO)rYrvinverted_nsmaps r7rz+LXMLTreeBuilderForXML._prefix_for_namespacesB  &t{{3 1N)i>.I%i00 1r>c|jJt|tsJ|jj|j |\}}d}|(t |j D]}|||vs ||}n|jj||t|j dkDr8|j j}||jjyyy)Nrs) rZrr.endDatarxrrO handle_endtagrpoprm)rYrurvrrout_of_scope_nsmaps r7endzLXMLTreeBuilderForXML.endsyy$$$#s### , 3  "*4;;"7 !-)~2M-i8H  X. t{{ a "&!2 !-..224 . r>c|jJ|jj|dz|z}|jj||jj|jy)N )rZr handle_datarE)rYrcrs r7pizLXMLTreeBuilderForXML.pisZyy$$$ |d" d# $;;cz|jJt|tsJ|jj|yr2)rZrr.r)rYrs r7rzLXMLTreeBuilderForXML.datas4yy$$$$$$$ d#r>c|jJ|jjtj|||}|jj ||jjty)N)containerClass)rZrr_string_for_name_and_idsr)rYrwpubidsystemdoctype_strings r7doctypezLXMLTreeBuilderForXML.doctypes]yy$$$  99$vN n- 1r>c|jJt|tsJ|jj|jj ||jjt y)z#Handle comments as Comment objects.N)rZrr.rrr)rYtexts r7commentzLXMLTreeBuilderForXML.commentsVyy$$$$$$$  d# '"r>c d|zS)See `TreeBuilder`.z) %sr3rYfragments r7test_fragment_to_documentz/LXMLTreeBuilderForXML.test_fragment_to_documents ;hFFr>)rZr,returnNone)r^zDict[str, str]rrreOptional[_Encoding]rr@)rerrr?)NN)rRrSrQrPror)rur.rzTuple[Optional[str], str])NNN) rr+rrrrrzOptional[_Encodings]rzRIterable[Tuple[Union[str, bytes], Optional[_Encoding], Optional[_Encoding], bool]]rr+rr)rr)ru str | bytesrzDict[str | bytes, str | bytes]rr)rr)rvzOptional[_NamespaceURL]rzOptional[_NamespacePrefix])rurrr)rcr.rr.rr)rrrr)rwr.rr.rr.rr)rrrrrr.rr.)&__name__ __module__ __qualname__r XMLParserrB__annotations__rDrFrIr/r"rr rJrLr9rMr=rNrWrXrgrjrnrxrrrrrrrrrrr __classcell__)r[s@r7rrOs27///AFD"==D#&+WO], $T3jAHmAJ)-1W(XN%X9@9P6P 55** K.. 732W  -115>)>/> >,8<:>26 QQQQ"5QQ%8 QQ 0 QQ   QQf*45$& W W /W ! W  W r 0  # 5.=$ 2#Gr>czeZdZUeZded<dgZded<eeeee e gzZ ded<dZ ded <dd Z dd Zdd Zy )rr.rFz lxml-htmlrHrIrJFrCrDc"tjSr2)r HTMLParserrfs r7rgzLXMLTreeBuilder.default_parsersr>cF|jJ|jj} |j||_|jj ||jj y#t ttjf$r}t|d}~wwxYwr2) rZrrjrRrrrrrrr$)rYrrers r7rzLXMLTreeBuilder.feedsyy$$$99.. *//(3DK KK  V $ KK   "K1B1BC *&q) ) *sA A22B  BB c d|zS)rz%sr3rs r7rz)LXMLTreeBuilder.test_fragment_to_documents -88r>Nrrr)rrrr/rFrrIr:rrr rJrDrgrrr3r>r7rrsND#&1]O]2"?3tT46TTHmTFD  *9r>N)r<dict[Any, Any]rr)= __future__r __license____all__typingrrr r r r r rrrrrrtyping_extensionsrr-r bs4.elementrrrrrrr bs4.builderrrrrr r!r" bs4.dammitr#bs4.exceptionsr$ bs4._typingr%r&r'r(r)r*r+bs4r,r/rr=rrr?r@rrr3r>r7rs"     '(/" c4 u0@0@@A YA"'eoo&U-=-=(>>#i CGKCGL 9o'<9r>