Package translate :: Package storage :: Module html :: Class htmlfile

Class htmlfile

markupbase.ParserBase --+    
                        |    
    HTMLParser.HTMLParser --+
                            |
               object --+   |
                        |   |
    base.TranslationStore --+
                            |
                           htmlfile
Known Subclasses:

Nested Classes
  UnitClass
A unit of translatable/localisable HTML content
Instance Methods
 
__init__(self, includeuntaggeddata=None, inputfile=None)
Initialize and reset this instance.
 
guess_encoding(self, htmlsrc)
Returns the encoding of the html text.
 
do_encoding(self, htmlsrc)
Return the html text properly encoded based on a charset.
 
phprep(self, text)
Replaces all instances of PHP with placeholder tags, and returns the new text and a dictionary of tags.
 
reintrophp(self, text)
Replaces the PHP placeholders in text with the real code
 
parse(self, htmlsrc)
Parse the given source string.
 
addhtmlblock(self, text)
 
strip_html(self, text)
Strip unnecessary html from the text.
 
has_translatable_content(self, text)
Check if the supplied HTML snippet has any content that needs to be translated.
 
startblock(self, tag)
 
endblock(self)
 
handle_starttag(self, tag, attrs)
 
handle_startendtag(self, tag, attrs)
 
handle_endtag(self, tag)
 
handle_data(self, data)
 
handle_charref(self, name)
 
handle_entityref(self, name)
 
handle_comment(self, data)
 
handle_pi(self, data)

Inherited from HTMLParser.HTMLParser: check_for_whole_start_tag, clear_cdata_mode, close, error, feed, get_starttag_text, goahead, handle_decl, parse_endtag, parse_pi, parse_starttag, reset, set_cdata_mode, unescape, unknown_decl

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name

Inherited from base.TranslationStore: __str__, addsourceunit, addunit, findunit, getunits, isempty, makeindex, save, savefile, setsourcelanguage, settargetlanguage, translate, unit_iter

Inherited from base.TranslationStore (private): _assignname

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__

Class Methods

Inherited from base.TranslationStore: parsefile, parsestring

Class Variables
  markingtags = ['p', 'title', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6...
  markingattrs = []
  includeattrs = ['alt', 'summary', 'standby', 'abbr', 'content']

Inherited from HTMLParser.HTMLParser: CDATA_CONTENT_ELEMENTS

Inherited from markupbase.ParserBase (private): _decl_otherchars

Inherited from base.TranslationStore: Extensions, Mimetypes

Properties

Inherited from object: __class__

Method Details

__init__(self, includeuntaggeddata=None, inputfile=None)
(Constructor)

Initialize and reset this instance.

Overrides: object.__init__
(inherited documentation)

guess_encoding(self, htmlsrc)

Returns the encoding of the html text.

We look for 'charset=' within a meta tag to do this.
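
The lookup described above can be illustrated with a minimal sketch. This is not the class's actual implementation: the pattern name META_CHARSET and the function guess_encoding_sketch are hypothetical, and the sketch assumes the HTML source is already a text string.

```python
import re

# Hypothetical pattern: a "charset=" value appearing inside a <meta ...> tag.
META_CHARSET = re.compile(
    r'<meta\b[^>]*\bcharset\s*=\s*["\']?([\w.:-]+)',
    re.IGNORECASE,
)


def guess_encoding_sketch(htmlsrc):
    """Return the charset named in a meta tag, or None if none is declared."""
    match = META_CHARSET.search(htmlsrc)
    return match.group(1) if match else None


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
declared = guess_encoding_sketch(page)
```

Returning None when no declaration is found is a choice made for this sketch; the fallback behaviour of the real method is not documented here.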

phprep(self, text)

Replaces all instances of PHP with placeholder tags, and returns
the new text and a dictionary of tags.  The current implementation
replaces <?foo?> with <?md5(foo)?>.  The hash => code conversions
are stored in self.phpdict for later use in restoring the real PHP.

The purpose of this is to remove all potential "tag-like" code from
inside PHP.  The hash looks nothing like an HTML tag, but the following
PHP:
  $a < $b ? $c : ($d > $e ? $f : $g)
looks like it contains an HTML tag:
  < $b ? $c : ($d >
to nearly any regex.  Hence, we replace all contents of PHP with simple
strings to help our regexes out.
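
A rough sketch of the placeholder round trip described above, under stated assumptions: the names phprep_sketch, reintrophp_sketch and PHP_BLOCK are hypothetical, the mapping is returned rather than stored on self.phpdict, and the regular expression for finding PHP blocks is a guess rather than the one the module uses.

```python
import hashlib
import re

# Hypothetical pattern: everything between "<?" and the next "?>".
PHP_BLOCK = re.compile(r'<\?(.*?)\?>', re.DOTALL)


def phprep_sketch(text):
    """Replace each <?code?> with <?md5(code)?>; return (new_text, phpdict)."""
    phpdict = {}

    def _placeholder(match):
        code = match.group(1)
        digest = hashlib.md5(code.encode('utf-8')).hexdigest()
        phpdict[digest] = code  # hash => code, kept for later restoration
        return '<?' + digest + '?>'

    return PHP_BLOCK.sub(_placeholder, text), phpdict


def reintrophp_sketch(text, phpdict):
    """Swap each hash placeholder back for the PHP code it stands for."""
    for digest, code in phpdict.items():
        text = text.replace('<?' + digest + '?>', '<?' + code + '?>')
    return text


src = '<p><?php echo $a < $b ? $c : ($d > $e ? $f : $g); ?></p>'
replaced, mapping = phprep_sketch(src)
restored = reintrophp_sketch(replaced, mapping)
```

After the replacement step the text holds only a hexadecimal digest between the PHP delimiters, so the comparison operators inside the PHP can no longer be mistaken for tag boundaries.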

parse(self, htmlsrc)

Parse the given source string.

Overrides: base.TranslationStore.parse
(inherited documentation)

strip_html(self, text)

Strip unnecessary html from the text.

An HTML tag is deemed unnecessary if it fully encloses the translatable text, e.g. '<a href="index.html">Home Page</a>'.

HTML tags that occur within the normal flow of text are not removed, e.g. 'This is a link to the <a href="index.html">Home Page</a>.'
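
The rule can be sketched as follows. This is an illustration only, not the method's actual code: strip_html_sketch and ENCLOSING are hypothetical names, and stripping nested enclosing tags repeatedly is an assumption of the sketch.

```python
import re

# Hypothetical pattern: the text starts with an opening tag and ends with
# the closing tag of the same name.
ENCLOSING = re.compile(r'^<(\w+)\b[^>]*>(.*)</\1>$', re.DOTALL | re.IGNORECASE)


def strip_html_sketch(text):
    """Remove tags that fully enclose the text; leave inline tags alone."""
    text = text.strip()
    while True:
        match = ENCLOSING.match(text)
        if not match:
            return text
        tag, inner = match.group(1), match.group(2)
        # If the same tag reappears inside, the first and last tags may not
        # be a pair (e.g. '<a>x</a> and <a>y</a>'), so nothing is stripped.
        if re.search(r'</?%s\b' % re.escape(tag), inner, re.IGNORECASE):
            return text
        text = inner.strip()


enclosed = strip_html_sketch('<a href="index.html">Home Page</a>')
inline = strip_html_sketch(
    'This is a link to the <a href="index.html">Home Page</a>.')
```

Here enclosed loses its wrapping link, while inline is returned unchanged because the link sits inside a sentence.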

handle_starttag(self, tag, attrs)

Overrides: HTMLParser.HTMLParser.handle_starttag

handle_startendtag(self, tag, attrs)

Overrides: HTMLParser.HTMLParser.handle_startendtag

handle_endtag(self, tag)

Overrides: HTMLParser.HTMLParser.handle_endtag

handle_data(self, data)

Overrides: HTMLParser.HTMLParser.handle_data

handle_charref(self, name)

Overrides: HTMLParser.HTMLParser.handle_charref

handle_entityref(self, name)

Overrides: HTMLParser.HTMLParser.handle_entityref

handle_comment(self, data)

Overrides: HTMLParser.HTMLParser.handle_comment

handle_pi(self, data)

Overrides: HTMLParser.HTMLParser.handle_pi

Class Variable Details

markingtags

Value:
['p',
 'title',
 'h1',
 'h2',
 'h3',
 'h4',
 'h5',
 'h6',
...