442 lines
15 KiB
Plaintext
442 lines
15 KiB
Plaintext
|
=head1 NAME
|
||
|
|
||
|
perlcompile - Introduction to the Perl Compiler-Translator
|
||
|
|
||
|
=head1 DESCRIPTION
|
||
|
|
||
|
Perl has always had a compiler: your source is compiled into an
|
||
|
internal form (a parse tree) which is then optimized before being
|
||
|
run. Since version 5.005, Perl has shipped with a module
|
||
|
capable of inspecting the optimized parse tree (C<B>), and this has
|
||
|
been used to write many useful utilities, including a module that lets
|
||
|
you turn your Perl into C source code that can be compiled into an
|
||
|
native executable.
|
||
|
|
||
|
The C<B> module provides access to the parse tree, and other modules
|
||
|
("back ends") do things with the tree. Some write it out as
|
||
|
bytecode, C source code, or a semi-human-readable text. Another
|
||
|
traverses the parse tree to build a cross-reference of which
|
||
|
subroutines, formats, and variables are used where. Another checks
|
||
|
your code for dubious constructs. Yet another back end dumps the
|
||
|
parse tree back out as Perl source, acting as a source code beautifier
|
||
|
or deobfuscator.
|
||
|
|
||
|
Because its original purpose was to be a way to produce C code
|
||
|
corresponding to a Perl program, and in turn a native executable, the
|
||
|
C<B> module and its associated back ends are known as "the
|
||
|
compiler", even though they don't really compile anything.
|
||
|
Different parts of the compiler are more accurately a "translator",
|
||
|
or an "inspector", but people want Perl to have a "compiler
|
||
|
option" not an "inspector gadget". What can you do?
|
||
|
|
||
|
This document covers the use of the Perl compiler: which modules
|
||
|
it comprises, how to use the most important of the back end modules,
|
||
|
what problems there are, and how to work around them.
|
||
|
|
||
|
=head2 Layout
|
||
|
|
||
|
The compiler back ends are in the C<B::> hierarchy, and the front-end
|
||
|
(the module that you, the user of the compiler, will sometimes
|
||
|
interact with) is the O module. Some back ends (e.g., C<B::C>) have
|
||
|
programs (e.g., I<perlcc>) to hide the modules' complexity.
|
||
|
|
||
|
Here are the important back ends to know about, with their status
|
||
|
expressed as a number from 0 (outline for later implementation) to
|
||
|
10 (if there's a bug in it, we're very surprised):
|
||
|
|
||
|
=over 4
|
||
|
|
||
|
=item B::Bytecode
|
||
|
|
||
|
Stores the parse tree in a machine-independent format, suitable
|
||
|
for later reloading through the ByteLoader module. Status: 5 (some
|
||
|
things work, some things don't, some things are untested).
|
||
|
|
||
|
=item B::C
|
||
|
|
||
|
Creates a C source file containing code to rebuild the parse tree
|
||
|
and resume the interpreter. Status: 6 (many things work adequately,
|
||
|
including programs using Tk).
|
||
|
|
||
|
=item B::CC
|
||
|
|
||
|
Creates a C source file corresponding to the run time code path in
|
||
|
the parse tree. This is the closest to a Perl-to-C translator there
|
||
|
is, but the code it generates is almost incomprehensible because it
|
||
|
translates the parse tree into a giant switch structure that
|
||
|
manipulates Perl structures. Eventual goal is to reduce (given
|
||
|
sufficient type information in the Perl program) some of the
|
||
|
Perl data structure manipulations into manipulations of C-level
|
||
|
ints, floats, etc. Status: 5 (some things work, including
|
||
|
uncomplicated Tk examples).
|
||
|
|
||
|
=item B::Lint
|
||
|
|
||
|
Complains if it finds dubious constructs in your source code. Status:
|
||
|
6 (it works adequately, but only has a very limited number of areas
|
||
|
that it checks).
|
||
|
|
||
|
=item B::Deparse
|
||
|
|
||
|
Recreates the Perl source, making an attempt to format it coherently.
|
||
|
Status: 8 (it works nicely, but a few obscure things are missing).
|
||
|
|
||
|
=item B::Xref
|
||
|
|
||
|
Reports on the declaration and use of subroutines and variables.
|
||
|
Status: 8 (it works nicely, but still has a few lingering bugs).
|
||
|
|
||
|
=back
|
||
|
|
||
|
=head1 Using The Back Ends
|
||
|
|
||
|
The following sections describe how to use the various compiler back
|
||
|
ends. They're presented roughly in order of maturity, so that the
|
||
|
most stable and proven back ends are described first, and the most
|
||
|
experimental and incomplete back ends are described last.
|
||
|
|
||
|
The O module automatically enabled the B<-c> flag to Perl, which
|
||
|
prevents Perl from executing your code once it has been compiled.
|
||
|
This is why all the back ends print:
|
||
|
|
||
|
myperlprogram syntax OK
|
||
|
|
||
|
before producing any other output.
|
||
|
|
||
|
=head2 The Cross Referencing Back End
|
||
|
|
||
|
The cross referencing back end (B::Xref) produces a report on your program,
|
||
|
breaking down declarations and uses of subroutines and variables (and
|
||
|
formats) by file and subroutine. For instance, here's part of the
|
||
|
report from the I<pod2man> program that comes with Perl:
|
||
|
|
||
|
Subroutine clear_noremap
|
||
|
Package (lexical)
|
||
|
$ready_to_print i1069, 1079
|
||
|
Package main
|
||
|
$& 1086
|
||
|
$. 1086
|
||
|
$0 1086
|
||
|
$1 1087
|
||
|
$2 1085, 1085
|
||
|
$3 1085, 1085
|
||
|
$ARGV 1086
|
||
|
%HTML_Escapes 1085, 1085
|
||
|
|
||
|
This shows the variables used in the subroutine C<clear_noremap>. The
|
||
|
variable C<$ready_to_print> is a my() (lexical) variable,
|
||
|
B<i>ntroduced (first declared with my()) on line 1069, and used on
|
||
|
line 1079. The variable C<$&> from the main package is used on 1086,
|
||
|
and so on.
|
||
|
|
||
|
A line number may be prefixed by a single letter:
|
||
|
|
||
|
=over 4
|
||
|
|
||
|
=item i
|
||
|
|
||
|
Lexical variable introduced (declared with my()) for the first time.
|
||
|
|
||
|
=item &
|
||
|
|
||
|
Subroutine or method call.
|
||
|
|
||
|
=item s
|
||
|
|
||
|
Subroutine defined.
|
||
|
|
||
|
=item r
|
||
|
|
||
|
Format defined.
|
||
|
|
||
|
=back
|
||
|
|
||
|
The most useful option the cross referencer has is to save the report
|
||
|
to a separate file. For instance, to save the report on
|
||
|
I<myperlprogram> to the file I<report>:
|
||
|
|
||
|
$ perl -MO=Xref,-oreport myperlprogram
|
||
|
|
||
|
=head2 The Decompiling Back End
|
||
|
|
||
|
The Deparse back end turns your Perl source back into Perl source. It
|
||
|
can reformat along the way, making it useful as a de-obfuscator. The
|
||
|
most basic way to use it is:
|
||
|
|
||
|
$ perl -MO=Deparse myperlprogram
|
||
|
|
||
|
You'll notice immediately that Perl has no idea of how to paragraph
|
||
|
your code. You'll have to separate chunks of code from each other
|
||
|
with newlines by hand. However, watch what it will do with
|
||
|
one-liners:
|
||
|
|
||
|
$ perl -MO=Deparse -e '$op=shift||die "usage: $0
|
||
|
code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op;
|
||
|
die$@ if$@; rename$was,$_ unless$was eq $_}'
|
||
|
-e syntax OK
|
||
|
$op = shift @ARGV || die("usage: $0 code [...]");
|
||
|
chomp(@ARGV = <ARGV>) unless @ARGV;
|
||
|
foreach $_ (@ARGV) {
|
||
|
$was = $_;
|
||
|
eval $op;
|
||
|
die $@ if $@;
|
||
|
rename $was, $_ unless $was eq $_;
|
||
|
}
|
||
|
|
||
|
The decompiler has several options for the code it generates. For
|
||
|
instance, you can set the size of each indent from 4 (as above) to
|
||
|
2 with:
|
||
|
|
||
|
$ perl -MO=Deparse,-si2 myperlprogram
|
||
|
|
||
|
The B<-p> option adds parentheses where normally they are omitted:
|
||
|
|
||
|
$ perl -MO=Deparse -e 'print "Hello, world\n"'
|
||
|
-e syntax OK
|
||
|
print "Hello, world\n";
|
||
|
$ perl -MO=Deparse,-p -e 'print "Hello, world\n"'
|
||
|
-e syntax OK
|
||
|
print("Hello, world\n");
|
||
|
|
||
|
See L<B::Deparse> for more information on the formatting options.
|
||
|
|
||
|
=head2 The Lint Back End
|
||
|
|
||
|
The lint back end (B::Lint) inspects programs for poor style. One
|
||
|
programmer's bad style is another programmer's useful tool, so options
|
||
|
let you select what is complained about.
|
||
|
|
||
|
To run the style checker across your source code:
|
||
|
|
||
|
$ perl -MO=Lint myperlprogram
|
||
|
|
||
|
To disable context checks and undefined subroutines:
|
||
|
|
||
|
$ perl -MO=Lint,-context,-undefined-subs myperlprogram
|
||
|
|
||
|
See L<B::Lint> for information on the options.
|
||
|
|
||
|
=head2 The Simple C Back End
|
||
|
|
||
|
This module saves the internal compiled state of your Perl program
|
||
|
to a C source file, which can be turned into a native executable
|
||
|
for that particular platform using a C compiler. The resulting
|
||
|
program links against the Perl interpreter library, so it
|
||
|
will not save you disk space (unless you build Perl with a shared
|
||
|
library) or program size. It may, however, save you startup time.
|
||
|
|
||
|
The C<perlcc> tool generates such executables by default.
|
||
|
|
||
|
perlcc myperlprogram.pl
|
||
|
|
||
|
=head2 The Bytecode Back End
|
||
|
|
||
|
This back end is only useful if you also have a way to load and
|
||
|
execute the bytecode that it produces. The ByteLoader module provides
|
||
|
this functionality.
|
||
|
|
||
|
To turn a Perl program into executable byte code, you can use C<perlcc>
|
||
|
with the C<-b> switch:
|
||
|
|
||
|
perlcc -b myperlprogram.pl
|
||
|
|
||
|
The byte code is machine independent, so once you have a compiled
|
||
|
module or program, it is as portable as Perl source (assuming that
|
||
|
the user of the module or program has a modern-enough Perl interpreter
|
||
|
to decode the byte code).
|
||
|
|
||
|
See B<B::Bytecode> for information on options to control the
|
||
|
optimization and nature of the code generated by the Bytecode module.
|
||
|
|
||
|
=head2 The Optimized C Back End
|
||
|
|
||
|
The optimized C back end will turn your Perl program's run time
|
||
|
code-path into an equivalent (but optimized) C program that manipulates
|
||
|
the Perl data structures directly. The program will still link against
|
||
|
the Perl interpreter library, to allow for eval(), C<s///e>,
|
||
|
C<require>, etc.
|
||
|
|
||
|
The C<perlcc> tool generates such executables when using the -opt
|
||
|
switch. To compile a Perl program (ending in C<.pl>
|
||
|
or C<.p>):
|
||
|
|
||
|
perlcc -opt myperlprogram.pl
|
||
|
|
||
|
To produce a shared library from a Perl module (ending in C<.pm>):
|
||
|
|
||
|
perlcc -opt Myperlmodule.pm
|
||
|
|
||
|
For more information, see L<perlcc> and L<B::CC>.
|
||
|
|
||
|
=over 4
|
||
|
|
||
|
=item B
|
||
|
|
||
|
This module is the introspective ("reflective" in Java terms)
|
||
|
module, which allows a Perl program to inspect its innards. The
|
||
|
back end modules all use this module to gain access to the compiled
|
||
|
parse tree. You, the user of a back end module, will not need to
|
||
|
interact with B.
|
||
|
|
||
|
=item O
|
||
|
|
||
|
This module is the front-end to the compiler's back ends. Normally
|
||
|
called something like this:
|
||
|
|
||
|
$ perl -MO=Deparse myperlprogram
|
||
|
|
||
|
This is like saying C<use O 'Deparse'> in your Perl program.
|
||
|
|
||
|
=item B::Asmdata
|
||
|
|
||
|
This module is used by the B::Assembler module, which is in turn used
|
||
|
by the B::Bytecode module, which stores a parse-tree as
|
||
|
bytecode for later loading. It's not a back end itself, but rather a
|
||
|
component of a back end.
|
||
|
|
||
|
=item B::Assembler
|
||
|
|
||
|
This module turns a parse-tree into data suitable for storing
|
||
|
and later decoding back into a parse-tree. It's not a back end
|
||
|
itself, but rather a component of a back end. It's used by the
|
||
|
I<assemble> program that produces bytecode.
|
||
|
|
||
|
=item B::Bblock
|
||
|
|
||
|
This module is used by the B::CC back end. It walks "basic blocks".
|
||
|
A basic block is a series of operations which is known to execute from
|
||
|
start to finish, with no possibility of branching or halting.
|
||
|
|
||
|
=item B::Bytecode
|
||
|
|
||
|
This module is a back end that generates bytecode from a
|
||
|
program's parse tree. This bytecode is written to a file, from where
|
||
|
it can later be reconstructed back into a parse tree. The goal is to
|
||
|
do the expensive program compilation once, save the interpreter's
|
||
|
state into a file, and then restore the state from the file when the
|
||
|
program is to be executed. See L</"The Bytecode Back End">
|
||
|
for details about usage.
|
||
|
|
||
|
=item B::C
|
||
|
|
||
|
This module writes out C code corresponding to the parse tree and
|
||
|
other interpreter internal structures. You compile the corresponding
|
||
|
C file, and get an executable file that will restore the internal
|
||
|
structures and the Perl interpreter will begin running the
|
||
|
program. See L</"The Simple C Back End"> for details about usage.
|
||
|
|
||
|
=item B::CC
|
||
|
|
||
|
This module writes out C code corresponding to your program's
|
||
|
operations. Unlike the B::C module, which merely stores the
|
||
|
interpreter and its state in a C program, the B::CC module makes a
|
||
|
C program that does not involve the interpreter. As a consequence,
|
||
|
programs translated into C by B::CC can execute faster than normal
|
||
|
interpreted programs. See L</"The Optimized C Back End"> for
|
||
|
details about usage.
|
||
|
|
||
|
=item B::Debug
|
||
|
|
||
|
This module dumps the Perl parse tree in verbose detail to STDOUT.
|
||
|
It's useful for people who are writing their own back end, or who
|
||
|
are learning about the Perl internals. It's not useful to the
|
||
|
average programmer.
|
||
|
|
||
|
=item B::Deparse
|
||
|
|
||
|
This module produces Perl source code from the compiled parse tree.
|
||
|
It is useful in debugging and deconstructing other people's code,
|
||
|
also as a pretty-printer for your own source. See
|
||
|
L</"The Decompiling Back End"> for details about usage.
|
||
|
|
||
|
=item B::Disassembler
|
||
|
|
||
|
This module turns bytecode back into a parse tree. It's not a back
|
||
|
end itself, but rather a component of a back end. It's used by the
|
||
|
I<disassemble> program that comes with the bytecode.
|
||
|
|
||
|
=item B::Lint
|
||
|
|
||
|
This module inspects the compiled form of your source code for things
|
||
|
which, while some people frown on them, aren't necessarily bad enough
|
||
|
to justify a warning. For instance, use of an array in scalar context
|
||
|
without explicitly saying C<scalar(@array)> is something that Lint
|
||
|
can identify. See L</"The Lint Back End"> for details about usage.
|
||
|
|
||
|
=item B::Showlex
|
||
|
|
||
|
This module prints out the my() variables used in a function or a
|
||
|
file. To get a list of the my() variables used in the subroutine
|
||
|
mysub() defined in the file myperlprogram:
|
||
|
|
||
|
$ perl -MO=Showlex,mysub myperlprogram
|
||
|
|
||
|
To get a list of the my() variables used in the file myperlprogram:
|
||
|
|
||
|
$ perl -MO=Showlex myperlprogram
|
||
|
|
||
|
[BROKEN]
|
||
|
|
||
|
=item B::Stackobj
|
||
|
|
||
|
This module is used by the B::CC module. It's not a back end itself,
|
||
|
but rather a component of a back end.
|
||
|
|
||
|
=item B::Stash
|
||
|
|
||
|
This module is used by the L<perlcc> program, which compiles a module
|
||
|
into an executable. B::Stash prints the symbol tables in use by a
|
||
|
program, and is used to prevent B::CC from producing C code for the
|
||
|
B::* and O modules. It's not a back end itself, but rather a
|
||
|
component of a back end.
|
||
|
|
||
|
=item B::Terse
|
||
|
|
||
|
This module prints the contents of the parse tree, but without as much
|
||
|
information as B::Debug. For comparison, C<print "Hello, world.">
|
||
|
produced 96 lines of output from B::Debug, but only 6 from B::Terse.
|
||
|
|
||
|
This module is useful for people who are writing their own back end,
|
||
|
or who are learning about the Perl internals. It's not useful to the
|
||
|
average programmer.
|
||
|
|
||
|
=item B::Xref
|
||
|
|
||
|
This module prints a report on where the variables, subroutines, and
|
||
|
formats are defined and used within a program and the modules it
|
||
|
loads. See L</"The Cross Referencing Back End"> for details about
|
||
|
usage.
|
||
|
|
||
|
=back
|
||
|
|
||
|
=head1 KNOWN PROBLEMS
|
||
|
|
||
|
The simple C backend currently only saves typeglobs with alphanumeric
|
||
|
names.
|
||
|
|
||
|
The optimized C backend outputs code for more modules than it should
|
||
|
(e.g., DirHandle). It also has little hope of properly handling
|
||
|
C<goto LABEL> outside the running subroutine (C<goto &sub> is okay).
|
||
|
C<goto LABEL> currently does not work at all in this backend.
|
||
|
It also creates a huge initialization function that gives
|
||
|
C compilers headaches. Splitting the initialization function gives
|
||
|
better results. Other problems include: unsigned math does not
|
||
|
work correctly; some opcodes are handled incorrectly by default
|
||
|
opcode handling mechanism.
|
||
|
|
||
|
BEGIN{} blocks are executed while compiling your code. Any external
|
||
|
state that is initialized in BEGIN{}, such as opening files, initiating
|
||
|
database connections etc., do not behave properly. To work around
|
||
|
this, Perl has an INIT{} block that corresponds to code being executed
|
||
|
before your program begins running but after your program has finished
|
||
|
being compiled. Execution order: BEGIN{}, (possible save of state
|
||
|
through compiler back-end), INIT{}, program runs, END{}.
|
||
|
|
||
|
=head1 AUTHOR
|
||
|
|
||
|
This document was originally written by Nathan Torkington, and is now
|
||
|
maintained by the perl5-porters mailing list
|
||
|
I<perl5-porters@perl.org>.
|
||
|
|
||
|
=cut
|