\chapter{Native-code compilation (ocamlopt)} \label{c:nativecomp}
%HEVEA\cutname{native.html}

This chapter describes the OCaml high-performance
native-code compiler "ocamlopt", which compiles OCaml source files to
native code object files and links these object files to produce
standalone executables.

The native-code compiler is only available on certain platforms.
It produces code that runs faster than the bytecode produced by
"ocamlc", at the cost of increased compilation time and executable code
size. Compatibility with the bytecode compiler is extremely high: the
same source code should run identically when compiled with "ocamlc" and
"ocamlopt".

It is not possible to mix native-code object files produced by "ocamlopt"
with bytecode object files produced by "ocamlc": a program must be
compiled entirely with "ocamlopt" or entirely with "ocamlc". Native-code
object files produced by "ocamlopt" cannot be loaded in the toplevel
system "ocaml".

\section{s:native-overview}{Overview of the compiler}

The "ocamlopt" command has a command-line interface very close to that
of "ocamlc". It accepts the same types of arguments, and processes them
sequentially, after all options have been processed:

\begin{itemize}
\item
Arguments ending in ".mli" are taken to be source files for
compilation unit interfaces. Interfaces specify the names exported by
compilation units: they declare value names with their types, define
public data types, declare abstract data types, and so on. From the
file \var{x}".mli", the "ocamlopt" compiler produces a compiled interface
in the file \var{x}".cmi". The interface produced is identical to that
produced by the bytecode compiler "ocamlc".

\item
Arguments ending in ".ml" are taken to be source files for compilation
unit implementations. Implementations provide definitions for the
names exported by the unit, and also contain expressions to be
evaluated for their side-effects.  From the file \var{x}".ml", the "ocamlopt"
compiler produces two files: \var{x}".o", containing native object code,
and \var{x}".cmx", containing extra information for linking and
optimization of the clients of the unit. The compiled implementation
should always be referred to under the name \var{x}".cmx" (when given
a ".o" or ".obj" file, "ocamlopt" assumes that it contains code compiled from C,
not from OCaml).

The implementation is checked against the interface file \var{x}".mli"
(if it exists) as described in the manual for "ocamlc"
(chapter~\ref{c:camlc}).

\item
Arguments ending in ".cmx" are taken to be compiled object code.  These
files are linked together, along with the object files obtained
by compiling ".ml" arguments (if any), and the OCaml standard
library, to produce a native-code executable program. The order in
which ".cmx" and ".ml" arguments are presented on the command line is
relevant: compilation units are initialized in that order at
run-time, and it is a link-time error to use a component of a unit
before having initialized it. Hence, a given \var{x}".cmx" file must come
before all ".cmx" files that refer to the unit \var{x}.

\item
Arguments ending in ".cmxa" are taken to be libraries of object code.
Such a library packs in two files (\var{lib}".cmxa" and \var{lib}".a"/".lib")
a set of object files (".cmx" and ".o"/".obj" files). Libraries are build with
"ocamlopt -a" (see the description of the "-a" option below). The object
files contained in the library are linked as regular ".cmx" files (see
above), in the order specified when the library was built. The only
difference is that if an object file contained in a library is not
referenced anywhere in the program, then it is not linked in.

\item
Arguments ending in ".c" are passed to the C compiler, which generates
a ".o"/".obj" object file. This object file is linked with the program.

\item
Arguments ending in ".o", ".a" or ".so" (".obj", ".lib" and ".dll"
under Windows) are assumed to be C object files and
libraries. They are linked with the program.

\end{itemize}

The output of the linking phase is a regular Unix or Windows
executable file. It does not need "ocamlrun" to run.

The compiler is able to emit some information on its internal stages:

\begin{itemize}
\item
%  The following two paragraphs are a duplicate from the description of the batch compiler.
".cmt" files for the implementation of the compilation unit
and ".cmti" for signatures if the option "-bin-annot" is passed to it (see the
description of "-bin-annot" below).
Each such file contains a typed abstract syntax tree (AST), that is produced
during the type checking procedure. This tree contains all available information
about the location and the specific type of each term in the source file.
The AST is partial if type checking was unsuccessful.

These ".cmt" and ".cmti" files are typically useful for code inspection tools.

\item
".cmir-linear" files for the implementation of the compilation unit
if the option "-save-ir-after scheduling" is passed to it.
Each such file contains a low-level intermediate representation,
produced by the instruction scheduling pass.

An external tool can perform low-level optimisations,
such as code layout, by transforming a ".cmir-linear" file.
To continue compilation, the compiler can be invoked with (a possibly modified)
".cmir-linear" file as an argument, instead of the corresponding source file.
\end{itemize}

\section{s:native-options}{Options}

The following command-line options are recognized by "ocamlopt".
The options "-pack", "-a", "-shared", "-c", "-output-obj" and
"-output-complete-obj" are mutually exclusive.

% Configure boolean variables used by the macros in unified-options.etex
\compfalse
\nattrue
\topfalse
% unified-options gathers all options across the native/bytecode
% compilers and toplevel
\input{unified-options.tex}

\paragraph{Options for the 64-bit x86 architecture}
The 64-bit code generator for Intel/AMD x86 processors ("amd64"
architecture) supports the following additional options:

\begin{options}
\item["-fPIC"] Generate position-independent machine code.  This is
the default.
\item["-fno-PIC"] Generate position-dependent machine code.
\end{options}

\paragraph{Contextual control of command-line options}

The compiler command line can be modified ``from the outside''
with the following mechanisms. These are experimental
and subject to change. They should be used only for experimental and
development work, not in released packages.

\begin{options}
\item["OCAMLPARAM" \rm(environment variable)]
A set of arguments that will be inserted before or after the arguments from
the command line. Arguments are specified in a comma-separated list
of "name=value" pairs. A "_" is used to specify the position of
the command line arguments, i.e. "a=x,_,b=y" means that "a=x" should be
executed before parsing the arguments, and "b=y" after. Finally,
an alternative separator can be specified as the
first character of the string, within the set ":|; ,".
\item["ocaml_compiler_internal_params" \rm(file in the stdlib directory)]
A mapping of file names to lists of arguments that
will be added to the command line (and "OCAMLPARAM") arguments.
\end{options}

\section{s:native-common-errors}{Common errors}

The error messages are almost identical to those of "ocamlc".
See section~\ref{s:comp-errors}.

\section{s:native:running-executable}{Running executables produced by ocamlopt}

Executables generated by "ocamlopt" are native, stand-alone executable
files that can be invoked directly.  They do
not depend on the "ocamlrun" bytecode runtime system nor on
dynamically-loaded C/OCaml stub libraries.

During execution of an "ocamlopt"-generated executable,
the following environment variables are also consulted:
\begin{options}
\item["OCAMLRUNPARAM"]  Same usage as in "ocamlrun"
  (see section~\ref{s:ocamlrun-options}), except that option "l"
  is ignored (the operating system's stack size limit
  is used instead).
\item["CAMLRUNPARAM"]  If "OCAMLRUNPARAM" is not found in the
  environment, then "CAMLRUNPARAM" will be used instead.  If
  "CAMLRUNPARAM" is not found, then the default values will be used.
\end{options}

\section{s:compat-native-bytecode}{Compatibility with the bytecode compiler}

This section lists the known incompatibilities between the bytecode
compiler and the native-code compiler. Except on those points, the two
compilers should generate code that behave identically.

\begin{itemize}

\item Signals are detected only when the program performs an
allocation in the heap. That is, if a signal is delivered while in a
piece of code that does not allocate, its handler will not be called
until the next heap allocation.

\item On ARM and PowerPC processors (32 and 64 bits), fused
  multiply-add (FMA) instructions can be generated for a
  floating-point multiplication followed by a floating-point addition
  or subtraction, as in "x *. y +. z".  The FMA instruction avoids
  rounding the intermediate result "x *. y", which is generally
  beneficial, but produces floating-point results that differ slightly
  from those produced by the bytecode interpreter.

\item The native-code compiler performs a number of optimizations that
the bytecode compiler does not perform, especially when the Flambda
optimizer is active.  In particular, the native-code compiler
identifies and eliminates ``dead code'', i.e.\ computations that do
not contribute to the results of the program.  For example,
\begin{verbatim}
        let _ = ignore M.f
\end{verbatim}
contains a reference to compilation unit "M" when compiled to
bytecode.  This reference forces "M" to be linked and its
initialization code to be executed.  The native-code compiler
eliminates the reference to "M", hence the compilation unit "M" may
not be linked and executed.  A workaround is to compile "M" with the
"-linkall" flag so that it will always be linked and executed, even if
not referenced.  See also the "Sys.opaque_identity" function from the
"Sys" standard library module.

\item Before 4.10, stack overflows, typically caused by excessively
  deep recursion, are not always turned into a "Stack_overflow"
  exception like with the bytecode compiler. The runtime system makes
  a best effort to trap stack overflows and raise the "Stack_overflow"
  exception, but sometimes it fails and a ``segmentation fault'' or
  another system fault occurs instead.

\end{itemize}
