
Commit 5e26279

Formalize symbol names reserved for Perl's use in XS
This creates a regular expression pattern matching the names that we feel free to expose to the namespace of XS code. These names are therefore reserved for our use; should any conflicts arise, the module needs to change, not us.

Naturally, the pattern is pretty restrictive:

  - Any symbol beginning with "PL_"
  - Any symbol containing perl, Perl, or PERL, usually delimited on both sides so as to keep it from being part of a larger word

Any other spelling that we expose could be considered to pollute the XS code space. We have felt free to do that all the time: any new function's short name does it, and we generally feel free to create macros with arbitrary names that could conflict with an existing XS name.

Some important potential conflicts are:

  - New keywords: for each one we create an exposed KEY_foo macro. Some existing modules use some of these; my grep of CPAN shows maybe a dozen in use, mostly KEY_END.
  - config.h is full of exposed symbols like HAS_foo, I_bar, and others. I don't imagine we can claim to reserve every symbol beginning with HAS_ or I_, and I don't know what to do here.
  - Informally, others and I have used a trailing underscore to indicate a private symbol. A few distributions use some of these anyway, and there has been pushback when new short symbols following this convention have been added. I would like a formal rule about this convention. There are currently 200+ such symbols. We could reserve all names with trailing underscores or, if that is too much, any ending in, say, '_pl_' or '_PL_'.

We have 3000+ undocumented macro names that don't end in underscores and are currently visible to XS code. This number includes the KEY_foo ones, but not the ones in config.h.

To deal with namespace pollution, we have had the -DNO_SHORT_NAMES Configure option, intended only for embedded perls. It hasn't worked at least since we added inline functions, and it only ever applied to functions.
I have a work in progress to get this working again and to extend it to documented macros. It just occurred to me how to make this customizable, so that someone downstream could add a list of symbols that should exist only as 'Perl_foo' and then recompile, leaving short names for everything not in the list.
1 parent 55c5b50 commit 5e26279


1 file changed, 25 insertions(+), 4 deletions(-)


regen/embed.pl

Lines changed: 25 additions & 4 deletions
@@ -104,14 +104,35 @@ BEGIN
 pod/perlreapi.pod
 );
 
+# A regular expression that matches names that are externally visible, but
+# Perl reserves for itself. Generally, we want things to be delimitted on
+# both sides to show it isn't part of a larger word such as 'hyperlink',
+# 'perlustrate', or 'properly'. Underscores delimit besides the typical ^ or
+# \b. All caps PERL has looser rules to accommodate the many existing symbols
+# where everything is jammed together, and the less likelihood that something
+# with all caps is innocently referring to something unrelated to Perl.
+my $names_reserved_for_perl_use_re =
+    qr/ ^ ( PL_ \w+ \b
+          | perl_               # The underscore delimits
+          | Perl [_A-Z]         # Uppercase delimits here too
+          | PERL [A-Z]+ [[:alpha:]] ( \b | _ )
+          )
+
+        # The \d is for PERL5, for example
+        | ( _ | \b ) PERL ( _ | \b | \d+ )
+
+        # This is for obsolete and deprecated uses
+        | ( _ | \b ) CPERL (arg | scope) ( _ | \b )
+    /x;
+
 # This is a list of symbols that are not documented to be available for
 # modules to use, but are nevertheless currently not kept by embed.h from
 # being visible to the world.
 #
 # Strive to make this list empty.
 #
 # The list does not include symbols that we have documented as being reserved
-# for perl's use, namely those that begin with 'PL_' or contain qr/perl/i.
+# for perl's use, namely those that match the pattern just above.
 # There are two parts of the list; the second part contains the symbols which
 # have a trailing underscore; the first part those without.
 #
@@ -1336,6 +1357,8 @@ BEGIN
 is_XDIGIT_high
 isXDIGIT_LC_utf8
 isXDIGIT_uni
+is_XPERLSPACE_cp_high
+is_XPERLSPACE_high
 IV_DIG
 IV_MAX_P1
 JE_OLD_STACK_HWM_restore
@@ -4729,9 +4752,7 @@ sub find_undefs {
         # Just the symbol, no arglist nor definition
         $name =~ s/ (?: \s | \( ) .* //x;
 
-        # These are reserved for Perl's use, so not a problem.
-        next if $name =~ / ^ PL_ /x;
-        next if $name =~ /perl/i;
+        next if $name =~ $names_reserved_for_perl_use_re;
 
         next unless $line->reduce_conds($constraints_re,
                                         \%constraints);
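As a rough illustration, the standalone script below (core Perl only) copies the new pattern and runs it beside the two `next if` tests it replaces in find_undefs(). The helpers old_skips and new_skips are hypothetical names used only for this sketch, and hyperlink, perlustrate, and properly are the example words from the code comment, not real symbols. The sketch suggests why the diff also adds is_XPERLSPACE_cp_high and is_XPERLSPACE_high to the list of undocumented but visible names: the old case-insensitive /perl/i test skipped them, while the new pattern does not match them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern copied from the regen/embed.pl hunk above.
my $names_reserved_for_perl_use_re =
    qr/ ^ ( PL_ \w+ \b
          | perl_               # The underscore delimits
          | Perl [_A-Z]         # Uppercase delimits here too
          | PERL [A-Z]+ [[:alpha:]] ( \b | _ )
          )

        # The \d is for PERL5, for example
        | ( _ | \b ) PERL ( _ | \b | \d+ )

        # This is for obsolete and deprecated uses
        | ( _ | \b ) CPERL (arg | scope) ( _ | \b )
    /x;

# Hypothetical helpers (not in embed.pl): the skip test find_undefs()
# applied before this commit, and the one it applies after it.
sub old_skips {
    my ($name) = @_;
    return ($name =~ / ^ PL_ /x || $name =~ /perl/i) ? 1 : 0;
}

sub new_skips {
    my ($name) = @_;
    return $name =~ $names_reserved_for_perl_use_re ? 1 : 0;
}

# Real-looking symbols first, then the words from the code comment,
# then the two symbols this commit adds to the explicit list.
my @names = qw(PL_sv_undef perl_parse Perl_newSV PerlIO_stdout
               PERL_VERSION PERL5LIB CPERLscope
               hyperlink perlustrate properly
               is_XPERLSPACE_high is_XPERLSPACE_cp_high);

for my $name (@names) {
    printf "%-22s old: %-8s new: %s\n", $name,
        old_skips($name) ? 'skipped' : 'checked',
        new_skips($name) ? 'skipped' : 'checked';
}
```

Every name in the last five rows flips from skipped to checked. For the three ordinary words, that is the intended effect of requiring delimiters and exact capitalization; for real symbols that embed PERL mid-word, the cost is listing them explicitly, as this commit does.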
