Linux cli command Jcodepm
NAME
Jcode - Japanese Charset Handler
SYNOPSIS
    use Jcode;
    #
    # traditional
    Jcode::convert(\$str, $ocode, $icode, "z");
    # or OOP!
    print Jcode->new($str)->h2z->tr($from, $to)->utf8;
DESCRIPTION
A Japanese version of this document is available as Jcode::Nihongo.
Jcode.pm supports both an object-oriented and a traditional (procedural) approach. With the object approach, you can write:
    $iso_2022_jp = Jcode->new($str)->h2z->jis;
which is more elegant than:
    $iso_2022_jp = $str;
    &jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");
For those unfamiliar with objects, Jcode.pm still supports getcode() and convert().
If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode, the standard charset handler module for Perl 5.8 or later.
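The object approach can be sketched as follows. This is a minimal example, assuming Jcode is installed from CPAN and perl is 5.8.1 or later; the sample text and variable names are illustrative and not part of the original documentation.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;    # CPAN module, assumed to be installed

# Three kanji ("nihongo") as UTF-8 bytes, built from code points so the
# script does not depend on the encoding of this source file.
my $utf8 = encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}");

# One object, several output encodings.
my $j    = Jcode->new($utf8, 'utf8');
my $euc  = $j->euc;     # EUC-JP bytes
my $sjis = $j->sjis;    # Shift_JIS bytes
my $jis  = $j->jis;     # ISO-2022-JP bytes

# Show each result as hex so the byte-level difference is visible.
printf "%-5s %s\n", $_->[0], unpack('H*', $_->[1])
    for ([utf8 => $utf8], [euc => $euc], [sjis => $sjis], [jis => $jis]);
```

Passing 'utf8' as $icode skips automatic detection, which is helpful for very short strings.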
Methods
All methods mentioned here return a Jcode object unless otherwise noted.
Constructors
$j = Jcode->new($str [, $icode])
Creates a Jcode object $j from $str. The input encoding is detected automatically unless you explicitly set $icode. For the available charsets, see getcode below. For perl 5.8.1 or better, $icode can be any encoding name that Encode understands.
    $j = Jcode->new($european, 'iso-latin1');
When the object is stringified, it returns the EUC-converted string, so you can write print $j instead of print $j->euc.
Passing Reference
Instead of a scalar value, you can pass a reference:
    Jcode->new(\$str);
This saves a little time, at the cost of the value of $str itself being converted. (In a way, $str is now "tied" to the Jcode object.)
$j->set($str [, $icode])
Sets $j's internal string to $str. Handy when you use a Jcode object repeatedly (it saves the time and memory needed to create a new object).
    # converts mailbox to SJIS format
    my $jconv = new Jcode;
    $/ = 00;
    while (<>) {
        print $jconv->set(\$_)->mime_decode->sjis;
    }
$j->append($str [, $icode]);
Appends $str to $j's internal string.
$j = jcode($str [, $icode]);
Shortcut for Jcode->new(), so you can write, for example, $sjis = jcode($str)->sjis;
Encoded Strings
In general, you can retrieve the encoded string as $j->encoded, where encoded is the name of the encoding.
$sjis = jcode($str)->sjis
$euc = $j->euc
$jis = $j->jis
$sjis = $j->sjis
$ucs2 = $j->ucs2
$utf8 = $j->utf8
What you code is what you get :)
$iso_2022_jp = $j->iso_2022_jp
Same as $j->h2z->jis. Hankaku kana are forcibly converted to Zenkaku.
For perl 5.8.1 and better, you can also use any encoding names and aliases that Encode supports. For example:
    $european = $j->iso_latin1; # replace - with _ for names.
FYI: Encode::Encoder uses a similar trick.
$j->fallback($fallback)
For perl 5.8.1 or better, Jcode stores the internal string in UTF-8. Any character that does not map to ->encoding is replaced with a '?', which is the Encode standard.
    my $unistr = "\x{262f}"; # YIN YANG
    my $j = jcode($unistr);  # $j->euc is '?'
You can change this behavior by specifying a fallback, as in Encode. The values are the same as Encode's. Jcode::FB_PERLQQ, Jcode::FB_XMLCREF, and Jcode::FB_HTMLCREF are aliased to those of Encode for convenience.
    print $j->fallback(Jcode::FB_PERLQQ)->euc;   # \x{262f}
    print $j->fallback(Jcode::FB_XMLCREF)->euc;  # &#x262f;
    print $j->fallback(Jcode::FB_HTMLCREF)->euc; # &#9775;
The global variable $Jcode::FALLBACK stores the default fallback, so you can override it by assigning a value.
    $Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
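Because the Jcode fallback constants are described as aliases of Encode's, the underlying behavior can be sketched with the core Encode module alone. Note that this example deliberately calls Encode directly rather than Jcode; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $unistr = "\x{262f}";    # YIN YANG, which has no EUC-JP mapping

# Each fallback scheme renders the unmappable character differently.
my $perlqq   = encode('euc-jp', $unistr, Encode::FB_PERLQQ);
my $xmlcref  = encode('euc-jp', $unistr, Encode::FB_XMLCREF);
my $htmlcref = encode('euc-jp', $unistr, Encode::FB_HTMLCREF);

print "PERLQQ:   $perlqq\n";      # \x{262f}
print "XMLCREF:  $xmlcref\n";     # &#x262f;
print "HTMLCREF: $htmlcref\n";    # &#9775;
```

With Jcode itself, the same constants would be passed to $j->fallback() before calling the encoding method.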
[@lines =] $jcode->jfold([$width, $newline_str, $kref])
Folds lines in the jcode string every $width (default: 72), where $width is the number of halfwidth characters. Fullwidth characters are counted as two. Lines are joined with the newline string specified by $newline_str (default: "\n"). Rudimentary kinsoku support is now available for Perl 5.8.1 and better.
$length = $jcode->jlength();
Returns the character length properly, rather than the byte length.
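The difference between byte length and character length can be illustrated with a short sketch, assuming Jcode is installed and perl is 5.8.1 or later. The sample string is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;    # CPAN module, assumed to be installed

# Three kanji encoded as UTF-8 occupy three bytes each.
my $utf8 = encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}");

my $bytes = length $utf8;                          # counts bytes
my $chars = Jcode->new($utf8, 'utf8')->jlength;    # counts characters

print "bytes=$bytes chars=$chars\n";
```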
Methods that use MIME::Base64
To use the methods below, you need MIME::Base64. To install it, simply run
    perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'
If your perl is 5.6 or better, there is no need, since MIME::Base64 is bundled.
$mime_header = $j->mime_encode([$lf, $bpl])
Converts $str to a MIME header as documented in RFC1522. When $lf is specified, it uses $lf to fold lines (default: "\n"). When $bpl is specified, it uses $bpl as the number of bytes per line (default: 76; this number must be smaller than 76).
For Perl 5.8.1 or better, you can also encode a MIME header as:
    $mime_header = $j->MIME_Header;
In which case the resulting $mime_header is MIME-B-encoded UTF-8, whereas $j->mime_encode() returns MIME-B-encoded ISO-2022-JP. Most modern MUAs support both.
$j->mime_decode;
Decodes a MIME header in the Jcode object. For perl 5.8.1 or better, you can also do the same with:
    Jcode->new($str, 'MIME-Header')
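A round trip through mime_encode() and mime_decode() might look like the sketch below. It assumes Jcode and MIME::Base64 are installed; the subject text is illustrative, and the exact folding of longer headers is not shown.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;    # CPAN module, assumed to be installed

# A short Japanese subject line as UTF-8 bytes.
my $subject = encode('UTF-8', "\x{65e5}\x{672c}\x{8a9e}");

# Encode to a MIME-B header (ISO-2022-JP), then decode it back to UTF-8.
my $header  = Jcode->new($subject, 'utf8')->mime_encode;
my $decoded = Jcode->new($header)->mime_decode->utf8;

print "header:  $header\n";
print "matches: ", ($decoded eq $subject ? "yes" : "no"), "\n";
```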
Hankaku vs. Zenkaku
$j->h2z([$keep_dakuten])
Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When $keep_dakuten is set, dakuten are left as is (that is, ka + dakuten stays as two characters instead of being converted to ga). You can retrieve the number of matches via $j->nmatch.
$j->z2h
Converts X208 kana (Zenkaku) to X201 kana (Hankaku). You can retrieve the number of matches via $j->nmatch.
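The ka + dakuten case can be sketched as follows, assuming Jcode is installed and perl is 5.8.1 or later. The input is given as code points (a flagged UTF-8 string); the helper name codepoints is hypothetical and exists only for display.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Jcode;    # CPAN module, assumed to be installed

# Halfwidth KA followed by the halfwidth voiced sound mark (dakuten).
my $hankaku = "\x{ff76}\x{ff9e}";

# h2z combines the pair into a single fullwidth character by default.
my $zenkaku = decode('UTF-8', Jcode->new($hankaku)->h2z->utf8);

# z2h goes the other way, from fullwidth back to halfwidth kana.
my $back = decode('UTF-8', Jcode->new($zenkaku)->z2h->utf8);

# Hypothetical display helper: list the code points of a string.
sub codepoints { join ' ', map { sprintf 'U+%04X', ord } split //, shift }

print "hankaku: ", codepoints($hankaku), "\n";
print "zenkaku: ", codepoints($zenkaku), "\n";
print "back:    ", codepoints($back), "\n";
```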
Regexp emulators
To use ->m() and ->s(), you need perl 5.8.1 or better.
$j->tr($from, $to, $opt);
Applies tr/$from/$to/ on the Jcode object, where $from and $to are EUC-JP strings. On perl 5.8.1 or better, $from and $to can also be flagged UTF-8 strings. If $opt is set, tr/$from/$to/$opt is applied. $opt must be 'c', 'd', or a combination thereof. You can retrieve the number of matches via $j->nmatch.
The following methods are available only for perl 5.8.1 or better.
$j->s($pattern, $replace, $opt);
Applies s/$pattern/$replace/$opt. $pattern and $replace must be in EUC-JP or flagged UTF-8. $opt is the same as the regexp options; see perlre for details. Like $j->tr(), $j->s() returns the object itself, so you can chain the operations as follows:
    $j->tr("a-z", "A-Z")->s("foo", "bar");
[@match = ] $j->m($pattern, $opt);
Applies m/$pattern/$opt. Note that this method DOES NOT RETURN AN OBJECT, so you can't chain it the way you can chain $j->s().
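The chaining rules can be sketched with plain ASCII input, assuming Jcode is installed and perl is 5.8.1 or later. The strings and patterns are illustrative only.

```perl
use strict;
use warnings;
use Jcode;    # CPAN module, assumed to be installed

# tr() and s() return the object, so they can be chained before the
# final call that extracts an encoded string.
my $out = Jcode->new("foo bar")->tr("a-z", "A-Z")->s("FOO", "baz")->euc;

# m() returns the captured groups instead of an object, so it has to
# come last in any chain.
my @match = Jcode->new("key=value")->m('(\w+)=(\w+)');

print "out:   $out\n";
print "match: @match\n";
```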
Instance Variables
If you need to access instance variables of a Jcode object, use the access methods below instead of accessing them directly (that's what OOP is all about).
FYI, Jcode uses a ref to an array instead of a ref to a hash (the common way) to optimize speed. (Actually, you don't have to know this as long as you use the access methods; once again, that's OOP.)
$j->r_str
Reference to the EUC-coded String.
$j->icode
Input character code of the most recent operation.
$j->nmatch
Number of matches (used in $j->tr, etc.)
Subroutines
($code, [$nmatch]) = getcode($str)
Returns the character code of $str. Return codes are as follows:
    ascii   Ascii (Contains no Japanese Code)
    binary  Binary (Not Text File)
    euc     EUC-JP
    sjis    SHIFT_JIS
    jis     JIS (ISO-2022-JP)
    ucs2    UCS2 (Raw Unicode)
    utf8    UTF8
When called in list context instead of scalar context, it also returns how many character codes were found. As mentioned above, $str can be \$str instead.
jcode.pl users: this function is 100% upward-compatible with jcode::getcode(), well, almost:
    * When its return value is an array, the order is the opposite; jcode::getcode() returns $nmatch first.
    * jcode::getcode() returns undef when the number of EUC characters is equal to that of SJIS. Jcode::getcode() returns EUC. For Jcode.pm there are no in-betweens.
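Detection in scalar and list context can be sketched as below, assuming Jcode is installed. The EUC-JP sample is produced with the core Encode module so that its encoding is known in advance; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;    # CPAN module, assumed to be installed

my $euc   = encode('euc-jp', "\x{65e5}\x{672c}\x{8a9e}");
my $plain = "plain text";

# Scalar context: only the code name is returned.
my $euc_code   = Jcode::getcode($euc);
my $plain_code = Jcode::getcode($plain);

# List context: the code name comes first, then the match count.
my ($code, $nmatch) = Jcode::getcode($euc);

print "euc sample:   $euc_code\n";
print "plain sample: $plain_code\n";
print "list context: $code ($nmatch)\n";
```

Very short inputs give the detector little to work with, so passing $icode explicitly is the safer choice whenever the encoding is already known.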
Jcode::convert($str, [$ocode, $icode, $opt])
Converts $str to the character code specified by $ocode. When $icode is also specified, it assumes $icode for the input string instead of the one detected by getcode(). As mentioned above, $str can be \$str instead.
jcode.pl users: this function is 100% upward-compatible with jcode::convert()!
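The traditional interface can be sketched as follows, assuming Jcode is installed. A reference is passed so that the buffer itself is converted, as in jcode.pl; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Jcode;    # CPAN module, assumed to be installed

my $chars = "\x{65e5}\x{672c}\x{8a9e}";

# Start with EUC-JP bytes in a buffer.
my $buf = encode('euc-jp', $chars);

# Convert EUC-JP to UTF-8. With a reference, the buffer is converted in
# place; the converted string is also the return value.
my $utf8 = Jcode::convert(\$buf, 'utf8', 'euc');

print "buffer now holds UTF-8: ",
    ($buf eq encode('UTF-8', $chars) ? "yes" : "no"), "\n";
```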
BUGS
For perl 5.8.1 or later, Jcode acts as a wrapper to Encode, meaning Jcode is subject to bugs therein.
ACKNOWLEDGEMENTS
This package owes a lot in motivation, design, and code, to the jcode.pl for Perl4 by Kazumasa Utashiro <[email protected]>.
Hiroki Ohzaki <[email protected]> has helped me polish regexp from the very first stage of development.
JEncode by [email protected] has inspired me to integrate Encode to Jcode. He has also contributed Japanese POD.
And folks at Jcode Mailing list <[email protected]>. Without them, I couldn’t have coded this far.
SEE ALSO
Encode
Jcode::Nihongo
<http://www.iana.org/assignments/character-sets>
COPYRIGHT
Copyright 1999-2005 Dan Kogai <[email protected]>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.