Monday, May 25, 2009

[PerlChina] Re: A question about multi-byte UTF characters

On 5/25/09, wqphoenix <wqphoenix@gmail.com> wrote:
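I'd like to ask a question: when I use Perl to provide some data, I find multi-byte characters in it. How can I detect and remove these wide characters?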

Sorry, I cannot type Chinese in my Opera :)

The most naive way might be:

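    use Encode 'decode';
    my $charset = 'UTF-8';  # whatever encoding your data is actually in
    my $s = 'some multi-byte chars here...';
    # decode the bytes into characters, then drop every run of non-ASCII ones
    ($s = decode($charset, $s)) =~ s/[^[:ascii:]]+//g;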
 
Here's a realistic example:

    use Encode 'decode';
    my $s = '你好abc么';
    ($s = decode('utf8', $s)) =~ s/[^[:ascii:]]+//g;
    print $s, "\n";

And you'll get "abc" if your .pl file is saved as UTF-8 :)
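
The original question also asks how to detect such characters, not only how to remove them. Here is a minimal sketch of both, assuming the input is a byte string in a known charset. The helper names has_non_ascii and strip_non_ascii are made up for illustration, and the sample bytes are written as hex escapes so the result does not depend on how the .pl file is saved:

```perl
use strict;
use warnings;
use Encode 'decode';

# Hypothetical helper: true if the decoded string contains any non-ASCII character.
sub has_non_ascii {
    my ($bytes, $charset) = @_;
    my $chars = decode($charset, $bytes);
    return $chars =~ /[^[:ascii:]]/ ? 1 : 0;
}

# Hypothetical helper: returns the decoded string with all non-ASCII characters removed.
sub strip_non_ascii {
    my ($bytes, $charset) = @_;
    (my $chars = decode($charset, $bytes)) =~ s/[^[:ascii:]]+//g;
    return $chars;
}

# The UTF-8 bytes of "你好abc么"
my $bytes = "\xe4\xbd\xa0\xe5\xa5\xbdabc\xe4\xb9\x88";

print has_non_ascii($bytes, 'UTF-8') ? "has wide chars\n" : "plain ASCII\n";
print strip_non_ascii($bytes, 'UTF-8'), "\n";
```

This should print "has wide chars" followed by "abc".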

Well, you might have GBK data on your side though ;)
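
If the data really is GBK, the same approach should work with only the charset name changed. A small sketch, assuming the input bytes are valid GBK; the sample bytes are produced here with encode() purely to simulate such input:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate GBK input: encode the characters of "你好abc么" into GBK bytes.
my $gbk_bytes = encode('gbk', "\x{4f60}\x{597d}abc\x{4e48}");

# Same trick as before, but decoding from GBK instead of UTF-8.
(my $s = decode('gbk', $gbk_bytes)) =~ s/[^[:ascii:]]+//g;
print $s, "\n";  # prints "abc"
```

Decoding with the wrong charset will usually leave replacement characters or garbage behind, so it is worth confirming the real encoding of the data first.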

Ciao,
-agentzh
