PHP: create an array after searching a text file for a specific string (and output it as a table)

Posted by 2vuwiymt on 2022-11-28 in PHP

I have a text file (about 10,000 lines); some of its lines are shown below.

confusables.txt

1F110 ; 0028 0041 0029 ;    MA  #* ( 🄐 → (A) ) PARENTHESIZED LATIN CAPITAL LETTER A → LEFT PARENTHESIS, LATIN CAPITAL LETTER A, RIGHT PARENTHESIS  # 
FF21 ;  0041 ;  MA  # ( A → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A   # →А→
FF22 ;  0042 ;  MA  # ( B → B ) FULLWIDTH LATIN CAPITAL LETTER B → LATIN CAPITAL LETTER B   # →Β→
212C ;  0042 ;  MA  # ( ℬ → B ) SCRIPT CAPITAL B → LATIN CAPITAL LETTER B   # 
1F110 ; 0028 0041 0029 ;    MA  #* ( 🄐 → (A) ) PARENTHESIZED LATIN CAPITAL LETTER A → LEFT PARENTHESIS, LATIN CAPITAL LETTER A, RIGHT PARENTHESIS  # 
1D435 ; 0042 ;  MA  # ( 𝐵 → B ) MATHEMATICAL ITALIC CAPITAL B → LATIN CAPITAL LETTER B # 
213B ;  0046 0041 0058 ;    MA  #* ( ℻ → FAX ) FACSIMILE SIGN → LATIN CAPITAL LETTER F, LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER X  #

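I want to get the first character after the opening parenthesis on each line (the Unicode character, i.e. the original char) based on a search string (for example 'LATIN CAPITAL LETTER B', as in line 4 above). I can do this with the following code: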

<?php

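/**
 * Read the file lazily, yielding one line at a time.
 *
 * @return Generator
 */
$fileData = function () {
    // __DIR__ has no trailing slash, so join with '/' rather than './'
    $file = fopen(__DIR__ . '/confusables.txt', 'r');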

    if (!$file) {
        return;
    }
    while (($line = fgets($file)) !== false) {
        yield $line;
    }

    fclose($file);
};

// output array
$output_string = [
    'uni-code' => '',
    'original' => '',
    'des' => '',
];

$search_string = 'LATIN CAPITAL LETTER B'; // matches the example and the output shown below

$initial_line_count = 1; // variable to count lines before we start slicing
$final_count = 0; // final line count
// loop to get final count
foreach ($fileData() as $line) {
    // $line contains current line
    if (preg_match_all("/{$search_string}/i", $line)) {
        $initial_line_count++;
        $final_count = $initial_line_count;
        // echo $final_count.'<br>';
    }
}

$line_count = 1; // loop termination counter
$html = '<table>
            <tr>
                <th style="border:1px solid #000">ORIGINAL LETTERS</th>
                <th style="border:1px solid #000">UNICODE CHARACTER</th>
                <th style="border:1px solid #000">Description</th>
            </tr>';
// loop to slice and append in array
foreach ($fileData() as $line) {
    // $line contains current line
    if (preg_match_all("/{$search_string}/i", $line)) {
        // start slicing
        $slice_after = substr($line, 0, strpos($line, ' ) ')); // slice everything after )
        $slice_before = ltrim(stristr($slice_after, '('), '('); // slice everything upto (
        $first_char = substr($slice_before, 0, strpos($slice_before, "→")); // get every first character
        $split_Real_char = ltrim(stristr($search_string, 'LETTER'), 'LETTER'); // get every real character
        $real_Char = $output_string['original'] .= $split_Real_char; // append to array
        $split_Unicode_char = $output_string['uni-code'] .= $first_char . ','; // append to array
        $line_count++; // loop termination counter
        // loop termination
        if ($line_count == $final_count) {
            $html .= '  <tr>
                            <td style=" border:1px solid black;"><pre>' . $split_Real_char . '</pre></td>
                            <td style=" border:1px solid black;"><pre>' . $split_Unicode_char . '</pre></td>
                            <td style=" border:1px solid black;"><pre>' . $search_string . '</pre></td>
                        </tr>';
            $html .= '</table>';
            echo $html;
            break;
        }
    }
}

The output I get is as follows:

| ORIGINAL LETTER      | UNICODE CHARACTER         | Description                      |
| -------------------- | ------------------------- | -------------------------------- |
| B                    | B, ℬ ,  𝐵               | LATIN CAPITAL LETTER B           |

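For a single (hard-coded) search string the output looks fine, but I have to automate the process (for the entire 10,000 lines). This is what I have tried so far: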

<?php

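/**
 * Read the file lazily, yielding one line at a time.
 *
 * @return Generator
 */
$fileData = function () {
    // __DIR__ has no trailing slash, so join with '/' rather than './'
    $file = fopen(__DIR__ . '/confusables.txt', 'r');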

    if (!$file) {
        return;
    }
    while (($line = fgets($file)) !== false) {
        yield $line;
    }

    fclose($file);
};

$searchStringArray = array();

// loop to generate search strings
foreach (range('A', 'B') as $alphabet) {
    $alphabets = 'LATIN CAPITAL LETTER ' . $alphabet . "";
    array_push($searchStringArray, $alphabets);
}

// output array
$output_string = [
    'uni-code' => '',
    'original' => '',
    'des' => '',
];

$initial_line_count = 1; // variable to count lines before we start slicing
$final_count = 0; // final line count

for ($i = 0; $i < count($searchStringArray); $i++) {
    $search_string = $searchStringArray[$i];
    // loop to get final count
    foreach ($fileData() as $line) {
        // $line contains current line
        if (preg_match_all("/{$search_string}/i", $line)) {
            $initial_line_count++;
            $final_count = $initial_line_count;
            // echo $final_count.'<br>';
        }
    }
}
$line_count = 1; // loop termination counter
$html = '<table>
            <tr>
                <th style="border:1px solid #000">ORIGINAL LETTERS</th>
                <th style="border:1px solid #000">UNICODE CHARACTER</th>
                <th style="border:1px solid #000">Description</th>
            </tr>';
            
for ($i = 0; $i < count($searchStringArray); $i++) {
    $search_string = $searchStringArray[$i];
    // loop to slice and append in array
    foreach ($fileData() as $line) {
        // $line contains current line
        if (preg_match_all("/{$search_string}/i", $line)) {
            // start slicing
            $slice_after = substr($line, 0, strpos($line, ' ) ')); // slice everything after )
            $slice_before = ltrim(stristr($slice_after, '('), '('); // slice everything upto (
            $first_char = substr($slice_before, 0, strpos($slice_before, "→")); // get every first character
            $split_Real_char = ltrim(stristr($search_string, 'LETTER'), 'LETTER'); // get every real character
            $real_Char = $output_string['original'] .= $split_Real_char; // append to array
            $split_Unicode_char = $output_string['uni-code'] .= $first_char . ','; // append to array
            $line_count++; // loop termination counter
            // loop termination
            if ($line_count == $final_count) {
                $html .= '  <tr>
                            <td style=" border:1px solid black;"><pre>' . $split_Real_char . '</pre></td>
                            <td style=" border:1px solid black;"><pre>' . $split_Unicode_char . '</pre></td>
                            <td style=" border:1px solid black;"><pre>' . $search_string . '</pre></td>
                        </tr>';
                $html .= '</table>';
                echo $html;
                break;
            }
        }
    }
}

Then I get this output:

| ORIGINAL LETTER      | UNICODE CHARACTER         | Description                 |
| -------------------- | ------------------------- | --------------------------- |
| B                    | A, 🄐, B, ℬ,  𝐵, ℻     | LATIN CAPITAL LETTER B      |

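I get all the Unicode characters, but the original letter and the search string are wrong. The Unicode characters should not all land in one table cell, and although the loop runs several times I only get a single row.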
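A likely cause, sketched here with hypothetical in-memory data instead of the file: `$output_string`, `$line_count` and `$final_count` are shared by every search string, so the `$line_count == $final_count` test can only succeed once, on the very last match, after all characters have already been appended to the same string. The `$matches` array and the `$rows` variable below are illustrative assumptions, not part of the code above; the point is only that resetting the accumulator per search string gives one row per letter.

```php
<?php

// Hypothetical stand-in for the parsed file: [source character, description].
$matches = [
    ["\u{FF21}", 'LATIN CAPITAL LETTER A'], // fullwidth A
    ["\u{FF22}", 'LATIN CAPITAL LETTER B'], // fullwidth B
    ["\u{212C}", 'LATIN CAPITAL LETTER B'], // script capital B
];

$rows = [];
foreach (['LATIN CAPITAL LETTER A', 'LATIN CAPITAL LETTER B'] as $search_string) {
    $chars = []; // reset for each search string, unlike the shared $output_string

    foreach ($matches as [$char, $description]) {
        if (stripos($description, $search_string) !== false) {
            $chars[] = $char;
        }
    }

    // Emit the row once the inner loop has finished; no line counter is needed.
    $rows[$search_string] = implode(', ', $chars);
}

print_r($rows);
```

Because each row is written after its own inner loop completes, the two-pass counting step becomes unnecessary in this sketch.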

Expected output

| ORIGINAL LETTER | UNICODE CHARACTER | Description            |
| --------------- | ----------------- | ---------------------- |
| A               | A, 🄐, ℻          | LATIN CAPITAL LETTER A |
| B               | B, ℬ, 𝐵           | LATIN CAPITAL LETTER B |

Any suggestions on how I can achieve this?

0yycz8jy1#

Answering my own question. I was able to get this working using a function and array_map().

<?php

// html table to output data
$html_table = '<table>
                    <thead>
                        <tr>
                            <th style="border:1px solid #000">Original Letters</th>
                            <th style="border:1px solid #000">Unicode Characters</th>
                            <th style="border:1px solid #000">Description</th>
                        </tr>
                    </thead>
                    <tbody>';

function my_func($search_map)
{
    global $html_table;

    // read file lazily, one line at a time
    $fileData = function () {
        // __DIR__ has no trailing slash, so join with '/' rather than './'
        $file = fopen(__DIR__ . '/confusables.txt', 'r');

        if (!$file) {
            return; // die() is a bad practice, better to use return
        }
        while (($line = fgets($file)) !== false) {
            yield $line . '<br />';
        }

        fclose($file);
    };

    // store unicode in array
    $store_unicode_array = [
        'uni-code' => '',
    ];

    $initial_line_count = 1; // variable to count lines before we start slicing
    $final_count = 0; // final line count
    // loop to get final count
    foreach ($fileData() as $line) {
        // $line contains current line
        if (preg_match("/{$search_map}/i", $line)) {
            $initial_line_count++;
            $final_count = $initial_line_count;
        }
    }

    $line_count = 1; // loop termination counter
    foreach ($fileData() as $line) {
        if (preg_match("/{$search_map}/i", $line)) {
            $slice_after = substr($line, 0, strpos($line, ' ) ')); // slice everything after )
            $slice_before = ltrim(stristr($slice_after, '('), '('); // slice everything upto (
            $unicode_char = substr($slice_before, 0, strpos($slice_before, "→")); // get every first character
            $real_char = ltrim(stristr($search_map, 'LETTER'), 'LETTER'); // get every real character
            $store_unicode = $store_unicode_array['uni-code'] .= $unicode_char . ','; // append each unicode char to $store_unicode_array
            $remove_comma_unicode_char = substr($store_unicode, 0, -1); // remove comma after nth last character in store_unicode
            $line_count++; // increment loop termination counter
            if ($line_count == $final_count) {
                $html_table .= '<tr>
                                    <td style=" border:1px solid black;"><pre>' . $real_char . '</pre></td>
                                    <td style=" border:1px solid black;"><pre>' . $remove_comma_unicode_char . '</pre></td>
                                    <td style=" border:1px solid black;"><pre>' . $search_map . '</pre></td>
                                </tr>';
                break;
            }
        }
    }
}

// initializing empty array for storing search strings
$search_string_array = array();

// some part of search string
$search_string_part = 'LATIN CAPITAL LETTER ';

// generate search strings based on $search_string_part
foreach (range('A', 'Z') as $alphabet) {
    $alphabets = $search_string_part . $alphabet;
    array_push($search_string_array, $alphabets);
}

array_map("my_func", $search_string_array);
// foreach ($search_string_array as $element) {
//     my_func($element);
// }

$html_table .= '    </tbody>
                </table>';
echo $html_table;
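The code above reads the file twice for each of the 26 search strings. As a rough alternative, and assuming every data line follows the `# ( X → Y ) NAME → NAME` layout shown in the question, a single pass can group the source characters by the letter named on each line. The helper name `group_confusables` and the inline sample lines are hypothetical and only for illustration. The `\b` word boundary is an added assumption, so that a name such as `LETTER AE` would not be counted under `A`.

```php
<?php

/**
 * Group source characters by the Latin capital letter named on each line.
 * Hypothetical helper; assumes the "# ( X → Y ) NAME" layout from the question.
 *
 * @param iterable<string> $lines
 * @return array<string, string[]> letter => list of confusable characters
 */
function group_confusables(iterable $lines): array
{
    $groups = [];
    foreach ($lines as $line) {
        // Source character: the text between "(" and the first "→" (u = UTF-8 mode).
        if (!preg_match('/\(\s*(.+?)\s*→/u', $line, $m)) {
            continue; // comment or blank line without the expected layout
        }
        $char = $m[1];

        // Every single letter named as "LATIN CAPITAL LETTER X" on this line.
        preg_match_all('/LATIN CAPITAL LETTER ([A-Z])\b/', $line, $names);
        foreach (array_unique($names[1]) as $letter) {
            $groups[$letter][] = $char;
        }
    }
    ksort($groups);

    return $groups;
}

// Sample lines modelled on the question; characters are written as escapes.
$sample = [
    "FF21 ; 0041 ; MA # ( \u{FF21} → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A",
    "FF22 ; 0042 ; MA # ( \u{FF22} → B ) FULLWIDTH LATIN CAPITAL LETTER B → LATIN CAPITAL LETTER B",
    "212C ; 0042 ; MA # ( \u{212C} → B ) SCRIPT CAPITAL B → LATIN CAPITAL LETTER B",
    "1F110 ; 0028 0041 0029 ; MA #* ( \u{1F110} → (A) ) PARENTHESIZED LATIN CAPITAL LETTER A → LEFT PARENTHESIS, LATIN CAPITAL LETTER A, RIGHT PARENTHESIS",
    "1D435 ; 0042 ; MA # ( \u{1D435} → B ) MATHEMATICAL ITALIC CAPITAL B → LATIN CAPITAL LETTER B",
    "213B ; 0046 0041 0058 ; MA #* ( \u{213B} → FAX ) FACSIMILE SIGN → LATIN CAPITAL LETTER F, LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER X",
];

$groups = group_confusables($sample);

// One table row per letter, built after all lines have been grouped.
$html = "<table>\n";
foreach ($groups as $letter => $chars) {
    $html .= sprintf(
        "<tr><td>%s</td><td>%s</td><td>LATIN CAPITAL LETTER %s</td></tr>\n",
        $letter,
        htmlspecialchars(implode(', ', $chars)),
        $letter
    );
}
$html .= "</table>\n";
echo $html;
```

To run this against the real file, the sample array could be swapped for something like `file(__DIR__ . '/confusables.txt', FILE_IGNORE_NEW_LINES)`. Note that a line naming several letters (the FAX line, for instance) is listed under each of them, which mirrors how the substring search in the original code behaves.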
