从2-3-4树中搜索带有电话号码的单词

问题描述

我在 2-3-4树中放有一个单词词典。单词的长度不同。我需要使用电话键盘找到所有可能的单词,这些单词可以响应特定的电话号码。给定键盘

keypad

例如,数字26678837可以是单词“ COmpuTER”,但也可以是另一个单词。鉴于我所有单词都在2-3-4树中的事实,什么是最好的算法或只是一种搜索方式,以便从给定的电话号码中查找所有可能的单词?

解决方法

这里的关键是要获取所有不同的字符组合,这些组合可以从该编号的从左到右顺序中与所提供编号相关的字符派生而来。显然,提供的数字越大,组合就越多。例如,您在帖子中提供的 26678837 数量可以生成 11,664 个唯一的字符组合(字符串)。

 2     6     6     7     8     8     3     7
abc   mno   mno   pqrs  tuv   tuv   def   pqrs

这11664个生成的字符串单词中的每个单词都需要通过字典进行排序,而字典本身可能包含170,000个(或更多)单词,并检查(不区分大小写)相等性。这不会是一个过快的过程,因为这相当于进行了将近20亿次平等检查。在我10岁的台式机Windows盒子上,这大约需要1分钟20秒。

在任何情况下,下面提供的方法都将使用提供的字符串号,生成不同的字符组合(字符词),并将每个字符与提供的字典(词)文本文件中的词进行比较。阅读方法附带的文档:

enter image description here

/**
 * Converts the supplied string telephone number (or any string of integer 
 * numbers) to possible words acquired from within the supplied dictionary 
 * text file.<br><br>
 * 
 * Specific digits relate to specific alpha characters as they do in a telephone
 * keypad,for example:<pre>
 * 
 *      1    2    3    4    5    6    7    8    9    0
 *          ABC  DEF  GHI  JKL  MNO  PQRS TUV  WXYZ
 * </pre>
 * 
 * Digits 1 and 0 do not relate to any alpha characters. By looking at the 
 * above scale you can see how a number like "26678837" can generate 11,664 
 * unique character combinations (string words). Each one of these combinations 
 * needs to be check against the supplied dictionary (or word) text file to see
 * if matches a specific dictionary word. If it does then we store that word 
 * and carry on to the next character combination. If the dictionary (word) 
 * text file contains 170,000 words and there are 11,664 different character 
 * combinations then a simple calculation (170,000 x 11,664) determines that 
 * there will need to be 1,982,880,000 (almost 2 billion) inspections done for 
 * equality. Because of this,many (sometimes thousands) of Executor Service 
 * threads are used to speed up the processing task. At the end of the day 
 * however,task speed completely relies on the speed of the Hard Disk Drive 
 * (HDD) and too many threads can actually degrade performance. It's a tough 
 * balance. Modify things as you see fit.
 * 
 * 
 * 
 * @param dictionaryFilePath (String) The path and file name of the dictionary 
 * file to gather possible words from. This dictionary (or word) file must 
 * contain a single word on each line of the file. There can not be more than 
 * one word on any given file line. Blank lines within the file are ignored. 
 * Letter case is always ignored during processing however any words found and 
 * returned will be in Upper Letter Case.<br><br>
 * 
 * If you don't currently have a dictionary or word text file available then 
 * you can easily acquire one from the Internet.<br>
 * 
 * @param telephoneNumber (String) The telephone number (or any number) to 
 * convert to a dictionary word. Any non-digit characters within the supplied 
 * string are automatically removed so that only digits remain. The longer 
 * the number string,the longer it takes to process this number to a word. 
 * There is no guarantee that the number you supply will produce a word from 
 * within the supplied dictionary (word) file.<br>
 * 
 * @param showProcess (boolean) If boolean 'true' is supplied then the processing 
 * is displayed within the console window. This however slows things down 
 * considerably but does give you an idea what is going on. Because larger 
 * string numbers can take some time to complete processing,a modal 'Please 
 * Wait' dialog with a progress bar is displayed in the middle of your screen. 
 * When processing has completed this dialog window is automatically closed.<br>
 * 
 * @return (String[] Array) A String[] Array of the word or words from the 
 * supplied dictionary (word) file which the supplied number correlates to. 
 */
public static String[] telephoneNumberToDictionaryWords(final String dictionaryFilePath,final String telephoneNumber,final boolean showProcess) {
    long timeStart = System.currentTimeMillis();
    if (dictionaryFilePath == null || dictionaryFilePath.isEmpty()) {
        System.err.println("The argument for dictionaryFilePath can not be null or empty!");
        return null;
    }
    else if (telephoneNumber == null || telephoneNumber.trim().isEmpty()) {
        System.err.println("The argument for telephoneNumber can not be null or empty!");
        return null;
    }
    else if (!new File(dictionaryFilePath).exists()) {
        System.err.println("File Not Found! ("
                + new File(dictionaryFilePath).getAbsolutePath() + ")");
        return null;
    }
    String digits = String.valueOf(telephoneNumber).replaceAll("[^\\d]","");
    if (digits.isEmpty()) {
        return null;
    }
    List<String> foundList = new ArrayList<>();
    
    // ================== The 'Please Wait' Dialog Window ==================
    // Depending on the number supplied,this process can take a while to 
    // complete therefore put a popup 'Please Wait...' window into play 
    // here.
    final javax.swing.JFrame iframe = new javax.swing.JFrame();
    iframe.setAlwaysOnTop(true);
    iframe.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
    iframe.setLocationRelativeTo(null);
    
    final JDialog workingDialog = new JDialog(iframe);
    workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.WAIT_CURSOR));
    final JPanel p1 = new JPanel(new BorderLayout());
    javax.swing.border.Border lineBorder = javax.swing.BorderFactory.createLineBorder(Color.red,4);
    p1.setBorder(lineBorder);
    p1.setBackground(Color.BLACK);
    p1.setPreferredSize(new java.awt.Dimension(255,150));
    
    final JLabel label = new JLabel();
    label.setHorizontalAlignment(JLabel.CENTER);
    label.setVerticalAlignment(JLabel.CENTER);
    label.setText("<html><font size='5',color=#14e5eb><b><center>Please Wait..."
                  + "</center></b></font><br><center>This can take a little "
                  + "while.</center></html>");
    label.setForeground(Color.WHITE);
    p1.add(label,BorderLayout.CENTER);
    
    final JLabel label2 = new JLabel();
    label2.setHorizontalAlignment(JLabel.CENTER);
    label2.setVerticalAlignment(JLabel.CENTER);
    label2.setText("Hello World");
    label2.setForeground(Color.WHITE);
    p1.add(label2,BorderLayout.BEFORE_FIRST_LINE);
           
    final javax.swing.JProgressBar progressBar = new javax.swing.JProgressBar();
    progressBar.setPreferredSize( new java.awt.Dimension (190,25));
    progressBar.setMaximumSize(new java.awt.Dimension(190,25));
    progressBar.setStringPainted(true);
    p1.add(progressBar,BorderLayout.SOUTH);
    
    workingDialog.setUndecorated(true);
    workingDialog.getContentPane().add(p1);
    workingDialog.pack();
    workingDialog.setLocationRelativeTo(iframe);
    workingDialog.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
    workingDialog.setModal(true);
    // =====================================================================

    SwingWorker<String,Void> worker = new SwingWorker<String,Void>() {
        @Override
        protected String doInBackground() throws InterruptedException {
            // Get the number of words contained within the supplied dictionary file.
            long numberOfDictionaryWords = 0L;
            // This will auto-close the stream.
            try (java.util.stream.Stream<String> linesStream
                    = java.nio.file.Files.lines(java.nio.file.Paths.get(dictionaryFilePath))) {
                numberOfDictionaryWords = linesStream.count();
            }
            catch (IOException ex) {
                System.err.println(ex);
            }
            if (showProcess) {
                System.out.println("The number of words in dictionary: --> " 
                                  + numberOfDictionaryWords);
            }

            // Map the alpha characters related to telephone digits
            HashMap<Character,String> map = new HashMap<>();
            map.put('0',"");
            map.put('1',"");
            map.put('2',"abc");
            map.put('3',"def");
            map.put('4',"ghi");
            map.put('5',"jkl");
            map.put('6',"mno");
            map.put('7',"pqrs");
            map.put('8',"tuv");
            map.put('9',"wxyz");

            // Get all the character combinations available for the
            // telelphone number (or just number) digits provided.
            List<String> phoneCharactersList = new ArrayList<>();
            phoneCharactersList.add("");
            for (int i = 0; i < digits.length(); i++) {
                if (digits.charAt(i) == '0' || digits.charAt(i) == '1') {
                    continue;
                }
                ArrayList<String> tempList = new ArrayList<>();

                String option = map.get(digits.charAt(i));

                for (int j = 0; j < phoneCharactersList.size(); j++) {
                    for (int p = 0; p < option.length(); p++) {
                        tempList.add(new StringBuilder(phoneCharactersList.get(j))
                                .append(option.charAt(p)).toString());
                    }
                }
                phoneCharactersList.clear();
                phoneCharactersList.addAll(tempList);
            }
            if (showProcess) {
                System.out.println("The number of generated words to process: --> "
                                  + phoneCharactersList.size());
            }
            
            progressBar.setMinimum(0);
            progressBar.setMaximum(phoneCharactersList.size());
            
            /* Iterate through all the character combinations now contained
               within the 'phoneCharactersList' and compare them to the words
               contained within the dictionary file. Store found words into 
               the foundList List Interface object. 
               Declare concurrent executor service threads (one for each possible 
               word to check for in dictionary file).      */
            java.util.concurrent.ExecutorService executor = 
                    java.util.concurrent.Executors.newFixedThreadPool(phoneCharactersList.size());
            for (int i = 0; i < phoneCharactersList.size(); i++) {
                String wordToFind = phoneCharactersList.get(i);
                label2.setText("<html><center>Processing word: <font color=yellow>" 
                               + wordToFind + "</font></center></html>");
                if (showProcess) {
                    System.out.println((i + 1) + ") Processing the possible word: " + wordToFind);
                }
                Runnable threadWorker = new Runnable() {
                    @Override
                    public void run() {
                        try (java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(dictionaryFilePath))) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                line = line.trim();
                                if (line.isEmpty() || (line.length() != wordToFind.length())) {
                                    continue;
                                }
                                if (line.equalsIgnoreCase(wordToFind)) {
                                    foundList.add(wordToFind.toUpperCase());
                                    break;
                                }
                            }
                        }
                        catch (Exception ex) {
                            System.err.println(ex);
                        }
                    }
                };
                executor.execute(threadWorker);
                progressBar.setValue(i+1);
            }
            // Shut down Executor Service threads (when they are done)
            executor.shutdown();
            // Hold rest of code processing until are executors are terminated.
            while (!executor.isTerminated()) {
            }

            return null;
        }

        @Override
        protected void done() {
            workingDialog.dispose();
            iframe.dispose();
            workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.DEFAULT_CURSOR));
        }
    };
    worker.execute();
    workingDialog.setVisible(true);
    try {
        worker.get();
    }
    catch (Exception e1) {
        e1.printStackTrace();
    }
    
    java.util.Collections.sort(foundList);
    long timeEnd = System.currentTimeMillis();
    if (showProcess) {
        long seconds = (timeEnd - timeStart) / 1000;
        String timeSpan = new StringBuffer("").append(String.valueOf(seconds))
                                    .append(" seconds").toString();
        if (seconds > 60) {
            timeSpan = new StringBuffer("").append(String.valueOf(seconds/60))
                        .append(" minutes and ").append(String.valueOf(seconds%60))
                        .append(" seconds").toString();
        }
        System.out.println(new StringBuffer("It took ").append(timeSpan)
                .append(" to process the number: ").append(telephoneNumber)
                .toString());
    }
    return foundList.toArray(new String[foundList.size()]);
}

根据需要修改代码,以利用2-3-4树,B树或您喜欢的任何东西。要使用此方法,您可以这样做:

String phoneNumber = "26678837";
String[] words = telephoneNumberToDictionaryWords("dictionary.txt",phoneNumber,false);
if (words == null || words.length == 0) {
    System.out.println("There are no dictionary words available for "
            + "the Phone Number: " + phoneNumber);
}
else {
    for (String str : words) {
        System.out.println(str);
    }
}
,

Aho–Corasick_algorithm

在youtube上,我观看了一次非常有趣的Google采访,其中编码人员使用了这种算法。