Remove duplicate lines in a file java
CPallini: Probably a LinkedHashSet is better suited for the purpose.
Kornfeld Eliyahu Peter: In case the order is important.
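A minimal sketch of the LinkedHashSet suggestion, assuming the lines are already in memory as a list. The class and method names (OrderedDedup, distinctInOrder) are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class OrderedDedup {
    // Returns the distinct lines in the order of their first occurrence.
    static List<String> distinctInOrder(List<String> lines) {
        // LinkedHashSet rejects duplicates like HashSet does,
        // but iterates in insertion order instead of hash order.
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }
}
```

A plain HashSet would remove the same duplicates but gives no guarantee about the order in which the remaining lines come back.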
Another creative approach is to append its line number to each line, sort all the lines, remove the duplicates while ignoring the last token (the line number), then sort the file again by that last token and strip it from the output.
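The numbering approach can be sketched in memory as follows. This is only an illustration under assumptions: the line number is kept in a small helper object rather than appended as a text token, and the names NumberedSortDedup, Numbered and dedupe are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NumberedSortDedup {
    // Pairs a line with its original position (hypothetical helper type).
    private static final class Numbered {
        final String text;
        final int index;
        Numbered(String text, int index) { this.text = text; this.index = index; }
    }

    static List<String> dedupe(List<String> lines) {
        // 1. Attach the line number to every line.
        List<Numbered> numbered = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            numbered.add(new Numbered(lines.get(i), i));
        }
        // 2. Sort by text, then by line number, so the earliest copy comes first.
        numbered.sort(Comparator.comparing((Numbered n) -> n.text)
                .thenComparingInt((Numbered n) -> n.index));
        // 3. Keep an entry only if its text differs from the last kept entry.
        List<Numbered> unique = new ArrayList<>();
        for (Numbered n : numbered) {
            if (unique.isEmpty() || !unique.get(unique.size() - 1).text.equals(n.text)) {
                unique.add(n);
            }
        }
        // 4. Sort back by line number and strip the number.
        unique.sort(Comparator.comparingInt((Numbered n) -> n.index));
        List<String> result = new ArrayList<>();
        for (Numbered n : unique) {
            result.add(n.text);
        }
        return result;
    }
}
```

Sorting on the line number as a secondary key means the first occurrence of each line is the one that survives.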
This would iterate through your whole file and pass each unique occurrence only once per sed call, so you are not repeating searches you have already done.

There are two scalable solutions, where scalable means disk-based rather than memory-based. Which one applies depends on whether the procedure must be stable, meaning that the lines keep their original order after the duplicates are removed.

For the non-stable solution, first sort the file on disk: split it into smaller files, sort each chunk in memory, then merge the chunks in sorted order with a merge that skips duplicates.
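The merge step of the non-stable solution might look like the sketch below. It is a simplification under assumptions: each sorted chunk is an in-memory Iterable standing in for a sorted chunk file, the output is collected in a list rather than written to disk, and the names DedupMerge, Head and merge are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class DedupMerge {
    // One open chunk: its current line plus the remainder of the chunk.
    private static final class Head {
        String line;
        final Iterator<String> rest;
        Head(String line, Iterator<String> rest) { this.line = line; this.rest = rest; }
    }

    // Merges already-sorted chunks into one sorted list without duplicates.
    static List<String> merge(List<? extends Iterable<String>> sortedChunks) {
        // The queue holds only the current line of each chunk.
        PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        for (Iterable<String> chunk : sortedChunks) {
            Iterator<String> it = chunk.iterator();
            if (it.hasNext()) {
                heads.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        String last = null; // most recently emitted line
        while (!heads.isEmpty()) {
            Head h = heads.poll();
            // Lines arrive in sorted order, so a duplicate always equals the last emitted line.
            if (!h.line.equals(last)) {
                out.add(h.line);
                last = h.line;
            }
            if (h.rest.hasNext()) {
                h.line = h.rest.next();
                heads.add(h);
            }
        }
        return out;
    }
}
```

In a disk-based version each Head would wrap a reader over one chunk file, and emitted lines would go straight to the output file.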
The merge itself needs almost no memory: it compares only the current line of each chunk file, since the next line in a sorted chunk is guaranteed to be greater than or equal to the current one.

The stable solution is slightly trickier. First, sort the file in chunks as before, but record the original line number in each line.
Then, during the "merge", don't bother storing the result, only the line numbers to be deleted. A final pass over the original file that skips those line numbers yields the deduplicated file in its original order.

Does it matter in which order the lines come, and how many duplicates are you counting on seeing? If not, and if you're counting on a lot of dupes i.

Based on these assumptions, the solution is:
1. Build a hashmap keyed by line length; the entry for each key is the list of offsets of all lines having that length. Building this hashmap is O(n).
While mapping the offset of each line into the hashmap, compare the line blob with all existing entries in the list of line offsets for that key length, except entries whose offset is -1.
Since we assume we can compare blobs, m does not matter. That was the worst case. In other cases we save on comparisons, at the cost of a little extra space in the hashmap. Additionally, we can use MapReduce on the server side to split the set and merge the results later.
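A rough sketch of the length-keyed hashmap idea. It makes several assumptions not in the original: the lines are held in memory, list indices stand in for file offsets, duplicates are simply not recorded rather than marked with -1, and the names LengthBucketDedup and dedupe are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LengthBucketDedup {
    static List<String> dedupe(List<String> lines) {
        // Key: line length. Value: indices (standing in for file offsets)
        // of the distinct lines of that length seen so far.
        Map<Integer, List<Integer>> byLength = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            List<Integer> sameLength =
                    byLength.computeIfAbsent(line.length(), k -> new ArrayList<>());
            // Only lines of equal length can be duplicates, so compare within the bucket.
            boolean duplicate = false;
            for (int j : sameLength) {
                if (lines.get(j).equals(line)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                sameLength.add(i);
                kept.add(line);
            }
        }
        return kept;
    }
}
```

The worst case is a file in which every line has the same length, because then every new line is compared against all distinct lines seen so far.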
The line length or the start of the line can serve as the mapper key.

Deleting duplicate lines in a file using Java
Asked 12 years, 7 months ago. Active 11 months ago. Viewed 47k times.

I need an integrated solution for this, however. Preferably in Java. Any ideas?

Peter Lawrey
Monster

Depending on the questioner's requirements, you may need to keep track of the line number, because iterating over a HashSet returns the lines in a fairly arbitrary order. Another possibility would be to create the HashSet with a fillgrade of 1.

In this tutorial we will go over the steps to remove duplicate lines from a CSV file or any other file. Hope you find this Java program for finding duplicate lines in a CSV or any other file useful.
Crunchify, LLC.

Imports: BufferedReader, FileNotFoundException, FileReader, IOException (java.io) and HashSet (java.util). HashSet permits the null element.

Processed line : name city number age.
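A minimal sketch of how a program built on BufferedReader and HashSet could drop duplicate lines while streaming. It is not the tutorial's own listing: the names StreamDedup and copyUnique are hypothetical, and the method takes a generic Reader and Writer (an assumption made so the sketch does not depend on a particular file).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

final class StreamDedup {
    // Copies each distinct line from in to out and returns the number of duplicates skipped.
    static int copyUnique(Reader in, Writer out) throws IOException {
        Set<String> seen = new HashSet<>();
        int skipped = 0;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            // Set.add returns false when the line is already present.
            if (seen.add(line)) {
                out.write(line);
                out.write('\n');
            } else {
                skipped++;
            }
        }
        out.flush();
        return skipped;
    }
}
```

Because each line is written the first time it is seen, the output keeps the original order even though the HashSet itself is unordered. For a real file the reader could be a FileReader, ideally opened in a try-with-resources block.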