Text Manipulation in Python
Sort, deduplicate, wrap, and analyze text files in Python — with practical one-liners and scripts.
Published:
Tags: text manipulation Python, Python text processing, Python sort deduplicate lines
Text Manipulation in Python Python's standard library covers virtually every text line operation — sorting, deduplication, wrapping, numbering, and analysis — without any external packages. The Python documentation for the module, [](https://docs.python.org/3/library/collections.html#collections.Counter), and module are the authoritative references for these operations. --- Reading and Writing Text Files All text manipulation starts with reading a file into memory: Use explicitly — never rely on the platform default encoding on Windows where it may be . Sorting Lines Removing Duplicate Lines Wrapping Text Adding Line Numbers Finding and Replacing Word Frequency Analysis Text Statistics Batch Processing Multiple Files Performance: pandas for Massive Files For files in the tens or hundreds…
Frequently Asked Questions
How do I sort lines in a file with Python?
Read the file with splitlines(), then call sorted(lines) for alphabetical order, sorted(lines, key=str.lower) for case-insensitive, or sorted(lines, key=len) for length order. Write back with '\n'.join(sorted_lines).
How do I remove duplicate lines in Python?
Use an ordered seen-set loop: seen = set(); result = [l for l in lines if not (l in seen or seen.add(l))]. This preserves the original line order while removing all subsequent occurrences of duplicates.
How do I find word frequency in Python?
Tokenize with re.findall(r'\b\w+\b', text.lower()), then pass the list to collections.Counter. Counter.most_common(20) returns the top 20 words with their counts.
How do I add line numbers to text in Python?
Use enumerate: numbered = '\n'.join(f'{i+1}. {line}' for i, line in enumerate(lines)). For zero-padded numbers, use f'{i+1:0{len(str(len(lines)))}}'.
How do I wrap text in Python?
Use textwrap.fill(text, width=80) for a single paragraph. For multiple paragraphs, split on double newlines first: '\n\n'.join(textwrap.fill(p, width=80) for p in text.split('\n\n')).
All articles · theproductguy.in