Commit 18f931a

chore: Refactor with tests and classes
1 parent 663f372 commit 18f931a

58 files changed

+12554
-997
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -22,3 +22,5 @@ target/
 *.war
 *.ear
 *.class
+
+logs/file-compression.log

README.md

Lines changed: 70 additions & 13 deletions
@@ -4,23 +4,80 @@
 [![codecov](https://codecov.io/gh/ayonious/File-Compression/branch/master/graph/badge.svg)](https://codecov.io/gh/ayonious/File-Compression)
 [![GitHub stars](https://img.shields.io/github/stars/ayonious/File-Compression?style=social)](https://github.com/ayonious/File-Compression/stargazers)
 
-A File Compression software that helps zip/Unzip files using these 2 algorihtms:
+A Java-based file compression application implementing two classic compression algorithms:
 
-1. Huffmans Code
-2. Lempel-Ziv-Wells algorithm
+1. **Huffman Coding** - Frequency-based compression
+2. **LZW (Lempel-Ziv-Welch)** - Dictionary-based compression
 
-# About Huffmans Code
+## Compression Algorithms
 
-The Huffmans algo creates a 1-1 mapping for each byte of the input file
-and replaces each byte with the mapped bit sequence. For this you need
-to store a dictionary that describes each 1-1 mapping of input byte and
-binary sequence.(which needs extraspace)
+### Huffman Coding
 
-# About Lempel-Ziv-Wells
+**How it works:**
+Huffman coding is a **lossless compression algorithm** that assigns variable-length binary codes to characters based on their frequency. Characters that appear more frequently get shorter codes, while rare characters get longer codes.
 
-Unlike Huffmans code LZW dont need an extra dictionary to be saved. Also
-LZW does not create a mapping to byte to bin sequence. It creates mapping
-of multiple byte to binary sequence.
+**Example:**
+```
+Input text: "aabbc"
+Frequency: a=2, b=2, c=1
+
+Huffman codes assigned:
+a → 0
+b → 10
+c → 11
+
+Compressed: "0 0 10 10 11" = "00101011" (8 bits)
+Original: 5 characters × 8 bits = 40 bits
+Compression ratio: 80% reduction
+```
+
+**Key characteristics:**
+- Creates a frequency table and binary tree during compression
+- Requires storing the frequency table in the compressed file (overhead)
+- Works best with files that have **uneven character distribution**
+- Typical use cases: Text files, source code, log files
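The Huffman description in the README hunk above can be illustrated with a short Java sketch. It is not taken from this commit: the class name `HuffmanSketch` and its methods are hypothetical, and it operates on the `char` values of a `String` rather than on raw file bytes. It counts frequencies, repeatedly merges the two lightest nodes until one tree remains, and reads each code off the path from the root.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration only; not the project's implementation.
class HuffmanSketch {

    // Leaves carry a symbol; internal nodes carry two children.
    static final class Node {
        final char symbol;
        final int weight;
        final Node left;
        final Node right;

        Node(char symbol, int weight) {
            this(symbol, weight, null, null);
        }

        Node(char symbol, int weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }
    }

    // Builds a symbol -> bit-string table for the given text.
    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        // Lightest node first; ties are broken arbitrarily.
        PriorityQueue<Node> queue =
                new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        freq.forEach((c, w) -> queue.add(new Node(c, w)));

        // Merge the two lightest nodes until a single tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.weight + b.weight, a, b));
        }

        Map<Character, String> codes = new HashMap<>();
        if (!queue.isEmpty()) {
            assign(queue.poll(), "", codes);
        }
        return codes;
    }

    // Walks the tree: left edge appends 0, right edge appends 1.
    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            // A one-symbol input would otherwise get an empty code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    // Concatenates the code of every character in the text.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> codes = buildCodes("aabbc");
        String bits = encode("aabbc", codes);
        System.out.println(codes + " -> " + bits + " (" + bits.length() + " bits)");
    }
}
```

For `"aabbc"` the encoded length is 8 bits, matching the README example. Which of `a` or `b` receives the 1-bit code depends on how the priority queue breaks the tie, so the exact codes may differ from the README table. The 8-bit figure also excludes the frequency table that a real compressed file would need to store.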
+
+### LZW (Lempel-Ziv-Welch)
+
+**How it works:**
+LZW is a **dictionary-based compression algorithm** that builds a dictionary of sequences on-the-fly. Instead of replacing individual bytes, it replaces repeated sequences of bytes with dictionary codes.
+
+**Example:**
+```
+Input text: "ABABABA"
+
+Initial dictionary (single bytes, codes 0-255):
+65: A
+66: B
+...
+
+Compression process:
+- Read 'A' → output 65, add "AB" to dictionary (256)
+- Read 'B' → output 66, add "BA" to dictionary (257)
+- Read "AB" (found in dict!) → output 256, add "ABA" to dictionary (258)
+- Read "ABA" (found in dict!) → output 258
+
+Compressed: [65, 66, 256, 258]
+Original: 7 characters × 8 bits = 56 bits
+Compressed: 4 codes × ~9 bits = 36 bits
+Compression ratio: 35.7% reduction
+```
+
+**Key characteristics:**
+- **No dictionary is stored** - both compressor and decompressor build the same dictionary
+- Replaces **sequences of bytes**, not individual bytes
+- Works best with files that have **repetitive patterns**
+- Typical use cases: Log files, structured data (JSON, XML), source code with repeated patterns
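The claim that no dictionary needs to be stored can be checked with a small round-trip sketch in Java. It is not taken from this commit: `LzwSketch` is a hypothetical name, and the code assumes the conventional LZW layout, in which codes 0-255 stand for single bytes and new entries start at 256. The project's own implementation may number or pack its codes differently. Input characters are assumed to fall in the range 0-255.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the project's implementation.
class LzwSketch {

    // Emits one code per longest sequence already in the dictionary.
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i);
        }
        int nextCode = 256;

        List<Integer> out = new ArrayList<>();
        String w = "";
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                      // keep extending the match
            } else {
                out.add(dict.get(w));        // emit the longest known match
                dict.put(wc, nextCode++);    // learn the new sequence
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) {
            out.add(dict.get(w));
        }
        return out;
    }

    // Rebuilds the same dictionary from the code stream alone.
    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) {
            return "";
        }
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(i, String.valueOf((char) i));
        }
        int nextCode = 256;

        String w = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(w);
        for (int k : codes.subList(1, codes.size())) {
            // A code may arrive one step before the decoder has defined it;
            // in that case the entry is the previous string plus its first char.
            String entry = dict.containsKey(k) ? dict.get(k) : w + w.charAt(0);
            out.append(entry);
            dict.put(nextCode++, w + entry.charAt(0));
            w = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Integer> codes = compress("ABABABA");
        System.out.println(codes + " -> " + decompress(codes));
    }
}
```

Under this layout, `compress("ABABABA")` returns `[65, 66, 256, 258]`, and `decompress` recovers the original string without ever being given the dictionary. The last code, 258, reaches the decoder before it has added that entry, which is the special case handled inside the loop.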
+
+### Comparison
+
+| Feature | Huffman Coding | LZW |
+|---------|---------------|-----|
+| **Method** | Frequency-based | Dictionary-based |
+| **Dictionary stored?** | Yes (frequency table) | No (built on-the-fly) |
+| **Best for** | Uneven character distribution | Repetitive patterns |
+| **Space overhead** | Medium (frequency table) | Low (no dictionary) |
+| **Example use case** | Text with skewed character frequency | Log files, structured data |
 
 ## Installation
 
@@ -67,7 +124,7 @@ mvn exec:java
 ### Using JAR directly
 After building with Maven, you can run the JAR:
 ```bash
-java -jar target/file-compression-1.0-SNAPSHOT-jar-with-dependencies.jar
+java -jar target/file-compression-2.0-SNAPSHOT-jar-with-dependencies.jar
 ```
 
 ![Outlook](/git_resource/outlook.png?raw=true "File Compression GUI")

logs/file-compression.log

Lines changed: 8975 additions & 0 deletions
Large diffs are not rendered by default.

pom.xml

Lines changed: 16 additions & 1 deletion
@@ -6,7 +6,7 @@
 
     <groupId>prog</groupId>
     <artifactId>file-compression</artifactId>
-    <version>1.0-SNAPSHOT</version>
+    <version>2.0-SNAPSHOT</version>
 
     <properties>
         <maven.compiler.source>21</maven.compiler.source>
@@ -17,12 +17,27 @@
     </properties>
 
     <dependencies>
+        <!-- JUnit for testing -->
         <dependency>
             <groupId>org.junit.jupiter</groupId>
             <artifactId>junit-jupiter</artifactId>
             <version>5.10.2</version>
             <scope>test</scope>
         </dependency>
+
+        <!-- SLF4J API for logging -->
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+            <version>2.0.9</version>
+        </dependency>
+
+        <!-- Logback Classic (includes logback-core) -->
+        <dependency>
+            <groupId>ch.qos.logback</groupId>
+            <artifactId>logback-classic</artifactId>
+            <version>1.4.14</version>
+        </dependency>
     </dependencies>
 
     <build>
