Decoding Node.js Buffers: An In-Depth Exploration of Binary Data Handling
Mastering Buffer Manipulation for Efficient Data Processing
To understand buffers, it's important to get the hang of binary data. Unlike our everyday decimal system with ten digits (0-9), computers operate using a binary system, which only has two digits: 1 and 0. This binary system is at the core of all computer operations.
Now, when your computer reads a file from its hard drive or sends data over the internet, it's dealing with binary data. However, JavaScript, while excellent at handling Unicode-encoded strings, historically wasn't that great at handling pure binary data. That's where Node.js steps in, using something called "buffers" to manage binary data efficiently.
What is a Node.js Buffer?
A Buffer in Node.js is essentially a fixed-length sequence of bytes. Each element within a buffer stores 8 bits of data, an integer ranging from 0 to 255. This limitation exists because Buffer is a subclass of Uint8Array, which operates with these constraints. When a buffer is printed, its bytes are shown as hexadecimal numbers. Buffer data is stored outside of the V8 heap.
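To make the byte range concrete, here is a minimal sketch. The variable names are illustrative and not part of the original article. It checks that a buffer really is a Uint8Array and shows what happens when a value outside 0 to 255 is assigned by index.

```javascript
// Illustrative names; only the global Buffer provided by Node.js is used.
const sample = Buffer.alloc(3);

// A Buffer is a Uint8Array, so the typed-array rules apply to it.
const isTypedArray = sample instanceof Uint8Array;

// Each element holds exactly one byte. Values outside 0-255 wrap around
// modulo 256 when they are assigned by index.
sample[0] = 255; // stored as 255
sample[1] = 256; // stored as 0
sample[2] = -1;  // stored as 255

console.log(isTypedArray, sample.length, [...sample]);
// true 3 [ 255, 0, 255 ]
```

The silent wrap-around is a property of Uint8Array itself, which is one reason to validate values before storing them.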
Buffer Class
Node.js manages binary data through the Buffer class. This class encompasses all the necessary functionality for working with buffers. It offers static methods such as from, alloc, and allocUnsafe, and a set of instance methods like write, toString, writeInt16BE, and many more. from, alloc, and allocUnsafe are the methods used to create a new buffer.
Buffer.from
Creates a new buffer from existing data, such as a string or an array of bytes. Each element is an 8-bit unsigned integer, so its decimal value has to be between 0 and 255. When printed, however, the buffer displays each byte as a hexadecimal number.
const new_buf = Buffer.from('Hello Developers!');
// <Buffer 48 65 6c 6c 6f 20 44 65 76 65 6c 6f 70 65 72 73 21>
Each element in the buffer can be printed in decimal and in binary to see how many bits its value needs. To do that, we can write a for loop that iterates over the buffer.
for (const b of new_buf) {
  console.log(b, b.toString(2), b.toString(2).length);
}
/*
Decimal Binary Length
72 1001000 7
101 1100101 7
108 1101100 7
108 1101100 7
111 1101111 7
32 100000 6
68 1000100 7
101 1100101 7
118 1110110 7
101 1100101 7
108 1101100 7
111 1101111 7
112 1110000 7
101 1100101 7
114 1110010 7
115 1110011 7
33 100001 6
*/
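The lengths above come out as 6 or 7 because toString(2) drops leading zeros, not because the bytes are shorter than 8 bits. As a small sketch, with the names greeting and bitStrings made up for illustration, padding each value to 8 digits shows the full byte:

```javascript
// Illustrative example: show every byte of a short string as a full 8-bit pattern.
const greeting = Buffer.from('Hi!');

// Array.from with a mapping function returns a plain array of strings.
// (Calling .map directly on a Buffer would return another Buffer instead.)
const bitStrings = Array.from(greeting, (byte) =>
  byte.toString(2).padStart(8, '0')
);

console.log(bitStrings.join(' '));
// 01001000 01101001 00100001
```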
We can also create a buffer from an array of numbers. No encoding is needed here, because the array already contains the byte values.
Buffer.from([72, 101, 108, 108, 111])
// <Buffer 48 65 6c 6c 6f>
If we convert this buffer to a string, we get the string Hello.
Buffer.from([72, 101, 108, 108, 111]).toString()
// 'Hello'
This illustrates the capability of Node.js to manage binary data through its Buffer class. Binary data stored in memory can be seamlessly transformed into various formats, such as strings or other data structures. It's important to note that hexadecimal representation in Buffers is used for visualization and doesn't alter the actual underlying data values.
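The same bytes can be produced from several representations. The sketch below uses illustrative variable names and assumes the standard 'hex' and 'base64' encodings accepted by Buffer.from. It builds the word Hello three ways and compares the results.

```javascript
// Three ways to build the same five bytes (names are illustrative).
const fromHex = Buffer.from('48656c6c6f', 'hex');
const fromBase64 = Buffer.from('SGVsbG8=', 'base64');
const fromArray = Buffer.from([72, 101, 108, 108, 111]);

// equals() compares the underlying bytes, not the object identity.
console.log(fromHex.equals(fromBase64), fromHex.equals(fromArray));
// true true
console.log(fromHex.toString());
// 'Hello'

// Array entries outside 0-255 are truncated to fit into one byte.
const truncated = Buffer.from([257, -1]);
console.log(truncated);
// <Buffer 01 ff>
```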
Buffer.alloc
This method creates a new buffer of a fixed length and fills it with the value 0 by default.
Syntax: Buffer.alloc(size, fill, encoding)
Buffer.alloc(10)
// <Buffer 00 00 00 00 00 00 00 00 00 00>
Buffer.alloc(10, 1)
// <Buffer 01 01 01 01 01 01 01 01 01 01>
Buffer.alloc makes sure that every element in the array is initialized.
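The fill value does not have to be a number. As a brief sketch with illustrative names, a string fill is repeated until the buffer is full, and an encoding can be given for the fill string:

```javascript
// A single-character fill is repeated for every byte.
const letters = Buffer.alloc(5, 'a');
console.log(letters.toString());
// 'aaaaa'

// A longer fill string is repeated as a pattern.
const pattern = Buffer.alloc(6, 'ab');
console.log(pattern.toString());
// 'ababab'

// The third argument tells alloc how to decode the fill string.
const decoded = Buffer.alloc(11, 'aGVsbG8gd29ybGQ=', 'base64');
console.log(decoded.toString());
// 'hello world'
```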
Buffer.allocUnsafe
This method creates an uninitialized buffer. The newly created buffer may contain some unwanted previous data from the memory as buffer elements are not initialized.
console.log(Buffer.allocUnsafe(10000).toString('utf-8'))
/*
��TFZ�5pl/u#Y �C
*/
Whenever a large buffer is created using allocUnsafe, it is quite likely that some elements will contain leftover data.
At a low level, creating a buffer reserves a block of memory for its elements. If we use allocUnsafe to create the buffer, the data already present in that memory isn't cleared. This makes allocUnsafe faster than alloc because it skips the extra step of initialization. However, it's important to note that using the safe method is recommended unless the buffer is completely overwritten right away. Leftover data that leaks out of an uninitialized buffer can cause significant issues down the road that can't be easily fixed.
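If allocUnsafe is used for its speed, one possible pattern is to make sure every byte is overwritten before the buffer is read. The sketch below shows two ways of doing that. The variable names and the message are illustrative assumptions, not taken from the article.

```javascript
// Option 1: clear the buffer explicitly. After fill(0) it matches Buffer.alloc.
const size = 16;
const cleared = Buffer.allocUnsafe(size).fill(0);
console.log(cleared.equals(Buffer.alloc(size)));
// true

// Option 2: size the buffer to the data so that every byte gets overwritten.
const message = 'Hello';
const exact = Buffer.allocUnsafe(Buffer.byteLength(message));
const written = exact.write(message);
console.log(written === exact.length, exact.toString());
// true 'Hello'
```

Either way, no uninitialized byte is ever exposed to the rest of the program.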
buffer.write
This method writes a string into the buffer. If the string needs more bytes than the buffer has room for, only the part that fits is written and the excess is ignored. A character that would be only partially encoded is not written at all.
Syntax: buffer.write(string, offset, length, encoding)
The write method returns the number of bytes written.
const buffer = Buffer.alloc(13)
// <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00>
buffer.write('Hello, ')
// 7
buffer.write('World!', 7)
// 6
buffer.toString()
// 'Hello, World!'
buffer
// <Buffer 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21>
const short_buffer = Buffer.alloc(3)
// <Buffer 00 00 00>
short_buffer.write('Hello, ')
// 3, as the buffer has only three elements
short_buffer.toString()
// 'Hel'
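The examples above use ASCII text, where one character is one byte. With other characters the byte count and the string length differ, so a buffer sized by string length can be too small. A minimal sketch, using illustrative names and Buffer.byteLength to measure the encoded size:

```javascript
// 'h\u00e9llo' is 'héllo': five characters, but 'é' takes two bytes in UTF-8.
const text = 'h\u00e9llo';
console.log(text.length, Buffer.byteLength(text));
// 5 6

// Sizing the buffer with Buffer.byteLength leaves room for the whole string.
const roomy = Buffer.alloc(Buffer.byteLength(text));
const writtenAll = roomy.write(text);
console.log(writtenAll, roomy.toString() === text);
// 6 true

// With only two bytes, 'h' fits but the two-byte 'é' does not,
// so it is skipped rather than written halfway.
const tight = Buffer.alloc(2);
const writtenPartial = tight.write(text);
console.log(writtenPartial, tight);
// 1 <Buffer 68 00>
```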
buffer.writeInt16BE
This method writes a value into the buffer at the given offset in big-endian format, meaning the most significant byte comes first. The value being written must be a valid signed 16-bit integer, between -32768 and 32767. The Node.js documentation leaves the behavior undefined for any other value, so it should not be relied on. The method returns the offset plus the number of bytes written.
Syntax: buffer.writeInt16BE(value, offset)
const buff = Buffer.alloc(2)
// <Buffer 00 00>
buff.writeInt16BE(1000, 0)
// 2
buff
// <Buffer 03 e8>
parseInt(buff.toString('hex'), 16)
// 1000
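Parsing the hex string works for the value above, but it ignores the sign. The sketch below uses illustrative variable names and the matching readInt16BE and readInt16LE methods from the Buffer class. It compares the two byte orders and shows why the read methods are the safer way to get a number back.

```javascript
// The same value written in both byte orders.
const be = Buffer.alloc(2);
const le = Buffer.alloc(2);
be.writeInt16BE(1000); // <Buffer 03 e8>, most significant byte first
le.writeInt16LE(1000); // <Buffer e8 03>, least significant byte first

// Reading with the matching method returns the original number.
console.log(be.readInt16BE(0), le.readInt16LE(0));
// 1000 1000

// Negative numbers are stored in two's complement form.
const negative = Buffer.alloc(2);
negative.writeInt16BE(-2); // <Buffer ff fe>
console.log(negative.readInt16BE(0));
// -2
console.log(parseInt(negative.toString('hex'), 16));
// 65534, because parseInt treats the bytes as unsigned
```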
buffer.toString
This method decodes the data in the buffer into a string according to the given character encoding, which defaults to 'utf8'. You can also specify a start offset and an end offset to decode only a portion of the buffer rather than the entire thing.
For example, imagine the buffer contains a long sequence of bytes and you want to extract only a part of it, like the text in a particular section of a book. You can use start and end to select which bytes of the buffer are decoded into a readable string. The byte at the end offset is not included.
Syntax: buffer.toString(encoding, start, end)
const str_buff = Buffer.from('Hello Developers!')
// <Buffer 48 65 6c 6c 6f 20 44 65 76 65 6c 6f 70 65 72 73 21>
str_buff.toString()
// 'Hello Developers!'
str_buff.toString('hex')
// '48656c6c6f20446576656c6f7065727321'
str_buff.toString('base64')
// 'SGVsbG8gRGV2ZWxvcGVycyE='
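The start and end parameters described above are not exercised by the examples, so here is a short sketch of them. The variable names are illustrative. Offsets count bytes, and end is exclusive.

```javascript
const source = Buffer.from('Hello Developers!');

// Bytes 0 through 4 hold the first word.
const firstWord = source.toString('utf8', 0, 5);
console.log(firstWord);
// 'Hello'

// Bytes 6 through 15 hold the second word, without the trailing '!'.
const secondWord = source.toString('utf8', 6, 16);
console.log(secondWord);
// 'Developers'

// The range also applies to other encodings.
const hexPrefix = source.toString('hex', 0, 2);
console.log(hexPrefix);
// '4865'
```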
buffer.buffer
Buffers can share memory as well. Every buffer exposes the ArrayBuffer that backs it through its buffer property, and Buffer.from creates a view over that same memory, instead of copying it, when it is given an ArrayBuffer. Shared memory is useful in programming because it allows different parts of a program to exchange data without copying it, making collaborative and parallel tasks easier to manage and faster to execute.
const int_buf = Buffer.alloc(2)
int_buf.writeInt16BE(1000)
// 2
int_buf
// <Buffer 03 e8>
const buf2 = Buffer.from(int_buf.buffer)
// <Buffer 03 e8>
int_buf.writeInt16BE(1100)
// 2
int_buf
// <Buffer 04 4c>
buf2
// <Buffer 04 4c>
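One caveat with the example above: the underlying ArrayBuffer can be larger than the buffer itself, because Node.js may carve small buffers out of a shared pool. A more defensive sketch, with illustrative names, passes byteOffset and length so the view covers exactly the same bytes, and contrasts it with a copy:

```javascript
// Small buffers created with Buffer.from(string) may live inside a larger pool,
// so the byte offset and length are passed explicitly when creating the view.
const pooled = Buffer.from('Hello Developers!');
const view = Buffer.from(pooled.buffer, pooled.byteOffset, pooled.length);

// Passing a Buffer (not its ArrayBuffer) to Buffer.from makes an independent copy.
const copy = Buffer.from(pooled);

// Change the first byte from 'H' to 'J' in the original.
pooled[0] = 0x4a;

console.log(view.toString());
// 'Jello Developers!' (the view shares memory with the original)
console.log(copy.toString());
// 'Hello Developers!' (the copy is unaffected)
```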