@Akisamb

Akisamb@programming.dev · 2 months ago

They’ve got Thunderbird, which is, as far as I know, the only serious alternative to Outlook.

Akisamb@programming.dev · 4 months ago

Now instead of just querying the goddamn database, a one line fucking SQL statement, I have to deal with the user team

Exactly, you understand very well the purpose of microservices. You can submit a patch if you need that feature now.

Funnily enough I’m the technical lead of the team that handles the user service in an insurance company.

Because people accessed our data directly without consulting us, we’re now dealing with legal issues: they were using addresses to guess where people lived instead of going through our endpoints.

I guess some people really hate the validation that service layers have.

Akisamb@programming.dev · 6 months ago

This is not true in France. Politicians whose fraud has been proven are arrested and charged. In France we have Sarkozy, Cahuzac and Fillon, who were all charged with crimes.

They were president, minister and presidential candidate respectively. I’d be surprised if it was different in the USA. I’m seeing that Trump is also being charged, so the system seems to be working.

Akisamb@programming.dev · 8 months ago

They gave them a birth control shot without properly informing them of what it was. Still scandalous, but not what you are saying.

Akisamb@programming.dev · 9 months ago

Yes to your question, but that’s not what I was saying.

Here is one of the most popular training datasets: https://pile.eleuther.ai/

If you look at the PDF describing the dataset, you’ll find that these documents are fairly short, with a mean length of less than 20 KB (about 20,000 characters) in most cases.

You are asking for a model to retain a memory for the whole duration of a discussion, which can be very long. If I chat for one hour, I’ll type approximately 8400 words, or around 42 KB. That is longer than most documents in the training set. If I chat for 20 hours, it’ll be longer than almost all the documents in the training set. The model needs to learn how to extract information from a long context, and it can’t do that well if the documents it was trained on are short.
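For anyone who wants to check the arithmetic, here is a rough sketch. The typing speed (140 words per minute) and the 5 bytes per word are my assumptions, picked so that they reproduce the 8400 words and 42 KB figures above; the function name is just for illustration.

```python
# Back-of-the-envelope estimate of how much text a chat session produces.
# Both constants are assumptions chosen to match the figures in the comment.
WORDS_PER_MINUTE = 140  # assumed typing speed: 8400 words / 60 minutes
BYTES_PER_WORD = 5      # assumed average, about one byte per character


def chat_size_bytes(hours: float) -> int:
    """Approximate size in bytes of the text typed during `hours` of chatting."""
    words = hours * 60 * WORDS_PER_MINUTE
    return round(words * BYTES_PER_WORD)


for hours in (1, 20):
    print(f"{hours:>2} h of chat is roughly {chat_size_bytes(hours) / 1000:.0f} KB")
```

Under those assumptions, one hour is already about twice the 20 KB mean mentioned above, and 20 hours is around 840 KB.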

You are also right that during training the text is cut off. A value I often see is 2k to 8k tokens. This is arbitrary; some models are trained with a cutoff of 200k tokens. You can use models on context lengths longer than what they were trained on (with some caveats), but performance falls off badly.

Akisamb@programming.dev · 9 months ago

There are two issues with large prompts. One is linked to the current language model technology, where the computation time and memory usage scale badly with prompt size. This is being solved by projects such as RWKV or Mamba, but these remain unproven at large sizes (more than 100 billion parameters). Somebody will have to spend some millions to train one.
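To give an idea of what "scale badly" means, here is a minimal sketch. It assumes a standard transformer with plain self-attention, where every token is compared with every other token, so one attention head in one layer produces an n x n table of scores. The helper name is hypothetical, and optimized implementations can avoid storing the whole table, but the amount of computation still grows with the square of the prompt length.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Number of entries in one n x n attention score table (one head, one layer)."""
    return num_tokens * num_tokens


# Prompt lengths taken from the cutoffs mentioned in the discussion (2k, 8k, 200k).
for n in (2_000, 8_000, 200_000):
    entries = attention_score_entries(n)
    print(f"{n:>7} tokens -> {entries:,} score entries")
```

Going from 2k to 8k tokens makes the prompt 4 times longer but the score table 16 times bigger, which is the kind of growth that RWKV and Mamba are trying to avoid.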

The other issue will probably be harder to solve. There is less high-quality long-context training data. Most datasets were created for small-context models.

Akisamb@programming.dev · 1 year ago

Even if 99% of it evaporated, that would still be a ridiculous amount of power.

But Bill Gates proved that it is possible to diversify a fortune held mainly in one company’s stock while that company keeps all its value. Elon Musk is horrifyingly rich, like it or not. His power and the damage he can do are huge.